Onboarding Multiple CSVs
Prerequisites:
- Installed Neat, see Installation.
- Launched a notebook environment.
- Familiar with the NeatSession object, see introduction.
- Access to NeatEngine.
In this tutorial, we load data from two CSV files, connect the data, infer a data model from it, and push both the model and the data to CDF.
Reading Metadata
We start by instantiating a NeatSession and reading the data from two URLs.
from cognite.neat import NeatSession, get_cognite_client
# Note: this example uses Oxigraph as the storage backend, which does not work in a CDF notebook.
neat = NeatSession(get_cognite_client(".env"), storage="oxigraph")
Found .env file in repository root. Loaded variables from .env file. Neat Engine 2.0.3 loaded.
base_url = "https://apps-cdn.cogniteapp.com/toolkit/publicdata/"
asset = "assets.Table.csv"
activity = "workitem.Table.csv"
neat.read.csv(f"{base_url}{asset}", type="Asset", primary_key="WMT_TAG_GLOBALID")
neat.read.csv(f"{base_url}{activity}", type="Activity", primary_key="sourceId")
neat
Instances
Overview:
- 1 named graphs
- Total of 2 unique types
- 14105 instances in default graph
default graph:
| | Type | Occurrence |
|---|---|---|
| 0 | Activity | 13002 |
| 1 | Asset | 1103 |
Provenance:
- Initialize graph store as OxigraphStore
- Extracted triples to named graph urn:x-rdflib:default using CSVExtractor
- Extracted triples to named graph urn:x-rdflib:default using CSVExtractor
Studying the output above, we see that we successfully read 1,103 assets and 13,002 activities.
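Conceptually, read.csv turns each row of a CSV file into one instance of the given type, identified by the value in the primary_key column. The sketch below mimics that idea with Python's csv module on a few made-up rows; it is not Neat's implementation, the sample values are hypothetical, and the choice to use the primary key only as the identifier (not as a property) is an assumption of this sketch.

```python
import csv
import io

# Hypothetical sample rows; only the column names come from the tutorial.
ASSET_CSV = """WMT_TAG_GLOBALID,WMT_TAG_NAME
gid-1,TAG-A
gid-2,TAG-B
"""


def csv_to_instances(text: str, type_: str, primary_key: str) -> dict[str, dict]:
    """Map each CSV row to an instance keyed by its primary-key value."""
    instances = {}
    for row in csv.DictReader(io.StringIO(text)):
        identifier = row[primary_key]
        # In this sketch, every other column becomes a property of the instance.
        properties = {k: v for k, v in row.items() if k != primary_key}
        instances[identifier] = {"type": type_, **properties}
    return instances


assets = csv_to_instances(ASSET_CSV, type_="Asset", primary_key="WMT_TAG_GLOBALID")
print(assets["gid-1"])  # {'type': 'Asset', 'WMT_TAG_NAME': 'TAG-A'}
```

Using the primary key as the dictionary key also means a duplicated key in the CSV would silently overwrite the earlier row, which is one reason to pick a genuinely unique column.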
Connecting Data
By studying the source data, we notice that the WORKORDER_ITEMNAME column in the activity CSV references the WMT_TAG_NAME column in the asset CSV. We use Neat to connect the activities to the assets through a new property on Activity, which we call asset.
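# (type, property) pairs whose values are matched against each other
source = ("Activity", "WORKORDER_ITEMNAME")
target = ("Asset", "WMT_TAG_NAME")
# Name of the new property on Activity that points to the matched Asset
connection = "asset"
neat.prepare.instances.make_connection_on_exact_match(source, target, connection, limit=None)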
Adds property that contains id of reference to all references of given class in Rules: 100%|█████████████████████████████████████████████████████████████████████████████████████| 12914/12914 [00:00<00:00, 16266.93it/s]
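The idea behind an exact-match connection can be sketched in plain Python: for every Activity, look up the Asset whose WMT_TAG_NAME equals the activity's WORKORDER_ITEMNAME, and store that asset's identifier in the new asset property. This is a simplified illustration on hypothetical data, not Neat's implementation; the function connect_on_exact_match is hypothetical, and it assumes tag names are unique and leaves unmatched activities without an asset.

```python
# Hypothetical instances keyed by identifier; property names follow the tutorial.
assets = {
    "gid-1": {"WMT_TAG_NAME": "TAG-A"},
    "gid-2": {"WMT_TAG_NAME": "TAG-B"},
}
activities = {
    "act-1": {"WORKORDER_ITEMNAME": "TAG-A"},
    "act-2": {"WORKORDER_ITEMNAME": "TAG-B"},
    "act-3": {"WORKORDER_ITEMNAME": "TAG-X"},  # no matching asset
}


def connect_on_exact_match(sources, source_prop, targets, target_prop, connection):
    """Add `connection` to each source whose value exactly equals a target's value."""
    # Index targets by the matched property for constant-time lookups
    # (assumes the target values are unique).
    index = {props[target_prop]: target_id for target_id, props in targets.items()}
    connected = 0
    for props in sources.values():
        target_id = index.get(props.get(source_prop))
        if target_id is not None:
            props[connection] = target_id
            connected += 1
    return connected


n = connect_on_exact_match(activities, "WORKORDER_ITEMNAME", assets, "WMT_TAG_NAME", "asset")
print(n, activities["act-1"]["asset"])  # 2 gid-1
```

Building the index once keeps the join linear in the number of instances, which matters when matching thousands of activities against the assets.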
Infer Data Model
We can infer a data model from the data in the NeatSession by calling .infer().
neat.infer()
| NeatIssue | count |
|---|---|
| ResourceRegexViolationWarning | 2 |
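Roughly speaking, inferring a data model means walking over the instances of each type and recording which properties occur and which kinds of values they hold. The sketch below shows that idea on hypothetical instances; infer_model is a hypothetical helper, and Neat's actual inference may work quite differently.

```python
from collections import defaultdict

# Hypothetical instances as (type, properties) pairs.
instances = [
    ("Asset", {"WMT_TAG_NAME": "TAG-A"}),
    ("Activity", {"WORKORDER_ITEMNAME": "TAG-A", "source.1": 3}),
    ("Activity", {"WORKORDER_ITEMNAME": "TAG-B", "source.1": "n/a"}),
]


def infer_model(instances):
    """Return {type: {property: set of Python value type names}}."""
    model = defaultdict(lambda: defaultdict(set))
    for type_, props in instances:
        for name, value in props.items():
            model[type_][name].add(type(value).__name__)
    # Convert the nested defaultdicts to plain dicts for readability.
    return {t: dict(props) for t, props in model.items()}


model = infer_model(instances)
print(model["Activity"])
```

Note that the property names are copied straight from the data, so an identifier such as source.1 ends up in the inferred model unchanged.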
neat.inspect.issues()
2 issues found
- ResourceRegexViolationWarning: The Property with identifier source.1 in the Properties sheet, Property column is violating the CDF regex (?!^(property|space|externalId|createdTime|lastUpdatedTime|deletedTime|edge_id|node_id|project_id|property_group|seq|tg_table_name|extensions)$)(^[a-zA-Z][a-zA-Z0-9_]{0,253}[a-zA-Z0-9]?$). This will lead to errors when converting to DMS data model.
Fix: Either export the data model and make the necessary changes manually or run fix.data_model.cdf_compliant_external_ids.
- ResourceRegexViolationWarning: The Property with identifier externalId in the Properties sheet, Property column is violating the CDF regex (?!^(property|space|externalId|createdTime|lastUpdatedTime|deletedTime|edge_id|node_id|project_id|property_group|seq|tg_table_name|extensions)$)(^[a-zA-Z][a-zA-Z0-9_]{0,253}[a-zA-Z0-9]?$). This will lead to errors when converting to DMS data model.
Fix: Either export the data model and make the necessary changes manually or run fix.data_model.cdf_compliant_external_ids.
We notice that one of the properties in Activity, source.1, contains a period (.), which is not allowed in CDF data model identifiers. Following the Fix hint in the warning, we use a fix method to resolve this issue.
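To see why these two properties are flagged, we can test the identifiers against the CDF regex quoted in the warnings, using Python's re module. The sanitize helper is a hypothetical stand-in that replaces disallowed characters with underscores; it illustrates the kind of change needed and is not necessarily what cdf_compliant_external_ids does.

```python
import re

# Regex copied from the ResourceRegexViolationWarning above: the negative
# lookahead rejects reserved names, the second group restricts the characters.
CDF_PROPERTY_REGEX = re.compile(
    r"(?!^(property|space|externalId|createdTime|lastUpdatedTime|deletedTime"
    r"|edge_id|node_id|project_id|property_group|seq|tg_table_name|extensions)$)"
    r"(^[a-zA-Z][a-zA-Z0-9_]{0,253}[a-zA-Z0-9]?$)"
)


def is_cdf_compliant(name: str) -> bool:
    """Return True if `name` matches the CDF property regex."""
    return CDF_PROPERTY_REGEX.match(name) is not None


def sanitize(name: str) -> str:
    """Hypothetical fix: replace every character outside [a-zA-Z0-9_] with '_'."""
    return re.sub(r"[^a-zA-Z0-9_]", "_", name)


for name in ["source.1", "externalId", "WORKORDER_ITEMNAME"]:
    print(name, is_cdf_compliant(name))
print(sanitize("source.1"), is_cdf_compliant(sanitize("source.1")))
```

The two failures have different causes: source.1 fails the character rule, while externalId has only legal characters but is rejected by the reserved-name lookahead. Renaming characters fixes the first case only, which is why the reserved name is handled separately during conversion.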
neat.fix.data_model.cdf_compliant_external_ids()
Success: NEAT(verified,logical,neat_space,NeatInferredDataModel,v1) → NEAT(verified,logical,neat_space,NeatInferredDataModel,v1)
With the identifier fixed, the model is almost ready to be converted to the physical format. However, one of the properties is named externalId, which is a reserved word in CDF, so we skip it during the conversion.
neat.convert(reserved_properties="skip")
Rules converted to dms. You can inspect the issues with the .inspect.issues(...) method.
Succeeded with warnings: NEAT(verified,logical,neat_space,NeatInferredDataModel,v1) → NEAT(verified,physical,neat_space,NeatInferredDataModel,v1)
| NeatIssue | count |
|---|---|
| NeatValueWarning | 1 |
Hint: Use the .inspect.issues() for more details.
neat.inspect.issues()
1 issues found
- NeatValueWarning: Property externalId is a reserved property in DMS. Skipping...
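The effect of reserved_properties="skip" can be pictured as dropping any property whose name is reserved before building the physical model. In the sketch below, the reserved set is copied from the negative lookahead of the regex quoted in the earlier warnings; the skip_reserved function is hypothetical, and the printed message simply mirrors the warning text above.

```python
# Reserved names, copied from the negative lookahead of the CDF regex in the warnings.
RESERVED = {
    "property", "space", "externalId", "createdTime", "lastUpdatedTime",
    "deletedTime", "edge_id", "node_id", "project_id", "property_group",
    "seq", "tg_table_name", "extensions",
}


def skip_reserved(properties: list[str]) -> tuple[list[str], list[str]]:
    """Split property names into (kept, skipped), skipping reserved names."""
    kept = [name for name in properties if name not in RESERVED]
    skipped = [name for name in properties if name in RESERVED]
    return kept, skipped


kept, skipped = skip_reserved(["WMT_TAG_NAME", "externalId"])
for name in skipped:
    print(f"Property {name} is a reserved property in DMS. Skipping...")
```

Skipping loses that property's values in CDF; if they matter, renaming the property before conversion is the alternative.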
Inspect Data Model
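Before inspecting the data model, we give it an identifier consisting of a space, an external ID, and a version. We then use .show.data_model() to inspect it and confirm that the activity is connected to the asset.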
neat.set.data_model_id(("sp_doctrino2", "DoctrinoAssetActivityModel", "v1"))
Success: NEAT(verified,physical,neat_space,NeatInferredDataModel,v1) → NEAT(verified,physical,sp_doctrino2,DoctrinoAssetActivityModel,v1)
neat.show.data_model()
http_purl.org_cognite_neat_data-model_verified_physical_sp_doctrino2_DoctrinoAssetActivityModel_v1.html
Inspecting the model, we see that we have successfully linked the activity with the asset.
Publishing Data Model to CDF
We are now ready to publish the data model to CDF.
neat.to.cdf.data_model()
You can inspect the details with the .inspect.outcome.data_model(...) method.
| | name | unchanged |
|---|---|---|
| 0 | spaces | 1 |
| 1 | containers | 2 |
| 2 | views | 2 |
| 3 | data_models | 1 |
| 4 | nodes | 0 |
Populating Data Model
Neat keeps track of the data, so we can immediately populate the data model with the original data.
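Populating the model means writing each instance as a node whose properties are attached to the matching view. The sketch below builds one Activity node as a plain dictionary, following the general shape of a node in the CDF data modeling instances API, with the asset connection stored as a direct relation. The view identifiers come from the loader log; the instance space, identifiers, and property values are hypothetical, and Neat's DMSLoader may organize its requests differently.

```python
import json

# View identifiers as reported in the DMSLoader log.
VIEW = {"type": "view", "space": "sp_doctrino2", "externalId": "Activity", "version": "v1"}
INSTANCE_SPACE = "sp_doctrino2_instances"  # hypothetical instance space


def activity_to_node(activity_id: str, properties: dict, asset_id: str) -> dict:
    """Build a node dict for one Activity, with `asset` as a direct relation."""
    return {
        "instanceType": "node",
        "space": INSTANCE_SPACE,
        "externalId": activity_id,
        "sources": [
            {
                "source": VIEW,
                "properties": {
                    **properties,
                    # A direct relation references another node by space + externalId.
                    "asset": {"space": INSTANCE_SPACE, "externalId": asset_id},
                },
            }
        ],
    }


node = activity_to_node("act-1", {"WORKORDER_ITEMNAME": "TAG-A"}, asset_id="gid-1")
print(json.dumps(node, indent=2))
```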
neat.to.cdf.instances()
INFO | 2025-01-26 14:33:59,464 | Staring DMSLoader and will process 2 views.
INFO | 2025-01-26 14:33:59,476 | Starting ViewId(space='sp_doctrino2', external_id='Activity', version='v1') 1/2.
Loading ViewId(space='sp_doctrino2', external_id='Activity', version='v1'): 100%|██████████| 13002/13002 [00:40<00:00, 322.44it/s]
INFO | 2025-01-26 14:34:39,807 | Finished ViewId(space='sp_doctrino2', external_id='Activity', version='v1').
INFO | 2025-01-26 14:34:39,941 | Starting ViewId(space='sp_doctrino2', external_id='Asset', version='v1') 2/2.
Loading ViewId(space='sp_doctrino2', external_id='Asset', version='v1'): 100%|██████████| 1103/1103 [00:03<00:00, 354.32it/s]
INFO | 2025-01-26 14:34:43,059 | Finished ViewId(space='sp_doctrino2', external_id='Asset', version='v1').
You can inspect the details with the .inspect.outcome.instances(...) method.
| | name | changed | created |
|---|---|---|---|
| 0 | Activity | 13,002 | 0 |
| 1 | Asset | 587 | 516 |
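As a quick sanity check, the changed and created counts for each view should add up to the number of instances read from the CSV files at the start of the tutorial. A few lines of Python confirm this for the figures reported here.

```python
# Instance counts from the graph overview at the start of the tutorial.
read_counts = {"Activity": 13002, "Asset": 1103}

# (changed, created) per view, from the .to.cdf.instances() summary above.
written = {"Activity": (13002, 0), "Asset": (587, 516)}

for view, (changed, created) in written.items():
    total = changed + created
    print(view, total, total == read_counts[view])
```

Every activity was reported as changed and 516 of the 1,103 assets as created, which suggests that part of this data already existed in CDF from an earlier run.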