Onboarding Multiple CSVs¶
Prerequisites:
- Installed Neat, see Installation
- Launched a notebook environment.
- Familiar with the NeatSession object, see introduction
- Access to NeatEngine.
In this tutorial, we will load data from two CSV files, connect the data, infer a data model from the data, and push the model together with the data to CDF.
Reading Metadata¶
We will start by instantiating a NeatSession and reading the data from multiple URLs.
from cognite.neat import NeatSession, get_cognite_client
# Note that we use Oxigraph in this example, this will not work in a CDF notebook
neat = NeatSession(get_cognite_client(".env"), storage="oxigraph")
Found .env file in repository root. Loaded variables from .env file. Neat Engine 2.0.3 loaded.
base_url = "https://apps-cdn.cogniteapp.com/toolkit/publicdata/"
asset = "assets.Table.csv"
activity = "workitem.Table.csv"
neat.read.csv(f"{base_url}{asset}", type="Asset", primary_key="WMT_TAG_GLOBALID")
neat.read.csv(f"{base_url}{activity}", type="Activity", primary_key="sourceId")
neat
Instances
Overview:
- 2 types
- 14105 instances
| | Type | Occurrence |
|---|---|---|
| 0 | Activity | 13002 |
| 1 | Asset | 1103 |
Provenance:
- Initialize graph store as OxigraphStore
- Extracted triples to graph store using CSVExtractor
- Extracted triples to graph store using CSVExtractor
Studying the output above, we see that we successfully read the assets and activities.
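As a rough mental model of what the `type` and `primary_key` arguments mean, the sketch below reads a small inline CSV with Python's standard `csv` module and keys each row by one column. It assumes each row becomes one instance identified by its primary key value; it is not Neat's CSVExtractor, and the sample rows and the `read_instances` helper are hypothetical.

```python
import csv
import io

# Hypothetical sample rows for illustration; the real files are much larger.
SAMPLE_CSV = """WMT_TAG_GLOBALID,WMT_TAG_NAME
1001,PUMP-A
1002,VALVE-B
"""


def read_instances(text: str, type_: str, primary_key: str) -> dict:
    """Return one instance per CSV row, keyed by (type, primary key value)."""
    instances = {}
    for row in csv.DictReader(io.StringIO(text)):
        instances[(type_, row[primary_key])] = dict(row)
    return instances


assets = read_instances(SAMPLE_CSV, type_="Asset", primary_key="WMT_TAG_GLOBALID")
print(len(assets))
```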
Connecting Data¶
By studying the source data, we notice that the WORKORDER_ITEMNAME column in the activity CSV references the WMT_TAG_NAME column in the asset CSV.
We use Neat to connect the activities to the assets through a new property on the activity, which we call asset.
source = ("Activity", "WORKORDER_ITEMNAME")
target = ("Asset", "WMT_TAG_NAME")
connection = "asset"
neat.prepare.instances.make_connection_on_exact_match(source, target, connection)
Found 100 connections. Adding them to the graph...
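Conceptually, an exact-match connection resembles a join on equal values. The sketch below shows the idea with plain dictionaries. It assumes a connection is added whenever a source value equals a target value, and it is not Neat's implementation; the sample instances and the `connect_on_exact_match` helper are hypothetical.

```python
def connect_on_exact_match(sources, targets, source_prop, target_prop, connection):
    """Add `connection` to each source whose value equals a target's value.

    Returns the number of connections made.
    """
    # Index the targets by the property value we match on.
    target_ids_by_value = {}
    for target_id, props in targets.items():
        target_ids_by_value.setdefault(props[target_prop], []).append(target_id)

    count = 0
    for props in sources.values():
        matches = target_ids_by_value.get(props.get(source_prop), [])
        if matches:
            props[connection] = list(matches)
            count += len(matches)
    return count


# Hypothetical sample data, for illustration only.
assets = {"a1": {"WMT_TAG_NAME": "PUMP-A"}, "a2": {"WMT_TAG_NAME": "VALVE-B"}}
activities = {
    "w1": {"WORKORDER_ITEMNAME": "PUMP-A"},
    "w2": {"WORKORDER_ITEMNAME": "UNKNOWN"},
}

found = connect_on_exact_match(
    activities, assets, "WORKORDER_ITEMNAME", "WMT_TAG_NAME", "asset"
)
print(f"Found {found} connections.")
```

In this sketch, values that differ in case or surrounding whitespace do not match, because the comparison is exact.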
Infer Data Model¶
We can infer a data model from the data in the NeatSession by calling .infer().
neat.infer()
Succeeded with warnings: Inferred UnverifiedInformationModel
| NeatIssue | count |
|---|---|
| PropertySkippedWarning | 1 |
Hint: Use the .inspect.issues() for more details.
neat
Unverified Data Model
type | Logical Data Model |
---|---|
intended for | Information Architect |
name | Inferred Model |
external_id | NeatInferredDataModel |
space | neat_space |
version | v1 |
classes | 2 |
properties | 58 |
Instances
Overview:
- 2 types
- 14105 instances
| | Type | Occurrence |
|---|---|---|
| 0 | Activity | 13002 |
| 1 | Asset | 1103 |
Provenance:
- Initialize graph store as OxigraphStore
- Extracted triples to graph store using CSVExtractor
- Extracted triples to graph store using CSVExtractor
- Adds property that contains id of reference to all references of given class in Rules
This gives us an unverified data model, which we can then verify.
Verify Data Model¶
neat.verify()
You can inspect the issues with the .inspect.issues(...) method.
Succeeded with warnings: UnverifiedInformationModel → VerifiedInformationModel
| NeatIssue | count |
|---|---|
| ResourceRegexViolationWarning | 1 |
Hint: Use the .inspect.issues() for more details.
neat.inspect.issues()
1 issues found¶
- ResourceRegexViolationWarning: The Property with identifier source.1 in the Properties sheet, Property column is violating the CDF regex (?!^(property|space|externalId|createdTime|lastUpdatedTime|deletedTime|edge_id|node_id|project_id|property_group|seq|tg_table_name|extensions)$)(^[a-zA-Z][a-zA-Z0-9_]{0,253}[a-zA-Z0-9]?$). This will lead to errors when converting to DMS data model.
Fix: Either export the data model and make the necessary changes manually or run prepare.data_model.cdf_compliant_external_ids.
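The regex quoted in the warning can be tried locally with Python's `re` module to see why `source.1` is rejected. The sketch below is only an illustration: the pattern is copied from the warning message, while the `is_cdf_compliant` helper and the `source_1` name are hypothetical and do not show what Neat renames the property to.

```python
import re

# Regex copied from the ResourceRegexViolationWarning message above.
CDF_PROPERTY_PATTERN = re.compile(
    r"(?!^(property|space|externalId|createdTime|lastUpdatedTime|deletedTime"
    r"|edge_id|node_id|project_id|property_group|seq|tg_table_name|extensions)$)"
    r"(^[a-zA-Z][a-zA-Z0-9_]{0,253}[a-zA-Z0-9]?$)"
)


def is_cdf_compliant(identifier: str) -> bool:
    """Return True if the identifier matches the quoted CDF property regex."""
    return CDF_PROPERTY_PATTERN.match(identifier) is not None


print(is_cdf_compliant("source.1"))  # False: a dot is not an allowed character
print(is_cdf_compliant("source_1"))  # True: hypothetical compliant name
print(is_cdf_compliant("space"))     # False: reserved word
```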
We notice that one of the properties in Activity contains a ., which is illegal in CDF data models. The fix suggested in the issue points to a prepare method that resolves this.
neat.prepare.data_model.cdf_compliant_external_ids()
The ToCompliantEntities actions expects a UnverifiedModel. Moving back 1 step to the last UnverifiedInformationModel.
Success: UnverifiedInformationModel → UnverifiedInformationModel
When we run .cdf_compliant_external_ids(), we notice that it expects an unverified model. Neat automatically moves back to the last unverified model and executes the action there.
We now have to run neat.verify() again to get back a verified model.
neat.verify()
Success: UnverifiedInformationModel → VerifiedInformationModel
Note that this time the model is verified without any issues.
neat.convert("dms")
Rules converted to dms
Success: VerifiedInformationModel → VerifiedDMSModel
Inspect Data Model¶
After setting the data model ID, we use .show.data_model() to inspect the data model and confirm that the activity is connected to the asset.
neat.set.data_model_id(("sp_doctrino2", "DoctrinoAssetActivityModel", "v1"))
Success: VerifiedDMSModel → VerifiedDMSModel
neat.show.data_model()
http_purl.org_cognite_neat_data-model_verified_physical_sp_doctrino2_DoctrinoAssetActivityModel_v1.html
Inspecting the model, we see that we have successfully linked the activity with the asset.
Publishing Data Model to CDF¶
Now we are ready to publish this to CDF.
neat.to.cdf.data_model()
You can inspect the details with the .inspect.outcome.data_model(...) method.
| | name | unchanged |
|---|---|---|
| 0 | spaces | 1 |
| 1 | containers | 2 |
| 2 | views | 2 |
| 3 | data_models | 1 |
| 4 | nodes | 0 |
Populating Data Model¶
Neat keeps track of the data, so we can immediately populate this data model with the original data.
neat.to.cdf.instances()
INFO | 2024-12-06 15:37:01,060 | Staring DMSLoader and will process 2 views.
INFO | 2024-12-06 15:37:01,063 | Starting ViewId(space='sp_doctrino2', external_id='Activity', version='v1') 1/2.
INFO | 2024-12-06 15:37:44,886 | Finished ViewId(space='sp_doctrino2', external_id='Activity', version='v1').
INFO | 2024-12-06 15:37:44,888 | Starting ViewId(space='sp_doctrino2', external_id='Asset', version='v1') 2/2.
INFO | 2024-12-06 15:37:48,052 | Finished ViewId(space='sp_doctrino2', external_id='Asset', version='v1').
You can inspect the details with the .inspect.outcome.instances(...) method.
| | name | created | changed |
|---|---|---|---|
| 0 | Activity | 13,002 | 0 |
| 1 | Asset | 1,025 | 78 |