RDF Onboarding
RDF Onboarding¶
Prerequisite:
- Access to a CDF Project.
- Know how to install and setup Python.
- Launch a Python notebook.
In this tutorial we will show how to onboard source data to CDF. We will:
- read in Nordic44 data, containing instances of the nordic power system in the form of RDF triples.
- infer the underlying data model
- prepare data model to be CDF compliant
- upload data model to CDF
- populate data model with instances from the Nordic44 data
from cognite.neat import NeatSession, get_cognite_client
client = get_cognite_client(".env")
Found .env file in repository root. Loaded variables from .env file.
neat = NeatSession(client, storage="oxigraph", verbose=True)
Neat Engine 2.0.3 loaded.
We have already nordic44 as an example in neat and we can read it in as following:
neat.read.rdf.examples.nordic44()
No issues found
Let's inspect content of the NEAT session:
neat
Instances
Overview:
- 59 types
- 2713 instances
Type | Occurrence | |
---|---|---|
0 | CurrentLimit | 530 |
1 | Terminal | 452 |
2 | OperationalLimitSet | 238 |
3 | OperatingShare | 207 |
4 | VoltageLimit | 184 |
... | ... | ... |
58 | FullModel | 1 |
Provenance:
- Initialize graph store as OxigraphStore
- Extracted triples to graph store using RdfFileExtractor
Neat provides you a way also to visualize instances:
neat.show.instances()
instances.html
Let's now infer data model from the instances (this take with in-memory graph store ~ 20s).
The inference will produce un-validated logical data model (aka Information Rules), which we will later prepare for validation and later conversion to physical data model (aka DMS Rules)
issues = neat.infer()
Now if we inspect session you will beside instances overview also metadata about the inferred data model:
neat
Unverified Data Model
type | Logical Data Model |
---|---|
intended for | Information Architect |
name | Inferred Model |
external_id | NeatInferredDataModel |
space | neat_space |
version | v1 |
classes | 59 |
properties | 331 |
Instances
Overview:
- 59 types
- 2713 instances
Type | Occurrence | |
---|---|---|
0 | CurrentLimit | 530 |
1 | Terminal | 452 |
2 | OperationalLimitSet | 238 |
3 | OperatingShare | 207 |
4 | VoltageLimit | 184 |
... | ... | ... |
58 | FullModel | 1 |
Provenance:
- Initialize graph store as OxigraphStore
- Extracted triples to graph store using RdfFileExtractor
Since the inferred data model might contain external_ids of properties and objects which are not compliant with cdf we need to prepare it prior validation and conversion to cdf compliant data model. This can be done by running:
neat.prepare.data_model.cdf_compliant_external_ids()
Success: UnverifiedInformationModel → UnverifiedInformationModel
neat.verify()
Success: UnverifiedInformationModel → VerifiedInformationModel
After running verify we can see that we now have a verified data model in neat session:
neat
Verified Data Model
type | Logical Data Model |
---|---|
intended for | Information Architect |
name | Inferred Model |
external_id | NeatInferredDataModel |
version | v1 |
classes | 59 |
properties | 331 |
Instances
Overview:
- 59 types
- 2713 instances
Type | Occurrence | |
---|---|---|
0 | CurrentLimit | 530 |
1 | Terminal | 452 |
2 | OperationalLimitSet | 238 |
3 | OperatingShare | 207 |
4 | VoltageLimit | 184 |
... | ... | ... |
58 | FullModel | 1 |
Provenance:
- Initialize graph store as OxigraphStore
- Extracted triples to graph store using RdfFileExtractor
- Added rules to graph store as InformationRules
- Upsert prefixes to graph store
Similar to the instances we can visualize the data model:
neat.show.data_model()
http_purl.org_cognite_neat_data-model_verified_logical_neat_space_NeatInferredDataModel_v1.html
Now let's convert the verified data model to be CDF complian data model (aka physical data model)
neat.convert("dms")
Rules converted to dms
Success: VerifiedInformationModel → VerifiedDMSModel
neat
Verified Data Model
aspect | physical |
---|---|
intended for | DMS Architect |
name | Inferred Model |
space | neat_space |
external_id | NeatInferredDataModel |
version | v1 |
views | 59 |
containers | 59 |
properties | 331 |
Instances
Overview:
- 59 types
- 2713 instances
Type | Occurrence | |
---|---|---|
0 | CurrentLimit | 530 |
1 | Terminal | 452 |
2 | OperationalLimitSet | 238 |
3 | OperatingShare | 207 |
4 | VoltageLimit | 184 |
... | ... | ... |
58 | FullModel | 1 |
Provenance:
- Initialize graph store as OxigraphStore
- Extracted triples to graph store using RdfFileExtractor
- Added rules to graph store as InformationRules
- Upsert prefixes to graph store
Now we have all the components to form knowledge graph in CDF, i.e. data model and instances. Let's upload them to CDF:
neat.to.cdf.data_model()
name | created | |
---|---|---|
0 | spaces | 1 |
1 | containers | 59 |
2 | views | 59 |
3 | data_models | 1 |
neat.to.cdf.instances(space="inference_space")
name | created | changed | |
---|---|---|---|
0 | Nodes | 1000.0 | NaN |
1 | Edges | NaN | NaN |
2 | Nodes | 737.0 | 263.0 |
3 | Edges | NaN | NaN |
4 | Nodes | 547.0 | 166.0 |
5 | Edges | NaN | NaN |