From conceptual to physical data model by leveraging and extending core data model concepts¶
Prerequisites:
- Basic understanding of Data Modeling in CDF
- Basic understanding of the Core Data Model
- Access to a CDF project
- Know how to install and set up Python
- Know how to launch a Python notebook
In this tutorial, you will learn to model data using an established, industry-accepted approach that fuses the following two techniques:
- expert elicitation: putting the domain expert in focus, as the person who knows the domain and has business questions to answer with the help of the data to be modeled
- progressive disclosure: incrementally increasing the complexity of the data model, starting with a conceptual data model (defining concepts) and progressively increasing model fidelity, making sure that Core Data Model (CDM) concepts are leveraged, until the model is ready to be converted to a physical data model (aka CDF data model)
The data modeling flow depicting the approach is shown below through different roles:
In this tutorial, we will model the wind energy domain for the company Wind of Change. We start by defining a minimal model that can support a domain expert working in the Wind Farm Prospecting business unit of Wind of Change, who is trying to answer the business question "Is this site viable for a profitable wind project?".
Summary
- domain: Wind Energy
- company: Wind of Change
- business unit: Wind Farm Prospecting
- business question: Is a site the right one to develop a wind farm project?
NEAT¶
We will use NEAT to create the data model based on the above requirements.
Interaction with NEAT is done through a so-called NeatSession. A NeatSession is typically instantiated with a Cognite client, which allows us to connect to CDF and to read and write data models and instances. Therefore, we will import NeatSession and the convenience method get_cognite_client:
from cognite.neat import NeatSession, get_cognite_client
If you do not have a .env file stored locally, call get_cognite_client() first to create one:
client = get_cognite_client(".env")
Found .env file in repository root. Loaded variables from .env file.
neat = NeatSession(client)
For Neat to improve, we need to collect usage information. You acknowledge and agree that neat may collect usage information.To remove this message run 'neat.opt.in_() or to stop collecting usage information run 'neat.opt.out()'. Neat Engine 2.0.5 loaded.
Expert elicitation¶
In the introduction of this tutorial, we have established the business question of interest.
Usually, the data modeling process is unknown to domain experts, especially platform-specific details (in the case of CDF, details such as views, containers, etc.). However, domain experts understand which concepts they need information about to answer the business question. Therefore, it is crucial to interview them to extract these concepts (a process known as expert elicitation). The extracted concepts, and the connections between them, form the basis of what is known as a Conceptual Data Model.
Let's assume that we asked our domain expert how they are able to answer the question:
Is a site the right one to develop a wind farm project?
The answer was that they typically use a simple rule of thumb, expressed with the following formula:
7.5 * Annual Energy Production * Electricity Price >= Total Costs
In short, 7.5 years after the wind farm starts operating, the revenue from the electricity sold should cover all the costs.
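The rule of thumb can be sketched as a small Python check. This is only an illustration and not part of NEAT: the function name, the parameter names, the choice of MWh as the energy unit, and all numbers are assumptions made for this example.

```python
def is_site_viable(
    annual_energy_mwh: float,
    electricity_price_per_mwh: float,
    total_costs: float,
    payback_years: float = 7.5,
) -> bool:
    """Return True if revenue over the payback period covers the total costs."""
    revenue = payback_years * annual_energy_mwh * electricity_price_per_mwh
    return revenue >= total_costs


# Hypothetical numbers: 300 000 MWh per year sold at 50 per MWh,
# so revenue over 7.5 years is 7.5 * 300_000 * 50 = 112_500_000.
print(is_site_viable(300_000, 50.0, total_costs=100_000_000))  # True
print(is_site_viable(300_000, 50.0, total_costs=120_000_000))  # False
```

Each input to this check corresponds to one of the data needs that the domain expert lists next.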
To satisfy the above formula, the domain expert highlights a need to know the following data:
- location and area of the site
- wind climate for the site
- wind turbine characteristics
- costs
- expected electricity price
With a set of more detailed follow-up questions, we are able to outline the following concepts:
- Wind farm: A group of wind turbines installed in a specific area to generate electricity from wind.
- Wind turbine: A machine that converts wind energy into electrical power.
- Meteorological mast: A tall tower equipped with sensors to measure wind and weather conditions at potential wind farm sites.
- Anemometer: A device mounted on a mast to measure wind speed.
- Wind vane: A sensor that shows the direction of the wind.
- Site: A specific location being evaluated for the feasibility of building a wind farm.
- Cost: The total expense involved in developing, building, and operating a wind farm.
- Electricity price: The market rate at which the generated electricity can be sold.
At this stage, we can start forming the conceptual data model, for which NEAT offers an Excel template that we can leverage. Below is the command that will generate the template:
neat.template.conceptual_model("wind_farm_prospecting_conceptual_data_model.xlsx")
Capturing expert's knowledge in conceptual data model¶
Let's now fill in the template.
As we are adding concepts (which in the current template are added in the Classes sheet, and are known as classes), we should try to select appropriate CDM concepts to build them off.
To simplify the selection of the CDM concepts that we want to base our concepts on, we will use the following rules of thumb:
1. Large infrastructures, physical objects that can be represented as a hierarchy of concepts (assets), and/or objects that can be seen as "boxes" that other objects are put in or are part of, we will base off CogniteAsset
2. Physical objects that do not have a clear hierarchical structure, or are part of other physical objects, we will base off CogniteEquipment
3. Any other concept that does not follow rules 1 and 2, and requires human-readable properties, we will base off CogniteDescribable
According to the above, we have based our concepts as follows:
- WindFarm off CogniteAsset: as this is a large infrastructure that will contain child assets WindTurbine and MetMast
- WindTurbine off CogniteAsset: as it is the building block of WindFarm and enables further fine-graining to building components such as Blade, Tower, Nacelle, etc.
- MetMast off CogniteAsset: as it is a part of WindFarm, and will be a "box" to "add" all sensors
- Anemometer off CogniteEquipment: it is a sensor that attaches to a meteorological mast, not necessarily in a hierarchical form
- WindVane off CogniteEquipment: the same reason as Anemometer
- Site off CogniteDescribable: it is not an asset or equipment, but it will hold information such as name and description
- Cost off CogniteDescribable: same reasoning as Site
- ElectricityPrice off CogniteDescribable: same reasoning as Site
But let's make one small improvement. Since Anemometer and WindVane are both sensors, we can create a more generic concept Sensor, which is based on CogniteEquipment, and update the model such that Anemometer and WindVane implement Sensor instead of CogniteEquipment. Why are we doing this? This will allow us to define common properties of Anemometer and WindVane for Sensor, and only define custom properties for Anemometer and WindVane.
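The resulting "implements" structure, including the intermediate Sensor concept, can be sketched in plain Python. This is only an illustration of how each concept resolves to a CDM concept through its implementation chain. The dictionary and the helper cdm_base are hypothetical and are not part of NEAT.

```python
# Which concept each of our concepts implements (taken from the text above).
IMPLEMENTS = {
    "WindFarm": "CogniteAsset",
    "WindTurbine": "CogniteAsset",
    "MetMast": "CogniteAsset",
    "Sensor": "CogniteEquipment",
    "Anemometer": "Sensor",
    "WindVane": "Sensor",
    "Site": "CogniteDescribable",
    "Cost": "CogniteDescribable",
    "ElectricityPrice": "CogniteDescribable",
}


def cdm_base(concept: str) -> str:
    """Follow the implements chain until a concept outside our model is reached."""
    while concept in IMPLEMENTS:
        concept = IMPLEMENTS[concept]
    return concept


for name in ("WindFarm", "Anemometer", "Site"):
    print(name, "->", cdm_base(name))
```

Anemometer resolves to CogniteEquipment through Sensor, so it still gets the CDM properties while sharing the common sensor properties defined on Sensor.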
The video below shows the process of filling in the template with our concepts (only rendered at thisisneat.io).
Adding properties to concepts¶
In this second part of the expert elicitation process, we are going into more detail about our concepts with our domain expert. We will add properties to concepts. These being:
- properties that hold data (also known as attributes)
- and properties that connect concepts (also known as connections or relationships)
Specifically, we have added the following properties:
- WindFarm:
  - windTurbine to hold the connection to all WindTurbine(s) which make up the WindFarm
- WindTurbine:
  - powerCurve to hold power curve measurements of the WindTurbine
  - hubHeight to store the hub height of the WindTurbine
  - ratedPower to store the rated power of the WindTurbine
  - manufacturer to store the manufacturer of the WindTurbine
- MetMast:
  - iecCompliant to indicate if the MetMast is IEC compliant
  - sensor to hold the connection to Sensor(s) which are attached to the MetMast
- Sensor:
  - height to store the height at which the Sensor is mounted on the MetMast
  - boomDirection to store the boom direction of the Sensor
  - uncertainty to store the uncertainty of the Sensor
  - measurementRange to store the measurement range of the Sensor
  - measurements to store measurements (i.e. timeseries)
- Anemometer:
  - numberOfCups to store how many cups the Anemometer has
- ElectricityPrice:
  - upperPrice to store the upper price of the electricity
  - lowerPrice to store the lower price of the electricity
- Cost:
  - opex to store operational expenditure
  - capex to store capital expenditure
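To make the difference between attributes and connections concrete, here is a plain-Python sketch of a few of these concepts. It is not how NEAT or CDF represents them. The Python types (for example float for height) are assumptions, since the actual value types are set in the Excel template, and the values are hypothetical.

```python
from dataclasses import dataclass, field


# Property names mirror the data model, hence camelCase instead of snake_case.
@dataclass
class Sensor:
    # Attributes: properties that hold data.
    height: float
    boomDirection: float


@dataclass
class Anemometer(Sensor):
    # Custom attribute on top of the common Sensor properties.
    numberOfCups: int


@dataclass
class MetMast:
    iecCompliant: bool
    # Connection: a property whose values are other concepts.
    sensor: list[Sensor] = field(default_factory=list)


mast = MetMast(iecCompliant=True)
mast.sensor.append(Anemometer(height=80.0, boomDirection=270.0, numberOfCups=3))
print(len(mast.sensor), type(mast.sensor[0]).__name__)  # 1 Anemometer
```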
The video below shows the process of filling the template with properties (only rendered at thisisneat.io).
Adding Location concept¶
You might notice that we did not add any custom properties to Site yet.
As we need to express the geographical location of Site in the form of a bounding box, we need to create a helper concept Location to store information about:
- latitude
- longitude
We will create this new concept and add these properties to it. Similar to Site, we will base Location off CogniteDescribable (it does not hurt to have information about location expressed with name and description which are properties of CogniteDescribable).
Once we have created this concept, we will add the bbox (bounding box) property to Site and set its value type to Location, with a minimum count of 3 (we need at least three geographical locations to express a bounding box) and a maximum count of 100 (a provisional upper limit on the number of locations, as we do not need high resolution).
Also, we will add a new property location to WindFarm and MetMast and set its value type to Location. In this way, we will be able to express the geographical location of wind turbines and met masts in a wind farm.
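The count limits on bbox can be illustrated with a small validation sketch in plain Python. This is not NEAT or CDF code. The helper validate_bbox and the coordinates are hypothetical, and only the limits of 3 and 100 come from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    latitude: float
    longitude: float


MIN_BBOX_POINTS = 3    # minimum count set on Site.bbox
MAX_BBOX_POINTS = 100  # provisional maximum count set on Site.bbox


def validate_bbox(bbox: list[Location]) -> None:
    """Raise ValueError if the number of locations violates the count limits."""
    if not MIN_BBOX_POINTS <= len(bbox) <= MAX_BBOX_POINTS:
        raise ValueError(
            f"bbox needs {MIN_BBOX_POINTS}-{MAX_BBOX_POINTS} locations, got {len(bbox)}"
        )


# Hypothetical coordinates, for illustration only.
triangle = [Location(57.0, 8.0), Location(57.1, 8.0), Location(57.0, 8.2)]
validate_bbox(triangle)  # passes silently

try:
    validate_bbox(triangle[:2])
except ValueError as error:
    print(error)  # bbox needs 3-100 locations, got 2
```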
The video below shows the process described above:
Improving value type for powerCurve property¶
In a discussion with the domain expert, we learned that a wind turbine power curve is a graph that is used to represent the amount of power that a wind turbine can produce at different wind speeds. Typically there are multiple power curves specific to air density. The below image shows an idealized power curve of a wind turbine.
Idealised Wind Turbine Power Curve (origin WIKIMEDIA)
Initially, we set powerCurve property value type to be CogniteTimeSeries, which is wrong as we cannot represent a wind turbine power curve with timeseries. A classic CDF resource sequence would be more appropriate here. However, we do not have a corresponding representation in Core Data Model for the Cognite sequence. Therefore, similar to the case of Site, we will create a helper concept PowerCurve, which we will base on CogniteDescribable and add the following custom properties:
- windSpeedBins: Discrete wind speed intervals used to categorize wind data.
- powerBins: Corresponding power output values for each wind speed bin.
- cutInSpeed: The minimum wind speed at which the turbine starts generating power.
- cutOutSpeed: The wind speed at which the turbine shuts down to prevent damage.
- ratedSpeed: The wind speed at which the turbine reaches its maximum (rated) power output.
- airDensity: The air density for which the power curve is valid.
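As an illustration of why these properties are sufficient to describe a power curve, the sketch below looks up the power output at a given wind speed. It is plain Python, not NEAT or CDF code. It assumes that windSpeedBins holds ascending wind speeds in m/s, that powerBins holds the matching power in kW, and that linear interpolation between bins is acceptable. The method power_at and all numbers are hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class PowerCurve:
    windSpeedBins: list[float]  # assumed: ascending wind speeds in m/s
    powerBins: list[float]      # assumed: power in kW for each bin
    cutInSpeed: float
    cutOutSpeed: float
    ratedSpeed: float
    airDensity: float

    def power_at(self, wind_speed: float) -> float:
        """Linearly interpolate the power output at the given wind speed."""
        # No production below cut-in or at/above cut-out.
        if wind_speed < self.cutInSpeed or wind_speed >= self.cutOutSpeed:
            return 0.0
        i = bisect_right(self.windSpeedBins, wind_speed)
        if i == 0:
            return self.powerBins[0]
        if i == len(self.windSpeedBins):
            return self.powerBins[-1]
        x0, x1 = self.windSpeedBins[i - 1], self.windSpeedBins[i]
        y0, y1 = self.powerBins[i - 1], self.powerBins[i]
        return y0 + (y1 - y0) * (wind_speed - x0) / (x1 - x0)


# Hypothetical, heavily simplified curve.
curve = PowerCurve(
    windSpeedBins=[3.0, 6.0, 9.0, 12.0],
    powerBins=[0.0, 400.0, 1400.0, 2000.0],
    cutInSpeed=3.0,
    cutOutSpeed=25.0,
    ratedSpeed=12.0,
    airDensity=1.225,
)
print(curve.power_at(7.5))   # 900.0
print(curve.power_at(15.0))  # 2000.0
print(curve.power_at(30.0))  # 0.0
```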
Additionally, to calculate the air density at the site, we need to know the site's:
- atmospheric pressure
- temperature
- humidity
Accordingly, we need to add concepts of:
- Barometer to have data on atmospheric pressure
- Thermometer to have data on temperature
- Hygrometer to have data on humidity
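To show how the three measurements combine, here is a minimal sketch of an air density calculation. It is not taken from the tutorial. It assumes the ideal gas law for a mixture of dry air and water vapour, and uses the Tetens approximation for the saturation vapour pressure. The function name and the input values are hypothetical.

```python
R_DRY_AIR = 287.05        # specific gas constant of dry air, J/(kg*K)
R_WATER_VAPOUR = 461.495  # specific gas constant of water vapour, J/(kg*K)


def air_density(pressure_pa: float, temperature_c: float, relative_humidity: float) -> float:
    """Density of humid air in kg/m^3 (relative_humidity as a fraction 0..1)."""
    temperature_k = temperature_c + 273.15
    # Saturation vapour pressure in Pa (Tetens approximation).
    saturation_pa = 610.78 * 10 ** (7.5 * temperature_c / (temperature_c + 237.3))
    vapour_pa = relative_humidity * saturation_pa
    dry_pa = pressure_pa - vapour_pa
    # Sum of the partial densities of dry air and water vapour.
    return dry_pa / (R_DRY_AIR * temperature_k) + vapour_pa / (R_WATER_VAPOUR * temperature_k)


# Dry air at standard sea-level conditions.
print(f"{air_density(101_325, 15.0, 0.0):.3f}")  # 1.225
# Humid air is slightly less dense than dry air at the same pressure and temperature.
print(air_density(101_325, 15.0, 0.8) < air_density(101_325, 15.0, 0.0))  # True
```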
Here we can see how new knowledge from the domain expert, which seemed to call for a relatively small update, resulted in a fairly involved update of our data model. Therefore, stay focused, take your time, listen, and ask questions. Data modeling is never a "one-time effort", but a continuous process that lasts as long as our knowledge about the domain evolves.
The video below shows how the template has been updated with the above information:
Expanding and fine-tuning conceptual data model¶
Up until this point, we have formed our conceptual data model, and based our concepts on CDM concepts.
At this stage, we will now "expand" the conceptual data model with properties that are inherited from CDM concepts through their implementation. NEAT has a special method to perform this action which is accessible via neat.template.expand().
In addition to expanding the list of properties with those originating from the implemented concepts, NEAT automatically adds the property <nameOfConcept>GUID to every concept the user defines. Having a specific property on each user-defined concept means one can skip adding filters in the physical data model to ensure that data is consumed through the user-defined concepts.
Furthermore, due to the current data modeling UI limitations in CDF, NEAT will add all concepts from the Core Data Model.
Now with the expanded list of properties, we will do the following:
- update parent and root for each concept that implements CogniteAsset to be WindFarm
- update the asset property of Sensor to point to MetMast
- update the equipment property of MetMast to point to Sensor
- remove the measurement property from Sensor, WindVane, and Anemometer, since CogniteEquipment already has the property timeSeries which points to CogniteTimeSeries, so there is no need for another property holding the same information
- remove the sensor property from MetMast, as there is already a property equipment which we will update to point to Sensor instead of CogniteEquipment
- remove the dummy property <nameOfConcept>GUID from concepts that already have custom properties, which in our case are all concepts except WindVane, Thermometer, Hygrometer, and Barometer, as we did not add any custom property to these concepts
NEAT automatically adds <nameOfConcept>GUID to every concept the user defines. By adding specific properties to user-defined concepts one can skip adding filters to ensure consumption of data through user-defined concepts.
To expand the data model we call the method neat.template.expand() and pass the file name of our filled conceptual data model:
neat.template.expand("wind_farm_prospecting_conceptual_data_model.xlsx")
Success: Created extension template
The video below shows the process of fine-tuning the conceptual data model:
The resulting Excel file can be downloaded using this link.
Converting conceptual to physical data model¶
The conceptual data model cannot be directly published to CDF. Therefore, we need to convert it to the physical data model form which can be published to CDF.
So we will do the following:
- read the conceptual data model into a NeatSession via neat.read.excel("<your filename>")
- check the content of the session by calling neat
- visualize the data model by calling neat.show.data_model()
- visualize all the implements by calling neat.show.data_model.implements()
- convert the conceptual data model to a physical data model via the command neat.convert()
- export the physical data model via the command neat.to.excel("<filename>")
neat = NeatSession(client)
For Neat to improve, we need to collect usage information. You acknowledge and agree that neat may collect usage information.To remove this message run 'neat.opt.in_() or to stop collecting usage information run 'neat.opt.out()'. Neat Engine 2.0.5 loaded.
neat.read.excel("wind_farm_prospecting_conceptual_data_model_expanded.xlsx")
Success: Read NEAT(verified,conceptual,wind_energy,WindFarmProspecting,v1)
neat
Data Model
| level | conceptual |
|---|---|
| intended for | Domain Expert and/or Information Architect |
| name | Wind Farm Prospecting Data Model |
| external_id | WindFarmProspecting |
| version | v1 |
| concepts | 44 |
| properties | 252 |
neat.show.data_model()
http_purl.org_cognite_neat_data-model_verified_conceptual_wind_energy_WindFarmProspecting_v1.html
neat.show.data_model.implements()
http_purl.org_cognite_neat_data-model_verified_conceptual_wind_energy_WindFarmProspecting_v1_implements.html
neat.convert()
Rules converted to dms.
Success: NEAT(verified,conceptual,wind_energy,WindFarmProspecting,v1) → NEAT(verified,physical,wind_energy,WindFarmProspecting,v1)
neat
Data Model
| level | physical |
|---|---|
| intended for | Data Engineer |
| name | Wind Farm Prospecting Data Model |
| space | wind_energy |
| external_id | WindFarmProspecting |
| version | v1 |
| views | 44 |
| containers | 17 |
| properties | 252 |
neat.to.excel("wind_farm_prospecting_physical_data_model.xlsx")
Finetuning physical data model prior to publishing¶
Even though NEAT tries to make the best possible conversion from the conceptual to the physical data model, there is always room for improvement, both in the conversion process itself and in post-conversion optimization of the produced physical data model.
In this specific context, in the Properties sheet we will make the following modifications:
- change the connection type for the properties windTurbine, powerCurve, and bbox from edge to direct, as we have no need for edge connections (we are not adding properties to our connections, nor do we expect a very large number of connections)
- add units to the hubHeight, ratedPower, height, boomDirection, uncertainty, measurementRange, latitude, longitude, windSpeedBins, powerBins, cutInSpeed, cutOutSpeed, ratedSpeed, and airDensity properties
An up-to-date list of supported units can be found on this page.
The video below demonstrates how these changes are introduced in our physical data model:
The resulting Excel file can be downloaded using this link.
Read and publish finetuned physical data model¶
Now we will:
- read the fine-tuned physical data model
- publish it to CDF
- and view final result in CDF
neat.read.excel("wind_farm_prospecting_physical_data_model.xlsx", enable_manual_edit=True)
[WARNING] Experimental feature 'enable_manual_edit' is subject to change without notice
Succeeded with warnings: Read NEAT(verified,physical,wind_energy,WindFarmProspecting,v1)
| NeatIssue | count |
|---|---|
| PropertyNotFoundWarning | 54 |
Hint: Use the .inspect.issues() for more details.
Mind potential warnings, especially PropertyNotFoundWarning.
neat.to.cdf.data_model()
You can inspect the details with the .inspect.outcome.data_model(...) method.
| | name | created |
|---|---|---|
| 0 | spaces | 1 |
| 1 | containers | 13 |
| 2 | views | 14 |
| 3 | data_models | 1 |
| 4 | nodes | 0 |
The video below shows the resulting data model in CDF:
