From conceptual to physical data model by leveraging and extending core data model concepts¶
Prerequisites:
- Basic understanding of data modeling in CDF
- Basic understanding of the Core Data Model
- Access to a CDF project
- Knowledge of how to install and set up Python
- Ability to launch a Python notebook
In this tutorial, you will learn to model data using an established, industry-accepted approach that combines the following two techniques:
- expert elicitation: putting the domain expert in focus, i.e. the person who knows the domain and has business questions to answer with the help of the data being modeled
- progressive disclosure: incrementally increasing the complexity of the data model, starting with a conceptual data model (defining concepts) and progressively increasing its fidelity, while making sure that Core Data Model (CDM) concepts are leveraged, until the model is ready to be converted to a physical data model (aka CDF data model)
The data modeling flow of this approach, broken down by the roles involved, is shown below:
In this tutorial, we will model the wind energy domain for the company Wind of Change. We start by defining a minimal model that can support a domain expert working in the Wind Farm Prospecting business unit of Wind of Change, who is trying to answer the business question "Is this site viable for a profitable wind project?".
Summary
- domain: Wind Energy
- company: Wind of Change
- business unit: Wind Farm Prospecting
- business question: Is a site the right one to develop a wind farm project?
NEAT¶
We will use NEAT to create the data model based on the above requirements.
Interaction with NEAT is done through a so-called NeatSession. A NeatSession is typically instantiated with a Cognite client, which allows us to connect to CDF and to read and write data models and instances. Therefore, we import NeatSession and the convenience method get_cognite_client:
from cognite.neat import NeatSession, get_cognite_client
If you do not have a .env file stored locally, call get_cognite_client() first to create one:
client = get_cognite_client(".env")
Found .env file in repository root. Loaded variables from .env file.
neat = NeatSession(client)
For Neat to improve, we need to collect usage information. You acknowledge and agree that neat may collect usage information.To remove this message run 'neat.opt.in_() or to stop collecting usage information run 'neat.opt.out()'. Neat Engine 2.0.5 loaded.
Expert elicitation¶
In the introduction of this tutorial, we have established the business question of interest.
Usually, the data modeling process is unfamiliar to domain experts, especially its platform-specific details (in the case of CDF, details such as views and containers). However, domain experts understand which concepts they need information about to answer the business question. Therefore, it is crucial to interview them to extract these concepts (a process known as expert elicitation). The extracted concepts, and the connections between them, form the basis of what is known as a conceptual data model.
Let's assume that we asked our domain expert how they answer the question:
Is a site the right one to develop a wind farm project?
The answer was that they typically use a simple rule of thumb, expressed by the following formula:
7.5 * Annual Energy Production * Electricity Price >= Total Costs
In short, within 7.5 years of the wind farm starting operation, the revenue from the electricity sold should cover all the costs.
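The rule of thumb can be expressed as a small check. This is a minimal sketch: the function name, the argument names, and the units (MWh for energy, currency per MWh for the price) are illustrative assumptions, not part of NEAT or CDF.

```python
def is_site_viable(
    annual_energy_production_mwh: float,
    electricity_price_per_mwh: float,
    total_costs: float,
    payback_years: float = 7.5,
) -> bool:
    """Hypothetical helper for the domain expert's rule of thumb:
    payback_years * AEP * electricity price >= total costs."""
    # Revenue from selling all produced electricity over the payback period
    revenue = payback_years * annual_energy_production_mwh * electricity_price_per_mwh
    return revenue >= total_costs


# Made-up example: 100 GWh/year sold at 50 per MWh gives 37.5 million over 7.5 years
print(is_site_viable(100_000, 50.0, 30_000_000))  # True
print(is_site_viable(100_000, 50.0, 40_000_000))  # False
```

The payback period is kept as a parameter so the same check can be reused if the domain expert's threshold changes.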
To satisfy the above formula, the domain expert highlights a need to know the following data:
- location and area of the site
- wind climate for the site
- wind turbine characteristics
- costs
- expected electricity price
By asking a set of more detailed, related questions, we are able to outline the following concepts:
- Wind farm: A group of wind turbines installed in a specific area to generate electricity from wind.
- Wind turbine: A machine that converts wind energy into electrical power.
- Meteorological mast: A tall tower equipped with sensors to measure wind and weather conditions at potential wind farm sites.
- Anemometer: A device mounted on a mast to measure wind speed.
- Wind vane: A sensor that shows the direction of the wind.
- Site: A specific location being evaluated for the feasibility of building a wind farm.
- Cost: The total expense involved in developing, building, and operating a wind farm.
- Electricity price: The market rate at which the generated electricity can be sold.
At this stage, we can start forming the conceptual data model, for which NEAT offers an Excel template that we can leverage. Below is the command that will generate the template:
neat.template.conceptual_model("wind_farm_prospecting_conceptual_data_model.xlsx")
Capturing expert's knowledge in conceptual data model¶
Let's now fill in the template.
As we add concepts (which in the current template are added in the Classes sheet, where they are called classes), we should try to select appropriate CDM concepts to build them on.
To simplify the selection of the CDM concepts that our concepts will be based on, we use the following rules of thumb:
1. Large infrastructure, physical objects that can be represented as a hierarchy of concepts (assets), and/or objects that can be seen as "boxes" that other objects are put in or are part of: we base these off CogniteAsset.
2. Physical objects that do not have a clear hierarchical structure, or are part of other physical objects: we base these off CogniteEquipment.
3. Any other concept that does not follow rules 1 and 2 and requires human-readable properties: we base these off CogniteDescribable.
Following these rules, we based our concepts as follows:
- WindFarm off CogniteAsset: it is a large infrastructure that will contain the child assets WindTurbine and MetMast
- WindTurbine off CogniteAsset: it is the building block of WindFarm and enables further breakdown into components such as Blade, Tower, Nacelle, etc.
- MetMast off CogniteAsset: it is part of WindFarm and will be a "box" that all sensors are added to
- Anemometer off CogniteEquipment: it is a sensor that attaches to a meteorological mast, not necessarily in a hierarchical form
- WindVane off CogniteEquipment: same reasoning as Anemometer
- Site off CogniteDescribable: it is neither an asset nor equipment, but it will hold information such as name and description
- Cost off CogniteDescribable: same reasoning as Site
- ElectricityPrice off CogniteDescribable: same reasoning as Site
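To make the selection explicit, the rules of thumb can be read as a simple decision function. This is only an illustrative sketch: the function and its two boolean flags are assumptions introduced here, and in practice the judgment comes from the conversation with the domain expert, not from code.

```python
def cdm_base_concept(is_hierarchical_or_container: bool, is_physical: bool) -> str:
    """Hypothetical helper that encodes the three rules of thumb."""
    if is_hierarchical_or_container:
        # Rule 1: large infrastructure, asset hierarchy, or a "box" for other objects
        return "CogniteAsset"
    if is_physical:
        # Rule 2: physical object without a clear hierarchy, or part of another object
        return "CogniteEquipment"
    # Rule 3: everything else that needs human-readable properties (name, description)
    return "CogniteDescribable"


# (is_hierarchical_or_container, is_physical) as judged for each concept
concepts = {
    "WindFarm": (True, True),
    "WindTurbine": (True, True),
    "MetMast": (True, True),
    "Anemometer": (False, True),
    "WindVane": (False, True),
    "Site": (False, False),
    "Cost": (False, False),
    "ElectricityPrice": (False, False),
}

for name, flags in concepts.items():
    print(f"{name} -> {cdm_base_concept(*flags)}")
```

Rule 1 is checked first on purpose: a wind farm is also a physical object, but its role as a container of other assets is what decides its base concept.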
But let's make one small improvement. Since Anemometer and WindVane are both sensors, we can create a more generic concept, Sensor, based on CogniteEquipment, and update the model so that Anemometer and WindVane implement Sensor instead of CogniteEquipment. Why do this? It allows us to define the properties common to Anemometer and WindVane once, on Sensor, and to define only their specific properties on Anemometer and WindVane themselves.
The video below shows the process of filling in the template with our concepts (only rendered at thisisneat.io).
Adding properties to concepts¶
In this second part of the expert elicitation process, we go into more detail about our concepts with the domain expert and add properties to them, namely:
- properties that hold data (also known as attributes)
- properties that connect concepts (also known as connections or relationships)
Specifically, we have added the following properties:
- WindFarm:
  - windTurbine to hold the connection to all WindTurbine(s) that make up the WindFarm
- WindTurbine:
  - powerCurve to hold the power curve measurements of the WindTurbine
  - hubHeight to store the hub height of the WindTurbine
  - ratedPower to store the rated power of the WindTurbine
  - manufacturer to store the manufacturer of the WindTurbine
- MetMast:
  - iecCompliant to indicate whether the MetMast is IEC compliant
  - sensor to hold the connection to the Sensor(s) attached to the MetMast
- Sensor:
  - height to store the height at which the Sensor is mounted on the MetMast
  - boomDirection to store the boom direction of the Sensor
  - uncertainty to store the uncertainty of the Sensor
  - measurementRange to store the measurement range of the Sensor
  - measurements to store measurements (i.e. time series)
- Anemometer:
  - numberOfCups to store how many cups the Anemometer has
- ElectricityPrice:
  - upperPrice to store the upper price of the electricity
  - lowerPrice to store the lower price of the electricity
- Cost:
  - opex to store operational expenditure
  - capex to store capital expenditure
The video below shows the process of filling the template with properties (only rendered at thisisneat.io).
Adding Location concept¶
You might notice that we have not yet added any custom properties to Site.
As we need to express the geographical location of Site in the form of a bounding box, we need to create a helper concept, Location, to store information about:
- latitude
- longitude
We will create this new concept and add these properties to it. As with Site, we will base Location off CogniteDescribable (it does not hurt to be able to describe a location with name and description, which are properties of CogniteDescribable).
Once this concept is created, we will add a bbox (bounding box) property to Site and set its value type to Location, with a minimum count of 3 (we need at least three geo locations to express a bounding box) and a maximum count of 100 (a provisional maximum number of locations for the bounding box, as we do not need high resolution).
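The cardinality constraints on bbox can be illustrated with a small validator. This is a sketch under assumptions: Location is modeled as a plain dataclass, the coordinate ranges assume decimal degrees, validate_bbox is a hypothetical helper, and the coordinates are made up. In the tutorial, the min and max counts are declared in the template; the helper only mirrors them.

```python
from dataclasses import dataclass

MIN_BBOX_LOCATIONS = 3    # fewer points cannot enclose an area
MAX_BBOX_LOCATIONS = 100  # provisional upper limit chosen for the model


@dataclass(frozen=True)
class Location:
    latitude: float   # decimal degrees, -90..90 (assumed convention)
    longitude: float  # decimal degrees, -180..180 (assumed convention)


def validate_bbox(bbox: list[Location]) -> list[Location]:
    """Hypothetical check mirroring min count 3 / max count 100 on Site.bbox."""
    if not MIN_BBOX_LOCATIONS <= len(bbox) <= MAX_BBOX_LOCATIONS:
        raise ValueError(
            f"bbox needs {MIN_BBOX_LOCATIONS}..{MAX_BBOX_LOCATIONS} locations, got {len(bbox)}"
        )
    for loc in bbox:
        if not (-90 <= loc.latitude <= 90 and -180 <= loc.longitude <= 180):
            raise ValueError(f"invalid coordinates: {loc}")
    return bbox


site_bbox = validate_bbox([
    Location(59.90, 10.70),
    Location(59.95, 10.80),
    Location(59.88, 10.85),
])
print(len(site_bbox))  # 3
```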
We will also add a new property, location, to WindFarm and MetMast and set its value type to Location. In this way, we will be able to express the geographical location of wind turbines and met masts in a wind farm.
The video below shows the process described above:
Improving value type for powerCurve property¶
In a discussion with the domain expert, we learned that a wind turbine power curve is a graph representing the amount of power a wind turbine can produce at different wind speeds. Typically, there are multiple power curves, each specific to an air density. The image below shows an idealized power curve of a wind turbine.
Idealised Wind Turbine Power Curve (source: Wikimedia)
Initially, we set the value type of the powerCurve property to CogniteTimeSeries, which is wrong, as a wind turbine power curve cannot be represented as a time series. A classic CDF sequence resource would be more appropriate here. However, the Core Data Model has no corresponding representation of Cognite sequences. Therefore, similar to what we did for Site with Location, we will create a helper concept, PowerCurve, which we base on CogniteDescribable, and add the following custom properties:
- windSpeedBins: Discrete wind speed intervals used to categorize wind data.
- powerBins: Corresponding power output values for each wind speed bin.
- cutInSpeed: The minimum wind speed at which the turbine starts generating power.
- cutOutSpeed: The wind speed at which the turbine shuts down to prevent damage.
- ratedSpeed: The wind speed at which the turbine reaches its maximum (rated) power output.
- airDensity: The air density for which the power curve is valid.
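To show how these properties fit together, here is a minimal sketch that looks up the power output for a given wind speed. It assumes the bins are sorted in ascending order, that power is linearly interpolated between bins, and that the output is zero below cutInSpeed or above cutOutSpeed; the class layout, the power_at method, the units, and the sample numbers are illustrative assumptions, not taken from a real turbine or from NEAT.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class PowerCurve:
    windSpeedBins: list[float]  # m/s, ascending (assumed unit)
    powerBins: list[float]      # kW, one value per wind speed bin (assumed unit)
    cutInSpeed: float           # m/s
    cutOutSpeed: float          # m/s
    ratedSpeed: float           # m/s, not needed for the lookup below
    airDensity: float           # kg/m^3, density this curve is valid for

    def power_at(self, wind_speed: float) -> float:
        """Power output at wind_speed, linearly interpolated between bins."""
        if wind_speed < self.cutInSpeed or wind_speed > self.cutOutSpeed:
            return 0.0  # turbine is not producing
        xs, ys = self.windSpeedBins, self.powerBins
        if wind_speed <= xs[0]:
            return ys[0]
        if wind_speed >= xs[-1]:
            return ys[-1]
        i = bisect_right(xs, wind_speed)  # xs[i - 1] <= wind_speed < xs[i]
        x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
        return y0 + (y1 - y0) * (wind_speed - x0) / (x1 - x0)


curve = PowerCurve(
    windSpeedBins=[3.0, 5.0, 7.0, 9.0, 11.0, 13.0],
    powerBins=[0.0, 300.0, 1000.0, 2000.0, 3000.0, 3000.0],
    cutInSpeed=3.0, cutOutSpeed=25.0, ratedSpeed=11.0, airDensity=1.225,
)
print(curve.power_at(6.0))   # 650.0
print(curve.power_at(2.0))   # 0.0 (below cut-in)
print(curve.power_at(20.0))  # 3000.0 (flat beyond the last bin, up to cut-out)
```

A table of bins like this is exactly the kind of data a time series cannot represent well, which is why a dedicated PowerCurve concept is a better fit.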
Additionally, since we need to know the air density at the site, we need the following data about the site in order to calculate it:
- atmospheric pressure
- temperature
- humidity
Accordingly, we need to add the following concepts:
- Barometer to have data on atmospheric pressure
- Thermometer to have data on temperature
- Hygrometer to have data on humidity
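Once pressure, temperature, and humidity readings are available from these sensors, the site air density can be computed with the ideal gas law for humid air. This sketch is an assumption on our side, not part of the original tutorial: it uses the common Tetens approximation for saturation vapour pressure, expects the units documented in the function, and is meant only to illustrate why all three sensor types are needed.

```python
def air_density(pressure_pa: float, temperature_c: float, relative_humidity: float) -> float:
    """Density of humid air in kg/m^3.

    pressure_pa: atmospheric pressure (Barometer), in Pa
    temperature_c: air temperature (Thermometer), in degrees Celsius
    relative_humidity: fraction between 0.0 and 1.0 (Hygrometer)
    """
    R_D = 287.058  # specific gas constant of dry air, J/(kg*K)
    R_V = 461.495  # specific gas constant of water vapour, J/(kg*K)
    t_kelvin = temperature_c + 273.15
    # Saturation vapour pressure via the Tetens approximation, in Pa
    p_sat = 610.78 * 10 ** (7.5 * temperature_c / (temperature_c + 237.3))
    p_vapour = relative_humidity * p_sat
    p_dry = pressure_pa - p_vapour
    # Sum of the partial densities of dry air and water vapour
    return p_dry / (R_D * t_kelvin) + p_vapour / (R_V * t_kelvin)


print(round(air_density(101_325, 15.0, 0.0), 3))  # 1.225 (standard sea-level dry air)
```

Humid air is slightly less dense than dry air at the same pressure and temperature, so the same turbine would need the power curve for a lower airDensity on a humid site.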
This is an example of how a seemingly small piece of new knowledge from the domain expert can lead to a fairly involved update of the data model. Therefore, stay focused, take your time, listen, and ask questions. Data modeling is never a "one-time effort", but a continuous process that lasts as long as our knowledge about the domain evolves.
The video below shows how the template has been updated with the above information:
Expanding and fine-tuning conceptual data model¶
Up until this point, we have formed our conceptual data model, and based our concepts on CDM concepts.
At this stage, we will "expand" the conceptual data model with the properties that our concepts inherit from the CDM concepts they implement. NEAT has a dedicated method for this, accessible via neat.template.expand().
In addition to expanding the list of properties with those originating from the implemented concepts, NEAT automatically adds the property <nameOfConcept>GUID to every concept the user defines. By giving each user-defined concept a property of its own, one can skip adding filters in the physical data model to ensure that data is consumed through the user-defined concepts.
Furthermore, due to current limitations of the data modeling UI in CDF, NEAT will add all concepts from the Core Data Model.
Now, with the expanded list of properties, we will do the following:
- update the parent and root properties of each concept that implements CogniteAsset to point to WindFarm
- update the asset property of Sensor to point to MetMast
- update the equipment property of MetMast to point to Sensor
- remove the measurement property from Sensor, WindVane, and Anemometer, since CogniteEquipment already has the property timeSeries, which points to CogniteTimeSeries; there is no need for another property holding the same information
- remove the sensor property from MetMast, as there is already the property equipment, which we will update to point to Sensor instead of CogniteEquipment
- remove the dummy property <nameOfConcept>GUID from concepts that already have custom properties, which in our case is all concepts except WindVane, Thermometer, Hygrometer, and Barometer, as we did not add any custom properties to these.
To expand the data model, we call neat.template.expand() and pass the file name of our filled-in conceptual data model:
neat.template.expand("wind_farm_prospecting_conceptual_data_model.xlsx")
Success: Created extension template
The video below shows the process of fine-tuning the conceptual data model:
The resulting Excel file can be downloaded using this link.
Converting conceptual to physical data model¶
The conceptual data model cannot be directly published to CDF. Therefore, we need to convert it to the physical data model form which can be published to CDF.
So we will do the following:
- read the conceptual data model into NeatSession via neat.read.excel("<your filename>")
- check the content of the session by calling neat
- visualize the data model by calling neat.show.data_model()
- visualize all the implements relations by calling neat.show.data_model.implements()
- convert the conceptual data model to a physical data model via neat.convert()
- export the physical data model via neat.to.excel("<filename>")
neat = NeatSession(client)
neat.read.excel("wind_farm_prospecting_conceptual_data_model_expanded.xlsx")
Success: Read NEAT(verified,logical,wind_energy,WindFarmProspecting,v1)
neat
Data Model
type | Logical Data Model |
---|---|
intended for | Information Architect |
name | Wind Farm Prospecting Data Model |
external_id | WindFarmProspecting |
version | v1 |
classes | 44 |
properties | 252 |
neat.show.data_model()
http_purl.org_cognite_neat_data-model_verified_logical_wind_energy_WindFarmProspecting_v1.html
neat.show.data_model.implements()
http_purl.org_cognite_neat_data-model_verified_logical_wind_energy_WindFarmProspecting_v1_implements.html
neat.convert()
Rules converted to dms.
Success: NEAT(verified,logical,wind_energy,WindFarmProspecting,v1) → NEAT(verified,physical,wind_energy,WindFarmProspecting,v1)
neat
Data Model
aspect | physical |
---|---|
intended for | DMS Architect |
name | Wind Farm Prospecting Data Model |
space | wind_energy |
external_id | WindFarmProspecting |
version | v1 |
views | 44 |
containers | 17 |
properties | 252 |
neat.to.excel("wind_farm_prospecting_physical_data_model.xlsx")
Fine-tuning the physical data model prior to publishing¶
Even though NEAT tries to make the best possible conversion from the conceptual to the physical data model, there is always room for improvement, both in the conversion process itself and in post-conversion optimization of the resulting physical data model.
In this specific context, we make the following modifications in the Properties sheet:
- change the connection type of the properties windTurbine, powerCurve, and bbox from edge to direct, as we have no need for edge connections (we are not adding properties to these connections, nor do we expect a very large number of them)
- add units to the hubHeight, ratedPower, height, boomDirection, uncertainty, measurementRange, latitude, longitude, windSpeedBins, powerBins, cutInSpeed, cutOutSpeed, ratedSpeed, and airDensity properties
An up-to-date list of supported units can be found on this page.
The video below demonstrates how these changes are introduced in our physical data model:
The resulting Excel file can be downloaded using this link.
Read and publish fine-tuned physical data model¶
Now we will:
- read the fine-tuned physical data model
- publish it to CDF
- view the final result in CDF
neat.read.excel("wind_farm_prospecting_physical_data_model.xlsx", enable_manual_edit=True)
[WARNING] Experimental feature 'enable_manual_edit' is subject to change without notice
Succeeded with warnings: Read NEAT(verified,physical,wind_energy,WindFarmProspecting,v1)
NeatIssue | count |
---|---|
PropertyNotFoundWarning | 54 |
Hint: Use the .inspect.issues() for more details.
Pay attention to any warnings, especially PropertyNotFoundWarning; as the hint above says, neat.inspect.issues() gives more details.
neat.to.cdf.data_model()
You can inspect the details with the .inspect.outcome.data_model(...) method.
name | created |
---|---|
spaces | 1 |
containers | 13 |
views | 14 |
data_models | 1 |
nodes | 0 |
The video below shows the resulting data model in CDF: