From conceptual to physical data model by leveraging and extending core data model concepts

Prerequisites:

Basic understanding of data modeling in CDF
Basic understanding of the Core Data Model
Access to a CDF project
Knowledge of how to install and set up Python
A running Python notebook

In this tutorial, you will learn to model data using an established, industry-accepted approach that combines the following two techniques:

expert elicitation: putting the domain expert in focus, i.e., the person who knows the domain and has business questions to answer with the help of the data being modeled
progressive disclosure: incrementally increasing the complexity of the data model, starting with a conceptual data model (defining concepts) and progressively increasing the model fidelity, making sure that Core Data Model (CDM) concepts are leveraged, until the model is ready to be converted to a physical data model (aka CDF data model)

The data modeling flow depicting the approach is shown below through different roles:

In this tutorial, we will model the Wind Energy domain for the company Wind of Change. We start by defining a minimal model that supports a domain expert in the Wind Farm Prospecting business unit of Wind of Change who is trying to answer the business question: "Is this site viable for a profitable wind project?"

Summary

domain: Wind Energy
company: Wind of Change
business unit: Wind Farm Prospecting
business question: Is a site the right one to develop a wind farm project?

NEAT

We will use NEAT to create the data model based on the above requirements.

Interaction with NEAT happens through a so-called NeatSession. A NeatSession is typically instantiated with a Cognite client, which allows us to connect to CDF and to read and write data models and instances. Therefore, we import NeatSession and the convenience function get_cognite_client:

In [1]:

from cognite.neat import NeatSession, get_cognite_client

If you do not have a .env file stored locally, call get_cognite_client() first to create one:

In [2]:

client = get_cognite_client(".env")

Found .env file in repository root. Loaded variables from .env file.

In [3]:

neat = NeatSession(client)

For Neat to improve, we need to collect usage information. You acknowledge and agree that neat may collect usage information.To remove this message run 'neat.opt.in_() or to stop collecting usage information run 'neat.opt.out()'.
Neat Engine 2.0.5 loaded.

Expert elicitation

In the introduction of this tutorial, we have established the business question of interest.

Domain experts are usually unfamiliar with the data modeling process, especially its platform-specific details (in the case of CDF, details such as views and containers). However, domain experts understand which concepts they need information about to answer the business question. Therefore, it is crucial to interview them to extract these concepts (a process known as expert elicitation). The extracted concepts and the connections between them form the basis of what is known as a conceptual data model.

Let's assume that we asked our domain expert how he/she is able to answer the question:

Is a site the right one to develop a wind farm project?

We got the answer that he/she typically uses a simple rule of thumb expressed by the following formula:

7.5 * Annual Energy Production * Electricity Price >= Total Costs

In short, within 7.5 years of the wind farm starting operation, the revenue from the electricity sold should cover all the costs.
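The rule of thumb can be written as a small Python check. This is a minimal sketch: the function name `is_site_viable` and its parameters are hypothetical, annual energy production is assumed to be in MWh, and the price and total costs are assumed to be in the same currency.

```python
def is_site_viable(
    annual_energy_production_mwh: float,
    electricity_price_per_mwh: float,
    total_costs: float,
    payback_years: float = 7.5,
) -> bool:
    """Domain expert's rule of thumb (hypothetical helper, not a NEAT API).

    The site is considered viable if selling the expected annual energy
    production for `payback_years` years covers the total costs.
    """
    revenue = payback_years * annual_energy_production_mwh * electricity_price_per_mwh
    return revenue >= total_costs


# Example: 100 GWh/year sold at 50 per MWh over 7.5 years yields 37.5 million,
# which covers total costs of 30 million.
print(is_site_viable(100_000, 50, 30_000_000))  # True
```

Keeping `payback_years` as a parameter makes it easy to test how sensitive the verdict is to the 7.5-year assumption.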

To evaluate the above formula, the domain expert highlights the need for the following data:

location and area of the site
wind climate of the site
wind turbine characteristics
costs
expected electricity price

By asking a set of more detailed follow-up questions, we were able to outline the following concepts:

Wind farm: A group of wind turbines installed in a specific area to generate electricity from wind.
Wind turbine: A machine that converts wind energy into electrical power.
Meteorological mast: A tall tower equipped with sensors to measure wind and weather conditions at potential wind farm sites.
Anemometer: A device mounted on a mast to measure wind speed.
Wind vane: A sensor that shows the direction of the wind.
Site: A specific location being evaluated for the feasibility of building a wind farm.
Cost: The total expense involved in developing, building, and operating a wind farm.
Electricity price: The market rate at which the generated electricity can be sold.

At this stage, we can start forming the conceptual data model, for which NEAT offers an Excel template that we can leverage. Below is the command that will generate the template:

In [24]:

neat.template.conceptual_model("wind_farm_prospecting_conceptual_data_model.xlsx")

Capturing expert's knowledge in conceptual data model

Let's now fill in the template.

As we add concepts (in the current template, concepts are added in the Classes sheet and are called classes), we should select appropriate CDM concepts to build them on. To simplify this selection, we will use the following rules of thumb:

Large infrastructure, physical objects that can be represented as a hierarchy of concepts (assets), and/or objects that can be seen as "boxes" that other objects are put in or are part of: we will base these on CogniteAsset
Physical objects that do not have a clear hierarchical structure, or are part of other physical objects: we will base these on CogniteEquipment
Any other concept that does not follow rules 1 and 2 and requires human-readable properties: we will base these on CogniteDescribable
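To make the order in which the rules are applied explicit, they can be sketched as a plain Python function. The function and its two boolean flags are hypothetical simplifications; in practice, the classification is a judgment made together with the domain expert.

```python
def select_cdm_concept(*, asset_like: bool, physical: bool) -> str:
    """Pick a Core Data Model concept to build on, following the rules of thumb.

    asset_like: large infrastructure, part of an asset hierarchy, or a "box"
                that other objects are placed in.
    physical:   a physical object (only consulted when asset_like is False).
    """
    if asset_like:
        return "CogniteAsset"  # rule 1
    if physical:
        return "CogniteEquipment"  # rule 2
    return "CogniteDescribable"  # rule 3: anything else needing name/description


print(select_cdm_concept(asset_like=True, physical=True))    # CogniteAsset (e.g. WindFarm)
print(select_cdm_concept(asset_like=False, physical=True))   # CogniteEquipment (e.g. Anemometer)
print(select_cdm_concept(asset_like=False, physical=False))  # CogniteDescribable (e.g. Site)
```

Checking rule 1 first matters: a met mast is a physical object, but because it acts as a "box" for sensors it ends up on CogniteAsset rather than CogniteEquipment.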

Following these rules, we based our concepts as follows:

WindFarm on CogniteAsset: as this is a large infrastructure that will contain the child assets WindTurbine and MetMast
WindTurbine on CogniteAsset: as it is the building block of WindFarm and enables further breakdown into components such as Blade, Tower, Nacelle, etc.
MetMast on CogniteAsset: as it is part of WindFarm and will be a "box" to which all sensors are added
Anemometer on CogniteEquipment: as it is a sensor that attaches to a meteorological mast, not necessarily in a hierarchical form
WindVane on CogniteEquipment: for the same reason as Anemometer
Site on CogniteDescribable: as it is neither an asset nor equipment, but it will hold information such as name and description
Cost on CogniteDescribable: for the same reason as Site
ElectricityPrice on CogniteDescribable: for the same reason as Site

But let's make one small improvement. Since Anemometer and WindVane are both sensors, we can create a more generic concept Sensor, which is based on CogniteEquipment, and update the model such that Anemometer and WindVane implement Sensor instead of CogniteEquipment. Why are we doing this? This will allow us to define common properties of Anemometer and WindVane for Sensor, and only define custom properties for Anemometer and WindVane.
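The effect of this change can be illustrated with a small Python sketch of the implements relationships: Anemometer and WindVane now reach CogniteEquipment through Sensor, so anything defined once on Sensor applies to both. The `IMPLEMENTS` mapping and the `implemented_concepts` helper are illustrative only and are not part of NEAT; the CDM concepts' own implements are left out.

```python
# Which concept(s) each user-defined concept implements, as decided above.
IMPLEMENTS: dict[str, list[str]] = {
    "WindFarm": ["CogniteAsset"],
    "WindTurbine": ["CogniteAsset"],
    "MetMast": ["CogniteAsset"],
    "Sensor": ["CogniteEquipment"],
    "Anemometer": ["Sensor"],
    "WindVane": ["Sensor"],
    "Site": ["CogniteDescribable"],
    "Cost": ["CogniteDescribable"],
    "ElectricityPrice": ["CogniteDescribable"],
}


def implemented_concepts(concept: str) -> list[str]:
    """Return every concept that `concept` implements, directly or transitively, nearest first."""
    result: list[str] = []
    queue = list(IMPLEMENTS.get(concept, []))
    while queue:  # breadth-first walk up the implements chain
        parent = queue.pop(0)
        if parent not in result:
            result.append(parent)
            queue.extend(IMPLEMENTS.get(parent, []))
    return result


print(implemented_concepts("Anemometer"))  # ['Sensor', 'CogniteEquipment']
print(implemented_concepts("Site"))        # ['CogniteDescribable']
```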

The video below shows the process of filling in the template with our concepts (only rendered at thisisneat.io).

Adding properties to concepts

In this second part of the expert elicitation process, we go into more detail about our concepts with the domain expert and add properties to them. These are:

properties that hold data (also known as attributes)
and properties that connect concepts (also known as connections or relationships)

Specifically, we have added the following properties:

WindFarm:
- windTurbine to hold the connection to all WindTurbine(s) that make up the WindFarm

WindTurbine:
- powerCurve to hold power curve measurements of the WindTurbine
- hubHeight to store the hub height of the WindTurbine
- ratedPower to store the rated power of the WindTurbine
- manufacturer to store manufacturer of the WindTurbine

MetMast:
- iecCompliant to indicate if the MetMast is IEC compliant
- sensor to hold connection to Sensor(s) which are attached to MetMast

Sensor:
- height to store the height at which the Sensor is mounted on the MetMast
- boomDirection to store the boom direction of the Sensor
- uncertainty to store uncertainty of the Sensor
- measurementRange to store the measurement range of the Sensor
- measurements to store measurements (i.e., time series)

Anemometer:
- numberOfCups to store how many cups the Anemometer has

ElectricityPrice:
- upperPrice to store the upper price of the electricity
- lowerPrice to store the lower price of the electricity

Cost:
- opex to store operational expenditure
- capex to store capital expenditure
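Whether a property is an attribute or a connection comes down to its value type: a property whose value type is another concept is a connection, anything else holds data. The sketch below applies that test to the properties listed above. The primitive value types (`float`, `string`, `boolean`, `integer`) are assumptions for illustration; `CogniteTimeSeries` reflects that measurements (and, at this stage, powerCurve) are time series.

```python
# Concepts defined so far; a property whose value type is one of these is a connection.
CONCEPTS = {
    "WindFarm", "WindTurbine", "MetMast", "Sensor", "Anemometer", "WindVane",
    "Site", "Cost", "ElectricityPrice", "CogniteTimeSeries",
}

# (concept, property, value type). Primitive value types are illustrative assumptions.
PROPERTIES = [
    ("WindFarm", "windTurbine", "WindTurbine"),
    ("WindTurbine", "powerCurve", "CogniteTimeSeries"),
    ("WindTurbine", "hubHeight", "float"),
    ("WindTurbine", "ratedPower", "float"),
    ("WindTurbine", "manufacturer", "string"),
    ("MetMast", "iecCompliant", "boolean"),
    ("MetMast", "sensor", "Sensor"),
    ("Sensor", "height", "float"),
    ("Sensor", "boomDirection", "float"),
    ("Sensor", "uncertainty", "float"),
    ("Sensor", "measurementRange", "float"),
    ("Sensor", "measurements", "CogniteTimeSeries"),
    ("Anemometer", "numberOfCups", "integer"),
    ("ElectricityPrice", "upperPrice", "float"),
    ("ElectricityPrice", "lowerPrice", "float"),
    ("Cost", "opex", "float"),
    ("Cost", "capex", "float"),
]

attributes = [(c, p) for c, p, t in PROPERTIES if t not in CONCEPTS]
connections = [(c, p, t) for c, p, t in PROPERTIES if t in CONCEPTS]

for concept, prop, target in connections:
    print(f"{concept}.{prop} -> {target}")
print(f"{len(attributes)} attributes, {len(connections)} connections")  # 13 attributes, 4 connections
```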

The video below shows the process of filling the template with properties (only rendered at thisisneat.io).

Adding `Location` concept

You might notice that we did not add any custom properties to Site yet.

As we need to express the geographical location of Site in the form of a bounding box, we need to create a helper concept Location to store information about:

latitude
longitude

We will create this new concept and add these properties to it. Similar to Site, we will base Location off CogniteDescribable (it does not hurt to have information about location expressed with name and description which are properties of CogniteDescribable).

Once we have created this concept, we will add a bbox (bounding box) property to Site and set its value type to Location, with a minimum count of 3 (we need at least three geo locations to express the bounding box) and a maximum count of 100 (a provisional maximum number of locations for the bounding box, as we do not need high resolution).

Also, we will add a new property location to WindFarm and MetMast and set its value type to Location. In this way, we will be able to express the geographical location of the wind turbines and met masts in a wind farm.
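The cardinality rule on bbox can be sketched in Python as a check on the number of Location values, followed by deriving the site's extent from them. The `Location` dataclass mirrors the concept's two properties; the `site_extent` helper and the coordinates are hypothetical, and the sketch ignores areas that cross the antimeridian.

```python
from dataclasses import dataclass

MIN_BBOX_LOCATIONS = 3    # minimum count chosen for Site.bbox
MAX_BBOX_LOCATIONS = 100  # provisional maximum count for Site.bbox


@dataclass
class Location:
    latitude: float   # decimal degrees
    longitude: float  # decimal degrees


def site_extent(bbox: list[Location]) -> tuple[float, float, float, float]:
    """Validate the bbox count and return (min_lat, min_lon, max_lat, max_lon)."""
    if not MIN_BBOX_LOCATIONS <= len(bbox) <= MAX_BBOX_LOCATIONS:
        raise ValueError(
            f"bbox needs {MIN_BBOX_LOCATIONS}-{MAX_BBOX_LOCATIONS} locations, got {len(bbox)}"
        )
    lats = [loc.latitude for loc in bbox]
    lons = [loc.longitude for loc in bbox]
    return min(lats), min(lons), max(lats), max(lons)


# Hypothetical site outline with three locations
outline = [Location(59.0, 10.0), Location(59.5, 10.2), Location(59.2, 10.8)]
print(site_extent(outline))  # (59.0, 10.0, 59.5, 10.8)
```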

The video below shows the process described above:

Improving value type for `powerCurve` property

In a discussion with the domain expert, we learned that a wind turbine power curve is a graph representing the amount of power a wind turbine can produce at different wind speeds. Typically, there are multiple power curves, each specific to an air density. The image below shows an idealized power curve of a wind turbine.

Idealised Wind Turbine Power Curve (origin WIKIMEDIA)

Initially, we set the value type of the powerCurve property to CogniteTimeSeries, which is wrong, as a wind turbine power curve cannot be represented as a time series. A classic CDF sequence resource would be more appropriate here. However, the Core Data Model has no corresponding representation of a Cognite sequence. Therefore, similar to the case of Location, we will create a helper concept PowerCurve, which we will base on CogniteDescribable, and add the following custom properties:

windSpeedBins: Discrete wind speed intervals used to categorize wind data.
powerBins: Corresponding power output values for each wind speed bin.
cutInSpeed: The minimum wind speed at which the turbine starts generating power.
cutOutSpeed: The wind speed at which the turbine shuts down to prevent damage.
ratedSpeed: The wind speed at which the turbine reaches its maximum (rated) power output.
airDensity: The air density for which the power curve is valid.
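To show how these properties work together, the sketch below linearly interpolates a power curve between its bins and sums hourly output into an annual energy production estimate, the quantity used in the domain expert's rule of thumb. Linear interpolation, the units (m/s, kW, MWh), the `PowerCurve` dataclass, and the sample values are all assumptions; in practice, the curve matching the site's air density would be selected first.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class PowerCurve:
    """Illustrative stand-in for the PowerCurve concept (units assumed)."""
    wind_speed_bins: list[float]  # m/s, strictly ascending
    power_bins: list[float]       # kW, one value per wind speed bin
    cut_in_speed: float           # m/s
    cut_out_speed: float          # m/s
    rated_speed: float            # m/s (not needed for the interpolation below)
    air_density: float            # kg/m3 for which this curve is valid

    def power_at(self, wind_speed: float) -> float:
        """Return power output in kW at `wind_speed`, interpolating linearly between bins."""
        if wind_speed < self.cut_in_speed or wind_speed > self.cut_out_speed:
            return 0.0  # idle below cut-in, shut down above cut-out
        xs, ys = self.wind_speed_bins, self.power_bins
        if wind_speed <= xs[0]:
            return ys[0]
        if wind_speed >= xs[-1]:
            return ys[-1]  # hold the last bin value up to cut-out
        i = bisect_right(xs, wind_speed)  # xs[i - 1] <= wind_speed < xs[i]
        x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
        return y0 + (y1 - y0) * (wind_speed - x0) / (x1 - x0)


def annual_energy_production_mwh(curve: PowerCurve, hourly_wind_speeds: list[float]) -> float:
    """Sum hourly output (kW x 1 h = kWh) and convert to MWh."""
    return sum(curve.power_at(v) for v in hourly_wind_speeds) / 1000.0


curve = PowerCurve(
    wind_speed_bins=[3.0, 5.0, 7.0, 9.0, 11.0, 13.0],
    power_bins=[0.0, 300.0, 1000.0, 2000.0, 3000.0, 3000.0],
    cut_in_speed=3.0, cut_out_speed=25.0, rated_speed=11.0, air_density=1.225,
)
print(curve.power_at(6.0))  # 650.0
print(annual_energy_production_mwh(curve, [8.0] * 8760))  # 13140.0
```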

Additionally, as we need to know the air density at the site, we need to know the site's:

atmospheric pressure
temperature
humidity

to be able to calculate the air density.

Accordingly, we need to add the following concepts:

Barometer to have data on atmospheric pressure
Thermometer to have data on temperature
Hygrometer to have data on humidity
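Once Barometer, Thermometer, and Hygrometer data are available, air density can be estimated by treating humid air as a mix of dry air and water vapor, each behaving as an ideal gas. The sketch below assumes pressure in Pa, temperature in °C, and relative humidity as a fraction between 0 and 1, and uses the Tetens approximation for saturation vapor pressure; the function names are hypothetical.

```python
import math

R_DRY_AIR = 287.058      # J/(kg*K), specific gas constant of dry air
R_WATER_VAPOR = 461.495  # J/(kg*K), specific gas constant of water vapor


def saturation_vapor_pressure_pa(temperature_c: float) -> float:
    """Tetens approximation of saturation vapor pressure over water, in Pa."""
    return 610.78 * math.exp(17.27 * temperature_c / (temperature_c + 237.3))


def air_density(pressure_pa: float, temperature_c: float, relative_humidity: float) -> float:
    """Density of humid air in kg/m3 from barometer, thermometer, and hygrometer readings."""
    if not 0.0 <= relative_humidity <= 1.0:
        raise ValueError("relative_humidity must be a fraction between 0 and 1")
    temperature_k = temperature_c + 273.15
    vapor_pressure = relative_humidity * saturation_vapor_pressure_pa(temperature_c)
    dry_air_pressure = pressure_pa - vapor_pressure
    # Partial densities of dry air and water vapor, each from the ideal gas law
    return (dry_air_pressure / (R_DRY_AIR * temperature_k)
            + vapor_pressure / (R_WATER_VAPOR * temperature_k))


print(round(air_density(101_325, 15.0, 0.0), 3))  # 1.225 (sea-level standard, dry air)
print(air_density(101_325, 15.0, 0.8) < air_density(101_325, 15.0, 0.0))  # True: humid air is lighter
```

The resulting density is what would be compared against the airDensity property to pick the matching power curve.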

Here we can see how what seemed a relatively small update to our data model, driven by new knowledge from the domain expert, turned into a fairly involved update process. Therefore, stay focused, take your time, listen, and ask questions. Data modeling is never a one-time effort, but a continuous process that lasts as long as our knowledge about the domain evolves.

The video below shows how the template has been updated with the above information:

Expanding and fine-tuning conceptual data model

Up until this point, we have formed our conceptual data model, and based our concepts on CDM concepts. At this stage, we will now "expand" the conceptual data model with properties that are inherited from CDM concepts through their implementation. NEAT has a special method to perform this action which is accessible via neat.template.expand().

In addition to expanding the list of properties with those originating from the implemented concepts, NEAT automatically adds the property <nameOfConcept>GUID to every concept the user defines. Because each user-defined concept then has a property of its own, one can skip adding filters in the physical data model to ensure that data is consumed through the user-defined concepts.

Furthermore, due to the current data modeling UI limitations in CDF, NEAT will add all concepts from the Core Data Model.

Now with the expanded list of properties, we will do the following:

update the parent and root properties of each concept that implements CogniteAsset to point to WindFarm
update the asset property of Sensor to point to MetMast
update the equipment property of MetMast to point to Sensor
remove the measurements property from Sensor, WindVane, and Anemometer, since CogniteEquipment already has the property timeSeries, which points to CogniteTimeSeries, so there is no need for a second property holding the same information
remove the sensor property from MetMast, as there is already the property equipment, which we will update to point to Sensor instead of CogniteEquipment
remove the placeholder property <nameOfConcept>GUID from concepts that already have custom properties, which in our case are all concepts except WindVane, Thermometer, Hygrometer, and Barometer, as we did not add any custom properties to these concepts


To expand the data model we call the method neat.template.expand() and pass the file name of our filled conceptual data model:

In [ ]:

neat.template.expand("wind_farm_prospecting_conceptual_data_model.xlsx")

Out[ ]:

Success: Created extension template

The video below shows the process of fine-tuning the conceptual data model:

The resulting Excel file can be downloaded using this link.

Converting conceptual to physical data model

The conceptual data model cannot be directly published to CDF. Therefore, we need to convert it to the physical data model form which can be published to CDF.

So we will do the following:

read the conceptual data model into the NeatSession via neat.read.excel("<your filename>")
check the content of the session by calling neat
visualize the data model by calling neat.show.data_model()
visualize all implements relationships by calling neat.show.data_model.implements()
convert the conceptual data model to a physical data model via neat.convert()
export the physical data model via neat.to.excel("<filename>")

In [7]:

neat = NeatSession(client)

For Neat to improve, we need to collect usage information. You acknowledge and agree that neat may collect usage information.To remove this message run 'neat.opt.in_() or to stop collecting usage information run 'neat.opt.out()'.
Neat Engine 2.0.5 loaded.

In [ ]:

neat.read.excel("wind_farm_prospecting_conceptual_data_model_expanded.xlsx")

Out[ ]:

Success: Read NEAT(verified,conceptual,wind_energy,WindFarmProspecting,v1)

In [10]:

neat

Out[10]:

Data Model


level	conceptual
intended for	Domain Expert and/or Information Architect
name	Wind Farm Prospecting Data Model
external_id	WindFarmProspecting
version	v1
concepts	44
properties	252

In [11]:

neat.show.data_model()

http_purl.org_cognite_neat_data-model_verified_conceptual_wind_energy_WindFarmProspecting_v1.html

Out[11]:

In [12]:

neat.show.data_model.implements()

http_purl.org_cognite_neat_data-model_verified_conceptual_wind_energy_WindFarmProspecting_v1_implements.html

Out[12]:

In [13]:

neat.convert()

Rules converted to dms.

Out[13]:

Success: NEAT(verified,conceptual,wind_energy,WindFarmProspecting,v1) → NEAT(verified,physical,wind_energy,WindFarmProspecting,v1)

In [14]:

neat

Out[14]:

Data Model


level	physical
intended for	Data Engineer
name	Wind Farm Prospecting Data Model
space	wind_energy
external_id	WindFarmProspecting
version	v1
views	44
containers	17
properties	252

In [15]:

neat.to.excel("wind_farm_prospecting_physical_data_model.xlsx")

Fine-tuning physical data model prior to publishing

Even though NEAT tries to make the best possible conversion from the conceptual to the physical data model, there is always room for improvement, both in the conversion process itself and in post-conversion optimization of the resulting physical data model.

In this specific context, in the Properties sheet we will make the following modifications:

change the connection type of the properties windTurbine, powerCurve, and bbox from edge to direct, as we do not need edges here (we are not adding properties to the connections, nor do we expect a very large number of connections)
add units to the properties hubHeight, ratedPower, height, boomDirection, uncertainty, measurementRange, latitude, longitude, windSpeedBins, powerBins, cutInSpeed, cutOutSpeed, ratedSpeed, and airDensity

An up-to-date list of supported units can be found at this page.

The video below demonstrates how these changes are introduced in our physical data model:

The resulting Excel file can be downloaded using this link.

Read and publish fine-tuned physical data model

Now we will:

read the fine-tuned physical data model
publish it to CDF
and view the final result in CDF

In [ ]:

neat.read.excel("wind_farm_prospecting_physical_data_model.xlsx", enable_manual_edit=True)

[WARNING] Experimental feature 'enable_manual_edit' is subject to change without notice

Out[ ]:

Succeeded with warnings: Read NEAT(verified,physical,wind_energy,WindFarmProspecting,v1)

	count
NeatIssue
PropertyNotFoundWarning	54

Hint: Use the .inspect.issues() for more details.

Pay attention to any warnings, especially PropertyNotFoundWarning.

In [39]:

neat.to.cdf.data_model()

You can inspect the details with the .inspect.outcome.data_model(...) method.

Out[39]:

	name	created
0	spaces	1
1	containers	13
2	views	14
3	data_models	1
4	nodes	0

The video below shows the resulting data model in CDF:
