database database-design architecture iot

Sensor network calculating features - design database

How would you model a system involving sensors

which have some spatial data like lat,lon,altitude,...
that produce raw measurements let's say distance, temperature,...
for the purpose of calculating "features" of the location they are at like the volume, temperature

Here's my idea for tables

Sensor
- id
- model
- type

SensorMetadata
- sensorId
- timestamp // for time series to account for changing
- lat
- long
- alt
- metadata: json // some dynamic changeable data based on the domain lets say relative distance to something...
- unit

Measurements
- id
- timestamp
- sensorId
- value

Feature
- id
- type // complex features like volume
- value
- timestamp
- // relation to location (maybe sensorMetadata)

1 would you model it to be specific for the domain if let's say tanks of fluids are in question or would you model it generic using "features" language?

2 how and when would you calculate "features" based on measurements? Clearly here I'm missing what sensors would I take into account to calculate the feature of the location (let's say they use multiple sensors in some cases)

Solution

regarding Sensor, don't be afraid to use something like SensorInstance - ambiguity is something to be avoided if you can, especially if you can do so with out making really long names. In the long run, a clear (unambiguious) database design is usually better than a concise one.

SensorMetadata - SensorState is another option.

Measurements - try to avoid plural names, singular (like Sensor) is usually better. This page has some good guidance: https://www.sqlshack.com/learn-sql-naming-conventions/

The big issue I can see though is Measurements.value - how do you interpret the value? What data type is it - i.e. what's the data type(s) your sensors are producing? Measurements relate to sensors - do sensors measurements always result in one value (as implied by your design)?

I'm not clear what the Feature table's purpose is.

For design challenges like this, have a look for existing reference architectures and designs - this outfit looks like they might have some useful whitepapers: https://crate.io/use-cases/iot-sensor-data/ This might also be useful reading for you but won't provide a direct answer itself: https://hackernoon.com/ingesting-iot-and-sensor-data-at-scale-ee548e0f8b78

Update 28-Jun-21

So Value column in Measurement would be interpreted based on SensorId it came from and since Sensor has its SensorMetadata which stores its Unit I would be able to interpret the Value if that's ok?

Should be fine.

For Feature - a couple of approaches:

Calculate it later - i.e. just gather raw data, and do bulk calculations later. For example you could ETL the data from the data capture system (which due to the table design is basically OLTP data model) to a reporting system (OLAP data model).
Calculate it now - i.e. as it's captured by the application feeding the database, or a trigger/logic in the database. For that you'll want some sort of reference between the feature table and the measurement and/or sensor table. That way you are able to test the logic that calculated the Feature.Value, because if you can't do that - can you trust your data? It would also allow you to calculate new values if you decided you wanted to change the calculation formula.

Personally I think #1 is more flexible, once you have the raw data you can retrospectively add any new feature you like, but that might not fit with how you need the system to work.

Option 2 is tricky. If Feature.Value is calculated off more than one Measurement.Value (and potentially more than one sensor) then it's also likely the number of measurements and sensors will vary across different features - so you'd need a many-to-many relationship between them (Features and Measurements).

Having a many-to-many relationship is fine: a joining table is common practice (in general terms), and fits into the OLTP model which is good for transactions i.e. data gathering. Flipping this into a more reporting friendly OLAP model will be more complicated though.

Lastly, regarding your question #2 - I think I have partially answered it indirectly. Basically it depends: how soon is the Feature data needed? If it's needed at runtime, to provide immediate information to people or systems then you'll want to calculate it as soon as possible. Maybe pout that as a second question, and provide information on the feature use case, solution context (who is using it, under what circumstances), etc.