javascript, node.js, express, data-analysis

Detecting change in raw data


I am currently building a web application that acts as a storage tank level dashboard. It parses incoming data from a number of sensors in tanks and stores these values in a database. The application is built using express / node.js. The data is sampled every 5 minutes but is sent to the server every hour (12 samples per transmission).

I am currently trying to expand the application's capabilities to detect changes in the tank level due to filling or emptying. The end goal is to have a daily report that generates a summary of the filling / emptying events with the duration of time and quantity added or removed. This image shows a screenshot of tank capacity during one day - https://i.sstatic.net/RKlIC.jpg.

My questions are:

  1. What algorithms / functions are available to detect changes in tank level? How would I implement them in my application?
  2. When should the data handling take place? As the data is parsed and saved into the server? At the end of the day with a function that goes through all the data for that day?
  3. Is it worth considering some sort of data cleaning during the parsing stage? I have noticed times when there are random spikes in the data due to noise.
  4. How should I handle cases where they start emptying the tank immediately after completing a delivery? The algorithm needs to be robust enough to treat a change in the direction of the slope as the end of an event. An example of this is in the provided image.

I realise that it may be difficult to put together a robust solution. There are times when the tank is being emptied at the same time that it is being filled, which makes these reductions difficult to measure. The only way to know that this took place is that the slope during the delivery flatlines for approximately 15 minutes and the delivery is a fixed amount less than the usual delivery total.

This has been a fun project to put together. Thanks for any assistance.


Solution

    1. You should be able to develop an algorithm once you specify what you mean by a fill or an emptying (a change in tank level). A good place to start is X% in Y seconds. You then calibrate to avoid false positives and false negatives (e.g. showing a fill when there was none vs. missing a fill when it occurs). One potential approach is to average the tank level over a period of time (say 10 minutes, which is two samples at your 5-minute sampling rate) and compare it with the average for the next 10 minutes. If the difference is above a threshold (say 5%), you can call this a change.

    2. When you process the data depends on when you need it. If the users need to be constantly informed of changes, this could be done when the data is queried. Processing the data into changes in level on write to your datastore might be more efficient (you only do it once), but you lose the ability to tweak your algorithm. It could well depend on performance, e.g. if someone wants to pull a year's worth of data, is the system able to deal with this?

    3. You will almost certainly need to apply something like a low-pass filter to the incoming data. You don't want to show a tank fill based on a temporary spike in level. This is easy to do with an array of values. As mentioned above, a moving average, say of the last 10 minutes of levels, is another way of smoothing the data. You may never get a 0% false positive rate or a 0% false negative rate; you can only aim for values as low as possible.

    4. In this case it looks like a fill followed by an emptying of the tank. If you consider these to be two separate events then you can simply detect changes on the incoming data. I'd suggest you create a graph that marks fills and emptyings with symbols. This way you can eyeball the data to ensure you are detecting changes. I would also say you could add some very useful unit tests for your calculations, perhaps using Jasmine or Cucumber.js.
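
As a rough illustration of points 1, 3 and 4, here is a minimal sketch of the window-average comparison in plain JavaScript. It is only a starting point under stated assumptions, not a tuned implementation: the sample shape `{ t, level }`, the function names `movingAverage` and `detectEvents`, and the option names `windowSize` and `threshold` are all made up for this example, and the defaults (2 samples, 5%) are just the "say 10 minutes" and "say 5%" figures from above.

```javascript
// Hypothetical sample shape (not from the original post):
//   { t: <timestamp in ms>, level: <tank level in %> }

// Trailing moving average over `size` samples: a simple low-pass filter.
function movingAverage(levels, size) {
  return levels.map((_, i) => {
    const win = levels.slice(Math.max(0, i - size + 1), i + 1);
    return win.reduce((sum, v) => sum + v, 0) / win.length;
  });
}

// Compare each window's average with the previous window's average and
// group consecutive changes in the same direction into one event.
function detectEvents(samples, { windowSize = 2, threshold = 5 } = {}) {
  const smooth = movingAverage(samples.map((s) => s.level), windowSize);
  const events = [];
  let current = null;

  for (let i = windowSize; i < smooth.length; i++) {
    const delta = smooth[i] - smooth[i - windowSize];
    const type = delta > threshold ? 'fill' : delta < -threshold ? 'empty' : null;

    if (type === null) {
      current = null; // level is steady: close any open event
    } else if (current === null || current.type !== type) {
      // New event, or the slope changed direction: start a separate event.
      current = { type, startIndex: i - windowSize, endIndex: i };
      events.push(current);
    } else {
      current.endIndex = i; // same direction: extend the open event
    }
  }

  return events.map((e) => ({
    type: e.type,
    start: samples[e.startIndex].t,
    end: samples[e.endIndex].t,
    change: smooth[e.endIndex] - smooth[e.startIndex],
  }));
}

// Usage with made-up readings, one every 5 minutes.
const FIVE_MINUTES = 5 * 60 * 1000;
const readings = [50, 50, 50, 50, 60, 70, 80, 80, 75, 70, 65, 65, 65]
  .map((level, i) => ({ t: i * FIVE_MINUTES, level }));
console.log(detectEvents(readings));
```

Because the levels are smoothed, the reported start and end times lag the real ones by up to a window, so treat them as approximate. A short spike smaller than roughly twice the threshold is averaged away with these defaults, but a taller one would still register, so the window and threshold need calibrating against real data. The sketch does not attempt the case where the tank is emptied during a delivery. Assertions on the returned events, like checking that the made-up readings above yield one fill followed by one emptying, would port naturally to a Jasmine spec.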