Recommended approach to store multi-dimensional data (e.g. spectra) in InfluxDB

I am trying to integrate a time series database with laboratory real-time monitoring equipment. For scalar data such as temperature, the line protocol works well:

temperature,site=reactor temperature=20.0 1556892576842902000

For 1D data (e.g., an IR spectrum) or higher-dimensional data, I came up with two approaches to write the data.

Write each element of the spectrum as a separate field, as shown below. This way I can query individual frequencies and perform analysis or visualization using existing software. However, each record will easily contain thousands of fields due to the high resolution of the spectrometer. My concern is whether the line protocol becomes too bulky and whether storage becomes inefficient.

ir_spectrum,site=reactor w1=10.0,w2=11.2,w3=11.3,......,w4000=2665.2 1556892576842902000
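A minimal Python sketch of how such a line could be generated from a list of intensities. The helper name `spectrum_to_fields_line` is hypothetical, the field names `w1..wN` follow the example above, and the sketch assumes tag keys and values contain no characters that need escaping (commas, spaces, equals signs).

```python
def spectrum_to_fields_line(measurement, tags, spectrum, timestamp_ns):
    """Build one line-protocol record with one float field per spectrum element."""
    # Measurement followed by ",key=value" for each tag (sorted for stable output).
    # Assumption: no escaping is needed for the measurement name or the tags.
    head = measurement + "".join(f",{k}={v}" for k, v in sorted(tags.items()))
    # One field per element, named w1..wN as in the example above.
    fields = ",".join(
        f"w{i}={float(v)!r}" for i, v in enumerate(spectrum, start=1)
    )
    return f"{head} {fields} {timestamp_ns}"


line = spectrum_to_fields_line(
    "ir_spectrum", {"site": "reactor"}, [10.0, 11.2, 11.3], 1556892576842902000
)
print(line)
# ir_spectrum,site=reactor w1=10.0,w2=11.2,w3=11.3 1556892576842902000
```

With a real spectrometer the list would hold thousands of values, so the resulting line grows roughly linearly with the resolution.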

Store the vector as a serialized string (e.g., JSON). This way I may need a plugin to adapt the data for visualization tools such as Grafana, but the line protocol looks cleaner. I am not sure whether the storage layout is better than in the first approach.

ir_spectrum,site=reactor data="[10.0, 11.2, 11.3, ......, 2665.2]" 1556892576842902000
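A minimal Python sketch of this serialized variant, using the standard `json` module. The helper name `spectrum_to_json_line` is hypothetical and the field name `data` is taken from the example above. The sketch assumes tags need no escaping. It escapes backslashes and double quotes inside the string field value, which a plain numeric JSON array does not contain, but other payloads might.

```python
import json


def spectrum_to_json_line(measurement, tags, spectrum, timestamp_ns):
    """Build one line-protocol record holding the whole spectrum as a JSON string field."""
    # Assumption: no escaping is needed for the measurement name or the tags.
    head = measurement + "".join(f",{k}={v}" for k, v in sorted(tags.items()))
    payload = json.dumps(list(spectrum))
    # String field values are double-quoted; escape backslashes and double quotes.
    escaped = payload.replace("\\", "\\\\").replace('"', '\\"')
    return f'{head} data="{escaped}" {timestamp_ns}'


json_line = spectrum_to_json_line(
    "ir_spectrum", {"site": "reactor"}, [10.0, 11.2, 11.3], 1556892576842902000
)
print(json_line)
# ir_spectrum,site=reactor data="[10.0, 11.2, 11.3]" 1556892576842902000

# On the reading side, the client has to decode the whole string
# before it can look at any single element.
decoded = json.loads("[10.0, 11.2, 11.3]")
print(decoded[1])
# 11.2
```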

Is there a recommended way to store such high-dimensional data? Thanks!

Solution

The first approach is better in terms of performance and disk space usage. InfluxDB stores each field in a separate column. A column of similar numeric values may compress better than a column of JSON strings. Storing fields separately also improves query speed when selecting or filtering on only a subset of fields.
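To illustrate selecting a subset of fields, here is a rough Python sketch that builds an InfluxQL query string for a few of the `w1..wN` fields from the first approach. The helper name `select_fields_query` is hypothetical, the sketch assumes InfluxQL is the query language in use, and it does no escaping or validation of the inputs, so it is not safe for untrusted values.

```python
def select_fields_query(measurement, field_names, site):
    """Build an InfluxQL query that reads only the listed fields for one site."""
    # InfluxQL uses double quotes for identifiers and single quotes for string literals.
    # Assumption: names and values contain no quotes that would need escaping.
    fields = ", ".join(f'"{name}"' for name in field_names)
    return f'SELECT {fields} FROM "{measurement}" WHERE "site" = \'{site}\''


query = select_fields_query("ir_spectrum", ["w1", "w2"], "reactor")
print(query)
# SELECT "w1", "w2" FROM "ir_spectrum" WHERE "site" = 'reactor'
```

With the JSON string variant there is no equivalent: the query can only return the whole `data` string, and the client must decode it to reach individual elements.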

P.S. InfluxDB may need a large amount of RAM when there are many fields and many tag combinations (i.e., high cardinality). In that case there are alternative solutions that support the InfluxDB line protocol and require less RAM for high-cardinality time series. See, for example, VictoriaMetrics.