Search code examples
influxdbinfluxql

Modelling influxdb data into tags and fields


TL;DR
How to model data into fields vs tags incase you want to perform both group by and count(distinct())

So currently this is my influxdb data model:

api_requests (database)
   - requests_stats (measurement)
        - api_path (tag)
        - app_version (tag)
        - host (tag)
        - platform (tag) 

        - account_id (field)
        - user_id (field)
        - function_name (field)
        - network (field)
        - network_type (field)
        - time_to_execute (field)

So now I want to find out the number of distinct accounts (active accounts). So I can run the following query:

SELECT count(distinct("account_id")) AS "active_accounts"
FROM "api_requests"."autogen"."requests_stats"

This works fine as account id is a field.

Now suppose I want to perform a group by operation on account_id, for example to find the number of requests received per account:

SELECT count("function_name") AS "request_count" 
FROM "api_requests"."autogen"."requests_stats"
GROUP BY "account_id"

I cannot do this as group by is recommended on tags.

How would one manage this kind of scenerio?

One of the solution is to store the value in both field and value but that would be data redundancy.

The other and the most optimal way would be for count(distinct()) to work on tags. Is this possible? This was actually a feature request in their github repo.

Or can something be done about the data model to achieve the same?


Solution

  • Use tag for account_id. Instead of count query:

    SELECT count(distinct("account_id")) AS "active_accounts"
    FROM "api_requests"."autogen"."requests_stats"
    

    use query, which will calculate exact tag value cardinality:

    SHOW TAG VALUES EXACT CARDINALITY WITH KEY = "account_id"
    

    This will work only for your use case, because you don't want to use any additional (time, tag) filter in your distinct count query.