fiware-cygnus

How to handle grouping rules


I'm trying to keep my assigned Cosmos space organized. Currently I'm storing data as illustrated below:

.../webhdfs/v1/user/[ USERNAME ]/[ Fiware-Service ]/[ Fiware-ServicePath ]/TEMPORAL1_PhysicalTest/TEMPORAL1_PhysicalTest.txt
.../webhdfs/v1/user/[ USERNAME ]/[ Fiware-Service ]/[ Fiware-ServicePath ]/TEMPORAL2_PhysicalTest/TEMPORAL2_PhysicalTest.txt
.../webhdfs/v1/user/[ USERNAME ]/[ Fiware-Service ]/[ Fiware-ServicePath ]/TEMPORAL3_PhysicalTest/TEMPORAL3_PhysicalTest.txt
.../webhdfs/v1/user/[ USERNAME ]/[ Fiware-Service ]/[ Fiware-ServicePath ]/TEMPORAL4_PhysicalTest/TEMPORAL4_PhysicalTest.txt

Here TEMPORAL1 is one of my entity IDs and PhysicalTest is its type. However, I would like to know the appropriate mechanism for storing data in the (hypothetical) structure below:

.../webhdfs/v1/user/[ USERNAME ]/[ Fiware-Service ]/[ Fiware-ServicePath ]/physicaltests/TEMPORAL1_PhysicalTest.txt
.../webhdfs/v1/user/[ USERNAME ]/[ Fiware-Service ]/[ Fiware-ServicePath ]/physicaltests/TEMPORAL2_PhysicalTest.txt
.../webhdfs/v1/user/[ USERNAME ]/[ Fiware-Service ]/[ Fiware-ServicePath ]/physicaltests/TEMPORAL3_PhysicalTest.txt
.../webhdfs/v1/user/[ USERNAME ]/[ Fiware-Service ]/[ Fiware-ServicePath ]/physicaltests/TEMPORAL4_PhysicalTest.txt

I believe this could be addressed with grouping rules; I'm not sure, though.

If that's the case, I have set up my grouping_rules.conf as below, but without success: I ended up with the same structure as the first one shown above:

{
    "grouping_rules": [
        {
            "id": 1,
            "fields": [
                "entityType"
            ],
            "regex": "PhysicalTest.*",
            "destination": "PhysicalTest",
            "fiware_service_path": "/[ Fiware-Service ]/physicaltests"
        }
    ]
}

Solution

  • Such a thing cannot be done. Cygnus stores the data in HDFS folders following this pattern (*):

    /user/<username>/<service>/<service-path>/<entity-id>_<entity-type>/<entity-id>_<entity-type>.txt
    

    The structure of the <entity-id>_<entity-type>/<entity-id>_<entity-type>.txt part cannot be changed, in the sense that the entity ID and the entity type (either as notified or as mapped, which is explained below) are always used to compose it. Note that this structure repeats the concatenation of entity ID and type both in a subfolder and in a file. Why? Because Hadoop works with directories, not files. Cygnus was designed with this structure so that a single entity can be analyzed on its own.
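
The path pattern above can be sketched as a small function. This is only an illustration of the pattern, not Cygnus code; the function name `hdfs_file_path` and the sample values (`myuser`, `myservice`, `/mypath`) are assumptions made up for the example.

```python
import posixpath


def hdfs_file_path(username, service, service_path, entity_id, entity_type):
    """Compose a path following the pattern described above (illustrative only)."""
    # The same <entity-id>_<entity-type> string names both the subfolder and the file.
    destination = f"{entity_id}_{entity_type}"
    return posixpath.join(
        "/user",
        username,
        service,
        service_path.strip("/"),  # drop the leading slash of the service path
        destination,
        destination + ".txt",
    )


print(hdfs_file_path("myuser", "myservice", "/mypath", "TEMPORAL1", "PhysicalTest"))
```

Whatever the entity ID and type are, the subfolder and the file always share the same name, which is why a layout with one common folder holding per-entity files cannot be produced.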

    That said, the resulting paths can be changed by using Name Mappings, a feature that lets you modify the entity ID and/or the entity type (among other things). This is a very powerful feature since you could say, for instance, "all the entities of type car will see their IDs mapped to a single ID of my choice", which means that all those entities will be stored in the same subdirectory/file:

    /user/<username>/<service>/<service-path>/<unique-entity-id>_<entity-type>/<unique-entity-id>_<entity-type>.txt
    

    This is the closest to what you need, I guess.
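
The effect of mapping all IDs of one type to a single ID can be sketched as follows. This simulates the outcome only; it does not use the Name Mappings configuration format, and the helper names (`apply_id_mapping`, `destination`) and the chosen unique ID `physicaltests` are hypothetical.

```python
def apply_id_mapping(entity_id, entity_type, mapped_type, unique_id):
    """Return unique_id for entities of mapped_type, else the original ID."""
    return unique_id if entity_type == mapped_type else entity_id


def destination(entity_id, entity_type):
    """Name shared by the subfolder and the file: <entity-id>_<entity-type>."""
    return f"{entity_id}_{entity_type}"


entities = [
    ("TEMPORAL1", "PhysicalTest"),
    ("TEMPORAL2", "PhysicalTest"),
    ("TEMPORAL3", "PhysicalTest"),
    ("TEMPORAL4", "PhysicalTest"),
]

# Every PhysicalTest entity ends up with the same destination name.
destinations = {
    destination(apply_id_mapping(eid, etype, "PhysicalTest", "physicaltests"), etype)
    for eid, etype in entities
}
print(destinations)
```

All four entities collapse into one destination, so their data would land in one shared file rather than in four separate files under a common folder.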

    And what about the Grouping Rules you mention? They predate Name Mappings. They allowed us to replace the entire concatenation of entity ID and type (what we called the "destination"); nevertheless, the structure explained above was kept as well:

    /user/<username>/<service>/<service-path>/<destination>/<destination>.txt
    

    Grouping Rules are deprecated in favour of Name Mappings.

    (*) Alternatively, you can omit the <username> level by configuring service_as_namespace = true. This is useful if your FIWARE service matches a valid HDFS user:

    /user/<service>/<service-path>/<entity-id>_<entity-type>/<entity-id>_<entity-type>.txt
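
The difference between the two layouts can be sketched as below. Again, this only illustrates the patterns given in this answer; the function name `hdfs_file_path`, its `service_as_namespace` keyword (named after the configuration parameter), and the sample values are assumptions for the example.

```python
import posixpath


def hdfs_file_path(service, service_path, entity_id, entity_type,
                   username=None, service_as_namespace=False):
    """Compose a path with or without the <username> level (illustrative only)."""
    if not service_as_namespace and username is None:
        raise ValueError("username is required unless service_as_namespace is set")
    destination = f"{entity_id}_{entity_type}"
    # With service_as_namespace, the service takes the place of the user level.
    prefix = ["/user", service] if service_as_namespace else ["/user", username, service]
    return posixpath.join(
        *prefix, service_path.strip("/"), destination, destination + ".txt"
    )


default_path = hdfs_file_path(
    "myservice", "/mypath", "TEMPORAL1", "PhysicalTest", username="myuser")
namespace_path = hdfs_file_path(
    "myservice", "/mypath", "TEMPORAL1", "PhysicalTest", service_as_namespace=True)
print(default_path)
print(namespace_path)
```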