Search code examples
kdb

How to convert nested json to data frame with kdb+


I am trying to get the data from cryptostats like below, it gives me back a nested json. I want it to be in a table format. How do I do that?

query:"https://api.cryptostats.community/api/v1/fees/oneDayTotalFees/2023-02-07";
raw:.Q.hg query;
res:.j.k raw;

To get json file, use https://api.cryptostats.community/api/v1/fees/oneDayTotalFees/2023-02-07 To view json code into a table format, use https://jsongrid.com/json-grid Final result would be a kdb+ table which has all the cols from nested json output


Solution

  • They are all dictionaries

    q)distinct type each res[`data]
    ,99h   
    

    But they do not collapse to a table because they do not all have matching keys

    q)distinct key each res[`data]
    `id`bundle`results`metadata`errors
    `id`bundle`results`metadata
    

    Looking at a row where errors is populated we can see it is a dictionary

    q)res[`data;0;`errors]
    oneDayTotalFees| "Error executing oneDayTotalFees on compound: Date incomplete"
    

    You can create a prototype dictionary with a blank errors key in it and join , each piece of data onto it. This will result in uniform dictionaries which will be promoted to a table type 98h

    q)table:(enlist[`errors]!enlist (`$())!()),/:res`data
    q)type table
    98h
    

    Row which already had errors is unaffected:

    q)table 0
    errors  | (,`oneDayTotalFees)!,"Error executing oneDayTotalFees..
    id      | "compound"
    bundle  | 0n
    results | (,`oneDayTotalFees)!,0n
    metadata| `source`icon`name`category`description`feeDescription;..
    

    Row which previously did not have errors now has a valid empty dictionary

    q)table 1
    errors  | (`symbol$())!()
    id      | "swapr-ethereum"
    bundle  | "swapr"
    results | (,`oneDayTotalFees)!,24.78725
    metadata| `category`name`icon`bundle`blockchain`description`feeDescription..
    

    If you want to explore nested objects you can index at depth (see blog post linked above). If you have many sparse keys leaving it like this is efficient for storage:

    q)select tokenSymbol:metadata[::;`tokenSymbol] from table where not ""~/:metadata[::;`tokenSymbol]
    tokenSymbol
    -----------
    "HNY"
    

    If you do wish to explode a nested field you can run similar to:

    q)table:table,'{flip c!flip table[`metadata]@\:(c:distinct raze key each table[`metadata])}[]
    q)meta table
    c               | t f a
    ----------------| -----
    errors          |
    id              | C
    bundle          | C
    results         |
    metadata        |
    source          | C
    icon            | C
    name            | C
    category        | C
    description     | C
    feeDescription  | C
    blockchain      | C
    website         | C
    tokenTicker     | C
    tokenCoingecko  | C
    protocolLaunch  | C
    tokenLaunch     | C
    adapter         | C
    subtitle        | C
    events          | C
    shortName       | C
    protocolShutdown| C
    tokenSymbol     | C
    subcategory     | C
    tokenticker     | C
    tokencoingecko  | C
    

    Care needs to be taken will filling in nulls and keeping consistent types of data in each column. In this dataset the events tag inside metadata is tabular data:

    q)select distinct type each events from table
    events
    ------
    10
    98
    0
    

    This would need to be cleaned similar to:

    q)table:update events:count[i]#enlist ([] date:();description:()) from table where not 98h=type each events