
Use dbt to load JSON files into staging


Is it possible to use dbt to copy data from files into the database?

I've got some files as data sources and I would like to load them into a staging layer before starting to process them.

Something like this command in PostgreSQL:

COPY staging.test (json_content) from 'C:\tmp\46002.json' 

But as far as I can see in the documentation, models require a SELECT statement, and the COPY statement fails when running dbt run.
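
For context, this is roughly the full approach outside dbt, run directly in psql (assuming staging.test is a single json/jsonb column and the file holds one JSON document per line; COPY reads a server-side path, so from a client session psql's \copy would be used instead):

CREATE TABLE IF NOT EXISTS staging.test (json_content jsonb);

-- Each line of the file becomes one row; tabs and backslashes inside the
-- JSON would need escaping for COPY's default text format.
COPY staging.test (json_content) FROM 'C:\tmp\46002.json';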


Solution

  • From raw local files? No.

    Your two options for getting data in directly with dbt are:

    1. Load small, size-limited datasets via dbt seeds, which currently only support CSVs (see the sketch after this list).
    2. Load data from cloud-hosted storage such as S3 buckets via external-tables.
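
    A minimal sketch of option 1, assuming a file has been added as seeds/test_data.csv and loaded with dbt seed (the seed and model names here are hypothetical):

        -- models/staging/stg_test_data.sql
        -- ref() resolves to the table that dbt seed built from seeds/test_data.csv
        select *
        from {{ ref('test_data') }}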

    This is the best resource for explaining why dbt doesn't attempt to support the EL part of the ELT (Extract, Load, Transform) process:

    What is dbt - dbtLabs Blog

    Without getting into tool recommendations, you'll need an additional component in your data stack to handle the extract-and-load step for you. dbt presumes the data has already been loaded into a "pre-processing" or "raw" area of your warehouse, generally following Lewis Gavin's excellent architecture here:

    How to architect the perfect Data Warehouse - Lewis Gavin
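
    In that setup, a dbt model only transforms data that an extraction tool has already landed in the warehouse. A minimal sketch, assuming a raw.test table populated by some external loader and declared as a dbt source in a schema.yml file (the source name, table name, and JSON keys are hypothetical):

        -- models/staging/stg_test.sql
        -- source() points at a table that dbt did not create itself
        select
            json_content ->> 'id'   as record_id,
            json_content ->> 'name' as record_name
        from {{ source('raw', 'test') }}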


    Edit: Additional notes on seed data - comma-separated values are currently the only supported format. There is some discussion in this issue about extending that to other delimiters and escape formats.

    There is no current support for seeding JSON files; that is also under discussion.