continuous-integration singer-io meltano

How to provide a relative start date for Meltano CI/CD pipelines on Singer


When we are running a Meltano build/test cycle (such as in a CI/CD pipeline), we want our Singer pipelines to run as follows:

  1. Don't use pre-captured state bookmarks, which can leave a stream without a meaningful run. (For instance, if there are zero new records, or not enough new records to trigger a representative test.)

  2. Don't require developers to keep pushing forward a hardcoded start_date. (What starts out as a "fast" test of a month of data eventually becomes a much longer-running test covering multiple months.)

For any tap name tap-mysource, we should be able to set $TAP_MYSOURCE_START_DATE to provide a default start_date config value. What's a good way to provide a default relative start time for CI builds - for instance, a rolling 21 day window?

I think most use cases are probably running on GitHub Actions, but we also use GitLab CI.


Solution

  • As of now, there isn't an expression language to perform today()-n and provide relative start dates in that manner. However, you can export an environment variable holding the relative date prior to execution, and Meltano will pass it as a dynamic input to the tap by way of the naming convention <PLUGIN_NAME>_<SETTING_NAME>.

    The date arithmetic syntax differs between BSD date (macOS) and GNU date (most Linux distributions), so the command needs a slight adjustment depending on your OS:

    On Mac:

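    N_DAYS=1
    # BSD date: -v-<N>d subtracts N days. Export so the meltano child process sees it.
    export TAP_MYSOURCE_START_DATE=$(date -v-"${N_DAYS}"d "+%Y-%m-%d")
    echo "Using dynamic start date of today-$N_DAYS: $TAP_MYSOURCE_START_DATE"
    meltano elt tap-mysource target-mydest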
    

    On Ubuntu:

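    N_DAYS=1
    # GNU date: -d accepts a relative expression. Export so the meltano child process sees it.
    export TAP_MYSOURCE_START_DATE=$(date -d "$N_DAYS days ago" "+%Y-%m-%d")
    echo "Using dynamic start date of today-$N_DAYS: $TAP_MYSOURCE_START_DATE"
    meltano elt tap-mysource target-mydest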
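
The two variants above can probably be folded into one script that works on both a Mac laptop and a Linux CI runner. The following is a minimal sketch, assuming either GNU or BSD `date` is on the PATH (other implementations, such as BusyBox, are not handled). `relative_date` is a hypothetical helper name, not part of Meltano, and the 21-day default simply mirrors the rolling window from the question.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the date N days before today as YYYY-MM-DD.
relative_date() {
  local n_days="$1"
  if date -d "1 day ago" "+%Y-%m-%d" >/dev/null 2>&1; then
    # GNU date (most Linux CI runners) understands -d with a relative expression.
    date -d "${n_days} days ago" "+%Y-%m-%d"
  else
    # Assumed fallback: BSD date (macOS), which uses -v-<N>d instead.
    date -v-"${n_days}"d "+%Y-%m-%d"
  fi
}

# Default to a rolling 21-day window unless N_DAYS is already set.
N_DAYS="${N_DAYS:-21}"

# Export so that a child process such as meltano can read the value.
export TAP_MYSOURCE_START_DATE="$(relative_date "$N_DAYS")"
echo "Using dynamic start date of today-$N_DAYS: $TAP_MYSOURCE_START_DATE"
```

Running `meltano elt tap-mysource target-mydest` in the same shell afterwards picks up the exported value. On GitHub Actions, a variable exported in one step is not visible in later steps; appending a `TAP_MYSOURCE_START_DATE=<value>` line to the file named by `$GITHUB_ENV` makes it available to them.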
    

    Ref: