kedro

Data versioning in the "Hello_World" tutorial


I have added "versioned: true" to the "catalog.yml" file of the "hello_world" tutorial:

    example_iris_data:
      type: pandas.CSVDataSet
      filepath: data/01_raw/iris.csv
      versioned: true

Then, when I ran the tutorial with "kedro run", I got this error: "VersionNotFoundError: Did not find any versions for CSVDataSet".

What is the right way to version the "iris.csv" file? Thanks!


Solution

  • Try versioning one of the downstream outputs instead. For example, add this entry to your catalog.yml and run kedro run:

    example_train_x:
      type: pandas.CSVDataSet
      filepath: data/02_intermediate/example_iris_data.csv
      versioned: true
    

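    After the run you will see an example_iris_data.csv directory (not a file) under data/02_intermediate; each run saves a new timestamped copy inside it. The reason example_iris_data gives you an error is that it is the pipeline's starting data. With versioned: true, Kedro looks for versions stored as data/01_raw/iris.csv/<version>/iris.csv, but data/01_raw only holds the plain iris.csv file, so no versions are found. Kedro also cannot create a data/01_raw/iris.csv/ directory, because that name conflicts with the existing iris.csv file.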

    Hope this helps :)
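To make the explanation above concrete, here is a small Python sketch of how a versioned CSV ends up on disk. It assumes Kedro stores each save as <filepath>/<version>/<filename> and loads the newest match. The helpers save_versioned, find_versions and make_version are hypothetical stand-ins, not Kedro's API, and the version string only imitates Kedro's timestamp style.

```python
import glob
import os
import tempfile
from datetime import datetime, timezone

# Simplified model of a versioned dataset layout (an assumption, not Kedro code):
#   <filepath>/<version>/<basename(filepath)>


def make_version() -> str:
    """Hypothetical helper: UTC timestamp with millisecond precision."""
    now = datetime.now(timezone.utc)
    return now.strftime("%Y-%m-%dT%H.%M.%S.") + f"{now.microsecond // 1000:03d}Z"


def save_versioned(filepath: str, content: str) -> str:
    """Hypothetical helper: write content under a new version folder."""
    target = os.path.join(filepath, make_version(), os.path.basename(filepath))
    # Raises OSError if `filepath` already exists as a plain file,
    # because a file cannot also be a directory.
    os.makedirs(os.path.dirname(target))
    with open(target, "w") as f:
        f.write(content)
    return target


def find_versions(filepath: str) -> list:
    """Hypothetical helper: all saved versions, oldest first."""
    pattern = os.path.join(filepath, "*", os.path.basename(filepath))
    return sorted(glob.glob(pattern))


def main() -> None:
    with tempfile.TemporaryDirectory() as root:
        # The tutorial's raw input: a plain file, like data/01_raw/iris.csv
        raw = os.path.join(root, "01_raw", "iris.csv")
        os.makedirs(os.path.dirname(raw))
        with open(raw, "w") as f:
            f.write("sepal_length,species\n5.1,setosa\n")

        # No <version>/ subfolders exist, so a versioned load finds nothing
        print(find_versions(raw))  # []
        try:
            save_versioned(raw, "sepal_length\n5.1\n")
        except OSError as err:
            print("cannot create a version folder:", type(err).__name__)

        # A downstream output has no existing file, so versioning works
        out = os.path.join(root, "02_intermediate", "example_iris_data.csv")
        save_versioned(out, "sepal_length\n5.1\n")
        print(os.path.isdir(out))  # True
        print(len(find_versions(out)))  # 1


if __name__ == "__main__":
    main()
```

The sketch only models the directory layout; in a real project you keep the raw input unversioned (as in the answer) and let Kedro create the version folders for outputs it writes itself.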