
How do I create a dataset with a file resource on the Humanitarian Data Exchange (HDX) using the HDX Python API library?


I would like to create a dataset on the Humanitarian Data Exchange (HDX) with a single resource, a CSV file, using the HDX Python API. I have looked at the documentation but need a more complete example. How can I create the dataset?


Solution

  • The code below creates a dataset and resource on HDX. It generates a test file to upload and adds it to the resource before creating the dataset or updating any existing dataset with the same name. Running the script requires that you have created a user on HDX.

    To run the code below, you need to supply the command line argument hdx_token, which is your API token. Generate an API token for your user at https://data.humdata.org/user/xxx/api-tokens, replacing xxx with your user name. You can also supply hdx_site, which defaults to "stage". Set hdx_site to "prod" to upload to the main production HDX site (https://data.humdata.org/), or to "stage" or "feature" to test uploading to one of our test servers (https://stage.data-humdata-org.ahconu.org/ or https://feature.data-humdata-org.ahconu.org/).

    Keep in mind that the test servers are refreshed from production only periodically, so newly created users, organisations, API tokens or datasets on production may not yet exist on the test servers.
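
The site names and URLs mentioned above can be collected in one place. The following is only a minimal sketch: the helper names `resolve_hdx_site` and `api_token_page` are hypothetical and are not part of the HDX Python API, which resolves hdx_site itself. The URLs are the ones quoted in this answer.

```python
# Site names and URLs as quoted in the answer text; nothing here comes
# from the HDX library itself.
HDX_SITE_URLS = {
    "prod": "https://data.humdata.org/",
    "stage": "https://stage.data-humdata-org.ahconu.org/",
    "feature": "https://feature.data-humdata-org.ahconu.org/",
}


def resolve_hdx_site(hdx_site=None):
    """Return (site_name, base_url), falling back to "stage" like the script."""
    site = hdx_site or "stage"
    if site not in HDX_SITE_URLS:
        known = ", ".join(sorted(HDX_SITE_URLS))
        raise ValueError(f"Unknown hdx_site {site!r}; expected one of: {known}")
    return site, HDX_SITE_URLS[site]


def api_token_page(username):
    """Build the production page where a user generates an API token."""
    return f"{HDX_SITE_URLS['prod']}user/{username}/api-tokens"
```

Rejecting unknown site names early gives a clearer error than a failed upload would.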

    In the code below, replace every occurrence of "My Org" with your organisation name.

    You need the id associated with your user. Get it from https://data.humdata.org/api/3/action/user_show?id=xxx, where xxx is your user name, and pass it as the parameter to the dataset.set_maintainer call.

    You also need the id of the organisation to which you wish to upload. Get it from https://data.humdata.org/api/3/action/organization_show?id=xxx, where xxx is the organisation name, and pass it as the parameter to the dataset.set_organization call. To upload to the organisation, you must be an editor or admin in that organisation.
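
The two lookups above can also be scripted. This is a rough sketch using only the standard library, assuming the endpoints return the usual CKAN action response of the form `{"success": true, "result": {"id": ...}}`. The helper names `action_url`, `extract_id` and `lookup_id` are made up for illustration. Whether these calls work without authentication is also an assumption; if not, the request would need your API token in a header.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL taken from the two lookup links in the answer text.
API_BASE = "https://data.humdata.org/api/3/action"


def action_url(action, name):
    """Build e.g. .../user_show?id=xxx or .../organization_show?id=xxx."""
    return f"{API_BASE}/{action}?{urlencode({'id': name})}"


def extract_id(payload):
    """Pull the id out of a parsed response (assumed CKAN-style envelope)."""
    if not payload.get("success"):
        raise ValueError(f"Lookup failed: {payload.get('error')}")
    return payload["result"]["id"]


def lookup_id(action, name):
    """Fetch the id over the network; failed HTTP requests raise HTTPError."""
    with urlopen(action_url(action, name), timeout=30) as response:
        return extract_id(json.load(response))
```

For example, `lookup_id("user_show", "xxx")` would give the value for dataset.set_maintainer, and `lookup_id("organization_show", "xxx")` the value for dataset.set_organization.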

    #!/usr/bin/python
    """
    Creates a dataset on HDX.
    
    """
    import argparse
    import csv
    import logging
    from os.path import join
    
    from hdx.data.dataset import Dataset
    from hdx.data.resource import Resource
    from hdx.facades.simple import facade
    from hdx.utilities.dateparse import parse_date
    from hdx.utilities.path import get_temp_dir
    
    logger = logging.getLogger(__name__)
    
    
    def main():
        """Generate dataset and create it in HDX"""
        dataset = Dataset(
            {
                "name": "my-test",
                "title": "My Test Dataset",
                "license_id": "cc-by-igo",
                "methodology": "Other",
                "private": False,
                "dataset_source": "My Org"
            }
        )
        dataset["notes"] = "Long description of dataset goes here!"
        dataset["methodology_other"] = "Describe methodology here!"
        dataset["caveats"] = "Any caveats or comments about the data go here!"
        dataset.set_maintainer(
            "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
        )  # user id
        dataset.set_organization(
            "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
        )  # organisation id
        dataset.set_expected_update_frequency("Every year")
        dataset.set_subnational(False)
        dataset.add_tags(["displacement"])
        dataset.set_reference_period(parse_date("2020-03-05"), parse_date("2021-02-25"))
    
        # Choose ONE of the following ways to set the location:
        dataset.add_country_location("AFG")
        # or
        # dataset.add_country_locations(["AFG"])
        # or
        # dataset.add_other_location("world")
    
        logger.info("Dataset metadata created!")
    
        path = join(get_temp_dir(), "test.csv")
        # newline="" is what the csv module expects; it avoids blank rows on Windows
        with open(path, "w", encoding="UTF8", newline="") as f:
            writer = csv.writer(f)
    
            # write the header
            writer.writerow(["heading1", "heading2", "heading3", "heading4"])
    
            # write the data
            writer.writerow([1, 2, 3, 4])
            writer.writerow([5, 6, 7, 8])
    
        logger.info(f"Test file {path} created!")
    
        resource = Resource(
            {"name": "test file", "description": "description of test file"}
        )
        resource.set_file_type("csv")
        resource.set_file_to_upload(path)
    
        logger.info("Resource metadata created!")
    
        dataset.add_update_resource(resource)
        dataset.create_in_hdx(
            remove_additional_resources=True,
            updated_by_script="My Org Script",
        )
        logger.info("Completed!")
    
    
    if __name__ == "__main__":
        parser = argparse.ArgumentParser(description="My Script")
        parser.add_argument("-ht", "--hdx_token", default=None, help="HDX api token")
        parser.add_argument("-hs", "--hdx_site", default=None, help="HDX site to use")
        args = parser.parse_args()
        hdx_site = args.hdx_site
        if hdx_site is None:
            hdx_site = "stage"
        facade(
            main,
            hdx_key=args.hdx_token,
            hdx_site=hdx_site,
            user_agent="My Org",
        )