Search code examples
pythonjsonpandasflatten

pipdeptree json result to csv


The json result of pipdeptree has form:

[
    {
        "package": {
            "key": "ansible-base",
            "package_name": "ansible-base",
            "installed_version": "2.10.8",
            "required_version": "==2.10.8"
        },
        "dependencies": []
    },
    {
        "package": {
            "key": "botocore",
            "package_name": "botocore",
            "installed_version": "1.27.10",
            "required_version": ">=1.12.36,<2.0a.0"
        },
        "dependencies": [
            {
                "key": "boto3",
                "package_name": "boto3",
                "installed_version": "1.24.10"
            },
            {
                "key": "s3transfer",
                "package_name": "s3transfer",
                "installed_version": "0.6.0"
            }
        ]
    }
    
]

I would like to save it as csv with columns: package.key,package.package_name,package.installed_version,package.required_version,dependencies

and in dependencies column to list the keys only (so for the second package, the value for dependencies should be boto3; s3transfer.

I've tried to simply flatten the json with pandas.normalize_json but it doesn't handle the dependencies array correctly. What is the easiest way to do this?


Solution

  • Try:

    import pandas as pd
    
    
    lst = [
        {
            "package": {
                "key": "ansible-base",
                "package_name": "ansible-base",
                "installed_version": "2.10.8",
                "required_version": "==2.10.8",
            },
            "dependencies": [],
        },
        {
            "package": {
                "key": "botocore",
                "package_name": "botocore",
                "installed_version": "1.27.10",
                "required_version": ">=1.12.36,<2.0a.0",
            },
            "dependencies": [
                {
                    "key": "boto3",
                    "package_name": "boto3",
                    "installed_version": "1.24.10",
                },
                {
                    "key": "s3transfer",
                    "package_name": "s3transfer",
                    "installed_version": "0.6.0",
                },
            ],
        },
    ]
    
    out = []
    for d in lst:
        out.append({})
        for k, v in d["package"].items():
            out[-1][f"package.{k}"] = v
        out[-1]["dependencies"] = "; ".join(v["key"] for v in d["dependencies"])
    
    df = pd.DataFrame(out)
    print(df)
    df.to_csv("data.csv", index=False)
    

    Prints:

        package.key package.package_name package.installed_version package.required_version       dependencies
    0  ansible-base         ansible-base                    2.10.8                 ==2.10.8                   
    1      botocore             botocore                   1.27.10        >=1.12.36,<2.0a.0  boto3; s3transfer
    

    and saves data.csv:

    package.key,package.package_name,package.installed_version,package.required_version,dependencies
    ansible-base,ansible-base,2.10.8,==2.10.8,
    botocore,botocore,1.27.10,">=1.12.36,<2.0a.0",boto3; s3transfer