Search code examples
pythonjsonpython-3.xparsingtext

How to read a text file and parse it to JSON format


I have a text file with the following format.

Order|AA|BB|CC|DD
2|status1|Cdd.int|true|false
12|status2|Cdd.String|true|false
1|status3|Cdd.Float|false|true
  1. I would like to read this text file

  2. I would like to append also metadata with an empty value

  3. I am only interested in Order, AA, BB, and CC with sort by order and then parse it into JSON format as follows.

The expected output looks like the following.

{
 "fields": [
 {
    "metadata": {},
    "name" : "status3",
    "type" : "Float",
    "nullable" : false
},
{
    "metadata": {},
    "name" : "status1",
    "type" : "int",
    "nullable" : true
},
{
    "metadata": {},
    "name" : "status2",
    "type" : "String",
    "nullable" : true
}
],
'type':'struct'
}

Can anyone help with this?


Solution

  • Assuming we have the data stored in a txt file file.txt as follows:

    Order|AA|BB|CC|DD
    2|status1|Cdd.int|true|false
    12|status2|Cdd.String|true|false
    1|status3|Cdd.Float|false|true
    

    The following code does what you need (explained in the comments of the code itself):

    import pandas as pd
    import json
    
    #read csv
    df = pd.read_csv("file.txt", sep = "|")
    
    headers = {"AA": "name", 
               "BB": "type",
               "CC": "nullable"
    }
    
    #Drop columns which are not in headers dict
    df.drop([c for c in df.columns if c not in headers.keys()], inplace=True, axis=1)
    
    #Rename columns based on headers dict
    df.rename(columns = headers, inplace = True)
    
    #Format columns
    df["type"] = df["type"].str.split(".").str[1].str.lower()
    
    #Build your final dict
    output = {"fields": [], "type": "struct"}
    for n, row in df.iterrows():
        data_dict = {"metadata": {}}
        data_dict.update(row.to_dict())
        output["fields"].append(data_dict)
    
    #Save json
    with open("output.json", "w") as f:
        json.dump(output, f, indent = 4)
    

    The output json (output.json) is as follows:

    {
        "fields": [
            {
                "metadata": {},
                "name": "status1",
                "type": "int",
                "nullable": true
            },
            {
                "metadata": {},
                "name": "status2",
                "type": "string",
                "nullable": true
            },
            {
                "metadata": {},
                "name": "status3",
                "type": "float",
                "nullable": false
            }
        ],
        "type": "struct"
    }
    

    Hope it helps!