Search code examples
pythonjsonpandasjsonlines

How to turn a dataframe to jsonl with similar index for every line?


I have dataframe that has "input" as the index of all of the rows. Thousands of them.

  df1 = 
  index   item name   item quantity
  input       apple         4
  input       orange        3
  input       lemon         6  

I need to turn it to a jsonl file looking like this. I need it looking like this because of the requirement in Shopify GraphQL Admin API Bulk import.

{ "input": { "item name": "apple", "item quantity": "4"}}
{ "input": { "item name": "orange", "item quantity": "3"}}
{ "input": { "item name": "lemon", "item quantity": "6"}}

I can use df1.to_json(orient='records', lines=True) to convert it into jsonl without the index to look like this

{ "item name": "apple", "item quantity": "4"}}
{ "item name": "orange", "item quantity": "3"}}
{ "item name": "lemon", "item quantity": "6"}}

But i will need to add the key "input" to the front which i dont know how to do it. I am new to json.

I also tried df1.to_json(orient="index") but it gives me an error ValueError: DataFrame index must be unique for orient='index' indicating that every index must be unique.

Any help is appreciated. Cheers


Solution

  • Your required output is:

    { "input": { "item name": "apple", "item quantity": "4"}}
    { "input": { "item name": "orange", "item quantity": "3"}}
    { "input": { "item name": "lemon", "item quantity": "6"}}
    

    orient='index' won't work in this case because index are not unique. The above answer looks great. Your expected output is seperate JSONs.

    This is a workaround, According to your required format incase if you need the same format itself

    import pandas as pd
    import json
    data=[['apple',4],['orange','3'],['lemon',6]]
    df=pd.DataFrame(data,columns=['item name','item quantity'])
    json_as_str=df.to_json(orient="index")
    json_value=json.loads(json_as_str)
    string_formatted=[]
    for key,val in json_value.items():
        string_formatted.append("{'input':%s}" %val)
    with open("file_name_here.jsonl","a") as fh:
        for i in string_formatted:
            i=i.replace("'",'"')
            fh.write(f"{i}\n")
    

    Output:

    {"input":{"item name": "apple", "item quantity": 4}}
    {"input":{"item name": "orange", "item quantity": 3}}
    {"input":{"item name": "lemon", "item quantity": 6}}