Search code examples
pythonjsonobjecttransformdtype

split JSON/list of dictionaries in the column in dataframe to new rows in python


I am quite new to Python, I tried to find an answer but nothing I tried seems to be working. And the most of the answers are provided when the whole data in JSON format

Through PYODBC I use the following code to retrieve data

formula = """select  id, type, custbody_attachment_1 from transaction  """
lineitem = pd.read_sql_query(formula, cnxn)

It gives me something like the following

+-------------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Internal_ID | Type | Formula_Text                                                                                                                                                                                           |
+-------------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 2895531     | Bill |                                                                                                                                                                                                        |
+-------------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 3492009     | Bill | [{"FL":"https://.app.netsuite.com/core/media/media.nl?id=someLinkToTheFile0","NM":"someFileName0"}]                                                                                                    |
+-------------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 3529162     | Bill | [{"FL":"5https://.app.netsuite.com/core/media/media.nl?id=someLinkToTheFile1","NM":"someFileName1"},{"FL":"https://.app.netsuite.com/core/media/media.nl?id=someLinkToTheFile2","NM":"someFileName2"}] |
+-------------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I need the output like this. (There might be more than 2 links in the cell.)

+-------------+------+---------------------------------------------------------------------+---------------+
| Internal_ID | Type | FL                                                                  | NM            |
+-------------+------+---------------------------------------------------------------------+---------------+
| 2895531     | Bill |                                                                     |               |
+-------------+------+---------------------------------------------------------------------+---------------+
| 3492009     | Bill | https://.app.netsuite.com/core/media/media.nl?id=someLinkToTheFile0 | someFileName0 |
+-------------+------+---------------------------------------------------------------------+---------------+
| 3529162     | Bill | https://.app.netsuite.com/core/media/media.nl?id=someLinkToTheFile1 | someFileName1 |
+-------------+------+---------------------------------------------------------------------+---------------+
| 3529162     | Bill | https://.app.netsuite.com/core/media/media.nl?id=someLinkToTheFile2 | someFileName2 |
+-------------+------+---------------------------------------------------------------------+---------------+

I tried to play with JSON but there were one problem after another(because it seemed like JSON data to me). In the end I run

print(lineitem['custbody_attachment_1'])

and got the following in Python console

999    [{"FL":"https://4811553.app.netsuite.com/core/...
Name: custbody_attachment_1, Length: 1000, dtype: object

So, I have no idea how to transform this so I could create new rows


Solution

  • df = df.explode('Formula_Text')
    df = pd.concat([df.drop(['Formula_Text'], axis=1), df['Formula_Text'].apply(pd.Series)], axis=1)
    print(df)