Hi I am trying to clean up data but having trouble reading json file as separate dataframe column. I have thousands of records like this in a file:
{"hotel_class": 4.0,
"region_id": 60763,
"url": "http://www.tripadvisor.com/Hotel_Review-g60763-d113317-Reviews Casablanca_Hotel_Times_Square-New_York_City_New_York.html",
"phone": "",
"details": null,
"address": {"region": "NY", "street-address": "147 West 43rd Street", "postal-code": "10036", "locality": "New York City"},
"type": "hotel",
"id": 113317,
"name": "Casablanca Hotel Times Square"}
i tried to load it as:
with open('offering.txt') as datafile:
data_json = json.load(datafile)
but it is giving an error i.e
JSONDecodeError: Extra data: line 2 column 1 (char 398)
so i tried doing it with
df=pd.read_json('offering.txt',lines=True)
but if i do it this way, my address column has nested values and i want to separate them in different columns. how to do it?
df['address']
0 {'region': 'NY', 'street-address': '147 West 4...
1 {'region': 'CA', 'street-address': '300 S Dohe...
2 {'region': 'NY', 'street-address': '790 Eighth...
3 {'region': 'NY', 'street-address': '152 West 5...
4 {'region': 'NY', 'street-address': '130 West 4...
Name: address, Length: 4333, dtype: object
Try:
df = pd.read_json("offering.txt", lines=True)
df_out = pd.concat([df, df.pop("address").apply(pd.Series)], axis=1)
print(df_out)
Prints:
hotel_class region_id url phone details type id name region street-address postal-code locality
0 4 60763 http://www.tripadvisor.com/Hotel_Review-g60763-d113317-Reviews Casablanca_Hotel_Times_Square-New_York_City_New_York.html NaN hotel 113317 Casablanca Hotel Times Square NY 147 West 43rd Street 10036 New York City
1 5 60763 http://www.tripadvisor.com/Hotel_Review-g60763-d113317-Reviews Casablanca_Hotel_Times_Square-New_York_City_New_York.html NaN hotel 113317 Casablanca Hotel Times Square CA 147 West 43rd Street 10036 New York City
...