python jupyter-notebook data-cleaning data-preprocessing

Get data from an object dtype column in a Python DataFrame


I have a DataFrame with coordinates saved in the following format: {"type":"Point","coordinates":[25.2484759,55.3525189]}. The column has object dtype. Please help me retrieve the coordinates from this column without iteration.

I am a beginner in coding, but I think that running a loop over this and splitting the data would be unnecessary. I hope you can help me.

This is what I thought of: float(trip_data["pickuplocation"][0][31:-13]),float(trip_data["pickuplocation"][0][-12:-3])

I want the coordinates to be saved as an array. Sorry if I sound less technical. Please feel free to ask for more details.


Solution

  • If you want to use dataframes with spatial data, you might look at the GeoPandas package.

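    To address the question: assuming the column holds strings, you were quite close. You can get the coordinates without a loop using:

    coord1 = trip_data["pickuplocation"].str[31:-13].astype(float)
    coord2 = trip_data["pickuplocation"].str[-12:-2].astype(float)

    The .str accessor tells pandas to treat each value as a string so that you can slice it, and astype(float) then converts the resulting series to floats. Note that fixed offsets only work if every value has exactly the same layout and number of digits as the example. For that example the second slice has to end at -2 rather than -3, because only the closing "]}" follows the number.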

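    Edit: A more reliable approach, which does not depend on fixed character positions, might be to parse each string with ast.literal_eval. Unlike eval, it only evaluates Python literals and does not run arbitrary code, although the Python documentation warns that very large or deeply nested input can still exhaust memory or the stack, so be careful with untrusted data: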

    import ast 
    
    coords = trip_data["pickuplocation"].apply(lambda x: ast.literal_eval(x)["coordinates"])
    
    

    coords is then a Series in which each element is a list such as [25.2484759, 55.3525189], so you can index each element as a list.
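
Since the value shown in the question is valid JSON, another option is to parse it with the standard-library json module and then stack the results into a NumPy array, which is the output the question asks for. The following is only a minimal sketch: the two sample rows are made up for illustration, and it assumes that every value in the column is a JSON string with a two-element "coordinates" list.

```python
import json

import numpy as np
import pandas as pd

# Hypothetical sample data in the format shown in the question.
trip_data = pd.DataFrame({
    "pickuplocation": [
        '{"type":"Point","coordinates":[25.2484759,55.3525189]}',
        '{"type":"Point","coordinates":[25.1,55.25]}',
    ]
})

# Parse each JSON string and keep only its "coordinates" list.
coords = trip_data["pickuplocation"].apply(lambda s: json.loads(s)["coordinates"])

# Stack the per-row lists into one float array with shape (n_rows, 2).
coords_array = np.array(coords.tolist(), dtype=float)
```

Note that apply still calls the function once per row internally, so this avoids writing an explicit loop rather than avoiding iteration altogether. Individual columns are then available as coords_array[:, 0] and coords_array[:, 1].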