Search code examples
pythonpandasdataframedata-sciencedata-analysis

Python - take out the data inside cell of dataframe to another cells


This is the data in single cell of dataframe with 14 columns. Cell is the element of column. There are 45k+ this kind of cells, to do it manually is a hell.

one cell data

I'd like to do with this cell 3 things:

  1. move text part with address, state, zip - to another column;
  2. delete the hooks () of cell;
  3. separate for 2 columns longitude and latitude.

How it's possible to do?


Solution

  • Here's a simple, working example with 2 data points:

    text1 = """30881 EKLUTNA LAKE RD
    CHUGIAK, AK 99567
    (61.4478, -149.3136)"""
    
    text2 = """30882 FAKE STR
    CHUGIAK, AK 98817
    (43.4478, -119.3136)"""
    
    d = {'col1': [text1, text2]}
    
    df = pd.DataFrame(data=d)
    
    def fix(row):
      #We split the text by newline
      address, cp, latlong =  row.col1.split('\n')
    
      #We get the latitude and longitude by splitting by a comma
      latlong_vec = latlong[1:-1].split(',')
    
      #This part isn't really necessary but we create the variables for claity
      lat = float(latlong_vec[0])
      long = float(latlong_vec[1])
    
      return pd.Series([address + ". " + cp, lat, long])
    
    
    df[['full address', 'lat', 'long']] = df.apply(fix, axis = 1)
    

    Output of the 3 new columns:

    df['full address']
    0    30881 EKLUTNA LAKE RD. CHUGIAK, AK 99567
    1           30882 FAKE STR. CHUGIAK, AK 98817
    
    df['lat']
    
    0    61.4478
    1    43.4478
    Name: lat, dtype: float64
    
    df['long']
    
    0   -149.3136
    1   -119.3136
    Name: long, dtype: float64
    

    Name: full address, dtype: object