Search code examples
pythonpandasdataframevalueerror

Issue with value error when trying to split a string range of data


I have a dataframe parcels full of addresses which I have imported from an excel spreadsheet. I have imported three columns parcel_id,prop_locat,ZipCode, and HouseNum. I have scraped together (copied) some code from this question: Split (explode) range in dataframe into multiple rows but am receiving a ValueError: invalid literal for int() with base 10: ' ' error.

My code is below with a sample data set with what I am trying to achieve:

import pandas as pd

def split(s):
    ranges = (x.split("-") for x in s.split("-"))
    return [i for r in ranges for i in range(int(r[0]), int(r[-1]) + 1)]

xlsx = 'drive/MyDrive/Parcels.xlsx'
parcels = pd.read_excel(xlsx,usecols=['parcel_id','prop_locat','ZipCode','HouseNum'])
parcels['HouseNum'] = parcels['HouseNum'].astype(str)
parcels['Length'] = parcels['HouseNum'].apply(len)

parcels['full_length'] = parcels['prop_locat'].apply(len)
parcels['address'] = parcels.apply(lambda x: x['prop_locat'][x['Length']:x['full_length']], 1)

parcels.HouseNum = parcels.HouseNum.apply(split)
parcels.set_index(['parcel_id','ZipCode','address',]).Options.apply(pd.Series).stack().reset_index().drop('level_2',1)

print(parcels)

Sample data set:

    parcel_id       prop_locat      HouseNum   ZipCode  Length  full_length address 
0  xxxxxxxxxxxxxx  1234 W 500 S       1234      xxxxx       4           12   W 500 S  
1  xxxxxxxxxxxxxx  123-130 W 700 S    123-130   xxxxx       7           15   W 700 S

The goal is to be able to take the range of values for the address 123-130 and append them to the data frame with the added address. IE 123 W 700 S, 124 W 700 S, 125 W 700 S, n, 130 W 700 S.

Any pointing in the right direction would be greatly appreciated. Thank you!


Solution

  • The error occurs from one of your HouseNum being a blank space instead of what your function is expecting. With some slight changes, invalid cases can be accounted for:

    Given:

      house_num
    0      1234
    1       103
    2   123-130
    3           # a blank space ' ' as given in your error.
    

    Doing:

    def stuff(val):
        val = [int(x) for x in val.split('-') if x.isnumeric()]
        return [x for x in range(val[0], val[-1]+1)] if val else None
    
    df.house_num = df.house_num.apply(stuff)
    df = df.explode('house_num')
    print(df)
    

    Output:

      house_num
    0      1234
    1       103
    2       123
    2       124
    2       125
    2       126
    2       127
    2       128
    2       129
    2       130
    3      None