In the below code, I'm using simple data manipulation to split the columns and remove unnecessary characters.
input_uni_towns = pd.read_fwf("university_towns.txt", sep = " ", header = None)
uni_towns = input_uni_towns.rename(columns={0: "Raw_data"})
uni_towns['Cleaned'] = uni_towns["Raw_data"].replace(regex=True,to_replace=[r'\[[^()]*\]'],value=r'')
uni_towns[["State","University"]] = uni_towns.Cleaned.str.split("(",n=1,expand=True)
uni_towns["University"] = uni_towns["University"].str.rstrip(')')
cleaned_uni_towns = uni_towns[["State","University"]]
After this above step, I want to assign State to records whose above record has None assigned to it. For Ex: Auburn (Auburn University) current State is Auburn, but I want this to be updated to Alabama and similarly for records below Alabama till code encounters next State i.e. Alaska
This is the expected output
You can keep put NaN
in State when University is not None
(i.e. that's a row with a university), then fill the NaN
in State with the most recent non-NaN
value.
df = pd.DataFrame({'Cleaned': ['Alabama', 'Auburn (Auburn University)', 'Alaska']})
df[['State', 'University']] = df.Cleaned.str.split('(', n = 1, expand = True)
df.University = df.University.str.rstrip(')')
df.State = np.where(df.University.map(lambda u: u is None), df.State, np.nan)
df.State = df.State.fillna(method = 'ffill')