Alabama[edit]
Auburn (Auburn University)[1]
Florence (University of North Alabama)
Jacksonville (Jacksonville State University)[2]
Alaska[edit]
Fairbanks (University of Alaska Fairbanks)[2]
Arizona[edit]
Flagstaff (Northern Arizona University)[6]
Tempe (Arizona State University)
Tucson (University of Arizona)
This is my text. I need to create a DataFrame with one column for the state name and another column for the town name. I know how to remove the university names, but how do I tell pandas that every line ending in [edit] starts a new state?
Expected output DataFrame:
Alabama Auburn
Alabama Florence
Alabama Jacksonville
Alaska Fairbanks
Arizona Flagstaff
Arizona Tempe
Arizona Tucson
I am not sure whether I can use read_table, and if I can, how? I did import everything into a DataFrame, but the state and the city end up in the same column. I also tried with a list, but the problem is the same.
I need something that works like this: if a line contains [edit], then that line gives the state for all the lines that follow it, up to the next [edit] line.
Maybe pandas can do it directly, but you can do it easily with a plain Python loop that remembers the current state.
data = '''Alabama[edit]
Auburn (Auburn University)[1]
Florence (University of North Alabama)
Jacksonville (Jacksonville State University)[2]
Alaska[edit]
Fairbanks (University of Alaska Fairbanks)[2]
Arizona[edit]
Flagstaff (Northern Arizona University)[6]
Tempe (Arizona State University)
Tucson (University of Arizona)'''
# ---
result = []
state = None

for line in data.split('\n'):
    line = line.strip()
    if not line:
        continue  # skip blank lines
    if line.endswith('[edit]'):
        # remember new state
        state = line[:-6]  # without `[edit]`
    else:
        # add state, city to result
        # the city is everything before ' (', so multi-word names survive
        city = line.split(' (', 1)[0]
        result.append([state, city])

# --- display ---

for state, city in result:
    print(state, city)
If you read from a file, iterate over its lines instead:
result = []
state = None

with open('your_file') as f:
    for line in f:
        line = line.strip()  # remove '\n'
        if not line:
            continue  # skip blank lines
        if line.endswith('[edit]'):
            # remember new state
            state = line[:-6]  # without `[edit]`
        else:
            # add state, city to result
            # the city is everything before ' (', so multi-word names survive
            city = line.split(' (', 1)[0]
            result.append([state, city])

# --- display ---

for state, city in result:
    print(state, city)
Now you can use result to create a DataFrame.
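As a rough sketch of that last step, the list of [state, city] pairs can be passed straight to pd.DataFrame. The column names "State" and "City" are my own choice, and the short `lines` sample is only there to keep the snippet self-contained. The second half shows one possible pandas-only variant (mark the [edit] rows, forward-fill the state, drop the state rows). It is an alternative to the loop, not something you need on top of it.

```python
import pandas as pd

# A short sample of the input, one entry per line of text.
lines = [
    "Alabama[edit]",
    "Auburn (Auburn University)[1]",
    "Florence (University of North Alabama)",
    "Alaska[edit]",
    "Fairbanks (University of Alaska Fairbanks)[2]",
]

# --- 1. from the list built by the loop ---

# This is what the loop produces for the sample above.
result = [
    ["Alabama", "Auburn"],
    ["Alabama", "Florence"],
    ["Alaska", "Fairbanks"],
]

# Column names are an assumption; use whatever fits your data.
df = pd.DataFrame(result, columns=["State", "City"])
print(df)

# --- 2. pandas-only variant ---

s = pd.Series(lines)
is_state = s.str.endswith("[edit]")

# Keep the text only on state rows, strip the marker, then carry it downwards.
state = s.where(is_state).str.replace("[edit]", "", regex=False).ffill()

# Drop everything from the first " (" or "[" onwards to get the bare name.
city = s.str.replace(r"\s*[\(\[].*$", "", regex=True)

df2 = pd.DataFrame({"State": state, "City": city})[~is_state].reset_index(drop=True)
print(df2)
```

Both frames should hold the same rows. The loop version is easier to debug line by line, while the ffill version keeps everything inside pandas.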