Tags: python, python-3.x, pandas, sklearn-pandas

read_table in pandas: how to get input from a text file into a DataFrame


    Alabama[edit]
    Auburn (Auburn University)[1]
    Florence (University of North Alabama)
    Jacksonville (Jacksonville State University)[2]
    Alaska[edit]
    Fairbanks (University of Alaska Fairbanks)[2]
    Arizona[edit]
    Flagstaff (Northern Arizona University)[6]
    Tempe (Arizona State University)
    Tucson (University of Arizona)

This is my text. I need to create a DataFrame with one column for the state name and another column for the town name. I know how to remove the university names, but how do I tell pandas that every [edit] line starts a new state?

Expected output DataFrame:

    State    Town
    Alabama  Auburn
    Alabama  Florence
    Alabama  Jacksonville
    Alaska   Fairbanks
    Arizona  Flagstaff
    Arizona  Tempe
    Arizona  Tucson

I am not sure whether I can use read_table, and if I can, how. I did import everything into a DataFrame, but the state and the city end up in the same column. I also tried with a list, but the problem is the same.

I need something that works like this: if a line contains [edit], then that line's value (without [edit]) is the state for every line after it, up to the next [edit] line.
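In pandas itself, that rule can be sketched with read_table plus a forward fill. This is a minimal sketch, assuming pandas 1.x or later and that the text contains no tab characters (read_table splits on tabs by default, so each whole line lands in one column). The column names raw, State and Town are our own choices, and io.StringIO stands in for a real file path.

```python
import io

import pandas as pd

# Sample text from the question; a file path could be passed instead of StringIO.
data = """Alabama[edit]
Auburn (Auburn University)[1]
Florence (University of North Alabama)
Jacksonville (Jacksonville State University)[2]
Alaska[edit]
Fairbanks (University of Alaska Fairbanks)[2]
Arizona[edit]
Flagstaff (Northern Arizona University)[6]
Tempe (Arizona State University)
Tucson (University of Arizona)"""

# Assumption: no tabs in the text, so every line becomes one value in 'raw'.
raw = pd.read_table(io.StringIO(data), header=None, names=['raw'])['raw']

# Lines ending in [edit] are states: capture the name, NaN on other lines.
state = raw.str.extract(r'^(.*)\[edit\]$', expand=False)
is_state = state.notna()

# Forward fill: each town line inherits the nearest state line above it.
state = state.ffill()

# Town = text before the first " (" (drops the university name and footnote).
town = raw.str.replace(r'\s*\(.*$', '', regex=True)

# Keep only the town lines and renumber the rows from 0.
df = pd.DataFrame({'State': state[~is_state], 'Town': town[~is_state]})
df = df.reset_index(drop=True)
print(df)
```

The forward fill is what implements "every line after [edit] belongs to that state" without an explicit loop.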


Solution

  • Maybe pandas can do it, but you can also do it easily in plain Python.

    data = '''Alabama[edit]
    Auburn (Auburn University)[1]
    Florence (University of North Alabama)
    Jacksonville (Jacksonville State University)[2]
    Alaska[edit]
    Fairbanks (University of Alaska Fairbanks)[2]
    Arizona[edit]
    Flagstaff (Northern Arizona University)[6]
    Tempe (Arizona State University)
    Tucson (University of Arizona)'''
    
    # ---
    
    result = []
    
    state = None
    
    for line in data.splitlines():
    
        if line.endswith('[edit]'):
            # remember the new state
            state = line[:-len('[edit]')]  # without `[edit]`
        else:
            # city = text before " (", so multi-word names like "New York" survive
            city = line.split(' (', 1)[0]
            result.append([state, city])
    
    # --- display ---
    
    for state, city in result:
        print(state, city)
    

    If you read from a file instead:

    result = []
    
    state = None
    
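    with open('your_file') as f:
        for line in f:
            line = line.strip()  # remove '\n' and surrounding spaces
            if not line:
                continue  # skip blank lines, e.g. a trailing empty line
    
            if line.endswith('[edit]'):
                # remember the new state
                state = line[:-len('[edit]')]  # without `[edit]`
            else:
                # city = text before " (", so multi-word names survive
                city = line.split(' (', 1)[0]
                result.append([state, city])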
    
    # --- display ---
    
    for state, city in result:
        print(state, city)
    

    Now you can use result to create a DataFrame.
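A minimal sketch of that last step: pd.DataFrame accepts a list of row lists directly. The result list below is a hypothetical stand-in holding the pairs the loop collects from the sample text, and the column names State and Town are our own choice.

```python
import pandas as pd

# Hypothetical stand-in for `result`: the [state, city] pairs from the loop.
result = [
    ['Alabama', 'Auburn'],
    ['Alabama', 'Florence'],
    ['Alabama', 'Jacksonville'],
    ['Alaska', 'Fairbanks'],
    ['Arizona', 'Flagstaff'],
    ['Arizona', 'Tempe'],
    ['Arizona', 'Tucson'],
]

# Each inner list becomes one row; `columns` names the two fields.
df = pd.DataFrame(result, columns=['State', 'Town'])
print(df)
```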