Tags: pandas, list, row, multiple-columns

With multiple lists, how do I record in a new column which list each row came from?


I have several lists of English words. How do I add a column to a DataFrame that records which list each word came from, so that as more words are added from new lists I can still keep track of each word's source?

import pandas as pd

list_1 = [['ant', 3], ['bat', 3], ['cat', 3]]

df = pd.DataFrame(list_1, columns=['word', 'length'], dtype=str)

How would I add the list_2 data to this DataFrame and record, in a source column, which list each row came from?

list_2 = [['rose', 4], ['tulip', 5], ['lilac', 5], ['daisy', 5]]

Expected output:

   source   word  length
0  list_1    ant       3
1  list_1    bat       3
2  list_1    cat       3
3  list_2   rose       4
4  list_2  tulip       5
5  list_2  lilac       5
6  list_2  daisy       5

Solution

  • Here is how I would do it: hold the lists in a dictionary keyed by source name, then build the rows with a small comprehension passed to the DataFrame constructor:

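    import pandas as pd

    # Plain word lists; the length column is computed below with len().
    list_1 = ['ant', 'bat', 'cat']
    list_2 = ['rose', 'tulip', 'lilac', 'daisy']

    # Map each source name to its list; add new lists here as they arrive.
    lists = {'list_1': list_1, 'list_2': list_2}

    # One (source, word, length) row per word, in dictionary insertion order.
    df = pd.DataFrame([(name, word, len(word))
                       for name, words in lists.items()
                       for word in words],
                      columns=['source', 'word', 'length'])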

    Output:

       source   word  length
    0  list_1    ant       3
    1  list_1    bat       3
    2  list_1    cat       3
    3  list_2   rose       4
    4  list_2  tulip       5
    5  list_2  lilac       5
    6  list_2  daisy       5
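
  • If you would rather keep the [word, length] pairs exactly as they appear in the question, and add new lists later without rebuilding everything, one possible variant is to tag each list as it is loaded and concatenate. This is only a sketch: the helper name tag_source and the extra list_3 data are made up for illustration and are not part of the question or the answer above.

```python
import pandas as pd

# Data in the question's original shape: [word, length] pairs.
list_1 = [['ant', 3], ['bat', 3], ['cat', 3]]
list_2 = [['rose', 4], ['tulip', 5], ['lilac', 5], ['daisy', 5]]


def tag_source(rows, source):
    """Hypothetical helper: build a frame from [word, length] rows
    and label every row with the name of the list it came from."""
    frame = pd.DataFrame(rows, columns=['word', 'length'])
    # Insert the label as the first column; the scalar is broadcast to all rows.
    frame.insert(0, 'source', source)
    return frame


# Combine the tagged frames; ignore_index renumbers the rows from 0.
df = pd.concat(
    [tag_source(list_1, 'list_1'), tag_source(list_2, 'list_2')],
    ignore_index=True,
)

# Later, when a new list arrives (made-up data), append it the same way.
list_3 = [['iris', 4]]
df = pd.concat([df, tag_source(list_3, 'list_3')], ignore_index=True)

print(df)
```

    Because each list is tagged before it is combined, the source label never depends on row position, so lists can be appended in any order.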