Tags: python, pandas, geopandas

Creating paired nested list from a list in a column of pandas dataframe where the end element of first pair should be the start element of next


I have data in a GeoDataFrame (the original post showed it in an image). It contains a column named neighbourhood_list, which holds the list of all the neighbourhood codes along a route. I want to create a nested list of pairs in which the end element of each pair is the start element of the next, because I want to generate a directed OD network (each pair becomes an edge), so order matters here.

To make it a bit clearer, here is some code.

Here is one record from the dataframe, on which I tried a bodged approach to get the desired result:

codes = [15, 30, 9, 7, 8]  # renamed from `list`, which shadows the built-in
new_list = []
for i in range(len(codes) - 1):
    new_list.append(codes[i])
    new_list.append(codes[i + 1])

The code above gives a combined flat list, which I then broke into the pairs I needed:

chunks = [new_list[x:x+2] for x in range(0, len(new_list), 2)]
chunks

The actual data is [15,30,9,7,8] and the desired output is [[15, 30], [30, 9], [9, 7], [7, 8]]

I worked out the chunking code above from the answer to: Split a python list into other "sublists" i.e smaller lists

However, the real issue now is how to apply it in pandas.

So far I have been trying to adapt something mentioned here: https://chrisalbon.com/python/data_wrangling/pandas_list_comprehension/

Here is some incomplete code. I am not sure it is correct, but I thought that if I could somehow get the length of the list in each row of the neighbourhood_list column, I might be able to do it:

for row in df['neighbourhood_list']:
    for i in range(...):  # ?? how to get range(len) of each row ??
        new.append(row[i])
        new.append(row[i+1])

Note: as a layman, I don't know how nested loops or lambda functions work, or whether there is an existing pandas function for this task. Another idea, also mentioned on Stack Overflow, is something like the line below, but I still don't know how to get the length of the list in each row, even if I first write a function and then apply it to my column.

df[["YourColumns"]].apply(someFunction)

Apologies in advance if the question needs more clarification (I can give more details of the problem if needed).

Thanks so much.


Solution

  • My best guess is that you are trying to create a column containing a list of ordered pairs from a column of lists. If that is the case, something like this should work:

    Edit

    From what you described, each item in your 'neighbourhood_list' column is not a list yet, but a string. Add the first line below to turn the column items into lists, then run the pairs apply. Note that the split leaves each code as a string rather than an integer.

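    # Split each comma-separated string into a list of codes (strip stray spaces)
    df['neighbourhood_list'] = df['neighbourhood_list'].apply(lambda row: [code.strip() for code in row.split(',')])
    # Pair each element with its successor: [[a, b], [b, c], ...]
    df['pairs'] = df['neighbourhood_list'].apply(lambda row: [[row[i], row[i+1]] for i in range(len(row)-1)])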
    

    If I have misunderstood, please let me know and I'll try and adjust accordingly.
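For reference, here is a minimal self-contained sketch of the same idea that can be run without the original data. It assumes the column already holds Python lists (that is, any string-to-list conversion has been done). The sample frame, the `route_id` column, the `consecutive_pairs` helper and the `edges` table are all hypothetical names made up for illustration; they are not part of the question's data. It uses `zip` on the list and its one-step shift instead of indexing, which should give the same pairs.

```python
import pandas as pd

# Hypothetical sample data standing in for the real GeoDataFrame.
df = pd.DataFrame({
    "route_id": [1, 2],  # hypothetical identifier column
    "neighbourhood_list": [[15, 30, 9, 7, 8], [4, 2]],
})


def consecutive_pairs(codes):
    """Return [[a, b], [b, c], ...] for consecutive elements, keeping order."""
    # zip stops at the shorter input, so a list with fewer than two items gives [].
    return [[a, b] for a, b in zip(codes, codes[1:])]


# One list of ordered pairs per row.
df["pairs"] = df["neighbourhood_list"].apply(consecutive_pairs)

# Optional: flatten the pairs into one row per directed edge (hypothetical layout).
edges = pd.DataFrame(
    [(rid, a, b) for rid, pairs in zip(df["route_id"], df["pairs"]) for a, b in pairs],
    columns=["route_id", "origin", "destination"],
)

print(df["pairs"].tolist())
print(edges)
```

A flat table with one edge per row is only one possible layout for the directed network; whether it suits the next step depends on how the edges are consumed.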