Tags: python, pandas, contour, scikit-image, enumerate

How to store scikit-image contours in a Pandas DataFrame with a single vertex and a contour number per row


I am using a modified version of this scikit-image demo to create contours from the edges resulting from watershed segmentation of an image. In this result, each level has one contour only, made of row-column index pairs.

It is easy to display contours as in the demo. But what I'd like to do is use the enumerate loop to append each vertex of each contour to a Pandas DataFrame, separating the row and column index, and then add a level/contour index in a separate column.

To illustrate, I will start with a small toy example where each contour has only one index. With this code:

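import numpy as np
import pandas as pd

np.random.seed(131)
test = np.random.randint(50, size=5)
n_list = []
t_list = []
for n, t in enumerate(test):
    n_list.append(n)
    t_list.append(t)
# the two columns need distinct names; a repeated dict key keeps only the last value
contours_df = pd.DataFrame({'contour': n_list, 'points': t_list})
contours_df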

I get this DataFrame:

[image: test]

A more representative example would be something like this:

np.random.seed(131)
test1 = np.random.randint(50, size=(5, 2,  2))
n_list1 = []
t_list1 = []
for n1, t1 in enumerate(test1):
    n_list1.append(n1)
    t_list1.append(t1)
contours_df1 = pd.DataFrame({'contour': n_list1, 'points': t_list1})
contours_df1

which gives me this DataFrame:

[image]

I can export this to an Excel file using XlsxWriter, like this:

# using XlsxWriter documentation example
writer = pd.ExcelWriter('contours_df1.xlsx', engine='xlsxwriter')
contours_df1.to_excel(writer, sheet_name='Sheet1')
writer.close()  # writer.save() was removed in pandas 2.0; close() also writes the file

To get this:

[image: Excel 1]

But what I would really like is to split the contours so as to get something like this as a final Excel output:

[image: Excel 2]


Solution

  • I would use pandas concatenation: build one small DataFrame per contour and join them with pd.concat. For reasonably sized data, it's a matter of taste whether you do this or build up a list per column (though the list approach needs a second, nested loop to allow for arbitrarily sized contours). For larger data, I think concatenation should be faster because it makes use of NumPy/pandas vectorization where possible.

    Here's an example:

    import numpy as np
    import pandas as pd
    
    # five random contours, each with between 3 and 9 (row, column) vertices
    contours = [np.random.random((i, 2))
                for i in np.random.randint(3, 10, size=5)]
    
    dataframes = []
    for contour_id, contour in enumerate(contours):
        current_dataframe = pd.DataFrame(contour, columns=['row', 'column'])
        current_dataframe['contour'] = contour_id
        dataframes.append(current_dataframe)
    contours_data = pd.concat(dataframes)
    
    contours_data.to_excel('filename.xlsx', sheet_name='Sheet1')
    

    Side note: you don't need to create an ExcelWriter if you are only writing a single sheet.
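
    For comparison, here is a minimal sketch of the list-per-column alternative mentioned above, applied to the test1 array from the question. The names contour_ids, rows, columns and contours_long are only illustrative, and the nested loop is what lets each contour have its own number of vertices:

```python
import numpy as np
import pandas as pd

# Toy data from the question: 5 contours, 2 vertices each, as (row, column) pairs
np.random.seed(131)
test1 = np.random.randint(50, size=(5, 2, 2))

# One list per output column (illustrative names, not from the original answer)
contour_ids, rows, columns = [], [], []
for contour_id, contour in enumerate(test1):
    # inner loop: one entry per vertex of the current contour
    for row, column in contour:
        contour_ids.append(contour_id)
        rows.append(row)
        columns.append(column)

# One vertex per DataFrame row, with the contour number alongside
contours_long = pd.DataFrame({'contour': contour_ids,
                              'row': rows,
                              'column': columns})
```

    This should give one row per vertex with a fresh 0..n-1 index, which can then be written out with to_excel as above.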