I have two directories: one contains images and the other contains masks. Each image in the images folder has a mask with the same filename in the masks folder. I want to create a pandas DataFrame whose first column lists the locations of the images and whose second column lists the locations of the corresponding masks. As a first attempt, I wrote the following code:
    # Generate a list of all the files and their
    def generate_list(images, masks):
        images_df = pd.concat([pd.DataFrame([file],
            columns=['images']) for file in os.listdir(images)], ignore_index=True)
        masks_df = pd.concat([pd.DataFrame([file],
            columns=['masks']) for file in os.listdir(masks)], ignore_index=True)
        df = pd.concat([images_df, masks_df], axis=0, ignore_index=True)
        print(df)
        return df
However, I get the output:
images masks
0 47_1.bmp NaN
1 5_1.bmp NaN
2 26_1.bmp NaN
3 24_1.bmp NaN
4 7_1.bmp NaN
5 19_1.bmp NaN
6 19.bmp NaN
7 18.bmp NaN
8 45_1.bmp NaN
26 4_1.bmp NaN
.. ... ...
131 NaN 14.bmp
132 NaN 50_1.bmp
133 NaN 15_1.bmp
134 NaN 28_1.bmp
135 NaN 9_1.bmp
136 NaN 16.bmp
137 NaN 17_1.bmp
138 NaN 17.bmp
139 NaN 33_1.bmp
Clearly, os.listdir already returns the files in arbitrary order before they reach the concat operation, and the two columns end up stacked on top of each other rather than side by side.
How would I go about doing this?
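    import os
    import pandas as pd

    def generate_list(images, masks):
        # Sort both listings so rows pair up (assumes both folders hold the same filenames).
        images_df = pd.DataFrame({'images': [os.path.join(images, f) for f in sorted(os.listdir(images))]})
        masks_df = pd.DataFrame({'masks': [os.path.join(masks, f) for f in sorted(os.listdir(masks))]})
        # axis=1 places the two columns side by side instead of stacking them.
        df = pd.concat([images_df, masks_df], axis=1)
        return df.sample(frac=1)  # shuffle the rows; each image stays with its mask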
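Here is my new answer. The axis was wrong! With axis=0, pd.concat stacks the two frames vertically, so every row has a value in only one column and NaN in the other. With axis=1, the frames are joined side by side on their row index.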
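Since each mask shares its filename with its image, the mask paths can also be derived from the image filenames directly, which avoids relying on the order in which os.listdir returns the two folders. The following is only a minimal sketch of that idea, not the original code: the function name pair_images_and_masks and the choice to raise an error for a missing mask are assumptions made for illustration.

```python
import os

import pandas as pd


def pair_images_and_masks(images_dir, masks_dir):
    """Return a DataFrame with one row per image and its same-named mask.

    Hypothetical helper: assumes every image has a mask with the same filename.
    """
    image_names = sorted(os.listdir(images_dir))
    mask_names = set(os.listdir(masks_dir))

    # Fail early if an image has no mask with the same filename.
    missing = [name for name in image_names if name not in mask_names]
    if missing:
        raise FileNotFoundError(f"No mask found for: {missing}")

    # Build both columns from the same list of names so rows always match.
    return pd.DataFrame({
        "images": [os.path.join(images_dir, name) for name in image_names],
        "masks": [os.path.join(masks_dir, name) for name in image_names],
    })
```

Calling pair_images_and_masks on the two folder paths returns the paired DataFrame; if a shuffled order is wanted, df.sample(frac=1) can be applied afterwards, since each row already holds a matching image and mask.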