
I'm trying to sort the data and categorize them as masks and images


df_imgs = df1[~df1['path'].str.contains("mask")]  # all image paths that are not masks
df_masks = df1[df1['path'].str.contains("mask")]  # all mask paths

BASE_LEN = 98
END_IMG_LEN = 4
END_MASK_LEN = 9

IMG_SIZE = 512

# let's sort the data
imgs = sorted(df_imgs["path"].values, key = lambda x: int(x[BASE_LEN:-END_IMG_LEN]))
masks = sorted(df_masks["path"].values, key=lambda x: int(x[BASE_LEN:-END_MASK_LEN]))

print("Sorted image paths:")
for img_path in imgs:
    print(img_path)

print("\nSorted mask paths:")
for mask_path in masks:
    print(mask_path)

# let's check the sorting (randint's upper bound is exclusive)
idx = np.random.randint(0, len(imgs))
print(f"Path to the images: {imgs[idx]}, \nPath to the masks: {masks[idx]}")

Error:

ValueError                                Traceback (most recent call last)
in <cell line: 11>()
      9
     10 # let's sort the data
---> 11 imgs = sorted(df_imgs["path"].values, key = lambda x: int(x[BASE_LEN:-END_IMG_LEN]))
     12 masks = sorted(df_masks["path"].values, key=lambda x: int(x[BASE_LEN:-END_MASK_LEN]))
     13

in <lambda>(x)
      9
     10 # let's sort the data
---> 11 imgs = sorted(df_imgs["path"].values, key = lambda x: int(x[BASE_LEN:-END_IMG_LEN]))
     12 masks = sorted(df_masks["path"].values, key=lambda x: int(x[BASE_LEN:-END_MASK_LEN]))
     13

ValueError: invalid literal for int() with base 10: ''

Can someone help me out here and tell me why I'm facing this issue? I want to categorize the masks and images so that, at the end, I can create a dataframe that contains the path of each image and its mask.


Solution

  • The error (I assume) comes from the int(x[BASE_LEN:-END_IMG_LEN]) call inside the lambda, which is being asked to convert a string that is not a valid integer.

    The message ends with '', so for at least one path the slice x[BASE_LEN:-END_IMG_LEN] is an empty string. That happens when the path is no longer than BASE_LEN + END_IMG_LEN characters, so nothing is left between the two offsets.

    BASE_LEN, END_IMG_LEN, and END_MASK_LEN are fixed constants, so they only work if every path has the same prefix and suffix lengths. They might not match the actual position of the number in your strings.

    I don't know the exact format of the paths in the 'path' column, so it's hard to provide a specific solution. Try to print the value of the substring inside the lambda function that you're attempting to convert to an integer. This will allow you to see exactly what is causing the error.

    For example:

    def key_function(x):
        value = x[BASE_LEN:-END_IMG_LEN]
        print("Attempting to convert:", repr(value))  # repr() shows an empty string as ''
        return int(value)
    
    imgs = sorted(df_imgs["path"].values, key=key_function)
    

    The last value printed before the exception is the one that cannot be converted, which should show why the slice is not working. Feel free to share an example of the paths if you'd like more specific guidance.
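
If the fixed offsets turn out to be the problem, one possible alternative is to pull the number out of the file name with a regular expression instead of slicing by position. The sketch below is only an illustration: the paths and the numeric_key helper are made up, and it assumes that each file name ends with an index (for example scan_10.tif and scan_10_mask.tif) and that the extension contains no digits. Adjust the pattern to whatever your real paths look like.

```python
import os
import re

# Hypothetical paths; the real format of the 'path' column is not known.
paths = [
    "data/patient_A/scan_10.tif",
    "data/patient_A/scan_2.tif",
    "data/patient_A/scan_10_mask.tif",
    "data/patient_A/scan_2_mask.tif",
]

# With fixed offsets, a path shorter than the offsets yields an empty slice,
# and int('') raises the ValueError from the question.
BASE_LEN = 98
END_IMG_LEN = 4
empty_slice = paths[0][BASE_LEN:-END_IMG_LEN]
print(repr(empty_slice))


def numeric_key(path):
    """Return the last run of digits in the file name as an int."""
    name = os.path.basename(path)
    numbers = re.findall(r"\d+", name)
    if not numbers:
        # Fail with a message that names the offending path.
        raise ValueError(f"no number found in file name: {path!r}")
    return int(numbers[-1])


# Split into images and masks, then sort each list numerically.
imgs = sorted((p for p in paths if "mask" not in p), key=numeric_key)
masks = sorted((p for p in paths if "mask" in p), key=numeric_key)

for img_path, mask_path in zip(imgs, masks):
    print(img_path, "<->", mask_path)
```

With your dataframe, the same key could be passed as sorted(df_imgs["path"].values, key=numeric_key). Note that this sorts by the file index only, so if several folders reuse the same numbers you would also need the folder name in the key to keep images and masks aligned.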