I'm trying to use SMOGN to balance my data, but it raises TypeError or UFuncTypeError. How can I solve this?


I have image data (arrays) with labels, loaded from folders. The data is imbalanced, and I'm trying to balance it with SMOGN after building a DataFrame.

[Image: the data histogram]

Here's the code:

    import os
    import cv2 as cv
    import numpy as np
    import pandas as pd
    import smogn

    r_labels = []
    im = []
    for filename in os.listdir(folder):
        img = cv.imread(os.path.join(folder, filename))
        if img is not None:
            # the flowering time is the third "_"-separated field of the file name
            aio_plant = filename.split("_")
            flowering_time = aio_plant[2].split(".")[0]
            im.append(np.asarray(img).astype(np.float32))
            r_labels.append(np.uint8(flowering_time))
    df = pd.DataFrame({'images': im, 'labels': r_labels})
    sm = smogn.smoter(
        data = df,  ## pandas dataframe
        y = 'labels'  ## string ('header name')
        )

This raises an error: TypeError: unhashable type: 'numpy.ndarray'. I tried keeping the label as a string instead, like this:

            r_labels.append(flowering_time)

and that raises: UFuncTypeError: ufunc 'subtract' did not contain a loop with signature matching types (dtype('<U2'), dtype('<U2')) -> None

The data looks like this:

                                                 images  labels
0     [[[0.0, 0.0, 255.0], [0.0, 255.0, 0.0], [0.0, ...      86
1     [[[255.0, 0.0, 0.0], [255.0, 0.0, 0.0], [0.0, ...      53
2     [[[255.0, 0.0, 0.0], [0.0, 255.0, 0.0], [255.0...      46
3     [[[255.0, 0.0, 0.0], [0.0, 255.0, 0.0], [0.0, ...      44
4     [[[255.0, 0.0, 0.0], [255.0, 0.0, 0.0], [255.0...      63
...                                                 ...     ...
998   [[[0.0, 0.0, 255.0], [0.0, 255.0, 0.0], [255.0...      86
999   [[[255.0, 0.0, 0.0], [0.0, 255.0, 0.0], [255.0...     215
1000  [[[0.0, 0.0, 255.0], [0.0, 0.0, 255.0], [0.0, ...      92
1001  [[[255.0, 0.0, 0.0], [0.0, 255.0, 0.0], [255.0...      61
1002  [[[255.0, 0.0, 0.0], [0.0, 255.0, 0.0], [255.0...     183
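
Both errors can be reproduced with NumPy alone, which may help show where they come from. The sketch below is only an illustration: the tiny array and the two labels are made-up stand-ins for the real data, and the idea that SMOGN hashes cell values and subtracts labels internally is an assumption inferred from the error messages, not something checked against its source.

```python
import numpy as np

# Stand-in for one image: a tiny 2x2 RGB array (real images are much larger).
img = np.zeros((2, 2, 3), dtype=np.float32)

# 1) ndarrays are mutable and unhashable, so any step that hashes cell
#    values (for example counting unique values) fails on an array column.
try:
    hash(img)
    hash_error = None
except TypeError as exc:
    hash_error = str(exc)

# 2) Labels taken from file names are strings; NumPy defines no subtraction
#    for string dtypes, so arithmetic on them fails.
str_labels = np.array(["86", "53"])
try:
    str_labels - str_labels
    sub_error = None
except TypeError as exc:  # UFuncTypeError is a subclass of TypeError
    sub_error = type(exc).__name__

# Casting the strings to integers makes the arithmetic well defined.
int_labels = str_labels.astype(int)
diff = int_labels - int_labels[::-1]
```

So the labels need a numeric dtype, and the image column needs some hashable representation before the DataFrame is passed on.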

Solution

  • I solved the problem by converting the labels to integers and each entry of the images column to the string representation of its NumPy array, then converting the images back to arrays after running SMOGN.

    # Convert labels to plain integers
    df['labels'] = df['labels'].astype(int)
    # Convert each image to the string representation of its flattened array;
    # threshold=x.size keeps array2string from abbreviating long arrays with "..."
    df['images'] = df['images'].apply(
        lambda x: np.array2string(x.flatten(), separator=',', threshold=x.size))

    sm = smogn.smoter(
        data = df,  ## pandas dataframe
        y = 'labels',  ## string ('header name')
        )
    # Parse the strings back into (flat) NumPy arrays and restore integer labels
    sm['images'] = sm['images'].apply(lambda x: np.fromstring(x[1:-1], sep=','))
    sm['labels'] = sm['labels'].astype(int)
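
Two details of the string round trip are easy to miss: `np.array2string` abbreviates arrays with more than 1000 elements using "..." unless `threshold` is raised, and the parsed result is flat, so the original shape has to be restored separately. Here is a minimal sketch of the round trip, assuming all images share one known shape. The helper names `image_to_text` and `text_to_image` are hypothetical, and the parsing uses plain Python `float` rather than the `np.fromstring` call from the answer.

```python
import numpy as np

def image_to_text(img):
    """Serialize an image array to a comma-separated string (hypothetical helper)."""
    flat = img.ravel()
    # threshold=flat.size prevents the "..." summary used for long arrays
    return np.array2string(flat, separator=',', threshold=flat.size)

def text_to_image(text, shape, dtype=np.float32):
    """Parse the string back and restore the original shape (hypothetical helper)."""
    # drop the surrounding brackets; float() ignores padding spaces and newlines
    values = [float(token) for token in text.strip('[]').split(',')]
    return np.array(values, dtype=dtype).reshape(shape)

# Made-up 20x20 RGB image with pixel values in 0..255 (1200 elements,
# enough to trigger the default summary if threshold were left alone).
img = (np.arange(20 * 20 * 3) % 256).astype(np.float32).reshape(20, 20, 3)
text = image_to_text(img)
restored = text_to_image(text, img.shape)
```

This round trip is exact for whole-number pixel values like the ones shown in the question; values with many decimal places would be rounded to the print precision, so a different serialization might be safer in that case.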