python scikit-learn data-science imbalanced-data aif360

A problem in using AIF360 metrics in my code

I am trying to run AI Fairness 360 metrics on skit-learn (imbalanced-learn) algorithms, but I have a problem with my code. The problem is when I apply skit-learn (imbalanced-learn) algorithms like SMOTE, it return a numpy array. While AI Fairness 360 preprocessing methods return BinaryLabelDataset. Then the metrics should receive an object from BinaryLabelDataset class. I am stuck in how to convert my arrays to BinaryLabelDataset to be able to use measures.

My preprocessing algorithm needs to receive X,Y. So, I split the dataset before calling SMOTE method into X and Y. The dataset before using SMOTE was standard_dataset and it was ok to use metrics, but the problem after I used SMOTE method because it converts data to numpy array.

I got the following error after running the code :

File "adult_dataset.py", line 654, in <module>
cm = BinaryLabelDatasetMetric(y_pred, privileged_groups=p, unprivileged_groups=u)

raise TypeError("'dataset' should be a BinaryLabelDataset or a MulticlassLabelDataset")
TypeError: 'dataset' should be a BinaryLabelDataset or a MulticlassLabelDataset

Here is my code:

# dataset_orig is standard_dataset
scaler = MinMaxScaler(copy=False)
dataset_orig.features = scaler.fit_transform(dataset_orig.features)

# Spilitting data to X and Y 
X_orig = dataset_orig.features
y_orig = dataset_orig.labels

# SMOTETomek returns X_resampled, y_resampled as numpy arrays
smote_tomek = SMOTETomek(random_state=0)
X_resampled, y_resampled = smote_tomek.fit_resample(X_orig, X_orig)


#Creation of Train and Test dataset
X_train, X_test, y_train, y_test = 
train_test_split(X_resampled,y_resampled,test_size=0.2,random_state=42)


model.fit(X_train, y_train)
y_pred = model.predict(X_test)

p = [{'sex': 1.}]
u = [{'sex': 0.}]
cm = BinaryLabelDatasetMetric(y_pred, privileged_groups=p, unprivileged_groups=u)
print("Disparate_Impact", cm.disparate_impact())
print("Statistical Parity Difference", cm.statistical_parity_difference())
print("Consistency (Individual Fairness)", cm.consistency())

I think the problem with y_pred when calling BinaryLabelDatasetMetric. It should to be BinaryLabelDataset. Is there any way to use AIF360 Metrics in my code?

Solution

You are correct that the problem is with y_pred. You can concatenate it to X_test, transform it to a StandardDataset object, and then pass that one to the BinaryLabelDatasetMetric. The output object will have the methods for calculating different fairness metrics. I do not know how your dataset looks like, but here is a complete reproducible example that you can adapt to do this process for your dataset.