I'm trying to drop columns in my pandas dataframe with 0 variance.
I'm sure this has been answered somewhere but I had a lot of trouble finding a thread on it. I found this thread, however when I tried the solution for my dataframe, baseline
with the command
baseline_filtered=baseline.loc[:,baseline.std() > 0.0]
I got the error
"Unalignable boolean Series provided as "
IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).
So, can someone tell me why I'm getting this error or provide an alternative solution?
There are some non numeric columns, so std
remove this columns by default:
baseline = pd.DataFrame({
'A':list('abcdef'),
'B':[4,5,4,5,5,4],
'C':[7,8,9,4,2,3],
'D':[1,1,1,1,1,1],
'E':[5,3,6,9,2,4],
'F':list('aaabbb')
})
#no A, F columns
m = baseline.std() > 0.0
print (m)
B True
C True
D False
E True
dtype: bool
So possible solution for add or remove strings columns is use DataFrame.reindex
:
baseline_filtered=baseline.loc[:,m.reindex(baseline.columns, axis=1, fill_value=True) ]
print (baseline_filtered)
A B C E F
0 a 4 7 5 a
1 b 5 8 3 a
2 c 4 9 6 a
3 d 5 4 9 b
4 e 5 2 2 b
5 f 4 3 4 b
baseline_filtered=baseline.loc[:,m.reindex(baseline.columns, axis=1, fill_value=False) ]
print (baseline_filtered)
B C E
0 4 7 5
1 5 8 3
2 4 9 6
3 5 4 9
4 5 2 2
5 4 3 4
Another idea is use DataFrame.nunique
working with strings and numeric columns:
baseline_filtered=baseline.loc[:,baseline.nunique() > 1]
print (baseline_filtered)
A B C E F
0 a 4 7 5 a
1 b 5 8 3 a
2 c 4 9 6 a
3 d 5 4 9 b
4 e 5 2 2 b
5 f 4 3 4 b