I have a pandas dataframe called df with about 2 million records.
There is a column called transaction_id that might contain:
I want to drop that column if ALL values (i.e. across ALL records) contain:
Is there a pythonic way of doing so?
So, if a column contains across al
Given the following toy dataframe, in which col1 should be removed and col2 should be kept according to your criteria:
import pandas as pd
df = pd.DataFrame(
{
"col1": [
"abs@&wew",
"123!45!4",
"asd12354",
"asdfzf_!",
"123_!",
"asd435_!",
"_-!",
],
"col2": [
"abscdwew",
"123454",
"asd12354",
"a_!sdfzf",
"123_!",
"asd435_!",
"_-!",
],
}
)
Here is one way to do it:
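# A value passes if it is purely alphabetic or purely numeric.
def test(value):
    return value.isalpha() or value.isdigit()
# Keep each column in which at least one value passes the test.
cols_to_keep = df.apply(lambda col: any(test(v) for v in col))
df = df.loc[:, cols_to_keep]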
print(df)
# Output
col2
0 abscdwew
1 123454
2 asd12354
3 a_!sdfzf
4 123_!
5 asd435_!
6 _-!
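With around 2 million records, the per-value Python loop above may be slow. A possible alternative is to run the same check through the pandas `.str` accessor. This is only a sketch: the helper name `has_clean_value` is made up for illustration, the frame is a shortened copy of the toy data, and it assumes every column holds strings with no missing values.

```python
import pandas as pd

# Shortened copy of the toy frame above, so the snippet runs on its own.
df = pd.DataFrame(
    {
        "col1": ["abs@&wew", "123!45!4", "_-!"],
        "col2": ["abscdwew", "123454", "_-!"],
    }
)


def has_clean_value(col: pd.Series) -> bool:
    """Return True if at least one value is purely alphabetic or purely numeric.

    Assumption: the column contains only strings (no NaN / None).
    """
    return bool((col.str.isalpha() | col.str.isdigit()).any())


# One boolean per column: True means "keep this column".
cols_to_keep = df.apply(has_clean_value)
result = df.loc[:, cols_to_keep]
print(list(result.columns))
```

To check only a single column such as transaction_id, the same helper could be called on that column directly, and the column removed with `df.drop(columns=...)` when it returns False.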