I have 3 columns Col1, Col2 and Col3 in pandas dataframe 'df'. I want to make a new column, Col4 out of these. The steps are as follows:
Check Col1, Col2 and Col3, if value is null and not 'aaa' or has a datetime value, skip and go to the other column, if value is not null and not datettime and not 'aaa' pick value and put in the new column i.e. Col4.
Col1, Col2 and Col3 are object datatypes.
For example:
col1 col2 col3 col4
aaa 2011-02-01 10:04:03 abc abc
2011-02-01 10:04:03 xyz abc xyz
ijk xyz ijk
Currently i have been using the code:
df["col4"] = np.where(
((df["col1"].notnull()) & (df["col1"] != "aaa")),
df["col1"],
(
((df["col2"].notnull()) & (df["col2"] != "aaa")),
df["col2"],
(((df["col3"].notnull()) & (df["col3"] != "aaa")), df["col3"], "No Value"),
),
)
This incorporates the part where we skip the null values and 'aaa'. I want a code so that i am also able to skip the datetime values here.
Can you please help?
With the dataframe you provided:
import datetime
import pandas as pd
df = pd.DataFrame(
{
"col1": [
"aaa",
datetime.datetime.strptime("2011-02-01 10:04:03", "%Y-%m-%d %H:%M:%S"),
"",
],
"col2": [
datetime.datetime.strptime("2011-02-01 10:04:03", "%Y-%m-%d %H:%M:%S"),
"xyz",
"ijk",
],
"col3": ["abc", "abc", "xyz"],
}
)
Here is another way to do it by first defining a helper function which uses Python built-in function isinstance to check datetime values:
def check(val):
return (
val
if (val and val != "aaa" and not isinstance(val, datetime.datetime))
else None
)
And then, use Pandas apply to check each column inside a generator (here, Python next built-in function returns the first non-false element, as suggested in this clever post):
df["col4"] = df.apply(
lambda x: next(
(y for y in [check(x["col1"]), check(x["col2"]), check(x["col3"])] if y), pd.NA
),
axis=1,
)
print(df)
# Output
col1 col2 col3 col4
0 aaa 2011-02-01 10:04:03 abc abc
1 2011-02-01 10:04:03 xyz abc xyz
2 ijk xyz ijk