Why does this for loop not work?
I want to add a new Delivery Year column built from the columns below. They contain a lot of NaNs, so the idea is that the for loop goes through the columns and returns the first non-NaN value for each row. The best case is Delivery Date; if that is missing, use Build Year; and if even that is missing, fall back to the year of In Service Date, when the machine was put into service.
import pandas as pd
df = pd.DataFrame({"Platform ID": [1, 2, 3, 4],
                   "Delivery Date": ["2009", float("nan"), float("nan"), float("nan")],
                   "Build Year": [float("nan"), "2009", float("nan"), float("nan")],
                   "In Service Date": [float("nan"), "14-11-2010", "14-11-2009", float("nan")]})
df.dtypes
df
def delivery_year(delivery_year, build_year, service_year):
    out = []
    for i in range(0,len(delivery_year)):
        if delivery_year.notna():
            out[i].append(delivery_year)
        if (delivery_year[i].isna() and build_year[i].notna()):
            out[i].append(build_year)
        elif build_year[i].isna():
            out[i].append(service_year.str.strip().str[-4:])
        else:
            out[i].append(float("nan"))
    return out
df["Delivery Year"] = delivery_year(df["Delivery Date"], df["Build Year"], df["In Service Date"])
When I run this function I get the following error, and I do not know why:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Update 3
I rewrote your function in the same style as yours, without changing its logic or the types of your columns, so you can compare the two versions:
def delivery_year(delivery_date, build_year, service_year):
    out = []
    for i in range(len(delivery_date)):
        if pd.notna(delivery_date[i]):
            out.append(delivery_date[i])
        elif pd.isna(delivery_date[i]) and pd.notna(build_year[i]):
            out.append(build_year[i])
        elif pd.isna(build_year[i]) and pd.notna(service_year[i]):
            out.append(service_year[i].strip()[-4:])
        else:
            out.append(float("nan"))
    return out
df["Delivery Year"] = delivery_year(df["Delivery Date"],
                                    df["Build Year"],
                                    df["In Service Date"])
Notes:
I changed the name of your first parameter to delivery_date, because delivery_year is also the name of your function, which can be confusing.
I also replaced the .isna() and .notna() methods with their equivalent functions, pd.isna(...) and pd.notna(...).
The second if became an elif.
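As a small illustration of why the functions are needed here (a sketch with made-up variable names, not part of the rewrite above): indexing a Series with [i] returns a plain Python value, and neither str nor float has a .notna() method, whereas pd.notna(...) and pd.isna(...) accept scalars.

```python
import pandas as pd

# Same values as the 'Delivery Date' column in the question.
delivery_date = pd.Series(["2009", float("nan"), float("nan"), float("nan")])

first = delivery_date[0]   # a single string, not a Series
second = delivery_date[1]  # a single NaN, not a Series

# Scalars do not carry the Series methods...
str_has_notna = hasattr(first, "notna")
float_has_notna = hasattr(second, "notna")

# ...but the top-level pandas functions work on scalars.
first_present = pd.notna(first)
second_missing = pd.isna(second)

print(str_has_notna, float_has_notna, first_present, second_missing)
```

So calling delivery_date[i].notna() would fail with an AttributeError, while pd.notna(delivery_date[i]) gives a single True or False that an if statement can use.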
Update 2
Use combine_first to replace your function. combine_first fills the NaN values of the first Series ('Delivery Date') with the values at the same positions in the second Series. You can chain the calls to build your 'Delivery Year' column.
df['Delivery Year'] = df['Delivery Date'] \
.combine_first(df['Build Year']) \
.combine_first(df['In Service Date'].str[-4:])
Output:
>>> df
   Platform ID  Delivery Date  Build Year  In Service Date  Delivery Year
0            1           2009         NaN              NaN           2009
1            2            NaN        2009       14-11-2010           2009
2            3            NaN         NaN       14-11-2009           2009
3            4            NaN         NaN              NaN            NaN
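If you would rather have real numbers than strings in the new column, one possible follow-up is sketched below. This is my assumption about what might be useful, not something the question asks for: it converts the chained result with pd.to_numeric and the nullable Int64 dtype, and it adds .str.strip() as in your original function in case the dates carry stray whitespace.

```python
import pandas as pd

df = pd.DataFrame({
    "Platform ID": [1, 2, 3, 4],
    "Delivery Date": ["2009", float("nan"), float("nan"), float("nan")],
    "Build Year": [float("nan"), "2009", float("nan"), float("nan")],
    "In Service Date": [float("nan"), "14-11-2010", "14-11-2009", float("nan")],
})

# First non-NaN value per row, in order of preference.
year = (
    df["Delivery Date"]
    .combine_first(df["Build Year"])
    .combine_first(df["In Service Date"].str.strip().str[-4:])
)

# Strings -> numbers; anything unparsable becomes NaN, then use a
# nullable integer dtype so missing years stay missing (<NA>).
df["Delivery Year"] = pd.to_numeric(year, errors="coerce").astype("Int64")

print(df["Delivery Year"])
```

With errors="coerce", a value that is not a valid number ends up as missing instead of raising, which may or may not be what you want for dirty data.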
Update
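You forgot the [i] in the first condition, so it tests the whole Series instead of a single element:
if delivery_year[i].notna():
(The element delivery_year[i] is a plain str or float without a .notna() method, so in practice use pd.notna(delivery_year[i]), as in Update 3.)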
The truth value of a Series is ambiguous:
>>> delivery_year.notna()
0     True  # <- 2009
1    False  # <- NaN
2    False
3    False
Name: Delivery Date, dtype: bool
Should pandas treat the Series as True (because of 2009) or as False (because of the NaNs)?
You have to aggregate the result with .any() or .all():
>>> delivery_year.notna().any()
True  # because there is at least one non-NaN value.
>>> delivery_year.notna().all()
False  # because not all values are non-NaN.
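A minimal sketch to reproduce the error on its own and compare the alternatives; the Series s and the other variable names are only for illustration.

```python
import pandas as pd

s = pd.Series(["2009", float("nan"), float("nan"), float("nan")],
              name="Delivery Date")
mask = s.notna()  # boolean Series, one value per row

# Using the whole boolean Series in an `if` raises the ValueError.
message = None
try:
    if mask:
        pass
except ValueError as exc:
    message = str(exc)

any_present = mask.any()       # at least one non-NaN value?
all_present = mask.all()       # every value non-NaN?
first_present = pd.notna(s[0]) # test one element at a time

print(message)
print(any_present, all_present, first_present)
```

Inside a row-by-row loop the per-element test is the one you need; .any() and .all() answer a question about the whole column.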