I have a dataframe with the following dtypes.
> df.dtypes
Col1 float64
Col2 object
dtype: object
When I do the following:
df['Col3'] = df['Col2'].apply(lambda s: len(s) >= 2 and s[0].isalpha())
I get:
TypeError: object of type 'float' has no len()
I believe if I convert "object" to "String", I will get to do what I want. However, when I do the following:
df['Col2'] = df['Col2'].astype(str)
the dtype of Col2 doesn't change. I am a little confused by the "object" datatype in Pandas. What exactly is "object"?
More info: This is what Col2 looks like:
Col2
1 F5
2 K3V
3 B9
4 F0V
5 G8III
6 M0V:
7 G0
8 M6e-M8.5e Tc
If a column contains strings or is treated as strings, it will have a dtype of object (but the converse is not necessarily true; more below). Here is a simple example:
import pandas as pd
df = pd.DataFrame({'SpT': ['string1', 'string2', 'string3'],
'num': ['0.1', '0.2', '0.3'],
'strange': ['0.1', '0.2', 0.3]})
print(df.dtypes)
#SpT object
#num object
#strange object
#dtype: object
If a column contains only strings, applying len to it, as you did, works fine:
print(df['num'].apply(lambda x: len(x)))
#0 3
#1 3
#2 3
However, a dtype of object does not mean the column contains only strings. For example, the column strange contains objects of mixed types: two str values and a float. Applying len to it raises an error similar to the one you have seen:
print(df['strange'].apply(lambda x: len(x)))
# TypeError: object of type 'float' has no len()
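To see which Python types an object column actually holds, one option is to map type over its elements. This is a minimal sketch using the strange column from the example above; the names type_counts and not_str are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'strange': ['0.1', '0.2', 0.3]})

# Count how many elements of each Python type the column holds.
type_counts = df['strange'].map(type).value_counts()
print(type_counts)

# Boolean mask selecting the rows whose value is not a str.
not_str = ~df['strange'].map(lambda x: isinstance(x, str))
print(df[not_str])
```

Running the same check on your Col2 should show which rows hold the float that len fails on.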
Thus, the problem could be that you have not properly converted the column to strings, and the column still contains values of mixed types.
Continuing the above example, let us convert strange to strings and check whether apply works:
df['strange'] = df['strange'].astype(str)
print(df['strange'].apply(lambda x: len(x)))
#0 3
#1 3
#2 3
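An alternative to converting the column is to guard the check itself. The sketch below assumes the float in your Col2 is a missing value (NaN), which the question does not show; the sample data and the helper name starts_with_letter are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: values like those in the question, plus one missing entry.
df = pd.DataFrame({'Col2': ['F5', 'K3V', np.nan, 'M0V:']})

def starts_with_letter(s):
    # Reject non-strings first (NaN is a float), then apply the original test.
    return isinstance(s, str) and len(s) >= 2 and s[0].isalpha()

df['Col3'] = df['Col2'].apply(starts_with_letter)
print(df)
```

With this guard, rows that do not hold a string simply get False in Col3 instead of raising the TypeError.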
(There is a suspicious discrepancy between df_cleaned and df_clean in your question. Is it a typo, or a mistake in the code that causes the problem?)