I created the data frame below by combining two datasets from Kaggle.
Titanic: Machine Learning from Disaster (input/titanic/train.csv)
DataFrame name: output
   PassengerId  Nationality                 Name
0  1            CelticEnglish               Braund, Mr. Owen Harris
1  2            CelticEnglish               Cumings, Mrs. John Bradley (Florence Briggs Th...
2  3            Nordic,Scandinavian,Sweden  Heikkinen, Miss. Laina
3  4            CelticEnglish               Futrelle, Mrs. Jacques Heath (Lily May Peel
....
What I hoped to get after the transformation
   PassengerId  Nationality    Name
0  1            CelticEnglish  Braund
1  2            CelticEnglish  Cumings
2  3            Nordic         Heikkinen
3  4            CelticEnglish  Futrelle
....
I tried to execute the code below, but I have no idea how to fix the error it raises.
Error
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
----> 1 output['Nationality'].split('\n', 1)[0]
2 output['Name'].split('\n', 1)[0]
/opt/conda/lib/python3.7/site-packages/pandas/core/generic.py in __getattr__(self, name)
5137 if self._info_axis._can_hold_identifiers_and_holds_name(name):
5138 return self[name]
-> 5139 return object.__getattribute__(self, name)
5140
5141 def __setattr__(self, name: str, value) -> None:
AttributeError: 'Series' object has no attribute 'split'
output['Nationality'].split('\n', 1)[0]
output['Name'].split('\n', 1)[0]
I tried converting the column types first, but the result did not change.
output['Nationality'] = output['Nationality'].astype(str)
output['Name'] = output['Name'].astype(str)
output['Nationality'] = output['Nationality'].str.split('\n', expand=True)[0]
output['Name'] = output['Name'].str.split('\n', expand=True)[0]
output
   PassengerId  Nationality                 Name
0  1            CelticEnglish               Braund, Mr. Owen Harris
1  2            CelticEnglish               Cumings, Mrs. John Bradley (Florence Briggs Th...
2  3            Nordic,Scandinavian,Sweden  Heikkinen, Miss. Laina
3  4            CelticEnglish               Futrelle, Mrs. Jacques Heath (Lily May Peel)
Kaggle Notebook
A Series object doesn't have a split method; split belongs to individual Python strings, and on a Series the string methods are reached through the .str accessor. You're trying to split strings, so make sure the column's datatype is string first (or expand the column out into multiple columns) before applying a split.
check data type of columns with df.dtypes
assign datatype with output['Nationality'] = output['Nationality'].astype(str) (astype returns a new Series, so the result has to be assigned back)
edit: no parentheses on the dtypes call (it is an attribute, not a method)
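To make the steps above concrete, here is a minimal sketch. The small frame is a stand-in built from the first three rows shown in the question, not the real Kaggle data. It also assumes the separator you actually want is a comma, since the sample values contain ',' and no '\n'; that would explain why splitting on '\n' left every value unchanged.

```python
import pandas as pd

# Stand-in for the `output` frame from the question (first three rows only).
output = pd.DataFrame({
    "PassengerId": [1, 2, 3],
    "Nationality": ["CelticEnglish", "CelticEnglish", "Nordic,Scandinavian,Sweden"],
    "Name": [
        "Braund, Mr. Owen Harris",
        "Cumings, Mrs. John Bradley (Florence Briggs Th...",
        "Heikkinen, Miss. Laina",
    ],
})

# dtypes is an attribute, so no parentheses.
print(output.dtypes)

# Go through the .str accessor, split on the comma (an assumption based on
# the sample rows), and keep the first piece of each value.
for col in ["Nationality", "Name"]:
    output[col] = output[col].astype(str).str.split(",").str[0]

print(output)
```

If the real data uses a different delimiter, swapping the "," argument should be the only change needed.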