I am currently trying to work with text data and I am relatively new at this. The column I'm trying to work with is the cast column, as shown below:
0 [Sam Worthington, Zoe Saldana, Sigourney Weave...
1 [Johnny Depp, Orlando Bloom, Keira Knightley, ...
2 [Daniel Craig, Christoph Waltz, Léa Seydoux, R...
3 [Christian Bale, Michael Caine, Gary Oldman, A...
4 [Taylor Kitsch, Lynn Collins, Samantha Morton,...
Name: cast, dtype: object
I want to convert all the uppercase letters to lowercase. However, when I try, every value turns into NaN.
Here's the simple thing I've done:
data.cast=data.cast.str.lower()
Here's the output:
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
5 NaN
6 NaN
7 NaN
8 NaN
9 NaN
10 NaN
11 NaN
12 NaN
13 NaN
14 NaN
15 NaN
16 NaN
17 NaN
18 NaN
19 NaN
20 NaN
21 NaN
22 NaN
23 NaN
24 NaN
25 NaN
26 NaN
27 NaN
28 NaN
29 NaN
..
Can anyone help me understand what I'm doing wrong and how I could potentially fix it? Thank you for your time!!!
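You are applying a string method to a column that contains lists. The .str accessor only works on string elements and returns NaN for anything else, so you need a simple function such as:
def lower(l):
    # Lowercase every string in the list
    return [x.lower() for x in l]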
Then use map to apply it to every row of the column:
import pandas as pd
data = pd.DataFrame([{'col':['Titi','Toto','Tutu']},{'col':['Tata','Toto','Tutu']}])
data.col = data.col.map(lower)
data
The result is:
col
0 [titi, toto, tutu]
1 [tata, toto, tutu]
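To tie this back to the original cast column, here is a minimal sketch that reproduces the NaN result and then applies the same fix. The sample rows and the helper name lower_list are assumptions made for illustration, not your real data. The isinstance guard is an optional addition in case some rows are missing rather than lists.

```python
import pandas as pd

# Hypothetical sample data standing in for the real cast column
data = pd.DataFrame({
    "cast": [
        ["Sam Worthington", "Zoe Saldana"],
        ["Johnny Depp", "Orlando Bloom"],
        None,  # a missing row, to show the guard below
    ]
})

# .str.lower() only handles string elements, so list rows come back as NaN
broken = data["cast"].str.lower()
print(broken)

def lower_list(names):
    # Leave non-list values (e.g. missing rows) untouched
    if not isinstance(names, list):
        return names
    # Lowercase each name inside the list
    return [name.lower() for name in names]

# map calls lower_list once per row
fixed = data["cast"].map(lower_list)
print(fixed)
```

If every row is guaranteed to be a list, the guard can be dropped and the function reduces to the one-line version above.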