I'm trying to remove punctuation from the column "text" using this code:
import string
import pandas as pd

texttweet = pd.read_csv("../input/pfizer-vaccine-tweets/vaccination_tweets.csv")
i = 0
punct = "\n\r" + string.punctuation
for tweet in texttweet['text']:
    texttweet['text'][i] = tweet.translate(str.maketrans('', '', punct))
    i += 1
texttweet
But I'm getting this message, even though the results are what I need:
A value is trying to be set on a copy of a slice from a DataFrame
So is it OK to keep my code regardless of the message or should I change something?
The best way to do this is to apply the translation to the whole column at once with the vectorized .str accessor:
import string
import pandas as pd

texttweet = pd.read_csv("../input/pfizer-vaccine-tweets/vaccination_tweets.csv")
punct = "\n\r" + string.punctuation
texttweet['text'] = texttweet['text'].str.translate(str.maketrans('', '', punct))
texttweet
For an explanation of the problem you were having, see the pandas documentation on returning a view versus a copy: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
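Basically, texttweet['text'][i] = ... is chained indexing: texttweet['text'] first returns a "slice" of the dataframe, which may be either a view or a copy of its data, and you then assign something to position i of that intermediate object. pandas cannot guarantee that the assignment reaches the original dataframe, so it warns you.
To avoid the warning you can use texttweet.loc[i, 'text'] = <new value>. This is different because the row and the column are selected in a single operation that is applied directly to the original dataframe, not to a slice of it. Note that .loc treats i as an index label, which matches the row position here only because read_csv produced the default 0, 1, 2, ... index.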
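The .loc approach and the vectorized .str.translate approach can be compared on a small in-memory frame. This is only a minimal sketch: the two-row dataframe is a hypothetical stand-in for vaccination_tweets.csv, and the names table, looped and vectorized are made up for illustration.

```python
import string

import pandas as pd

# Hypothetical stand-in for the "text" column of vaccination_tweets.csv.
texttweet = pd.DataFrame(
    {"text": ["Got my jab today!!!", "Side effects?\nNone, so far."]}
)

# Translation table that deletes newlines, carriage returns and ASCII punctuation.
table = str.maketrans("", "", "\n\r" + string.punctuation)

# Loop version: a single .loc call selects the row label and the column together,
# so the assignment is made on the dataframe itself, not on an intermediate slice.
looped = texttweet.copy()
for label, tweet in texttweet["text"].items():
    looped.loc[label, "text"] = tweet.translate(table)

# Vectorized version: translate the whole column in one call.
vectorized = texttweet.copy()
vectorized["text"] = vectorized["text"].str.translate(table)

print(looped["text"].tolist())
print(looped.equals(vectorized))
```

Both versions should leave the same cleaned column; the vectorized one is shorter and avoids the Python-level loop, so it is usually the better choice when no per-row logic is needed.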