I have a Python script that cleans an uploaded dataframe. One of its functions removes punctuation from a string. It works on all punctuation except for the [...] three dots (ellipsis) at the end of a line.
In example 1 the function works as it should, while in example 2 it keeps the three dots at the end of the string.
======
import re
from string import punctuation

# Raw string so the backslash is kept literally
ppt = r'''...!@#$%^&*()....{}[]|._-`/?:;"'\,~12345678876543'''

def processTweet(tweet):
    '''
    parameters:
    ====================
    - tweet: text of a single tweet
    functions:
    ====================
    - Remove HTML special entities (e.g. &amp;)
    - Remove @username mentions
    - Remove tickers
    - Convert to lowercase
    - Remove hyperlinks
    - Remove hashtags
    - Remove punctuation and split 's, 't, 've with a space for filter
    '''
    # Remove HTML special entities (e.g. &amp;)
    tweet = re.sub(r'\&\w*;', '', tweet)
    # Remove @username mentions
    tweet = re.sub(r'@[^\s]+', '', tweet)
    # Remove tickers
    tweet = re.sub(r'\$\w*', '', tweet)
    # To lowercase
    tweet = tweet.lower()
    # Remove hyperlinks
    tweet = re.sub(r'https?:\/\/.*\/\w*', '', tweet)
    # Remove hashtags
    tweet = re.sub(r'#\w*', '', tweet)
    # Remove punctuation and split 's, 't, 've with a space for filter.
    # re.escape keeps characters such as ']' and '-' from being read as
    # regex syntax inside the character class.
    tweet = re.sub(r'[' + re.escape(ppt.replace('@', '')) + r']+', ' ', tweet)
    return tweet
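If the three dots are actually a single Unicode ellipsis character (U+2026) rather than three ASCII periods, as pointed out by @JvdV, the punctuation pattern will not match it, because ppt contains only the ASCII period. Remove it explicitly: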
tweet = tweet.replace('\u2026','')
This removes every occurrence of the ellipsis character from the string.
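To check whether a string really ends in U+2026, and to handle other non-ASCII punctuation the same way, something like the following may help. It is only a sketch: `describe_chars`, `strip_unicode_punctuation`, and the sample text are hypothetical names made up for illustration, and it swaps the regex approach for a filter on Unicode general categories from the standard `unicodedata` module.

```python
import unicodedata


def describe_chars(text):
    """Return (character, code point, Unicode name) for each character."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in text
    ]


def strip_unicode_punctuation(text):
    """Drop every character whose Unicode category starts with 'P' (punctuation)."""
    return "".join(
        ch for ch in text if not unicodedata.category(ch).startswith("P")
    )


# Hypothetical tweet ending in a single ellipsis character
sample = "great day\u2026"

# Inspect the last character to see what it really is
print(describe_chars(sample[-1]))

# Remove the ellipsis along with any other Unicode punctuation
print(strip_unicode_punctuation(sample))
```

Note that characters such as `$`, `^`, and `+` are classified as symbols (categories starting with 'S'), not punctuation, so this filter leaves them in place. They would still need the regex or a wider category check.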