python excel twitter hyperlink

Removing links from a dataset


I have the following dataset and I need to remove all of the links from it. The CSV looks like this:

data

Does anyone know how I can quickly and easily do this?


Solution

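  • You can use a regular expression in Python like this:

    import re

    # re.sub returns a new string, so collect the results instead of discarding them
    cleaned = [re.sub(r"http\S+", "", x) for x in rows]

    where rows is a list of the strings from your CSV data. The pattern http\S+ matches a link up to the next whitespace, so it also catches a link at the very end of a string.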

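    This is the code I use to preprocess Twitter data:

    all_text = re.sub(r"#\S+", "", all_text)     # hashtags
    all_text = re.sub(r"@\S+", "", all_text)     # mentions
    all_text = re.sub(r"http\S+", "", all_text)  # links
    # Assumed intent of the "W+" step: collapse runs of non-word characters
    # into one space. It runs last so it cannot strip "#", "@" or "://" first.
    all_text = re.sub(r"\W+", " ", all_text)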
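
If the data lives in a CSV file rather than a list, the same substitution can be applied cell by cell. The following is only a minimal sketch using the standard csv module. The names strip_links, strip_links_from_csv and URL_PATTERN, as well as the sample rows, are made up for illustration and are not from the question, and the pattern assumes every link starts with http:// or https://.

```python
import csv
import io
import re

# Matches http:// or https:// followed by any run of non-whitespace characters.
URL_PATTERN = re.compile(r"https?://\S+")


def strip_links(text):
    """Remove URLs from text and tidy up the whitespace left behind."""
    without_links = URL_PATTERN.sub("", text)
    return " ".join(without_links.split())


def strip_links_from_csv(source, destination):
    """Copy CSV rows from source to destination, removing links from every cell."""
    writer = csv.writer(destination)
    for row in csv.reader(source):
        writer.writerow([strip_links(cell) for cell in row])


# Hypothetical sample data standing in for the real file.
sample = io.StringIO(
    "id,text\n"
    "1,check this out https://t.co/abc123 wow\n"
    "2,no link here\n"
)
output = io.StringIO()
strip_links_from_csv(sample, output)
print(output.getvalue())
```

For a real file, the two StringIO objects could be replaced with file objects opened with newline="", which is what the csv module documentation recommends for both reading and writing.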