python, regex, special-characters

Removing words with special characters "\" and "/"


While analyzing tweets, I run into "words" that contain either \ or / (possibly more than once in a single "word"). I would like to remove such words completely, but I cannot quite get it right.

This is what I tried:

sen = 'this is \re\store and b\\fre'
sen1 = 'this i\s /re/store and b//fre/'

slash_back =  r'(?:[\w_]+\\[\w_]+)'
slash_fwd = r'(?:[\w_]+/+[\w_]+)'
slash_all = r'(?<!\S)[a-z-]+(?=[,.!?:;]?(?!\S))'

strt = re.sub(slash_back,"",sen)
strt1 = re.sub(slash_fwd,"",sen1)
strt2 = re.sub(slash_all,"",sen1)
print strt
print strt1
print strt2

I would like to get:

this is and
this i\s and
this and

However, I receive:

and 
this i\s / and /
i\s /re/store  b//fre/

To add: in this scenario a "word" is a string delimited by spaces or punctuation marks (as in regular text).
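
A side note on the test strings, as a small sketch: sen and sen1 above are ordinary (non-raw) string literals, so Python processes escape sequences in them before the regex ever runs. In particular, \r becomes a carriage return and \\ becomes a single backslash, which likely accounts for part of the surprising output. The names plain and raw below are illustrative only and do not come from the question.

```python
# Non-raw literal: Python processes escape sequences before re sees the text.
plain = 'is \re and b\\fre'
# Raw literal: every backslash is kept exactly as typed.
raw = r'is \re and b\\fre'

print(repr(plain))
print(repr(raw))

# The non-raw string holds a carriage return and only one backslash.
print('\r' in plain, plain.count('\\'))   # True 1
# The raw string holds no carriage return and three backslashes.
print('\r' in raw, raw.count('\\'))       # False 3
```

Prefixing test strings with r keeps what the regex sees identical to what was typed.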


Solution

  • How's this? I added some punctuation examples:

    import re
    
    sen = r'this is \re\store and b\\fre'
    sen1 = r'this i\s /re/store and b//fre/'
    sen2 = r'this is \re\store, and b\\fre!'
    sen3 = r'this i\s /re/store, and b//fre/!'
    
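    # Each pattern matches optional leading whitespace followed by a run of
    # word characters that contains at least one of the target slashes.
    # Note: \w already includes the underscore, so [\w_] is equivalent to \w.
    slash_back =  r'\s*(?:[\w_]*\\(?:[\w_]*\\)*[\w_]*)'  # words with a backslash
    slash_fwd = r'\s*(?:[\w_]*/(?:[\w_]*/)*[\w_]*)'  # words with a forward slash
    slash_all = r'\s*(?:[\w_]*[/\\](?:[\w_]*[/\\])*[\w_]*)'  # words with either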
    
    strt = re.sub(slash_back,"",sen)
    strt1 = re.sub(slash_fwd,"",sen1)
    strt2 = re.sub(slash_all,"",sen1)
    strt3 = re.sub(slash_back,"",sen2)
    strt4 = re.sub(slash_fwd,"",sen3)
    strt5 = re.sub(slash_all,"",sen3)
    print(strt)
    print(strt1)
    print(strt2)
    print(strt3)
    print(strt4)
    print(strt5)
    

    Output:

    this is and
    this i\s and
    this and
    this is, and!
    this i\s, and!
    this, and!
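
For reuse, the slash_all idea can be wrapped in a small function. This is only a sketch: remove_slash_words and SLASH_WORD are hypothetical names, not part of the answer above. It writes \w instead of [\w_] (they match the same characters) and adds one assumption of its own: leading whitespace is stripped from the result, so a slash word at the very start of the text does not leave a stray space behind.

```python
import re

# Optional whitespace, then word characters around one or more / or \ characters.
SLASH_WORD = re.compile(r'\s*\w*[/\\](?:\w*[/\\])*\w*')


def remove_slash_words(text):
    # Remove each matching "word" together with the whitespace before it.
    cleaned = SLASH_WORD.sub('', text)
    # Assumption: also drop leading whitespace, which is left over when the
    # removed word was the first token of the text.
    return cleaned.lstrip()


print(remove_slash_words(r'this i\s /re/store, and b//fre/!'))  # this, and!
print(remove_slash_words(r'a/b and c'))                         # and c
```

Trailing punctuation survives because it is neither a word character nor a slash, so the match stops just before it.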