Tags: java, twitter, twitter4j

Removing hashtags, usernames and URLs from tweet using Twitter4j


Is there an easy way to remove hashtags, usernames and URLs mentioned in a tweet using twitter4j? I know I can use getHashtagEntities(), getUserMentionEntities() and getURLEntities() to retrieve those entities and their positions in the text, but how would I use them to "clean up" tweets?
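One way to use the positions directly, sketched under some assumptions: twitter4j's entity types (HashtagEntity, UserMentionEntity, URLEntity) expose getStart() and getEnd(), which this sketch treats as a start-inclusive, end-exclusive range into the tweet text. To keep the example self-contained, the hypothetical EntitySpan class stands in for those entities. Deleting from the highest offset downwards keeps the remaining offsets valid as the string shrinks, and no regex is needed for the removal itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a twitter4j entity; in real code you would copy
// entity.getStart() and entity.getEnd() into it (assumed: end is exclusive).
final class EntitySpan {
    final int start;
    final int end;

    EntitySpan(int start, int end) {
        this.start = start;
        this.end = end;
    }
}

final class IndexCleaner {
    // Removes every span from the text, then collapses the leftover runs of
    // whitespace into single spaces and trims both ends.
    static String removeSpans(String text, List<EntitySpan> spans) {
        List<EntitySpan> sorted = new ArrayList<>(spans);
        // Largest start offset first, so deleting one span never shifts the
        // offsets of the spans that are still waiting to be deleted.
        sorted.sort((a, b) -> Integer.compare(b.start, a.start));
        StringBuilder sb = new StringBuilder(text);
        for (EntitySpan s : sorted) {
            sb.delete(s.start, s.end);
        }
        return sb.toString().replaceAll("\\s+", " ").trim();
    }
}
```

With twitter4j you would build the list from the three entity arrays of a Status and pass status.getText(). One caveat worth checking: as far as I know the Twitter API reports entity offsets in Unicode code points, while StringBuilder.delete() works on UTF-16 char indices, so tweets containing emoji may need the offsets converted (e.g. with String.offsetByCodePoints()) unless your twitter4j version already does that.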

I was thinking of calling replaceAll(entity, "") to replace each of those entities in the tweet with an empty string, but that wouldn't always give correct results (e.g. it would remove #ht from the tweet " _#ht " even though it shouldn't).


Solution

  • I ended up using a negative lookbehind, "(?<!\w)", with the replaceAll() method for each entity, and this seems to have solved my problem. However, I was told that regular expressions and replaceAll() can be quite slow, so if anyone has other suggestions I'd be happy to read them.
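The lookbehind idea can also be sketched as one compiled pattern per tweet instead of one replaceAll() call per entity: String.replaceAll() compiles a new Pattern on every call, so a single alternation avoids the repeated compilation and scans the text once (whether that matters in practice is worth measuring). Assumptions in this sketch: the caller passes each entity exactly as it appears in the tweet text, including its # or @ prefix; RegexCleaner is a hypothetical name; and the trailing lookahead "(?!\w)" is an addition beyond the lookbehind above, so that #ht does not match the start of #html. Pattern.quote() escapes characters such as ? and . that are common in URLs and would otherwise be read as regex syntax.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class RegexCleaner {
    // Removes each token only where it is not glued to a word character on
    // either side, then collapses leftover whitespace and trims the result.
    static String removeTokens(String text, List<String> tokens) {
        if (tokens.isEmpty()) {
            // Nothing to remove: return the text untouched.
            return text;
        }
        // Quote every token so URL characters like '?' and '.' are literal.
        String alternation = tokens.stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        // (?<!\w): not preceded by a word char, so "_#ht" is left alone.
        // (?!\w): not followed by a word char, so "#ht" won't eat "#html".
        Pattern p = Pattern.compile("(?<!\\w)(?:" + alternation + ")(?!\\w)");
        return p.matcher(text).replaceAll("").replaceAll("\\s+", " ").trim();
    }
}
```

For example, removing "#ht", "@alice" and "http://t.co/x?a=1" from "#ht is cool but _#ht is not, says @alice http://t.co/x?a=1" leaves "is cool but _#ht is not, says".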