Stripping line breaks in tweets via twitteR


I need help removing the line breaks from tweets I extract using R and the twitteR package. This is the code I've been using:

library(twitteR)
library(ROAuth)

consumer_key =''
consumer_secret = ''
access_token = ''
access_secret = ''

setup_twitter_oauth(consumer_key, consumer_secret, access_token,access_secret)
extracted_tweets2 = searchTwitter("'testword'", n=100000, lang="pt", retryOnRateLimit=120, since="2017-11-15", until="2018-01-17")

df <- do.call("rbind", lapply(extracted_tweets2, as.data.frame))
write.table(df,file="tweets1.csv", sep=";")

It produces a .csv file like the following example:

    1;Tweet text;rest of data
    2;Other tweet text;rest of data
    3;line 
separated 
tweet text;rest of data
    4;Other tweet text;rest of data

Similarly to this question, I want to remove the line breaks in tweet 3.

Thanks in advance!


Solution

  • Assuming that the line breaks you are referring to are only carriage returns and line feeds (i.e. \r and \n), and that you want to strip them from your df$text column:

    df <- do.call("rbind", lapply(extracted_tweets2, as.data.frame))
    df$text <- gsub("[\r\n]","", df$text)
    

    By the way, twitteR has the function twListToDF, which neatly handles what you are doing with your do.call. Try:

    df <- twListToDF(extracted_tweets2)
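
To check the gsub approach without calling the Twitter API, here is a minimal sketch on a toy data frame. The sample strings and the helper name clean_text are made up for illustration and are not part of twitteR. It replaces line breaks with a space rather than an empty string, on the assumption that you do not want the words on either side of a break to run together:

```r
# Toy data frame standing in for the twitteR output (hypothetical sample data).
df <- data.frame(
  text = c("Tweet text", "line \nseparated \r\ntweet text"),
  stringsAsFactors = FALSE
)

# clean_text is a hypothetical helper, not a twitteR function.
# It replaces each run of CR/LF characters with one space, collapses
# repeated spaces, and trims leading/trailing whitespace.
clean_text <- function(x) {
  x <- gsub("[\r\n]+", " ", x)
  x <- gsub(" {2,}", " ", x)
  trimws(x)
}

df$text <- clean_text(df$text)
print(df$text)
```

Once the text column contains no \r or \n, each tweet should stay on a single line when it is written out with write.table.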