Tweets returned by twitteR are shortened


I am using the twitteR package for R to collect some tweets. However, I noticed that the tweet text returned by the searchTwitter function is not the complete tweet text: it is truncated to exactly 140 characters, with the rest of the text replaced by a link to the tweet on the web.

Using a tweet I found as an example:

require(twitteR)
require(ROAuth)

# authorize with Twitter using the consumer and access keys/secrets
setup_twitter_oauth(AAA, BBB, CCC, DDD)   # actual secret codes go here...

# get sample tweet
tweet <- searchTwitter("When I was driving around earlier this afternoon I only saw two Hunters",
                       n=500,
                       since = "2017-11-04",
                       until = "2017-11-05",
                       retryOnRateLimit=5000)

# print tweet
tweet[[1]]
[1] "_TooCrazyFox_: When I was driving around earlier this afternoon I only saw two Hunters but it was during the midday break. I didn'… *SHORTENEDURL*"
# *SHORTENEDURL* stands in for a link to the tweet itself; Stack Overflow does not allow shortened URLs in posts

# convert to data frame
df <- twListToDF(tweet)

# output text and ID
df$text
[1] "When I was driving around earlier this afternoon I only saw two Hunters but it was during the midday break. I didn'… *SHORTENEDURL*"

df$id
[1] "926943636641763328"

If I go to this tweet via my web browser, it is clear that twitteR shortened the text to 140 characters and included a link to the tweet containing the whole text.

I don't see any mention of this in the twitteR documentation. Is there any way to retain the entire tweet text during a search?

My assumption is that this is related to the increase in Twitter's character limit, as described here: https://developer.twitter.com/en/docs/tweets/tweet-updates (see the section 'Compatibility mode JSON rendering'). This implies that I need to retrieve the full_text field rather than the text field. However, twitteR does not seem to supply full_text.
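To make the text versus full_text distinction concrete, here is a minimal sketch in base R. It assumes a status has already been parsed into a named list (for example with jsonlite::fromJSON). The helper name full_tweet_text and the sample lists are hypothetical and are not part of twitteR. The field names follow the Twitter documentation linked above: full_text in extended mode, and extended_tweet$full_text in compatibility-mode payloads that carry it.

```r
# Hypothetical helper (not part of twitteR): return the longest available
# text field from one parsed status, represented as a named list.
full_tweet_text <- function(status) {
  # Extended-mode payloads carry the complete text in `full_text`.
  if (!is.null(status[["full_text"]])) {
    return(status[["full_text"]])
  }
  # Some compatibility-mode payloads nest it under `extended_tweet`.
  nested <- status[["extended_tweet"]][["full_text"]]
  if (!is.null(nested)) {
    return(nested)
  }
  # Otherwise only the classic (possibly truncated) `text` is available.
  status[["text"]]
}

# Made-up sample payloads, for illustration only.
compat   <- list(text = "I didn'...", truncated = TRUE)
extended <- list(text = "I didn'...", full_text = "I didn't have my camera")

full_tweet_text(compat)
full_tweet_text(extended)
```

With a payload that only has text, the helper falls back to the truncated string, which matches the behaviour described in the question.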


Solution

  • The twitteR package is in the process of being deprecated. You should use rtweet instead.

    You can download rtweet from CRAN, but at the present time I recommend downloading the dev version from GitHub. The dev version returns the full text of tweets by default. It also returns the full, original text of retweeted or quoted statuses.

    To install the most recent version of rtweet from GitHub, use the devtools package.

    ## install newest version of rtweet
    if (!requireNamespace("devtools", quietly = TRUE)) {
      install.packages("devtools")
    }
    devtools::install_github("mkearney/rtweet")
    

    Once it's installed, load the rtweet package.

    ## load rtweet
    library(rtweet)
    

    rtweet has a dedicated package documentation website. It includes a vignette on obtaining and using Twitter API access tokens. If you follow the steps in the vignette, you only have to go through the authorization process once [per machine].

    To search for tweets, use the search_tweets() function.

    # get sample tweet
    rt <- search_tweets(
      "When I was driving around earlier this afternoon I only saw two Hunters",
      n = 500
    )
    

    Print the output (a tibble, i.e. a tbl data frame).

    > rt
    # A tibble: 1 x 42
               status_id          created_at    user_id   screen_name
                   <chr>              <dttm>      <chr>         <chr>
    1 926943636641763328 2017-11-04 22:45:59 3652909394 _TooCrazyFox_
    # ... with 38 more variables: text <chr>, source <chr>,
    #   reply_to_status_id <chr>, reply_to_user_id <chr>,
    #   reply_to_screen_name <chr>, is_quote <lgl>, is_retweet <lgl>,
    #   favorite_count <int>, retweet_count <int>, hashtags <list>, symbols <list>,
    #   urls_url <list>, urls_t.co <list>, urls_expanded_url <list>,
    #   media_url <list>, media_t.co <list>, media_expanded_url <list>,
    #   media_type <list>, ext_media_url <list>, ext_media_t.co <list>,
    #   ext_media_expanded_url <list>, ext_media_type <lgl>,
    #   mentions_user_id <list>, mentions_screen_name <list>, lang <chr>,
    #   quoted_status_id <chr>, quoted_text <chr>, retweet_status_id <chr>,
    #   retweet_text <chr>, place_url <chr>, place_name <chr>,
    #   place_full_name <chr>, place_type <chr>, country <chr>, country_code <chr>,
    #   geo_coords <list>, coords_coords <list>, bbox_coords <list>
    

    Print tweet text (the full text).

    > rt$text
    [1] "When I was driving around earlier this afternoon I only saw two Hunters but it was during the midday break. I didn't have my camera otherwise I would have taken some photos of the standing corn fields in the snow. I'll do it later., maybe tomorrow.\n#harvest17"
    

    To look up Twitter statuses by ID, use the lookup_statuses() function.

    ## lookup tweet
    tweet <- lookup_statuses("926943636641763328")
    

    Print tweet text.

    > tweet$text
    [1] "When I was driving around earlier this afternoon I only saw two Hunters but it was during the midday break. I didn't have my camera otherwise I would have taken some photos of the standing corn fields in the snow. I'll do it later., maybe tomorrow.\n#harvest17"
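To check whether a column of tweet text is still truncated, a rough heuristic is to look for a trailing ellipsis followed by a t.co link, which is the pattern shown in the question's output. The sketch below uses only base R. The function name looks_truncated and the sample strings are hypothetical, and the pattern is an assumption based on that output rather than a documented guarantee.

```r
# Hypothetical heuristic: flag texts ending in an ellipsis character
# followed by a t.co link, as in the truncated output from the question.
looks_truncated <- function(text) {
  grepl("\u2026 ?https://t\\.co/[^[:space:]]+$", text)
}

# Made-up sample strings; the t.co path is a placeholder, not a real link.
samples <- c(
  "I only saw two Hunters but it was during the midday break. I didn'\u2026 https://t.co/abc123",
  "I'll do it later., maybe tomorrow.\n#harvest17"
)

looks_truncated(samples)
```

This should flag the first sample and not the second. The same call could be applied to rt$text or tweet$text from the examples above.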