I am using decode_short_url of the twitteR package to decode shortened URLs from Twitter posts, but I am not getting the desired results. It always gives back the same URL I passed in, such as:
## http://bit.ly/23226se656
## [1] "http://bit.ly/23226se656"
UPDATE: I wrapped this functionality in a package and managed to get it on CRAN the same day. Now you can just do:
expand_urls("http://bit.ly/23226se656", check=TRUE, warn=TRUE)
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100%
## Source: local data frame [1 x 2]
## orig_url expanded_url
## 1 http://bit.ly/23226se656 NA
## Warning message:
## In FUN(X[[i]], ...) : client error: (404) Not Found
You can pass in a vector of URLs and get a data_frame back in that same form.
That particular bit.ly URL gives a 404 error. Here's a version of decode_short_url that has an optional check parameter. When check=TRUE, it attempts a HEAD request and issues a warning if the response is not a success status (for example, 404).
You can further modify it to return NA in the event the "expanded" link 404s (I have no idea what you really need this to do in the event the link is bad).
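As a rough sketch of that NA modification: the helper below is hypothetical (the name na_if_unreachable and its status_fun argument are assumptions for illustration, not part of twitteR or httr). It keeps a URL only when a HEAD request answers with 200 and returns NA otherwise, treating network failures the same as a bad status.

```r
# Hypothetical helper (not part of twitteR or httr): keep the URL only if
# it answers with HTTP 200, otherwise return NA.
# `status_fun` is an assumed hook so the check can be swapped out or stubbed;
# the default issues a HEAD request via httr and reads its status code.
na_if_unreachable <- function(expanded_url,
                              status_fun = function(u) httr::status_code(httr::HEAD(u))) {
  # Treat network failures (timeouts, DNS errors) like a bad status
  status <- tryCatch(status_fun(expanded_url), error = function(e) NA_integer_)
  if (!is.na(status) && status == 200L) expanded_url else NA_character_
}
```

One way to use it would be to pass the value the function is about to return through na_if_unreachable() just before returning. Whether NA is the right signal for a dead link depends on what you do with the results downstream.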
NOTE that the added HEAD request will significantly slow the process down, so you may want to do a first pass with check=FALSE into a separate column, then see which URLs weren't "expanded", then check only those with check=TRUE.
You might also want to rename this to avoid namespace conflicts with the one from twitteR.
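library(httr)  # provides GET(), HEAD(), content(), warn_for_status()

decode_short_url <- function(url, check=FALSE, ...) {
  # Ask the longurl.org API to expand the short URL
  request_url <- paste("http://api.longurl.org/v2/expand?url=",
                       url, "&format=json", sep="")
  response <- GET(request_url, query=list(useragent="twitteR"), ...)
  parsed <- content(response, as="parsed")
  ret <- NULL
  if (!("long-url" %in% names(parsed))) {
    ret <- url  # not expanded: fall back to the original URL
  } else {
    ret <- parsed[["long-url"]]
  }
  # Optionally issue a HEAD request and warn on an HTTP error status
  if (check) warn_for_status(HEAD(url))
  return(ret)
}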
decode_short_url("http://bit.ly/23226se656", check=TRUE)
## [1] "http://bit.ly/23226se656"
## Warning message:
## In decode_short_url("http://bit.ly/23226se656", check = TRUE) :
## client error: (404) Not Found
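The two-pass idea from the note above could be sketched roughly as follows. The name expand_two_pass and its expander argument are hypothetical. The sketch assumes expander is any function of the form function(url, check) that returns a single string, such as the decode_short_url shown above.

```r
# Hypothetical helper: two-pass expansion.
# `expander` is assumed to be a function(url, check) returning one string.
expand_two_pass <- function(urls, expander) {
  # Pass 1: fast, no HEAD requests
  expanded <- vapply(urls, function(u) expander(u, check = FALSE),
                     character(1), USE.NAMES = FALSE)
  # URLs that came back unchanged were not expanded
  unchanged <- which(expanded == urls)
  # Pass 2: re-run only those with the slower status check,
  # which surfaces a warning for links that return an error status
  for (i in unchanged) {
    expanded[i] <- expander(urls[i], check = TRUE)
  }
  data.frame(orig_url = urls, expanded_url = expanded, stringsAsFactors = FALSE)
}
```

Calling expand_two_pass(my_urls, decode_short_url) would then pay for the extra HEAD requests only on the URLs that the first pass left unchanged.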