r, for-loop, mediawiki-extensions

How do I iterate over a range of revision IDs when querying with WikipediR?


I am using WikipediR to query revision IDs to see whether the very next edit is a 'rollback' or an 'undo'.

I am interested in the tags and the revision comment, which I use to identify whether the edit was undone or rolled back. My code for a single revision ID is:

library(WikipediR)

wp_diff<- revision_diff("en", "wikipedia", revisions = "883987486", properties = c("tags", "comment"), direction = "next", clean_response = T, as_wikitext=T)

I then convert the output of this to a df using the code

library(dplyr)
library(tibble)
diff <- do.call(rbind, lapply(wp_diff, as.data.frame, stringsAsFactors = FALSE))

This works great for a single revision ID. I am wondering how I would loop or map over a vector of many revision IDs.

I tried

vec <- c("883987486","911412795")
for (i in 1:length(vec)){
wp_diff[i]<- revision_diff("en", "wikipedia", revisions = i, properties = c("tags", "comment"), direction = "next", clean_response = T, as_wikitext=T)
}

But when I try to convert the output list to a data frame, I get this error:

Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : arguments imply differing number of rows: 1, 0

Does anybody have any suggestions? I am not sure how to proceed.

Thanks.


Solution

  • Try the following code:

    # Make a function
    make_diff_df <- function(rev){
      wp_diff <- revision_diff("en", "wikipedia", revisions = rev,
                              properties = c("tags", "comment"), 
                              direction = "next", clean_response = TRUE, 
                              as_wikitext = TRUE)
    
      DF <- do.call(rbind, lapply(wp_diff, as.data.frame, stringsAsFactors = FALSE))
    
      # Define the names of the DF
      names(DF) <- c("pageid","ns","title","revisions.diff.from",
                      "revisions.diff.to","revisions.diff..",
                      "revisions.comment","revisions..mw.rollback.")
      return(DF)
    }
    
    vec <- c("883987486","911412795")
    
    # Use do.call and lapply with the function
    do.call("rbind",lapply(vec,make_diff_df))
    

    Note that you have to fix the names of the DF inside the make_diff_df function so that the "rbind" inside do.call can work. The names returned for the two revisions in the example are pretty similar, but rbind needs them to match exactly.
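To illustrate why the names matter, here is a minimal offline sketch. The two data frames and their column names are made up for illustration and are not real WikipediR output; they only stand in for two responses whose last column is named differently.

```r
# Two one-row data frames standing in for two API responses (made-up values).
a <- data.frame(pageid = 1L, comment = "first", tag.rollback = "x",
                stringsAsFactors = FALSE)
b <- data.frame(pageid = 2L, comment = "second", tag.undo = "y",
                stringsAsFactors = FALSE)

# rbind() on data frames matches columns by name, so mismatched names fail.
res <- try(rbind(a, b), silent = TRUE)
failed <- inherits(res, "try-error")

# Giving both data frames the same names first lets rbind() stack them.
common <- c("pageid", "comment", "tag")
names(a) <- common
names(b) <- common
stacked <- rbind(a, b)
```

This only works if every response has the same number of columns, which I am assuming is the case for the revisions in the example.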

    Hope this can help
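As a side note on the loop in the question: revisions = i passes the loop index (1, 2) rather than the ID stored in vec[i], which is probably why the result comes back empty. If you prefer a for loop over lapply, a rough sketch could look like the following. fake_diff is a hypothetical stand-in for revision_diff so that the example runs without network access; it is not part of WikipediR.

```r
# Hypothetical stand-in for revision_diff(), so the sketch runs offline.
fake_diff <- function(rev) {
  data.frame(rev = rev, comment = paste("comment for", rev),
             stringsAsFactors = FALSE)
}

vec <- c("883987486", "911412795")

# Preallocate a list and fill it by position.
# vec[i] is the revision ID; i is only the position in the vector.
results <- vector("list", length(vec))
for (i in seq_along(vec)) {
  results[[i]] <- fake_diff(vec[i])
}

# Stack the per-revision data frames into one.
combined <- do.call(rbind, results)
```

Using seq_along(vec) instead of 1:length(vec) also avoids looping over c(1, 0) when vec is empty.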