
How to make dataframe out of lists?


I'm building a dataframe with information from Wikipedia pages, 1,905 of them to be exact. I apply the function below to a list of page titles stored in the dataframe portalAcadie_titles.

Here are a few of the titles I'm looking to extract information from:

"10e Convention nationale acadienne", "11e Convention nationale acadienne", "12e Convention nationale acadienne", "13e Convention nationale acadienne", "14e Convention nationale acadienne", "15e Convention nationale acadienne", "16e Convention nationale acadienne", "1755 (groupe)", "1re Convention nationale acadienne", "2e Convention nationale acadienne", "33e finale des Jeux de l'Acadie", "3e Convention nationale acadienne", "4e Convention nationale acadienne", "5e Convention nationale acadienne", "6e Convention nationale acadienne", "7e Convention nationale acadienne", "8e Convention nationale acadienne", "9e Convention nationale acadienne", "Abbé Lanteigne", "Abel Leblanc", "Aberdeen (Nouvelle-Écosse)", "Aboiteau", "Abrams-Village"

See the code below:

library(WikipediR)

# Fetch the URL properties of one page from French Wikipedia.
pageInfo_fun <- function(page_title) {
  Sys.sleep(0.0001)  # brief pause before each request
  page_info(language = "fr",
            project = "wikipedia",
            page = page_title,
            properties = c("url"),
            clean_response = TRUE)
}

pageInfo_data <- apply(portalAcadie_titles,1, pageInfo_fun)

I want a dataframe in which each observation is a page and its properties are the variables. Instead, I get a list of all the pages, where each element is itself a list of that page's characteristics.

A simple

pageInfo_df <- data.frame(pageInfo_data)

gives me a single observation with the characteristics of every page side by side: 1 observation with 24,773 variables.

My question is: how can I make each page an observation, with its characteristics as variables?


Solution

  • The Tidyverse purrr package can be used to produce the desired output:

    library(purrr)  # map_dfr() also requires dplyr to be installed
    map_dfr(pageInfo_data, ~flatten(.))
    

    This approach uses flatten() to remove one level of nesting from each page's result, then map_dfr() binds the flattened results together as rows, giving one dataframe row per page.
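
To see the effect without calling the Wikipedia API, here is a minimal sketch on hand-built data. The list mock_pageInfo_data and its field names (pageid, title, fullurl) are assumptions made for illustration; they only imitate the nested shape described in the question (a list of pages, each holding a list of characteristics), and the real response may contain other fields. It assumes purrr and dplyr are installed. Recent purrr releases may print a deprecation notice for flatten().

```r
library(purrr)

# Hypothetical stand-in for pageInfo_data: one element per page,
# each wrapping a named list of that page's characteristics.
mock_pageInfo_data <- list(
  list(`101` = list(pageid = 101L,
                    title = "Aboiteau",
                    fullurl = "https://fr.wikipedia.org/wiki/Aboiteau")),
  list(`102` = list(pageid = 102L,
                    title = "Abrams-Village",
                    fullurl = "https://fr.wikipedia.org/wiki/Abrams-Village"))
)

# flatten() drops the wrapper level of each page;
# map_dfr() binds the resulting named lists together as rows.
mock_pageInfo_df <- map_dfr(mock_pageInfo_data, ~flatten(.))

print(mock_pageInfo_df)
```

Each page becomes one row, and each characteristic becomes a column, which is the layout asked for in the question.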