Tags: r, web-scraping, shiny, reactive, import-csv

How can I read a column in an imported csv file in R shiny?


I want to import a CSV file from my computer into a Shiny app. The file contains URLs of web articles. I then want to take 100 URLs from the "url" column of the CSV file and scrape all of them to create a word cloud.

This is the server part of the code. I want to select the "url" column from the CSV file and iterate over its first 10 URLs with a for loop, scraping the text of the article each URL points to. The scraped text is returned by a reactive called "inputWords", which is then assigned to a variable called "data" in order to create the word cloud:

server <- function(input, output) {
data_source <- reactive({
    if (input$source == "csv") {
        data <- inputWords()
    }
    return(data)
})

inputWords <- reactive({
    if (is.null(input$csv)) {
        return("")
    }
    
    else if (is.table(input$csv)) {
        CSVFile <- read.csv(input$csv$datapath)
        Urls <- c(CSVFile$url[1:10])
        
        pages <- list()
        
        for (i in Urls) {
            ArticlePages <- read_html(i)
            
            articleText = ArticlePages %>% html_elements("h1.newsfull__title, p") %>% html_text()
            pages[[i]] <- c(articleText)
        }
        pages[1:10]
    }
})

And this is where I pass "data_source" to the word cloud:

output$cloud <- renderWordcloud2({
    create_wordcloud(data_source(),
                     num_words = input$num)
})

This is the warning message:

Warning: Error in if: argument is of length zero

Link to sample data


Solution

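  • The following works with your sample data. I can't pinpoint the issue because you haven't shown your ui. Note that it uses req(input$csv) in place of the is.null() / is.table() checks, and that data_source is commented out because this ui has no "source" input.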

    library(shiny)
    library(rvest)
    
    ui <- fluidPage(
      titlePanel("title panel"),
      fluidRow(
        column(3, fileInput("csv", h3("File input"))
        )
      ),
      fluidRow(
        column(width = 9, verbatimTextOutput("t1"))
      )
    )
    
    server <- function(input, output) {
      # data_source <- reactive({
      #   if (input$source == "csv") {
      #     data <- inputWords()
      #   }
      #   return(data)
      # })
      
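      inputWords <- reactive({
        # Wait until a file has been uploaded before doing anything
        req(input$csv)
        CSVFile <- read.csv(input$csv$datapath)
        # head() avoids NA entries when the file has fewer than 10 rows
        Urls <- as.character(head(CSVFile$url, 10))
        
        pages <- list()
        
        for (i in Urls) {
          ArticlePages <- read_html(i)
          
          # Headline and paragraph text of each article
          articleText <- ArticlePages %>% html_elements("h1.newsfull__title, p") %>% html_text()
          pages[[i]] <- articleText
        }
        # Named list: one character vector per URL
        pages
      })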
      
      output$t1 <- renderText({paste(inputWords())})
      
    }
    
    shinyApp(ui = ui, server = server)
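
  • A possible explanation for the error, sketched in plain R without Shiny. This assumes (it is not confirmed by the question) that the ui has no input with the id "source", in which case input$source is NULL, and it relies on the fact that fileInput() hands the server a data frame rather than a table. The names source_input and csv_input below are hypothetical stand-ins for input$source and input$csv.

```r
# Hypothetical stand-ins for the Shiny inputs (assumptions, not from the post):
# an input id that is missing from the ui evaluates to NULL, and a
# fileInput() value is a data frame describing the uploaded file.
source_input <- NULL
csv_input <- data.frame(
  name = "urls.csv",
  size = 1024L,
  type = "text/csv",
  datapath = tempfile(fileext = ".csv"),
  stringsAsFactors = FALSE
)

# 1. Comparing NULL with a string yields a zero-length logical vector.
condition <- source_input == "csv"
print(length(condition))

# `if` cannot branch on a zero-length condition, so it raises an error.
# tryCatch() captures the message instead of stopping the script.
err <- tryCatch(
  if (source_input == "csv") "csv selected",
  error = function(e) conditionMessage(e)
)
print(err)

# 2. A data frame is not a "table" object, so a branch guarded by
#    is.table() is never taken for an uploaded file.
print(is.table(csv_input))
print(is.data.frame(csv_input))

# 3. When no branch of an if / else if chain matches, the result is NULL.
result <- if (is.null(csv_input)) "" else if (is.table(csv_input)) "scraped text"
print(is.null(result))
```

    If this is what happens in your app, the original inputWords would return NULL for every upload, and any later `if` that tests that value (or input$source) would fail with the same message. Guarding with req(), as in the code above, stops the reactive quietly until the input actually exists.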