
Combining Several Character Objects into a Single Object


I am trying to scrape text from a news article - I am doing this as follows:

library(rvest)

url <- "https://www.bbc.com/future/article/20220823-how-auckland-worlds-most-spongy-city-tackles-floods"

final = url %>% 
  read_html() %>% 
  html_elements(".article__body-content p") %>% 
  html_text()

This seems to work, but I would like to combine the results into a single object. The current results look like this:

[1] "Tangled mats of muddy vegetation line the footpaths of Underwood Park, a narrow stripe of green winding along a creek beneath the small volcanic cone of Ōwairaka (Mt Albert) in Auckland, New Zealand. In the water, clumps of sticks and the occasional plastic bag are marooned on protruding rocks and branches."

[2] "A winter storm swept through the city overnight, dropping heavy rain, and Te Auaunga (Oakley Creek), one of the city’s longest urban streams, has overflowed its banks."

[3] "\"But that’s supposed to happen,\" says Julie Fairey, chair of the Puketāpapa local board, who is showing me around Underwood and the neighbouring Walmsley Park."

I would like to combine this text into a single object, without the quotation marks. For example:

final <- "Tangled mats of muddy vegetation line the footpaths of Underwood Park, a narrow stripe of green winding along a creek beneath the small volcanic cone of Ōwairaka (Mt Albert) in Auckland, New Zealand. In the water, clumps of sticks and the occasional plastic bag are marooned on protruding rocks and branches.
A winter storm swept through the city overnight, dropping heavy rain, and Te Auaunga (Oakley Creek), one of the city’s longest urban streams, has overflowed its banks.

But that’s supposed to happen, says Julie Fairey, chair of the Puketāpapa local board, who is showing me around Underwood and the neighbouring Walmsley Park."

When I inspected the result, I initially thought it was a list, but it is actually a character vector. Had it been a list, I could have used unlist(). Now I am not sure how to proceed.

Can someone please show me how to proceed?

Thanks!


Solution

  • html_text() returns a character vector with one element per matched paragraph. paste() with the collapse argument joins those elements into a single string.

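    library(rvest)
    library(magrittr)  # optional: rvest already re-exports %>%
    # `url` is the article URL defined in the question
    final <- url %>% 
      read_html() %>%                                # download and parse the page
      html_elements(".article__body-content p") %>%  # select the paragraph nodes
      html_text() %>%                                # character vector, one element per paragraph
      paste(collapse = "\n")                         # join into one string, one paragraph per line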
    

    Now we check the output with cat(), which prints the string itself rather than its quoted, escaped representation:

    cat(final, sep = "\n")
    Tangled mats of muddy vegetation line the footpaths of Underwood Park, a narrow stripe of green winding along a creek beneath the small volcanic cone of Ōwairaka (Mt Albert) in Auckland, New Zealand. In the water, clumps of sticks and the occasional plastic bag are marooned on protruding rocks and branches.
    A winter storm swept through the city overnight, dropping heavy rain, and Te Auaunga (Oakley Creek), one of the city’s longest urban streams, has overflowed its banks.
    "But that’s supposed to happen," says Julie Fairey, chair of the Puketāpapa local board, who is showing me around Underwood and the neighbouring Walmsley Park.
    The connected parks are designed to collect excess stormwater, soak it up like a sponge, and slowly release it back into the creek. The debris left behind is evidence this "secret infrastructure" is working, Fairey says. The two parks are flanked on both sides by public housing developments. "This stuff is designed to flood so that the houses don’t," she says.
    It wasn’t always this way, Fairey tells me, as we watch a black shag drying its wings on a rock. Less than a decade ago, the waterway was a concrete-lined culvert that ran through seldom-visited muddy fields. When it flooded, water sloshed into the surrounding suburbs. It collected engine oil, sediment and rubbish and sucked this unhealthy mixture out into the city’s famous harbour, rendering the beaches unsafe to swim.
    But in 2016, work began to free Te Auaunga from rigid concrete, and restore it to a more natural, meandering shape. Its banks are now lush with native vegetation like harakeke (flax) and tī kouka (cabbage trees), as well as reeds, ferns and other filtering wetland plants.
    The changes have increased this part of the city’s ability to absorb excess rainfall, an attribute sometimes called “sponginess”. Auckland was recently named the most spongy global city in a report by multinational architecture and design firm Arup, thanks to its geography, soil type, and urban design – but experts warn it may not lead the pack for long.
    As climate change intensifies extreme weather events worldwide, what can other cities learn from Auckland's successes – and failures?
    The connected parks around Te Auaunga creek in Auckland are designed to soak up excess stormwater like a sponge (Credit: Kate Evans)
    ....
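
The same idea can be tried without a network connection. The following is a minimal sketch, assuming a small hand-written vector (`paragraphs`, with shortened stand-in sentences) in place of the real html_text() result. It also shows one possible way to drop literal double-quote characters with gsub(), which the answer above does not do. The names `paragraphs`, `combined` and `no_quotes` are illustrative only.

```r
# Shortened stand-in for the vector returned by html_text();
# the real content comes from the scraped page.
paragraphs <- c(
  "Tangled mats of muddy vegetation line the footpaths of Underwood Park.",
  "A winter storm swept through the city overnight.",
  "\"But that is supposed to happen,\" says Julie Fairey."
)

is.list(paragraphs)       # FALSE, so unlist() would change nothing
is.character(paragraphs)  # TRUE
length(paragraphs)        # one element per paragraph

# collapse joins the elements of a single vector into one string
combined <- paste(paragraphs, collapse = "\n")
length(combined)          # a single string

# Optional: remove straight double quotes that are part of the text.
# Typographic quotes would need their own pattern.
no_quotes <- gsub("\"", "", combined, fixed = TRUE)

# cat() writes the raw text, without [1] prefixes or escaped quotes
cat(no_quotes, "\n")
```

The quotation marks and `[1]` prefixes seen when printing a character vector are part of how R displays it, not part of the data. Only quotes shown escaped as `\"` are actually stored in the string.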