Parse HTML into text at the div level in R

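library(XML)
library(xml2)  # read_html() comes from xml2, not XML
html <- read_html("https://www.sec.gov/Archives/edgar/data/1011290/000114036105007405/body.htm")
# htmlTreeParse() expects a file name or HTML text, so serialize the xml2 document first
doc.html <- htmlTreeParse(as.character(html), asText = TRUE, useInternalNodes = TRUE)
# xmlValue() returns the text of a node together with the text of all its descendants
doc.text <- unlist(xpathApply(doc.html, '//div', xmlValue))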

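The code above returns the same text twice because the divs are nested: the value of an outer div already includes the text of every div inside it. I need each piece of text only once. Thank you for your time and help. For example:

doc.text[2] # contains all the text, which is repeated in elements 3 to 59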

Solution

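Try selecting only the div elements that are direct children of the <text> element, so the divs nested inside them are not visited a second time:

library(rvest)  # re-exports read_html() and the %>% pipe, so tidyverse is not required
html <- read_html("https://www.sec.gov/Archives/edgar/data/1011290/000114036105007405/body.htm")
text <- html %>%
  html_nodes(xpath = "//text/div") %>%  # only divs directly under <text>
  html_text(trim = TRUE) %>%            # text of each div, whitespace trimmed
  paste(collapse = " ")                 # join everything into one string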
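
To see why the duplication happens, and one alternative way around it, here is a minimal sketch on a made-up miniature document rather than the SEC filing. The HTML string and the variable names are assumptions for illustration only. It uses xml2 directly and compares selecting every div with selecting only divs that contain no other div:

```r
library(xml2)

# Hypothetical miniature document with nested divs (not the SEC filing).
page <- read_html(
  "<html><body><div><div>Alpha</div><div>Beta</div></div></body></html>"
)

# Every div, including the outer wrapper: the wrapper's text
# repeats the text of the divs nested inside it.
all_divs <- xml_text(xml_find_all(page, "//div"))

# Only divs that contain no other div, so each piece of text appears once.
leaf_divs <- xml_text(xml_find_all(page, "//div[not(.//div)]"))

print(all_divs)   # "AlphaBeta" "Alpha" "Beta"
print(leaf_divs)  # "Alpha" "Beta"
```

Note that the leaf-only selection drops any text that sits directly inside a wrapper div next to its child divs, so whether it or the `//text/div` selection fits better depends on how the page is structured.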
