Cleaning web text using readLines and the tm-package in R

I am reading a webpage with the readLines function and trying to remove escape sequences (such as \r, \n and \t) and numbers from the text. I handle the escape sequences with strsplit and unlist, but I'm not sure how to remove the numbers. I was thinking of using the tm package, but I seem to be missing a format conversion. How can I convert my webpage text so that tm can remove numbers and the like, or is there an easier way of stripping the clutter from the text? I hope to read and concatenate a number of webpages, so there will be quite a bit of cleaning.

 library(rvest)
 library(tm)
 webpage <- readLines("https://www.sciencedaily.com/releases/2020/02/200219113746.htm", 
             encoding = "UCS-2LE")
 dirtytext <- unlist(strsplit(webpage,"\\r|\\n|\\t"))
 cleantext <- tm_map(dirtytext,removeNumbers)

The last line gives the error message:

Error in UseMethod("tm_map", x) : no applicable method for 'tm_map' applied to an object of class "character"

Solution

I'm not sure whether you want to include the lede, but the following returns the story paragraph by paragraph, which leaves out the non-story elements of the page, such as advertising.

library(rvest)

url <- "https://www.sciencedaily.com/releases/2020/02/200219113746.htm"

# Parse the HTML instead of reading raw lines
page <- read_html(url)

# Keep only the text of the story paragraphs
story <- page %>%
  html_nodes("div#text p") %>%  # use "div#story_text p" to include lede
  html_text()
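
As for the numbers: tm_map only has methods for corpus objects, which is why it fails on a plain character vector. For this kind of cleanup you may not need tm at all. Below is a minimal sketch in base R, assuming the text is a character vector like `story` above. The two sample strings are made up so that the snippet runs without fetching the page, and the names `no_ctrl`, `no_digits` and `cleantext` are only illustrative.

```r
# Hypothetical sample standing in for the `story` vector returned by rvest
story <- c("In 2020, researchers tested 35 samples.\r\n",
           "\tThe effect was 4.5 times larger.")

# Replace carriage returns, newlines and tabs with a single space
no_ctrl <- gsub("[\r\n\t]+", " ", story)

# Drop every run of digits
no_digits <- gsub("[[:digit:]]+", "", no_ctrl)

# Collapse repeated spaces and trim leading/trailing whitespace
cleantext <- trimws(gsub(" {2,}", " ", no_digits))

# If you still prefer tm, the vector has to be wrapped in a corpus first:
# corpus <- VCorpus(VectorSource(story))
# corpus <- tm_map(corpus, removeNumbers)
```

Note that removing digits leaves stray punctuation behind (for example the decimal point of "4.5"), so you may want a further pass if that matters for your analysis.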