
Convert data frame from character to numeric


I have the following data frame with one column, which is currently stored as a character column:

[Screenshot of the data frame]

I am trying to separate the text, but it seems like the separate() function doesn't work on character columns.

I tried to convert the column using the following code. Neither attempt works for me.

First try:

Overview_10_K_filings_df$Overview_10_K_filings <- as.numeric(as.character(Overview_10_K_filings_df$Overview_10_K_filings))

This produces the warning: "Warning message: NAs introduced by coercion"

Second try:

Overview_10_K_filings_df[1] <- apply(Overview_10_K_filings_df[1], 2,
                                     function(x) as.numeric(as.character(x)))

Can you help me to transform the column? Or is there any other way that I can separate the content? Thanks!


Solution

  • Build a data frame from the string and apply str_replace in three steps. This may not be the most concise way to reach the goal; the intermediate columns are kept in the data frame to show how each replacement progresses.

    library(tidyverse)
    
    t <- "QTR4/20151229_10-K_edgar_data_1230058_0000892626-15-000373.txt"
    t |> as.data.frame() |> 
      # step 1: replace the "/" after the quarter
      mutate(new1 = stringr::str_replace(t, '/', ' | ')) |> 
      # step 2: replace every "_"
      mutate(new2 = stringr::str_replace_all(new1, '_', ' | ')) |> 
      # step 3: replace the ".txt" extension (dot escaped so it is not a regex wildcard)
      mutate(new3 = stringr::str_replace(new2, '\\.txt$', ' | txt')) |> 
      pull(new3)
    #> [1] "QTR4 | 20151229 | 10-K | edgar | data | 1230058 | 0000892626-15-000373 | txt"
    

    Better: use a single regular expression that matches all three separators at once:

    b <- "_|/|\\."
    stringr::str_replace_all(t, b, ' | ')
    # [1] "QTR4 | 20151229 | 10-K | edgar | data | 1230058 | 0000892626-15-000373 | txt"
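
The same separator pattern can also be passed to tidyr::separate(), which does accept character columns. The as.numeric() attempts in the question return NA because the whole string is not a number, so the column has to be split first. The following is a minimal sketch, not a definitive solution: the one-row data frame is a stand-in for the real data, and the names given in `into` are illustrative guesses at what each piece means.

```r
library(tidyr)

# Stand-in for the data frame in the question (assumption: one character column)
Overview_10_K_filings_df <- data.frame(
  Overview_10_K_filings = "QTR4/20151229_10-K_edgar_data_1230058_0000892626-15-000373.txt",
  stringsAsFactors = FALSE
)

# Split on "_", "/" or a literal "."; the names in `into` are hypothetical labels
parts <- separate(
  Overview_10_K_filings_df,
  col  = Overview_10_K_filings,
  into = c("quarter", "date", "form", "part1", "part2", "id", "accession", "ext"),
  sep  = "_|/|\\."
)

# Convert only the piece that is purely numeric
parts$id <- as.numeric(parts$id)
```

If some rows contain a different number of separators, separate() fills or drops pieces with a warning, so it is worth checking the result for NA values.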