How to extract model parameters from a table?


I am trying to extract the different parameters from a table and place each one in its own column. I have tried, but it did not work.

This is the example:

  Models = c("ARIMA(1,0,10)(80,0,90)[12] with non-zero mean",
           "ARIMA(2,0,11) with non-zero mean",
           "ARIMA(3,0,12)(81,0,91)[12] with non-zero mean",
           "ARIMA(4,0,13)(82,0,92)[12] with non-zero mean",
           "ARIMA(5,0,14) with zero mean",
           "ARIMA(6,0,15) with non-zero mean")

  Models = as.data.frame(Models)

I need to separate each parameter into a different column. The idea is to separate them as follows:

   Name p d  q   P   D   Q  PERIOD  MEAN
1 ARIMA 1 0 10  80   0  90   12     with non-zero mean
2 ARIMA 2 0 11 N/a N/a N/a   N/a    with non-zero mean
3 ARIMA 3 0 12  81   0  91   12     with non-zero mean
4 ARIMA 4 0 13  82   0  92   12     with non-zero mean
5 ARIMA 5 0 14 N/a N/a N/a   N/a    with zero mean
6 ARIMA 6 0 15 N/a N/a N/a   N/a    with non-zero mean

Is there a way to separate them automatically? I am new to working with R. I have researched this but cannot find a solution.

Note: the models in the example are not real; they only serve to identify the parameters.


Solution

  • I'm not sure how this performs on your original data set, but it seems to work fine on the example:

    library(dplyr)
    library(stringr)
    library(tidyr)
    
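    Models %>%
      as_tibble() %>%
      # the data frame column is named Models; the steps below refer to it as value
      rename(value = Models) %>%
      # Mean: the text after the first space (assumes it contains no letter "d")
      mutate(Mean = str_extract(value, "(?<=\\s)[^d]+"),
             # drop that text, turn brackets and commas into spaces, remove [ and ]
             value = gsub("\\s[^d]+", "", value), 
             value = gsub("[)(,]", " ", value, perl = TRUE),
             value = gsub("[\\[\\]]", "", value, perl = TRUE)) %>%
      separate(value, into = c("Name", "p", "d", "q", "P", "D", "Q", "Period"), sep = "\\s+") %>%
      # non-seasonal models leave an empty string behind; make it NA
      mutate(across(p:Q, ~ replace(., . == (""), NA)))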
    
    # A tibble: 6 x 9
      Name  p     d     q     P     D     Q     Period Mean              
      <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>  <chr>             
    1 ARIMA 1     0     10    80    0     90    12     with non-zero mean
    2 ARIMA 2     0     11    NA    NA    NA    NA     with non-zero mean
    3 ARIMA 3     0     12    81    0     91    12     with non-zero mean
    4 ARIMA 4     0     13    82    0     92    12     with non-zero mean
    5 ARIMA 5     0     14    NA    NA    NA    NA     with zero mean    
    6 ARIMA 6     0     15    NA    NA    NA    NA     with non-zero mean
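
As an alternative, here is a minimal base R sketch that parses each string with one regular expression instead of several gsub() steps, and stores the parameters as integers rather than characters. It assumes every string follows the layout shown in the question: Name(p,d,q), an optional (P,D,Q)[Period], then the mean description. The names models, pattern, parsed and num_cols are chosen here for illustration only.

```r
# Example strings copied from the question.
models <- c("ARIMA(1,0,10)(80,0,90)[12] with non-zero mean",
            "ARIMA(2,0,11) with non-zero mean",
            "ARIMA(3,0,12)(81,0,91)[12] with non-zero mean",
            "ARIMA(4,0,13)(82,0,92)[12] with non-zero mean",
            "ARIMA(5,0,14) with zero mean",
            "ARIMA(6,0,15) with non-zero mean")

# One pattern for the whole string; the seasonal part "(P,D,Q)[Period]" is optional.
pattern <- paste0(
  "^([A-Za-z]+)",                        # Name
  "\\(([0-9]+),([0-9]+),([0-9]+)\\)",    # (p,d,q)
  "(\\(([0-9]+),([0-9]+),([0-9]+)\\)",   # optional (P,D,Q) ...
  "\\[([0-9]+)\\])?",                    # ... followed by [Period]
  " +(.*)$"                              # Mean description
)

# regexec() reports the full match plus one entry per capture group,
# so each string becomes one row of an 11-column character matrix.
m <- do.call(rbind, regmatches(models, regexec(pattern, models)))

# Column 1 is the full match and column 6 the whole optional seasonal group; drop both.
parsed <- as.data.frame(m[, c(2:5, 7:11)], stringsAsFactors = FALSE)
names(parsed) <- c("Name", "p", "d", "q", "P", "D", "Q", "Period", "Mean")

# Unmatched optional groups come back as ""; turn them into NA,
# then convert the parameter columns to integer.
num_cols <- c("p", "d", "q", "P", "D", "Q", "Period")
parsed[num_cols] <- lapply(parsed[num_cols], function(x) {
  x[x %in% ""] <- NA
  as.integer(x)
})

parsed
```

A string that does not follow this layout returns an empty match and will not produce a proper row, so on real data it is worth comparing nrow(parsed) with length(models) before relying on the result.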