Search code examples
rrsqlite

Keep only first word in Column RSQLite


I am working on tranferring SQL code over to R.

I have a dataset and want to extract the first word from a column in a new column.

Dataset

 df

        Col_A    ColB
  ABC JEn JDn       9
  nemmdn mend      10
pwek ne me ne      12

Expected Output:

        Col_A    ColB    Col_A_Abbreviation
  ABC JEn JDn       9                   ABC
  nemmdn mend      10                nemmdn
pwek ne me ne      12                  pwek

Below is what I have currently:

library(DBI)
library(RSQLite)
mydb <- dbConnect(RSQLite::SQLite(), "")
dbWriteTable(mydb, "df_1", df)

dbGetQuery(mydb, 
'SELECT IF(INSTRB(Col_A, \' \', 1, 1) >0, 
SUBSTR(Col_A, 1, INSTRB(Col_A, \' \', 1, 1) - 1), 
Col_A) AS Col_A_Abbreviation FROM df_1')

Here is my error message:

Error in result_create(conn@ptr, statement) : no such function: IF

Does RSQLite have limitations on what functions it can use?


Solution

  • 1) sqldf/RSQLite If you are using the SQLite backend then you can only use functions supported by RSQLite. Try this where df is defined reproducibly in the Note at the end.

    library(sqldf)
    sqldf("select 
        *, 
        substr([Col_A], 1, instr([Col_A], ' ') - 1) FirstWord 
      from df")
    

    giving:

              Col_A ColB FirstWord
    1   ABC JEn JDn    9       ABC
    2   nemmdn mend   10    nemmdn
    3 pwek ne me ne   12      pwek
    

    1a) If there can be entries with only one word then append a space to the end to ensure that there is always at least one space to match.

    sqldf("select 
        *, 
        substr([Col_A], 1, instr([Col_A] || ' ', ' ')-1) FirstWord 
      from df")
    

    2) Base R

    In base R remove the first space and everything after it:

    transform(df, FirstWord = sub(" .*", "", Col_A))
    

    Note

    df <-
    structure(list(`Col_A` = c("ABC JEn JDn", "nemmdn mend", "pwek ne me ne"
    ), `ColB` = c(9L, 10L, 12L)), class = "data.frame", row.names = c(NA, 
    -3L))