I am working on tranferring SQL code over to R.
I have a dataset and want to extract the first word from a column in a new column.
Dataset
df
Col_A ColB
ABC JEn JDn 9
nemmdn mend 10
pwek ne me ne 12
Expected Output:
Col_A ColB Col_A_Abbreviation
ABC JEn JDn 9 ABC
nemmdn mend 10 nemmdn
pwek ne me ne 12 pwek
Below is what I have currently:
library(DBI)
library(RSQLite)
mydb <- dbConnect(RSQLite::SQLite(), "")
dbWriteTable(mydb, "df_1", df)
dbGetQuery(mydb,
'SELECT IF(INSTRB(Col_A, \' \', 1, 1) >0,
SUBSTR(Col_A, 1, INSTRB(Col_A, \' \', 1, 1) - 1),
Col_A) AS Col_A_Abbreviation FROM df_1')
Here is my error message:
Error in result_create(conn@ptr, statement) : no such function: IF
Does RSQLite have limitations on what functions it can use?
1) sqldf/RSQLite If you are using the SQLite backend then you can only use functions supported by RSQLite. Try this where df
is defined reproducibly in the Note at the end.
library(sqldf)
sqldf("select
*,
substr([Col_A], 1, instr([Col_A], ' ') - 1) FirstWord
from df")
giving:
Col_A ColB FirstWord
1 ABC JEn JDn 9 ABC
2 nemmdn mend 10 nemmdn
3 pwek ne me ne 12 pwek
1a) If there can be entries with only one word then append a space to the end to ensure that there is always at least one space to match.
sqldf("select
*,
substr([Col_A], 1, instr([Col_A] || ' ', ' ')-1) FirstWord
from df")
2) Base R
In base R remove the first space and everything after it:
transform(df, FirstWord = sub(" .*", "", Col_A))
df <-
structure(list(`Col_A` = c("ABC JEn JDn", "nemmdn mend", "pwek ne me ne"
), `ColB` = c(9L, 10L, 12L)), class = "data.frame", row.names = c(NA,
-3L))