Search code examples
rpostgresqldplyramazon-redshiftr-dbi

Filter table from redshift database using R dplyr


I have a table saved in AWS redshift that has lots of rows and I want to collect only a subset of them using a "user_id" column. I am trying to use R with the dplyr library to accomplish this (see below).

conn_dplyr <- src_postgres('dev',
                       host = '****',
                       port = ****,
                       user = "****", 
                       password = "****")
 df <- tbl(conn_dplyr, "redshift_table")

However, when I try to subset over a collection of user ids it fails (see below). Can someone help me understand how I might be able to collect the data table over a collection of user id elements? The individual calls work, but when I combine them both it fails. In this case there are only 2 user ids, but in general it could be hundreds or thousands, so I don't want to do each one individually. Thanks for your help.

df_subset1 <- filter(df, user_id=="2239257806")
df_subset1 <- collect(df_subset1)

df_subset2 <- filter(df, user_id=="22159960")
df_subset2 <- collect(df_subset2)

df_subset_both <- filter(df, user_id==c("2239257806", "22159960"))
df_subset_both <- collect(df_subset_both)

Error in postgresqlExecStatement(conn, statement, ...) : 
RS-DBI driver: (could not Retrieve the result : ERROR:  operator does not     exist: character varying = record
HINT:  No operator matches the given name and argument type(s). You may need to add explicit type casts.
)

Solution

  • Try this:

    df_subset_both <- filter(df, user_id %in% c("2239257806", "22159960"))