I am trying to copy the results in my data frame to an Impala database, but I get an error while doing so.
library(RJDBC)
library(implyr)
drv <- JDBC("com.cloudera.impala.jdbc41.Driver","/User/ImpalaJDBC41.jar",identifier.quote="`")
conn <- dbConnect(drv, "username/password")
RJDBC::dbWriteTable(conn, 'default.segments', df)
I get the following error:
Error in .local(conn, statement, ...) :
execute JDBC update query failed in dbSendUpdate ([Cloudera][ImpalaJDBCDriver](500051) ERROR processing query/statement. Error Code: 0, SQL state: TStatus(statusCode:ERROR_STATUS, sqlState:HY000, errorMessage:AnalysisException: Syntax error in line 1:
...ents (id DOUBLE PRECISION,eventdate VARCH...
^
Encountered: IDENTIFIER
Expected: BLOCK_SIZE, COMMENT, COMPRESSION, DEFAULT, ENCODING, INTERMEDIATE, LOCATION, NOT, NULL, PRIMARY, COMMA
CAUSED BY: Exception: Syntax error
), Query: CREATE TABLE default.segments (id DOUBLE PRECISION,eventdate VARCHAR(255),segment INTEGER).)
Assuming something was wrong with the data types, I created the table myself with explicit column types and then inserted the values into it:
RJDBC::dbSendUpdate(conn, paste("CREATE TABLE default.segments (id bigint,eventdate timestamp, segment bigint)",";"))
state1 <- paste0("INSERT INTO default.segments VALUES (", apply(df, 1, function(x) paste(x, collapse = ",")), ")" )
RJDBC::dbSendUpdate(conn, state1)
This also gives me an error related to the data types:
Error in .local(conn, statement, ...) :
execute JDBC update query failed in dbSendUpdate ([Cloudera][ImpalaJDBCDriver](500051) ERROR processing query/statement. Error Code: 0, SQL state: TStatus(statusCode:ERROR_STATUS, sqlState:HY000, errorMessage:AnalysisException: Target table 'default.segments' is incompatible with source expressions.
Expression '2016 - 5 - 29' (type: BIGINT) is not compatible with column 'eventdate' (type: TIMESTAMP)
), Query: INSERT INTO default.segments VALUES ( 3,2016-05-29, 79).)
Below is the structure of my data frame:
> str(df)
'data.frame': 19065 obs. of 3 variables:
$ id: num 3 3 3 69 102 102 102 102 102 102 ...
$ eventdate: Date, format: "2016-05-29" ...
$ segment: int 79 76 76 18 11 15 7 11 7 11 ...
The last error says Expression '2016 - 5 - 29' (type: BIGINT) is not compatible with column 'eventdate' (type: TIMESTAMP), but the date column in my data frame has class Date. What could be the issue? Can someone please help?
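One way to see the problem is to print the statement that the apply() call builds for a single row instead of sending it. This sketch uses a one-row data frame, df_sample (a hypothetical name), holding the first row of df (id 3, 2016-05-29, segment 79). It suggests that apply() turns the Date into bare text without quotes, which SQL, like R itself, reads as subtraction.

```r
# One-row sample holding the first row of df (id 3, 2016-05-29, segment 79).
df_sample <- data.frame(
  id = 3,
  eventdate = as.Date("2016-05-29"),
  segment = 79L
)

# Same statement-building code as in the question, printed instead of sent.
state1 <- paste0("INSERT INTO default.segments VALUES (",
                 apply(df_sample, 1, function(x) paste(x, collapse = ",")), ")")
print(state1)

# apply() converts the data frame to a character matrix, so the Date becomes
# the bare text 2016-05-29 with no quotes. Unquoted, that is integer
# arithmetic (2016 - 5 - 29), in R as well as in Impala's SQL parser:
print(2016-05-29)
```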
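The dates must be quoted in the SQL text. Without quotes, 2016-05-29 in the INSERT statement is parsed as the integer expression 2016 - 5 - 29, which is why the error reports a BIGINT. You can transform the column before building the statements:
df$eventdate <- paste0("'", df$eventdate, "'")
or, alternatively (R >= 3.6.0; q = FALSE makes sQuote() use plain ASCII quotes, since depending on the useFancyQuotes option and the locale it may otherwise produce typographic quotes such as ‘2016-05-29’, which Impala will not accept as a string literal):
df$eventdate <- sQuote(df$eventdate, q = FALSE)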
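A minimal sketch of the same fix as a reusable helper, assuming the data frame only has Date and numeric columns (as in the question) and relying, like the answer above, on Impala converting the quoted date string to TIMESTAMP. build_impala_inserts() and to_sql_literals() are hypothetical names, not part of RJDBC or implyr. Besides quoting dates, the sketch writes NA as NULL and avoids scientific notation (R formats 100000 as 1e+05 by default), then packs rows into multi-row INSERT ... VALUES statements. As far as I can tell from RJDBC's source, dbSendUpdate() runs one SQL string per call (recent versions only use the first element of a character vector), so the statements are sent in a loop. chunk_size = 1000 is an arbitrary choice to keep each statement reasonably small.

```r
# Hypothetical helpers (not part of RJDBC or implyr): turn a data frame with
# Date and numeric columns into multi-row Impala INSERT statements.

# Render one column as SQL literal text.
to_sql_literals <- function(col) {
  out <- if (inherits(col, "Date")) {
    # Quoted ISO date; unquoted, Impala reads 2016-05-29 as 2016 - 5 - 29.
    paste0("'", format(col, format = "%Y-%m-%d"), "'")
  } else if (is.numeric(col)) {
    # Plain digits, never scientific notation such as 1e+05.
    format(col, scientific = FALSE, trim = TRUE, digits = 15)
  } else {
    stop("this sketch only handles Date and numeric columns")
  }
  out[is.na(col)] <- "NULL"  # missing values become SQL NULL
  out
}

build_impala_inserts <- function(df, table, chunk_size = 1000L) {
  cols <- lapply(unname(as.list(df)), to_sql_literals)
  # One "(v1,v2,...)" tuple per row.
  tuples <- paste0("(", do.call(paste, c(cols, sep = ",")), ")")
  # Group the tuples so each statement holds at most chunk_size rows.
  chunk_id <- ceiling(seq_along(tuples) / chunk_size)
  stmts <- vapply(split(tuples, chunk_id), function(rows) {
    paste0("INSERT INTO ", table, " VALUES ", paste(rows, collapse = ","))
  }, character(1))
  unname(stmts)
}

# Small illustrative data frame shaped like str(df) (dates partly made up).
sample_df <- data.frame(
  id = c(3, 102),
  eventdate = as.Date(c("2016-05-29", "2016-05-30")),
  segment = c(79L, 11L)
)
print(build_impala_inserts(sample_df, "default.segments"))

# Against a live connection (not run here):
# for (stmt in build_impala_inserts(df, "default.segments")) {
#   RJDBC::dbSendUpdate(conn, stmt)
# }
```

Batching rows this way also means far fewer round trips than one INSERT per row for the 19065 rows in df.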