I have the following simple SQL server code:
set nocount on;
if OBJECT_ID('tempdb..#A') IS NOT NULL DROP TABLE #A;
set nocount on;
create table #A
( obj_id int,
obj_name varchar(50),
obj_dt datetime);
insert into #A (
obj_id,
obj_name,
obj_dt)
values
( 1
,'name'
,'2019-01-01 00:00:00'
),
( 2
,NULL
,NULL
),
( 2
,'alias'
,'2019-02-01 00:00:00'
);
set nocount on;
if OBJECT_ID('tempdb..#B') IS NOT NULL DROP TABLE #B;
set nocount on;
select
#A.obj_id
,subq.obj_name
,subq.obj_dt
into #B
from #A
join (select
obj_id,
max(obj_name) as obj_name,
max(obj_dt) as obj_dt
from #A
group by obj_id) as subq
on #A.obj_id = subq.obj_id;
set nocount on;
select * from #B;
which, as expected, returns the following data when executed in Microsoft SQL Server Management Studio:
obj_id obj_name obj_dt
1 name 2019-01-01 00:00:00.000
2 alias 2019-02-01 00:00:00.000
2 alias 2019-02-01 00:00:00.000
So far so good. Now I wish to run this code from R and have that same output returned to R. I have the query above stored in the string query
and my RODBC connection stored in the variable connection
, and attempt to retrieve the data with
sqlQuery(connection,query)
The result is character(0)
. However, if I modify the query above by commenting out the subq.obj_name
and subq.obj_dt
fields in the definition of #B
then the code successfully returns the expected dataset
obj_id
1 1
2 2
3 2
from R.
So what is going on here? Both sql queries are valid and run successfully in the Microsoft SQL server environment, but only one works when piped through R. I can't figure out what accounts for the failure of the RODBC code to handle the second query.
Okay, so I think I've figured out what is going wrong here. The subquery
select
obj_id,
max(obj_name) as obj_name,
max(obj_dt) as obj_dt
from #A
group by obj_id
produces a hidden warning. You don't see the warning if you just run the code as is, but if you store the output in a temporary table then the warning message is produced:
select
obj_id,
max(obj_name) as obj_name,
max(obj_dt) as obj_dt
into #C
from #A
group by obj_id
Warning: Null value is eliminated by an aggregate or other SET operation.
The warning is hidden when this is run as part of the subquery in the original SQL code in the question. I believe this message is somehow part of output that R is "seeing", and once R sees that output it terminates the query. But since no results have been returned yet the output in R is empty (i.e. character(0)
).
To solve this issue I coalesced the variables that I'm computing the max of to some minimal values (I'm not sure about what the minimal character is in sql server collation but '0'
worked for my purposes). The idea is to remove NULL
values before aggregation so no warning is generated. The final working SQL code is below:
set nocount on;
if OBJECT_ID('tempdb..#A') IS NOT NULL DROP TABLE #A;
set nocount on;
create table #A
( obj_id int,
obj_name varchar(50),
obj_dt datetime);
insert into #A (
obj_id,
obj_name,
obj_dt)
values
( 1
,'name'
,'2019-01-01 00:00:00'
),
( 2
,NULL
,NULL
),
( 2
,'alias'
,'2019-02-01 00:00:00'
);
set nocount on;
if OBJECT_ID('tempdb..#B') IS NOT NULL DROP TABLE #B;
set nocount on;
select
#A.obj_id
,subq.obj_name
,subq.obj_dt
into #B
from #A
join
(select
obj_id,
max(isnull(obj_name,'0')) as obj_name,
max(isnull(obj_dt,cast(-1 as datetime))) as obj_dt
from #A
group by obj_id) as subq
on #A.obj_id = subq.obj_id;
set nocount on;
select * from #B;
I believe this behavior should be addressed in the RODBC package as it is quite likely to trip up others and can be a bit tricky to track down the root cause and troubleshoot.