
Blob fields in SAS get truncated


I have been working on a SAS job that extracts a table from SQL Server and then loads that table into an Oracle table.

One of the fields in SQL Server is a blob, and the values can be as large as 1 GB. I am getting length warnings when I run the job, the blobs in the Oracle table appear to be truncated, and as a result the files stored there are corrupt.

I have seen SAS documentation stating that a character variable can be at most 32K, but SAS also states that it can access blobs of up to 2 GB.

How can I move these blobs without truncating them? Here is the PROC SQL view I am using:

proc sql;
create view work.W2K3NU8 as
  select
     ID,
     DNUMBER,
     FILENAME,
     FILE   
        format = $HEX2048.
        informat = $HEX2048.,
     (input(compress(DATEENTERED),YYMMDD10.)) as DATEENTERED length = 8
        format = date.
        informat = date.
        label = 'DATEENTERED',
     (input(compress(DATEADDED),YYMMDD10.)) as DATEADDED length = 8
        format = date.
        informat = date.
        label = 'DATEADDED',
     (input(compress(DATECHANGED),YYMMDD10.)) as DATECHANGED length = 8
        format = date.
        informat = date.
        label = 'DATECHANGED',
     TYPE
from &SYSLAST;
quit;

And here is the data step:

      data trd.GAFILES
          (dbnull = (
                     ID = NO
                     DNUMBER = YES
                     FILENAME = YES
                     GA_FILE = YES
                     DATEENTERED = YES
                     DATAADDED = YES
                     DATECHANGED = YES
                     TYPE = YES
                     ETL_CREATE = YES
                     ETL_UPDATE = YES));
     attrib ID length = $255
        format = $255.
        informat = $255.
        label = 'ID'; 
     attrib DNUMBER length = $10
        format = $10.
        informat = $10.
        label = 'DNUMBER'; 
     attrib FILENAME length = $255
        format = $255.
        informat = $255.
        label = 'FILENAME'; 
     attrib GA_FILE length = $4096
        format = $HEX2048.
        informat = $HEX2048.
        label = 'GA_FILE'; 
     attrib DATEENTERED length = 8
        format = DATETIME20.
        informat = DATETIME20.
        label = 'DATEENTERED'; 
     attrib DATAADDED length = 8
        format = DATETIME20.
        informat = DATETIME20.
        label = 'DATAADDED'; 
     attrib DATECHANGED length = 8
        format = DATETIME20.
        informat = DATETIME20.
        label = 'DATECHANGED'; 
     attrib TYPE length = $100
        format = $100.
        informat = $100.
        label = 'TYPE'; 
     attrib ETL_CREATE length = 8
        format = DATETIME20.
        informat = DATETIME20.
        label = 'ETL_CREATE'; 
     attrib ETL_UPDATE length = 8
        format = DATETIME20.
        informat = DATETIME20.
        label = 'ETL_UPDATE'; 
     call missing(of _all_);
     stop;
  run;

Solution

  • SAS datasets don't support character variables longer than 32,767 bytes. I'm not sure where you saw that SAS supports more; you're probably reading the SAS/ACCESS reference, which describes the data types of the DBMSs themselves (i.e., the DB2 section describes BLOBs and CLOBs as allowing up to 2 GB, but that's what DB2 supports, not what SAS supports). A quick demonstration of the limit is included at the end of this answer.

    SAS will happily access BLOBs, but it won't take more than 32,767 bytes from one. You would have to read the blob in chunks, or use DBMS-specific language in a pass-through session (which would hand the value along without SAS touching it). You can read in chunks like so (fill in the appropriate substring function and connection information; a more concrete sketch follows at the end of this answer):

    proc sql;
    connect to <>;
    create table SASTBL as 
      select * from connection to <> (
       select substring_Function(blobfield,1,32767)     as blob_1,
              substring_Function(blobfield,32768,32767) as blob_2,
              substring_Function(blobfield,65535,32767) as blob_3,
              (... etc ... )
       from your_tbl
      );
    quit;
    

    If you are on 9.4, you might also be able to use FedSQL to do the conversion; I'm not very familiar with FedSQL, but its purpose is to support more data types than Base SAS does. The documentation doesn't explicitly say that it supports BLOBs (the notes on BLOB consistently say 'mapped to a similar data type', which presumably means char or varchar), but it might be worth a shot if you have 9.4.
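
    First, a minimal sketch that confirms the 32,767-byte ceiling mentioned above (the variable names are just illustrative):

    data _null_;
       length biggest $ 32767;        /* 32,767 bytes: the widest character variable SAS allows */
       declared = vlength(biggest);   /* VLENGTH returns the declared length                    */
       put declared=;                 /* writes declared=32767 to the log                       */
       /* length toobig $ 32768;         uncommenting this fails: 32,767 is the hard limit      */
    run;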
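
    And here is a more concrete version of the chunked read, as a minimal sketch only: it assumes the SQL Server source is reachable through SAS/ACCESS Interface to ODBC, and the DSN, credentials, schema (dbo.GAFILES), and column names are hypothetical stand-ins based on the question. SUBSTRING is SQL Server's substring function and works on VARBINARY columns; each slice fits into one 32,767-byte SAS character variable.

    proc sql;
       /* Hypothetical connection details -- substitute your own DSN and credentials. */
       connect to odbc as mssql (datasrc="SQLSRV_DSN" user="etl_user" password="XXXXXXXX");

       create table work.gafiles_chunks as
          select * from connection to mssql (
             select ID,
                    FILENAME,
                    substring([FILE], 1,     32767) as blob_1,   /* bytes 1-32767     */
                    substring([FILE], 32768, 32767) as blob_2,   /* bytes 32768-65534 */
                    substring([FILE], 65535, 32767) as blob_3    /* bytes 65535-98301 */
             from dbo.GAFILES
          );

       disconnect from mssql;
    quit;

    Note that three slices only cover the first ~96 KB; a blob approaching 1 GB would need tens of thousands of slices, so in practice you would probably generate the slice list with a macro loop, or split the blob across rows instead of columns.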