Search code examples
rexportquotesfoxprovisual-foxpro

Copy to without quotes


I have a large dataset in dbf file and would like to export it to the csv type file. Thanks to SO already managed to do it smoothly.

However, when I try to import it into R (the environment I work) it combines some characters together, making some rows much longer than they should be, consequently breaking the whole database. In the end, whenever I import the exported csv file I get only half of the db. Think the main problem is with quotes in string characters, but specifying quote="" in R didn't help (and it helps usually).

I've search for any question on how to deal with quotes when exporting in visual foxpro, but couldn't find the answer. Wanted to test this but my computer catches error stating that I don't have enough memory to complete my operation (probably due to the large db).

Any helps will be highly appreciated. I'm stuck with this problem on exporting from the dbf into R for long enough, searched everything I could and desperately looking for a simple solution on how to import large dbf to my R environment without any bugs.

(In R: Checked whether have problems with imported file and indeed most of columns have much longer nchars than there should be, while the number of rows halved. Read the db with read.csv("file.csv", quote="") -> didn't help. Reading with data.table::fread() returns error

Expected sep (',') but '0' ends field 88 on line 77980:

But according to verbose=T this function reads right number of rows (read.csv imports only about 1,5 mln rows)

Count of eol after first data row: 2811729 Subtracted 1 for last eol and any trailing empty lines, leaving 2811728 data rows


Solution

  • You might have to write code to do the export, rather than simply using the COPY TO ... DELIMITED command.

    SELECT thedbf
    mfld_cnt = AFIELDS(mflds)
    
    fh = FOPEN(m.filename, 1)
    
    SCAN
    
      FOR aa = 1 TO mfld_cnt
        mcurfld = 'thedbf.' + mflds[aa, 1]
        mvalue = &mcurfld
        ** Or you can use:
        mvalue = EVAL(mcurfld)
    
        ** manipulate the contents of mvalue, possibly based on the field type
        DO CASE
          CASE mflds[aa, 2] = 'D'
            mvalue = DTOC(mvalue)
    
            CASE mflds[aa, 2] $ 'CM'
              ** Replace characters that are giving you problems in R
              mvalue = STRTRAN(mvalue, ["], '')
    
            OTHERWISE
              ** Etc.
        ENDCASE
    
        = FWRITE(fh, mvalue)
        IF aa # mfld_cnt
          = FWRITE(fh, [,])
        ENDIF
    
      ENDFOR
    
      = FWRITE(fh, CHR(13) + CHR(10))
    ENDSCAN
    = FCLOSE(fh)
    

    Note that I'm using [ ] characters to delimit strings that include commas and quotation marks. That helps readability.