typescript, papaparse

Parse CSV and first field gets quoted


I am trying Papa Parse on a large tab-delimited CSV file.

My code looks like this:

const fs = require('fs');
const papa = require('papaparse');
const csvFile = fs.createReadStream('mylargefile.csv');
papa.parse(csvFile, {
  header: true,
  delimiter: "\t",
  dynamicTyping: true,
  quoteChar: '"',
  escapeChar: '"',
  step: function(row:any) {
    console.log("Row:", row.data);
  },
  complete: function() {
    console.log("All done!");
  }
});

The file looks like:

vehicle_id  id_101  id_102  id_104  id_120  id_103  id_113  id_106  id_108  id_109  id_107  id_26801    id_117  id_111  id_128  id_112  id_129  id_130  id_302  id_131  id_403  id_402  id_404  id_405  id_114  id_602  id_605  id_603  id_606  id_604  id_607  id_608  id_609  id_6502 id_20602    id_7403 id_15303    id_15304    id_8702 id_7502 id_702  id_901  id_931  id_951  id_902  id_903  id_904  id_905  id_906  id_132  id_110  id_115  id_125  id_116  id_502  id_121  id_122  id_123  id_28101    id_28001    id_301  id_303  id_304  id_316  id_318  id_319  id_317  id_305  id_306  id_307  id_308  id_309  id_310  id_311  id_312  id_313  id_314  id_315  id_100901   id_100902   id_100903   id_100904   id_100905   id_100906   id_174  id_176
24621920240201  246219  GR  20240201    20240201    20240430    99991231    4   2024    S   233626      MERCEDES-BENZ   MERCEDES    Mercedes-Benz   SPRINTER    Sprinter    Kastenwagen 2.0 CDI 84KW 311 PRO LWB HRF    311 CDI A3 HT Pro Skåp  -   300 PRO Pro B1  -   4   4   PV  PV  R   HI  LO  L   R   M   2   84  114 D   T   2   SEK SEK SEK 568875  455100  568875  568875  568875  -   D   20180901    20180901    V   90763513SPP0001-0       20240430            MERCEDES SPRINTER   S   na  20240201    -           W           3                       24621924            EUR 48845.57    39076.45    48845.57    48845.57    48845.57    -13 -74
32604520091019  326045  AB  20091019    20091019    20091208    99991231    4   2010    S   326045      GEELY GROUP VOLVO   VOLVO   S40 S40 -   2.4 140 AUTO    2.4 140 Auto    -   -   -   B1  -   4   4   SA  SA          SH  L   F   A   2.4 103 140 U       5   SEK         234900  187920  237900  237900  237900  -   B   20060515    20060515    C   -       20091208    -   OTH VOLVO V40   S   20091019    20091023    -           I           2               2.4 140     32604502            EUR 22381.66    17905.33    22667.51    22667.51    22667.51    -5  -63
32618020240111  326180  GR  20240111    20240111    20240412    99991231    4   2023    S   335826      SUBARU  SUBARU  Subaru  OUTBACK Outback -   2.5I ADVENTURE AUTO 4WD 2.5i Adventure  -   ADVENTURE   Adventure   B1  -   5   5   ES  ES          SH  L   4   A   2.5 124 169 U       5   SEK SEK SEK 419900  335920  419900  419900  419900  -   J   20210506    20210506    C           20240412            SUBARU LEGACY   S   na  20240111    -           CW          6                       32618019            EUR 36397.52    29118.02    36397.52    36397.52    36397.52    -5  -63

Everything looks good except for the first field, vehicle_id, whose key is printed in single quotes.

Console output:

Row: {
  'vehicle_id': 843104120240527,
  id_101: 8431041,
  id_102: 'GR',
  id_104: 20240527,
  id_120: 20240527,
  id_103: 20240531,
  id_113: 99991231,
  id_106: 'P',
  id_108: 2024,
  id_109: 'S',
  id_107: 305384,
  id_26801: null,
  id_117: 'TOYOTA',
  id_111: 'TOYOTA',
  id_128: 'Toyota',
  id_112: 'PROACE CITY',
  id_129: 'Proace City',

How can I get rid of the single quotes for 'vehicle_id'?


Solution

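  • It looks like a UTF-8 BOM at the start of the file was causing it. The invisible BOM character ends up as part of the first header name, so the key is no longer a plain identifier and Node prints it in quotes. I used this small util to fix it: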

    var bomstrip = require('bomstrip');
    const csvFile = fs.createReadStream('mylargefile.csv').pipe(new bomstrip());
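
If adding a dependency is not an option, the same idea can be sketched by hand. The snippet below is only an illustration, assuming the BOM is the single character U+FEFF at the very start of the decoded text; `stripBom` is a hypothetical helper name, not part of Papa Parse or bomstrip.

```typescript
// Hypothetical helper (not part of Papa Parse or bomstrip):
// removes a leading byte order mark (U+FEFF) from a string, if present.
function stripBom(text: string): string {
  return text.charCodeAt(0) === 0xfeff ? text.slice(1) : text;
}

// The first header name as it would be read from a file that starts with a BOM.
const rawHeader = "\uFEFFvehicle_id";
const cleanHeader = stripBom(rawHeader);

console.log(rawHeader === "vehicle_id");   // false: the invisible BOM is part of the key
console.log(cleanHeader === "vehicle_id"); // true
console.log(rawHeader.length, cleanHeader.length); // 11 10
```

If your Papa Parse version supports the `transformHeader` option, passing `transformHeader: stripBom` in the config might be one way to apply this to header names only. I have not tested that against this file.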