I am trying out Papa Parse on a large tab-delimited CSV file.
My code looks like this:
const fs = require('fs');
const papa = require('papaparse');
const csvFile = fs.createReadStream('mylargefile.csv');
papa.parse(csvFile, {
header: true,
delimiter: "\t",
dynamicTyping: true,
quoteChar: '"',
escapeChar: '"',
step: function(row:any) {
console.log("Row:", row.data);
},
complete: function() {
console.log("All done!");
}
});
The file looks like:
vehicle_id id_101 id_102 id_104 id_120 id_103 id_113 id_106 id_108 id_109 id_107 id_26801 id_117 id_111 id_128 id_112 id_129 id_130 id_302 id_131 id_403 id_402 id_404 id_405 id_114 id_602 id_605 id_603 id_606 id_604 id_607 id_608 id_609 id_6502 id_20602 id_7403 id_15303 id_15304 id_8702 id_7502 id_702 id_901 id_931 id_951 id_902 id_903 id_904 id_905 id_906 id_132 id_110 id_115 id_125 id_116 id_502 id_121 id_122 id_123 id_28101 id_28001 id_301 id_303 id_304 id_316 id_318 id_319 id_317 id_305 id_306 id_307 id_308 id_309 id_310 id_311 id_312 id_313 id_314 id_315 id_100901 id_100902 id_100903 id_100904 id_100905 id_100906 id_174 id_176
24621920240201 246219 GR 20240201 20240201 20240430 99991231 4 2024 S 233626 MERCEDES-BENZ MERCEDES Mercedes-Benz SPRINTER Sprinter Kastenwagen 2.0 CDI 84KW 311 PRO LWB HRF 311 CDI A3 HT Pro Skåp - 300 PRO Pro B1 - 4 4 PV PV R HI LO L R M 2 84 114 D T 2 SEK SEK SEK 568875 455100 568875 568875 568875 - D 20180901 20180901 V 90763513SPP0001-0 20240430 MERCEDES SPRINTER S na 20240201 - W 3 24621924 EUR 48845.57 39076.45 48845.57 48845.57 48845.57 -13 -74
32604520091019 326045 AB 20091019 20091019 20091208 99991231 4 2010 S 326045 GEELY GROUP VOLVO VOLVO S40 S40 - 2.4 140 AUTO 2.4 140 Auto - - - B1 - 4 4 SA SA SH L F A 2.4 103 140 U 5 SEK 234900 187920 237900 237900 237900 - B 20060515 20060515 C - 20091208 - OTH VOLVO V40 S 20091019 20091023 - I 2 2.4 140 32604502 EUR 22381.66 17905.33 22667.51 22667.51 22667.51 -5 -63
32618020240111 326180 GR 20240111 20240111 20240412 99991231 4 2023 S 335826 SUBARU SUBARU Subaru OUTBACK Outback - 2.5I ADVENTURE AUTO 4WD 2.5i Adventure - ADVENTURE Adventure B1 - 5 5 ES ES SH L 4 A 2.5 124 169 U 5 SEK SEK SEK 419900 335920 419900 419900 419900 - J 20210506 20210506 C 20240412 SUBARU LEGACY S na 20240111 - CW 6 32618019 EUR 36397.52 29118.02 36397.52 36397.52 36397.52 -5 -63
Everything looks good except for the first field, vehicle_id, whose key is printed in single quotes.
Console output:
Row: {
'vehicle_id': 843104120240527,
id_101: 8431041,
id_102: 'GR',
id_104: 20240527,
id_120: 20240527,
id_103: 20240531,
id_113: 99991231,
id_106: 'P',
id_108: 2024,
id_109: 'S',
id_107: 305384,
id_26801: null,
id_117: 'TOYOTA',
id_111: 'TOYOTA',
id_128: 'Toyota',
id_112: 'PROACE CITY',
id_129: 'Proace City',
How can I get rid of the single quotes for 'vehicle_id'?
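It looks like a UTF-8 BOM was causing it. The file starts with the invisible U+FEFF character, so the first header name is really "\uFEFFvehicle_id". That is not a plain identifier, which is why Node prints the key in quotes. I used this small utility to fix it: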
var bomstrip = require('bomstrip');
const csvFile = fs.createReadStream('mylargefile.csv').pipe(new bomstrip());
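If you would rather not add a dependency, the same idea can be sketched with Node's built-in stream module. This is only a minimal sketch, assuming the file is UTF-8 encoded; StripBomStream is a hypothetical name of my own, not part of Papa Parse or bomstrip. It holds back the first three bytes, drops them if they are the BOM (EF BB BF), and passes everything else through untouched.

```javascript
const { Transform } = require('stream');

// UTF-8 encoding of the byte order mark (U+FEFF).
const UTF8_BOM = Buffer.from([0xef, 0xbb, 0xbf]);

// Hypothetical helper (not part of Papa Parse or bomstrip): a Transform
// stream that drops a leading UTF-8 BOM and forwards all other bytes.
class StripBomStream extends Transform {
  constructor(options) {
    super(options);
    this.pending = Buffer.alloc(0); // bytes held until the BOM check is done
    this.checked = false;
  }

  _transform(chunk, encoding, callback) {
    if (this.checked) {
      // BOM check already happened: pass the chunk through unchanged.
      callback(null, chunk);
      return;
    }
    this.pending = Buffer.concat([this.pending, chunk]);
    if (this.pending.length < UTF8_BOM.length) {
      // Not enough bytes yet to decide; wait for the next chunk.
      callback();
      return;
    }
    this.checked = true;
    const hasBom = this.pending.subarray(0, UTF8_BOM.length).equals(UTF8_BOM);
    const out = hasBom ? this.pending.subarray(UTF8_BOM.length) : this.pending;
    this.pending = null;
    callback(null, out);
  }

  _flush(callback) {
    // Input shorter than three bytes: emit whatever was held back.
    if (!this.checked && this.pending.length > 0) {
      this.push(this.pending);
    }
    callback();
  }
}
```

It would be used the same way as bomstrip above: fs.createReadStream('mylargefile.csv').pipe(new StripBomStream()), with the result passed to papa.parse.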