Tags: javascript, node.js, encoding, papaparse

Papa Parse: Parsed JSON key doesn't match expected string


Background: I'm uploading a CSV file and a mapping file to my server with FormData, then parsing the CSV with Papa Parse.

For some reason, the object Papa Parse outputs (which looks correct in console.log) cannot be indexed with ordinary string keys: the lookup returns undefined. I've even tried running JSON.parse(JSON.stringify(...)) on both my string and the object to see whether that would normalize them somehow.
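The JSON round trip cannot help here, because JSON preserves every character of a key, including invisible ones. A minimal sketch, assuming the key carries a leading invisible U+FEFF character; the row contents are made up for illustration:

```typescript
// Hypothetical row whose header key starts with an invisible U+FEFF.
const row: Record<string, string> = { '\uFEFFEmails': 'a@example.com' }

// JSON.parse reads back exactly what JSON.stringify wrote,
// so the invisible character survives the round trip.
const roundTripped: Record<string, string> = JSON.parse(JSON.stringify(row))

const key = Object.keys(roundTripped)[0]
console.log(key === 'Emails')       // false
console.log(key.length)             // 7 ("Emails" itself is 6 characters)
console.log(roundTripped['Emails']) // undefined
```

Both strings print as `Emails`, which is why console.log alone does not reveal the difference.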


import Papa from 'papaparse'
import formidable from 'formidable'
import fs from 'fs'
import type { NextApiRequest } from 'next'
import type { SoftLead } from '@prisma/client'

...

const { files, fields } = await parseRequestForm(req)


let parsedMapping: Record<string, string> = JSON.parse(fields.mapping as string)

const f = files.file as formidable.File

const output = await new Promise<{ loadedCount: number; totalCount: number }>(
  (resolve) => {
    const filecontent = fs.createReadStream(f.path)
    filecontent.setEncoding('utf8')

    let loadedCount = 0
    let totalCount = 0

    Papa.parse<Record<string, any>>(filecontent, {
      header: true,
      skipEmptyLines: true,
      dynamicTyping: true,
      chunkSize: 25,
      encoding: 'utf8',

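      // Papa Parse does not wait for this async callback, so `complete`
      // may fire before the last createMany has resolved.
      chunk: async (out) => {
        const data = out.data.map((r) => applyMapping(r, parsedMapping))

        totalCount += data.length

        try {
          const x = await prisma.softLead.createMany({ data })
          loadedCount += x.count
        } catch (e) {
          // Log instead of silently swallowing failed inserts
          console.error('createMany failed', e)
        }
      },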

      complete: () => resolve({ loadedCount, totalCount }),
    })
  }
)


type ParsedForm = {
  error: Error | string
  fields: formidable.Fields
  files: formidable.Files
}

function parseRequestForm(req: NextApiRequest): Promise<ParsedForm> {
  const form = formidable({ encoding: 'utf8' })

  return new Promise((resolve, reject) => {
    form.parse(req, (err, fields, files) => {
      // Return on error so resolve is not also called after reject
      if (err) return reject(err)

      resolve({ error: err, fields, files })
    })
  })
}

function applyMapping(
  data: Record<string, any>,
  mapping: Record<keyof SoftLead, string>
): Partial<SoftLead> {
  return Object.fromEntries(
    Object.entries(mapping).map(([leadField, csvField]) => {

      // Struggling to access field here

      console.log('Field', `"${csvField}"`)
      console.log('Data', data)

      const parsed = JSON.parse(JSON.stringify(data))

      console.log(Buffer.from(Object.keys(parsed)[0]))
      console.log(Buffer.from(Buffer.from(csvField).toString('utf8')))
      
      console.log(parsed[csvField]) // undefined

      return [leadField, data[csvField]]
    })
  )
}

The Buffer lines also show that the strings differ, even though they print identically to the console.

Papa Parse's key

  • Buffer.from(Object.keys(parsed)[0]) => <Buffer ef bb bf 45 6d 61 69 6c 73>

Mapping object's key

  • Buffer.from(Buffer.from(csvField).toString('utf8')) => <Buffer 45 6d 61 69 6c 73>

A normal string

  • Buffer.from(Buffer.from('Emails').toString('utf-8')) => <Buffer 45 6d 61 69 6c 73>
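The three extra bytes `ef bb bf` at the start of Papa Parse's key are the UTF-8 encoding of U+FEFF, the byte order mark (BOM). Node's `'utf8'` decoding (as used by `Buffer#toString('utf8')` and `setEncoding('utf8')`) keeps it as an ordinary character, so it ends up glued to the first header name. A small check, with a hypothetical `codePoints` helper that makes invisible characters visible when debugging keys:

```typescript
import { Buffer } from 'node:buffer'

// U+FEFF (byte order mark) encodes to the three bytes EF BB BF in UTF-8.
const bomBytes = Buffer.from('\uFEFF', 'utf8')
console.log(bomBytes) // <Buffer ef bb bf>

// Decoding the key's bytes with 'utf8' keeps the BOM as a character.
const decoded = Buffer.from([0xef, 0xbb, 0xbf, 0x45, 0x6d, 0x61, 0x69, 0x6c, 0x73]).toString('utf8')
console.log(decoded === 'Emails')       // false
console.log(decoded === '\uFEFFEmails') // true

// Hypothetical debugging helper: list each code point of a string as U+XXXX.
function codePoints(s: string): string[] {
  return Array.from(s, (ch) =>
    'U+' + ch.codePointAt(0)!.toString(16).toUpperCase().padStart(4, '0')
  )
}

console.log(codePoints(decoded).join(' '))
// U+FEFF U+0045 U+006D U+0061 U+0069 U+006C U+0073
```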

Updates

  • 1: I also tried setting the encoding to utf16le, but then parsing seems to fail altogether, I think because FormData apparently only handles UTF-8.
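Switching to `utf16le` is unlikely to help, because the file's first bytes already identify its encoding: `EF BB BF` marks UTF-8, while UTF-16 files with a BOM start with `FF FE` (little-endian) or `FE FF` (big-endian). A minimal sketch of checking those signatures before picking a decoder; `detectBom` is a hypothetical helper, not part of Papa Parse or formidable:

```typescript
import { Buffer } from 'node:buffer'

type BomEncoding = 'utf8' | 'utf16le' | 'utf16be'

// Hypothetical helper: inspect the leading bytes for a byte order mark.
// Returns the encoding the BOM indicates plus the BOM's length, or null.
// Note: UTF-32 BOMs are not distinguished here (FF FE 00 00 reads as utf16le).
function detectBom(buf: Buffer): { encoding: BomEncoding; length: number } | null {
  if (buf.length >= 3 && buf[0] === 0xef && buf[1] === 0xbb && buf[2] === 0xbf) {
    return { encoding: 'utf8', length: 3 }
  }
  if (buf.length >= 2 && buf[0] === 0xff && buf[1] === 0xfe) {
    return { encoding: 'utf16le', length: 2 }
  }
  if (buf.length >= 2 && buf[0] === 0xfe && buf[1] === 0xff) {
    return { encoding: 'utf16be', length: 2 }
  }
  return null
}

// The header bytes from the question: UTF-8 with a BOM.
const header = Buffer.from([0xef, 0xbb, 0xbf, 0x45, 0x6d, 0x61, 0x69, 0x6c, 0x73])
console.log(detectBom(header)) // { encoding: 'utf8', length: 3 }
```

In the upload handler, the leading bytes could be read from the uploaded file (for example the first chunk of the stream) before choosing how to decode it; that wiring is left out here.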

Solution

  • I solved this by stripping the byte order mark (BOM) from the keys, as described here:

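    // Inside applyMapping: rebuild the row with BOM-free keys before looking up csvField
    const parsed = Object.fromEntries(
      Object.entries(data).map(([k, v]) => [stripBom(k), v])
    )

    // Remove a leading byte order mark (U+FEFF) from a string, if present
    export default function stripBom(str: string) {
      if (str.charCodeAt(0) === 0xfeff) {
        return str.slice(1)
      }

      return str
    }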
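Stripping the BOM from every row's keys works, but the fix can also be applied once. Papa Parse's `transformHeader` config option is applied to each header name before it becomes a row key, and the WHATWG `TextDecoder` drops a leading BOM by default (unlike `Buffer#toString('utf8')`). A sketch of both, assuming Papa Parse 5's `transformHeader`; the Papa Parse call is shown only as a comment because it depends on the question's stream and options:

```typescript
import { TextDecoder } from 'node:util'

// Remove a single leading U+FEFF (byte order mark), if present.
function stripBom(str: string): string {
  return str.charCodeAt(0) === 0xfeff ? str.slice(1) : str
}

// Header-level fix (assumed wiring, not run here):
//   Papa.parse(filecontent, { header: true, transformHeader: stripBom, ... })
// Every row object is then built with clean keys, so applyMapping can index
// data[csvField] directly.
console.log(stripBom('\uFEFFEmails') === 'Emails') // true

// Decoder-level fix: TextDecoder consumes a leading BOM unless ignoreBOM is true.
const bytes = new Uint8Array([0xef, 0xbb, 0xbf, 0x45, 0x6d, 0x61, 0x69, 0x6c, 0x73])
const viaTextDecoder = new TextDecoder('utf-8').decode(bytes)
console.log(viaTextDecoder === 'Emails') // true
```

The header-level option is the smaller change for the code in the question, since the stream is already decoded with `setEncoding('utf8')`; using `TextDecoder` on a stream would mean decoding the raw chunks yourself with `{ stream: true }`.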