Python CSV to JSON parser add quotes to output

I have a CSV-to-JSON Python script, thanks to the user Petri, that lets me convert a Geonames CSV dump into a MongoImport-friendly JSON.

The problem is that Geonames has a field called alternatenames that is currently quoted and treated as one long string. Therefore it cannot be queried properly in MongoDB. I would like to change the field to a string array such as: "alternatenames":["name1", "name2"]

The Python script looks like this:

import csv, simplejson, decimal, codecs

data = open("cities.txt")
reader = csv.DictReader(data, delimiter=",", quotechar='"')

with codecs.open("cities.json", "w", encoding="utf-8") as out:
   for r in reader:
      for k, v in r.items():
         # make sure nulls are generated
         if not v:
            r[k] = None
         # parse and generate decimal arrays
         elif k == "loc":
            r[k] = [decimal.Decimal(n) for n in v.strip("[]").split(",")]
         # generate a number
         elif k == "geonameid":
            r[k] = int(v)
      out.write(simplejson.dumps(r, ensure_ascii=False, use_decimal=True)+"\n")

My CSV has the following fields:

"geonameid","name","asciiname","alternatenames","loc","feature_class","feature_code","country_code","cc2","admin1_code","admin2_code","admin3_code","admin4_code"
3,"Zamīn Sūkhteh","Zamin Sukhteh","Zamin Sukhteh,Zamīn Sūkhteh","[48.91667,32.48333]","P","PPL","IR",,"15",,,
5,"Yekāhī","Yekahi","Yekahi,Yekāhī","[48.9,32.5]","P","PPL","IR",,"15",,,
7,"Tarvīḩ ‘Adāī","Tarvih `Adai","Tarvih `Adai,Tarvīḩ ‘Adāī","[48.2,32.1]","P","PPL","IR",,"15",,,

My current JSON output looks like this:

{"loc": [48.91667, 32.48333], "name": "Zamīn Sūkhteh", "geonameid": 3, "feature_class": "P", "admin3_code": null, "admin2_code": null, "cc2": null, "feature_code": "PPL", "country_code": "IR", "admin1_code": "15", "alternatenames": "Zamin Sukhteh,Zamīn Sūkhteh", "asciiname": "Zamin Sukhteh", "admin4_code": null}
{"loc": [48.9, 32.5], "name": "Yekāhī", "geonameid": 5, "feature_class": "P", "admin3_code": null, "admin2_code": null, "cc2": null, "feature_code": "PPL", "country_code": "IR", "admin1_code": "15", "alternatenames": "Yekahi,Yekāhī", "asciiname": "Yekahi", "admin4_code": null}
{"loc": [48.2, 32.1], "name": "Tarvīḩ ‘Adāī", "geonameid": 7, "feature_class": "P", "admin3_code": null, "admin2_code": null, "cc2": null, "feature_code": "PPL", "country_code": "IR", "admin1_code": "15", "alternatenames": "Tarvih `Adai,Tarvīḩ ‘Adāī", "asciiname": "Tarvih `Adai", "admin4_code": null}

I would like to change the JSON output to add a string array as follows (scroll to the right to alternatenames):

{"loc": [48.91667, 32.48333], "name": "Zamīn Sūkhteh", "geonameid": 3, "feature_class": "P", "admin3_code": null, "admin2_code": null, "cc2": null, "feature_code": "PPL", "country_code": "IR", "admin1_code": "15", "alternatenames": ["Zamin Sukhteh", "Zamīn Sūkhteh"], "asciiname": "Zamin Sukhteh", "admin4_code": null}
{"loc": [48.9, 32.5], "name": "Yekāhī", "geonameid": 5, "feature_class": "P", "admin3_code": null, "admin2_code": null, "cc2": null, "feature_code": "PPL", "country_code": "IR", "admin1_code": "15", "alternatenames": ["Yekahi", "Yekāhī"], "asciiname": "Yekahi", "admin4_code": null}
{"loc": [48.2, 32.1], "name": "Tarvīḩ ‘Adāī", "geonameid": 7, "feature_class": "P", "admin3_code": null, "admin2_code": null, "cc2": null, "feature_code": "PPL", "country_code": "IR", "admin1_code": "15", "alternatenames": ["Tarvih `Adai", "Tarvīḩ ‘Adāī"], "asciiname": "Tarvih `Adai", "admin4_code": null}

Also, should I change my quotechar in my Access 2010-exported CSV to ^ instead of " to avoid double quoting?

Thanks for any help.

Solution

Add another elif branch to the existing chain to handle the "alternatenames" field:

         elif k == "alternatenames":
            r[k] = [name.strip() for name in v.split(",")]

This splits the string on commas and then strips any leading or trailing whitespace from each name. Because the branch sits after the "if not v" check, empty fields still become null.
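
To check the effect on a single row, here is a minimal self-contained sketch. It makes a few assumptions that are not in the original script: it uses Python 3 with the standard-library json module instead of simplejson, converts "loc" to floats rather than Decimal values, reads from an in-memory sample instead of cities.txt, and wraps the per-row logic in a hypothetical helper called convert_row. It also drops empty pieces after splitting, which the one-line version above does not do.

```python
import csv
import io
import json


def convert_row(row):
    """Return a typed copy of one CSV row (hypothetical helper, not in the original script)."""
    out = {}
    for key, value in row.items():
        if not value:
            # empty fields become JSON null
            out[key] = None
        elif key == "loc":
            # "[48.9,32.5]" -> [48.9, 32.5]; float is an assumption, the original uses Decimal
            out[key] = [float(n) for n in value.strip("[]").split(",")]
        elif key == "geonameid":
            out[key] = int(value)
        elif key == "alternatenames":
            # split on commas, trim each name, and skip empty pieces
            out[key] = [name.strip() for name in value.split(",") if name.strip()]
        else:
            out[key] = value
    return out


# A shortened sample with only some of the columns from the question.
sample = io.StringIO(
    '"geonameid","name","alternatenames","loc","cc2"\n'
    '5,"Yekāhī","Yekahi,Yekāhī","[48.9,32.5]",\n'
)

rows = [convert_row(r) for r in csv.DictReader(sample)]
lines = [json.dumps(r, ensure_ascii=False) for r in rows]
print(lines[0])
```

The printed line should contain "alternatenames" as a JSON array of two strings, which is the shape mongoimport needs for array queries.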