Tags: python, mongodb, shell, urlencode, eve

Eve + MongoDB with special characters


I have a (Mongo) database with locations from multiple planets/moons/asteroids. My db is called nomenclature and the collection is centroids. Here is a sample of the documents in this collection:

[
{
  "name":"kachina chasmata",
  "location":{
    "type":"Point",
    "coordinates":[-116.65,-32.6]
  },
  "body":"ariel"
},
{
  "name":"hokusai",
  "location":{
    "type":"Point",
    "coordinates":[16.65,57.84]
  },
  "body":"mercury"
},
{
  "name":"cañas",
  "location":{
    "type":"Point",
    "coordinates":[89.86,-31.188]
  },
  "body":"mars"
},
{
  "name":"anseris cavus",
  "location":{
    "type":"Point",
    "coordinates":[95.5,-29.708]
  },
  "body":"mars"
}
]

This db/collection will receive queries on its body and name fields.

You may have noticed the whitespace and special characters ("ñ") in the name field of some documents. That is precisely where my question lies.

I am using Eve to publish this db/collection through a read-only (GET) interface. With the following DOMAIN in the settings,

DOMAIN = {
    'centroids': {
        'item_title': 'crater centroid',
        # raw string avoids invalid-escape warnings for \w in newer Pythons
        'url': r'centroid/<regex("[\w]+"):body>/<regex("[\w ]+"):name>'
    }
}

Eve answers just fine to a request like:

$ curl 'http://127.0.0.1:5000/centroid/mercury/hokusai'

or,

$ curl 'http://127.0.0.1:5000/centroid/mars/anseris%20cavus'

when there is whitespace in name (note the space inside the character class in the settings for name: <regex("[\w ]+"):name>).
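For reference, once the URL is decoded, the name only has to satisfy that regex; in Python 3 the re module is Unicode-aware by default, so accented letters like ñ count as word characters. A quick check, independent of Eve:

```python
import re

# The same character class used in the Eve route above.
pattern = re.compile(r"[\w ]+")

assert pattern.fullmatch("anseris cavus")   # space is allowed explicitly
assert pattern.fullmatch("cañas")           # ñ is a word character under Unicode matching
assert not pattern.fullmatch("ca/ñas")      # "/" is neither a word character nor a space
```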

The question is: how should I handle special characters -- like ñ -- in such an environment? Who should handle encoding/decoding: the user, the interface (Eve), or the database (MongoDB)?


Solution

  • OK, I got it. I figured it out when I tested the app through the browser, where everything works fine. I'll keep the question and add the answer here because it may be useful to somebody; I learned, and hopefully others will too.

    TL;DR: the client is responsible for encoding the query, and for decoding the result if necessary.
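    In Python, the encoding the browser applies automatically is available as urllib.parse.quote, which percent-encodes the UTF-8 bytes of the string. A minimal sketch of what the client should send:

    ```python
    from urllib.parse import quote, unquote

    name = "cañas"

    # quote() percent-encodes the UTF-8 bytes: ñ (U+00F1) -> 0xC3 0xB1 -> %C3%B1
    encoded = quote(name)
    print(encoded)  # ca%C3%B1as

    # The server side decodes this back before matching routes and querying.
    assert unquote(encoded) == name
    ```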

    After my first tests from the command line, I tried it through my web browser, where the URL http://localhost:5000/centroid/mars/cañas is accepted by the app/db (Eve/MongoDB) with no problem and gives back the answer:

    <resource>
      <resource>
        <body>mars</body>
        <lat>-31.188</lat>
        <lon>89.86</lon>
        <name>cañas</name>
      </resource>
    </resource>
    

    Great. But now I wanted to know how to do that from the terminal/bash.

    First I googled "url encode" and found this nice little tool to play with: https://meyerweb.com/eric/tools/dencoder/. It encoded cañas for me (it uses JavaScript's encodeURIComponent()), and then I could try with curl:

    $ curl -s 'http://127.0.0.1:5000/centroid/mars/ca%C3%B1as%0A' | json_pp
    {
       "_items" : [
          {
             "lat" : -31.188,
             "body" : "mars",
             "lon" : 89.86,
             "name" : "ca�as"
          }
       ]
    }
    

    Good. The answer comes back ISO-8859 encoded.

    Then I wanted to do the encoding myself, in the terminal. A second search, this time for "encode utf8 bash", brought me to this post: https://www.tecmint.com/convert-files-to-utf-8-encoding-in-linux/, where I learned about the iconv tool.

    Using iconv I got a differently encoded string -- I converted from UTF-8 to ISO-8859-1 -- but everything still worked just fine:

    $ echo 'http://127.0.0.1:5000/centroid/mars/cañas' | file -
    /dev/stdin: UTF-8 Unicode text
    $ URL=$(echo 'http://127.0.0.1:5000/centroid/mars/cañas' | iconv -f UTF-8 -t ISO-8859-1 -)
    $ echo $URL
    http://127.0.0.1:5000/centroid/mars/ca�as
    $ echo $URL | file -
    /dev/stdin: ISO-8859 text
    $ curl -s $URL | json_pp
    {
       "_items" : [
          {
             "lat" : -31.188,
             "lon" : 89.86,
             "body" : "mars",
             "name" : "ca�as"
          }
       ]
    }
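    Note that the two approaches send different byte sequences for the same name: the dencoder/encodeURIComponent route percent-encodes the UTF-8 bytes, while iconv re-encodes the string to Latin-1 and curl sends those raw bytes. A sketch of the difference (percent-encoded UTF-8 is the standard form for URLs per RFC 3986):

    ```python
    from urllib.parse import quote

    name = "cañas"

    # UTF-8 uses two bytes for ñ; ISO-8859-1 (Latin-1) uses one.
    assert name.encode("utf-8") == b"ca\xc3\xb1as"
    assert name.encode("iso-8859-1") == b"ca\xf1as"

    # Percent-encoding the UTF-8 bytes yields exactly what the dencoder produced.
    assert quote(name) == "ca%C3%B1as"
    ```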