I have a (Mongo) database with locations from multiple planets/moons/asteroids.
My db is called nomenclature and the collection is centroids.
Here is a sample of the documents in this collection:
[
  {
    "name": "kachina chasmata",
    "location": {
      "type": "Point",
      "coordinates": [-116.65, -32.6]
    },
    "body": "ariel"
  },
  {
    "name": "hokusai",
    "location": {
      "type": "Point",
      "coordinates": [16.65, 57.84]
    },
    "body": "mercury"
  },
  {
    "name": "cañas",
    "location": {
      "type": "Point",
      "coordinates": [89.86, -31.188]
    },
    "body": "mars"
  },
  {
    "name": "anseris cavus",
    "location": {
      "type": "Point",
      "coordinates": [95.5, -29.708]
    },
    "body": "mars"
  }
]
This collection will receive queries on its body and name fields.
You may have noticed the whitespace and special characters (such as "ñ") in the name field of some documents. That is precisely what my question is about.
I am using Eve to publish this db/collection through a read-only (GET) interface.
With the following DOMAIN in the settings,
DOMAIN = {
    'centroids': {
        'item_title': 'crater centroid',
        # raw string, so that \w reaches the regex converter untouched
        'url': r'centroid/<regex("[\w]+"):body>/<regex("[\w ]+"):name>'
    }
}
Eve answers just fine to a request like:
$ curl 'http://127.0.0.1:5000/centroid/mercury/hokusai'
or,
$ curl 'http://127.0.0.1:5000/centroid/mars/anseris%20cavus'
when there is whitespace in name (notice the whitespace in the settings for name: <regex("[\w ]+"):name>).
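To see why the name pattern needs the space, and what it does with ñ, the two patterns can be tried directly with Python's re module. This is only a sketch: it assumes the URL converter applies each pattern as a full match on the already-decoded path segment, with Python 3's default Unicode-aware matching. The helper name accepts is made up for this illustration.

```python
import re

# The two patterns used in the DOMAIN setting.
BODY_PATTERN = re.compile(r"[\w]+")
NAME_PATTERN = re.compile(r"[\w ]+")


def accepts(pattern, segment):
    """Return True if the whole decoded URL segment matches the pattern."""
    return pattern.fullmatch(segment) is not None


# A space is not part of [\w], so the body pattern rejects this segment.
print(accepts(BODY_PATTERN, "anseris cavus"))
# The name pattern explicitly allows spaces.
print(accepts(NAME_PATTERN, "anseris cavus"))
# On str patterns, \w is Unicode-aware in Python 3, so ñ counts as a word character.
print(accepts(NAME_PATTERN, "cañas"))
```

Under those assumptions the pattern itself does not stand in the way of ñ; what is left is how the character travels inside the URL.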
The question is: how should I handle special characters, like ñ, in such an environment? Who should handle encoding/decoding: the user, the interface (Eve) or the database (MongoDB)?
OK, I got it. I figured it out when I tested the app through the browser, where everything works fine. I'll keep this question here and add the answer because it may be useful to somebody; I learned something and hopefully others will too.
TL;DR: the client is responsible for encoding the query, and then for decoding the result if necessary.
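For a client written in Python, the encoding step could look roughly like this. It is a minimal sketch using only the standard library; BASE_URL and the helper centroid_url are names invented here, and the host and port are simply the ones from the curl examples.

```python
from urllib.parse import quote, unquote

BASE_URL = "http://127.0.0.1:5000/centroid"  # assumed: same host/port as the curl examples


def centroid_url(body, name):
    """Build the request URL, percent-encoding each path segment as UTF-8."""
    # safe="" also encodes "/", so a segment can never be split in two.
    return f"{BASE_URL}/{quote(body, safe='')}/{quote(name, safe='')}"


print(centroid_url("mars", "cañas"))
print(centroid_url("mars", "anseris cavus"))

# Decoding is the reverse operation, should the client need it.
print(unquote("ca%C3%B1as"))
```

quote() uses UTF-8 by default, which is the same encoding a browser applies when a URL containing ñ is typed into the address bar.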
After my first tests using the command line, I tried it in my web browser, where the URL http://localhost:5000/centroid/mars/cañas is accepted without any problem by the app/db (Eve/MongoDB), which gives back the answer:
<resource>
  <resource>
    <body>mars</body>
    <lat>-31.188</lat>
    <lon>89.86</lon>
    <name>cañas</name>
  </resource>
</resource>
Great. But now I wanted to know how to do that from the terminal/bash.
First I googled for "url encode" and found this nice little tool to play with: https://meyerweb.com/eric/tools/dencoder/. It encoded cañas for me (it uses JavaScript's encodeURIComponent()), and now I could try with curl:
$ curl -s 'http://127.0.0.1:5000/centroid/mars/ca%C3%B1as%0A' | json_pp
{
   "_items" : [
      {
         "lat" : -31.188,
         "body" : "mars",
         "lon" : 89.86,
         "name" : "ca�as"
      }
   ]
}
Good. The answer is in ISO-8859.
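The � in that output is what a UTF-8 terminal shows for a byte it cannot decode. Here is a small sketch of the effect, assuming the name really did come back as ISO-8859-1 bytes; the byte string is written out by hand, not captured from the server.

```python
# "cañas" as ISO-8859-1 bytes: ñ is the single byte 0xF1.
raw = b"ca\xf1as"

# Decoding with the matching charset recovers the name.
print(raw.decode("iso-8859-1"))

# 0xF1 on its own is not valid UTF-8; errors="replace" substitutes U+FFFD,
# the replacement character that a terminal renders as �.
as_utf8 = raw.decode("utf-8", errors="replace")
print(as_utf8)
```

This is the "decode the result if necessary" half of the client's job: the bytes are fine, they only need to be read with the right charset.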
Then I wanted to do the encoding myself, in my terminal.
A second search, now for "encode utf8 bash", brought me to this post: https://www.tecmint.com/convert-files-to-utf-8-encoding-in-linux/, where I learned about the iconv tool.
Using iconv I got a differently encoded string (I converted from UTF-8 to ISO-8859-1), but everything worked just fine:
$ echo 'http://127.0.0.1:5000/centroid/mars/cañas' | file -
/dev/stdin: UTF-8 Unicode text
$ URL=$(echo 'http://127.0.0.1:5000/centroid/mars/cañas' | iconv -f UTF-8 -t ISO-8859-1 -)
$ echo $URL
http://127.0.0.1:5000/centroid/mars/ca�as
$ echo $URL | file -
/dev/stdin: ISO-8859 text
$ curl -s $URL | json_pp
{
   "_items" : [
      {
         "lat" : -31.188,
         "lon" : 89.86,
         "body" : "mars",
         "name" : "ca�as"
      }
   ]
}
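The "differently encoded string" from iconv can be put side by side with the UTF-8 one from the online encoder. The sketch below uses only the standard library to show how the same name looks under each charset; whether a given server accepts both forms is an assumption that depends on how it decodes the request path.

```python
from urllib.parse import quote

name = "cañas"

# Raw bytes under each charset: ñ takes two bytes in UTF-8, one in ISO-8859-1.
utf8_bytes = name.encode("utf-8")
latin1_bytes = name.encode("iso-8859-1")
print(utf8_bytes, latin1_bytes)

# Percent-encoded forms of those bytes, as they would appear in a URL.
utf8_quoted = quote(name, encoding="utf-8")
latin1_quoted = quote(name, encoding="iso-8859-1")
print(utf8_quoted, latin1_quoted)
```

The UTF-8 form matches what encodeURIComponent() produced (ca%C3%B1as), while the ISO-8859-1 form corresponds to the single raw byte that iconv put into the URL.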