Is there a clean way to change the default behavior of the "/json" postfix so that data.frames are encoded column-based rather than row-based?
Data.frames in R, if I understand correctly, are really just named lists in which every element has the same length. Using jsonlite, it's simple to show the difference (trivial example, yes):
library(jsonlite)
ll <- list(xx=1:3, yy=6:8)
dd <- data.frame(xx=1:3, yy=6:8)
toJSON(dd)
# [1] "[ { \"xx\" : 1, \"yy\" : 6 }, { \"xx\" : 2, \"yy\" : 7 }, { \"xx\" : 3, \"yy\" : 8 } ]"
toJSON(ll)
# [1] "{ \"xx\" : [ 1, 2, 3 ], \"yy\" : [ 6, 7, 8 ] }"
toJSON(dd, dataframe='column')
# [1] "{ \"xx\" : [ 1, 2, 3 ], \"yy\" : [ 6, 7, 8 ] }"
toJSON(as.list(dd))
# [1] "{ \"xx\" : [ 1, 2, 3 ], \"yy\" : [ 6, 7, 8 ] }"
where the last three are identical. It's easy to force the column-based form either by using the dataframe argument to toJSON or by coercing the data.frame into a list.
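A quick way to confirm the equivalence locally is to compare the strings directly and then parse the column-based form back. This is a sketch assuming a current jsonlite release, where the option is spelled `dataframe = "columns"` and output is compact by default (older versions such as 0.9.4 print extra whitespace).

```r
library(jsonlite)

ll <- list(xx = 1:3, yy = 6:8)
dd <- data.frame(xx = 1:3, yy = 6:8)

# Row-based (the default for data frames): one JSON object per row
row_json <- toJSON(dd)

# Three column-based encodings that should all produce the same string
col_df      <- toJSON(dd, dataframe = "columns")
col_list    <- toJSON(ll)
col_coerced <- toJSON(as.list(dd))

print(identical(as.character(col_df), as.character(col_list)))
print(identical(as.character(col_df), as.character(col_coerced)))

# Going back: column-based JSON parses to a named list,
# which as.data.frame() turns into a data frame again
back <- as.data.frame(fromJSON(col_df))
```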
Using OpenCPU's API, the calls look similar:
$ curl http://localhost:7177/ocpu/library/base/R/list/json -H "Content-Type: application/json" -d '{ "xx":[1,2,3], "yy":[6,7,8] }'
{
"xx" : [
1,
2,
3
],
"yy" : [
6,
7,
8
]
}
$ curl http://localhost:7177/ocpu/library/base/R/data.frame/json -H "Content-Type: application/json" -d '{ "xx":[1,2,3], "yy":[6,7,8] }'
[
{
"xx" : 1,
"yy" : 6
},
{
"xx" : 2,
"yy" : 7
},
{
"xx" : 3,
"yy" : 8
}
]
If I want the data.frame itself to be JSON-ified column-based, I need to coerce it to a list:
$ curl http://localhost:7177/ocpu/library/base/R/data.frame -H "Content-Type: application/json" -d '{ "xx":[1,2,3], "yy":[6,7,8] }'
/ocpu/tmp/x000a0fb8/R/.val
/ocpu/tmp/x000a0fb8/stdout
/ocpu/tmp/x000a0fb8/source
/ocpu/tmp/x000a0fb8/console
/ocpu/tmp/x000a0fb8/info
$ curl http://localhost:7177/ocpu/library/base/R/as.list/json -d "x=x000a0fb8"
{
"xx" : [
1,
2,
3
],
"yy" : [
6,
7,
8
]
}
Three questions:
1. Is there a way to change the default behavior of the OpenCPU auto-JSON-ification to be column-based?
2. Is there a reason (besides "had to default to something") that it defaults to row-based? (So that I can better understand the underpinnings and efficiencies, not meant as a challenge.)
3. This is all academic, though, since most (if not all) libraries accepting the JSON output will understand and translate between the formats transparently. Right?
(Win7 x64, R 3.0.3, opencpu 1.2.3, jsonlite 0.9.4)
(PS: Thanks, Jeroen, OpenCPU is awesome! The more I play, the more I like.)
For data frame objects you can use HTTP GET and set the dataframe argument:
GET http://localhost:7177/ocpu/tmp/x000a0fb8/json?dataframe=columns
For example, the Boston object from the MASS package is a data frame as well:
https://cran.ocpu.io/MASS/data/Boston/json?dataframe=columns
https://cran.ocpu.io/MASS/data/Boston/json?dataframe=rows
For HTTP GET requests to a .../json endpoint, all the HTTP parameters are mapped to arguments of the toJSON function from the jsonlite package, so you can also specify other toJSON arguments:
https://cran.ocpu.io/MASS/data/Boston/json?dataframe=columns&digits=4
To see which arguments are available, have a look at the jsonlite manual or this post.
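Because the GET parameters end up as toJSON arguments, a local call gives a rough preview of what those Boston URLs return. This is a sketch of the roughly equivalent R call, not OpenCPU's server code; it assumes the MASS and jsonlite packages are installed, and the server's output may differ in whitespace or other defaults.

```r
library(jsonlite)
data(Boston, package = "MASS")  # loads the Boston data frame

# Roughly what .../Boston/json?dataframe=columns&digits=4 asks for:
# one JSON array per column, numbers printed with at most 4 decimal digits
cols <- toJSON(Boston, dataframe = "columns", digits = 4)

# Roughly what .../Boston/json?dataframe=rows asks for:
# one JSON object per row (the default for data frames)
rows <- toJSON(Boston, dataframe = "rows")

# Show the beginning of each encoding
cat(substr(cols, 1, 60), "\n")
cat(substr(rows, 1, 60), "\n")
```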
Note that this only works if you do the two-step procedure: first an HTTP POST on a function that returns a data frame, followed by retrieving that object in JSON format with an HTTP GET request. You cannot specify toJSON parameters when you use the one-step shortcut where you suffix the POST request with /json, because in POST requests the HTTP parameters are always mapped to the function call.
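The two-step procedure can also be scripted from R. This is a sketch under several assumptions: the httr package is installed, an OpenCPU server is listening on the port used in the question, and the server reports the new session key in an `X-ocpu-session` response header. The helper names `ocpu_session_url` and `ocpu_dataframe_columns` are hypothetical, made up for this example.

```r
# Hypothetical helper: build the GET URL for a session object in JSON format.
# Extra arguments become query parameters, which OpenCPU maps to toJSON.
# Values are not URL-escaped, so keep them simple (e.g. "columns", 4).
ocpu_session_url <- function(base, key, ...) {
  params <- list(...)
  url <- paste0(base, "/ocpu/tmp/", key, "/json")
  if (length(params) == 0) return(url)
  query <- paste(names(params), vapply(params, as.character, character(1)),
                 sep = "=", collapse = "&")
  paste0(url, "?", query)
}

# Hypothetical helper: step 1 POSTs the arguments to base::data.frame,
# step 2 GETs the resulting session object with dataframe=columns.
# Requires a running OpenCPU server; httr is called via :: so the
# package is only needed when this function actually runs.
ocpu_dataframe_columns <- function(body, base = "http://localhost:7177") {
  post <- httr::POST(paste0(base, "/ocpu/library/base/R/data.frame"),
                     body = body, encode = "json")
  httr::stop_for_status(post)
  key <- httr::headers(post)[["x-ocpu-session"]]  # assumed header name
  got <- httr::GET(ocpu_session_url(base, key, dataframe = "columns"))
  httr::stop_for_status(got)
  httr::content(got, as = "text", encoding = "UTF-8")
}
```

With a server running, `ocpu_dataframe_columns(list(xx = c(1, 2, 3), yy = c(6, 7, 8)))` should return the column-based JSON text without the extra as.list round trip from the question.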
The reason for this default is that the row-based design seems to be the most conventional and interoperable way of encoding tabular data. The jsonlite paper/vignette goes into more detail. Note that it also works the other way around: you don't have to call the data.frame function to create a data frame; just posting an argument in the form:
[{"xx":1,"yy":6},{"xx":2,"yy":7},{"xx":3,"yy":8}]
will automatically turn it into a data frame:
curl https://public.opencpu.org/ocpu/library/base/R/summary/console -d object='[{"xx":1,"yy":6},{"xx":2,"yy":7},{"xx":3,"yy":8}]'
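The automatic conversion comes from jsonlite's simplification rules: an array of records with the same keys is simplified into a data frame, while a column-based object stays a named list. A local sketch of that behavior with fromJSON, assuming the default arguments of a current jsonlite release:

```r
library(jsonlite)

# Row-based input: an array of records sharing the same keys
records <- fromJSON('[{"xx":1,"yy":6},{"xx":2,"yy":7},{"xx":3,"yy":8}]')

# Column-based input: one object holding one array per column
columns <- fromJSON('{"xx":[1,2,3],"yy":[6,7,8]}')

print(class(records))  # a data.frame, so summary() describes each column
print(class(columns))  # a plain named list
print(summary(records))
```

This is why the row-based array can be passed straight to summary in the curl call above without an explicit data.frame step.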