How to use PyCall in Julia to convert Python output to Julia DataFrame


I would like to retrieve some data from Quandl and analyse it in Julia. Unfortunately, there is no official Julia API for this (yet). I am aware of this solution, but it is still quite limited in functionality and doesn't follow the same syntax as the original Python API.

I thought it would be a smart thing to use PyCall to retrieve the data using the official Python API from within Julia. This does yield an output, but I'm not sure how I can convert it to a format that I would be able to use within Julia (ideally a DataFrame).

I have tried the following.

using PyCall, DataFrames
@pyimport quandl

data = quandl.get("WIKI/AAPL", returns = "pandas");

Julia converts this output to a Dict{Any,Any}. When using returns = "numpy" instead of returns = "pandas", I end up with a PyObject rec.array.

How can I get data to be a Julia DataFrame as quandl.jl would return it? Note that quandl.jl is not an option for me because it doesn't support automatic retrieval of multiple assets and lacks several other features, so it's essential that I can use the Python API.

Thank you for any suggestions!


Solution

  • You're running into a difference in Python/Pandas versions. I happen to have two configurations easily available to me: Pandas 0.18.0 in Python 2 and Pandas 0.19.1 in Python 3. The answer @niczky12 provided works well in the first configuration, but I see your Dict{Any,Any} behavior in the second. Something changes between those two configurations such that PyCall detects a mapping-like interface on Pandas objects and automatically converts them to a dictionary. There are two options here:

    1. Work with the dictionary interface:

      data = quandl.get("WIKI/AAPL", returns = "pandas")
      cols = keys(data)
      df = DataFrame(Any[collect(values(data[c])) for c in cols], map(Symbol, cols))
      
    2. Explicitly disable the auto-conversion and use the PyCall interface to extract the columns, as niczky12 demonstrated in the other answer. Note that data[:Open] auto-converts to a mapped dictionary, whereas data["Open"] just returns a PyObject.

      data = pycall(quandl.get, PyObject, "WIKI/AAPL", returns = "pandas")
      cols = data[:columns]
      df = DataFrame(Any[Array(data[c]) for c in cols], map(Symbol, cols))
      

    In both cases, though, note that the all-important date index isn't included in the resulting data frame. You almost certainly want to add that as a column:

    df[:Date] = collect(data[:index])
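
If the columns do arrive as dictionaries keyed by the row index (an assumption, based on how option 1 calls values on each column), it may be safer not to depend on each dictionary's iteration order to keep rows aligned. The following is only a sketch in plain Julia: raw is a hypothetical stand-in for the converted quandl output with made-up dates and prices, and aligned_columns is an illustrative helper, not part of PyCall or DataFrames.

```julia
# Hypothetical stand-in for the converted output: column name => (row index => value).
# The real object would come from quandl.get; these dates and prices are invented.
raw = Dict(
    "Open"  => Dict("2017-01-03" => 1.0, "2017-01-04" => 2.0, "2017-01-05" => 3.0),
    "Close" => Dict("2017-01-03" => 1.5, "2017-01-04" => 2.5, "2017-01-05" => 3.5),
)

# Build every column from one shared, sorted list of index keys so that
# row i refers to the same date in every column.
function aligned_columns(raw)
    colnames = sort(collect(keys(raw)))
    index = sort(collect(keys(raw[first(colnames)])))
    columns = [[raw[name][i] for i in index] for name in colnames]
    return colnames, index, columns
end

colnames, index, columns = aligned_columns(raw)
```

With DataFrames loaded, DataFrame(columns, map(Symbol, colnames)) should then give the frame, and index can be added as the Date column in the same way as above.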