Converting a series of xarray.DataArrays to a numpy array

I am using a package called PySD for system dynamics modelling. PySD converts models from Vensim (a system dynamics modelling package) into Python and lets the user replace individual equations with more complex routines than Vensim supports. I am running a model with subscripted variables, and this gives the outputs an unusual format: they are returned as a data frame in which each value of a subscripted variable is an xarray.DataArray. How can I take a column of xarray.DataArrays (a pandas Series) and convert it into a two-dimensional array whose second dimension runs over the subscript elements (here, the countries)?

import pysd
import numpy as np

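model = pysd.load("Example.py")
stocks = model.run()  # run() must be called; it returns a pandas DataFrame
Population = stocks.Population  # creates a Series whose entries are xarray.DataArrays
pop = np.array(Population)  # creates a 1D object array, not a 2D numeric array

# How to get an array of population values for each country?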

The Example.py code is as follows (keep in mind this is an arbitrary example to illustrate the problem):

from __future__ import division
import numpy as np
from pysd import utils
import xarray as xr

from pysd.functions import cache
from pysd import functions

_subscript_dict = {
    'Country': ['Canada', 'USA', 'China', 'Norway', 'India', 'England', 'Mexico', 'Yemen']
}

_namespace = {   
    'TIME': 'time',
    'Time': 'time',
    'Deaths': 'deaths',
    'Births': 'births',
    'Population': 'population',
    'Birth rate': 'birth_rate',
    'Murder rate': 'murder_rate',
    'Natural death rate': 'natural_death_rate',
    'FINAL TIME': 'final_time',
    'INITIAL TIME': 'initial_time',
    'SAVEPER': 'saveper',
    'TIME STEP': 'time_step'
}

@cache('step')
def deaths():

    return murder_rate() * population() + natural_death_rate() * population()


@cache('step')
def births():
    return birth_rate() * population()


@cache('step')
def population():
    return integ_population()


@cache('run')
def birth_rate():

    return utils.xrmerge([
        xr.DataArray(
            data=[5., 5., 5., 5., 5., 5., 5., 5.],
            coords={
                'Country':
                ['Canada', 'USA', 'China', 'Norway', 'India', 'England', 'Mexico', 'Yemen']
            },
            dims=['Country']),
        xr.DataArray(data=[10.], coords={'Country': ['Mexico']}, dims=['Country']),
        xr.DataArray(data=[8.], coords={'Country': ['Yemen']}, dims=['Country']),
    ])


@cache('step')
def murder_rate():
    return time()


@cache('run')
def natural_death_rate():
    return utils.xrmerge([
        xr.DataArray(
            data=[3., 3., 3., 3., 3., 3., 3., 3.],
            coords={
                'Country':
                ['Canada', 'USA', 'China', 'Norway', 'India', 'England', 'Mexico', 'Yemen']
            },
            dims=['Country']),
        xr.DataArray(data=[5.], coords={'Country': ['Yemen']}, dims=['Country']),
        xr.DataArray(data=[5.], coords={'Country': ['Mexico']}, dims=['Country']),
    ])


@cache('run')
def final_time():
    return 100


@cache('run')
def initial_time():
    return 0


@cache('step')
def saveper():
    return time_step()


@cache('run')
def time_step():
    return 1


def _init_population():
    return xr.DataArray(
        data=np.ones([8]) * 10,
        coords={
            'Country': ['Canada', 'USA', 'China', 'Norway', 'India', 'England', 'Mexico', 'Yemen']
        },
        dims=['Country'])


@cache('step')
def _dpopulation_dt():
    return births() - deaths()


integ_population = functions.Integ(lambda: _dpopulation_dt(), lambda: _init_population())

My apologies if the indentation in Example.py is off. Any help would be appreciated!
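The same structure can be reproduced without PySD. The sketch below builds a hypothetical stand-in for the output of model.run() with made-up values: a DataFrame indexed by time whose Population column holds one xarray.DataArray over the Country subscript per time step. It shows why np.array on that column gives a 1D object array rather than the desired 2D array.

```python
import numpy as np
import pandas as pd
import xarray as xr

countries = ['Canada', 'USA', 'China', 'Norway', 'India', 'England', 'Mexico', 'Yemen']
times = [0.0, 1.0, 2.0]

# Hypothetical stand-in for the PySD output: an object column holding
# one Country-indexed DataArray per saved time step (made-up values).
cells = np.empty(len(times), dtype=object)
for i, t in enumerate(times):
    cells[i] = xr.DataArray(np.full(len(countries), 10.0 + t),
                            coords={'Country': countries}, dims=['Country'])
df = pd.Series(cells, index=pd.Index(times, name='time'),
               name='Population', dtype=object).to_frame()

# np.array keeps each DataArray as a single object element.
pop = np.array(df['Population'])
print(pop.dtype, pop.shape)  # object (3,)
```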

Solution

Thanks for sharing the example of what this data looks like.

First of all, nesting xarray.DataArray objects as scalars inside a pandas.DataFrame is a highly non-standard way of working with xarray and pandas, and I don't recommend it. If every entry is a DataArray that shares (some of) the same dimensions, the easiest way to work with your data is as an xarray.Dataset, xarray's version of a multi-dimensional pandas.DataFrame.

That said, it should be straightforward to convert your data from this format into unnested objects that are easier to work with. The best place to start is Series.values, which extracts a column as a 1D numpy array (here, an object array whose elements are DataArrays). Then you can iterate through that array and convert each DataArray into a numpy array with .values, too. Putting this together:

population_numpy_array = np.stack(
    [data_array.values for data_array in df['Population'].values])
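A self-contained check of the np.stack one-liner, run on a hypothetical stand-in DataFrame with made-up values (three time steps, eight countries). The result has one row per time step and one column per country, in the order of each DataArray's Country coordinate.

```python
import numpy as np
import pandas as pd
import xarray as xr

countries = ['Canada', 'USA', 'China', 'Norway', 'India', 'England', 'Mexico', 'Yemen']
times = [0.0, 1.0, 2.0]

# Hypothetical stand-in for the PySD output (made-up values).
cells = np.empty(len(times), dtype=object)
for i, t in enumerate(times):
    cells[i] = xr.DataArray(np.arange(len(countries)) + 10.0 * t,
                            coords={'Country': countries}, dims=['Country'])
df = pd.Series(cells, index=pd.Index(times, name='time'),
               name='Population', dtype=object).to_frame()

# df['Population'].values is a 1D object array of DataArrays; .values on each
# DataArray is a 1D float array, and np.stack joins them along a new first axis.
population_numpy_array = np.stack(
    [data_array.values for data_array in df['Population'].values])

print(population_numpy_array.shape)  # (3, 8)
print(population_numpy_array[:, countries.index('Mexico')].tolist())  # [6.0, 16.0, 26.0]
```

Note that np.stack works purely by position: it assumes every DataArray lists the countries in the same order. That should hold for a single model variable, but it is worth checking if the arrays come from different sources.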

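Alternatively, you could stack the DataArray objects using xarray. That preserves the labels (such as the Country coordinate), which makes your data easier to work with. Here 'row_name' is a placeholder for whatever you want to call the new dimension: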

population_data_array = xr.concat(df['Population'].values, dim='row_name')
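A sketch of the xarray route on hypothetical stand-in data with made-up values. Instead of the placeholder 'row_name', it passes a named pandas.Index as dim, which xr.concat accepts: the index name becomes the new dimension and its values become that dimension's coordinate. This assumes the DataFrame index holds the time stamps; the index is named 'time' explicitly so the sketch does not depend on what the original index is called.

```python
import numpy as np
import pandas as pd
import xarray as xr

countries = ['Canada', 'USA', 'China', 'Norway', 'India', 'England', 'Mexico', 'Yemen']
times = [0.0, 1.0, 2.0]

# Hypothetical stand-in for the PySD output (made-up values).
cells = np.empty(len(times), dtype=object)
for i, t in enumerate(times):
    cells[i] = xr.DataArray(np.arange(len(countries)) + 10.0 * t,
                            coords={'Country': countries}, dims=['Country'])
df = pd.Series(cells, index=pd.Index(times), name='Population', dtype=object).to_frame()

# Concatenate along a new 'time' dimension labelled by the DataFrame index;
# each DataArray's Country coordinate is carried through.
population_data_array = xr.concat(
    list(df['Population'].values), dim=pd.Index(df.index, name='time'))

print(population_data_array.dims)  # ('time', 'Country')
print(population_data_array.sel(Country='Mexico').values.tolist())  # [6.0, 16.0, 26.0]
```

Unlike np.stack, xr.concat combines the arrays using their Country labels rather than relying purely on position, and the result can be indexed by country name.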

You could even convert the full DataFrame into an xarray.Dataset for joint analysis, provided every column holds DataArrays:

ds = xr.Dataset({k: xr.concat(df[k].values, dim='row_name') for k in df.keys()})

(Arguably, that's exactly what PySD should be doing.)
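A sketch of the Dataset conversion on hypothetical stand-in data with two made-up columns (Population and Births). It uses the DataFrame index as the new 'time' dimension instead of 'row_name', and it assumes every column holds DataArrays over the same Country subscript; scalar columns would need to be handled separately. The helper column_of_arrays is hypothetical and exists only to build the test data.

```python
import numpy as np
import pandas as pd
import xarray as xr

countries = ['Canada', 'USA', 'China', 'Norway', 'India', 'England', 'Mexico', 'Yemen']
times = pd.Index([0.0, 1.0, 2.0], name='time')


def column_of_arrays(rows, name):
    """Hypothetical helper: an object Series with one Country-indexed DataArray per time step."""
    cells = np.empty(len(rows), dtype=object)
    for i, row in enumerate(rows):
        cells[i] = xr.DataArray(row, coords={'Country': countries}, dims=['Country'])
    return pd.Series(cells, index=times, name=name, dtype=object)


# Hypothetical stand-in for the PySD output (made-up values).
df = pd.DataFrame({
    'Population': column_of_arrays([np.full(8, 10.0 * (t + 1)) for t in times], 'Population'),
    'Births': column_of_arrays([np.full(8, 5.0) for _ in times], 'Births'),
})

# One concatenated DataArray per column, merged into a single Dataset
# with shared 'time' and 'Country' coordinates.
ds = xr.Dataset({k: xr.concat(list(df[k].values), dim=pd.Index(df.index, name='time'))
                 for k in df.keys()})

print(ds['Population'].dims)  # ('time', 'Country')
print(ds['Population'].sel(time=2.0, Country='USA').item())  # 30.0
```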