I am working with a large dataset of named bridges, each of which has several attributes. Some of the attributes are permanent, but several are updated yearly. The data I received is a .csv file with columns that include 'Name', 'Year', 'X', 'Y', 'Z' for each item. I'm trying to turn this into a 3-D xarray structure. An example of the dataset is included in the code below.
Just to note, this is a heavily reduced version. I'm actually working with approximately 40,000 bridges over 30 years, with 35 attributes for each. The attributes are of mixed types: about half are strings and half are numerical.
I've tried using a dictionary of DataFrames, or just handling one attribute at a time, but I don't think either approach is very efficient. I've been interested in using xarray for a while, but I haven't been able to figure out how to create new dimensions from existing columns in a DataFrame. Based on the xarray documentation page on creating DataArrays, I've tried the following:
```python
import pandas as pd
import numpy as np
import xarray as xr

d = {'Name': ['BridgeA', 'BridgeB', 'BridgeC',
              'BridgeA', 'BridgeB', 'BridgeC',
              'BridgeA', 'BridgeB', 'BridgeC',
              'BridgeA', 'BridgeB', 'BridgeC'],
     'Built': [2000, 1995, 2004,
               2000, 1995, 2004,
               2000, 1995, 2004,
               2000, 1995, 2004],
     'Type': ['Steel', 'Steel', 'Concrete',
              'Steel', 'Steel', 'Concrete',
              'Steel', 'Steel', 'Concrete',
              'Steel', 'Steel', 'Concrete'],
     'Year': [2015, 2015, 2015,
              2016, 2016, 2016,
              2017, 2017, 2017,
              2018, 2018, 2018],
     'ConditionX': [10, 10, 10, 10, 9, 7, 9, 5, 5, 2, 8, 4],
     'ConditionY': [10, 10, 10, 9, 9, 8, 8, 4, 1, 3, 4, 5],
     'ConditionZ': [10, 10, 10, 9, 9, 10, 5, 6, 3, 6, 6, 6]}
df = pd.DataFrame(data=d)

# Attempt based on the xarray docs. This fails: the data is a 12 x 5 table
# (rows x attributes), but both coordinates have 12 entries (one per row),
# so the dims/coords don't match the shape of the data.
da = xr.DataArray(data=df[['Built', 'Type', 'ConditionX', 'ConditionY', 'ConditionZ']],
                  dims=('Name', 'Year'),
                  coords={'Name': df['Name'],
                          'Year': df['Year']})
```
I've tried a few different arrangements of the DataArray call, but I'm not sure how to get "three" dimensions out of it, given that one dimension could be the list of attributes. I also tried `pd.DataFrame.to_xarray`, which worked until I tried to add a dimension from the Year column. Previously I've used pandas multi-indexing / Panels, but Panel is deprecated (it was removed in pandas 0.25) and I'd like to avoid it moving forward. Ideally, I would have a structure similar to this (Name × Year × attribute):
[Image: Stacked 3d array example]
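Regarding `to_xarray`: one way to get the Year column in as a dimension is to move both key columns into a pandas MultiIndex first. `to_xarray()` then turns each index level into a dimension and each remaining column into a separate data variable of an `xarray.Dataset`. The sketch below assumes the column names from the example above and rebuilds that data compactly so it runs on its own. A `Dataset` also sidesteps the mixed-type issue, since each attribute keeps its own dtype instead of everything being forced into one array type.

```python
import pandas as pd
import xarray as xr

# Rebuild the example data from the question (3 bridges x 4 years).
names = ["BridgeA", "BridgeB", "BridgeC"]
years = [2015, 2016, 2017, 2018]
df = pd.DataFrame({
    "Name": names * len(years),
    "Built": [2000, 1995, 2004] * len(years),
    "Type": ["Steel", "Steel", "Concrete"] * len(years),
    "Year": [y for y in years for _ in names],
    "ConditionX": [10, 10, 10, 10, 9, 7, 9, 5, 5, 2, 8, 4],
    "ConditionY": [10, 10, 10, 9, 9, 8, 8, 4, 1, 3, 4, 5],
    "ConditionZ": [10, 10, 10, 9, 9, 10, 5, 6, 3, 6, 6, 6],
})

# Both key columns become index levels; to_xarray() maps each level to a
# dimension and every remaining column to its own data variable.
ds = df.set_index(["Name", "Year"]).to_xarray()
print(ds)
```

In this layout the attributes themselves act as the third axis: `ds["ConditionX"]` is a 2-D (Name, Year) array, and `ds["Type"]` stays a string array alongside it.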
In the end, I want to be able to analyze a single year across multiple bridges (e.g. the average ConditionX in 2015), as well as the condition of particular bridges over the available time range. Does anyone have suggestions on how to get this data into xarray as described, or a different approach that may be better suited?
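Assuming the data ends up in a (Name, Year) `Dataset` built with `set_index(["Name", "Year"]).to_xarray()`, both kinds of analysis reduce to label-based selection with `.sel()` plus an ordinary reduction. A sketch, again on a compact rebuild of the example data (only the columns needed here):

```python
import pandas as pd
import xarray as xr

names = ["BridgeA", "BridgeB", "BridgeC"]
years = [2015, 2016, 2017, 2018]
df = pd.DataFrame({
    "Name": names * len(years),
    "Year": [y for y in years for _ in names],
    "ConditionX": [10, 10, 10, 10, 9, 7, 9, 5, 5, 2, 8, 4],
})
ds = df.set_index(["Name", "Year"]).to_xarray()

# Average ConditionX across all bridges in a single year.
mean_2015 = ds["ConditionX"].sel(Year=2015).mean().item()

# ConditionX of one bridge over the whole time range.
bridge_b = ds["ConditionX"].sel(Name="BridgeB")

# Average ConditionX for every year at once (reduces over the Name dimension).
yearly_mean = ds["ConditionX"].mean(dim="Name")

print(mean_2015)         # 10.0
print(bridge_b.values)   # [10  9  5  8]
print(yearly_mean.values)
```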
I hope this code works for you :)
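```python
import xarray as xr

# Order rows by bridge, then by year, so the flattened values can be
# reshaped into a (Name, Year, Attr) block.
d = df.sort_values(["Name", "Year"])
name = d["Name"].unique()
year = d["Year"].unique()
attr = d.drop(["Name", "Year"], axis=1).columns

# The reshape assumes every bridge has exactly one row per year. Because the
# attributes mix strings and numbers, .values is an object-dtype array.
da = xr.DataArray(
    data=d.drop(["Name", "Year"], axis=1).values.reshape(name.size, year.size, attr.size),
    dims=["Name", "Year", "Attr"],
    coords={
        "Name": name,
        "Year": year,
        "Attr": attr,
    },
)
da
```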
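The `reshape` above relies on every bridge having exactly one row for every year. With ~40,000 bridges over 30 years, some (bridge, year) rows may be missing; the flat array then no longer has `name.size * year.size * attr.size` elements and `reshape` raises a `ValueError`. The sketch below (a compact rebuild of the example data, with one row deliberately dropped; `gap` and `reshape_ok` are illustrative names) contrasts that with the `set_index([...]).to_xarray()` route, which places each value by its labels and fills missing cells with NaN.

```python
import numpy as np
import pandas as pd
import xarray as xr

names = ["BridgeA", "BridgeB", "BridgeC"]
years = [2015, 2016, 2017, 2018]
df = pd.DataFrame({
    "Name": names * len(years),
    "Type": ["Steel", "Steel", "Concrete"] * len(years),
    "Year": [y for y in years for _ in names],
    "ConditionX": [10, 10, 10, 10, 9, 7, 9, 5, 5, 2, 8, 4],
})

# Simulate an incomplete panel: BridgeC has no record for 2016.
gap = df[~((df["Name"] == "BridgeC") & (df["Year"] == 2016))]

# Sort-and-reshape approach: 11 rows cannot fill a 3 x 4 x n_attr block.
values = gap.sort_values(["Name", "Year"]).drop(columns=["Name", "Year"]).values
try:
    values.reshape(len(names), len(years), values.shape[1])
    reshape_ok = True
except ValueError as err:
    reshape_ok = False
    print("reshape failed:", err)

# Label-based alternative: the missing (Name, Year) cell becomes NaN.
ds_gap = gap.set_index(["Name", "Year"]).to_xarray()
missing = ds_gap["ConditionX"].sel(Name="BridgeC", Year=2016).item()
print(np.isnan(missing))  # True
```

If a single 3-D array like `da` is still wanted afterwards, `Dataset.to_dataarray(dim="Attr")` (`to_array` in older xarray releases) stacks the variables along a new dimension; with mixed string and numeric attributes the stacked result will be object dtype, so numeric analysis is usually easier on the `Dataset` itself.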