I have an xarray Dataset that I create and then convert to a pandas DataFrame like so:
import xarray as xr

arr = xr.Dataset(
    coords={
        "test1": range(20000, 60000 + 1, 2500),
        "test2": range(10, 100 + 1),
        "test3": range(1, 10 + 1),
        "count_at_1": 0,
        "count_at_5": 0,
        "count_at_10": 0,
    }
)
df = arr.to_dataframe()
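If xarray isn't at hand, a frame of the same shape can be sketched directly in pandas. This is an assumed equivalent built with pd.MultiIndex.from_product, not necessarily byte-for-byte what to_dataframe() returns (dtypes or column order could differ):

```python
import pandas as pd

# Assumed pure-pandas stand-in for arr.to_dataframe(): one row per
# (test1, test2, test3) combination, three zero-filled count columns.
index = pd.MultiIndex.from_product(
    [range(20000, 60000 + 1, 2500), range(10, 100 + 1), range(1, 10 + 1)],
    names=["test1", "test2", "test3"],
)
df = pd.DataFrame(
    0, index=index, columns=["count_at_1", "count_at_5", "count_at_10"]
)

# 17 test1 values * 91 test2 values * 10 test3 values = 15470 rows
print(df.shape)
```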
The dataframe looks like this, which seems to be exactly what I want:
                   count_at_1  count_at_5  count_at_10
test1 test2 test3
20000 10    1               0           0            0
            2               0           0            0
            3               0           0            0
            4               0           0            0
            5               0           0            0
...                       ...         ...          ...
60000 100   6               0           0            0
            7               0           0            0
            8               0           0            0
            9               0           0            0
            10              0           0            0
However, when I try to access or set a specific value in this DataFrame, I run into problems:
print(df["count_at_1"][50000][70][5]) # works fine, prints 0 as it should
df.loc["count_at_1"][50000][70][5] = 10 # does not work, KeyError: 'count_at_1'
df.at["count_at_1"][50000][70][5] = 10 # does not work, gives TypeError
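A minimal sketch of why these three lines behave differently, using a small hypothetical frame with the same index levels and columns (the data values are made up):

```python
import pandas as pd

# Hypothetical small frame with the same three-level row index as the question.
index = pd.MultiIndex.from_product(
    [[20000, 50000], [70], [1, 5]], names=["test1", "test2", "test3"]
)
df = pd.DataFrame(0, index=index, columns=["count_at_1", "count_at_5", "count_at_10"])

# df[...] picks a COLUMN first; each further [...] then peels off one row level.
value = df["count_at_1"][50000][70][5]
print(value)

# .loc with one key looks it up in the ROW index, which has no "count_at_1" label.
loc_error = None
try:
    df.loc["count_at_1"]
except KeyError as err:
    loc_error = err
print(type(loc_error).__name__)

# .at is a scalar accessor that needs a row key AND a column key, so one key fails.
at_error = None
try:
    df.at["count_at_1"]
except TypeError as err:
    at_error = err
print(type(at_error).__name__)
```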
I would also like to print all the count_at_x values for a given combination of test1, test2 and test3. It should look something like this:
print(df[50000][70][5])
count_at_1 count_at_5 count_at_10
0 0 0
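You just have the wrong indexing syntax. When you give .loc a single key, it looks that key up in the row index, not the columns, hence the KeyError; .at needs both a row key and a column key, hence the TypeError. Give them a (row, column) pair instead, where the row key is a tuple with one value per index level: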
df.loc[(50000, 70, 5), "count_at_1"] = 11
df.at[(50000, 70, 5), "count_at_1"] = 12
You should use something similar for printing the value too, either:
print(df.loc[(50000, 70, 5), "count_at_1"])
print(df.at[(50000, 70, 5), "count_at_1"])
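Putting the writes and reads together, here is a self-contained sketch on a hypothetical zero-filled stand-in for df (built with pd.MultiIndex.from_product rather than xarray):

```python
import pandas as pd

# Hypothetical stand-in frame with the question's index levels and columns.
index = pd.MultiIndex.from_product(
    [range(20000, 60000 + 1, 2500), range(10, 100 + 1), range(1, 10 + 1)],
    names=["test1", "test2", "test3"],
)
df = pd.DataFrame(0, index=index, columns=["count_at_1", "count_at_5", "count_at_10"])

row = (50000, 70, 5)  # one tuple = one complete row label

df.loc[row, "count_at_1"] = 11  # label-based; .loc also accepts lists and slices
df.at[row, "count_at_5"] = 3    # scalar-only access: exactly one row and one column

print(df.loc[row, "count_at_1"], df.at[row, "count_at_5"])

# Only the two targeted cells changed; every other cell is still 0.
print(int(df.to_numpy().sum()))
```

Both accessors are label-based; .at is restricted to a single cell, which is why it is the usual choice when you only ever touch one value at a time.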
To get all the values on this row, you can use either:
>>> df.loc[(50000, 70, 5)] # Single row = Series
count_at_1 12
count_at_5 0
count_at_10 0
Name: (50000, 70, 5), dtype: int64
>>> df.loc[[(50000, 70, 5)]] # Selection of one row = df
count_at_1 count_at_5 count_at_10
test1 test2 test3
50000 70 5 12 0 0
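The only difference between the two forms is the shape of the row key: a single tuple versus a list containing one tuple. A small sketch on hypothetical data that mirrors the example above:

```python
import pandas as pd

# Hypothetical stand-in frame; the value 12 mirrors the answer's example.
index = pd.MultiIndex.from_product(
    [[20000, 50000], [70], [5]], names=["test1", "test2", "test3"]
)
df = pd.DataFrame(0, index=index, columns=["count_at_1", "count_at_5", "count_at_10"])
df.at[(50000, 70, 5), "count_at_1"] = 12

as_series = df.loc[(50000, 70, 5)]   # one full row key -> Series (columns become its index)
as_frame = df.loc[[(50000, 70, 5)]]  # list of row keys -> DataFrame, index levels kept

print(type(as_series).__name__, as_series.to_dict())
print(type(as_frame).__name__, as_frame.shape, list(as_frame.index.names))
```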
I'm not terribly familiar with xarray, but part of your confusion might stem from the fact that pandas DataFrames are fundamentally 2D: the three MultiIndex levels together make up a single row axis, so you select a row with one tuple key rather than by chaining one lookup per level.
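To make the 2D point concrete, here is a sketch on a hypothetical zero-filled stand-in frame: all three levels live on the row axis, a shorter tuple selects every row under that prefix, and DataFrame.xs can select on an inner level by name:

```python
import pandas as pd

# Hypothetical stand-in frame with the question's index levels and columns.
index = pd.MultiIndex.from_product(
    [range(20000, 60000 + 1, 2500), range(10, 100 + 1), range(1, 10 + 1)],
    names=["test1", "test2", "test3"],
)
df = pd.DataFrame(0, index=index, columns=["count_at_1", "count_at_5", "count_at_10"])

# Still 2D: one row axis (with three levels) and one column axis.
print(df.ndim)

# A partial row tuple selects every row under that (test1, test2) prefix,
# i.e. one row per test3 value; ":" keeps all columns and avoids ambiguity.
block = df.loc[(50000, 70), :]
print(block.shape)

# xs selects on a named inner level without spelling out the outer ones.
at_test3_5 = df.xs(5, level="test3")
print(at_test3_5.shape)
```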
For more info, see the Pandas user guide: