Multi-level indexing of data frames in Julia?

How can I apply multi-level indexing to data frames in Julia? If it is not supported directly, is there another method, approach, or package that achieves the same thing?

Update

Example python code:

import numpy as np
import pandas as pd
arrays = [np.array(["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"]),
          np.array(["one", "two", "one", "two", "one", "two", "one", "two"]), ]

df = pd.DataFrame(np.random.randn(8, 4), index=arrays)
df

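Output: a data frame with a two-level row index (first level bar/baz/foo/qux, second level one/two) and four columns of random numbers.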

Thanks!!

Solution

I understand your question, but the answer depends on what you need the index for.

Here is how groupby works:

julia> using DataFrames

julia> df = DataFrame(x=repeat(["bar", "baz"], inner=3), y=repeat(["one", "two"], outer=3), z=1:6)
6×3 DataFrame
 Row │ x       y       z
     │ String  String  Int64
─────┼───────────────────────
   1 │ bar     one         1
   2 │ bar     two         2
   3 │ bar     one         3
   4 │ baz     two         4
   5 │ baz     one         5
   6 │ baz     two         6

julia> groupby(df, :x) # 1-level index
GroupedDataFrame with 2 groups based on key: x
First Group (3 rows): x = "bar"
 Row │ x       y       z
     │ String  String  Int64
─────┼───────────────────────
   1 │ bar     one         1
   2 │ bar     two         2
   3 │ bar     one         3
⋮
Last Group (3 rows): x = "baz"
 Row │ x       y       z
     │ String  String  Int64
─────┼───────────────────────
   1 │ baz     two         4
   2 │ baz     one         5
   3 │ baz     two         6

julia> groupby(df, :y) # 1-level index
GroupedDataFrame with 2 groups based on key: y
First Group (3 rows): y = "one"
 Row │ x       y       z
     │ String  String  Int64
─────┼───────────────────────
   1 │ bar     one         1
   2 │ bar     one         3
   3 │ baz     one         5
⋮
Last Group (3 rows): y = "two"
 Row │ x       y       z
     │ String  String  Int64
─────┼───────────────────────
   1 │ bar     two         2
   2 │ baz     two         4
   3 │ baz     two         6

julia> groupby(df, [:x, :y]) # 2-level index
GroupedDataFrame with 4 groups based on keys: x, y
First Group (2 rows): x = "bar", y = "one"
 Row │ x       y       z
     │ String  String  Int64
─────┼───────────────────────
   1 │ bar     one         1
   2 │ bar     one         3
⋮
Last Group (1 row): x = "baz", y = "one"
 Row │ x       y       z
     │ String  String  Int64
─────┼───────────────────────
   1 │ baz     one         5

Now an example of looking up rows with a 2-level index:

julia> gdf = groupby(df, [:x, :y]) # 2-level index
GroupedDataFrame with 4 groups based on keys: x, y
First Group (2 rows): x = "bar", y = "one"
 Row │ x       y       z
     │ String  String  Int64
─────┼───────────────────────
   1 │ bar     one         1
   2 │ bar     one         3
⋮
Last Group (1 row): x = "baz", y = "one"
 Row │ x       y       z
     │ String  String  Int64
─────┼───────────────────────
   1 │ baz     one         5

julia> gdf[("bar", "two")]
1×3 SubDataFrame
 Row │ x       y       z
     │ String  String  Int64
─────┼───────────────────────
   1 │ bar     two         2

julia> gdf[("baz", "two")]
2×3 SubDataFrame
 Row │ x       y       z
     │ String  String  Int64
─────┼───────────────────────
   1 │ baz     two         4
   2 │ baz     two         6
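The tuple lookups above can also be written with named levels. The following is a minimal sketch, assuming a recent DataFrames.jl release (1.x) and the same df as above; the variable name sub is only illustrative. It shows a NamedTuple key, iteration over the key combinations that exist, and a membership check with haskey.

```julia
using DataFrames

df = DataFrame(x=repeat(["bar", "baz"], inner=3),
               y=repeat(["one", "two"], outer=3),
               z=1:6)
gdf = groupby(df, [:x, :y])   # 2-level index

# A NamedTuple key names each level explicitly
sub = gdf[(x="baz", y="two")]

# keys(gdf) lists every key combination present in the data
for k in keys(gdf)
    println(k.x, " / ", k.y, " => ", nrow(gdf[k]), " row(s)")
end

# haskey tests for a key combination without throwing an error
haskey(gdf, ("bar", "two"))   # this combination is present
haskey(gdf, ("qux", "one"))   # this combination is absent
```

Checking with haskey first is useful because indexing with a key combination that does not occur in the data throws an error.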

Now there is a difference between DataFrames.jl and Pandas in indexing. For Pandas you have (see here for benchmarks):

When the index is unique, pandas uses a hash table to map key to value, O(1). When the index is non-unique and sorted, pandas uses binary search, O(log N). When the index is randomly ordered, pandas needs to check all the keys in the index, O(N).

In DataFrames.jl, by contrast, lookup is always O(1), no matter which source columns you use for indexing.
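Since any column or set of columns can serve as the key, several groupings of the same data frame can be kept side by side, which covers the case of selecting on only the first level. This is a minimal sketch under the same assumptions as before (DataFrames.jl 1.x, the df from above); the names by_x, by_xy, and z_sum are illustrative only.

```julia
using DataFrames

df = DataFrame(x=repeat(["bar", "baz"], inner=3),
               y=repeat(["one", "two"], outer=3),
               z=1:6)

by_x  = groupby(df, :x)         # 1-level index on x
by_xy = groupby(df, [:x, :y])   # 2-level index on x and y

# Lookup on the first level only: all rows with x == "bar"
bar_rows = by_x[("bar",)]

# Lookup on both levels
bar_one = by_xy[("bar", "one")]

# One common use of such an index: aggregate z per key combination
sums = combine(by_xy, :z => sum => :z_sum)
```

Both groupings refer to the rows of the same parent df, so creating a second one does not copy the data columns.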