HuggingFace Dataset with 4 custom splits?

I have 4 JSON files with the same structure and the same number of rows (but different contents). I uploaded the 4 files to my HuggingFace dataset repository.

My first try

I uploaded the 4 files to the root directory of the repository:

data-1.json
data-2.json
data-3.json
data-4.json

As a result, HuggingFace combined the 4 files into a single train split:

DatasetDict({
    train: Dataset({
        features: ['translation'],
        num_rows: 551964
    })
})

My second try

I renamed the 4 files to alpha.json, beta.json, delta.json and gamma.json. The result was the same.

My third try

I put the 4 files into 4 folders:

alpha/data-1.json
beta/data-2.json
delta/data-3.json
gamma/data-4.json

The result is still the same.

According to the official documentation, the loader only recognizes certain file and folder naming patterns when assigning splits.
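To illustrate why all three attempts ended up in a single train split, here is a rough Python approximation of the fallback behaviour: unless a path contains a split keyword such as train, validation or test, everything goes to train. The keyword lists are my reading of the datasets library and may differ between versions, and `guess_split` is a hypothetical helper, not the library's own code.

```python
import re

# Approximate split keywords (my reading of the datasets library; may differ by version).
SPLIT_KEYWORDS = {
    "train": ["train", "training"],
    "validation": ["validation", "valid", "dev", "val"],
    "test": ["test", "testing", "eval", "evaluation"],
}


def guess_split(path: str) -> str:
    """Hypothetical helper: guess which split a file path would be assigned to."""
    # Break the path into lowercase tokens on '/', '-', '_', '.' and spaces.
    tokens = re.split(r"[/\-_. ]+", path.lower())
    for split, keywords in SPLIT_KEYWORDS.items():
        if any(keyword in tokens for keyword in keywords):
            return split
    # No keyword anywhere in the path: everything falls back to "train".
    return "train"


for p in ["data-1.json", "alpha.json", "alpha/data-1.json"]:
    print(p, "->", guess_split(p))
```

Under this approximation, data-1.json, alpha.json and alpha/data-1.json all map to train, which matches the three results above.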

My goal is to load my dataset like this, with 4 custom splits:

from datasets import load_dataset

ds = load_dataset("myusername/my-dataset")
print(ds)

and get this output:

DatasetDict({
    alpha: Dataset({ # loads data-1.json
        features: ['translation'],
        num_rows: 137991
    }),
    beta: Dataset({ # loads data-2.json
        features: ['translation'],
        num_rows: 137991
    }),
    delta: Dataset({ # loads data-3.json
        features: ['translation'],
        num_rows: 137991
    }),
    gamma: Dataset({ # loads data-4.json
        features: ['translation'],
        num_rows: 137991
    })
})

The only workaround I can think of is to create 4 separate dataset repositories, which would be hard to manage.
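Another option that avoids extra repositories, sketched here under the assumption that the files stay in the folders from the third try: load_dataset() accepts a `data_files` dict whose keys become the split names. `DATA_FILES` and `load_custom_splits` are hypothetical names, and the import is done inside the function only so the mapping can be inspected without the library installed.

```python
# Hypothetical mapping from custom split names to files inside the repository.
DATA_FILES = {
    "alpha": "alpha/data-1.json",
    "beta": "beta/data-2.json",
    "delta": "delta/data-3.json",
    "gamma": "gamma/data-4.json",
}


def load_custom_splits(repo_id: str = "myusername/my-dataset"):
    """Load the repository with one split per entry in DATA_FILES."""
    from datasets import load_dataset  # imported lazily (see lead-in)

    # Keys of data_files become split names; relative paths resolve inside the repo.
    return load_dataset(repo_id, data_files=DATA_FILES)
```

The downside is that every caller has to pass the mapping, so a plain load_dataset("myusername/my-dataset") call would still return a single train split.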

Solution

I just found out that a specific folder and file naming pattern achieves my goal:

my_repository/
├── README.md
└── data/
    ├── alpha-00000-of-00001.csv
    ├── beta-00000-of-00001.csv
    ├── delta-00000-of-00001.csv
    └── gamma-00000-of-00001.csv
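The naming rule can be approximated with a regular expression: inside data/, the text before -00000-of-00001 becomes the split name. This is a simplified, hypothetical re-implementation (`split_from_path` is not a datasets function); as far as I can tell, the library's real pattern also tolerates extra characters after the shard counter.

```python
import re
from typing import Optional

# Simplified form of the "data/<split>-NNNNN-of-NNNNN.<ext>" naming convention.
SHARD_PATTERN = re.compile(r"^data/(?P<split>\w+)-\d{5}-of-\d{5}\.\w+$")


def split_from_path(path: str) -> Optional[str]:
    """Hypothetical helper: return the split name encoded in a file path, or None."""
    match = SHARD_PATTERN.match(path)
    return match.group("split") if match else None


for p in ["data/alpha-00000-of-00001.csv", "alpha/data-1.json", "data-1.json"]:
    print(p, "->", split_from_path(p))
```

Only the first path matches, which is why none of the earlier layouts produced custom splits.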

With this layout, load_dataset() returns:

DatasetDict({
    alpha: Dataset({
        features: ['translation'],
        num_rows: 137991
    })
    beta: Dataset({
        features: ['translation'],
        num_rows: 137991
    })
    delta: Dataset({
        features: ['translation'],
        num_rows: 137991
    })
    gamma: Dataset({
        features: ['translation'],
        num_rows: 137991
    })
})
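To produce this layout from the original four JSON files, something like the following Python sketch could work. It assumes the JSON files sit in one local folder and keeps their .json extension (the loader should accept any supported format with this naming, even though the tree above uses .csv). The upload uses the `upload_folder` method of `huggingface_hub.HfApi` and requires being logged in; `SOURCES`, `build_layout` and `upload` are hypothetical names.

```python
import shutil
from pathlib import Path
from typing import Dict, List

# Hypothetical mapping from custom split name to the original local file.
SOURCES = {
    "alpha": "data-1.json",
    "beta": "data-2.json",
    "delta": "data-3.json",
    "gamma": "data-4.json",
}


def build_layout(sources: Dict[str, str], src_dir: Path, out_dir: Path) -> List[Path]:
    """Copy each source file to out_dir/data/<split>-00000-of-00001.<ext>."""
    data_dir = out_dir / "data"
    data_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for split, filename in sources.items():
        src = src_dir / filename
        dst = data_dir / f"{split}-00000-of-00001{src.suffix}"
        shutil.copyfile(src, dst)
        written.append(dst)
    return written


def upload(out_dir: Path, repo_id: str) -> None:
    """Push the prepared folder to a dataset repository on the Hub."""
    from huggingface_hub import HfApi  # imported lazily; needs a saved token

    HfApi().upload_folder(folder_path=str(out_dir), repo_id=repo_id, repo_type="dataset")
```

For example, build_layout(SOURCES, Path("."), Path("my_repository")) followed by upload(Path("my_repository"), "myusername/my-dataset") should give a repository with the data/ folder shown above.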