I have the following data in a single file:
{
"0": "x",
"1": [
[
"x",
{
"app_instance_id": "x",
"app_instance_time": "x",
"page": {
"url": "x"
},
"user_agent": "x",
"timestamp": "x",
"session_id": "x",
"permanent_id": "x",
"event_category": "x",
"customer": "x",
"referrer": {
"url": "x"
},
"ip_address": "x"
}
],
[
"x",
{
"app_instance_id": "x",
"app_instance_time": "x",
"page": {
"url": "x"
},
"user_agent": "x",
"timestamp": "x",
"session_id": "x",
"permanent_id": "x",
"event_category": "x",
"customer": "x",
"referrer": {
"url": "x"
},
"ip_address": "x"
}
]
],
"time": 1627978464738
}{
"event": "x",
"userId": "x",
"badgeId": null,
"levelId": null,
"projectId": "x",
"ua": "x",
"key": "x",
"requestMethod": "x",
"endpoint": "x",
"customerId": "x",
"durationMs": 0,
"responseCode": 200,
"time": 1627978465804
}{
"event": "x",
"userId": "x",
"badgeId": null,
"levelId": null,
"projectId": "x",
"ua": "x",
"key": "x",
"requestMethod": "GET",
"endpoint": "x",
"customerId": "x",
"durationMs": 0,
"responseCode": 200,
"time": 1627978465798
}{
"event": null,
"ua": "x",
"browser.name": "Firefox",
"browser.version": "87.0",
"browser.major": "87",
"engine.name": "Gecko",
"engine.version": "87.0",
"os.name": "Mac OS",
"os.version": "10.15",
"lineCount": 3,
"data": 20,
"carrier": "x",
"spendingNow": 200,
"client": "x",
"time": 1619185462317
}{
"event": null,
"ua": "x",
"browser.name": "Chrome",
"browser.version": "90.0.4430.66",
"browser.major": "90",
"engine.name": "Blink",
"engine.version": "90.0.4430.66",
"os.name": "Android",
"os.version": "10",
"device.vendor": "Samsung",
"device.model": "SM-G965F",
"device.type": "mobile",
"lineCount": 1,
"data": 25,
"carrier": "x",
"spendingNow": 10,
"client": "x",
"time": 1619201845480
}
As you can see, the file contains JSON objects with different schemas. However, when I use the Glue crawler to define tables for my data, it creates a single table for the whole file, containing every column from all of the JSON objects (0, 1, time, event, userId, badgeId, etc.), as shown in the screenshot below.
What I want is to tell the crawler to create a separate table for each schema, as it does for separate files. What can I do?
I don't think you can. A schema describes the structure of a table, which usually corresponds to a directory of one or more files. If a single file had several schemas, no one table could be used to browse that file's data, so it would not make sense.
The best option is to clean your data, or to split it into separate files (in separate paths), each with a consistent schema, if you really want the crawler to detect different schemas.
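As a rough illustration of the "separate files in separate paths" suggestion, here is a minimal Python sketch that reads a file of back-to-back JSON objects like the one in the question and groups the records by kind. The table names and the MARKER_KEYS mapping are assumptions made up for this example (they guess that "1", "endpoint" and "carrier" each identify one kind of record in the sample); adjust them to your real data.

```python
import json
from collections import defaultdict
from pathlib import Path

# Hypothetical mapping: a key assumed to appear in only one kind of record,
# paired with an illustrative table name. Neither comes from Glue itself.
MARKER_KEYS = {
    "1": "tracking_events",
    "endpoint": "api_requests",
    "carrier": "survey_responses",
}


def iter_json_objects(text):
    """Yield each top-level JSON value from a string of concatenated values."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


def table_for(record):
    """Pick a table name from the first marker key found in the record."""
    for key, table in MARKER_KEYS.items():
        if key in record:
            return table
    return "unknown"


def split_by_schema(text):
    """Group the concatenated records by their (assumed) table name."""
    groups = defaultdict(list)
    for record in iter_json_objects(text):
        groups[table_for(record)].append(record)
    return dict(groups)


def write_groups(groups, out_dir):
    """Write each group to <out_dir>/<table>/data.json, one record per line."""
    for table, records in groups.items():
        path = Path(out_dir) / table / "data.json"
        path.parent.mkdir(parents=True, exist_ok=True)
        with path.open("w", encoding="utf-8") as f:
            for record in records:
                f.write(json.dumps(record) + "\n")
```

For example, write_groups(split_by_schema(Path("events.json").read_text()), "split") would produce one sub-directory per group (events.json and split are placeholder names). Each sub-directory could then be uploaded to its own path and crawled separately, so that every path holds records with a consistent schema.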