Tags: php, mysql, google-cloud-datastore, datastore

How to build a proper schema on Datastore and fetch it out


At my company we currently have a very bloated log table (MySQL) with a huge number of rows, and I am trying to export it to Datastore to lighten the load. However, I don't know what the best structure/schema would be, as this is my first time working with Datastore. For additional context, the project is built with PHP & Laravel.

The schema in my mind is something like this:

$data = [
    'hash' => md5(30 . 25 . 14), // cron_id . scheduler_id . integration_id
    'cron_id' => 30,
    'integration_id' => 14,
    'scheduler_id' => 25,
    'status' => 1,
    // ...
];

// Parent entity, plus two embedded job entities stored in its `jobs` property.
$entity = $datastore->entity($datastore->key('scheduler_logs', md5(30 . 25 . 14)), $data);
$subEntity1 = $datastore->entity($datastore->key('cron_job', md5(30 . 'Cron')), $cronRow);
$subEntity2 = $datastore->entity($datastore->key('manual_job', md5(30 . 'Manual')), $manualRow);
$entity->setProperty('jobs', [$subEntity1, $subEntity2]);
$datastore->upsert($entity);

So the jobs property is where I am getting confused. In the schema above I am embedding an array of entities into jobs, but what if I created cron_job and manual_job as separate entities instead: would that work as well? Or is the way I am currently doing it correct? Is there a better way this could be built?


My second question is how to filter by a nested inner entity. I was trying something like the following, but it matches the type and the status independently (mixing manual and cron jobs), which means the status 1 can come from the Cron job rather than the Manual job as desired:

    $query = $datastore->query()
        ->kind('scheduler_logs')
        ->filter('jobs.type', '=', 'Manual')
        ->filter('jobs.status', '=', 1);
    $response = json_encode($datastore->runQuery($query));

This is the added row as an example:

Key: scheduler_logs name:0089bcfd1b3a8eb38ad87eed0e178a37
Key literal: Key(scheduler_logs, '0089bcfd1b3a8eb38ad87eed0e178a37')
URL-safe key: ag5mfmlwYWFzLW9uLWdhZXI0CxIOc2NoZWR1bGVyX2xvZ3MiIDAwODliY2ZkMWIzYThlYjM4YWQ4N2VlZDBlMTc4YTM3DA
attempts: 5
created_at: January 12, 2023 at 12:36:57 PM UTC+10
cron_id: 30
delay_date_time: January 12, 2023 at 12:36:57 PM UTC+10
has_warnings: 0
hash: 0089bcfd1b3a8eb38ad87eed0e178a37
integration_id: 14
is_failed: 0
is_prefetched: 0
is_processing: 0
jobs: [{"type":"Cron","status":"2"},{"type":"Manual","status":"1"}]
next_sync_from_time: January 12, 2023 at 12:36:57 PM UTC+10
next_sync_to_time: January 12, 2023 at 12:36:57 PM UTC+10
scheduler_id: 25
status: 1
support_comment: null
support_status: null
updated_at: January 12, 2023 at 12:36:57 PM UTC+10

and here is how the jobs structure is displayed in Datastore:

{
  "values": [
    {
      "entityValue": {
        "key": {
          "partitionId": {
            "projectId": "***"
          },
          "path": [
            {
              "kind": "cron_job",
              "name": "e04518d7454ee31f406db54c4c022381"
            }
          ]
        },
        "properties": {
          "status": {
            "integerValue": "2"
          },
          "type": {
            "stringValue": "Cron"
          }
        }
      }
    },
    {
      "entityValue": {
        "key": {
          "partitionId": {
            "projectId": "***"
          },
          "path": [
            {
              "kind": "manual_job",
              "name": "0d0699909185aed8785d539075b3da85"
            }
          ]
        },
        "properties": {
          "status": {
            "integerValue": "1"
          },
          "type": {
            "stringValue": "Manual"
          }
        }
      }
    }
  ]
}

Thank you, appreciate any help.


Solution

Based on your question description and the data shown above, this looks like a data-structuring issue. It is generally recommended to structure your data so that variable data ends up as property values, while well-known data becomes the kind and property names. Given the shape of your data and what you are trying to do, one common solution is to denormalize the job data into the parent entity. This does require keeping the data in sync, but since these jobs run at known times (either via cron or manually) and carry timestamps, you should be able to keep the denormalized job data up to date.
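The denormalization idea can be sketched like this; the `job_flags` property name and the `jobFlags()` helper are illustrative assumptions, not part of the original schema. The point is that two separate equality filters on `jobs.type` and `jobs.status` can each match a different array element, while a single equality filter on a combined "type:status" value cannot:

```php
<?php
// Sketch: store one combined "type:status" string per job on the parent
// entity, so that one equality filter matches type and status together.

/** Build one filterable "type:status" string per job, e.g. "Manual:1". */
function jobFlags(array $jobs): array
{
    return array_map(
        fn (array $job): string => $job['type'] . ':' . $job['status'],
        $jobs
    );
}

$jobs = [
    ['type' => 'Cron',   'status' => 2],
    ['type' => 'Manual', 'status' => 1],
];

$data = [
    'hash'      => md5(30 . 25 . 14), // cron_id . scheduler_id . integration_id
    'jobs'      => $jobs,
    'job_flags' => jobFlags($jobs),   // ["Cron:2", "Manual:1"]
];

// With the google/cloud-datastore client, the upsert and query would then be:
//
//   $datastore->upsert($datastore->entity(
//       $datastore->key('scheduler_logs', $data['hash']),
//       $data
//   ));
//
//   $query = $datastore->query()
//       ->kind('scheduler_logs')
//       ->filter('job_flags', '=', 'Manual:1');
```

Because `job_flags` is an array property, the single equality filter matches if any one job carries exactly that type/status pair, which is the behavior you wanted.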
As for filtering by a nested inner entity: the query needs to change, because properties of embedded entities inside an array are indexed and matched independently per array element. One option is to store everything inside a single entity and not query for the exact nested entities at all: run a single query and filter the results in application code. If you instead model the jobs as child entities with the scheduler_logs entity as their ancestor, you can fetch only the direct children by comparing keys, along these lines (pseudocode):

    if (directChildEntity.getKey().getParent().equals(directParentEntity.getKey())) {
        // directChildEntity is a direct child of directParentEntity
    }

Note that the properties you filter on must be indexed, and queries that combine filters on multiple properties may additionally require a composite index, which is defined in your index configuration file (index.yaml).
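The "run a single query and filter in code" approach can be sketched as follows; the `hasJob` helper is illustrative (not part of the google/cloud-datastore API) and keeps the matching rule in one testable place:

```php
<?php
// Keep only entities whose `jobs` array contains a job that has BOTH the
// requested type and the requested status (no cross-element mixing).

/** True when any job in the list has the given type AND status. */
function hasJob(array $jobs, string $type, int $status): bool
{
    foreach ($jobs as $job) {
        // The console dump shows status stored as a string ("1"), so cast.
        if ($job['type'] === $type && (int) $job['status'] === $status) {
            return true;
        }
    }
    return false;
}

// Applied with the google/cloud-datastore client it would look like:
//
//   $query = $datastore->query()->kind('scheduler_logs');
//   foreach ($datastore->runQuery($query) as $entity) {
//       if (hasJob($entity['jobs'], 'Manual', 1)) {
//           // matched: status 1 really belongs to the Manual job
//       }
//   }

$jobs = [['type' => 'Cron', 'status' => '2'], ['type' => 'Manual', 'status' => '1']];
var_dump(hasJob($jobs, 'Manual', 1)); // bool(true)
var_dump(hasJob($jobs, 'Cron', 1));   // bool(false)
```

This trades extra reads for correctness: the query can stay broad (or use one coarse filter), and the exact type/status pairing is enforced in PHP.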

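As a concrete illustration of the index.yaml point, a hypothetical composite index for queries that filter scheduler_logs on status and sort by created_at might look like this (property choice is an example, adjust to your actual queries):

```yaml
indexes:
- kind: scheduler_logs
  properties:
  - name: status
  - name: created_at
    direction: desc
```

Deploy it alongside the application (for example with the gcloud datastore indexes tooling) before running queries that depend on it.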