google-app-engine, google-cloud-datastore, datastore

How do I load data from Cloud Storage to Cloud Datastore from an AppEngine PHP application?


I have searched various sources, but the answer is not clear to me as a newbie. How do I load data (a CSV file) from Cloud Storage into Cloud Datastore from an App Engine PHP application? My existing method downloads the file and then loads each row as a transaction. It takes a few hours for a few million rows, so it does not seem to be the best approach, and I am looking for a more efficient one. I appreciate any guidance.

Edit: I have switched to loading the JSON data into Datastore from a remote URL, still from GAE. The code does not work, and I do not yet know why:

<?php

require 'vendor/autoload.php';
use Google\Auth\ApplicationDefaultCredentials;
use Google\Cloud\Datastore\DatastoreClient;

/**
 * Queue an upsert of a product with a given SKU in a transaction.
 *
 * @param DatastoreClient $datastore
 * @param Google\Cloud\Datastore\Transaction $transaction
 * @param $sku
 * @param $product
 * @return Google\Cloud\Datastore\Entity
 */
function add_product(DatastoreClient $datastore, $transaction, $sku, $product)
{
    $productKey = $datastore->key('SKU', $sku);
    $entity = $datastore->entity(
        $productKey,
        [
            'created' => new DateTime(),
            'name' => strtolower($product)
        ]);
    // Write through the transaction; nothing is persisted until it commits.
    $transaction->upsert($entity);
    return $entity;
}

/**
 * Load a Cloud Datastore kind from a remote URL.
 *
 * @param $projectId
 * @param $url
 */
function load_datastore($projectId, $url) {
  // Create Datastore client
  $datastore = new DatastoreClient(['projectId' => $projectId]);

  // Reading from a URL requires `allow_url_fopen` to be enabled in php.ini.
  // It is a system-level setting, so ini_set() cannot change it at runtime.
  $contents = file_get_contents($url);
  $json = ($contents === false) ? null : json_decode($contents, true);
  if (!is_array($json)) {
    throw new Exception('Could not read or decode JSON from ' . $url);
  }

  // Read the products listing and load it to Cloud Datastore,
  // using batches of 20 per transaction.
  // Assumes the decoded JSON maps each SKU to a product name; if the feed is
  // a list of records, read the SKU and name fields from each row instead.
  foreach (array_chunk($json, 20, true) as $batch) {
    $transaction = $datastore->transaction();
    try {
      foreach ($batch as $sku_key => $product_val) {
        add_product($datastore, $transaction, $sku_key, $product_val);
      }
      $transaction->commit();
    } catch (Exception $err) {
      echo 'Caught exception: ', $err->getMessage(), "\n";
      $transaction->rollback();
    }
  }
}

try
{
    $projectId = 'development';
    $url = 'https://raw.githubusercontent.com/BestBuyAPIs/open-data-set/master/products.json';
    load_datastore($projectId, $url);
} catch (Exception $err) {
    // No transaction is in scope here, so only report the error.
    echo 'Caught exception: ', $err->getMessage(), "\n";
}
?>

Solution

  • This question is similar to "Import CSV into google cloud datastore" and "Google Cloud Datastore: Bulk Importing w Node.js".

    The quick answer is that you can write an Apache Beam pipeline and run it on Cloud Dataflow to import CSV data into Cloud Datastore.
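
    Apache Beam has no PHP SDK, so the following is only a minimal sketch of such a pipeline using the Beam Python SDK, not a drop-in replacement for the PHP code in the question. It assumes a CSV file with a header row whose first two columns are the SKU and the product name. The function names, the column layout, and the 'SKU' kind default are assumptions made for illustration.

```python
import csv


def parse_row(line):
    """Split one CSV line into (sku, properties).

    Assumes the first column is the SKU and the second is the name.
    """
    sku, name = next(csv.reader([line]))[:2]
    return sku.strip(), {"name": name.strip().lower()}


def to_entity(line, project, kind="SKU"):
    """Convert one CSV line into a Beam Datastore entity."""
    # Imported lazily so parse_row can be used without Beam installed.
    from apache_beam.io.gcp.datastore.v1new.types import Entity, Key

    sku, props = parse_row(line)
    entity = Entity(Key([kind, sku], project=project))
    entity.set_properties(props)
    return entity


def run(project, csv_path, pipeline_args=None):
    """Read CSV lines from csv_path and write them to Cloud Datastore."""
    import apache_beam as beam
    from apache_beam.io.gcp.datastore.v1new.datastoreio import WriteToDatastore
    from apache_beam.options.pipeline_options import PipelineOptions

    with beam.Pipeline(options=PipelineOptions(pipeline_args)) as p:
        (
            p
            | "ReadCsv" >> beam.io.ReadFromText(csv_path, skip_header_lines=1)
            | "ToEntity" >> beam.Map(to_entity, project)
            | "Write" >> WriteToDatastore(project)
        )
```

    Calling run() with a gs:// path and Dataflow runner options (project, region, temp location) would submit the job to Cloud Dataflow. With no options it runs locally, which is useful for trying the pipeline on a small sample first.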