google-app-engine, blobstore, pipeline

How can I read a blob that was written to the Blobstore by a Pipeline within the test framework?


I have a pipeline that creates a blob in the Blobstore and places the resulting blob_key in one of its named outputs. When I run the pipeline through the web interface I have built around it, everything works. Now I want to write a small test case that executes this pipeline, reads the blob out of the Blobstore, and saves it to a temporary location on disk so that I can inspect it (testbed.init_files_stub() only keeps the blob in memory for the life of the test).

The pipeline within the test case seems to work fine and produces what looks like a valid blob_key, but when I pass that blob_key to the blobstore.BlobReader class, it cannot find the blob. From the traceback, it looks as if the BlobReader is trying to access the real Blobstore, while the writer (inside the pipeline) is writing to the stubbed Blobstore. I have --blobstore_path set up on dev_appserver.py. The test case does not write any blobs to disk there, but when I run the pipeline from the web interface, the blobs do show up.

Here is the traceback:

Traceback (most recent call last):
  File "/Users/mattfaus/dev/webapp/coach_resources/student_use_data_report_test.py", line 138, in test_serial_pipeline
    self.write_out_blob(stage.outputs.xlsx_blob_key)
  File "/Users/mattfaus/dev/webapp/coach_resources/student_use_data_report_test.py", line 125, in write_out_blob
    writer.write(reader.read())
  File "/Users/mattfaus/Desktop/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/ext/blobstore/blobstore.py", line 837, in read
    self.__fill_buffer(size)
  File "/Users/mattfaus/Desktop/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/ext/blobstore/blobstore.py", line 809, in __fill_buffer
    self.__position + read_size - 1)
  File "/Users/mattfaus/Desktop/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/ext/blobstore/blobstore.py", line 657, in fetch_data
    return rpc.get_result()
  File "/Users/mattfaus/Desktop/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/api/apiproxy_stub_map.py", line 604, in get_result
    return self.__get_result_hook(self)
  File "/Users/mattfaus/Desktop/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/api/blobstore/blobstore.py", line 232, in _get_result_hook
    raise _ToBlobstoreError(err)
BlobNotFoundError

Here is my test code:

def write_out_blob(self, blob_key, save_path='/tmp/blob.xlsx'):
    """Reads a blob from the blobstore and writes it out to the file."""
    print str(blob_key)
    # blob_info = blobstore.BlobInfo.get(str(blob_key))  # Returns None
    # reader = blob_info.open()  # Returns None
    reader = blobstore.BlobReader(str(blob_key))
    # Binary mode, since .xlsx is not text; the with block closes the file.
    with open(save_path, 'wb') as writer:
        writer.write(reader.read())
    print blob_key, 'written to', save_path

def test_serial_pipeline(self):
    stage = student_use_data_report.StudentUseDataReportSerialPipeline(
        self.query_config)

    stage.start_test()
    self.assertIsNotNone(stage.outputs.xlsx_blob_key)    
    self.write_out_blob(stage.outputs.xlsx_blob_key)

Solution

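  • It turns out that I was simply missing the .value property. stage.outputs.xlsx_blob_key is the pipeline's output slot rather than the blob key itself, so the BlobReader was being handed the slot instead of the key it holds: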

    self.assertIsNotNone(stage.outputs.xlsx_blob_key)    
    self.write_out_blob(stage.outputs.xlsx_blob_key.value)  # Don't forget .value!!
    

    [UPDATE] The SDK dashboard also exposes a list of all blobs in your blobstore, conveniently sorted by creation date. It is available at http://127.0.0.1:8000/blobstore.
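
A side note on the write_out_blob helper from the question: below is a minimal sketch of a more defensive version, assuming only that the reader passed in is file-like (BlobReader exposes read(size)). The name copy_reader_to_file and the chunk_size default are hypothetical and not part of the SDK. It copies in chunks instead of reading the whole blob at once, and it opens the destination in binary mode because .xlsx is a binary format.

```python
import io
import shutil


def copy_reader_to_file(reader, save_path, chunk_size=64 * 1024):
    """Copy everything from a file-like reader into save_path.

    Hypothetical helper: `reader` only needs a read(size) method, which
    is the assumption made here about blobstore.BlobReader.
    """
    # 'wb' rather than 'w': the payload is binary, so no newline translation.
    with open(save_path, 'wb') as writer:
        # Repeatedly calls reader.read(chunk_size) until it returns nothing.
        shutil.copyfileobj(reader, writer, chunk_size)
    return save_path
```

In the test it would presumably be called as copy_reader_to_file(blobstore.BlobReader(stage.outputs.xlsx_blob_key.value), '/tmp/blob.xlsx'), assuming the slot's value is the blob key (or its string form) that BlobReader accepts.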