aws-sdk-net medical-imaging

Why does import of S3 "sub-bucket" into HealthImaging fail with "no file matches S3 URI" error?


While attempting to programmatically import a specific "subdirectory" of DICOM objects from an S3 bucket into a HealthImaging data store, I'm getting a "No file matches the provided input S3 URI" error. Meanwhile, I can visually confirm in the S3 console that there are files in that bucket under the full "subdirectory" prefix. It seems that the files are there but cannot be found.

The desired steps are:

  • Programmatically upload one or more DICOM files to an S3 bucket (bucketname/level1prefix/level2prefix). This step uses TransferUtility and UploadAsync.

  • Once upload is complete, invoke StartDICOMImportJobAsync to import from the S3 bucket to an existing HealthImaging data store.
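For reference, the two steps above might look roughly like the sketch below. It is not the actual code from my project: the class name, method names, and parameters are placeholders, and it assumes the AWSSDK.S3 and AWSSDK.MedicalImaging NuGet packages plus an IAM role that HealthImaging can assume to read the input prefix and write to the output location.

```csharp
using System;
using System.Threading.Tasks;
using Amazon.MedicalImaging;
using Amazon.MedicalImaging.Model;
using Amazon.S3;
using Amazon.S3.Transfer;

public static class ImportSketch
{
    // Builds the import request. The prefix passed here is assumed to be the
    // exact same string that was used to build the upload keys.
    public static StartDICOMImportJobRequest BuildRequest(
        string datastoreId, string roleArn, string bucket, string prefix, string outputS3Uri) =>
        new StartDICOMImportJobRequest
        {
            DatastoreId = datastoreId,
            DataAccessRoleArn = roleArn,
            InputS3Uri = $"s3://{bucket}/{prefix}",
            OutputS3Uri = outputS3Uri,
            ClientToken = Guid.NewGuid().ToString()
        };

    public static async Task<string> UploadAndImportAsync(
        IAmazonS3 s3, IAmazonMedicalImaging imaging,
        string filePath, string fileName, string bucket, string prefix,
        string datastoreId, string roleArn, string outputS3Uri)
    {
        // Step 1: upload the DICOM file under the prefix (prefix ends with "/").
        var transfer = new TransferUtility(s3);
        await transfer.UploadAsync(filePath, bucket, prefix + fileName);

        // Step 2: start the import from that same prefix.
        var response = await imaging.StartDICOMImportJobAsync(
            BuildRequest(datastoreId, roleArn, bucket, prefix, outputS3Uri));
        return response.JobId;
    }
}
```

The point of routing both steps through one prefix argument is that the upload key and the InputS3Uri cannot drift apart.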

The call to StartDICOMImportJobAsync works fine when the InputS3Uri looks like this:

s3://bucketname/level1prefix/

But it fails when the InputS3Uri looks like this:

s3://bucketname/level1prefix/level2prefix/

I'd like it to work the second way because I may have multiple S3 upload jobs that use the same level1prefix (e.g., Accession Number or Study Instance UID) but with unique prefixes at the second level. The level2prefix is a unique job id GUID.

To troubleshoot, I tried the same import in the AWS HealthImaging Console and could successfully Import DICOM data from bucketname/level1prefix/level2prefix into my data store.

I also tried adding a wait after the S3 upload step before attempting the import in case this is a timing issue, but that made no difference. I also tried a ListObjects call to see what was in the sub-bucket (it claims it doesn't exist) while visually confirming in the S3 console that the sub-bucket (with objects) does exist.
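A prefix check along these lines can be done by listing keys under the exact URI string that will be passed as InputS3Uri. This is a minimal sketch, assuming the AWSSDK.S3 NuGet package; PrefixCheck, SplitS3Uri, and ListKeysAsync are hypothetical names, not SDK members.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Amazon.S3;
using Amazon.S3.Model;

public static class PrefixCheck
{
    // Splits "s3://bucket/some/prefix/" into ("bucket", "some/prefix/").
    public static (string Bucket, string Prefix) SplitS3Uri(string s3Uri)
    {
        const string scheme = "s3://";
        if (!s3Uri.StartsWith(scheme, StringComparison.Ordinal))
            throw new ArgumentException("Expected an s3:// URI.", nameof(s3Uri));

        string rest = s3Uri.Substring(scheme.Length);
        int slash = rest.IndexOf('/');
        return slash < 0
            ? (rest, string.Empty)
            : (rest.Substring(0, slash), rest.Substring(slash + 1));
    }

    // Returns up to maxKeys object keys that start with the URI's prefix.
    public static async Task<List<string>> ListKeysAsync(
        IAmazonS3 s3, string s3Uri, int maxKeys = 10)
    {
        var (bucket, prefix) = SplitS3Uri(s3Uri);
        var response = await s3.ListObjectsV2Async(new ListObjectsV2Request
        {
            BucketName = bucket,
            Prefix = prefix,
            MaxKeys = maxKeys
        });

        var keys = new List<string>();
        // Depending on the SDK version, S3Objects may be null or empty when nothing matches.
        foreach (var obj in response.S3Objects ?? new List<S3Object>())
            keys.Add(obj.Key);
        return keys;
    }
}
```

An empty result for a prefix that "looks right" in the console suggests the string being sent differs from the stored keys, so printing the returned keys next to the prefix character by character is a useful next step.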

Since it works from the console and not the API call, it feels like maybe it is either a timing issue or a prefix parsing error where maybe the API call only takes everything up to the second delimiter (forward slash) or otherwise fails to read my mind. It's entirely possible that my expectation of what should work is incorrect, but I couldn't find anything in the online docs that provides clarity.

Is there a way to programmatically import from an S3 sub-bucket into a HealthImaging data store? Anyone have experience with import from S3 to HealthImaging who could provide a glimmer of insight?

Problem solved (4/24/24): this issue turned out to be the result of how an in-house library that uploads DICOM images to S3 formats the job GUID versus how a development script formatted the same GUID when constructing S3 key paths. More in the answer below.


Solution

  • It turns out this issue was cockpit error on my part rather than anything in S3 or HealthImaging. It had to do with how a job GUID was being used to construct S3 key paths.

    An in-house S3 DICOM upload library was doing this:

    string keyPart = job.Guid.ToString("N");
    

    While my development script to import from S3 into HealthImaging was doing this:

    string keyPart = job.Guid.ToString();
    

    Of course, the resulting string versions of the GUID were different, with one version looking like this:

    532cf00c03344615a5195f3b3fdab607
    

    And the other version looking like this:

    532cf00c-0334-4615-a519-5f3b3fdab607
    

    The ultimate solution came down to pair programming and a colleague who said "Show me the full paths" and whose brain hadn't been trained to not see the differences.
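One way to avoid this kind of mismatch is to format the GUID in a single shared place and have both the uploader and the importer build their paths from it. The sketch below is only an illustration: JobPaths and its members are hypothetical names, and choosing the "N" format is an assumption made to match what the upload library does.

```csharp
using System;

public static class JobPaths
{
    // Single place that decides how a job GUID is rendered in S3 keys.
    // "N" gives 32 hex digits without hyphens; the parameterless ToString()
    // uses "D", which inserts hyphens.
    public static string KeyPart(Guid jobGuid) => jobGuid.ToString("N");

    // Prefix shared by the upload step and the import step.
    public static string JobPrefix(string level1Prefix, Guid jobGuid) =>
        $"{level1Prefix.Trim('/')}/{KeyPart(jobGuid)}/";

    // URI to pass as InputS3Uri when starting the import job.
    public static string InputS3Uri(string bucket, string level1Prefix, Guid jobGuid) =>
        $"s3://{bucket}/{JobPrefix(level1Prefix, jobGuid)}";
}
```

With the GUID from this answer, JobPaths.InputS3Uri("bucketname", "level1prefix", guid) yields s3://bucketname/level1prefix/532cf00c03344615a5195f3b3fdab607/, and the upload keys built from JobPrefix start with the same string.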