
FHIR Resource logical id generation using SHA256


I am trying to implement code that generates FHIR messages from some type of input message. When I create each FHIR resource, I need to create a resource logical id that is unique and that comes out the same every time it is generated from the same input.

From Microsoft's FHIR-Converter GitHub repository, I found that they use SHA-256 to hash the input string and build an id from the hash. I used the same approach to generate a UUID in Java. Here is the code from Microsoft FHIR-Converter in .NET:

        public static string GenerateUUID(string input)
        {
            if (string.IsNullOrWhiteSpace(input))
            {
                return null;
            }

            var bytes = Encoding.UTF8.GetBytes(input);
            var algorithm = SHA256.Create();
            var hash = algorithm.ComputeHash(bytes);
            var guid = new byte[16];
            Array.Copy(hash, 0, guid, 0, 16);
            return new Guid(guid).ToString();
        }

It generates a UUID like this: e40b96a6-e62e-a67e-3ac7-69a099830e1c
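
For reference, a Java version of the same approach might look like the sketch below. It is not taken from FHIR-Converter: the class name `DeterministicId` and method name `generateUuid` are hypothetical, and the blank check only approximates `string.IsNullOrWhiteSpace`. One detail is easy to miss: .NET's `Guid(byte[])` constructor reads the first three fields in little-endian order, so the sketch swaps those bytes on the assumption that the Java output should match the string the C# code prints.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.UUID;

final class DeterministicId {

    // Hypothetical helper mirroring the C# GenerateUUID above.
    static String generateUuid(String input) {
        // Approximation of string.IsNullOrWhiteSpace (trim() only strips chars <= U+0020).
        if (input == null || input.trim().isEmpty()) {
            return null;
        }

        byte[] hash;
        try {
            hash = MessageDigest.getInstance("SHA-256")
                    .digest(input.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this should not happen.
            throw new IllegalStateException("SHA-256 not available", e);
        }

        // Keep only the first 16 bytes (128 bits) of the 32-byte digest.
        byte[] guid = new byte[16];
        System.arraycopy(hash, 0, guid, 0, 16);

        // Assumption: match .NET's Guid(byte[]) layout, which stores the first
        // 4-byte field and the next two 2-byte fields in little-endian order.
        swap(guid, 0, 3);
        swap(guid, 1, 2);
        swap(guid, 4, 5);
        swap(guid, 6, 7);

        ByteBuffer buffer = ByteBuffer.wrap(guid); // big-endian by default
        long mostSignificant = buffer.getLong();
        long leastSignificant = buffer.getLong();
        return new UUID(mostSignificant, leastSignificant).toString();
    }

    private static void swap(byte[] bytes, int i, int j) {
        byte tmp = bytes[i];
        bytes[i] = bytes[j];
        bytes[j] = tmp;
    }

    public static void main(String[] args) {
        // The same input yields the same id on every run and every machine.
        System.out.println(generateUuid("123"));
        System.out.println(generateUuid("123"));
    }
}
```

If matching the .NET output is not needed, the four `swap` calls can be dropped; the id is still deterministic, it just differs from what the C# code produces for the same input.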

My questions are:

  1. In order to repeatedly generate the same id, must the string input be the same as well? Meaning, if I have an input of 123, will it generate e40b96a6-e62e-a67e-3ac7-69a099830e1c every time?

  2. If I HAVE to use a unique id in order to generate this UUID, what is the advantage of this extra step? If my input always has a unique id for each resource, can I just assign the id to be (Resource name)-(id)?

  3. Is there a way to generate an id without having a unique id? I have some resources that do not have anything unique. Are there other techniques for generating a unique input that can be reproduced on different platforms? I don't see how I can do this without a unique id being provided in the input.


Solution

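    1. A given string will always generate the same id. A different string should generate a different id, though there's a very slim chance of two strings generating the same hash. (The code above keeps only the first 16 bytes, i.e. 128 bits, of the SHA-256 digest, which still makes an accidental collision extremely unlikely.)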

    2. There are rules for the format of the id (only A-Z, a-z, 0-9, '-' and '.' are permitted, and the maximum length is 64 characters), but other than that, there's no obvious benefit I can see. It's fine to use your 'native' identifier as the resource id. (That said, resource ids generally shouldn't be real-world identifiers like social security numbers, license numbers, etc., as that can leak protected information.)

    3. The expectation in FHIR is that a unique resource id corresponds to a unique real-world object. If you don't have a real identifier on the object, there's a possibility you could have multiple instances that correspond to distinct real-world objects. E.g. if all you have is a name of "A. Smith", it would not be appropriate to presume that multiple Practitioner instances with that name always refer to the same person. If you have no 'identity', you might be better off using the 'contained' mechanism rather than generating an id just from the content.
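
To make the format rules from point 2 concrete, here is a minimal Java sketch that checks a candidate id against the FHIR `id` datatype pattern (`[A-Za-z0-9\-\.]{1,64}`) before it is used. The class `FhirIdCheck` and the methods `isValidFhirId` and `nativeId` are hypothetical names for illustration, not part of HAPI FHIR or any other library, and the `(Resource name)-(id)` scheme is simply the one proposed in the question.

```java
import java.util.regex.Pattern;

final class FhirIdCheck {

    // FHIR id datatype: letters, digits, '-' and '.', between 1 and 64 characters.
    private static final Pattern FHIR_ID = Pattern.compile("[A-Za-z0-9\\-\\.]{1,64}");

    // Hypothetical helper: true if the candidate is usable as a resource logical id.
    static boolean isValidFhirId(String candidate) {
        return candidate != null && FHIR_ID.matcher(candidate).matches();
    }

    // Hypothetical helper: builds "(Resource name)-(id)" and rejects results
    // that would break the FHIR id rules (for example ids containing '_' or '/').
    static String nativeId(String resourceType, String sourceId) {
        String id = resourceType + "-" + sourceId;
        if (!isValidFhirId(id)) {
            throw new IllegalArgumentException("Not a valid FHIR id: " + id);
        }
        return id;
    }

    public static void main(String[] args) {
        System.out.println(nativeId("Patient", "12345"));
        System.out.println(isValidFhirId("Patient_12345"));
    }
}
```

When a native identifier contains disallowed characters, is too long, or is sensitive (such as a social security number), hashing it into a UUID as in the question is one way to get an id that still satisfies these rules.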