Does anyone know if the Cerner Bunsen library (https://github.com/cerner/bunsen) will load FHIR R4 Bundles and persist the data to Spark SQL databases? If anyone can offer any guidance, or point me to some, that would be great. At the moment I'm just trying to load a bundled sample from https://simplifier.net/ukcore. The ultimate objective is to persist incoming Bundles to a Hive database so they can be queried from Apache Spark clusters.
The sample code I'm using to try to load a single-entry Bundle is:
import com.cerner.bunsen.spark.Bundles;
import org.apache.spark.api.java.JavaRDD;
import java.net.URL;
import java.util.List;

Bundles bundles = Bundles.forR4();
URL fileUrl = R4Test.class.getClassLoader().getResource("ukcore/UKCore-AllergyIntolerance-Amoxicillin-Example.json");
JavaRDD<Bundles.BundleContainer> bundlesRdd = bundles.loadFromDirectory(spark, fileUrl.toExternalForm(), 200);
List<Bundles.BundleContainer> collected = bundlesRdd.collect();
bundles.saveAsDatabase(spark, bundlesRdd, "r4database", "AllergyIntolerance");
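For context, once the save works, the idea is that downstream Spark jobs would read the resulting Hive tables directly. A minimal sketch of that access, assuming saveAsDatabase creates one table per resource type under the given database name (the exact table name here is an assumption on my part):

// Sketch of the intended downstream access; table name assumed to be the lower-cased resource type.
spark.sql("SELECT * FROM r4database.allergyintolerance").show();
spark.sql("SELECT COUNT(*) AS n FROM r4database.allergyintolerance").show();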
On the bundlesRdd.collect() call I get the following warnings:
INFO WholeTextFileRDD: Input split: Paths:/path/to/ukcore/UKCore-AllergyIntolerance-Amoxicillin-Example.json:0+2017
WARN LenientErrorHandler: Unknown element 'meta' found while parsing
WARN LenientErrorHandler: Unknown element 'clinicalStatus' found while parsing
WARN LenientErrorHandler: Unknown element 'verificationStatus' found while parsing
WARN LenientErrorHandler: Unknown element 'type' found while parsing
WARN LenientErrorHandler: Unknown element 'category' found while parsing
WARN LenientErrorHandler: Unknown element 'code' found while parsing
WARN LenientErrorHandler: Unknown element 'patient' found while parsing
WARN LenientErrorHandler: Unknown element 'encounter' found while parsing
WARN LenientErrorHandler: Unknown element 'recordedDate' found while parsing
WARN LenientErrorHandler: Unknown element 'recorder' found while parsing
WARN LenientErrorHandler: Unknown element 'asserter' found while parsing
WARN LenientErrorHandler: Unknown element 'reaction' found while parsing
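As a sanity check on the sample file itself, I'm assuming it should parse with plain HAPI FHIR R4, independent of Bunsen; a sketch along these lines (using FhirContext from ca.uhn.fhir.context and IBaseResource from org.hl7.fhir.instance.model.api):

// Sanity-check sketch: parse the same file with plain HAPI FHIR R4, bypassing Bunsen,
// to confirm the JSON itself is well-formed R4.
try (Reader reader = new InputStreamReader(
        R4Test.class.getClassLoader().getResourceAsStream("ukcore/UKCore-AllergyIntolerance-Amoxicillin-Example.json"),
        StandardCharsets.UTF_8)) {
    IBaseResource parsed = FhirContext.forR4().newJsonParser().parseResource(reader);
    System.out.println("Parsed resource type: " + parsed.fhirType());
}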
And when calling saveAsDatabase(), it fails with:
java.lang.IllegalArgumentException: Unsupported FHIR version: R4
at com.cerner.bunsen.definitions.StructureDefinitions.create(StructureDefinitions.java:120)
at com.cerner.bunsen.spark.SparkRowConverter.forResource(SparkRowConverter.java:75)
at com.cerner.bunsen.spark.SparkRowConverter.forResource(SparkRowConverter.java:54)
at com.cerner.bunsen.spark.Bundles.extractEntry(Bundles.java:211)
at com.cerner.bunsen.spark.Bundles.saveAsDatabase(Bundles.java:290)
I'm currently running with the following dependencies:
<dependencies>
  <dependency>
    <groupId>com.cerner.bunsen</groupId>
    <artifactId>bunsen-r4</artifactId>
    <version>0.4.5</version>
  </dependency>
  <dependency>
    <groupId>com.cerner.bunsen</groupId>
    <artifactId>bunsen-core</artifactId>
    <version>0.5.7</version>
  </dependency>
  <dependency>
    <groupId>com.cerner.bunsen</groupId>
    <artifactId>bunsen-spark</artifactId>
    <version>0.5.7</version>
  </dependency>
  <!-- To resolve java.lang.IllegalAccessError:
       "tried to access method com.google.common.base.Stopwatch.<init>()V from class
       org.apache.hadoop.mapreduce.lib.input.FileInputFormat" -->
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-core</artifactId>
    <version>2.7.2</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.7.2</version>
  </dependency>
  <!-- Spark dependencies -->
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.4.5</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.4.5</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.11</artifactId>
    <version>2.4.5</version>
  </dependency>
</dependencies>
Many thanks,
Dave
At present the R4 version is not supported, due to the major changes made in the 0.5.x releases. It is on our roadmap, but we do not have an ETA yet.
If you are trying to explore sample data, please test with the 0.4.6 release, which supports both STU3 and R4. Please note that the older releases are no longer maintained.
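For experimentation, that would mean pinning the Bunsen dependency back to the 0.4.x line, roughly as below (a sketch only; please check Maven Central for the exact set of modules published at 0.4.6, since the module layout changed in 0.5.x, and do not mix in the 0.5.7 bunsen-core/bunsen-spark artifacts):

<!-- Sketch: use the 0.4.x line, which supports both STU3 and R4. -->
<dependency>
  <groupId>com.cerner.bunsen</groupId>
  <artifactId>bunsen-r4</artifactId>
  <version>0.4.6</version>
</dependency>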
Thanks, Amaresh