Search code examples
javaxmlxsltrdbms

How to convert a given hierarchical XML file into a denormalized Relational Database Table using JAVA


Given the following XML Structure:

<clinical_study>
  <primary_outcome>
    <measure></measure>
    <time_frame></time_frame>
    <safety_issue></safety_issue>
    <description></description>
  </primary_outcome>
  <secondary_outcome>
    <measure></measure>
    <time_frame></time_frame>
    <safety_issue></safety_issue>
    <description></description>
  </secondary_outcome>
</clinical_study>

I want to parse through the values within these attribute tags and dump them in an Oracle table named "CLINICAL_STUDY" with the following column structure:

desc clinical_study
Name                         Null     Type              
---------------------------- -------- -----------------    
PRIMARY_OUTCOME_MEASURE               VARCHAR2(50)      
PRIMARY_OUTCOME_TIME_FRAME            VARCHAR2(50)      
PRIMARY_OUTCOME_SAFETY_ISSUE          VARCHAR2(50)      
PRIMARY_OUTCOME_DESCRIPTION           VARCHAR2(4000)    
SECONDARY_OUTCOME_MEASURE               VARCHAR2(50)      
SECONDARY_OUTCOME_TIME_FRAME            VARCHAR2(50)      
SECONDARY_OUTCOME_SAFETY_ISSUE          VARCHAR2(50)      
SECONDARY_OUTCOME_DESCRIPTION           VARCHAR2(4000)   

I realize there are several ways I could implement this, but am not quite sure which would be the simplest. I'm leaning towards XSLTs that would "flatten" the data and then be easily converted into the table structure, but curious if there's an easier way. Thanks in advance.


Solution

  • If you are programming with Java, there should be no need to run a XSLT transformation. You should be able to just read in the data from the XML file and write it out to the database.

    There is certainly more than one way to do it, but one approach would be to read in the data using JAXB. (This will basically work if you have small enough data that it is OK to hold it in memory while processing. If you have large amounts of data, you may want to use a streaming XML parser API like StAX instead.)

    First, you could create a pair of classes to represent your input data, annotated with JAXB annotations which define the mapping from XML.

    @XmlRootElement(name="clinical_study")
    @XmlAccessorType(XmlAccessType.FIELD)
    public class ClinicalStudy {
    
        @XmlElement(name="primary_outcome")
        private Outcome primaryOutcome;
    
        @XmlElement(name="secondary_outcome")
        private Outcome secondaryOutcome;
    
        // getters and setters omitted for brevity
    }
    

    and

    @XmlAccessorType(XmlAccessType.FIELD)
    public class Outcome {
    
        @XmlElement(name="measure")
        private String measure;
    
        @XmlElement(name="time_frame")
        private String timeFrame;
    
        @XmlElement(name="safety_issue")
        private String safetyIssue;
    
        @XmlElement(name="description")
        private String description;
    
        // getters and setters omitted for brevity
    }
    

    Then you can just read, or "unmarshal", your data from XML (assuming inputStream is a stream of the XML content).

        JAXBContext context = JAXBContext.newInstance(ClinicalStudy.class);
        Unmarshaller unmarshaller = context.createUnmarshaller();
        ClinicalStudy clinicalStudy = (ClinicalStudy) unmarshaller.unmarshal(inputStream);
    

    And insert into your database (assuming conn is a JDBC database connection).

        PreparedStatement pstmt = conn.prepareStatement(
                "INSERT INTO clinical_study ("
                + "PRIMARY_OUTCOME_MEASURE, "
                + "PRIMARY_OUTCOME_TIME_FRAME, "
                + "PRIMARY_OUTCOME_SAFETY_ISSUE, "
                + "PRIMARY_OUTCOME_DESCRIPTION, "
                + "SECONDARY_OUTCOME_MEASURE, "
                + "SECONDARY_OUTCOME_TIME_FRAME, "
                + "SECONDARY_OUTCOME_SAFETY_ISSUE, "
                + "SECONDARY_OUTCOME_DESCRIPTION) "
                + "VALUES (?, ?, ?, ?, ?, ?, ?, ?)");
        pstmt.setString(1, clinicalStudy.getPrimaryOutcome().getMeasure());
        pstmt.setString(2, clinicalStudy.getPrimaryOutcome().getTimeFrame());
        pstmt.setString(3, clinicalStudy.getPrimaryOutcome().getSafetyIssue());
        pstmt.setString(4, clinicalStudy.getPrimaryOutcome().getDescription());
        pstmt.setString(5, clinicalStudy.getSecondaryOutcome().getMeasure());
        pstmt.setString(6, clinicalStudy.getSecondaryOutcome().getTimeFrame());
        pstmt.setString(7, clinicalStudy.getSecondaryOutcome().getSafetyIssue());
        pstmt.setString(8, clinicalStudy.getSecondaryOutcome().getDescription());
        pstmt.execute();