
Use JSON Input step to process uneven data


I'm trying to process the following with a JSON Input step:

{"address":[
  {"AddressId":"1_1","Street":"A Street"},
  {"AddressId":"1_101","Street":"Another Street"},
  {"AddressId":"1_102","Street":"One more street", "Locality":"Buenos Aires"},
  {"AddressId":"1_102","Locality":"New York"}
]}

However, this does not seem to be possible:

Json Input.0 - ERROR (version 4.2.1-stable, build 15952 from 2011-10-25 15.27.10 by buildguy) : 
The data structure is not the same inside the resource! 
We found 1 values for json path [$..Locality], which is different that the number retourned for path [$..Street] (3509 values). 
We MUST have the same number of values for all paths.

The step provides an Ignore Missing Path flag, but it only works if every row is missing the same path. In that case the step behaves as expected and fills the missing values with null.

This limits the power of this step to read uneven data, which was really one of my priorities.
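
To make the mismatch in the error message concrete, here is a rough illustration in plain JavaScript (runnable outside Kettle). It only counts how many array elements carry each key in the sample above; it is not how the step is implemented, and countValuesPerKey is a made-up helper name.

```javascript
// Sample document from the question.
var doc = {"address":[
  {"AddressId":"1_1","Street":"A Street"},
  {"AddressId":"1_101","Street":"Another Street"},
  {"AddressId":"1_102","Street":"One more street","Locality":"Buenos Aires"},
  {"AddressId":"1_102","Locality":"New York"}
]};

// Hypothetical helper: counts how many elements contain each key.
function countValuesPerKey(elements) {
  var counts = {};
  for (var i = 0; i < elements.length; i++) {
    for (var key in elements[i]) {
      if (Object.prototype.hasOwnProperty.call(elements[i], key)) {
        counts[key] = (counts[key] || 0) + 1;
      }
    }
  }
  return counts;
}

var counts = countValuesPerKey(doc.address);
// counts is { AddressId: 4, Street: 3, Locality: 2 }
console.log(counts);
```

Because the counts differ per key, a check that requires the same number of values for every path cannot pass on this data.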

My step Fields are defined as follows:

[Image: JSON Input Fields definition]

Am I missing something? Is this the correct behavior?


Solution

  • What I did was use a JSON Input step with the path $.address[*] to read the full map of each array element into a single jsonRow field. For example, given:

    {"address":[
        {"AddressId":"1_1","Street":"A Street"},  
        {"AddressId":"1_101","Street":"Another Street"},  
        {"AddressId":"1_102","Street":"One more street", "Locality":"Buenos Aires"},   
        {"AddressId":"1_102","Locality":"New York"} 
    ]}
    

    This results in 4 jsonRow values, one for each element, e.g. jsonRow = {"AddressId":"1_101","Street":"Another Street"}. Then, in a JavaScript step, I map my values like this:

    var AddressId = getFromMap('AddressId', jsonRow);
    var Street = getFromMap('Street', jsonRow);
    var Locality = getFromMap('Locality', jsonRow);
    

    In a second script tab I inserted minified JSON parse code from https://github.com/douglascrockford/JSON-js and the getFromMap function:

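    // Returns the value stored under `key` in the JSON object held in jsonRow,
    // or null if the key is missing or the row cannot be parsed.
    function getFromMap(key, jsonRow){
      var map;
      try{
        map = JSON.parse(jsonRow);
      }
      catch(e){
        // Send the unparsable row to the step's error handling and skip it.
        var message = "Unparsable JSON: " + jsonRow + " Desc: " + e.message;
        var nr_errors = 1;
        var field = "jsonRow";
        var errcode = "JSON_PARSE";
        _step_.putError(getInputRowMeta(), row, nr_errors, message, field, errcode);
        trans_Status = SKIP_TRANSFORMATION;
        return null;
      }

      // A missing key becomes null; this is what evens out the rows.
      if(map == null || map[key] == undefined){
        return null;
      }
      trans_Status = CONTINUE_TRANSFORMATION;
      return map[key];
    }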
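
A possible variation: getFromMap above parses the same jsonRow once per field (three times in the mapping shown). The sketch below parses each row once and fills every missing key with null. It is plain JavaScript that runs outside Kettle; extractFields is a hypothetical helper name, and the Kettle-specific parts (putError, trans_Status, output field definitions) are deliberately left out.

```javascript
// Hypothetical helper: parse one JSON row and return the requested keys,
// using null for any key the element does not have.
function extractFields(jsonRow, keys) {
  var map = JSON.parse(jsonRow); // throws SyntaxError on malformed input
  var out = {};
  for (var i = 0; i < keys.length; i++) {
    var value = (map !== null && typeof map === "object") ? map[keys[i]] : undefined;
    out[keys[i]] = (value === undefined) ? null : value;
  }
  return out;
}

// The four jsonRow values that $.address[*] yields for the sample document.
var jsonRows = [
  '{"AddressId":"1_1","Street":"A Street"}',
  '{"AddressId":"1_101","Street":"Another Street"}',
  '{"AddressId":"1_102","Street":"One more street","Locality":"Buenos Aires"}',
  '{"AddressId":"1_102","Locality":"New York"}'
];

var keys = ["AddressId", "Street", "Locality"];
var rows = [];
for (var r = 0; r < jsonRows.length; r++) {
  rows.push(extractFields(jsonRows[r], keys));
}

// Every row now has all three fields; absent ones are null.
console.log(JSON.stringify(rows[3]));
```

Inside the JavaScript step the same idea would presumably mean assigning AddressId, Street and Locality from one parsed object per row, with the try/catch from getFromMap wrapped around the single JSON.parse call.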