Tags: mule, dataweave, mule4

How to avoid repeating headers while writing file in batch job?


I am querying a Salesforce object and creating a flat file. Headers are getting repeated while writing the output to a file in the batch step.

<scheduler doc:name="Scheduler" doc:id="09f2a42a-9eea-4407-9667-8627e24015e3" disallowConcurrentExecution="true">
            <scheduling-strategy >
                <fixed-frequency frequency="1" timeUnit="MINUTES"/>
            </scheduling-strategy>
        </scheduler>
        <set-variable value="#[now()]" doc:name="Set Variable" doc:id="ecdb6baa-36ed-4035-80a6-a08416c624f8" variableName="startTime"/>
        <set-variable value="#['C:\trjournal_'++ now() as DateTime as String {format: 'yyyyMMddHHmmss'} ++ '.txt']" doc:name="Set Variable" doc:id="91ffe280-3ade-4529-8e8d-47d52a455c9f" variableName="path" />
        <set-variable value="#[1]" doc:name="Set Variable" doc:id="dcb2c09d-606c-410e-ad29-117587559ed4" variableName="counter" />
        <logger level="INFO" doc:name="Logger" doc:id="722aef3c-156a-4772-bb54-44dc1e0cd9a8" message="Request Started: #[vars.startTime]" />
        <salesforce:query-all doc:name="Fetch all the records from Transaction Journal" doc:id="03374e79-d036-45c6-a374-7fa32b335a90" config-ref="Salesforce_Config">
            <salesforce:salesforce-query ><![CDATA[SELECT ASC__c,CDF23__c,Option_Set__c,Statement_Message_ID__c,TSYS_ID__c FROM TransactionJournal WHERE TSYS_ID__c != null  LIMIT 10]]></salesforce:salesforce-query>
        </salesforce:query-all>
        <ee:transform doc:name="Transform Message" doc:id="eb1795f1-112e-4c6c-9ed0-736156d872f3">
            <ee:message>
                <ee:set-payload><![CDATA[%dw 2.0
output application/java 
---
payload]]></ee:set-payload>
            </ee:message>
        </ee:transform>
        <batch:job jobName="td-crimp-papi-to-vpsBatch_Job" doc:id="e7207bbe-5e71-436c-9872-c38a781cfda0" >
            <batch:process-records >
                <batch:step name="Batch_Step" doc:id="93750fd8-bd02-48f8-9fbe-eeec899320ab" >
                    <ee:transform doc:name="Transform Message" doc:id="5785121d-927c-463c-aa26-d4607b514fdc">
            <ee:message>
                <ee:set-payload><![CDATA[%dw 2.0
output application/flatfile schemaPath = "schemas/VPS.ffd", structureIdent="MultiSegment", missingValues="spaces"


var fieldsMapping = {
    "Statement_Message_ID__c": "014205",
    "Option_Set__c": "019303",
    "ASC__c": "011206",
    "CDF23__c": "008902"
}
---
{
  vps: {
    vps: [
      {
        ":USRDSB:-SOURCE": {
          ":USRDSB:-SOURCE": "SOURCE",
          ":USRDSB:-SOURCE-ID ": "MULESOFT"
        },
        ":USRDSB:-HEADER": {
          ":USRDSB:-HEADER": "HEADER",
          ":USRDSB:-TRANSMISSION-ID": "SPLNSCRB"
        },
        ":USRDSB:-DETAIL-RECORD":  entriesOf(fieldsMapping) map {
                ":USRDSB:-FIELD-IND": $.value,
                ":USRDSB:-CARD-NBR": payload.TSYS_ID__c as Number,
                ":USRDSB:-FIELD-DATA": if ($.value ~= "014205") payload.Statement_Message_ID__c
                                       else if ($.value ~= "019303") payload.Option_Set__c
                                       else if ($.value ~= "008902") payload.CDF23__c
                                       else payload.ASC__c
            }
      }
    ]
  }
}]]></ee:set-payload>
            </ee:message>
        </ee:transform>
                    <logger level="INFO" doc:name="Logger" doc:id="41d1a93a-0217-49f7-8756-faf0a17fa576" message="Incrementing counter #[vars.counter]" />
                    <file:write doc:name="Write" doc:id="226a764b-6ac6-4b1b-8d5a-d2e585c551e8" config-ref="File_Config" path="#[vars.path]" mode="APPEND" />
                </batch:step>
            </batch:process-records>
        </batch:job>
    </flow>

**Current output:**

  SOURCE MULESOFT
  HEADER SPLNSCRB 
  014205 00012392907   61734                                                                                    
  019303 00012392907   802                                                                                      
  011206 00012392907   V03                                                                                      
  008902 00012392907   3                                                                                        
  SOURCE MULESOFT
  HEADER SPLNSCRB 
  014205 00012392908   61735                                                                                    
  019303 00012392908   802                                                                                      
  011206 00012392908                                                                                            
  008902 00012392908 

**Expected output:**


  SOURCE MULESOFT
  HEADER SPLNSCRB 
  014205 00012392907   61734                                                                                    
  019303 00012392907   802                                                                                      
  011206 00012392907   V03                                                                                      
  008902 00012392907   3                                                                                        
  014205 00012392908   61735                                                                                    
  019303 00012392908   802                                                                                      
  011206 00012392908                                                                                            
  008902 00012392908 

How can I avoid the headers getting repeated? If it is not possible using Batch, how can I optimally process the records from Salesforce (around 25k expected at a given time)?

TIA!


Solution

  • It is not possible using Batch in this way. Each time the batch step generates its output it produces a complete flat file according to the schema, so each one includes its own header.

    Additionally, using a file write inside a batch step is not a good idea. Unless concurrency is limited to 1, the step can be executed by several threads at once, and the concurrent file writes could overwrite each other and corrupt the file.
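    If you do keep the write inside Batch, you can at least avoid file corruption by serializing the job with `maxConcurrency`. A minimal sketch (the jobName is taken from the question; treat this as untested config, and note it does not fix the repeated headers):

    <batch:job jobName="td-crimp-papi-to-vpsBatch_Job" maxConcurrency="1">
        <batch:process-records>
            <!-- batch steps run one record at a time, so file writes no longer interleave -->
        </batch:process-records>
    </batch:job>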

    Note that even without using Batch, processing flat files in DataWeave has high memory usage, as explained in the documentation:

    Flat File in DataWeave supports files of up to 15 MB, and the memory requirement is roughly 40 to 1. For example, a 1-MB file requires up to 40 MB of memory to process, so it’s important to consider this memory requirement in conjunction with your TPS needs for large flat files. This is not an exact figure; the value might vary according to the complexity of the mapping instructions.

    Ensure that there is enough memory for processing the volume you are using.

    A solution could be to use a foreach with a batch size to process a limited number of records at a time. You will need to test to find a number of records per batch that satisfies your performance requirements.

    Example in pseudo code:

    Query
    File write header
    foreach batchSize="10"  // 10 is just an example
       Transform records
       File write transformed records with append
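    A sketch of that pseudo code as Mule configuration (the flow name, the header literal, and the detail-only transform are assumptions, not tested config):

    <flow name="write-journal-flow">
        <salesforce:query-all config-ref="Salesforce_Config">
            <salesforce:salesforce-query><![CDATA[SELECT ... FROM TransactionJournal WHERE TSYS_ID__c != null]]></salesforce:salesforce-query>
        </salesforce:query-all>
        <!-- write the SOURCE and HEADER segments exactly once -->
        <file:write config-ref="File_Config" path="#[vars.path]" mode="CREATE_NEW">
            <file:content><![CDATA[SOURCE MULESOFT
    HEADER SPLNSCRB]]></file:content>
        </file:write>
        <foreach batchSize="10"> <!-- 10 is just an example; tune for your volume and memory -->
            <ee:transform>
                <!-- map only the :USRDSB:-DETAIL-RECORD segments; payload here is the current batch of records -->
            </ee:transform>
            <file:write config-ref="File_Config" path="#[vars.path]" mode="APPEND"/>
        </foreach>
    </flow>

    Because the header is written once before the loop and every iteration appends only detail records, the output matches the expected format regardless of how many batches run.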