I am querying a Salesforce object and creating a flat file. The header segments are getting repeated each time the batch step writes to the file.
<scheduler doc:name="Scheduler" doc:id="09f2a42a-9eea-4407-9667-8627e24015e3" disallowConcurrentExecution="true">
<scheduling-strategy >
<fixed-frequency frequency="1" timeUnit="MINUTES"/>
</scheduling-strategy>
</scheduler>
<set-variable value="#[now()]" doc:name="Set Variable" doc:id="ecdb6baa-36ed-4035-80a6-a08416c624f8" variableName="startTime"/>
<set-variable value="#['C:\trjournal_'++ now() as DateTime as String {format: 'yyyyMMddHHmmss'} ++ '.txt']" doc:name="Set Variable" doc:id="91ffe280-3ade-4529-8e8d-47d52a455c9f" variableName="path" />
<set-variable value="#[1]" doc:name="Set Variable" doc:id="dcb2c09d-606c-410e-ad29-117587559ed4" variableName="counter" />
<logger level="INFO" doc:name="Logger" doc:id="722aef3c-156a-4772-bb54-44dc1e0cd9a8" message="Request Started: #[vars.startTime]" />
<salesforce:query-all doc:name="Fetch all the records from Transaction Journal" doc:id="03374e79-d036-45c6-a374-7fa32b335a90" config-ref="Salesforce_Config">
<salesforce:salesforce-query ><![CDATA[SELECT ASC__c,CDF23__c,Option_Set__c,Statement_Message_ID__c,TSYS_ID__c FROM TransactionJournal WHERE TSYS_ID__c != null LIMIT 10]]></salesforce:salesforce-query>
</salesforce:query-all>
<ee:transform doc:name="Transform Message" doc:id="eb1795f1-112e-4c6c-9ed0-736156d872f3">
<ee:message>
<ee:set-payload><![CDATA[%dw 2.0
output application/java
---
payload]]></ee:set-payload>
</ee:message>
</ee:transform>
<batch:job jobName="td-crimp-papi-to-vpsBatch_Job" doc:id="e7207bbe-5e71-436c-9872-c38a781cfda0" >
<batch:process-records >
<batch:step name="Batch_Step" doc:id="93750fd8-bd02-48f8-9fbe-eeec899320ab" >
<ee:transform doc:name="Transform Message" doc:id="5785121d-927c-463c-aa26-d4607b514fdc">
<ee:message>
<ee:set-payload><![CDATA[%dw 2.0
output application/flatfile schemaPath = "schemas/VPS.ffd", structureIdent="MultiSegment", missingValues="spaces"
var fieldsMapping = {
"Statement_Message_ID__c": "014205",
"Option_Set__c": "019303",
"ASC__c": "011206",
"CDF23__c": "008902"
}
---
{
vps: {
vps: [
{
":USRDSB:-SOURCE": {
":USRDSB:-SOURCE": "SOURCE",
":USRDSB:-SOURCE-ID ": "MULESOFT"
},
":USRDSB:-HEADER": {
":USRDSB:-HEADER": "HEADER",
":USRDSB:-TRANSMISSION-ID": "SPLNSCRB"
},
":USRDSB:-DETAIL-RECORD": entriesOf(fieldsMapping) map {
":USRDSB:-FIELD-IND": $.value,
":USRDSB:-CARD-NBR": payload.TSYS_ID__c as Number,
":USRDSB:-FIELD-DATA": if($.value ~="014205") payload.Statement_Message_ID__c
else if($.value ~="019303") payload.Option_Set__c
else if($.value ~="008902") payload.CDF23__c
else payload.ASC__c
}
},
]
}
}]]></ee:set-payload>
</ee:message>
</ee:transform>
<logger level="INFO" doc:name="Logger" doc:id="41d1a93a-0217-49f7-8756-faf0a17fa576" message="Incrementing counter #[vars.counter]" />
<file:write doc:name="Write" doc:id="226a764b-6ac6-4b1b-8d5a-d2e585c551e8" config-ref="File_Config" path="#[vars.path]" mode="APPEND" />
</batch:step>
</batch:process-records>
</batch:job>
</flow>
**Current output:**
SOURCE MULESOFT
HEADER SPLNSCRB
014205 00012392907 61734
019303 00012392907 802
011206 00012392907 V03
008902 00012392907 3
SOURCE MULESOFT
HEADER SPLNSCRB
014205 00012392908 61735
019303 00012392908 802
011206 00012392908
008902 00012392908
**Expected output:**
SOURCE MULESOFT
HEADER SPLNSCRB
014205 00012392907 61734
019303 00012392907 802
011206 00012392907 V03
008902 00012392907 3
014205 00012392908 61735
019303 00012392908 802
011206 00012392908
008902 00012392908
How can I avoid the headers getting repeated? If it is not possible using batch, how can I optimally process the records from Salesforce (around 25k expected in a given run)?
TIA!
It is not possible using Batch in this way. Each time the batch step executes, its transform generates a complete flat file for the schema, so every output carries its own header.
Additionally, using a File Write inside a batch step is not a good idea. Unless concurrency is limited to 1, the step can be executed by multiple threads at once, and concurrent file writes can overwrite each other and corrupt the file.
Note that even without using Batch, processing flat files in DataWeave has high memory usage, as explained in the documentation:
Flat File in DataWeave supports files of up to 15 MB, and the memory requirement is roughly 40 to 1. For example, a 1-MB file requires up to 40 MB of memory to process, so it’s important to consider this memory requirement in conjunction with your TPS needs for large flat files. This is not an exact figure; the value might vary according to the complexity of the mapping instructions.
Ensure that there is enough memory for the volume you are processing. At that documented 40-to-1 ratio, for example, a 5-MB flat file can need around 200 MB of memory for the transformation alone.
A solution could be to use a foreach with a batch size to process a limited number of records at a time. You will need to test which batch size satisfies your performance requirements. A concrete sketch follows the pseudo code below.
Example in pseudo code:

Query
File write header
foreach batchSize="10"   // 10 is just an example
    Transform records
    File write transformed records with APPEND
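To make that concrete, here is a minimal sketch of the same structure in Mule XML. Treat it as a sketch under assumptions, not a drop-in implementation: the flow name and the batchSize of 500 are made up, it assumes your VPS.ffd schema tolerates a header-only write through the MultiSegment structure, and it assumes the detail lines can be written on their own by pointing the flat file writer's segmentIdent property at the :USRDSB:-DETAIL-RECORD segment. Adjust all of those to your actual schema.

<flow name="vps-foreach-flow">
    <set-variable variableName="path" value="#['C:\trjournal_' ++ now() as String {format: 'yyyyMMddHHmmss'} ++ '.txt']"/>
    <!-- Query everything once; the connector pages through the result set -->
    <salesforce:query-all config-ref="Salesforce_Config">
        <salesforce:salesforce-query><![CDATA[SELECT ASC__c,CDF23__c,Option_Set__c,Statement_Message_ID__c,TSYS_ID__c FROM TransactionJournal WHERE TSYS_ID__c != null]]></salesforce:salesforce-query>
    </salesforce:query-all>
    <!-- Write the SOURCE and HEADER segments exactly once; OVERWRITE starts a fresh file per run.
         Assumes the schema allows the detail segment to be absent in this write. -->
    <file:write config-ref="File_Config" path="#[vars.path]" mode="OVERWRITE">
        <file:content><![CDATA[#[%dw 2.0
output application/flatfile schemaPath="schemas/VPS.ffd", structureIdent="MultiSegment", missingValues="spaces"
---
{
    vps: {
        vps: [{
            ":USRDSB:-SOURCE": { ":USRDSB:-SOURCE": "SOURCE", ":USRDSB:-SOURCE-ID ": "MULESOFT" },
            ":USRDSB:-HEADER": { ":USRDSB:-HEADER": "HEADER", ":USRDSB:-TRANSMISSION-ID": "SPLNSCRB" }
        }]
    }
}]]]></file:content>
    </file:write>
    <!-- Sequential foreach: no parallel writes, so APPEND is safe; tune batchSize for memory vs. throughput -->
    <foreach batchSize="500">
        <ee:transform>
            <ee:message>
                <ee:set-payload><![CDATA[%dw 2.0
// segmentIdent (instead of structureIdent) tells the writer to emit only this
// segment type, so no SOURCE/HEADER lines are produced per batch
output application/flatfile schemaPath="schemas/VPS.ffd", segmentIdent=":USRDSB:-DETAIL-RECORD", missingValues="spaces"
var fieldsMapping = {
    "Statement_Message_ID__c": "014205",
    "Option_Set__c": "019303",
    "ASC__c": "011206",
    "CDF23__c": "008902"
}
---
// payload here is one slice of up to 500 records
payload flatMap ((record) ->
    entriesOf(fieldsMapping) map {
        ":USRDSB:-FIELD-IND": $.value,
        ":USRDSB:-CARD-NBR": record.TSYS_ID__c as Number,
        // the mapping keys are the Salesforce field names, so a dynamic
        // lookup replaces the if/else chain from the original transform
        ":USRDSB:-FIELD-DATA": record[$.key]
    })]]></ee:set-payload>
            </ee:message>
        </ee:transform>
        <file:write config-ref="File_Config" path="#[vars.path]" mode="APPEND"/>
    </foreach>
</flow>

Because foreach processes the slices sequentially, the APPEND writes cannot interleave the way parallel batch-step threads can, and only one slice of records goes through the flat file transformation at a time, which keeps the 40-to-1 memory overhead bounded. For 25k records, test a few batch sizes against that guideline to find the right trade-off.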