I have a huge file containing many HL7 messages. It must be split into 1000 (or so) smaller files. Because it is HL7 data, there is a pattern to go by: each message starts with a segment beginning with "MSH|" and ends just before the next segment that begins with "MSH|".
The script must be Windows batch (cmd) or VBScript, as I cannot install any software on that machine.
File structure:
MSH|abc|123|....
s2|sdsd|2323|
...
..
MSH|ns|43|...
...
..
..
MSH|sdfns|4343|...
...
..
asds|sds
MSH|sfns|3|...
...
..
as|ss
The file in the above example must be split into 2 or 3 files. Also, the file comes from UNIX, so the newlines (LF) must remain exactly as they are in the source file.
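The splitting rule above can be sketched in Python. This is only an illustration of the logic (Python is not available on the locked-down machine), and the function names `split_messages` and `group_into_files` are hypothetical. It works on raw bytes, so LF (or HL7's usual CR) segment endings are never altered, and it then packs consecutive messages into at most a chosen number of output chunks, which is one way to get "1000 or so" files from a huge input.

```python
from typing import List


def split_messages(data: bytes) -> List[bytes]:
    """Split raw HL7 data into messages; each one starts at a line beginning with b"MSH|".

    Bytes are kept exactly as in the source (splitlines with keepends=True),
    so b"".join(result) == data. Any text before the first MSH| line becomes
    a chunk of its own.
    """
    messages: List[bytes] = []
    current = bytearray()
    for line in data.splitlines(keepends=True):
        # a new header line closes the message collected so far
        if line.startswith(b"MSH|") and current:
            messages.append(bytes(current))
            current = bytearray()
        current += line
    if current:
        messages.append(bytes(current))
    return messages


def group_into_files(messages: List[bytes], n_files: int) -> List[bytes]:
    """Concatenate consecutive messages into at most n_files roughly equal chunks."""
    if n_files < 1:
        raise ValueError("n_files must be >= 1")
    if not messages:
        return []
    per_file = -(-len(messages) // n_files)  # ceiling division
    return [b"".join(messages[i:i + per_file])
            for i in range(0, len(messages), per_file)]
```

With the four messages from the sample, `group_into_files(messages, 3)` puts two messages in each chunk and so yields 2 files; each chunk would then be written out with a file opened in `"wb"` mode so nothing is translated.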
Any help?
This is a sample script that I used to split large HL7 files into separate files, with the new file names based on the name of the data file. It uses REBOL, which does not require installation, i.e. the Core version does not make any registry entries.
I have a more generalised version that scans an incoming directory, splits each file it finds into single messages, and then waits for the next file to arrive.
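Rebol [
    file: %split-hl7.r
    author: "Graham Chiu"
    date: 17-Feb-2010
    purpose: {split HL7 messages into single messages}
]

fn: %05112010_0730.dat      ; input file with many HL7 messages
outdir: %05112010_0730/     ; one output file per message goes here
if not exists? outdir [
    make-dir outdir
]

; note: text-mode read/write may convert line terminators to the local
; convention (CRLF on Windows); read/binary and write/binary keep bytes as-is
data: read fn
cnt: 0
; output prefix: input name without its 4-character extension, plus "-"
filename: join copy/part form fn -4 + length? form fn "-"
; every message after the first starts at a line beginning with "MSH|"
separator: rejoin [newline "MSH|"]

parse/all data [
    some [
        ; take everything up to the next header, or the rest of the data
        [copy result to separator | copy result to end]
        (
            write to-file rejoin [outdir filename cnt ".txt"] result
            print "Got result"
            ?? result
            cnt: cnt + 1
        )
        ; step over the newline so the next copy starts at "MSH|"
        1 skip
    ]
]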
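The script above reads the whole file into memory before parsing. For a really huge file, a streaming variant avoids that. Here is a minimal Python sketch of the same idea, assuming LF line endings as stated in the question (it would not split a file whose segments end in a bare CR). `split_file` is a hypothetical name, and the output naming only imitates the script's `<name>-<n>.txt` scheme.

```python
from pathlib import Path
from typing import BinaryIO, Optional


def split_file(src: Path, outdir: Path) -> int:
    """Write each HL7 message in src to its own file in outdir; return the count.

    Reads line by line in binary mode, so the input never has to fit in memory
    and LF line endings are copied byte for byte.
    Output names: <input name without extension>-<n>.txt, starting at 0.
    """
    outdir.mkdir(parents=True, exist_ok=True)
    count = 0
    out: Optional[BinaryIO] = None
    try:
        with src.open("rb") as f:
            for line in f:
                # a header line (or the very first line) starts a new output file
                if out is None or line.startswith(b"MSH|"):
                    if out is not None:
                        out.close()
                    out = (outdir / f"{src.stem}-{count}.txt").open("wb")
                    count += 1
                out.write(line)
    finally:
        if out is not None:
            out.close()
    return count
```

Usage would be something like `split_file(Path("05112010_0730.dat"), Path("05112010_0730"))`. Unlike the REBOL version, each output file keeps its trailing newline, so concatenating the outputs reproduces the input exactly.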