This question is not Perl-specific (although the unpack function will most probably figure into my implementation).
I have to deal with files in which several line formats are used to break the data down hierarchically into meaningful sections. What I'd like to be able to do is parse the file data into a suitable data structure.
Here's an example (commentary on RHS):
                                # | Format | Level | Comment
                                # +--------+-------+---------
**DEVICE 109523.69142           #     1        1     file-specific
.981 561A                       #     2        1
10/MAY/2010 24.15.30,13.45.03   #     3        2     group of records
05:03:01 AB23X 15.67 101325.72  #     4        3     part of single record
* 14 31.30474 13 0              #     5        3     part of single record
05:03:15 CR22X 16.72 101325.42  #     4        3     new record
* 14 29.16264 11 0              #     5        3
06:23:51 AW41X 15.67 101323.9   #     4        3
* 14 31.26493219 0              #     5        3
11/MAY/2010 24.07.13,13.44.63   #     3        2     group of new records
15:57:14 AB23X 15.67 101327.23  #     4        3     part of single record
* 14 31.30474 13 0              #     5        3     part of single record
15:59:59 CR22X 16.72 101331.88  #     4        3     new record
* 14 29.16264 11 0              #     5
The logic I have at the moment is fragile:
The big question is what strategies I can use to determine which format applies to which line. I'd be interested to know whether others have faced similar situations and what they did to address them.
Depending on what you want to do with this, it might be a good place to actually write a formal grammar, using Parse::RecDescent, for instance. This will allow you to feed the entire file to the parser and get a data structure out of it.
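To make the grammar idea a bit more concrete, here is a rough sketch of a small hand-written recursive-descent parser. It is written in Python rather than Perl, since the question is not Perl-specific, and it does not use Parse::RecDescent. The regular expressions, the Parser class and the dictionary keys are all my own assumptions, guessed from the sample above, and the real files may well need stricter rules. The grammar it assumes is: file := format1 format2 group*, group := format3 record*, record := format4 format5. Whitespace splitting only stands in for the real field extraction; some fields in the sample appear to run together, which is where a fixed-width approach such as unpack would come in.

```python
import re

# Hypothetical patterns inferred from the sample data; adjust for real files.
PATTERNS = {
    1: re.compile(r"\*\*DEVICE\s"),             # file-specific header
    3: re.compile(r"\d{2}/[A-Z]{3}/\d{4}\s"),   # date line, starts a group
    4: re.compile(r"\d{2}:\d{2}:\d{2}\s"),      # time line, starts a record
    5: re.compile(r"\*\s"),                     # continuation of a record
}


def classify(line):
    """Return the format number whose pattern matches the start of line, else None."""
    for fmt, pattern in PATTERNS.items():
        if pattern.match(line):
            return fmt
    return None


class Parser:
    def __init__(self, text):
        # Blank lines are dropped, so positions count non-blank lines only.
        self.lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
        self.pos = 0

    def peek(self):
        """Format of the next unread line (None if unrecognised or at end of input)."""
        if self.pos < len(self.lines):
            return classify(self.lines[self.pos])
        return None

    def take(self, fmt=None):
        """Consume one line; if fmt is given, insist that the line has that format."""
        if self.pos >= len(self.lines):
            raise ValueError("unexpected end of input")
        line = self.lines[self.pos]
        if fmt is not None and classify(line) != fmt:
            raise ValueError(f"non-blank line {self.pos + 1}: expected format {fmt}, got {line!r}")
        self.pos += 1
        return line.split()  # placeholder for fixed-width field extraction

    def parse_file(self):
        # file := format1 format2 group*
        header = self.take(1)
        # Format 2 has no obvious marker in the sample, so it is taken by position.
        subheader = self.take()
        groups = []
        while self.peek() == 3:
            groups.append(self.parse_group())
        if self.pos < len(self.lines):
            raise ValueError(f"non-blank line {self.pos + 1}: unexpected {self.lines[self.pos]!r}")
        return {"header": header[1:], "subheader": subheader, "groups": groups}

    def parse_group(self):
        # group := format3 record*
        date = self.take(3)
        records = []
        while self.peek() == 4:
            records.append(self.parse_record())
        return {"date": date, "records": records}

    def parse_record(self):
        # record := format4 format5
        return {"main": self.take(4), "detail": self.take(5)}


sample = """\
**DEVICE 109523.69142
.981 561A
10/MAY/2010 24.15.30,13.45.03
05:03:01 AB23X 15.67 101325.72
* 14 31.30474 13 0
05:03:15 CR22X 16.72 101325.42
* 14 29.16264 11 0
11/MAY/2010 24.07.13,13.44.63
15:57:14 AB23X 15.67 101327.23
* 14 31.30474 13 0
"""

tree = Parser(sample).parse_file()
print(len(tree["groups"]))                          # 2
print(tree["groups"][0]["records"][1]["main"][1])   # CR22X
```

The point of structuring it this way is that each level of the hierarchy gets its own function, and a line that turns up in the wrong place raises an error instead of being silently attached to the wrong record. Lines are recognised by their leading marker where there is one, and by position where there is not (format 2), which is probably the least fragile option if the markers in your real data are reliable.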