I'm attempting to parse a document that looks similar to this:
<PRESOL>
<DATE>1112
<YEAR>12
<AGENCY>Defense Logistics Agency
<OFFICE>DLA Acquisition Locations
<LOCATION>DLA Land and Maritime
<ZIP>43218-3990
<CLASSCOD>59
<DESC>Proposed procurement for NSN 5365013055528 SPACER,PLATE:
Line 0001 Qty 70.00 UI EA Deliver To: ARIZONA INDUSTRIES FOR THE BLIND By: 0180 DAYS ADOThe solicitation is an RFQ and will be available at the link provided in this notice. Hard copies of this solicitation are not available. Digitized drawings and Military Specifications and Standards may be retrieved, or ordered, electronically.
All responsible sources may submit a quote which, if timely received, shall be considered.
Quotes must be submitted electronically.
<SETASIDE>HUBZone
.......
</PRESOL>
As you can see, it's a bizarre format, but perhaps it used to be some kind of standard. The entire document appears to use a limited set of whitespace characters: for instance, I see no tabs, but I do see line breaks within some of the larger data blocks.
Does this look familiar to anyone?
I'm looking for a Rails gem that might parse this.
(Understand that I haven't seen this before - this is all the result of some digging around)
This is the format for a Presolicitation Notice, as published by the United States' Federal Business Opportunities... something. This is one of fifteen data interchange formats defined by that organization.
I could find no description of the base format for that template. Which is unfortunate, because there are a ton of gotchas in SGML (as I mentioned in the comments, this sure looks a lot like SGML) that will bite you if you're not prepared for them. Here's an interesting example from Wikipedia: <QUOTE></QUOTE> can also be written as <QUOTE//, or <QUOTE>.
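Lacking a formal description, a line-oriented reading of the sample may be enough to get started. The following is only a minimal Ruby sketch, not a real SGML parser. It assumes that every field tag starts its own line, that field values run until the next tag line, and that only the record wrapper (PRESOL in the sample) has a closing tag. The method name parse_notices and the "TYPE" key are made up for illustration.

```ruby
# Hypothetical helper: turn FBO-style notice text into an array of hashes.
# Assumptions (taken from the sample only, not from any specification):
#   * a bare <TAG> line opens a record and </TAG> closes it
#   * inside a record, a line starting with <TAG> opens a field
#   * any other line continues the most recently opened field
def parse_notices(text)
  records = []
  current = nil # hash for the record being read, nil when outside a record
  field = nil   # name of the most recently opened field

  text.each_line do |raw|
    line = raw.chomp
    if current.nil?
      # Outside a record: only a bare <TAG> line is meaningful.
      if (m = line.match(/\A<([A-Z0-9]+)>\s*\z/))
        current = { "TYPE" => m[1] }
        field = nil
      end
    elsif line.strip == "</#{current['TYPE']}>"
      # Closing tag of the record wrapper: store the record.
      records << current
      current = nil
    elsif (m = line.match(/\A<([A-Z0-9]+)>(.*)\z/))
      # New field: remember its name and the text after the tag.
      field = m[1]
      current[field] = m[2].strip
    elsif field
      # Continuation line: append to the current field's value.
      current[field] = "#{current[field]}\n#{line}"
    end
  end

  records
end
```

Calling parse_notices(File.read(path)) on a file of notices would give one hash per record, keyed by tag name. Note the weak spot: a description line that happens to begin with something like <WORD> would be misread as a new field, and none of the SGML shorthand forms mentioned above are handled.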
The template documentation is limited to the format of the data expected in each field. For example:
<CLASSCOD>
Either one alphabetic code or a two-digit code for service or supply that the synopsis should be listed under. Valid classification code (FAR, Section 5.207(g))
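Per-field descriptions like that one can at least be turned into sanity checks. Here is a small Ruby sketch based only on the wording quoted above (one letter, or exactly two digits). The method name plausible_classcod? is hypothetical, and the check does not look the code up in the actual FAR list, so it only catches malformed values.

```ruby
# Hypothetical shape check for a CLASSCOD value, derived from the quoted
# description: a single alphabetic character or a two-digit number.
# It does NOT verify that the code is a valid FAR classification code.
def plausible_classcod?(value)
  value.to_s.strip.match?(/\A(?:[A-Za-z]|\d{2})\z/)
end
```

With the sample record, plausible_classcod?("59") passes, while a value such as "5" or "ABC" would be flagged for a closer look.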