A several line document has a header/title section and then about 10 listings under each. I need to put the header/title info in with each of the listings so that they can be properly uploaded into a website (using comma and pipe delimiters). It looks like this:
SectionName1 and TitleName1
1111 - The SubSectionName A
222 - The SubSectionName B
3333 - The SubSectionName C
SectionName2 and TitleName2
444 - The SubSectionName D
55555 - The SubSectionName E
66 - The SubSectionName F
Repeating several hundred times. What I need is to produce something like:
SectionName1,TitleName1,1111,SubSectionNameA
SectionName1,TitleName1,222,SubSectionNameB
SectionName1,TitleName1,3333,SubSectionNameC
SectionName2,TitleName2,444,SubSectionNameD
SectionName2,TitleName2,55555,SubSectionNameE
SectionName2,TitleName2,66,SubSectionNameF
I realize there can multiple approaches to this solution, but I'm having a difficult time pulling the trigger on any one method. I understand submatches, joins and getline but I am not good at practical use of them in this scenario.
Any help to get me mentally started would be greatly appreciated.
Let me propose the following quite general Ex command solving the issue.1
:g/^\s*\h/d|let@"=substitute(@"[:-2],'\s\+and\s\+',',','')|ki|/\n\s*\h\|\%$/kj|
\ 'i,'js/^\s*\(\d\+\)\s\+-\s\+The/\=@".','.submatch(1).','/|'i,'js/\s\+//g
At the top level, this is the :global
command that enumerates the lines
starting with zero or more whitespace characters followed by a Latin letter or
an underscore (see :help /\h
). The lines matching this pattern are supposed
to be the header lines containing section and title names. The rest of the
command, after the pattern describing the header lines, are instructions to be
executed for each of those lines.
The actions to be performed on the headers can be divided into three steps.
Delete the current header line, at the same time extracting section and title names from it.
:d|let@"=substitute(@"[:-2],'\s\+and\s\+',',','')
First, remove the current line, saving it into the unnamed register,
using the :delete
command. Then, update the contents of that
register (referred to as @"
; see :help @r
and :help ""
) to be
result of the substitution changing the word and
surrounded by
whitespace characters, to a single comma. The actual replacement is
carried out by the substitute()
function.
However, the input is not the exact string containing the whole header
line, but its prefix leaving out the last character, which is
a newline symbol. The [:-2]
notation is a short form of the
[0:-2]
subscript expression that designates the substring from the
very first byte to the second one counting from the end (see :help
expr-[:]
). This way, the unnamed register holds the section and the
title names separated by comma.
Determine the range of dependent subsection lines.
:ki|/\n\s*\h\|\%$/kj
After the first step, the subsection records belonging to the just
parsed header line are located starting from the current line (the one
followed the header) until the next header line or, if there is no
such line below, the end of buffer. The numbers of these lines are
stored in the marks i
and j
, respectively. (See :helpg ^A mark
is
for description of marks.)
The marks are placed using the :k
command that sets a specified mark
at the last line of a given range which is the current line, by
default. So, unlike the first line of the considered block, the last
one requires a specific line range to point out its location.
A particular form of range, denoting the next line where a given
pattern matches, is used in this case (see :help :range
). The
pattern defining the location of the line to be found, is composed in
such a way that it matches a line immediately preceding a header (a
line starting with possible whitespace followed by an alphabetical
character), or the very last line. (See :help pattern
for details
about syntax of Vim regular expressions.)
Transform the delineated subsection lines according to desired format, prepending section and title names found in the corresponding header line.
:'i,'js/^\s*\(\d\+\)\s\+-\s\+The/\=@".','.submatch(1).','/|'i,'js/\s\+//g
This step comprised of the two :substitute
commands that are run
over the range of lines delimited by the locations labelled by the
marks i
and j
(see :help [range]
).
The first substitution command matches the beginning of a subsection
line—an identifier followed by a hyphen and the word The
, all
floating in a whitespace—and replaces it with the contents of the
unnamed register, holding the section and title names concatenated
with a comma, the matched identifier, and another comma. The second
substitution finalizes the transformation by squeezing all whitespace
characters on the line to gum the subsection name and the following
letter together.
To construct the replacement string in the first :substitute
command, the substitute-with-an-expression feature is used (see :help
sub-replace-\=
). The substitution part of the command should start
with \=
for Vim to interpret the remaining text not in a regular
way, but as an expression (see :help expression
). The result of
that expression's evaluation becomes the substitution string. Note
the use of the submatch()
function in the substitute expression to
retrieve the text of a submatch by its number.
1 The command is wrapped for better readability, its one-line version is listed below for ease of copy-pasting into Vim command line. Note that the wrapped command can be used in a Vim script without any change.
:g/^\s*\h/d|let@"=substitute(@"[:-2],'\s\+and\s\+',',','')|ki|/\n\s*\h\|\%$/kj|'i,'js/^\s*\(\d\+\)\s\+-\s\+The/\=@".','.submatch(1).','/|'i,'js/\s\+//g