I have to mark up a text inserting tags in a string as follows:
mystring = '123456789'
postions = [(2,4),(6,8)]
result = '12<tag1>34</tag1>56<tag2>78</tag2>9'
positions is a list of tuples that gives the start and end of the tag. (In principle the spans of the tags do not overlap)
I started assumend that I would need all the positions of the tags flatten out:
import itertools
pos2 = list(itertools.chain(*positions))
which gives:
[2, 4, 6, 8]
That allows me to do this:
stringWithTags= ''
opentag=True
nrtag=1
for j,character in enumerate(mystring):
i=j+1
if i in pos2:
if opentag:
toadd = character + '<tag' + str(nrtag) +'>'
else:
toadd = character + '</tag' + str(nrtag) +'>'
nrtag = nrtag + 1
stringWithTags = stringWithTags + toadd
opentag = not(opentag)
else:
stringWithTags = stringWithTags + character
stringWithTags
that works, but its quite horrible code, and incredible verbose. This problem should be well known and there might be out of the box solutions I am not aware of.
Any suggestions?
EDIT 1: I also considered my code not water tight because the moment the spans overlap this will not fly and a solution valid in both cases
I think that easy way is to split string to list and work with it by indexes from positions like this: split string, loop over positions-list, add prefix tag or suffix tag by position, finally join list to new string.
mystring = '123456789'
postions = [(2,4),(6,8)]
str_list = [c for c in mystring]
for r in postions:
for _, c in enumerate(r):
str_list[c] = ('</tag>{}' if _ else '<tag>{}').format(str_list[c])
new_string = ''.join(str_list)