Insert elements in front of specific list elements

I have a pandas DataFrame with two columns:

sentence - fo n bar
annotations [B-inv, B-inv, O, I-acc, O, B-com, I-com, I-com]

I want to insert an additional 'O' element into the annotations list in front of each annotation starting with 'B'. The result should look like this:

[O, B-inv, O, B-inv, O, I-acc, O, O, B-com, I-com, I-com]
' f o n  bar'

Then I want to insert an additional whitespace into the sentence in front of each character whose index matches the index of a 'B' annotation in the initial annotations list, i.e. in front of the characters at indices [0, 1, 5].

To make it clearer, here is the same thing as a table:

Initial sentence:

Ind	Sentence char	Annot
0	f	B-inv
1	o	B-inv
2	whitespace	O
3	n	I-acc
4	whitespace	O
5	b	B-com
6	a	I-com
7	r	I-com

End sentence:

Ind	Sentence char	Annot
0	whitespace	O
1	f	B-inv
2	whitespace	O
3	o	B-inv
4	whitespace	O
5	n	I-acc
6	whitespace	O
7	whitespace	O
8	b	B-com
9	a	I-com
10	r	I-com

Solution

Updated answer (list comprehension)

from itertools import chain
annot = ['B-inv', 'B-inv', 'O', 'I-acc', 'O', 'B-com', 'I-com', 'I-com']
sent = list('fo n bar')

annot, sent = list(map(lambda l: list(chain(*l)), list(zip(*[(['O', a], [' ', s]) if a.startswith('B') else ([a], [s]) for a,s in zip(annot, sent)]))))

print(annot)
print(''.join(sent))

chain from itertools allows you to chain together a list of lists to form a single list. The rest is a somewhat clumsy use of zip together with argument unpacking (the prefix * in a function call) to fit everything on one line. map is only used to apply the same flattening operation to both lists.

A more readable version, which also makes the steps easier to follow, could be:
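To see what the one-liner does, it may help to split it into its intermediate steps. The following is only a sketch on a shortened input (the first three elements of the example); the names pairs, annot_pieces, sent_pieces, new_annot and new_sent are made up for illustration and are not part of the code above.

```python
from itertools import chain

# Shortened example input (assumption: first three positions only)
annot = ['B-inv', 'B-inv', 'O']
sent = list('fo ')

# Step 1: build one (annotation pieces, sentence pieces) pair per position.
pairs = [(['O', a], [' ', s]) if a.startswith('B') else ([a], [s])
         for a, s in zip(annot, sent)]

# Step 2: zip(*pairs) transposes the pairs into two tuples of small lists.
annot_pieces, sent_pieces = zip(*pairs)

# Step 3: chain(*...) flattens each tuple of small lists into one sequence.
new_annot = list(chain(*annot_pieces))
new_sent = ''.join(chain(*sent_pieces))

print(new_annot)       # ['O', 'B-inv', 'O', 'B-inv', 'O']
print(repr(new_sent))  # ' f o '
```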

# find where in the annotations the element starts with 'B'
loc = [a.startswith('B') for a in annot]
# Use this locator to add an element and Merge the list of lists with `chain`
annot = list(chain.from_iterable([['O', a] if l else [a] for a,l in zip(annot, loc)]))
sent = ''.join(chain.from_iterable([[' ', a] if l else [a] for a,l in zip(sent, loc)])) # same on sentence

Note that above, I do not use map, since each list is processed separately, and there is less zipping and casting to lists. This is most probably a much cleaner, and hence preferable, solution.
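If you prefer no comprehension at all, the same logic can be written as a plain loop wrapped in a function. This is a minimal sketch, not a definitive implementation: the function name insert_before_b and its filler_char / filler_tag parameters are hypothetical, and it assumes the sentence and the annotations list have the same length (one annotation per character).

```python
def insert_before_b(sentence, annotations, filler_char=' ', filler_tag='O'):
    """Insert a filler character/tag in front of every position tagged 'B...'."""
    # Assumption: exactly one annotation per character of the sentence.
    if len(sentence) != len(annotations):
        raise ValueError('sentence and annotations must have the same length')

    new_chars, new_tags = [], []
    for char, tag in zip(sentence, annotations):
        if tag.startswith('B'):
            # Add the extra whitespace and its 'O' tag before the 'B' element.
            new_chars.append(filler_char)
            new_tags.append(filler_tag)
        new_chars.append(char)
        new_tags.append(tag)
    return ''.join(new_chars), new_tags


annot = ['B-inv', 'B-inv', 'O', 'I-acc', 'O', 'B-com', 'I-com', 'I-com']
new_sent, new_annot = insert_before_b('fo n bar', annot)

print(repr(new_sent))  # ' f o n  bar'
print(new_annot)
```

Because both lists are extended in the same loop iteration, the sentence and the annotations cannot get out of sync.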

Old answer (pandas)

I am not sure a DataFrame is the most convenient place to do this. It might be easier to work on plain lists before converting to a DataFrame.

But anyway, here is a way to do it, assuming you don't have meaningful indices in your DataFrame (so that the index is simply the integer count of each row).

The trick is to use the .str string functions, such as startswith in this case, to find the matching strings in the column of interest. You can then loop over the matching indices ([0, 1, 5] in the example) and insert, at a dummy location (a half index, e.g. 0.5 to place the new row before row 1), a row with the whitespace and 'O' data. Sorting by index with .sort_index() then rearranges all rows the way you want.

import numpy as np  # needed for np.argwhere below
import pandas as pd
annot = ['B-inv', 'B-inv', 'O', 'I-acc', 'O', 'B-com', 'I-com', 'I-com']
sent = list('fo n bar')
df = pd.DataFrame({'sent':sent, 'annot':annot})

idx = np.argwhere(df.annot.str.startswith('B').values) # find rows where annotations start with 'B'

for i in idx.ravel(): # Loop over the indices before which we want to insert a new row
  df.loc[i-0.5] = [' ', 'O'] # made up indices so that the subsequent sorting will place the row where you want it

df = df.sort_index().reset_index(drop=True) # sort the half indices into place and renumber the rows
print(df) # this will output the new DataFrame