Currently I'm extracting data from PDFs and putting it in a CSV file. I'll explain how this works.
First I create an empty dataframe:
ndataFrame = pandas.DataFrame()
Then I read the data. Assume for simplicity that the data is the same for each PDF:
data = {'shoe': ['a', 'b'], 'fury': ['c','d','e','f'], 'chaos': ['g','h']}
dataFrame = pandas.DataFrame({k:pandas.Series(v) for k, v in data.items()})
Then I append this data to the empty dataframe:
ndataFrame = pandas.concat([ndataFrame, dataFrame])  # DataFrame.append was removed in pandas 2.0
This is the output:
  shoe fury chaos
0    a    c     g
1    b    d     h
2  NaN    e   NaN
3  NaN    f   NaN
However, now comes the issue. I need some columns (let's say 4) to be empty between the columns fury and chaos. This is my desired output:
  shoe fury                 chaos
0    a    c                     g
1    b    d                     h
2  NaN    e                   NaN
3  NaN    f                   NaN
I tried some stuff with reindexing but I couldn't figure it out. Any help is welcome.
By the way, my desired output might be confusing. To be clear, I need some columns to be completely empty between fury and chaos (this is because some other data goes in there manually).
Thanks for reading
This answer assumes you have no way to change the way the data is being read in upstream. As always, it is better to handle these types of formatting changes at the source. If that is not possible, here is a way to do it after parsing.
You can use reindex here, using numpy.insert to add your four columns:
dataFrame.reindex(columns=np.insert(dataFrame.columns, 2, [1,2,3,4]))
  shoe fury   1   2   3   4 chaos
0    a    c NaN NaN NaN NaN     g
1    b    d NaN NaN NaN NaN     h
2  NaN    e NaN NaN NaN NaN   NaN
3  NaN    f NaN NaN NaN NaN   NaN
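For completeness, here is a self-contained sketch of the same idea, including the imports. It finds the insert position from the fury column rather than hard-coding 2. The blank_1 to blank_4 names are made up for illustration; any labels not already in the frame should work. The to_csv call at the end is only my assumption about how the result gets written out.

```python
import numpy as np
import pandas as pd

data = {'shoe': ['a', 'b'], 'fury': ['c', 'd', 'e', 'f'], 'chaos': ['g', 'h']}
df = pd.DataFrame({k: pd.Series(v) for k, v in data.items()})

# Hypothetical placeholder names for the four empty columns.
blanks = [f'blank_{i}' for i in range(1, 5)]

# Insert position: directly after the 'fury' column.
pos = df.columns.get_loc('fury') + 1

# Build the new column order, then reindex; labels that do not
# exist yet become columns filled with NaN.
new_columns = np.insert(df.columns.to_numpy(dtype=object), pos, blanks)
result = df.reindex(columns=new_columns)

# NaN values are written as empty fields by default.
csv_text = result.to_csv(index=False)
print(csv_text)
```

Because the placeholder columns contain only NaN, they come out as empty fields in the CSV, which leaves room for the data that is filled in manually later.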