python, python-3.x, pandas, reindex

Skip some columns between two columns when appending dataframe to existing empty dataframe


Currently I'm extracting data from PDFs and putting it in a CSV file. I'll explain how this works.

First I create an empty dataframe:

import pandas
ndataFrame = pandas.DataFrame()

Then I read the data. For simplicity, assume the data is the same for each PDF:

data = {'shoe': ['a', 'b'], 'fury': ['c','d','e','f'], 'chaos': ['g','h']}
dataFrame = pandas.DataFrame({k:pandas.Series(v) for k, v in data.items()})

Then I append this data to the empty dataframe:

ndataFrame = pandas.concat([ndataFrame, dataFrame])  # DataFrame.append was removed in pandas 2.0

This is the output:

  shoe fury chaos
0    a    c     g
1    b    d     h
2  NaN    e   NaN
3  NaN    f   NaN

However, now comes the issue. I need some columns (let's say 4) to be empty between the columns fury and chaos. This is my desired output:

  shoe fury                        chaos
0    a    c                            g
1    b    d                            h
2  NaN    e                          NaN
3  NaN    f                          NaN

I tried some stuff with reindexing but I couldn't figure it out. Any help is welcome.

By the way, my desired output might be confusing. To be clear, I need some columns to be completely empty between fury and chaos (this is because some other data goes in there manually).

Thanks for reading


Solution

  • This answer assumes you have no way to change the way the data is being read in upstream. As always, it is better to handle these types of formatting changes at the source. If that is not possible, here is a way to do it after parsing.


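    You can use reindex here, using numpy.insert to add four new column labels at position 2 (between fury and chaos). reindex returns a new DataFrame rather than modifying dataFrame in place, and the new columns are filled with NaN:

    import numpy as np
    dataFrame.reindex(columns=np.insert(dataFrame.columns, 2, [1,2,3,4]))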
    

      shoe fury   1   2   3   4 chaos
    0    a    c NaN NaN NaN NaN     g
    1    b    d NaN NaN NaN NaN     h
    2  NaN    e NaN NaN NaN NaN   NaN
    3  NaN    f NaN NaN NaN NaN   NaN
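
If the position and the number of blank columns vary, the same reindex approach can be wrapped in a small helper. This is only a sketch: the function name insert_empty_columns and the empty_N placeholder labels are made up for illustration and are not part of pandas. It looks up the position of the anchor column instead of hard-coding 2, then writes the result as CSV text, where NaN cells come out as empty fields by default.

```python
import numpy as np
import pandas as pd


def insert_empty_columns(df, after, count):
    """Return a new frame with `count` all-NaN columns right after column `after`.

    Hypothetical helper for illustration; assumes column labels are unique.
    """
    position = df.columns.get_loc(after) + 1
    # Placeholder labels (arbitrary names chosen for this sketch).
    new_labels = [f"empty_{i}" for i in range(1, count + 1)]
    new_columns = np.insert(df.columns.to_numpy(dtype=object), position, new_labels)
    # reindex creates the labels that do not exist yet and fills them with NaN.
    return df.reindex(columns=new_columns)


data = {'shoe': ['a', 'b'], 'fury': ['c', 'd', 'e', 'f'], 'chaos': ['g', 'h']}
dataFrame = pd.DataFrame({k: pd.Series(v) for k, v in data.items()})

result = insert_empty_columns(dataFrame, 'fury', 4)
print(result.columns.tolist())

# With no path given, to_csv returns the CSV as a string.
csv_text = result.to_csv(index=False)
print(csv_text)
```

The placeholder columns still need some label in the header row; what those labels should be depends on what the manually entered data expects.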