Search code examples
pythonpandasdataframesubtraction

How to subtract two partial columns with pandas?


I'm just getting started with Pandas so I may be missing something important, but I can't seem to successfully subtract two columns I'm working with. I have a spreadsheet in excel that I imported as follows:

df = pd.read_excel('/path/to/file.xlsx',sheetname='Sheet1')

My table when doing df.head() looks similar to the following:

      a         b        c        d
0     stuff     stuff    stuff    stuff
1     stuff     stuff    stuff    stuff
2     data      data     data     data
...   ...       ...      ...      ...
89    data      data     data     data

I don't care about the "stuff;" I would like to subtract two columns of just the data and make this its own column. Therefore, it seemed obvious that I should trim off the rows I'm not interested in and work with what remains, so I have tried the following:

dataCol1 = df.ix[2:,0:1]
dataCol2 = df.ix[2:,1:2]
print(dataCol1.sub(dataCol2,axis=0))

But it results in

             a          b
2           NaN        NaN
3           NaN        NaN
4           NaN        NaN
...         ...        ...
89          NaN        NaN

I get the same result if I also simply try print(dataCol1-dataCol2). I really don't understand how both of these subtraction operations not only result in all NaN's, but also two columns instead of just one with the end result. Because when I print(dataCol1), for example, I do obtain the column I want to work with:

      a      
2     data     
3     data   
4     data     
...   ...
89    data      

Is there any way to both work simply and directly from an Excel spreadsheet and perform basic operations with a truncated portion of the columns of said spreadsheet? Maybe there is a better way to go about this than using df.ix and I am definitely open to those methods as well.


Solution

  • The problem is the misallignment of your indices.

    One thing to do would be to subtract the values, so you don't have to deal with alignment issues:

    dataCol1 = df.iloc[2: , 0:1]   # ix is deprecated
    dataCol2 = df.iloc[2: , 1:2]
    
    result = pd.DataFrame(dataCol1.values - dataCol2.values)