
python subtract every even column from previous odd column


Sorry if this has been asked before -- I couldn't find this specific question.

In Python, I'd like to subtract every even column from the previous odd column,

so go from:

292.087 190.238 299.837 189.488 255.525 187.012
300.837 190.887 299.4   188.488 248.637 187.363
292.212 191.6   299.038 188.988 249.65  187.5
300.15  192.4   307.812 189.125 247.825 188.113

to

101.849 110.349 68.513
109.95  110.912 61.274
100.612 110.05  62.15
107.75  118.687 59.712

There will be an unknown number of columns. Should I use something in pandas or NumPy?

Thanks in advance.


Solution

  • You can accomplish this using pandas. You can select the even- and odd-indexed columns separately and then subtract them.

    @hiro protagonist, I didn't know you could do that StringIO magic. That's spicy.

    import pandas as pd
    import io
    
    data = io.StringIO('''ROI121  ROI122  ROI124  ROI125  ROI126  ROI127
                          292.087 190.238 299.837 189.488 255.525 187.012
                          300.837 190.887 299.4   188.488 248.637 187.363
                          292.212 191.6   299.038 188.988 249.65  187.5
                          300.15  192.4   307.812 189.125 247.825 188.113''')
    
    # raw string so that \s is not treated as an (invalid) string escape
    df = pd.read_csv(data, sep=r'\s+')
    

    Note that the even/odd terms may be counterintuitive because Python is 0-indexed: the signal columns (the 1st, 3rd, 5th, ...) sit at even indices 0, 2, 4, ..., and the background columns at odd indices 1, 3, 5, .... If I understand your question properly, this is the reverse of your even/odd terminology, so I'm pointing out the difference to avoid confusion.
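To make the 0-based indexing concrete, here is a small sketch using the column labels from the sample data. The names `signal_cols` and `background_cols` are just illustrative; the point is only that a step-2 slice starting at 0 picks the 1st, 3rd, 5th, ... columns.

```python
# Column labels from the sample data above
columns = ["ROI121", "ROI122", "ROI124", "ROI125", "ROI126", "ROI127"]

# Even indices (0, 2, 4): the "odd" columns in 1-based counting
signal_cols = columns[0::2]

# Odd indices (1, 3, 5): the "even" columns in 1-based counting
background_cols = columns[1::2]

print(signal_cols)      # ['ROI121', 'ROI124', 'ROI126']
print(background_cols)  # ['ROI122', 'ROI125', 'ROI127']
```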

    # split the columns into signal (even-indexed) and background (odd-indexed) groups
    bg_df = df.iloc[:, 1::2]
    signal_df = df.iloc[:, 0::2]
    
    # subtract the values of the data frames and store the results in a new data frame
    result_df = pd.DataFrame(signal_df.values - bg_df.values)
    

    result_df contains the differences between each signal column and its background column. The original column labels are lost because the subtraction is done on the raw .values arrays, so you will probably want to rename the columns.

    >>> result_df
             0        1       2
    0  101.849  110.349  68.513
    1  109.950  110.912  61.274
    2  100.612  110.050  62.150
    3  107.750  118.687  59.712
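If you'd rather keep meaningful labels, one possible approach is to build the new column names from each signal/background pair. This is only a sketch: the `"<signal>-<background>"` naming scheme is an assumption for illustration, not something from the question, and it assumes an even number of columns so that every signal column has a matching background column.

```python
import io

import pandas as pd

data = io.StringIO('''ROI121  ROI122  ROI124  ROI125  ROI126  ROI127
292.087 190.238 299.837 189.488 255.525 187.012
300.837 190.887 299.4   188.488 248.637 187.363
292.212 191.6   299.038 188.988 249.65  187.5
300.15  192.4   307.812 189.125 247.825 188.113''')
df = pd.read_csv(data, sep=r'\s+')

# Even-indexed columns are the signal, odd-indexed columns the background
signal_df = df.iloc[:, 0::2]
bg_df = df.iloc[:, 1::2]

# Hypothetical naming scheme: "<signal label>-<background label>"
new_names = [f"{s}-{b}" for s, b in zip(signal_df.columns, bg_df.columns)]

# Subtract the underlying arrays (position by position) and attach the new labels
result_df = pd.DataFrame(
    signal_df.to_numpy() - bg_df.to_numpy(),
    columns=new_names,
    index=df.index,
)

print(result_df.round(3))
```

Subtracting the arrays rather than the data frames is deliberate: `signal_df - bg_df` would try to align on column labels, which never match here, and would produce all-NaN columns.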