Search code examples
pythonpandasdataframeobject-slicing

Python 3.X Pandas - ValueError: Must pass DataFrame with boolean values only


I have multiple pandas df that I'm trying to slice every column after a certain column (Target Col). The problem is Target Col will have a different index number each time I try to slice it. The Pandas dfs would look like this:

+------------+------+------+
| Target Col | Col2 | Col3 |
+------------+------+------+
| Data       | Data | Data |
+------------+------+------+

+------+------------+------+
| Col1 | Target Col | Col3 |
+------+------------+------+
| Data | Data       | Data |
+------+------------+------+

And what I want to pull is every column after the Target Col on each df:

+------------+------+
| Target Col | Col3 |
+------------+------+
| Data       | Data |
+------------+------+
+------------+------+------+
| Target Col | Col2 | Col3 |
+------------+------+------+
| Data       | Data | Data |
+------------+------+------+

What I have for code so far is (shortened for clarity):

for files in dir:    
    df = pd.read_excel(files)    
    target_cols = [col for col in df if col.startswith('Target Col')]
    list_data = list(df.columns)
    table_tail = df.iloc[:, list_data.index(target_cols[0]):]

The error I get is "ValueError: Must pass DataFrame with boolean values only"

The code is written that way (in and out of lists, a little convoluted) due to trying to write code to slice multiple Pandas dfs based on an index number. If someone has a shorter and less convoluted way to get this to work, I'm happy to hear some options.


Solution

  • You could consider using masks and np.cumsum

    import pandas as pd
    import numpy as np
    
    cols1 = ["Target Col", "Col2", "Col3"]
    cols2 = ["Col1", "Target Col", "Col3"]
    df = pd.DataFrame(np.random.randn(4,3),columns=cols1)
    
    target_cols = [col=='Target Col' for col in df]
    target_cols = np.cumsum(target_cols).astype(bool)
    
    df = df[df.columns[target_cols]]
    print(df)
    
       Target Col      Col2      Col3
    0   -0.191493  1.382337  1.030406
    1   -0.008358  0.262019 -1.744335
    2   -0.218022  0.010588  0.373674
    3   -0.585362 -0.664626 -1.030293
    
    
    df = pd.DataFrame(np.random.randn(4,3),columns=cols2)
    target_cols = [col=='Target Col' for col in df]
    target_cols = np.cumsum(target_cols).astype(bool)
    
    df = df[df.columns[target_cols]]
    print(df)
    
       Target Col      Col3
    0   -1.677061  0.123344
    1   -0.616199 -0.277216
    2   -0.541302 -0.635904
    3    0.821543 -0.826233