I have multiple pandas df that I'm trying to slice every column after a certain column (Target Col). The problem is Target Col will have a different index number each time I try to slice it. The Pandas dfs would look like this:
+------------+------+------+
| Target Col | Col2 | Col3 |
+------------+------+------+
| Data | Data | Data |
+------------+------+------+
+------+------------+------+
| Col1 | Target Col | Col3 |
+------+------------+------+
| Data | Data | Data |
+------+------------+------+
And what I want to pull is every column after the Target Col on each df:
+------------+------+
| Target Col | Col3 |
+------------+------+
| Data | Data |
+------------+------+
+------------+------+------+
| Target Col | Col2 | Col3 |
+------------+------+------+
| Data | Data | Data |
+------------+------+------+
What I have for code so far is (shortened for clarity):
for files in dir:
df = pd.read_excel(files)
target_cols = [col for col in df if col.startswith('Target Col')]
list_data = list(df.columns)
table_tail = df.iloc[:, list_data.index(target_cols[0]):]
The error I get is "ValueError: Must pass DataFrame with boolean values only"
The code is written that way (in and out of lists, a little convoluted) due to trying to write code to slice multiple Pandas dfs based on an index number. If someone has a shorter and less convoluted way to get this to work, I'm happy to hear some options.
You could consider using masks and np.cumsum
import pandas as pd
import numpy as np
cols1 = ["Target Col", "Col2", "Col3"]
cols2 = ["Col1", "Target Col", "Col3"]
df = pd.DataFrame(np.random.randn(4,3),columns=cols1)
target_cols = [col=='Target Col' for col in df]
target_cols = np.cumsum(target_cols).astype(bool)
df = df[df.columns[target_cols]]
print(df)
Target Col Col2 Col3
0 -0.191493 1.382337 1.030406
1 -0.008358 0.262019 -1.744335
2 -0.218022 0.010588 0.373674
3 -0.585362 -0.664626 -1.030293
df = pd.DataFrame(np.random.randn(4,3),columns=cols2)
target_cols = [col=='Target Col' for col in df]
target_cols = np.cumsum(target_cols).astype(bool)
df = df[df.columns[target_cols]]
print(df)
Target Col Col3
0 -1.677061 0.123344
1 -0.616199 -0.277216
2 -0.541302 -0.635904
3 0.821543 -0.826233