My dataset looks like this
Name Subset Value
A 67-A-5678 14
A 58-ABC-87555 187
A 45-ASH-87954 5465
S 34-A-8785 454
S 58-ASO-98978 54
S 23-ASH-87895 784
X 98-X-87876 455
X 87-ABC-54578 4545
X 56-ASH-89667 854
Y 09-D-98644 45
Y 87-ABC-78834 98
Y 87-ASH-87455A 4566
L 67-A-87545 78
L 89-GHS-08753 12
L 78-PHU-09876 655
I want to keep only those groups of rows whose "subset" columns are of pattern; *
, *ABC
, *ASH
(Note: *
is any alphabet or digit).
For example, output should look like
Name Subset Value
A 67-A-5678 14
A 58-ABC-87555 187
A 45-ASH-87954 5465
X 98-X-87876 455
X 87-ABC-54578 4545
X 56-ASH-89667 854
Y 09-D-98644 45
Y 87-ABC-78834 98
Y 87-ASH-87455A 4566
P.S. Actual dataset can be of many columns/rows.
Try this:
filtered = df[df.groupby('Name')['Subset'].transform(lambda x: len(x) >= 3 and'-ABC-' in x.iloc[1] and '-ASH-' in x.iloc[2])]
Output:
>>> filtered
Name Subset Value
0 A 67-A-5678 14
1 A 58-ABC-87555 187
2 A 45-ASH-87954 5465
6 X 98-X-87876 455
7 X 87-ABC-54578 4545
8 X 56-ASH-89667 854
9 Y 09-D-98644 45
10 Y 87-ABC-78834 98
11 Y 87-ASH-87455A 4566