Using wildcard in Python

I have a df with 3 columns (col1, col2, and col3), and I have a user input that can enter a value for columns, just one, two or three, and of any combination, like (col1 or col1&col2, or col2&col3, etc.). Based on the user input, I need to select rows that have these values. For example, if I have the following table:

col1   col2   col3
1      3      3
3      4      5
5      2      1
3      4      2

and so on, the user can enter value for col2 only, (4 in this case), then I have to display rows (2 and 4), or if they enter (1) for col1 then only row(1) to display, my logical formula would be like this:

x = input1
y = input2
z = imput3
a = df['col1'] == x and df['col2'] == y and df['col1'] == z

So the x, y and z can be any value based on the input, including nil (nil means all).

Any suggestion on how to write the code for such a formula?

Solution

Use boolean combinations.

a = ((df['col1'] == x) | (x is None)) \
  & ((df['col2'] == y) | (y is None)) \
  & ((df['col3'] == z) | (z is None))
df1 = df[a]

None is the wildcard here.

Note that & and | are not short-circuiting operators, so this could be expensive. Even when you have a wildcard for a column, it will still compare everything in that column with None. A better way would be to construct the conditions dynamically.

condition = True

if x is not None:
    condition &= df['col1'] == x

if y is not None:
    condition &= df['col2'] == y

if z is not None:
    condition &= df['col3'] == z

df1 = df[condition]

You could do this more dynamically by creating a dictionary of column names and conditions.