I have a df with 3 columns (col1, col2, and col3), and I have a user input that can enter a value for columns, just one, two or three, and of any combination, like (col1 or col1&col2, or col2&col3, etc.). Based on the user input, I need to select rows that have these values. For example, if I have the following table:
col1 col2 col3
1 3 3
3 4 5
5 2 1
3 4 2
and so on, the user can enter value for col2 only, (4 in this case), then I have to display rows (2 and 4), or if they enter (1) for col1 then only row(1) to display, my logical formula would be like this:
x = input1
y = input2
z = imput3
a = df['col1'] == x and df['col2'] == y and df['col1'] == z
So the x, y and z can be any value based on the input, including nil (nil means all).
Any suggestion on how to write the code for such a formula?
Use boolean combinations.
a = ((df['col1'] == x) | (x is None)) \
& ((df['col2'] == y) | (y is None)) \
& ((df['col3'] == z) | (z is None))
df1 = df[a]
None
is the wildcard here.
Note that &
and |
are not short-circuiting operators, so this could be expensive. Even when you have a wildcard for a column, it will still compare everything in that column with None
. A better way would be to construct the conditions dynamically.
condition = True
if x is not None:
condition &= df['col1'] == x
if y is not None:
condition &= df['col2'] == y
if z is not None:
condition &= df['col3'] == z
df1 = df[condition]
You could do this more dynamically by creating a dictionary of column names and conditions.