Search code examples
pythonpandasdataframepandas-loc

Pandas dataframe loop all values for certain column


Working with pandas dataframe suppose i have data frame with similar structure as following:

import pandas as pd
a_choise = ["True", "False", "False", "False", "True", "False", "False", "True", "True"]
b_choise = ["True", "True", "False", "False", "False", "False", "True", "True", "True"]
c_choise = ["False", "False", "True", "False", "True", "True", "False", "True", "False"]
a_n = ["a1", "a2", "a3", "a4", "a5", "a6", "a7", "a8", "a9"]
b_n = ["b1", "b2", "b3", "b4", "b5", "b6", "b7", "b8", "b9"]
c_n = ["c1", "c2", "c3", "c4", "c5", "c6", "c7", "c8", "c9"]
df = pd.DataFrame(
    {"a": list(range(1, 10)), "b": list(range(11, 20)), "c": range(21, 30), 
     "a_Cho":a_choise, "b_Cho":b_choise, "c_Cho":c_choise,
     "a_n":a_n, "b_n":b_n, "c_n":c_n}
)
    a   b   c   a_Cho   b_Cho   c_Cho   a_n b_n c_n
0   1   11  21  True    True    False   a1  b1  c1
1   2   12  22  False   True    False   a2  b2  c2
2   3   13  23  False   False   True    a3  b3  c3
3   4   14  24  False   False   False   a4  b4  c4
4   5   15  25  True    False   True    a5  b5  c5
5   6   16  26  False   False   True    a6  b6  c6
6   7   17  27  False   True    False   a7  b7  c7
7   8   18  28  True    True    True    a8  b8  c8
8   9   19  29  True    True    False   a9  b9  c9

I want a new 2 columns (Choise, Value) that meet the follwoing conditions for all values in "a_Cho", "b_Cho", And "c_Cho"

  • if "a_Cho" = true then choise = "a_n", value = a for the corresponding value of "a_Cho" elif "a_Cho" = false then move to next
  • if "b_Cho" = true then choise = "b_n", value = b for the corresponding value of "b_Cho"elif "b_Cho" = false then move to next
  • if "c_Cho" = true then choise = "c_n", value = c for the corresponding value of "c_Cho"elif "c_Cho" = false then move to next
  • if "x_Cho" = false then value and choise = "Invalide"

Solution

  • Edited

    Thank you for updating your question. Following your edit, I believe .loc will be useful to you. .loc allows us to perform "logical indexing", grabbing rows based on column equalities.

    For example, the following gets all rows where column a_Cho is equal to "True",

    >>> df.loc[df.a_Cho.eq('True'), ['a_n', 'a']]
        a   b   c   a_Cho   b_Cho   c_Cho   a_n b_n c_n
    0   1   11  21  True    True    False   a1  b1  c1
    4   5   15  25  True    False   True    a5  b5  c5
    7   8   18  28  True    True    True    a8  b8  c8
    8   9   19  29  True    True    False   a9  b9  c9
    

    We can also select a subset of columns using .loc.

    >>> df.loc[df.a_Cho.eq("True"), ["a_n", "a"]].rename(columns={"a_n": "choise", "a": "value"})
    

    where I've used .rename() to rename the columns.

    Using the above approach, we can perform logical indexing for each of the conditions you state then concatenate the results.

    def new_col_names(x):
        return {x + "_n": "choise", x: "value"}
    
    # logical criteria
    only_a = df.a_Cho.eq("True")
    only_b = df.a_Cho.eq("False") & df.b_Cho.eq("True")
    only_c = df.a_Cho.eq("False") & df.b_Cho.eq("False") & df.c_Cho.eq("True")
    invalid = df.a_Cho.eq("False") & df.b_Cho.eq("False") & df.c_Cho.eq("False")
    
    df_a = df.loc[only_a, ["a_n", "a"]].rename(columns=new_col_names("a"))
    df_b = df.loc[only_b, ["b_n", "b"]].rename(columns=new_col_names("b"))
    df_c = df.loc[only_c, ["c_n", "c"]].rename(columns=new_col_names("c"))
    df_inv = df.loc[invalid].assign(choise="invalide", value="invalide").copy()
    df_inv = df_inv[["choise", "value"]]
    
    df_new = pd.concat([df_a, df_b, df_c, df_inv])
    

    The final DataFrame looks like:

        choise    value
    0   a1         1
    4   a5         5
    7   a8         8
    8   a9         9
    1   b2        12
    6   b7        17
    2   c3        23
    5   c6        26
    3   invalide  invalide
    

    Note that the index values along the left-hand side show the original row numbers of each entry. If you don't care about these numbers you can pass the ignore_index=True option to .concat.

    Is this closer to what you wanted?

    Original

    Hello and welcome to StackOverflow! I'm not sure if I completely understand your question. For example, in your sample code it does not appear that you use the loop variable x in each iteration. It may help to see an example DataFrame that has the same structure as the one you are working with.

    My impression is that your question may be similar to this question which uses pd.melt.

    Is your DataFrame structured like the following?

    >>> import pandas as pd
    >>> df = pd.DataFrame(
        {"A": list(range(1, 10)), "B": list(range(11, 20)), "C": range(21, 30)}
    )
    >>> df.head()
    
        A   B   C
    0   1   11  21
    1   2   12  22
    2   3   13  23
    3   4   14  24
    4   5   15  25
    5   6   16  26
    6   7   17  27
    7   8   18  28
    8   9   19  29
    

    If so, you can use pd.melt to restructure it to have two columns, a "column names" column and a "column values" column, as you describe in your question.

    The command and output for the example DataFrame above would be:

    >>> pd.melt(df, value_vars=['A', 'B', 'C'])
    
      variable  value
    0   A   1
    1   A   2
    2   A   3
    3   A   4
    4   A   5
    5   A   6
    6   A   7
    7   A   8
    8   A   9
    9   B   11
    10  B   12
    11  B   13
    12  B   14
    13  B   15
    14  B   16
    15  B   17
    16  B   18
    17  B   19
    18  C   21
    19  C   22
    20  C   23
    21  C   24
    22  C   25
    23  C   26
    24  C   27
    25  C   28
    26  C   29
    
    

    Is this similar to what you are asking? If not, could you provide an example of the DataFrame you are working with and an example of what you want the final result to look like? It could be a simplified or "mock" example.