Original
DF has values that are calculated from a table.
It may have type
missing for few ids.
However, the goal is explicitly list if the flag exists
.
If not, type
row should be added with no
in the exists
column.
id=1 is good in this example. If I filter on id=2 only, and run the following code, it returns c. which I can then append
.
aa = set(type)
bb = set(b['type'].to_list())
list(aa - bb)
However, as you can tell, this is not feasible when applying to all id.
I needed help so the code can be applied for all the ids.
This returns the type that are no
type = [a, b, c]
Original
id type flag
1 a y
1 b y
1 c y
2 a y
2 b y
3 b y
Goal
id type flag exists
1 a y yes
1 b y yes
1 c y yes
2 a y yes
2 b y yes
2 c y no
3 a y yes
3 b y no
3 b y no
Hoping there is a better way to handle this.
The key is to generate a Cartesian product between set of ids and set of types. I've used this Cartesian product axiom
data = '''id , type , flag
1 , a , y
1 , b , y
1 , c , y
2 , a , y
2 , b , y
3 , b , y '''
da = [[i.strip() for i in l.split(",")] for l in data.split("\n")]
df = pd.DataFrame(da[1:], columns=da[0])
# generate cartesian product https://apassionatechie.wordpress.com/2018/01/15/pandas-equivalent-of-sql-cross-join-cartesian-product/
cp = pd.DataFrame(["a","b","c"], columns=["type"])\
.assign(foo=1)\
.merge(pd.DataFrame(df["id"].unique(), columns=["id"]).assign(foo=1)).drop('foo',1)
df = cp.merge(df, how="outer", on=["id","type"])
df["exists"] = ~df["flag"].isnull()
df.fillna(method="ffill")
output
type id flag exists
0 a 1 y True
1 a 2 y True
2 a 3 y False
3 b 1 y True
4 b 2 y True
5 b 3 y True
6 c 1 y True
7 c 2 y False
8 c 3 y False