Tags: python, pandas, unique, distinct, concatenation

Get the distinct column values and union the dataframes


I am trying to convert the following SQL statement to pandas:

    SELECT DISTINCT table1.[Name], table1.[Phno]
    FROM table1
    UNION
    SELECT DISTINCT table2.[Name], table2.[Phno] FROM table2
    UNION
    SELECT DISTINCT table3.[Name], table3.[Phno] FROM table3;

Now I have 3 dataframes: table1, table2, table3.

table1   
     Name        Phno
0  Andrew  6175083617
1  Andrew  6175083617
2   Frank  7825942358
3   Jerry  3549856785
4     Liu  9659875695
table2
     Name        Phno
0   Sandy  7859864125
1  Nikhil  9526412563
2   Sandy  7859864125
3    Tina  7459681245
4   Surat  9637458725
table3
     Name        Phno
0   Patel  9128257489
1    Mary  3679871478
2  Sandra  9871359654
3    Mary  3679871478
4    Hali  9835167465

Now I need to get the distinct rows of these dataframes and union them, so that the output is:

sample output
      Name        Phno
0   Andrew  6175083617
1    Frank  7825942358
2    Jerry  3549856785
3      Liu  9659875695
4    Sandy  7859864125
5   Nikhil  9526412563
6     Tina  7459681245
7    Surat  9637458725
8    Patel  9128257489
9     Mary  3679871478
10  Sandra  9871359654
11    Hali  9835167465

I tried to get the unique values for one dataframe table1 as shown below:

table1_unique = pd.unique(table1.values.ravel()) #which gives me 
table1_unique
array(['Andrew', 6175083617L, 'Frank', 7825942358L, 'Jerry', 3549856785L,
   'Liu', 9659875695L], dtype=object)

But I get them as a flat array. I even tried converting it to a dataframe using:

table1_unique1 = pd.DataFrame(table1_unique)
table1_unique1
            0
0      Andrew
1  6175083617
2       Frank
3  7825942358
4       Jerry
5  3549856785
6         Liu 
7  9659875695

How do I get the unique rows of each dataframe, so that I can concatenate them as in my sample output? I hope this is clear. Thanks!


Solution

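    import pandas as pd

    # SELECT DISTINCT: drop duplicate rows within each dataframe
    a = table1[['Name', 'Phno']].drop_duplicates()
    b = table2[['Name', 'Phno']].drop_duplicates()
    c = table3[['Name', 'Phno']].drop_duplicates()
    result = pd.concat([a, b, c], ignore_index=True)  # renumber rows 0..n-1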
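
One thing to keep in mind: SQL `UNION` also removes rows that are duplicated across tables, not only within each one. That does not happen with the sample data in the question, but it may with real data. Below is a minimal sketch that concatenates first and de-duplicates once afterwards. The helper name `sql_union` is hypothetical (it is not part of pandas), and the dataframes are rebuilt from the question's sample data so the snippet runs on its own.

```python
import pandas as pd

# Sample data copied from the question.
table1 = pd.DataFrame({
    "Name": ["Andrew", "Andrew", "Frank", "Jerry", "Liu"],
    "Phno": [6175083617, 6175083617, 7825942358, 3549856785, 9659875695],
})
table2 = pd.DataFrame({
    "Name": ["Sandy", "Nikhil", "Sandy", "Tina", "Surat"],
    "Phno": [7859864125, 9526412563, 7859864125, 7459681245, 9637458725],
})
table3 = pd.DataFrame({
    "Name": ["Patel", "Mary", "Sandra", "Mary", "Hali"],
    "Phno": [9128257489, 3679871478, 9871359654, 3679871478, 9835167465],
})


def sql_union(*frames, columns=("Name", "Phno")):
    """Hypothetical helper that mimics SQL UNION on the given columns.

    Stacks the frames, then drops duplicate rows once, so duplicates
    within a frame and across frames are both removed.
    """
    cols = list(columns)
    stacked = pd.concat([f[cols] for f in frames], ignore_index=True)
    # Keep the first occurrence of each row, then renumber the index.
    return stacked.drop_duplicates().reset_index(drop=True)


result = sql_union(table1, table2, table3)
print(result)
```

Because `drop_duplicates` keeps the first occurrence of each row, the rows come out in the same order as the sample output in the question. SQL `UNION` itself makes no ordering guarantee, so add an explicit `sort_values` call if a particular order matters.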