python pandas dataframe numpy multi-index

pandas MultiIndex.from_product - ignore same-row comparison


I have a pandas DataFrame as shown below:

Company,year
T123 Inc Ltd,1990
T124 PVT ltd,1991
ABC Limited,1992
ABCDE Ltd,1994

import pandas as pd

tf = pd.read_clipboard(sep=',')
tf['Company_copy'] = tf['Company']

I would like to compare each value of tf['Company'] against each value of tf['Company_copy'], but exclude pairs that come from the same row (same row number or index, i.e. the same string).

For example, I want T123 Inc Ltd to be compared with the remaining 3 items. Similarly, I want ABCDE Ltd to be compared only with the remaining 3 items.

So I tried the code below, with the help of this post:

compare = pd.MultiIndex.from_product([tf['Company'].astype(str),tf['Company_copy'].astype(str)]).to_series()

However, this produces some unwanted comparisons: each company is also paired with itself. I want to avoid these same-row comparisons.
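To make the problem concrete, here is a minimal sketch that reproduces it. It reads the sample data from a string via io.StringIO rather than the clipboard; that substitution, and the names csv_text and self_pairs, are assumptions made only so the snippet runs on its own.

```python
import io
import pandas as pd

# Same sample data as in the question, read from a string instead of the
# clipboard (an assumption so the snippet is self-contained).
csv_text = """Company,year
T123 Inc Ltd,1990
T124 PVT ltd,1991
ABC Limited,1992
ABCDE Ltd,1994
"""
tf = pd.read_csv(io.StringIO(csv_text))
tf['Company_copy'] = tf['Company']

# Cartesian product of the two columns: every value paired with every value.
compare = pd.MultiIndex.from_product(
    [tf['Company'].astype(str), tf['Company_copy'].astype(str)]
).to_series()

# Count the pairs whose two elements are identical (the unwanted rows).
self_pairs = sum(a == b for a, b in compare.index)

print(len(compare))  # 16 pairs in total (4 x 4)
print(self_pairs)    # 4 of them pair a company with itself
```

So the product contains 16 rows, of which the 4 self-pairs are the ones that should be dropped.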

I expect my output to look like the one below. You can see it has no duplicate/same-row comparisons:

    Company       Company_copy
    T123 Inc Ltd  T124 PVT ltd    (T123 Inc Ltd, T124 PVT ltd)
                  ABC Limited      (T123 Inc Ltd, ABC Limited)
                  ABCDE Ltd          (T123 Inc Ltd, ABCDE Ltd)
    T124 PVT ltd  T123 Inc Ltd    (T124 PVT ltd, T123 Inc Ltd)
                  ABC Limited      (T124 PVT ltd, ABC Limited)
                  ABCDE Ltd          (T124 PVT ltd, ABCDE Ltd)
    ABC Limited   T123 Inc Ltd     (ABC Limited, T123 Inc Ltd)
                  T124 PVT ltd     (ABC Limited, T124 PVT ltd)
                  ABCDE Ltd           (ABC Limited, ABCDE Ltd)
    ABCDE Ltd     T123 Inc Ltd       (ABCDE Ltd, T123 Inc Ltd)
                  T124 PVT ltd       (ABCDE Ltd, T124 PVT ltd)
                  ABC Limited         (ABCDE Ltd, ABC Limited)

Solution

  • Compare the first and second levels of the MultiIndex and keep only the rows where they are not equal:

    compare = pd.MultiIndex.from_product([tf['Company'].astype(str),tf['Company_copy'].astype(str)]).to_series()
    compare = compare[compare.index.get_level_values(0) != compare.index.get_level_values(1)]
    print(compare)
    Company       Company_copy
    T123 Inc Ltd  T124 PVT ltd    (T123 Inc Ltd, T124 PVT ltd)
                  ABC Limited      (T123 Inc Ltd, ABC Limited)
                  ABCDE Ltd          (T123 Inc Ltd, ABCDE Ltd)
    T124 PVT ltd  T123 Inc Ltd    (T124 PVT ltd, T123 Inc Ltd)
                  ABC Limited      (T124 PVT ltd, ABC Limited)
                  ABCDE Ltd          (T124 PVT ltd, ABCDE Ltd)
    ABC Limited   T123 Inc Ltd     (ABC Limited, T123 Inc Ltd)
                  T124 PVT ltd     (ABC Limited, T124 PVT ltd)
                  ABCDE Ltd           (ABC Limited, ABCDE Ltd)
    ABCDE Ltd     T123 Inc Ltd       (ABCDE Ltd, T123 Inc Ltd)
                  T124 PVT ltd       (ABCDE Ltd, T124 PVT ltd)
                  ABC Limited         (ABCDE Ltd, ABC Limited)
    dtype: object
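Note that the filter above compares by value, so if the same company name appears in two different rows, those cross-row pairs are dropped as well. If the intent is to exclude only pairs from the same row number, one possible variant is to filter by row position instead. The sketch below is an illustration under stated assumptions, not part of the original answer: the extra duplicate row at the end of the sample data is hypothetical, and the names left, right, by_value and by_position are made up for the example.

```python
import io
import numpy as np
import pandas as pd

# Sample data from the question plus one HYPOTHETICAL duplicate name
# (last row), added only to show where the two filters differ.
csv_text = """Company,year
T123 Inc Ltd,1990
T124 PVT ltd,1991
ABC Limited,1992
ABCDE Ltd,1994
ABC Limited,1999
"""
tf = pd.read_csv(io.StringIO(csv_text))

names = tf['Company'].astype(str).to_numpy()
n = len(names)

compare = pd.MultiIndex.from_product(
    [names, names], names=['Company', 'Company_copy']
).to_series()

# Row positions of the left and right element of every pair, in the order
# from_product generates them (first level varies slowest).
left = np.repeat(np.arange(n), n)
right = np.tile(np.arange(n), n)

# Filter by value (the approach from the answer above).
lvl0 = compare.index.get_level_values(0)
lvl1 = compare.index.get_level_values(1)
by_value = compare[lvl0 != lvl1]

# Filter by row position: only pairs built from the same row are dropped.
by_position = compare[left != right]

print(len(by_value))     # 18: cross-row 'ABC Limited' pairs are dropped too
print(len(by_position))  # 20: only the 5 same-row pairs are dropped
```

With unique company names, as in the original data, both filters return the same 12 rows.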