Search code examples
pythonpandasindexingfiltermultiple-columns

Problem with Unalignable boolean Series with two columns filter


In this case I am working with 2 columns that are substracted from 2 Dataframes. The columns are ["# Externo","Nro Envio ML"]]

My target is to recieve the numbers that exist in "# Externo" but no exist in"Nro Envio ML" , only that number/ or numbers that fill to that condition.

To take a look what I am talking about:

dfn.info()
dfn

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1143 entries, 0 to 2151
Data columns (total 2 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   # Externo     404 non-null    object
 1   Nro Envio ML  894 non-null    object
dtypes: object(2)
memory usage: 26.8+ KB

    # Externo   Nro Envio ML
0   41764660663     NaN
1   41765189264     NaN
2   41765105927     NaN
3   41765931626     NaN
4   41766474810     NaN
...     ...     ...
2143    NaN     41768876815
2146    NaN     41784067107
2147    NaN     41784051958
2149    NaN     41785977098
2151    NaN     41796142562

1143 rows × 2 columns


# Externo       True
Nro Envio ML    True
dtype: bool

I am expecting ro recieve 41764660663 if exist in column "# Externo" and not in "Nro Envio ML"

This is What I tried: 
df1 = df1[df1['Unnamed: 26'] == 'Flex'] 
df2= pd.concat([df, df1], axis=1)
df2

import numpy as np
df2['Nro Envio ML']=df2['Unnamed: 13']
dfn=df2[["# Externo","Nro Envio ML"]]
print(dfn.notnull().any(axis=0))
dfn= dfn.loc[:,dfn.notnull().any(axis=0)]
print(dfn)
print(dfn.dropna(axis=1,how='all'))
dfn.loc[~df['# Externo'].isin(dfn['Nro Envio ML'].tolist())]

The error I recieve:

IndexingError                             Traceback (most recent call last)
<ipython-input-144-54d975e5ad81> in <module>
      3 print(dfn)
      4 print(dfn.dropna(axis=1,how='all'))
----> 5 dfn.loc[~df['# Externo'].isin(dfn['Nro Envio ML'].tolist())]

3 frames
/usr/local/lib/python3.7/dist-packages/pandas/core/indexing.py in check_bool_indexer(index, key)
   2387         if mask.any():
   2388             raise IndexingError(
-> 2389                 "Unalignable boolean Series provided as "
   2390                 "indexer (index of the boolean Series and of "
   2391                 "the indexed object do not match)."

IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).

Here is not filtered:

dfn3= dfn.loc[~ dfn['# Externo'].isin(dfn['Nro Envio ML']),'# Externo'].values

dfn3


array(['41764660663', '41765189264', '41765105927', '41765931626',
       '41766474810', '41766693570', '41767023186', '41766664967',
       '41765527475', '41766933520', '41758431387', '41767065141',
       '41766834461', '41763758747', '41767007000', '41764139836',
       '41767128958', '41767958453', '41768439109', '41767519460',
       '41768746394', '41767537504', '41768245931', '41768435988',
       '41768710593', '41767850751', '41769343996', '41768163019',
       '41767792365', '41769430226', '41769362435', '41767613260',
       '41767399871', '41769237788', '41769335922', '41768743591',
       '41768970216', '41768972816', '41767801455', '41767351959',
       '41768856005', '41769069211', '41768960289', '41768876815',
       '41768796242', '41768606054', '41769594117', '41768301217',
       '41769316065', '41769275644', '41768747851', '41768992109',
       '41767973684', '41768588967', '41769021462', '41768655275',
       '41769195649', '41771323517', '41770997916', '41770624787',
       '41771124135', '41767953692', '41771990757', '41771503073',
       '41771518432', '41770587159', '41771770302', '41771264986',
       '41770622684', '41771712719', '41770043750', '41769920549',
       '41771890393', '41771093881', '41770335018', '41769851289',
       '41769691702', '41770178002', '41770083356', '41771478219',
       '41771689312', '41770310781', '41770503120', '41771320102',
       '41770872304', '41772333923', '41773077420', '41774107375',
       '41774470025', '41772354195', '41774278154', '41774516055',
       '41764063012', '41773238895', '41770358839', '41773410325',
       '41772497677', '41772207643', '41774095335', '41774540961',
       '41773924133', '41772759005', '41772493934', '41773676496',
       '41772632879', '41772582155', '41772586341', '41772592180',
       '41774973883', '41775140655', '41775320466', '41775999294',
       '41775447715', '41776040324', '41774633931', '41775257392',
       '41775471162', '41771934549', '41775499496', '41774856789',
       '41775136607', '41775410928', '41776142924', '41776094067',
       '41775191189', '41775749633', '41775907614', '41774792841',
       '41776033160', '41775490223', '41778623933', '41777644508',
       '41780014741', '41778994962', '41777323701', '41776219972',
       '41780552222', '41777798847', '41779796901', '41780799923',
       '41780472850', '41772305897', '41780180889', '41780555214',
       '41778280294', '41780767290', '41779889603', '41780667613',
       '41778248797', '41778766814', '41780236744', '41779887066',
       '41776670687', '41777525040', '41780960139', 'nan', '41777644374',
       '41779923800', '41777002840', '41777753678', '41778182378',
       '41776301694', '41779886597', '41779667714', '41781000946',
       '41777189468', '41780087137', '41780155654', '41780775906',
       '41778329111', '41783067184', '41782721889', '41781632703',
       '41783780618', '41783873395', '41783998100', '41783931503',
       '41782490708', '41778620781', '41776593233', '41783231988',
       '41782256463', '41783528314', '41782914027', '41784027619',
       '41781822829', '41784004699', '41783211341', '41784033505',
       '41782545928', '41784051958', '41781766311', '41783040125',
       '41783951875', '41784068580', '41783813820', '41783067755',
       '41783016716', '41784060487', '41783803363', '41782531020',
       '41781388743', '41785977098', '41786030848', '41786287968',
       '41784805290', '41786552267', '41786879966', '41786460175',
       '41786610058', '41785551493', '41786710599', '41786958316',
       '41781724264', '41785445012', '41786594197', '41785477465',
       '41786482621', '41784728916', '41786163574', '41785240433',
       '41784798439', '41786406137', '41786330557', '41787005790',
       '41786634121', '41786210955', '41784198119', '41786024295',
       '41785069315', '41782349052', '41786708909', '41788240277',
       '41788955033', '41789046308', '41784596066', '41788063455',
       '41787694599', '41789136771', '41787403317', '41787409226',
       '41789241747', '41787555666', '41787430932', '41787309404',
       '41788910204', '41787568748', '41789414846', '41788177940',
       '41789528530', '41789382342', '41789803654', '41788514458',
       '41784831727', '41787377624', '41787828042', '41789205824',
       '41789308552', '41789288899', '41789701434', '41787553674',
       '41787681573', '41789442389', '41789190629', '41780044631',
       '41789895907', '41788809900', '41789122350', '41788438919',
       '41787977304', '41788642761', '41789281426', '41791796789',
       '41791686344', '41790988049', '41787229497', '41790708372',
       '41791150645', '41790453941', '41791020142', '41790384927',
       '41790434960', '41791900221', '41791780863', '41792045890',
       '41789979877', '41790213389', '41792328962', '41791367184',
       '41791135752', '41792275060', '41791890035', '41792546856',
       '41791884595', '41790134693', '41792095927', '41790458720',
       '41791526022', '41792143565', '41791680878', '41790832413',
       '41792463288', '41791972322', '41791084950', '41791591750',
       '41792018279', '41791891437', '41790340322', '41792490749',
       '41791949185', '41792273084', '41792942400', '41793195303',
       '41793116161', '41793560497', '41793420765', '41793390721',
       '41792995107', '41792853373', '41794017254', '41792829460',
       '41794146341', '41794097400', '41793917806', '41793085795',
       '41793153713', '41793479285', '41793672321', '41794163188',
       '41792913806', '41795638686', '41796745322', '41796518007',
       '41796793000', '41795214845', '41796220240', '41796073319',
       '41796781702', '41795312941', '41797871757', '41797732193',
       '41796831262', '41798441839', '41792712332', '41794174553',
       '41798690031', '41798308119', '41798875026', '41798261237',
       '41796142562', '41794298123', '41798116617', '41798838185',
       '41798387675', '41794457006', '41797766954', '41798516007',
       '41797807112', '41797868790', '41797073652', '41798109141',
       '41797925241', '41798587922', '41798206365', '41795797834',
       '41798921136', '41798844409', '41797860445', '41798137866',
       '41798816124', '41794976940', '41795115092', '41794826346',
       '41798335167', '41797220545', '41797338131', '41798519643',
       '41798503487', '41796975986', '41796122923', '41797229414',
       '41799541385', '41800565067', '41801544241', '41800619941',
       '41800606765', '41801602923', '41800814367', '41799433986',
       '41800528875', '41798885157', '41799587807', '41800708489',
       '41799422642', '41801323370', '41799993602', '41800526158',
       '41801058190', '41799946619', '41800698887', '41801171856',
       '41801569361', '41800715567', '41800154420'], dtype=object)

Also I tried without luck:

newdf = dfn.drop_duplicates(
  subset = ['Nro Envio ML', '# Externo'],
  keep = 'last').reset_index(drop = True)
newdf

Solution

  • I think you were quite close. Does this achieve what you're trying to do?

    dfn['Externo'][~dfn['Externo'].isin(dfn['Nro Envio ML'])].dropna().tolist()
    

    It returns all non-NaN values in the 'Externo' column that are not in the 'Nro Envio ML' column as a list.

    I think the IndexingError you received may have been due to using df instead of dfn in the last line of your code, but I'm not 100% sure because df is not defined in your snippet.