Search code examples
pythonpandasseriesvalueerror

Compare Misaligned Series columns Pandas


Comparing 2 series objects of different sizes:

IN[248]:df['Series value 1']
Out[249]: 
0     70
1     66.5
2     68
3     60
4     100
5     12
Name: Stu_perc, dtype: int64

IN[250]:benchmark_value 
#benchamrk is a subset of data from df2 only based on certain filters
Out[251]: 
0    70
Name: Stu_perc, dtype: int64

Basically I wish to compare df['Series value 1'] with benchmark_value and return the values which are greater than 95% of benchark value in a column Matching list. Type of both of these is Pandas series. However sizes are different for both, hence it is not comparing.

Input given:

IN[252]:df['Matching list']=(df2['Series value 1']>=0.95*benchmark_value)
OUT[253]: ValueError: Can only compare identically-labeled Series objects

Output wanted:

 [IN]:
 df['Matching list']=(df2['Stu_perc']>=0.95*benchmark_value)
 #0.95*Benchmark value is 66.5 in this case.

 df['Matching list']
 [OUT]:
0     70
1     66.5
2     68
3     NULL
4     100
5     NULL

Solution

  • Because benchmark_value is Series, for scalar need select first value of Series by Series.iat and set NaNs by Series.where:

    benchmark_value = pd.Series([70], index=[0])
    
    val = benchmark_value.iat[0]
    df2['Matching list']= df2['Stu_perc'].where(df2['Stu_perc']>=0.95*val)
    print (df2)
         Stu_perc Matching list
    0       70.0           70.0
    1       66.5           66.5
    2       68.0           68.0
    3       60.0            NaN
    4      100.0          100.0
    5       12.0            NaN
    

    General solution also working if benchmark_value is empty is next with iter for return first value of Series and if not exist use default value - here 0:

    benchmark_value = pd.Series([])
    
    val = next(iter(benchmark_value), 0)
    df2['Matching list']= df2['Stu_perc'].where(df2['Stu_perc']>=0.95*val)
    print (df2)
        Stu_perc  Matching list
    0       70.0           70.0
    1       66.5           66.5
    2       68.0           68.0
    3       60.0           60.0
    4      100.0          100.0
    5       12.0           12.0