
local variable 'index_three' referenced before assignment error while developing a recommendation engine using Python


I'm developing a recommendation engine that recommends items for a local retail chain. I'm reusing code I learned while building a movie recommendation system on the MovieLens dataset, but the code that works for recommending movies doesn't work here.

A function to get the correlations between one level-three item and all the others:

def get_movie_similarity(level3Id):  
    index_three = list(index_three).index(level3Id)
    return corr_matrixthree[index_three]

A function to get items similar to the ones the user purchased most. It sums the Pearson correlation scores of every candidate item against the user's purchases, drops the items already purchased, and sorts the rest in descending order of that summed score:

def get_movie_recommendations(merged):  
    movie_similarities = np.zeros(corr_matrixthree.shape[0])
    for level3Id in merged:
        movie_similarities = movie_similarities + get_movie_similarity(level3Id)
    similarities_df = pd.DataFrame({'level3Id': index_three,'sum_similarity': movie_similarities})
    similarities_df = similarities_df[~(similarities_df.level3Id.isin(merged))]
    similarities_df = similarities_df.sort_values(by=['sum_similarity'], ascending=False)
    return similarities_df

The similarity matrix I generated is between the users and the items they have purchased, with the values being the amount they spent on each item.

sample_user = 42140122376
merged[merged.cust_id==sample_user].sort_values(by=['amount_extended'], ascending=False)


sample_user_movies = merged[merged.cust_id==sample_user].level3Id.tolist()  
recommendations = get_movie_recommendations(sample_user_movies)

# Get the top 20 recommended items
recommendations.level3Id.head(20)

The error I'm getting is:

local variable 'index_three' referenced before assignment

index_three is the index of all the items in the dataset, whereas corr_matrixthree is the matrix of similarities between items, generated using the Pearson correlation score. merged is my dataset.

Can you please help me out?

I can share the code I have in the Jupyter notebook!


Solution

  • To fix this, you need to understand how variable scope works in Python. Take a look at this:

    def my_func():
        index3 =5000
        print(index3)
    
    index3=10;
    print(index3)
    my_func()
    

    output:

    10
    5000
    

    Note: even though both variables are named index3, they are NOT the same variable.

    The index3 inside my_func is a local variable, while the index3 at the top level of the program (outside any function) is a separate, global variable. So in the code above, the first print(index3) prints the global index3; then my_func() is called, and the print(index3) inside it prints the local index3.

    Now take a look at this:

    def my_func():
        print(index3)
    
    index3=10;
    print(index3)
    my_func()
    

    output:

    10
    10
    

    Both times the output is 10. Since my_func never assigns to index3, the name inside the function refers to the global variable, so the global value is printed twice.

    Now comes your problem:

    def my_func():
        index3 =index3+1
    
    index3=10;
    print(index3)
    my_func()
    

    output:

    10
    Traceback (most recent call last):
      File "/home/mr/func.py", line 6, in <module>
        my_func()
      File "/home/mr/func.py", line 2, in my_func
        index3 =index3+1
    UnboundLocalError: local variable 'index3' referenced before assignment
    

    Why?

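    Because of the assignment index3 = index3 + 1. When Python compiles a function, any name that is assigned anywhere in its body is treated as a local variable throughout that function. So index3 = 0 would simply assign 0 to a local variable.

    However, index3 = index3 + 1 has to read index3 before it can assign to it, and the name being read is the local one, which has no value yet. Python effectively says:

    Wait, you want me to set the local variable index3 to the local variable index3 plus 1? But you haven't assigned it a value yet!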
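
The compile-time decision described above can be observed directly. The following is a minimal sketch (the function names `reads_only` and `assigns` are made up for illustration) that inspects which names each function treats as local via the standard `__code__.co_varnames` attribute, and then triggers the error:

```python
index3 = 10

def reads_only():
    # No assignment to index3 here, so the name resolves to the global.
    return index3

def assigns():
    # The assignment makes index3 local for the whole function body,
    # so the read on the right-hand side fails.
    index3 = index3 + 1
    return index3

# co_varnames lists the names the compiler treats as local.
print(reads_only.__code__.co_varnames)
print(assigns.__code__.co_varnames)

try:
    assigns()
except UnboundLocalError as exc:
    # The exact message wording differs between Python versions.
    print(type(exc).__name__)
```

Only `assigns` lists index3 among its locals, which is why only that function raises UnboundLocalError.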

    One way to fix it is to declare index3 as global inside the function:

    def my_func():
        global index3
        index3 =index3+1
        print(index3)
    
    index3=10
    print(index3)
    my_func()
    print(index3)
    

    output:

    10
    11
    11
    

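    Now the function reads the global index3 and modifies it, so the change is visible outside the function as well.

    NOTE: Modifying global variables from inside functions is generally considered bad coding practice. An alternative is to read the global value through a helper function and work on a local copy: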

    def getIndex3():
        return index3
    
    def my_func():
        index3 = getIndex3()
        index3 =index3+1
        print(index3)
    
    index3=10
    print(index3)
    my_func()
    print(index3)
    

    Now output:

    10
    11
    10
    

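    See the difference? Here the global index3 stays at 10, because my_func only changed its own local copy. That is why your program shows the error: the line index_three = list(index_three).index(level3Id) assigns to index_three inside the function, which makes it a local variable, and then tries to read it before it has a value. That is exactly what "local variable 'index_three' referenced before assignment" means.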
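
Applied to the function in the question, one possible fix is simply to give the local variable a different name, so that index_three is only read inside the function and keeps referring to the global. This is a minimal sketch, assuming index_three and corr_matrixthree are defined at module level. The small lists below are made-up stand-ins for the real index and correlation matrix, and `position` is a name chosen here for illustration:

```python
# Hypothetical stand-in data; the real notebook builds these from `merged`.
index_three = [101, 102, 103]
corr_matrixthree = [
    [1.0, 0.2, 0.5],
    [0.2, 1.0, 0.1],
    [0.5, 0.1, 1.0],
]

def get_movie_similarity(level3Id):
    # Use a new local name instead of reassigning index_three,
    # so index_three still refers to the module-level variable.
    position = list(index_three).index(level3Id)
    return corr_matrixthree[position]

print(get_movie_similarity(102))
```

get_movie_recommendations only reads index_three and never assigns to it, so it should not need the same change.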