Tags: python, pandas, for-loop, statistics, data-analysis

Why does my program return a different value when I change the order of the list?


I'm trying to learn how to analyze large data sets better, so I wanted to write a program that takes a CSV of keywords and counts how often each one occurs in a second CSV of data. I set up the code below as an example with a hard-coded list of keywords, but when I change which word comes first, the count it returns is wrong. For example, when "matlab" is first it returns 97, which is correct, but when either of the other words is first it returns 0. This doesn't make sense to me, because as I understand it, the program iterates over the data CSV for every word in the list and checks each row. Could I get some help and clarification?

I've tried putting a print statement after the first `for` loop, and it does iterate over each word, so I'm confused about why the later parts aren't executing correctly.

```python
import csv
from pandas import *
import pandas as pd
from array import array

keywords = read_csv("Book1.csv")
with open('ss.csv', 'r') as csvfile:
    reader = csv.DictReader(csvfile)
    list = ["matlab", "souren", "deez"]
    Attended = 0
    no_show = 0
    Registered = 0
    word = 'matlab'
    nuts = []

    for x in list:
        for row in reader:
            if x in row['one']:
                Registered = Registered + 1

    print(Registered)
```
EDIT:

```python
import csv

import pandas as pd

keywords = pd.read_csv("Book2.csv")
biomedical = keywords['Biomedical'].tolist()
Registered = 0
counts = dict.fromkeys(biomedical, 0)

with open('ss.csv', 'r') as csvfile:
    reader = csv.DictReader(csvfile)
    lines = list(reader)

pd.set_option("display.max_rows", None)
df = pd.read_csv('ss.csv')

ss = df.stack().value_counts()

print(ss)

#for x in biomedical:
#    print(x)
```
Here is a sample of the data being searched (the excerpt starts partway through a row):

```
solidworks, microsoft office, lua","Python, MATLAB, C++, HTML, CSS, Javascript, C,
SOLIDWORKS, Microsoft Office, LUA","I would like to further my experience in SOLIDWORKS,
my extra personal skills, and develop my coding skills.",Female,,Yes,No,Yes,No,"Mississauga, Canada",No,No,Terese Kattar,Student was approved to submit Fall 2022 application,Yes,,,10/31/2022,11/14/2022,,,,,,,
Mechanical Engineering,2022 - Fall,Application Accepted (Final Status),11/14/2022,BaoAnh Le,yes,10/31/2022,Lauren,Sena,Shirley,Dacanay,,,Melissa,tkattar@ryerson.ca,(647) 518-3977,,Mechanical Engineering,,,Mississauga,L5M 6N3,Female,Canadian,In Canada,N/A,false,Active,Uploaded,2.7,No,,No,,No,No,"Accommodation and food services, Administrative and support, waste management and remediation services, Construction, Financial services and insurance, Health care and social assistance, Information and cultural industries, Management of companies and enterprises, Manufacturing, Mining, quarrying, and oil and gas extraction, Other services (except public administration), Professional, scientific and technical services, Public administration, Real estate and rental and leasing, Retail Trade, Transportation and warehousing, Wholesale trade, Educational Services",Yes,Yes,"expert in c and javascript.
can proficiently use matlab.
skillful in microsoft word, office and excel.
knows the basic of vue and vuetify.
expert in html and css.
expert in google docs, slides, sheets.
skillful in cad software, such as: fusion 360 and solidworks..
```
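The edit's `df.stack().value_counts()` counts how often each *entire* cell value appears, so a cell such as "Python, MATLAB, C++" is tallied as one distinct string and is never credited to "MATLAB". A small sketch with made-up data (the column names and values are hypothetical) contrasts whole-cell counts with substring matching via `Series.str.contains`:

```python
import pandas as pd

# Hypothetical stand-in for ss.csv: two text columns, no missing cells
df = pd.DataFrame({
    "skills": ["Python, MATLAB, C++", "matlab", "MATLAB"],
    "notes": ["can proficiently use matlab.", "matlab", "MATLAB"],
})

# Whole-cell counts: every distinct full cell string gets its own tally
cell_counts = df.stack().value_counts()
print(cell_counts)

# Substring counts: how many cells contain the keyword anywhere
cells = df.stack().astype(str)
matlab_cells = cells.str.contains("matlab", regex=False).sum()
matlab_cells_any_case = cells.str.contains("matlab", case=False, regex=False).sum()
print(matlab_cells, matlab_cells_any_case)
```

With this toy data the whole-cell tally lists "matlab" and "MATLAB" as separate entries (2 each), while the substring count finds 3 cells case-sensitively and 6 with `case=False`.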


Solution
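A minimal sketch of the behaviour behind the question, using hypothetical in-memory data (`io.StringIO` stands in for ss.csv; the column name `one` comes from the question's code, and the helper names are illustrative). A `csv.DictReader` is a one-shot iterator, so once the inner loop has read every row, later passes over the same reader get nothing:

```python
import csv
import io

# Hypothetical stand-in for ss.csv with a single column named "one"
data = "one\nmatlab and python\nsouren\nmatlab\n"

def count_with_reader(keywords):
    """Mimics the question's loop: iterates the DictReader directly."""
    reader = csv.DictReader(io.StringIO(data))
    total = 0
    for x in keywords:
        for row in reader:  # exhausted after the first keyword
            if x in row["one"]:
                total += 1
    return total

def count_with_list(keywords):
    """Reads all rows into a list first, so every keyword sees every row."""
    rows = list(csv.DictReader(io.StringIO(data)))
    total = 0
    for x in keywords:
        for row in rows:
            if x in row["one"]:
                total += 1
    return total

print(count_with_reader(["matlab", "souren"]))  # only "matlab" sees any rows
print(count_with_reader(["souren", "matlab"]))  # only "souren" sees any rows
print(count_with_list(["souren", "matlab"]))
```

`count_with_reader` returns 2 with "matlab" first but 1 with "souren" first, because only the first keyword ever sees the rows; `count_with_list` returns 3 in either order.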

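  • Following Mustafa's first suggestion, I solved it with for loops. The original version failed because a `csv.DictReader` can only be iterated once: the inner `for row in reader` loop consumed every row while checking the first keyword, so the remaining keywords had no rows left to check. Reading the rows into a list first (`lines = list(reader)`) fixes this. There is a much better way to do this with pandas, but I didn't spend the time to figure it out.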

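    import csv

    import pandas as pd

    # Load the keyword list from the 'Biomedical' column, skipping empty cells
    keywords = pd.read_csv("input1.csv")
    biomedical = keywords['Biomedical'].dropna().tolist()
    counts = dict.fromkeys(biomedical, 0)

    # Read every row into a list once; a csv.DictReader can only be iterated
    # once, so it cannot be re-scanned for each keyword
    with open('ss2.csv', 'r', newline='') as csvfile:
        reader = csv.DictReader(csvfile)
        lines = list(reader)

    # For each keyword, count the cells (across all rows and columns) that contain it
    for x in biomedical:
        for row in lines:
            for value in row.values():
                # Skip missing cells (None) and any overflow list from over-long rows
                if isinstance(value, str) and x in value:
                    counts[x] += 1

    # Write one "keyword,count" line per keyword
    with open('hh.csv', 'w', newline='') as f:
        writer = csv.writer(f)
        for key, count in counts.items():
            writer.writerow([key, count])

    print(counts)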
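The answer mentions a better pandas route without showing it. One possible sketch, not the answer's own code, assuming the same counting rule as the loop (a cell counts once if it contains the keyword as a case-sensitive substring). The function name `count_keyword_cells` and the in-memory data are hypothetical; with the real files they would come from `pd.read_csv("input1.csv")["Biomedical"].dropna()` and `pd.read_csv("ss2.csv")`:

```python
import pandas as pd

def count_keyword_cells(data: pd.DataFrame, keywords) -> pd.Series:
    """Count, for each keyword, the cells of `data` that contain it.

    Case-sensitive substring match; a cell mentioning a keyword twice
    still counts once, like the loop-based version.
    """
    # Flatten every cell into one Series of strings, dropping empty cells
    cells = data.stack().dropna().astype(str)
    return pd.Series(
        {kw: int(cells.str.contains(kw, regex=False).sum()) for kw in keywords},
        name="count",
    )

# Hypothetical in-memory stand-ins for input1.csv and ss2.csv
keywords = ["matlab", "solidworks", "python"]
data = pd.DataFrame({
    "skills": ["matlab, solidworks", "Python, MATLAB", None],
    "goals": ["learn solidworks", None, "use matlab more"],
})

counts = count_keyword_cells(data, keywords)
print(counts)
# counts.to_csv("hh.csv", header=False) would write "keyword,count" lines
```

With this toy data the result is 2 for `matlab`, 2 for `solidworks` and 0 for `python` (its only mention is capitalised "Python"). `regex=False` keeps keywords such as `C++` from being parsed as regular expressions; adding `case=False` would make the match case-insensitive.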