Tags: python, pandas, csv, opencsv

Python Pandas CSV - reading


I have a problem: I need to read a CSV file and take the values from each row.

For example:

Name Surname Sex Date
Franco Puppi Male 01/01/2022
Max   Pezzali Male 03/4/2022
Fuffi Fuffi  female 03/8/202

The content above is my CSV file. I want to read this kind of CSV file and process each column separately, for example:

dfin = pd.read_csv("var_an.csv")
for index1 in dfin.iterrows():
    Name = ...     # ?
    Surname = ...  # ?
    Sex = ...      # ?
    Date = ...     # ?

How would you extract those values? I tried str(dfin["Name"]), but I got an error saying the tuple index should be an integer. I then changed "Name" to 0, 1, 2, but for the first column it says the index is out of range. What am I doing wrong? I had easy success with an xlsx file.

def analytics(var_an):
    from termcolor import colored, cprint
    import pandas as pd
    dfin = pd.read_csv(var_an)
    for index1 in dfin.iterrows():
        print(index1)
        cprint(f'Found on file : {var_an}', 'red')
        # cprint(f'Obd = {obd} | pallet = {pallet} | loggerid = {loggerid} | system_date = {system_date} | system_time = {system_time} | house = {house} | hub = {hub}', 'on_green')

When I ran the code above, it extracted the entire row, but I can't manage to get each field on its own, like:

Name = 
Surname = 
Sex =

Solution

  • That's not really a CSV file: read_csv expects comma-separated values on each line by default. When you used read_csv, you got a table with a single column named "Name Surname Sex Date". Turning your fragments into a runnable script:

    import pandas as pd
    import io
    
    the_file = io.StringIO("""Name Surname Sex Date
    Franco Puppi Male 01/01/2022
    Max   Pezzali Male 03/4/2022
    Fuffi Fuffi  female 03/8/202""")
    
    dfin = pd.read_csv(the_file)
    print(dfin.columns)
    

    outputs:

    Index(['Name Surname Sex Date'], dtype='object')
    

    So the file didn't parse correctly. You can change the separator from a comma to a regular expression that treats any run of whitespace as a column separator, and you'll get the right values for this sample data:

    import pandas as pd
    import io
    
    the_file = io.StringIO("""Name Surname Sex Date
    Franco Puppi Male 01/01/2022
    Max   Pezzali Male 03/4/2022
    Fuffi Fuffi  female 03/8/202""")
    
    dfin = pd.read_csv(the_file, sep=r"\s+")
    print(dfin.columns)
    
    for i, row in dfin.iterrows():
        print(f"====\nRow {i}:\n{row}")
        Name = row["Name"]
        Surname = row["Surname"]
        Sex = row["Sex"]
        Date = row["Date"]
        print("Extracted:", Name, Surname, Sex, Date)
    

    This extracts the right values:

    Index(['Name', 'Surname', 'Sex', 'Date'], dtype='object')
    ====
    Row 0:
    Name           Franco
    Surname         Puppi
    Sex              Male
    Date       01/01/2022
    Name: 0, dtype: object
    Extracted: Franco Puppi Male 01/01/2022
    ====
    Row 1:
    Name             Max
    Surname      Pezzali
    Sex             Male
    Date       03/4/2022
    Name: 1, dtype: object
    Extracted: Max Pezzali Male 03/4/2022
    ====
    Row 2:
    Name          Fuffi
    Surname       Fuffi
    Sex          female
    Date       03/8/202
    Name: 2, dtype: object
    Extracted: Fuffi Fuffi female 03/8/202
    
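As for the error in the original attempt: DataFrame.iterrows() yields (index, row) tuples, where row is a pandas Series. A loop variable like index1 that is not unpacked holds the whole tuple, so indexing it with a string such as "Name" raises a TypeError (tuples only accept integer positions). The sketch below reuses the same sample data, inspects one of those tuples, and, as an alternative not used in the answer above, shows itertuples(), which yields namedtuples whose fields are attributes. The variable names first, label, row and names are only for illustration.

```python
import io

import pandas as pd

# Same sample data as above, parsed with any run of whitespace as the separator.
the_file = io.StringIO("""Name Surname Sex Date
Franco Puppi Male 01/01/2022
Max   Pezzali Male 03/4/2022
Fuffi Fuffi  female 03/8/202""")
dfin = pd.read_csv(the_file, sep=r"\s+")

# Without unpacking, each item produced by iterrows() is a 2-tuple.
first = next(dfin.iterrows())
print(type(first), len(first))

# Unpack it: the row label first, then the row itself as a Series.
label, row = first
print(label)          # row label of the first row
print(row["Name"])    # field accessed by column name

# Indexing the tuple itself with a column name fails.
try:
    first["Name"]
except TypeError as exc:
    print("TypeError:", exc)

# Alternative: itertuples() yields namedtuples, so columns become attributes.
names = [r.Name for r in dfin.itertuples(index=False)]
print(names)
```

Unpacking in the loop header (for i, row in dfin.iterrows():) is what makes row["Name"] work in the answer's loop.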

    Kinda good, but there is still a big problem: what if one of these people has a space in their name? Pandas would split each part of the name into a separate column and the parsing would fail. You need a better file format than the one you've been given.
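If you can control how the file is produced, a real comma-separated file avoids this, because spaces are then ordinary characters inside a field. The sketch below assumes the data can be written with pandas' DataFrame.to_csv and read back with read_csv's default comma separator. The records (including "Anna Maria" and "De Luca") are hypothetical, chosen only to show values that contain spaces, and the file is kept in memory with io.StringIO instead of on disk.

```python
import io

import pandas as pd

# Hypothetical records: "Anna Maria" and "De Luca" contain spaces.
people = pd.DataFrame(
    {
        "Name": ["Anna Maria", "Max"],
        "Surname": ["De Luca", "Pezzali"],
        "Sex": ["female", "Male"],
        "Date": ["05/06/2022", "03/4/2022"],
    }
)

# Write a real comma-separated file (into memory here, instead of onto disk).
buffer = io.StringIO()
people.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)

# Read it back with the default separator (a comma): spaces stay inside fields.
dfin = pd.read_csv(io.StringIO(csv_text))
for _, row in dfin.iterrows():
    print(row["Name"], "|", row["Surname"], "|", row["Sex"], "|", row["Date"])
```

By default to_csv only quotes fields that need it (for example, ones containing a comma), so the same approach also keeps commas inside a name intact.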