
How do I output results to a .csv file in Python?


I am new to Python and would like to write a script that takes a .txt file as input and outputs the results to a .csv file.

The .txt files look as follows:

text:eub1
region:euboea
μενανδρεσεμεεποισε

I would like to write a script that creates a new row for each instance of μ or ν in the third line above. I also want each row to contain the text and region identifier. So the result should look like this:

text,region,letter
eub1,euboea,μ
eub1,euboea,ν
eub1,euboea,ν
eub1,euboea,μ

I don't really know where to start with the coding, so I'd be grateful for any advice on how to do this.


Solution

  • Try:

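    import pandas as pd
    
    data = {}
    # Read as UTF-8 explicitly so the Greek letters decode the same on every platform.
    with open("your_file.txt", "r", encoding="utf-8") as f_in:
        for line in map(str.strip, f_in):
            if line == "":
                continue  # skip blank lines
            if line.startswith("text:"):
                data["text"] = line.split(":", maxsplit=1)[-1]
            elif line.startswith("region:"):
                data["region"] = line.split(":", maxsplit=1)[-1]
            else:
                # Any other line is the text itself: keep each μ or ν, in order.
                data["letter"] = [ch for ch in line if ch in "μν"]
    
    # The scalar text/region values are repeated for every entry in the letter list.
    df = pd.DataFrame(data)
    print(df)
    
    df.to_csv("data.csv", index=False)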
    

    Prints:

       text  region letter
    0  eub1  euboea      μ
    1  eub1  euboea      ν
    2  eub1  euboea      ν
    3  eub1  euboea      μ
    

    and saves data.csv:

    text,region,letter
    eub1,euboea,μ
    eub1,euboea,ν
    eub1,euboea,ν
    eub1,euboea,μ
    

    Content of your_file.txt:

    text:eub1
    region:euboea
    μενανδρεσεμεεποισε
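
If pandas is not available, the same CSV can probably be produced with the standard library's `csv` module alone. The following is only a minimal sketch under the same file layout as above; the names `extract_rows`, `write_rows` and `TARGET_LETTERS` are illustrative and not part of the original answer, and the input is given as an in-memory list of lines rather than read from your_file.txt.

```python
import csv
import io

TARGET_LETTERS = "μν"  # assumption: each of these letters should produce one row


def extract_rows(lines):
    """Return one (text, region, letter) tuple per target letter found."""
    text = region = ""
    rows = []
    for line in map(str.strip, lines):
        if not line:
            continue  # skip blank lines
        if line.startswith("text:"):
            text = line.split(":", maxsplit=1)[1]
        elif line.startswith("region:"):
            region = line.split(":", maxsplit=1)[1]
        else:
            # Any other line is the text to scan for target letters.
            rows.extend((text, region, ch) for ch in line if ch in TARGET_LETTERS)
    return rows


def write_rows(rows, f_out):
    """Write a header plus the rows to an already opened text file object."""
    writer = csv.writer(f_out, lineterminator="\n")
    writer.writerow(["text", "region", "letter"])
    writer.writerows(rows)


sample = ["text:eub1", "region:euboea", "μενανδρεσεμεεποισε"]
rows = extract_rows(sample)

# Render into a string here; a real file object works the same way.
buffer = io.StringIO()
write_rows(rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead, pass a file opened with `open("data.csv", "w", newline="", encoding="utf-8")`; `newline=""` is what the `csv` documentation recommends so that the writer controls line endings itself.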
    

    EDIT: To load a file that contains several records, such as this one:

    text:eub1
    region:euboea
    μενανδρεσεμεεποισε
    text:eub2
    region:xxx
    μμμ
    text:eub3
    region:zzz
    abc
    

    you can try:

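    import pandas as pd
    
    data = {}
    # Read as UTF-8 explicitly so the Greek letters decode the same on every platform.
    with open("your_file.txt", "r", encoding="utf-8") as f_in:
        for line in map(str.strip, f_in):
            if line == "":
                continue  # skip blank lines
            if line.startswith("text:"):
                data.setdefault("text", []).append(line.split(":", maxsplit=1)[-1])
            elif line.startswith("region:"):
                data.setdefault("region", []).append(
                    line.split(":", maxsplit=1)[-1]
                )
            else:
                # One list of matched letters per record (possibly empty).
                data.setdefault("letter", []).append(
                    [ch for ch in line if ch in "μν"]
                )
    
    # explode() turns each list element into its own row; an empty list becomes NaN.
    # Note: every record needs a text, a region and a text line, or the lists differ in length.
    df = pd.DataFrame(data).explode("letter")
    print(df)
    
    df.to_csv("data.csv", index=False)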
    

    Prints:

       text  region letter
    0  eub1  euboea      μ
    0  eub1  euboea      ν
    0  eub1  euboea      ν
    0  eub1  euboea      μ
    1  eub2     xxx      μ
    1  eub2     xxx      μ
    1  eub2     xxx      μ
    2  eub3     zzz    NaN
    

    and saves data.csv:

    text,region,letter
    eub1,euboea,μ
    eub1,euboea,ν
    eub1,euboea,ν
    eub1,euboea,μ
    eub2,xxx,μ
    eub2,xxx,μ
    eub2,xxx,μ
    eub3,zzz,
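
The multi-record approach depends on every record supplying exactly one text line, one region line and one line of letters, so that the three lists stay the same length. A possible alternative, sketched here with the standard library only, groups the lines into one dictionary per record first. The helper names `parse_records` and `to_rows`, the `keep_empty` flag and the rule that a `text:` line starts a new record are all assumptions made for illustration, not something stated in the question.

```python
import csv
import io

TARGET_LETTERS = "μν"  # assumption: each of these letters should produce one row


def parse_records(lines):
    """Group lines into one dict per record; a 'text:' line starts a new record."""
    records = []
    for line in map(str.strip, lines):
        if not line:
            continue  # skip blank lines
        if line.startswith("text:"):
            records.append(
                {"text": line.split(":", maxsplit=1)[1], "region": "", "body": ""}
            )
        elif not records:
            raise ValueError(f"line found before the first 'text:' line: {line!r}")
        elif line.startswith("region:"):
            records[-1]["region"] = line.split(":", maxsplit=1)[1]
        else:
            records[-1]["body"] += line
    return records


def to_rows(records, keep_empty=True):
    """Build one row per matched letter; optionally keep records with no match."""
    rows = []
    for rec in records:
        letters = [ch for ch in rec["body"] if ch in TARGET_LETTERS]
        if not letters and keep_empty:
            letters = [""]  # mirrors the empty letter field for eub3
        rows.extend(
            {"text": rec["text"], "region": rec["region"], "letter": ch}
            for ch in letters
        )
    return rows


sample = [
    "text:eub1", "region:euboea", "μενανδρεσεμεεποισε",
    "text:eub2", "region:xxx", "μμμ",
    "text:eub3", "region:zzz", "abc",
]
records = parse_records(sample)
rows = to_rows(records)

buffer = io.StringIO()
writer = csv.DictWriter(
    buffer, fieldnames=["text", "region", "letter"], lineterminator="\n"
)
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Calling `to_rows(records, keep_empty=False)` drops records such as eub3 that contain neither letter, which may be closer to "one row per instance of μ or ν".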