
Python pandas: help converting a text file to a custom format


I am looking for help in Python to convert the following text into columns.

Data in text file:

---- [ Job Information : 2926 ] ----
Name                : Run26
User                : abc
Account             : xyz
Partition           : q_24hrs
Nodes               : node3
Cores               : 36
State               : COMPLETED
ExitCode            : 0:0
Submit              : 2020-12-15T10:23:22
Start               : 2020-12-15T10:23:22
End                 : 2020-12-15T14:13:50
Waited              :   00:00:00
Reserved walltime   : 1-00:00:00
Used walltime       :   03:50:28
Used CPU time       :   00:00:00

Required output (keep this header constant):

Job id,Name,User,Account,Partition,Nodes,Cores
2926,Run26,abc,xyz,q_24hrs,node3,36

Thanks in advance.


Solution

  • Here is an example that parses the text file with the re module and builds a pandas DataFrame:

    import re
    
    import pandas as pd
    
    with open("your_file.txt", "r") as f_in:
        data = f_in.read()
    
    # Each findall returns one match per job record, in file order
    job_ids = re.findall(r"Job Information : (\d+)", data)
    names = re.findall(r"Name\s*:\s*(.*)", data)
    users = re.findall(r"User\s*:\s*(.*)", data)
    accounts = re.findall(r"Account\s*:\s*(.*)", data)
    partitions = re.findall(r"Partition\s*:\s*(.*)", data)
    nodes = re.findall(r"Nodes\s*:\s*(.*)", data)
    cores = re.findall(r"Cores\s*:\s*(.*)", data)
    
    # zip() pairs the n-th match of every field, so this assumes
    # every record contains all seven fields
    df = pd.DataFrame(
        zip(job_ids, names, users, accounts, partitions, nodes, cores),
        columns=[
            "Job id",
            "Name",
            "User",
            "Account",
            "Partition",
            "Nodes",
            "Cores",
        ],
    )
    print(df)
    # index=False keeps the DataFrame's row index out of the CSV
    df.to_csv("data.csv", index=False)
    

    Creates data.csv:

    Job id,Name,User,Account,Partition,Nodes,Cores
    2926,Run26,abc,xyz,q_24hrs,node3,36
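The regex-and-zip approach above pairs fields by position, so a record that lacks one field would shift every later row. A minimal sketch of a record-by-record alternative follows. It assumes each record starts with a "---- [ Job Information : N ] ----" line and that every other line has the form "Key : value". It writes the CSV with Python's standard csv module, so pandas is not needed. The names parse_jobs, to_csv, HEADER and JOB_RE are illustrative and do not come from the original answer.

```python
import csv
import io
import re

# Column order requested in the question (kept constant)
HEADER = ["Job id", "Name", "User", "Account", "Partition", "Nodes", "Cores"]

# Record header line, e.g. "---- [ Job Information : 2926 ] ----"
JOB_RE = re.compile(r"^-+\s*\[\s*Job Information\s*:\s*(\d+)\s*\]\s*-+$")


def parse_jobs(text):
    """Return one dict per job record, keyed by the field names in the file."""
    records = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        match = JOB_RE.match(line)
        if match:
            # A new record starts here; remember its job id
            current = {"Job id": match.group(1)}
            records.append(current)
        elif current is not None and ":" in line:
            # Split on the first colon only so "0:0" or "1-00:00:00" stay whole
            key, value = line.split(":", 1)
            current[key.strip()] = value.strip()
    return records


def to_csv(records, out):
    """Write only the HEADER columns; fields missing from a record become empty."""
    writer = csv.DictWriter(
        out,
        fieldnames=HEADER,
        restval="",             # value used for fields absent from a record
        extrasaction="ignore",  # skip fields not in HEADER (State, End, ...)
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(records)


sample = """---- [ Job Information : 2926 ] ----
Name                : Run26
User                : abc
Account             : xyz
Partition           : q_24hrs
Nodes               : node3
Cores               : 36
State               : COMPLETED
ExitCode            : 0:0
Submit              : 2020-12-15T10:23:22
Start               : 2020-12-15T10:23:22
End                 : 2020-12-15T14:13:50
Waited              :   00:00:00
Reserved walltime   : 1-00:00:00
Used walltime       :   03:50:28
Used CPU time       :   00:00:00
"""

buf = io.StringIO()
to_csv(parse_jobs(sample), buf)
print(buf.getvalue(), end="")
# prints:
# Job id,Name,User,Account,Partition,Nodes,Cores
# 2926,Run26,abc,xyz,q_24hrs,node3,36
```

To write a file instead, open it with open("data.csv", "w", newline="") and pass that handle to to_csv. Because each record is parsed separately, a missing field only leaves an empty cell in its own row instead of misaligning the others.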