
Python Pandas: iterating over a list of values and writing to separate text files


I am working with a CSV dataset that looks like below:

                  img_id                 obj_id   xcen        ycen       width       height
0   94a69b66-23f0-11e9-a78e-2f2b7983ac0d    0   0.377734    0.091667    0.071094    0.183333
1   94a6a3a4-23f0-11e9-a78f-ebd9c88ef3e8    0   0.375781    0.090972    0.075000    0.181944
2   94a6a430-23f0-11e9-a790-2b5f72f1667a    0   0.378516    0.091667    0.069531    0.183333
3   94a6a48a-23f0-11e9-a791-fb958b6ab6b3    0   0.391406    0.106944    0.076563    0.213889
4   94a6a4da-23f0-11e9-a792-f320b734bd9b    0   0.395313    0.106250    0.068750    0.212500
5   94a6a534-23f0-11e9-a793-c7e8fecc9fa8    0   0.362109    0.127778    0.105469    0.225000

What I am trying to do is write each row to a separate text file, with the column values separated by commas on one line.

The only column I leave out of the text file is img_id, because I use it to name the individual text files.

I have been trying different methods but I am having issues getting each row written to its respective text file. I have successfully gotten each individual text file to be named by its img_id.

For example, the text file for the first img_id would contain something like this:

0, 0.377734, 0.091667, 0.071094, 0.183333

Currently, I am trying to iterate over one column, but instead of each row going into its respective text file, the entire list that I got from using .values is written into every text file, like this:

[0,0,0,0,0,0.......0,0,0,0,0,0]

Also, some of the img_ids are the same, so I want to avoid overwriting a text file when my code creates another one with the same name. If an img_id occurs more than once, then instead of creating another text file (which, I assume, would overwrite the previous one with the same img_id), the row should be added to the existing text file, so that it contains 2 lines like this:

Contents of 94a6a534-23f0-11e9-a793-c7e8fecc9fa8.txt

0, 0.362109, 0.127778, 0.105469, 0.225000
0, 0.175781, 0.283642, 0.210913, 0.293922

Here is the code that I am currently working with.

file = '{}.txt'
a = df['img_id'].values
b = df['object_class'].values
c = df['xcen'].values
d = df['ycen'].values
e = df['width'].values
f = df['height'].values
b = str(b)
c = str(c)
d = str(d)
e = str(e)
f = str(f)
for x in a:
    with open(file.format(x), 'w') as f:
        for i in b:
            f.write(i)

Solution

  • Sample data:

    >>> df
                                      img_id  object_class      xcen      ycen     width    height
    0   b192cbd4-7958-4a82-8f90-42217076a66c             4  0.211284  0.428579  0.287383  0.683370
    1   b192cbd4-7958-4a82-8f90-42217076a66c             2  0.840717  0.040433  0.192738  0.545159
    2   9d452f25-aa60-4fe1-9165-a1a8a981a372             2  0.840717  0.040433  0.192738  0.545159
    3   3fa5d0d9-c781-40ad-a8f5-a1eae7d51b98             9  0.741793  0.098438  0.707242  0.102758
    4   706ad967-11a6-4e6f-85bc-24bc204597f4             4  0.786071  0.735364  0.661866  0.453724
    5   1b577e42-d037-4f7b-918e-1c7e6cc7e7a1            17  0.513458  0.012236  0.856802  0.894129
    6   c4c16c64-30cd-450b-ab08-543c4818f1f3            13  0.625725  0.765523  0.007714  0.678993
    7   329b908e-ce41-4fd1-b671-b20909c3b31d            10  0.784206  0.831250  0.728761  0.809600
    8   fd83b03c-2a84-4cb3-834d-714167475104             7  0.508803  0.137691  0.290492  0.206802
    9   6ce64fd0-ca9b-47e8-ae1d-049a87468197            13  0.919442  0.168500  0.995826  0.250895
    10  66ddffca-fdea-444f-ae79-d4ee284b9385            12  0.920211  0.803805  0.360863  0.866571
    

  • Export files:

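    # groupby yields one sub-frame per img_id, so rows sharing an img_id
    # are written to the same file instead of overwriting each other.
    for filename, data in df.groupby("img_id"):
        data.drop(columns="img_id").to_csv(f"{filename}.txt", header=False, index=False)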
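
If the lines need to look exactly like the ones in the question (a comma followed by a space, six decimals), to_csv is less convenient, since it expects a single-character sep. Below is a minimal sketch of a variant that still groups by img_id but formats each line by hand. The tiny DataFrame, the img-a / img-b identifiers and the export_rows helper are made up for illustration, and the files go into a temporary directory so nothing is left behind.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical miniature of the question's data; the first two rows share an img_id.
df = pd.DataFrame(
    {
        "img_id": ["img-a", "img-a", "img-b"],
        "object_class": [0, 0, 4],
        "xcen": [0.362109, 0.175781, 0.211284],
        "ycen": [0.127778, 0.283642, 0.428579],
        "width": [0.105469, 0.210913, 0.287383],
        "height": [0.225000, 0.293922, 0.683370],
    }
)


def export_rows(frame, out_dir):
    """Write one text file per img_id, with one ', '-separated line per row."""
    out_dir = Path(out_dir)
    for img_id, group in frame.groupby("img_id"):
        lines = []
        for row in group.itertuples(index=False):
            floats = (row.xcen, row.ycen, row.width, row.height)
            # Class id as-is, coordinates with six decimals (an assumed format).
            fields = [str(row.object_class)] + [f"{v:.6f}" for v in floats]
            lines.append(", ".join(fields))
        # Each file is written once with all of its rows, so duplicate
        # img_ids never overwrite one another.
        (out_dir / f"{img_id}.txt").write_text("\n".join(lines) + "\n")


# Write into a throwaway directory and read the files back for inspection.
with tempfile.TemporaryDirectory() as tmp:
    export_rows(df, tmp)
    contents = {p.name: p.read_text() for p in sorted(Path(tmp).iterdir())}

print(contents["img-a.txt"], end="")
```

Because the grouping happens before any file is opened, there is no need for append mode: every file is created exactly once and already contains all rows for its img_id.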