Tags: python, python-3.x, pandas, encoding

Store two differently encoded pieces of data in one file in Python


I have 2 types of encoded data (a byte-level sketch follows the list):

  1. ibm037 encoded - a single delimiter variable whose value is @@@
  2. UTF-8 encoded - a pandas dataframe with hundreds of columns.
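
For context, the same delimiter turns into different raw bytes under the two codecs; a minimal sketch, assuming Python's built-in ibm037 (EBCDIC) codec alias:

    # Illustration only: encode the same delimiter with both codecs.
    delimiter = "@@@"
    print(delimiter.encode("ibm037"))  # EBCDIC byte values, not ASCII-compatible
    print(delimiter.encode("utf8"))    # b'@@@' (ASCII-compatible bytes)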

Example dataframe:

    Date  Time
    1     2

My goal is to write this data to a file using Python. The format should be:

@@@  1    2

That is, every row of the dataframe should end up in the file, with each line starting with @@@.

I tried storing this delimiter as a new first column of the pandas dataframe and then writing the dataframe to the file, but it throws an error saying that two different encodings can't be written to one file.

I tried another way to write it, where df_orig_data is the pandas dataframe and Record_Header is the encoded delimiter:

    f = open("_All_DelimiterOfRecord.txt", "a")
    for row in df_orig_data.itertuples(index=False):
        f.write(Record_Header)
        f.write(str(row))
    f.close()

It also doesn't work.
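
If Record_Header is already a bytes object (the encoded delimiter), I suspect the text-mode file handle is part of the problem; a minimal sketch of that mismatch, with a hypothetical file name:

    # A file opened in text mode only accepts str, not bytes.
    header_bytes = "@@@".encode("ibm037")
    with open("demo.txt", "a") as f:
        f.write(header_bytes)  # TypeError: write() argument must be str, not bytes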

Is this kind of write even possible? How can I write these two differently encoded pieces of data to one file?

Edit:

    import pandas as pd
    from io import StringIO

    # Build the example dataframe and the (not-yet-encoded) record header.
    StringData = StringIO(
        """Date,Time
    1,2
    1,2
    """
    )

    df_orig_data = pd.read_csv(StringData, sep=",")
    Record_Header = "2 "

    # Attempt: prepend the ibm037-encoded header to each tab-joined row.
    f = open("_All_DelimiterOfRecord.txt", "a")
    for index, row in df_orig_data.iterrows():
        f.write(
            "\t".join(
                [
                    str(Record_Header.encode("ibm037")),
                    str(row["Date"]),
                    str(row["Time"]),
                ]
            )
        )
    f.close()

Solution

  • I would suggest doing the encoding yourself, and writing a bytes object to the file. This isn't a situation where you can rely on the built-in encoding to do it.

    That means that the program opens the file in binary mode (ab), all of the constants are byte-strings, and it works with byte-strings whenever possible.

    The question doesn't say, but I assumed you probably wanted a UTF-8 newline after each line, rather than an IBM newline.

    I also replaced the file handling with a context manager, since that makes it impossible to forget to close a file after you're done.

    import io
    import pandas as pd
    StringData = io.StringIO(
        """Date,Time
    1,2
    1,2
    """
    )
    
    df_orig_data = pd.read_csv(StringData, sep=",")
    Record_Header = "2 "
    
    
    with open("_All_DelimiterOfRecord.txt", "ab") as f:
        for index, row in df_orig_data.iterrows():
            f.write(Record_Header.encode("ibm037"))
            row_bytes = [str(cell).encode('utf8') for cell in row]
            f.write(b'\t'.join(row_bytes))
            # Note: this is a UTF-8 newline, not an IBM newline.
            f.write(b'\n')
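
    As a quick sanity check (not part of the fix itself, and assuming the same file name and the two-character "2 " header from your example), you can read the file back in binary mode and decode each part with its own codec:

    with open("_All_DelimiterOfRecord.txt", "rb") as f:
        for line in f.read().splitlines():
            header = line[:2].decode("ibm037")   # "2 " is 2 bytes in ibm037
            fields = line[2:].decode("utf8").split("\t")
            print(repr(header), fields)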