Write bytes from savReader using csv.writer without date to float conversion

I want to read data from a .sav (SPSS) file and rewrite it to .csv for further use. For reading I use savReaderWriter.SavReader and it returns all strings in byte notation: b'string' instead of 'string'.

This is my code in Python 3.6:

import savReaderWriter
import csv

with savReaderWriter.SavReader('input_filename.sav') as reader:
    header = reader.header
    with open('output_filename.csv','w',newline='') as output:
        w = csv.writer(output,delimiter=',')
        w.writerow(header)
        for line in reader:
            w.writerow(line)

One solution I've found is to specify ioUtf8=True in SavReader, but then all date variables are converted to floats: b'2017-09-02' becomes 13723689600.0, which datetime.fromtimestamp then reads as a date in the year 2404.
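For context on that float: SPSS stores dates as seconds since midnight on 14 October 1582 rather than since the Unix epoch, which is why datetime.fromtimestamp lands in 2404. A minimal sketch of converting such a value by hand, assuming the column is already known to hold a date (the helper name spss_seconds_to_date is made up for illustration):

```python
from datetime import datetime, timedelta

# SPSS counts seconds from the start of the Gregorian calendar.
SPSS_EPOCH = datetime(1582, 10, 14)

def spss_seconds_to_date(seconds):
    """Convert an SPSS date value (seconds since 1582-10-14) to a date."""
    return (SPSS_EPOCH + timedelta(seconds=seconds)).date()

print(spss_seconds_to_date(13723689600.0))  # 2017-09-02
```

This only helps for columns you know are dates; a plain numeric column would be converted just as happily, so the caller has to decide which columns to pass in.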

Another thing that works is

w.writerow([h.decode('utf-8') for h in header])

but only for the header: the other rows also contain floats and NaNs, which have no decode method and therefore raise errors.
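The same comprehension can be stretched to data rows by decoding only the cells that really are bytes and passing everything else through unchanged. A rough sketch; decode_row is a hypothetical helper and the sample row is invented:

```python
def decode_row(row, encoding='utf-8'):
    """Decode bytes cells to str; leave floats, nan and None untouched."""
    return [cell.decode(encoding) if isinstance(cell, bytes) else cell
            for cell in row]

sample = [b'2017-09-02', 1.0, float('nan'), b'caf\xc3\xa9']
print(decode_row(sample))
```

Checking the type with isinstance avoids relying on an AttributeError to detect non-string cells, so genuine decoding problems are not hidden behind a broad except.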

Specifying 'wb' instead of 'w' in open also raises an error:

TypeError: a bytes-like object is required, not 'str'

Any ideas of how to read and write this kind of data properly?

Solution

I found a temporary solution, although I'm not proud of it. Maybe somebody else can improve it.

import csv

import pandas as pd
import savReaderWriter

utf_errors = 0
with savReaderWriter.SavReader('input_filename.sav') as reader:
    # header names come back as bytes, so decode them first
    header = [h.decode('utf-8') for h in reader.header]
    with open('output_filename.csv', 'w', newline='') as output:
        w = csv.writer(output, delimiter=',')
        w.writerow(header)
        for line in reader:
            newline = []
            for value in line:
                try:
                    newline.append(value.decode('utf-8'))
                except AttributeError:
                    # non-string values (floats and nan-s) have no .decode
                    newline.append(value)
            try:
                w.writerow(newline)
            except UnicodeEncodeError:
                # omit the row when a character can't be written to the file
                utf_errors += 1

read_output = pd.read_csv('output_filename.csv', encoding='latin1')

The strange thing about the data is that, however I decode it, there are always characters that can't be written. .decode('utf-8') loses the fewest rows (4 are omitted) compared to .decode('latin1') (29 are omitted), but then I have to read the file with encoding='latin1', otherwise I get this error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 9: invalid continuation byte
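A possible explanation, not verified against this data: byte 0xe9 is é in latin1 and cp1252, which suggests the CSV was written in the platform's default encoding because open was called without an encoding argument. If that is the cause, passing the same explicit encoding when writing and when reading may avoid both the omitted rows and the latin1 workaround. A small sketch with invented sample rows and a temporary file:

```python
import csv
import os
import tempfile

rows = [['name', 'value'], ['caf\u00e9', 1.0]]  # invented sample data

path = os.path.join(tempfile.mkdtemp(), 'output_filename.csv')

# Write with an explicit encoding instead of the platform default.
with open(path, 'w', newline='', encoding='utf-8') as output:
    csv.writer(output, delimiter=',').writerows(rows)

# Read back with the same encoding.
with open(path, newline='', encoding='utf-8') as source:
    read_back = list(csv.reader(source))

print(read_back)
```

With a file written this way, pd.read_csv should be able to read it with encoding='utf-8' as well.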