Why does csv.reader with TextIOWrapper include new line characters?

I have two functions: one downloads individual CSV files and the other downloads a zip archive containing multiple CSV files.

The download_and_process_csv function works correctly with response.iter_lines(), which seems to delete newline characters.

'Chicken, water, cornmeal, salt, dextrose, sugar, sodium phosphate, sodium erythorbate, sodium nitrite. Produced in a facility where allergens are present such as eggs, milk, soy, wheat, mustard, gluten, oats, dairy.'

The download_and_process_zip function, however, seems to include newline characters (\n\n) for some reason. I've tried newline='' in io.TextIOWrapper, but then I just get \r\n instead.

'Chicken, water, cornmeal, salt, dextrose, sugar, sodium phosphate, sodium erythorbate, sodium nitrite. \n\nProduced in a facility where allergens are present such as eggs, milk, soy, wheat, mustard, gluten, oats, dairy.'

Is there a way to modify download_and_process_zip so that newline characters are excluded or replaced, or do I have to iterate over all the rows and replace the characters manually?

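import csv
import io
import os
import zipfile
from contextlib import closing

# request_exceptions, process_copy_from_csv and apps are not shown here.


@request_exceptions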
def download_and_process_csv(client, url, model_class):
    with closing(client.get(url, stream=True)) as response:
        response.raise_for_status()
        response.encoding = 'utf-8'
        reader = csv.reader(response.iter_lines(decode_unicode=True))
        process_copy_from_csv(model_class, reader)


@request_exceptions
def download_and_process_zip(client, url):
    with closing(client.get(url, stream=True)) as response:
        response.raise_for_status()

        with io.BytesIO(response.content) as buffer:
            with zipfile.ZipFile(buffer, 'r') as z:
                for filename in z.namelist():
                    base_filename, file_extension = os.path.splitext(filename)
                    model_class = apps.get_model(base_filename)
                    if file_extension == '.csv':
                        with z.open(filename) as csv_file:
                            reader = csv.reader(io.TextIOWrapper(
                                csv_file,
                                encoding='utf-8',
                                # newline='',
                            ))
                            process_copy_from_csv(model_class, reader)

Solution

I've played around with a mock server which serves this CSV file:

"foo
bar"

The CSV has a single field, "foo\nbar", in a single row. I call a newline in the data an embedded newline.
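The same parse can be reproduced without a server. A minimal sketch, assuming an in-memory io.StringIO stands in for the downloaded file (the variable names are only for illustration):

```python
import csv
import io

# Same content as the mock CSV: one quoted field containing a newline.
data = '"foo\nbar"\n'

# newline='' hands line endings to the csv module untouched,
# which is what the csv documentation recommends for file objects.
rows = list(csv.reader(io.StringIO(data, newline='')))

print(rows)  # one row, one field, embedded newline preserved
```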

When I use the iter_content method on the Response object:

import csv

import requests

print("Getting CSV")
resp = requests.get("http://localhost:8999/csv")
# decode_unicode=True yields str chunks instead of bytes
x = resp.iter_content(decode_unicode=True)

reader = csv.reader(x)
for row in reader:
    print(row)

I get the correct output, a single row prints out with a single field of data:

Getting CSV
['foo\nbar']
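One caveat with iter_content, as far as I can tell: csv.reader treats every item its input yields as one line, and iter_content yields arbitrary chunks (one byte at a time by default), not lines. The test file above survives that only because its single field is quoted. Below is a sketch of the failure mode and a possible alternative. It assumes the chunks arrive as in the hypothetical list and that the body is UTF-8. io.BytesIO stands in for the response body; with requests the object to wrap would presumably be response.raw from a stream=True request, which is untested here.

```python
import csv
import io

# Hypothetical chunks, split the way a chunked iterator might deliver "a,b\n".
chunks = ['a', ',', 'b', '\n']
chunked_rows = list(csv.reader(chunks))
print(chunked_rows)  # not [['a', 'b']]: each chunk was parsed as its own line

# Alternative: wrap the binary stream so csv.reader receives real lines.
body = io.BytesIO(b'a,b\r\n"foo\nbar",c\r\n')  # stand-in for the response body
text = io.TextIOWrapper(body, encoding='utf-8', newline='')
wrapped_rows = list(csv.reader(text))
print(wrapped_rows)  # two rows, embedded newline kept in the first field of row two
```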

If I change iter_content to iter_lines, I get the wrong output:

Getting CSV
['foobar']

I suspect, based on the name, that iter_lines splits on any newline-like character sequence and hands each piece to the csv reader without its line ending, so the embedded newline is effectively removed. As for your result, where the newline appears to have been replaced with a space: judging by your second output, the space is already in the data in front of the \n\n, so nothing is being replaced; the newlines are just deleted.
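This can be checked without a server. As far as I know, iter_lines splits the decoded text with str.splitlines(), which drops the line endings; the sketch below assumes that and calls splitlines() directly as a stand-in. The second sample is a shortened, hypothetical version of the row from the question, with \r\n\r\n assumed as the embedded line endings.

```python
import csv

# The mock CSV: one quoted field with an embedded newline.
lines = '"foo\nbar"\n'.splitlines()  # line endings are dropped by the split
mock_rows = list(csv.reader(lines))
print(mock_rows)  # the two pieces are joined with nothing in between

# Shortened stand-in for the row in the question: the space already sits
# in front of the embedded line endings, so it survives the split.
sample = '"sodium nitrite. \r\n\r\nProduced in a facility"\r\n'
sample_rows = list(csv.reader(sample.splitlines()))
print(sample_rows)
```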

This popular SO question, Use python requests to download CSV, asks in general how to download a CSV with the requests module. Every answer seems to assume that the CSV in question doesn't contain embedded newlines, so a lot of them use iter_lines. I don't know when iter_content() was added to requests, but no answer mentions it.
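To come back to the actual question: the embedded newlines are part of the data, so csv.reader over io.TextIOWrapper is reading the file correctly, and I'm not aware of a reader option that drops them. If you want them gone, one option is a small generator between the reader and process_copy_from_csv. This is a sketch: without_embedded_newlines is a made-up name, the archive is built in memory purely for the demo, and deleting (rather than replacing with a space) is chosen to match what iter_lines produced.

```python
import csv
import io
import re
import zipfile

_LINE_BREAKS = re.compile(r'[\r\n]+')


def without_embedded_newlines(rows):
    """Yield each row with CR/LF runs removed from every field."""
    for row in rows:
        yield [_LINE_BREAKS.sub('', field) for field in row]


# Demo archive (hypothetical file name and contents), built in memory.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as z:
    z.writestr('demo.csv', 'id,ingredients\r\n1,"salt. \r\n\r\nProduced here."\r\n')

buffer.seek(0)
with zipfile.ZipFile(buffer, 'r') as z:
    with z.open('demo.csv') as csv_file:
        # newline='' leaves line endings for the csv module to interpret.
        text = io.TextIOWrapper(csv_file, encoding='utf-8', newline='')
        cleaned = list(without_embedded_newlines(csv.reader(text)))

print(cleaned)
```

In download_and_process_zip this would mean passing without_embedded_newlines(reader) to process_copy_from_csv instead of reader, assuming that function accepts any iterable of rows.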