Search code examples
python-3.xcsvgoogle-sheetsgoogle-api-python-clientgspread

How do I replace new lines characters in CSV files downloaded using the Google sheets API?


I'm using Python 3.9 and the Google Sheets SDK for Python v 3.6.0. I want to download a Google sheet as a CSV and I would like to replace any new line characters that occur in the cell data with nothing. I tried the below

    client = gspread.authorize(creds)
    sheet = client.open('ChiCommons_Directory')

    # get the third sheet of the Spreadsheet.  This
    # contains the data we want
    sheet_instance = sheet.get_worksheet(3)

    url = 'https://docs.google.com/spreadsheets/d/' + sheet.id + '/gviz/tq?tqx=out:csv&gid=' + str(sheet_instance.id)
    headers = {'Authorization': 'Bearer ' + client.auth.token}
    res = requests.get(url, headers=headers)
    output = re.sub(r'[\n\r]', '', res.text)
    print(output)

This doesn't seem to be replacing anything. Specifically the line output = re.sub(r'[\n\r]', '', res.text) does not seem to be doing its job. However, I only want to replace new lines in teh cell data, not the new lines that divide each row of data. Not sure the best way to do that.


Solution

  • I believe your goal as follows.

    • You want to replace the line breaks in each cell.
    • You don't want to replace the line breaks for each row in the CSV data.

    Modification points:

    • When I tested your script, the line breaks for each row are also removed. So, in this case, I would like to propose the following flow.
      1. Parse the CSV data and convert it to a list.
      2. Remove the line breaks in each cell.
      3. Convert the list to the CSV data.

    When above points are reflected to your script, it becomes as follows.

    Modified script:

    Please modify your script as follows.

    From:
    output = re.sub(r'[\n\r]', '', res.text)
    print(output)
    
    To:
    ar = csv.reader(io.StringIO(res.text, newline=""))
    output = "\n".join([",".join(map(str, ['"' + c.replace('\n', '') + '"' for c in r])) for r in ar])
    # or, output = "\n".join([",".join(map(str, ['"' + re.sub(r'[\n\r]', '', c) + '"' for c in r])) for r in ar])
    print(output)
    
    • In this case, please use import csv and import io.

    • If you don't want to add " for each cell, please modify as follows.

        output = "\n".join([",".join(map(str, [c.replace('\n', '') for c in r])) for r in ar])