Search code examples
pythonpandascsvgoogle-sheets-apigspread

Create a csv file from gspread skipping some rows with pandas


I'm creating a csv file to iterate it and send emails using Python. Code works for me, but I think I'm doing too much coding just to get data from Google SpreadSheet, skip some rows, and iterate it.

Here is my code:

import csv
import pandas as pd
import smtplib
from settings_djg import SENDER_EMAIL, SENDER_PASS  # Email settings file
from datetime import date, datetime
import gspread
from oauth2client.service_account import ServiceAccountCredentials
from email.message import EmailMessage
from itertools import islice

scope = ['https://spreadsheets.google.com/feeds', 'https://www.googleapis.com/auth/drive']
creds = ServiceAccountCredentials.from_json_keyfile_name('client_secret.json', scope)
client = gspread.authorize(creds)

sheet = client.open("Hosting_Domains").sheet1
list_of_hashes = sheet.get_all_values() 

new_list = pd.DataFrame(list_of_hashes)

# Here I'm using .iloc to skip some rows
new_list.iloc[[0,2,3,4,5,6,7,13,14,15,16,17,18,19,20]].to_csv('my_csv.csv', index=False, header=False) 

msg = EmailMessage()
now = date.today()

# Here starts iteration
with open('my_csv.csv') as csv_file:   
    csv_reader = csv.reader(csv_file, delimiter=',')
    next(csv_reader)

    for row in csv_reader:

Data from Google SpreadSheet is like this:

X,DOMAIN,EXPIRE,OWNER,REGISTRANT,HOSTING
 ,,,CUSTOMERS,,,
B,company.com,2021-02-13,Customer,SITEGROUND,SITEGROUND
D,companyb.com,2021-05-17,Customer,GODDADY,GODDADY
E,companyc.com,2021-09-19,Customer,GODDADY,GODDADY
D,companye.com,2021-01-23,Customer,GODDADY,GODDADY

Solution

  • I believe your goal as follows.

    • You want to retrieve the values from Google Spreadsheet as CSV data by selecting rows using gspread with python.
      • In this case, you don't want to use iloc and save it as a file.
    • As a sample, you want to retrieve the rows from [0,2,3,4,5,6,7,13,14,15,16,17,18,19,20] of a sheet.
      • In this case, 0 is included. So I thought that 0 is the 1st row.

    Modification point:

    • When [0,2,3,4,5,6,7,13,14,15,16,17,18,19,20] is the index of array of list_of_hashes, I think that the required rows can be retrieved using [0,2,3,4,5,6,7,13,14,15,16,17,18,19,20].

    When this point is reflected to your script, it becomes as follows.

    Modified script:

    list_of_hashes = sheet.get_all_values()
    values = [list_of_hashes[e] for e in [0, 2, 3, 4, 5, 6, 7, 13, 14, 15, 16, 17, 18, 19, 20]]
    df = pd.DataFrame(values)
    
    • values is the array retrieved the required rows.
    • df is the dataframe converted from values.