python-3.x, pandas, simple-salesforce

Read large Salesforce query into pandas quickly


Using the simple_salesforce connector, my query returned about 150k records. Reading the data into a DataFrame the following way took so long that I gave up, ran a report in Salesforce, downloaded it, and read that into pandas instead. Is there a quicker way? Thanks

import pandas as pd
from simple_salesforce import Salesforce

fields = ['field' + str(i) for i in range(1, 10)]
fields_str = ", ".join(fields)
query_str = "select {} from account".format(fields_str)

sf = Salesforce(username= myusername, password= mypwd, security_token = mytoken)
df = sf.query_all(query_str)

sf_df = pd.DataFrame(columns = fields)

for account in range(df['totalSize']):
     account_dict = {}
     for field in fields:
         account_dict[field] = df['records'][account][field]
     dict_df = pd.DataFrame.from_dict([account_dict])
     sf_df = sf_df.append(dict_df, sort=False)
     del(account_dict)

Solution

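  • query_all returns a dict whose ['records'] key holds the list of records, so you can pass that list to pandas directly instead of building the frame row by row.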

    df = sf.query_all('SELECT ID, CreatedDate FROM Account LIMIT 10')['records']
    df = pd.DataFrame(df)
    df
    

    or as a single code line:

    df = pd.DataFrame(sf.query_all('SELECT ID, CreatedDate FROM Account LIMIT 10')['records'])
    df
    

    If the attributes column does not contain data you need, you can use .drop(columns=['attributes']) to remove it from the returned dataframe.

    df = sf.query_all('SELECT ID, CreatedDate FROM Account LIMIT 10')['records']
    df = pd.DataFrame(df)
    df.drop(columns=['attributes'],inplace=True)
    df
    

    or as a single code line:

    df = pd.DataFrame(sf.query_all('SELECT ID, CreatedDate FROM Account LIMIT 10')['records']).drop(columns=['attributes'])
    df
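
For context on why the loop in the question is slow: each DataFrame.append call copies the whole frame, so the work grows roughly quadratically with the number of records (and DataFrame.append was removed in pandas 2.0). Below is a minimal sketch of the one-shot approach wrapped in a helper. The name records_to_frame and the fake_result dict are hypothetical and only for illustration; fake_result stands in for what sf.query_all(...) returns, assuming the usual shape with a 'records' list whose items carry an 'attributes' entry.

```python
import pandas as pd


def records_to_frame(result, fields=None):
    """Build a DataFrame from a query_all-style result dict in a single call.

    `result` is assumed to be a dict with a 'records' list of dict-like rows.
    `fields` optionally fixes the column order (hypothetical convenience).
    """
    # One constructor call for all rows, instead of one append per row.
    frame = pd.DataFrame.from_records(result["records"])
    # 'attributes' is per-record metadata; ignore it if it is absent.
    frame = frame.drop(columns=["attributes"], errors="ignore")
    if fields is not None:
        # Keep only the requested columns, in the requested order.
        frame = frame.reindex(columns=fields)
    return frame


# Hypothetical stand-in for sf.query_all(query_str); values are made up.
fake_result = {
    "totalSize": 2,
    "done": True,
    "records": [
        {"attributes": {"type": "Account"}, "Id": "001A", "Name": "Acme"},
        {"attributes": {"type": "Account"}, "Id": "001B", "Name": "Globex"},
    ],
}

sf_df = records_to_frame(fake_result, fields=["Id", "Name"])
```

With a real connection, the same helper would be called as records_to_frame(sf.query_all(query_str), fields=fields), reusing the names from the question.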