Search code examples
pythonpandasdataframesalesforceordereddict

Pandas dataframe OrderedDict extract data


I've a Database.csv file with one column and 3 rows, those are data exported from salesforce with simple-salesforce, I try to get the 'Name' Value from the OrderedDict cell data('Name', 'Demand').

Dataframe

Type__c
    OrderedDict([('attributes', OrderedDict([('type', 'Type__c'), ('url', '/services/data/v42.0/sobjects/Type__c/1234400000001ddAAA')])), ('Name', 'Demand')])
    OrderedDict([('attributes', OrderedDict([('type', 'Type__c'), ('url', '/services/data/v42.0/sobjects/Type__c/1234400000001ddAAA')])), ('Name', 'Demand')])
    OrderedDict([('attributes', OrderedDict([('type', 'Type__c'), ('url', '/services/data/v42.0/sobjects/Type__c/123430000004C93AAE')])), ('Name', 'Stand')])

with below code i read the csv file and for each row i am assigning the value that contains "OrderedDict([....) to name, and i want to print the "Name" field value,

import pandas as pd
from collections import OrderedDict
df = pd.read_csv('Database.csv')
for index, row in df.iterrows():
    name = df._get_value(index, 'Type__c')
    base=OrderedDict([('attributes', OrderedDict([('type', 'Type__c'), ('url', '/services/data/v42.0/sobjects/Type__c/1234500000001erAAA')])), ('Name', 'Private')])
    print("\n",name['Name'])
    print(base['Name'])

Results:

.
print("\n",name['Name'])
    TypeError: string indices must be integers

when i change print("\n",name['Name']) to print("\n",name) to test the manual added OrderedDict, i can see it works

import pandas as pd
from collections import OrderedDict
df = pd.read_csv('Database.csv')
for index, row in df.iterrows():
    name = df._get_value(index, 'Type__c')
    test=OrderedDict([('attributes', OrderedDict([('type', 'Type__c'), ('url', '/services/data/v42.0/sobjects/Type__c/1234500000001erAAA')])), ('Name', 'Private')])
    ***print("\n",name)***
    print(test['Name'])

Result:

 OrderedDict([('attributes', OrderedDict([('type', 'Type__c'), ('url', '/services/data/v42.0/sobjects/Type__c/1234400000001ddAAA')])), ('Name', 'Demand')])
Private

 OrderedDict([('attributes', OrderedDict([('type', 'Type__c'), ('url', '/services/data/v42.0/sobjects/Type__c/1234400000001ddAAA')])), ('Name', 'Demand')])
Private

 OrderedDict([('attributes', OrderedDict([('type', 'Type__c'), ('url', '/services/data/v42.0/sobjects/Type__c/123430000004C93AAE')])), ('Name', 'Stand')])
Private

I've a file with thousands of rows and there must be way to create a new column named "Name" and add the text data

at the End i want to achieve FROM

Type__c
        OrderedDict([('attributes', OrderedDict([('type', 'Type__c'), ('url', '/services/data/v42.0/sobjects/Type__c/1234400000001ddAAA')])), ('Name', 'Demand')])
        OrderedDict([('attributes', OrderedDict([('type', 'Type__c'), ('url', '/services/data/v42.0/sobjects/Type__c/1234400000001ddAAA')])), ('Name', 'Demand')])
        OrderedDict([('attributes', OrderedDict([('type', 'Type__c'), ('url', '/services/data/v42.0/sobjects/Type__c/123430000004C93AAE')])), ('Name', 'Stand')])

TO

Name
Demand
Demand
Stand

Solution

  • Welcome to the SO-community Pamuk!

    Rather than iterating through the rows of a dataframe, it is much more efficient to "apply" a particular function to an entire column (or even subset of a dataframe). That way, pandas will handle performance for you, and it is usually more readable (since you don't have to add any iterating logic).

    Here is how you can obtain that "Name" column with apply and a quick lambda function (only the last line is relevant for you, the rest is for replicating your sample dataframe):

    from collections import OrderedDict
    import pandas as pd
    
    data = [
        OrderedDict([('attributes', OrderedDict([('type', 'Type__c'), ('url', '/services/data/v42.0/sobjects/Type__c/1234400000001ddAAA')])), ('Name', 'Demand')]),
        OrderedDict([('attributes', OrderedDict([('type', 'Type__c'), ('url', '/services/data/v42.0/sobjects/Type__c/1234400000001ddAAA')])), ('Name', 'Demand')]),
        OrderedDict([('attributes', OrderedDict([('type', 'Type__c'), ('url', '/services/data/v42.0/sobjects/Type__c/123430000004C93AAE')])), ('Name', 'Stand')]),
    ]
    
    df = pd.DataFrame({"Type__c": data})
    df["Name"] = df["Type__c"].apply(lambda x: x["Name"])
    

    See Pandas Series.apply since df["Type__c"] docs for more examples and details.

    For more complicated row-wise operations that involve using more columns at once, you can check the docs for DataFrame.apply and apply a function to the dataframe itself (or a subset of it).

    For a more readable part of the docs that is very relevant to what you are doing, check the Pandas User Guide section on Function Application.