python pandas dataframe datareader

Using Dictionaries instead of globals()


I am doing a business case on retrieving stock information. The teacher uses the code below to create DataFrames with stock information.

# Imports are assumed here; the original snippet did not show them
from datetime import datetime

from pandas_datareader.data import DataReader

# The tech stocks we'll use for this analysis
tech_list = ['AAPL', 'GOOG', 'MSFT', 'AMZN']

# Set up end and start times for the data grab
end = datetime.now()
start = datetime(end.year - 1, end.month, end.day)

# Loop over the tickers, grabbing Yahoo Finance data for each one
for stock in tech_list:
    # Bind each DataFrame to a global variable named after its ticker
    globals()[stock] = DataReader(stock, 'yahoo', start, end)

He uses globals() to create the four DataFrames, one per tech stock. I read in the question below that you can also use a dictionary to achieve the same goal.

pandas set names of dataframes in loop

My question is that I do not understand this line of code in the answer:

frames = {i:dat for i, dat in data.groupby('Sport')}

Can someone explain?


Solution

  • In this case, frames is a dictionary built with a dictionary comprehension. Iterating over data.groupby('Sport') yields one pair of values per group: the group key (a distinct value from the 'Sport' column) and the sub-DataFrame holding that group's rows. The comprehension names these i and dat, and the notation {i:dat for i, dat in ...} builds a new dictionary out of all such pairs, using i as the key and dat as the value. The result is stored in frames.
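To make the pairs concrete, here is a minimal sketch with made-up data. Only the 'Sport' column name comes from the question; the 'Athlete' column and all the values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data; only the 'Sport' column name is from the question.
data = pd.DataFrame({
    "Sport": ["Tennis", "Golf", "Tennis", "Golf", "Rowing"],
    "Athlete": ["A", "B", "C", "D", "E"],
})

# Iterating over the GroupBy object yields (group_key, sub_dataframe) pairs.
for sport, group in data.groupby("Sport"):
    print(sport, len(group))

# The comprehension collects those pairs into a dictionary keyed by sport.
frames = {i: dat for i, dat in data.groupby("Sport")}
print(frames["Tennis"])
```

Each value in frames is an ordinary DataFrame, so frames["Tennis"] can be used wherever a separately named variable would have been.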

    The general syntax is (for the case where the iterator returns 2 elements):

    {key: value for key, value in iterator}
    

    The answers to this question do a good job explaining what an iterator is in Python. Usually (but not always), when used in a dictionary comprehension, the iterator's __next__() method returns two elements, which are unpacked into the key and the value. Whatever is used as the dictionary key must be hashable; the value can be any object.
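The hashability requirement can be checked with a small sketch in plain Python. The ticker strings and numbers below are placeholder values, not data from the question.

```python
# A string key is hashable, so this comprehension works.
pairs = [("AAPL", 1), ("GOOG", 2)]
by_ticker = {key: value for key, value in pairs}

# A list is not hashable, so using one as a key raises TypeError.
raised = False
try:
    bad = {key: value for key, value in [(["AAPL"], 1)]}
except TypeError:
    raised = True

print(by_ticker, raised)
```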

    The iterator doesn't necessarily need to return two elements (although that is a common pattern); the key and the value can both be computed from a single element. This works:

    print(dict([(i, chr(65+i)) for i in range(4)]))
    {0: 'A', 1: 'B', 2: 'C', 3: 'D'}
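For comparison, the same dictionary can be written directly as a dictionary comprehension. This short sketch builds it both ways and checks that the results match.

```python
# Built from a list of (key, value) tuples passed to dict().
from_list = dict([(i, chr(65 + i)) for i in range(4)])

# Built with a dictionary comprehension over the same single-element iterator.
from_comprehension = {i: chr(65 + i) for i in range(4)}

print(from_list == from_comprehension)
```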
    

    and also shows that dictionary comprehensions are really just special syntax for the same mechanics as a list comprehension passed to the dict() constructor, which is what the comment by @Barmar does:

    frames = dict(data.groupby('Sport'))
    

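    In this case, iterating over data.groupby('Sport') does need to yield two-element pairs, and the order matters (group key first, then that group's DataFrame), because dict() treats the first element of each pair as the key. It is shorthand for (roughly) this: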

    dict([(key, value) for key, value in data.groupby('Sport')])
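Applied to the loop from the question, the same idea might look like the sketch below. The load_prices function is a hypothetical stand-in for the DataReader(stock, 'yahoo', start, end) call so that the example runs without network access; its return values are made up.

```python
import pandas as pd

tech_list = ["AAPL", "GOOG", "MSFT", "AMZN"]


def load_prices(ticker):
    """Hypothetical stand-in for DataReader(ticker, 'yahoo', start, end)."""
    # Made-up prices; a real call would return that ticker's price history.
    return pd.DataFrame({"Close": [1.0, 2.0, 3.0]})


# One dictionary holds every DataFrame, keyed by ticker, instead of globals().
stock_frames = {ticker: load_prices(ticker) for ticker in tech_list}

# Look up a single stock by name, or loop over all of them.
aapl = stock_frames["AAPL"]
for ticker, frame in stock_frames.items():
    print(ticker, frame["Close"].mean())
```

Keeping the frames in one dictionary avoids creating variables dynamically and makes it straightforward to iterate over every stock.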