I have a dataframe, created with the following code:
import pandas as pd

dd = {'name': ["HARDIE'S MOBILE HOME PARK", 'CRESTVIEW RV PARK',
               'HOMESTEAD TRAILER PARK', 'HOUSTON PARK MOBILE HOME PARK',
               'HUDSON MOBILE HOME PARK', 'BEACH DRIVE MOBILE HOME PARK',
               'EVANS TRAILER PARK'],
      'country': ['USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA'],
      'coordinates': ['30.44126118, -86.6240656099999',
                      '30.7190163500001, -86.5716222299999',
                      '30.5115772500001, -86.4628417499999',
                      '30.4424195300001, -86.64733076',
                      '30.7629176200001, -86.5928893399999', '30.44417349, -86.59951996',
                      '30.4427800300001, -86.62941091'],
      'status': ['OPEN', 'CLOSED', 'OPEN', 'OPEN', 'OPEN', 'OPEN', 'OPEN']}
df1 = pd.DataFrame(data=dd)
What I want to do is to create a dictionary with the following structure:
{'destination1': 'CRESTVIEW RV PARK; 30.7190163500001, -86.5716222299999',
'destination2': 'HOMESTEAD TRAILER PARK; 30.5115772500001, -86.4628417499999',
'destination3': 'HOUSTON PARK MOBILE HOME PARK; 30.4424195300001, -86.64733076',
'destination4': 'HUDSON MOBILE HOME PARK; 30.7629176200001, -86.5928893399999',
'destination5': 'BEACH DRIVE MOBILE HOME PARK ; 30.44417349, -86.59951996'}
As you can see, each value must have the form "name; coordinates", taken from the second row to the last row. I am using the following code to do that:
d1 = {f"destination{k}":v + "; " + i for k in range(1, len(df1)-1) for v,i in zip(df1.name, df1.coordinates)}
However, this is the output I am getting:
{'destination1': 'EVANS TRAILER PARK; 30.4427800300001, -86.62941091',
'destination2': 'EVANS TRAILER PARK; 30.4427800300001, -86.62941091',
'destination3': 'EVANS TRAILER PARK; 30.4427800300001, -86.62941091',
'destination4': 'EVANS TRAILER PARK; 30.4427800300001, -86.62941091',
'destination5': 'EVANS TRAILER PARK; 30.4427800300001, -86.62941091'}
Every key ends up with the same value, taken from the last row of the dataframe. What I want is for each key to get its value from a different row, going from the second row to the last row.
If anyone has an idea of how to do that, I would really appreciate the help.
The dict comprehension in your example has two for-loops:
d1 = {
f"destination{k}":v + "; " + i
for k in range(1, len(df1)-1)
for v,i in zip(df1.name, df1.coordinates)
}
In these loops, k is iterated independently of v and i. For every value of k, the inner loop runs over all rows of df1, and each pass assigns to the same key destination{k}, overwriting the value stored before it. When the inner loop finishes, every key holds the value from the last row, which is exactly the output you are seeing. As an aside, df1.name does return the "name" column here, but df1['name'] is the safer spelling: attribute access stops working as soon as a column name collides with an existing DataFrame attribute or method.
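To see the overwriting in isolation, here is a minimal sketch that uses two short lists as stand-ins for the columns (the list contents are made up for illustration and are not from your dataframe):

```python
# Hypothetical stand-ins for df1['name'] and df1['coordinates']
names = ["A", "B", "C"]
coords = ["1, 1", "2, 2", "3, 3"]

# Two for-clauses: for each k, the inner loop visits every (name, coords)
# pair and assigns to the same key, so only the last pair survives.
nested = {
    f"destination{k}": n + "; " + c
    for k in range(1, 3)
    for n, c in zip(names, coords)
}

print(nested)
```

Both keys end up holding the value built from the last pair, which mirrors the output in the question.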
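What you really want is a single loop in which the key and its value come from the same row. To do this, keep only the first loop and look up the elements you want in the dataframe when building each value (this relies on df1 having the default 0, 1, 2, ... index, so that k is a valid row label):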
d1 = {
f"destination{k}": (df1.loc[k, 'name'] + "; " + df1.loc[k, 'coordinates'])
for k in range(1, len(df1)-1)
}
Check out this FullStack Python guide's section on comprehensions for more info.
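If you would rather not rely on the index labels at all, one option is to pair the sliced values directly and number them with enumerate. This is only a sketch: the plain lists below are shortened stand-ins for the two columns, and with the dataframe you would pass df1['name'].iloc[1:-1] and df1['coordinates'].iloc[1:-1] to zip instead.

```python
# Shortened stand-ins for the 'name' and 'coordinates' columns
names = ["HARDIE'S MOBILE HOME PARK", "CRESTVIEW RV PARK",
         "HOMESTEAD TRAILER PARK", "EVANS TRAILER PARK"]
coordinates = ["30.44126118, -86.6240656099999",
               "30.7190163500001, -86.5716222299999",
               "30.5115772500001, -86.4628417499999",
               "30.4427800300001, -86.62941091"]

# One loop: skip the first and last rows, and number the rest from 1
d1 = {
    f"destination{k}": f"{name}; {coords}"
    for k, (name, coords) in enumerate(zip(names[1:-1], coordinates[1:-1]), start=1)
}

print(d1)
```

Because the key counter and the row values advance together, each key gets its own row.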
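Alternatively (and preferably), let pandas do the work:
d1 = pd.Series(
    # .to_numpy() drops the old index; passing the Series itself would be reindexed to all NaN
    (df1['name'] + '; ' + df1['coordinates']).to_numpy(),
    index='destination' + df1.index.astype(str),
).iloc[1:-1]  # keep rows 1 to 5, matching range(1, len(df1)-1)
If at this point you really want a dictionary, you can convert the series with d1 = d1.to_dict().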