I'm new to the site, and I've been trying to understand the difference between how ":" and "=" are used.
I know that "=" is the assignment operator, as in a = 12, so if b is any DataFrame, c could be created with c = b.copy(). But I was looking at the following code and didn't understand why it uses ":" instead of "=" to assign the values of the pandas columns. I know it follows a dictionary structure, but I read that the values in a dictionary can't be repeated when it's first created. So in this example of creating a df:
import numpy as np
import pandas as pd

n = 100  # Number of records
age = np.random.randint(18, 25, n)
gender = np.random.choice(['M', 'F'], n)
math_score = np.random.randint(0, 100, n)
english_score = np.random.randint(0, 100, n)
physics_score = np.random.randint(0, 100, n)
# Compute the average score
average_score = (math_score + english_score + physics_score) / 3
# Create the DataFrame: each dictionary key becomes a column name
data = pd.DataFrame({
    'Age': age,
    'Gender': gender,
    'Math Score': math_score,
    'English Score': english_score,
    'Physics Score': physics_score,
    'Average Score': average_score
})
I can't fully reconcile this structure with what I read (HubSpot), namely that the values should be unique. If every value is randomized, it's very likely that at least one value is repeated across 100 draws. At the same time, this DataFrame construction, with its dictionary-like structure, feels familiar. But since I'm relatively new to programming, it confuses me...
Anyway, my question also comes from inspecting the kwargs of Seaborn's boxplot, where I found that customizing how the median and mean are displayed looks something like this (this is what I have so far and it works fine, but I'm not comfortable making something work without knowing what I did...):
sns.boxplot(x="Age", data=df, orient="h", color="darkblue",
            medianprops={"color": "red"}, boxprops={"facecolor": (0, 0, 1, .7)},
            showmeans=True,
            meanprops={"marker": "o", "markerfacecolor": "black", "markeredgecolor": "black", "markersize": "6"})
Thanks in advance. I know this is pretty long, and I'm quite sure it could be written in fewer lines... working on that! =)
In both cases, when changing ":" to "=", like:
data = pd.DataFrame({
'Age'= age,
'Gender'= gender,
'Math Score'= math_score,
'English Score'= english_score,
'Physics Score'= physics_score,
'Average Score'= average_score
})
It underlines "Age", "Gender", ..., "Average Score" and reports:
for "Age" -> SyntaxError: cannot assign to literal here. Maybe you meant "==" instead of "="?
for "Gender" and the rest -> Expected parameter name
"=" is used to assign a value to a variable. ":" is used to separate a key from its value when constructing a dictionary. The key is not a variable, so it doesn't use "=". The key is an index into the dictionary. Variables and dictionary keys are similar but distinct concepts (in fact, you can obtain a dictionary containing your variables, but I don't want to confuse you).
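To tie this back to the boxplot call in the question, here is a minimal sketch of the three roles side by side. The function show_props is hypothetical and exists only for illustration; it is not part of Seaborn.

```python
# "=" binds a name to a value (assignment).
age = [21, 19, 21]

# ":" pairs a key with a value inside a dictionary literal.
columns = {"Age": age}


# Hypothetical function, for illustration only (not a Seaborn API).
def show_props(medianprops=None):
    return medianprops


# Inside a call, "=" passes a keyword argument; the value handed over
# can itself be a dictionary written with ":".
result = show_props(medianprops={"color": "red"})

print(columns)
print(result)
```

So in medianprops={"color": "red"}, the "=" names the argument and the ":" builds the dictionary that is passed to it.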
A dictionary key is what you pass (here, as a string) to [] to get its value. For example, if you have a dictionary:
d = { 'Age': 33, 'Gender': 'male' }
you would print the age with print(d['Age']). (Here d is a variable that holds a dictionary.) If Age were a variable, you'd have:
Age = 33
and print it with print(Age). Note that variable names are not in quotes, while string keys written as literals are. Because a key is just a value, you can also store it in a variable and look it up that way:
key='Age'
print(d[key])
Output:
33
This kind of lookup doesn't work for a variable name*. Keys also let you access values programmatically, like:
dict_list = [ f'{k}: {v}' for k,v in d.items()]
print('\n'.join(dict_list))
The output of this is:
Age: 33
Gender: male
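On the uniqueness worry in the question: the rule applies to keys, not values. A small sketch with made-up numbers (the names scores, ages and repeated are only for illustration):

```python
# The same value under two different keys is fine.
scores = {"Math Score": 70, "English Score": 70}

# A value can also be a list that contains repeats, which is what
# each DataFrame column in the question amounts to.
ages = {"Age": [20, 20, 21]}

# Writing the same key twice is not an error: the later value replaces
# the earlier one, so only one "Age" entry survives.
repeated = {"Age": 20, "Age": 21}

print(scores)
print(ages)
print(repeated)
```

This is why six distinct column names with randomized, possibly repeated contents cause no conflict.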
* I'm simplifying things for newcomers. There are ways to do this, but I'm glossing over them as advanced and often inadvisable topics.