
What's the difference between ":" and "=" in Python, and when should I use each?


I'm new to the site, and I've been trying to understand the difference between how ":" and "=" are used.

I know that "=" is the assignment operator, as in a = 12, or b = some DataFrame, or c = b.copy(). But I was looking at the following code and didn't understand why it uses ":" instead of "=" to assign the values of the pandas columns. I know it follows a dictionary structure, but I read that in a dictionary the values can't be repeated when it's first created. So in this example of a DataFrame creation:

import numpy as np
import pandas as pd

n = 100  # Number of records
age = np.random.randint(18, 25, n)
gender = np.random.choice(['M', 'F'], n)
math_score = np.random.randint(0, 100, n)
english_score = np.random.randint(0, 100, n)
physics_score = np.random.randint(0, 100, n)
# Compute the average score per record
average_score = (math_score + english_score + physics_score) / 3
# Create the DataFrame from a dictionary of column name -> column values
data = pd.DataFrame({
    'Age': age,
    'Gender': gender,
    'Math Score': math_score,
    'English Score': english_score,
    'Physics Score': physics_score,
    'Average Score': average_score
})

I can't reconcile this structure with what I read (on HubSpot), that the values should be unique: if every value is generated randomly, it's very likely that in 100 rows at least one value repeats. This DataFrame construction, with its dictionary-like structure, does look familiar to me, but since I'm relatively new to programming, it confuses me.

My doubt also comes from inspecting the kwargs of Seaborn's boxplot: I found that customizing how the median and mean are drawn looks something like this (this is what I have so far and it works fine, but I'm not comfortable making something work without knowing what I did):

sns.boxplot(x="Age", data=df, orient="h", color="darkblue", medianprops={"color": "red"}, boxprops={"facecolor": (0, 0, 1, .7)},
            showmeans=True, meanprops={"marker": "o", "markerfacecolor": "black", "markeredgecolor": "black", "markersize": "6"})

Thanks in advance. I know this is pretty long and I'm pretty sure it could be written in fewer lines... working on that! =)

In both cases, when I change ":" to "=", like this:

data = pd.DataFrame({
'Age'= age,
'Gender'= gender,
'Math Score'= math_score,
'English Score'= english_score,
'Physics Score'= physics_score,
'Average Score'= average_score
})

The editor underlines "Age", "Gender", ..., "Average Score" and reports:

for "Age": SyntaxError: cannot assign to literal here. Maybe you meant '==' instead of '='?
for "Gender" and the rest: Expected parameter name


Solution

  • = assigns a value to a variable. : separates a key from its value when you build a dictionary. The key is not a variable, so it doesn't use =; it is an index into the dictionary. Variables and dictionary keys are similar but distinct concepts (in fact, you can obtain a dictionary containing your variables, but I don't want to confuse you).

    A dictionary key is what you pass to [] to get its value; here the keys are strings. For example, if you have a dictionary:

    d = { 'Age': 33, 'Gender': 'male' }
    

    you would print the age with print(d['Age']). (Here d is a variable that holds a dictionary.) If Age were a variable, you'd have:

    Age = 33
    

    and print it with print(Age). Note that variable names are not in quotes, while string key literals are. Because a key is just a value, you can store it in a variable and look it up with that:

    key='Age'
    print(d[key])
    

    Output:

    33
    

    You can't look up a variable by a name stored in a string like this*. With a dictionary, it lets you access values programmatically, for example:

    dict_list = [ f'{k}: {v}' for k,v in d.items()]
    print('\n'.join(dict_list))
    

    The output of this is:

    Age: 33
    Gender: male
    

    * I'm simplifying things for newbies. There are ways to do this, but I'm glossing over them as advanced and often inadvisable.
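
To tie this back to the code in the question, here is a minimal sketch using only built-in Python (no pandas or seaborn). The sample numbers are invented, and summarize is a hypothetical stand-in for a plotting call such as sns.boxplot, not a real library function. It shows three things: dictionary keys must be unique but values may repeat freely, = inside a function call is a keyword argument rather than an assignment, and a dictionary passed as an argument still uses : inside its braces.

```python
# "=" binds a name to a value (assignment).
age = [18, 21, 21, 24]      # invented sample data; repeated values are fine
score = [70, 85, 85, 90]

# ":" pairs a key with its value inside a dict literal.
# Keys are unique within a dictionary; values may repeat freely.
columns = {"Age": age, "Score": score}

# Repeating a key is legal syntax, but the last value silently wins.
repeated = {"Age": 1, "Age": 2}

# "=" inside a call is a keyword argument, not an assignment.
# dict(...) accepts keyword arguments, so this builds an equal dictionary,
# but only for keys that are valid identifiers (a key like 'Math Score'
# with a space in it cannot be written this way).
same_columns = dict(Age=age, Score=score)


def summarize(data, color="black", medianprops=None):
    """Hypothetical stand-in for a plotting call such as sns.boxplot."""
    medianprops = medianprops or {}
    median_color = medianprops.get("color")
    return f"{len(data)} columns, color={color}, median color={median_color}"


# Parameter names use "=", and the dictionary passed as a value uses ":".
summary = summarize(data=columns, color="darkblue", medianprops={"color": "red"})

print(repeated)                 # {'Age': 2}
print(columns == same_columns)  # True
print(summary)                  # 2 columns, color=darkblue, median color=red
```

This is also why 'Age'= age fails inside the braces: Python reads it as an attempt to assign to the string literal 'Age', which is not a name that can hold a value.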