python pandas jupyter typeerror data-analysis

Extra and unnecessary data appears in my table after fixing the TypeError "cannot convert the series to <class 'float'>"


The first thing I noticed came up when obtaining the total number of students. When I run this cell of my code,

# Get the total number of students.
student_count = school_data_complete_df.count()
student_count

I get the following results as expected: enter image description here

The sample output shows the same results, so this looks correct. However, when I put the values into a summary table, I get the following results for the Total Students column:

enter image description here

This is what the correct sample output is supposed to be:

enter image description here

I am noticing similar anomalies in later sections when running my passing percentages for math and reading students. To start off, I am determining the passing grades for math and reading assessment tests:

passing_math = school_data_complete_df["math_score"] >= 70
passing_reading = school_data_complete_df["reading_score"] >= 70

The output I am getting is slightly different from the expected output:

enter image description here

I noticed this just now

Here is the correct output:

enter image description here

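The rest of my code runs normally:

# Get all the students that are passing math in a new DataFrame.
passing_math = school_data_complete_df[school_data_complete_df["math_score"] >= 70]

# Get all the students that are passing reading in a new DataFrame.
passing_reading = school_data_complete_df[school_data_complete_df["reading_score"] >= 70]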

# Calculate the number of students passing math.
passing_math_count = passing_math["student_name"].count()

# Calculate the number of students passing reading.
passing_reading_count = passing_reading["student_name"].count()

print(passing_math_count)
print(passing_reading_count)

Until I reached this error:

# Calculate the percent that passed math.
passing_math_percentage = passing_math_count / float(student_count) * 100

# Calculate the percent that passed reading.
passing_reading_percentage = passing_reading_count / float(student_count) * 100

The information after this cell of code says the following:

enter image description here

However, when I tried to run the code, I got "TypeError: cannot convert the series to <class 'float'>". I worked around this by editing my cell of code to look like this:

# Calculate the percent that passed math.
passing_math_percentage = passing_math_count / student_count.astype("float") * 100

# Calculate the percent that passed reading.
passing_reading_percentage = passing_reading_count / student_count.astype("float") * 100

Now I no longer get an error, but this is what my table looks like after creating the district summary DataFrame:

# Adding a list of values with keys to create a new DataFrame.
district_summary_df = pd.DataFrame(
    [{"Total Schools": school_count,
      "Total Students": student_count,
      "Total Budget": total_budget,
      "Average Math Score": average_math_score,
      "Average Reading Score": average_reading_score,
      "% Passing Math": passing_math_percentage,
      "% Passing Reading": passing_reading_percentage,
      "% Overall Passing": overall_passing_percentage}])
district_summary_df

enter image description here

I only want the percentage values to appear on my table and one number to appear in the column of total students. So the correct sampled output looks like this:

enter image description here



Solution

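    1. student_count should be a single number, not a Series. Calling count() on the whole DataFrame returns one count per column, so count a single column instead.
    2. For the passing count, use sum on the boolean mask to count only the rows that are True. Calling count on the mask counts all rows (both True and False).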
    # Count the students using a single column so the result is one number, not a Series
    student_count = school_data_complete_df["Student ID"].count()
    
    # Get the passing math and reading masks (boolean Series)
    passing_math = school_data_complete_df["math_score"] >= 70
    passing_reading = school_data_complete_df["reading_score"] >= 70
    
    # Calculate the number of students passing each subject (True counts as 1)
    passing_math_count = passing_math.sum()
    passing_reading_count = passing_reading.sum()
    
    # Calculate the passing percentages as single numbers
    passing_math_percentage = passing_math_count / student_count * 100
    passing_reading_percentage = passing_reading_count / student_count * 100
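
To see the difference in isolation, here is a minimal sketch on a made-up four-row frame. The variable `df` and all of its values are hypothetical stand-ins for `school_data_complete_df`, and the names `per_column_counts`, `mask_count` and `summary_df` are only for illustration. It shows that counting the whole DataFrame gives a Series (which `float()` cannot convert), that counting one column gives a single number, and that `sum` and `count` differ on a boolean mask.

```python
import pandas as pd

# Hypothetical stand-in for school_data_complete_df (made-up values).
df = pd.DataFrame({
    "Student ID": [0, 1, 2, 3],
    "student_name": ["A", "B", "C", "D"],
    "math_score": [65, 70, 88, 92],
    "reading_score": [71, 69, 90, 50],
})

# DataFrame.count() returns one count per column, i.e. a Series.
per_column_counts = df.count()

# float() cannot turn a multi-element Series into one number.
try:
    float(per_column_counts)
except TypeError as err:
    print("TypeError:", err)

# Counting a single column returns one integer.
student_count = df["Student ID"].count()

# Boolean mask: True where the student passed math.
passing_math = df["math_score"] >= 70

# count() counts every non-null entry (True and False alike);
# sum() counts only the True entries.
mask_count = passing_math.count()
passing_math_count = passing_math.sum()

passing_math_percentage = passing_math_count / student_count * 100

# With single numbers, each cell of the one-row summary holds one value.
summary_df = pd.DataFrame([{
    "Total Students": student_count,
    "% Passing Math": passing_math_percentage,
}])
print(summary_df)
```

Using `.astype("float")` on the Series only changes its dtype. The result is still one value per column, which is why every column's count ended up inside a single table cell.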