Creating scatterplot / regression line using python


I am stuck on this problem and cannot figure out why my code isn't working as intended. I have a text file of x and y coordinates, and I need the mean of the x values and the mean of the y values in order to calculate the slope of my regression line. Stamping the individual coordinates seems to work, but appending each x or y value to my lists apparently does not, because the error I get is "ZeroDivisionError: division by zero".

Here's my code:

import turtle
t = turtle.Turtle()
wn = turtle.Screen()
turtle.setworldcoordinates(-100, -100, 100, 100)
wn.bgcolor('lightblue')
t.pencolor('red')
filename = open('data.txt', 'r')

def plotregression():
    sum_of_x = []
    mean_of_x = sum(sum_of_x) / len(sum_of_x)  #doesnt work as intended
    sum_of_y = []
    mean_of_y = sum(sum_of_y) / len(sum_of_x)   #doesnt work as intended
    #slope =
    for line in filename:
        values = line.split()
        sum_of_x = sum_of_x.append(values[1])
        sum_of_y = sum_of_y.append(values[1])
        t.up()
        t.goto(int(values[0]), int(values[1]))
        t.down()
        t.stamp()
        t.down()

plotregression()
filename.close()
wn.exitonclick()

I really appreciate any input.


Solution

  • I tried out your code. The "division by zero" error occurs because the mean values are calculated immediately after the "sum_of_x" and "sum_of_y" lists are defined. At that point the lists are still empty, so both the numerator and the denominator are zero. As a test, I moved the calculation of the mean values to after the data is read from the file and appended to the lists, as shown in the following code snippet.

    def plotregression():
        sum_of_x = []  # x values read so far
        sum_of_y = []  # y values read so far
        #slope =
        for line in filename:
            values = line.split()
            # append() changes the list in place, so the result is not reassigned
            sum_of_x.append(int(values[0]))
            sum_of_y.append(int(values[1]))
            # running means; the lists are non-empty here, so no division by zero
            mean_of_x = sum(sum_of_x) / len(sum_of_x)
            mean_of_y = sum(sum_of_y) / len(sum_of_y)
            print('mean_of_x ', mean_of_x, 'mean_of_y ', mean_of_y)
            t.up()
            t.goto(int(values[0]), int(values[1]))
            t.down()
            t.stamp()
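
    For reference, the two problems in the original code can be reproduced in isolation. This is only a small sketch that does not use turtle; the names "xs", "caught" and "result" are made up for the illustration.

```python
# An empty list has sum 0 and length 0, so the mean would be 0 / 0.
xs = []
caught = False
try:
    mean = sum(xs) / len(xs)
except ZeroDivisionError:
    caught = True  # this is the error reported in the question

# list.append() modifies the list in place and returns None, so writing
# `xs = xs.append(3)` would replace the list with None.
result = xs.append(3)
```

    This is why the snippet above calls append() without assigning its result, and computes the means only once the lists contain data.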
    

    I put some made-up data points into a file named "data.txt" just to see whether the program would run, and it did. The resulting image is not very impressive, but the program did produce output.

    [Image: sample window]
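
    If the goal is the slope of the regression line, one possible next step is to compute the means once, after all points have been read, and then apply the standard least-squares formula slope = Σ(x - mean_x)(y - mean_y) / Σ(x - mean_x)². The sketch below assumes that this is the formula you are after and that each line of the file holds one "x y" pair; the helper names "read_points" and "fit_line" and the sample data are made up for the illustration.

```python
def read_points(lines):
    """Parse an iterable of 'x y' text lines into a list of (x, y) tuples."""
    points = []
    for line in lines:
        values = line.split()
        if len(values) < 2:
            continue  # skip blank or incomplete lines
        points.append((float(values[0]), float(values[1])))
    return points


def fit_line(points):
    """Return (slope, intercept) of the least-squares line through points."""
    if not points:
        raise ValueError("no data points")
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    # Means are computed once, after all of the data has been collected.
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up sample data lying exactly on the line y = 2x + 1.
sample_lines = ["0 1", "1 3", "2 5", "3 7"]
slope, intercept = fit_line(read_points(sample_lines))
```

    With an open file you could pass the file object itself to read_points, since iterating over it yields lines. To draw the line, one option is to move the turtle with the pen up to the point (-100, slope * -100 + intercept) and then, with the pen down, to (100, slope * 100 + intercept), which matches the world coordinates set in your script.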

    Hope that helps you out.

    Regards.