R: do multiple linear regressions in a data table


I have a data frame of voting data, loaded from the CSV shown below. I need to know how many votes come in per day on average, for each year, by running a linear regression of votesneeded ~ daysuntilelection. The slope would be the average number of votes coming in per day.

How can I run a linear regression function over this dataframe by year?

date,year,daysuntilelection,votesneeded
2018-01-25,2018,9,40
2018-01-29,2018,5,13
2018-01-30,2018,4,-11
2018-02-03,2018,0,-28
2019-01-23,2019,17,81
2019-02-01,2019,8,-4
2019-02-09,2019,0,-44
2020-01-17,2020,22,119
2020-01-24,2020,15,58
2020-01-30,2020,9,12
2020-02-03,2020,5,-4
2020-02-07,2020,1,-12
2021-01-08,2021,29,120
2021-01-26,2021,11,35
2021-01-29,2021,8,17
2021-02-01,2021,5,-2
2021-02-03,2021,3,-8
2021-02-06,2021,0,-10

The preferred output would be a data frame looking something like this:

year     averagevotesperday
2018       8.27
2019       7.40
2020       6.55
2021       4.60

note: full data sets and analyses are at https://github.com/robhanssen/glenlake-elections, for the curious.


Solution

  • Do you need something like this?

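    library(dplyr)

    # dat: the data frame from the question
    # (the native pipe |> needs R >= 4.1; use %>% on older versions)
    dat |>
        group_by(year) |>
        summarize(
            # slope of the per-year fit = average votes per day
            avgvote_day = coef(lm(votesneeded ~ daysuntilelection))[["daysuntilelection"]]
        )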
    

    The output differs slightly from yours:

    # A tibble: 4 x 2
       year avgvote_day
      <int>       <dbl>
    1  2018        7.76
    2  2019        7.40
    3  2020        6.41
    4  2021        4.74
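
For anyone who wants to reproduce this without dplyr, here is a minimal base-R sketch of the same idea. It assumes the data frame is called `dat` (as in the answer above) and rebuilds it from the CSV rows posted in the question; the names `slopes` and `result` are made up for this example. It splits the data by year, fits one `lm()` per year, and keeps only the slope.

```r
# Rebuild the question's data from the posted CSV rows.
dat <- read.csv(text = "date,year,daysuntilelection,votesneeded
2018-01-25,2018,9,40
2018-01-29,2018,5,13
2018-01-30,2018,4,-11
2018-02-03,2018,0,-28
2019-01-23,2019,17,81
2019-02-01,2019,8,-4
2019-02-09,2019,0,-44
2020-01-17,2020,22,119
2020-01-24,2020,15,58
2020-01-30,2020,9,12
2020-02-03,2020,5,-4
2020-02-07,2020,1,-12
2021-01-08,2021,29,120
2021-01-26,2021,11,35
2021-01-29,2021,8,17
2021-02-01,2021,5,-2
2021-02-03,2021,3,-8
2021-02-06,2021,0,-10
")

# Fit one regression per year and extract the slope by name.
slopes <- sapply(
    split(dat, dat$year),
    function(d) coef(lm(votesneeded ~ daysuntilelection, data = d))[["daysuntilelection"]]
)

# Collect the slopes into a data frame shaped like the requested output.
result <- data.frame(
    year = as.integer(names(slopes)),
    averagevotesperday = round(unname(slopes), 2)
)
print(result)
```

On the rows posted in the question this should give the same slopes as the dplyr version (7.76, 7.40, 6.41, 4.74). Extracting the coefficient by name rather than by position is just a defensive choice in case the formula changes later.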