Statsample-glm gem IndexError: Specified vector y does not exist


I am trying to fit a Poisson regression to some school performance data, and statsample-glm seems like the best gem for it so far.

While working through the practice analysis from this post, I get this error:

irb(main):001:0> require 'daru'
  require 'statsample-glm'
=> false
=> false
irb(main):003:0> data_set = Daru::DataFrame.from_csv "logistic_mle.csv"
=> #<Daru::DataFrame(200x4)>
                    a          b          c          y
          0 0.75171213 -3.2683591 1.70092606          0
          1 0.55421406 -2.9565972 2.66368360          0
          2 -1.8533164 -2.8293733 3.34679611          0
          3 -2.8861015 -0.7389824 4.74970154          0
          4 -2.6055309 0.56102031 5.48308397          0
          5 -4.2735321 1.62383436 5.35813425          0
          6 -4.7701259 1.22025583 6.41070111          0
          7 -6.9231483 2.86547174 8.73185919          0
          8 -7.5641950 4.94028695 8.94193466          0
          9 -8.6309366 4.27420502 9.27002100          0
         10 -8.9911114 5.10389362 11.7669513          0
         11 -9.9905763 7.87484596 12.4794035          0
         12 -10.381878 8.84300238 13.7498993          0
         13 -11.047682 9.44613324 13.5025027          0
         14 -12.434424 9.70515870 15.1221173          0
         15 -13.627294 10.4190343 16.3289942          0
         16 -15.620222 11.3788332 17.7367653          0
         17 -16.292239 13.1516565 18.6939344          0
         18 -16.715913 14.9076297 18.0246863          0
         19 -17.950125 15.8533651 20.6826094          0
         20 -18.989884 15.4331557 20.9101142          0
         21 -19.908508 16.8542366 22.0721145          0
         22 -21.146652 18.6785324 23.4977598          0
         23 -21.367574 18.3208056 23.9121114          0
         24 -22.131396 20.7616214 24.1683442          0
         25 -23.163631 21.1293492 25.2695476          0
         26 -24.136076 21.7035705 27.9161820          0
         27 -25.386072 23.3588003 27.8755285          0
         28 -27.254627 24.9201403 28.9810564          0
         29 -28.845061 25.1681854 29.6749936          0
        ...        ...        ...        ...        ...
irb(main):004:0> glm = Statsample::GLM.compute data_set, :y, :logistic, {constant: 1, algorithm: :mle} 
Traceback (most recent call last):
        1: from (irb):4
IndexError (Specified vector y does not exist)

Further inspection of the error reveals this:

Caused by:
IndexError: Specified index :y does not exist

Based on a comment in this marginally related Stack Overflow post, I tried changing the header format from "string" to "date", but the error did not change.

Any thoughts from the SO community?


Solution

  • Sorry SO, I posted this too quickly. I found a solution that works:

    Instead of

    data_set = Daru::DataFrame.from_csv "logistic_mle.csv"
    

    This line works:

    data_set = Daru::DataFrame.from_csv("logistic_mle.csv", headers: true, header_converters: :symbol)
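A likely explanation for why this helps: without `header_converters: :symbol`, the CSV headers are read as strings (`"y"`), while the call to `Statsample::GLM.compute` looks up the symbol `:y`. The sketch below uses only Ruby's standard `csv` library, not daru itself, to show the difference between the two header types. The inline CSV text is a made-up stand-in for `logistic_mle.csv`, and the assumption that daru passes these options through to the CSV parser is inferred from the fix above rather than confirmed here.

```ruby
require 'csv'

# Hypothetical two-row stand-in for logistic_mle.csv (same column names).
csv_text = <<~DATA
  a,b,c,y
  0.75171213,-3.2683591,1.70092606,0
  0.55421406,-2.9565972,2.66368360,0
DATA

# Default behaviour: header names stay as strings.
string_headers = CSV.parse(csv_text, headers: true).headers

# With the :symbol header converter: header names become symbols.
symbol_headers = CSV.parse(csv_text, headers: true,
                                     header_converters: :symbol).headers

puts string_headers.inspect
puts symbol_headers.inspect

# A lookup by :y only succeeds against the symbol headers.
puts string_headers.include?(:y)
puts symbol_headers.include?(:y)
```

If the same mismatch is suspected in a daru data frame, printing `data_set.vectors.to_a` should show whether the vector names are strings or symbols before they are passed to `Statsample::GLM.compute`.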