Ruby: What's an elegant way to pick a random line from a text file?


I've seen some really beautiful examples of Ruby and I'm trying to shift my thinking to be able to produce them instead of just admire them. Here's the best I could come up with for picking a random line out of a file:

def pick_random_line
  random_line = nil
  File.open("data.txt") do |file|
    file_lines = file.readlines()
    random_line = file_lines[Random.rand(0...file_lines.size())]
  end

  random_line
end
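For comparison, the read-everything approach above can be shortened with File.readlines and Array#sample. This is only a sketch: it still loads the whole file into memory, so it trims the code but not the memory use. The method name pick_random_line_in_memory and its path parameter are illustrative, not from the question.

```ruby
# Reads every line into an array, then lets Array#sample pick one uniformly.
# Memory use is still proportional to the file size, like the version above.
# Returns nil for an empty file (Array#sample on [] returns nil).
def pick_random_line_in_memory(path = "data.txt")
  File.readlines(path).sample
end
```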

I feel like it's gotta be possible to do this in a shorter, more elegant way without storing the entire file's contents in memory. Is there?


Solution

  • You can do it without storing anything except the most recently read line and the current candidate for the returned random line. (This technique is known as reservoir sampling, here with a reservoir of size one.)

    def pick_random_line
      chosen_line = nil
      # number is 0-based, so line number+1 replaces the current pick with probability 1/(number+1)
      File.foreach("data.txt").each_with_index do |line, number|
        chosen_line = line if rand < 1.0/(number+1)
      end
      chosen_line
    end
    

    So the first line is chosen with probability 1/1 = 1; the second line is chosen with probability 1/2, so half the time it keeps the first one and half the time it switches to the second.

    Then the third line is chosen with probability 1/3: one third of the time it picks that line, and the other 2/3 of the time it keeps whichever of the first two it had picked. Since each of those had a 50% chance of being chosen as of line 2, each ends up with a 1/2 × 2/3 = 1/3 chance of being chosen as of line 3.
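The same arithmetic can be checked exactly with Ruby's Rational. A line i survives to the end of an n-line file only if it is picked at line i (probability 1/i) and then not replaced at any later line k (probability 1 - 1/k each). The helper name final_probability is hypothetical; it is a sketch of the calculation, not part of the answer's code.

```ruby
# Exact probability that line i (1-based) is the one left in chosen_line
# after processing an n-line file: picked at step i, then kept at every later step.
def final_probability(i, n)
  picked = Rational(1, i)
  kept = (i + 1..n).reduce(Rational(1)) { |acc, k| acc * (1 - Rational(1, k)) }
  picked * kept
end

# For a 3-line file, every line ends up with probability 1/3.
(1..3).map { |i| final_probability(i, 3) } # => [(1/3), (1/3), (1/3)]
```

Changing n to any other line count gives 1/n for every i, which is the general claim made next.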

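    And so on. If every line from 1 to N-1 has a 1/(N-1) chance of being the current pick after line N-1, then at line N each of them is kept with probability (N-1)/N, giving 1/(N-1) × (N-1)/N = 1/N, and line N itself is picked with probability 1/N. So at line N every line from 1 to N has an even 1/N chance of being chosen, and that holds all the way through the file (as long as the file isn't so huge that floating-point precision in the rand comparison starts to matter :)). And you only make one pass through the file and never store more than two lines at once.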
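If the floating-point caveat is a concern, one variation (not the answer's original code) is to draw an integer instead: rand(k) with an Integer k returns an Integer in 0...k, which is zero with probability exactly 1/k. The sketch below also accepts any enumerable of lines and an optional Random instance; the method name pick_random_item and the rng: keyword are illustrative assumptions.

```ruby
# Single-pass reservoir sampling (reservoir of size 1) over any Enumerable.
# Uses an integer draw instead of a float comparison: rng.rand(index + 1)
# returns an Integer in 0..index, so .zero? is true with probability 1/(index + 1).
def pick_random_item(items, rng: Random.new)
  chosen = nil
  items.each_with_index do |item, index|
    chosen = item if rng.rand(index + 1).zero?
  end
  chosen
end

# Works on File.foreach("data.txt") as well as on an in-memory array;
# this call returns one of "a", "b", "c".
pick_random_item(%w[a b c], rng: Random.new(42))
```

Passing a seeded Random makes runs reproducible, which is handy in tests.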

    EDIT: You're not going to get a really concise solution with this algorithm, but you can turn it into a one-liner if you want to:

    def pick_random_line
      File.foreach("data.txt").each_with_index.reduce(nil) { |picked, (line, number)|
        rand < 1.0/(number + 1) ? line : picked }
    end
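The one-liner can also take the file name and a random-number generator as arguments, which makes it easier to reuse and, with a seeded Random, reproducible. The method name pick_random_line_from and both parameters are illustrative additions, not part of the original answer; the selection logic is the same float comparison.

```ruby
# Fold over [line, index] pairs, replacing the current pick with
# probability 1/(index + 1). path and rng are illustrative parameters.
def pick_random_line_from(path = "data.txt", rng: Random.new)
  File.foreach(path).each_with_index.reduce(nil) do |picked, (line, number)|
    rng.rand < 1.0 / (number + 1) ? line : picked
  end
end
```

Calling it twice with Random.new(123) on the same file returns the same line both times; leaving rng at its default gives a fresh choice on each call.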