Say there is a file too big to be put to memory. How can I get a random line from it? Thanks.
Update: I want to the probabilities of getting each line to be equal.
Here's a solution. Take a look at the choose() method which does the real thing (the main() method repeatedly exercises choose(), to show that the distribution is indeed quite uniform).
The idea is simple: when you read the first line it has a 100% chance of being chosen as the result. When you read the 2nd line it has a 50% chance of replacing the first line as the result. When you read the 3rd line it has a 33% chance of becoming the result. The fourth line has a 25%, and so on....
import java.io.*;
import java.util.*;
public class B {
public static void main(String[] args) throws FileNotFoundException {
Map<String,Integer> map = new HashMap<String,Integer>();
for(int i = 0; i < 1000; ++i)
{
String s = choose(new File("g:/temp/a.txt"));
if(!map.containsKey(s))
map.put(s, 0);
map.put(s, map.get(s) + 1);
}
System.out.println(map);
}
public static String choose(File f) throws FileNotFoundException
{
String result = null;
Random rand = new Random();
int n = 0;
for(Scanner sc = new Scanner(f); sc.hasNext(); )
{
++n;
String line = sc.nextLine();
if(rand.nextInt(n) == 0)
result = line;
}
return result;
}
}