
Find most common randomly assigned string in ArrayList


I am working on a simulator in which Person objects (stored in an ArrayList) "reproduce" and have babies, and the babies inherit "genes" represented as 4-letter strings. At program start, the genes of the first people are randomly generated.

At every tick of the timer, I want to calculate what the most common "gene" among all the Person objects is.

Each of the four positions draws its letter from its own set:
1. G, Z, N, F
2. A, T, C, G
3. B, F, Q, N
4. A, C, T, E

That gives 4 × 4 × 4 × 4 = 256 possible combinations, and there has to be a more efficient way to count them than 256 if-else statements.
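Because every position has exactly four possible letters, each gene can be read as a 4-digit base-4 number, so the 256 combinations map onto array indices 0..255 and one pass with an int[256] counter replaces the if-else chain. This is a minimal sketch, assuming genes always follow the four letter sets above; GeneCounter, indexOf, geneAt and mostCommon are hypothetical names, and it takes a List<String> of gene strings rather than Person objects so it stays self-contained.

```java
import java.util.Arrays;
import java.util.List;

public class GeneCounter {
    // Letter sets for positions 1..4, taken from the question (hypothetical constant name).
    private static final String[] POSITIONS = {"GZNF", "ATCG", "BFQN", "ACTE"};

    // Maps a 4-letter gene to 0..255 by treating the letter's index
    // within its position's set as one base-4 digit.
    static int indexOf(String gene) {
        int index = 0;
        for (int i = 0; i < 4; i++) {
            int digit = POSITIONS[i].indexOf(gene.charAt(i));
            if (digit < 0) {
                throw new IllegalArgumentException("Unexpected letter in gene: " + gene);
            }
            index = index * 4 + digit;
        }
        return index;
    }

    // Inverse of indexOf: rebuilds the gene string from an index in 0..255.
    static String geneAt(int index) {
        char[] letters = new char[4];
        for (int i = 3; i >= 0; i--) {
            letters[i] = POSITIONS[i].charAt(index % 4);
            index /= 4;
        }
        return new String(letters);
    }

    // Counts all genes in one pass and returns the most frequent one.
    // On ties the gene with the lowest index wins; an empty list returns null.
    static String mostCommon(List<String> genes) {
        if (genes.isEmpty()) {
            return null;
        }
        int[] counts = new int[256];
        for (String gene : genes) {
            counts[indexOf(gene)]++;
        }
        int best = 0;
        for (int i = 1; i < counts.length; i++) {
            if (counts[i] > counts[best]) {
                best = i;
            }
        }
        return geneAt(best);
    }

    public static void main(String[] args) {
        List<String> genes = Arrays.asList("ZABA", "GTFC", "ZABA", "NCQT");
        System.out.println(mostCommon(genes)); // prints ZABA
    }
}
```

The fixed-size array avoids hashing and boxing, which may matter if this runs on every timer tick; a HashMap keyed by the gene string would work just as well for 256 keys.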

The Person class (minus get/set methods):

import java.util.Random;

public class Person {
    static Random rand = new Random();
    private Person mother;
    private Person father;
    private String genes;
    private char sex;
    private int age, numKids;

    public Person() {
        mother = null;
        father = null;
        genes = createGenes();
        if (rand.nextDouble() <= 0.5)
            sex = 'm';
        else
            sex = 'f';
        age = 18;
        numKids = 0;
    }

    public Person(Person m, Person f) {
        mother = m;
        father = f;
        genes = inheritGenes(m, f);
        if (rand.nextDouble() <= 0.5)
            sex = 'm';
        else
            sex = 'f';
        age = 0;
    }
    // Create genes for the original Persons
    private String createGenes() {
        String genetics = "";

        double first = rand.nextDouble();
        double second = rand.nextDouble();
        double third = rand.nextDouble();
        double fourth = rand.nextDouble();

        if (first <= 0.25)
            genetics += "G";
        else if (first <= 0.68)
            genetics += "Z";
        else if (first <= 0.9)
            genetics += "N";
        else
            genetics += "F";

        if (second <= 0.65)
            genetics += "A";
        else if (second <= 0.79)
            genetics += "T";
        else if (second <= 0.85)
            genetics += "C";
        else
            genetics += "G";

        if (third <= 0.64)
            genetics += "B";
        else if (third <= 0.95)
            genetics += "F";
        else if (third <= 0.98)
            genetics += "Q";
        else
            genetics += "N";

        if (fourth <= 0.37)
            genetics += "A";
        else if (fourth <= 0.58)
            genetics += "C";
        else if (fourth <= 0.63)
            genetics += "T";
        else
            genetics += "E";
        return genetics;

    }
    // Inherit genes from the parents for new Persons
    public String inheritGenes(Person m, Person f) {
        String genetics = "";
        double first = rand.nextDouble();
        double second = rand.nextDouble();
        double third = rand.nextDouble();
        double fourth = rand.nextDouble();

        if (first < 0.5) {
            genetics += m.getGenes().charAt(0);
        } else
            genetics += f.getGenes().charAt(0);

        if (second < 0.5) {
            genetics += m.getGenes().charAt(1);
        } else
            genetics += f.getGenes().charAt(1);

        if (third < 0.5) {
            genetics += m.getGenes().charAt(2);
        } else
            genetics += f.getGenes().charAt(2);

        if (fourth < 0.5) {
            genetics += m.getGenes().charAt(3);
        } else
            genetics += f.getGenes().charAt(3);

        return genetics;
    }
}
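The chained if-else blocks in createGenes() encode cumulative probabilities: for the first letter, G covers r <= 0.25, Z covers 0.25 < r <= 0.68, N covers 0.68 < r <= 0.9 and F the rest, so the per-letter probabilities are 0.25, 0.43, 0.22 and 0.10. A table-driven sketch keeps those thresholds in arrays and loops over the positions instead. GeneSampler and its method signatures are hypothetical; passing the Random in as a parameter (rather than using the static field) is an assumption made so the sketch can be seeded and tested on its own.

```java
import java.util.Random;

public class GeneSampler {
    // Letter options per position, in the same order as the if-else chains.
    private static final char[][] LETTERS = {
        {'G', 'Z', 'N', 'F'},
        {'A', 'T', 'C', 'G'},
        {'B', 'F', 'Q', 'N'},
        {'A', 'C', 'T', 'E'}
    };

    // Cumulative upper bounds copied from createGenes(); the fourth letter
    // takes every draw above the third bound, so only three are stored.
    private static final double[][] THRESHOLDS = {
        {0.25, 0.68, 0.90},
        {0.65, 0.79, 0.85},
        {0.64, 0.95, 0.98},
        {0.37, 0.58, 0.63}
    };

    // Same letter probabilities as createGenes(): one draw per position,
    // advancing to the next letter while the draw exceeds the current bound.
    static String createGenes(Random rand) {
        StringBuilder genes = new StringBuilder(4);
        for (int pos = 0; pos < 4; pos++) {
            double r = rand.nextDouble();
            int choice = 0;
            while (choice < THRESHOLDS[pos].length && r > THRESHOLDS[pos][choice]) {
                choice++;
            }
            genes.append(LETTERS[pos][choice]);
        }
        return genes.toString();
    }

    // Same rule as inheritGenes(): each position comes from the mother
    // or the father with equal probability.
    static String inheritGenes(String mother, String father, Random rand) {
        StringBuilder genes = new StringBuilder(4);
        for (int pos = 0; pos < 4; pos++) {
            genes.append(rand.nextDouble() < 0.5 ? mother.charAt(pos) : father.charAt(pos));
        }
        return genes.toString();
    }

    public static void main(String[] args) {
        Random rand = new Random(42);
        String mother = createGenes(rand);
        String father = createGenes(rand);
        System.out.println(mother + " + " + father + " -> " + inheritGenes(mother, father, rand));
    }
}
```

Changing a probability then means editing one number in THRESHOLDS instead of finding the right branch.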

Solution

  • Here is sample code that finds the most common gene in a List<Person>. I only added a getter for the genes String:

    String getGenes() {
        return genes;
    }
    

    The code is as follows:

    import java.util.*;
    import java.util.function.Function;
    import java.util.stream.Collectors;

    // inside a method, e.g. main():
    List<Person> people = new ArrayList<>();
    
    for (int i = 0; i < 100; i++) {
        people.add(new Person()); // 100 people with random genes
    }
    
    String mostCommonGene = people.stream()
                    .map(Person::getGenes)
                    .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
                    .entrySet()
                    .stream()
                    .max(Comparator.comparingLong(Map.Entry::getValue))
                    .get()
                    .getKey();
    
    System.out.println("Most common gene: " + mostCommonGene);
    

    We use Java 8 Streams:

    • we get the stream() of the people list.
    • we map() (transform) every Person to a String: their genes.
    • we collect() the stream of genes with groupingBy(), fed by Function.identity() (group by the gene itself) and Collectors.counting() (count each group). This produces a Map<String, Long> from each gene to its frequency; in effect, it counts how often each gene occurs in the people list.
    • then we call entrySet() on that map and stream() again, so we now have a stream of map entries. You can think of each entry as a pair holding a gene and its frequency in one object, which is convenient.
    • we call max() to find the entry with the highest value (the frequency). Comparator.comparingLong() tells max() how to compare the entries, but entries are not longs, so we pass Map.Entry::getValue to tell it how to turn each entry into a long: take its value.
    • then we call get(), since max() returns an Optional<T> and we just want the T (the entry). Note that get() throws a NoSuchElementException if the people list is empty, because there is no maximum in that case.
    • lastly, we call getKey() on that entry. As mentioned above, the key is the gene and the value is its frequency, so the key is the most common gene. If several genes are tied for the highest count, you get one of them, and which one is not guaranteed.
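The same pipeline can be written with named intermediate values, which makes each step easier to inspect. This is a sketch: MostCommonGene and mostCommon are hypothetical names, it takes gene strings rather than Person objects to stay self-contained, it uses the JDK's Map.Entry.comparingByValue() (which orders entries by their Long values, the same as comparingLong(Map.Entry::getValue) here), and it returns an Optional instead of calling get(), so an empty list does not throw.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;
import java.util.stream.Collectors;

public class MostCommonGene {
    // Returns the most frequent gene, or Optional.empty() for an empty list.
    static Optional<String> mostCommon(List<String> genes) {
        // Step 1: count occurrences of each gene -> Map<gene, frequency>.
        Map<String, Long> frequencies = genes.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

        // Step 2: pick the entry with the largest frequency. With ties,
        // which entry wins is not guaranteed.
        Optional<Map.Entry<String, Long>> best = frequencies.entrySet().stream()
                .max(Map.Entry.comparingByValue());

        // Step 3: keep only the key (the gene), still wrapped in the Optional.
        return best.map(Map.Entry::getKey);
    }

    public static void main(String[] args) {
        List<String> genes = Arrays.asList("ZABA", "GTFC", "ZABA", "NCQT");
        System.out.println(mostCommon(genes).orElse("no people")); // prints ZABA
        System.out.println(mostCommon(Collections.<String>emptyList()).orElse("no people")); // prints no people
    }
}
```

In the simulator, the input would be something like people.stream().map(Person::getGenes).collect(Collectors.toList()), or the first two stream steps could be inlined as in the answer.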

    If most of the concepts in this answer are unfamiliar, I highly suggest learning about Java 8 Streams. Once you get used to them, it's hard to stop.