
Returning occurrences of a character in a string


I've got a task where I'm to discover how many times 'A', 'C', 'G' and 'T' occur in a string, and return it in the following format:

A:count C:count G:count T:count

I'm very new to Java, having started learning it 3 days ago. I've come across literature describing a HashMap as the most viable and efficient means of storing and retrieving this data, so I opted for this method. I've managed to create the conditionals and store the data, but I'm struggling to present the data in the format shown above.

Apologies in advance for some offensive code. What I have so far is:

import java.util.HashMap;

public class DNA {
     static void characterCount(String dna) {
        HashMap<Character, Integer> charCountMap = new HashMap<Character, Integer>();
        char[] dnaArray = dna.toCharArray();
        charCountMap.put('C', 0);
        charCountMap.put('A', 0);
        charCountMap.put('G', 0);
        charCountMap.put('T', 0);
        for (char q : dnaArray) {

            if (q == 'A' || q == 'C' || q == 'G' || q == 'T') {
                charCountMap.put(q, charCountMap.get(q) + 1);
            } else {
                continue;
            } 
        }
        System.out.println(charCountMap);
    }

    public static void main(String[] args) {
        characterCount("ACTGSRSSDSGGGHHTYTCCCFDT");
    }
}

I would appreciate any input, advice or signposting to relevant resources for further learning.

Thank you very much for your time!


Solution

    tl;dr

    Generate the output with String.format. Each %d is a placeholder for an integer value; here the arguments are the Long objects stored in the map, which %d accepts directly.

    String.format( 
        "C:%d A:%d G:%d T:%d" , 
        map.get( "C" ) , map.get( "A" ) , map.get( "G" ) , map.get( "T" ) 
    )
    

    Details

    Streams make easier work of this.

    Here is a modified version of code from this article.

    Calling split("") returns an array of String objects, one per character of the input.

    We convert those to uppercase. You could omit this step if you know your inputs to already be in uppercase.

    We then filter out any strings that are not one of our desired letters: C, A, G, or T.

    Lastly, we use a Collector to count each letter and store the results in a new TreeMap, which keeps its keys sorted.

    // Needs java.util.Arrays, java.util.Map, java.util.TreeMap and java.util.stream.Collectors.
    String input = "ACTGSRSSDSGGGHHTYTCCCFDT" ;
    Map < String , Long > map =
        Arrays
        .stream( 
            input.split("") 
        )
        .map( String :: toUpperCase )
        .filter( s -> "CAGT".contains( s ) )
        .collect(
            Collectors.groupingBy( 
                s -> s , TreeMap :: new , Collectors.counting()
            )
        )
    ;  
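
One thing to watch with the map built above: groupingBy only creates entries for letters that actually occur in the input. For a letter that never appears, map.get returns null, and %d then prints the text "null" rather than 0. A minimal sketch of one way around that, assuming you want missing letters reported as zero, is to read the map with getOrDefault. The class name MissingLetterDemo and the helper methods count and format are hypothetical names used only for illustration.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class MissingLetterDemo {
    // Same pipeline as above, wrapped in a hypothetical helper method.
    static Map<String, Long> count(String input) {
        return Arrays.stream(input.split(""))
            .map(String::toUpperCase)
            .filter(s -> "CAGT".contains(s))
            .collect(Collectors.groupingBy(s -> s, TreeMap::new, Collectors.counting()));
    }

    // getOrDefault substitutes 0 for any letter that has no entry in the map.
    static String format(Map<String, Long> map) {
        return String.format("C:%d A:%d G:%d T:%d",
            map.getOrDefault("C", 0L), map.getOrDefault("A", 0L),
            map.getOrDefault("G", 0L), map.getOrDefault("T", 0L));
    }

    public static void main(String[] args) {
        Map<String, Long> map = count("CCA"); // this input has no G or T
        System.out.println(map);              // only C and A have entries
        System.out.println(format(map));      // missing letters are shown as 0
    }
}
```

For the input used in this answer every letter occurs at least once, so plain map.get gives the same result there.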
    

    Generate output.

    String output = String.format( "C:%d A:%d G:%d T:%d" , map.get( "C" ) , map.get( "A" ) , map.get( "G" ) , map.get( "T" ) ) ;
    System.out.println( output ) ;
    

    See this code run live at Ideone.com.

    C:4 A:1 G:4 T:4
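
If you would rather keep the HashMap<Character, Integer> from the question, the same String.format idea applies. The following is only a sketch under that assumption, not the one right way to do it: the class name DnaCount is hypothetical, the method returns the text instead of printing it, and the letters are listed in the A, C, G, T order the question asks for.

```java
import java.util.HashMap;
import java.util.Map;

class DnaCount {
    // Counts A, C, G and T in dna and returns the text "A:n C:n G:n T:n".
    static String characterCount(String dna) {
        Map<Character, Integer> counts = new HashMap<>();
        for (char base : "ACGT".toCharArray()) {
            counts.put(base, 0); // start every letter at zero
        }
        for (char q : dna.toCharArray()) {
            // computeIfPresent skips characters that are not keys in the map
            counts.computeIfPresent(q, (key, n) -> n + 1);
        }
        return String.format("A:%d C:%d G:%d T:%d",
            counts.get('A'), counts.get('C'), counts.get('G'), counts.get('T'));
    }

    public static void main(String[] args) {
        System.out.println(characterCount("ACTGSRSSDSGGGHHTYTCCCFDT"));
    }
}
```

Because all four keys are put into the map up front, counts.get never returns null here, and the order of the output is fixed by the format string rather than by the map's iteration order.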