
Split text into k-shingles in Java without Scanner


I am trying to split a text into k-shingles, but unfortunately I cannot use Scanner. If the last shingle is too short, I want to pad it with "_". This is what I have so far:

import java.util.ArrayList;

public class Projektarbeit {

    public static void main(String[] args) {
        testKShingling(7, "ddssggeezzfff");
    }

    public static void testKShingling(int k, String source) {
        // First remove all whitespace; a final shingle that is too short is padded with "_" further down
        String txt = source.replaceAll("\\s", "");

        //get shingles
        ArrayList<String> shingles = new ArrayList<String>();
        int i;
        int l = txt.length();
        String shingle = "";

        if (k == 1) {
            for(i = 0; i < l; i++){
                shingle = txt.substring(i, i + k);
                shingles.add(shingle);
            }
        }
        else {
            for(i = 0; i < l; i += k - 1){
                try {
                    shingle = txt.substring(i, i + k);
                    shingles.add(shingle);
                }
                catch(Exception e) {
                    txt = txt.concat("_");
                    i -= k - 1;
                }
            }
        }
        System.out.println(shingles);
    }
}

Output: [ddssgge, eezzfff, f______]

It almost works, but with the parameters in the example the last shingle is unnecessary: the output should be [ddssgge, eezzfff].

Any idea how to do this more elegantly?


Solution

  • To make the posted code produce the expected output, you only need to add break at the end of the catch block:

    catch(Exception e) {
        txt = txt.concat("_");
        i -= k - 1;
        break;
    }
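
One caveat with the break: it ends the loop at the first shingle that does not fit, so the "_" that was just appended is never used. That gives the expected result for the example, but a trailing shingle that really needs padding is lost. The following is a minimal sketch of that behaviour, not the asker's exact code: the class and method names are hypothetical, the list is returned instead of printed, and only k > 1 is handled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical wrapper around the question's loop with the added break (k > 1 only).
final class BreakVariantDemo {

    static List<String> shinglesWithBreak(int k, String source) {
        String txt = source.replaceAll("\\s", "");
        List<String> shingles = new ArrayList<>();
        for (int i = 0; i < txt.length(); i += k - 1) {
            try {
                shingles.add(txt.substring(i, i + k));
            } catch (IndexOutOfBoundsException e) {
                // Stops at the first shingle that does not fit, without padding it.
                break;
            }
        }
        return shingles;
    }

    public static void main(String[] args) {
        System.out.println(shinglesWithBreak(7, "ddssggeezzfff"));   // 13 characters
        System.out.println(shinglesWithBreak(7, "ddssggeezzfffab")); // 15 characters
    }
}
```

Both calls print [ddssgge, eezzfff]. For the 15-character input the trailing "ab" ends up in no shingle, where a third shingle fab____ would be expected.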
    

    That said, I wouldn't use an exception to control program flow; exceptions should be reserved for genuine runtime errors. Avoid the StringIndexOutOfBoundsException by controlling the loop bounds instead:

    public static void main(String[] args) {
        testKShingling(3, "ddssggeezzfff");
    }
    
    public static void testKShingling(int substringLength, String source) {
    
        //todo validate input
        String txt = source.replaceAll("\\s", "");
        //get shingles
        ArrayList<String> shingles = new ArrayList<>();
        int stringLength = txt.length();
    
        if (substringLength == 1) {
            for(int index = 0; index < stringLength; index++){
                String shingle = txt.substring(index, index + substringLength);
                shingles.add(shingle);
            }
        }
        else {
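            // Step by substringLength - 1 so that neighbouring shingles share one character;
            // stop before the last index, which would only repeat that shared character.
            for(int index = 0; index < stringLength - 1; index += substringLength - 1){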
                int endIndex = Math.min(index + substringLength, stringLength);
                String shingle = txt.substring(index, endIndex);
                if(shingle.length() < substringLength){
                    shingle = extend(shingle, substringLength);
                }
                shingles.add(shingle);
            }
        }
        System.out.println(shingles);
    }
    
    // Pads shingle with "_" on the right until it is toLength characters long.
    private static String extend(String shingle, int toLength) {
        String s = shingle;
        for(int index = 0; index < toLength - shingle.length(); index++){
            s = s.concat("_");
        }
        return s;
    }
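
The "todo validate input" comment above is left open. As a rough sketch of one way to fill it in, the variant below rejects k < 1 and a null source, and returns the list instead of printing it so the result can be checked. The class name ShingleSketch, the method name kShingles and the choice of IllegalArgumentException are assumptions for illustration, and folding the k == 1 case into the same loop is a design choice of this sketch, not part of the code above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; the name and the method signature are assumptions.
final class ShingleSketch {

    // Returns overlapping k-shingles (neighbours share one character),
    // padding the last one with '_' when it is too short.
    static List<String> kShingles(int k, String source) {
        if (k < 1) {
            throw new IllegalArgumentException("k must be at least 1");
        }
        if (source == null) {
            throw new IllegalArgumentException("source must not be null");
        }
        String txt = source.replaceAll("\\s", "");
        List<String> shingles = new ArrayList<>();
        // A step of 0 would never advance, so k == 1 uses a step of 1.
        int step = Math.max(1, k - 1);
        // For k > 1 the last index only repeats the shared character, so stop before it.
        int limit = (k == 1) ? txt.length() : txt.length() - 1;
        for (int index = 0; index < limit; index += step) {
            int endIndex = Math.min(index + k, txt.length());
            StringBuilder shingle = new StringBuilder(txt.substring(index, endIndex));
            while (shingle.length() < k) {
                shingle.append('_');
            }
            shingles.add(shingle.toString());
        }
        return shingles;
    }

    public static void main(String[] args) {
        System.out.println(kShingles(7, "ddssggeezzfff"));   // [ddssgge, eezzfff]
        System.out.println(kShingles(7, "ddssggeezzfffab")); // [ddssgge, eezzfff, fab____]
    }
}
```

As in the loop above, a text that shrinks to a single character yields no shingles when k > 1.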
    

    An alternative implementation of testKShingling:

    public static void testKShingling(int substringLength, String source) {
    
        //todo validate input
        String txt = source.replaceAll("\\s", "");
        ArrayList<String> shingles = new ArrayList<>();
    
        if (substringLength == 1) {
            for(char c : txt.toCharArray()){
                shingles.add(Character.toString(c));
            }
        }
        else {
            while(txt.length() > substringLength) {
                String shingle = txt.substring(0, substringLength); 
                shingles.add(shingle);
                txt = txt.substring(substringLength - 1); //remove first substringLength - 1 chars 
            }
    
            if(txt.length() < substringLength){  //check the length of what's left 
                txt = extend(txt, substringLength); 
            }
            shingles.add(txt); //add what's left 
        }
        System.out.println(shingles);
    }
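
One edge case worth checking in this while-loop variant: if the text is empty after whitespace removal, the final shingles.add(txt) still runs, so the result contains a single shingle made only of "_". A single character is padded to one shingle here, whereas the index-based version returns an empty list for it. Below is a minimal sketch of a guarded variant, assuming that empty text should produce no shingles (the question does not say). The class and method names are hypothetical, and the sketch covers k >= 2 only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class and method names, chosen for illustration only.
final class WhileShingleSketch {

    static List<String> kShingles(int k, String source) {
        if (k < 2) {
            throw new IllegalArgumentException("this sketch covers k >= 2 only");
        }
        String txt = source.replaceAll("\\s", "");
        List<String> shingles = new ArrayList<>();
        // Assumption: empty text means there is nothing to shingle.
        if (txt.isEmpty()) {
            return shingles;
        }
        while (txt.length() > k) {
            shingles.add(txt.substring(0, k));
            txt = txt.substring(k - 1); // drop k - 1 characters, keeping one as overlap
        }
        // Pad what is left (at most k characters) up to length k.
        StringBuilder last = new StringBuilder(txt);
        while (last.length() < k) {
            last.append('_');
        }
        shingles.add(last.toString());
        return shingles;
    }

    public static void main(String[] args) {
        System.out.println(kShingles(7, "ddssggeezzfff")); // [ddssgge, eezzfff]
        System.out.println(kShingles(3, "   "));           // []
        System.out.println(kShingles(3, "d"));             // [d__]
    }
}
```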