Tags: java, regex, camelcasing

Underscore to camel case except for certain prefixes


I am writing a Java program to rewrite some outdated Java classes in our software. Part of the conversion is renaming variables from underscore style to camelCase. The problem is that I cannot simply replace every underscore in the code: we have some classes with constants, and for those the underscores should remain.
How can I replace instances like string_label with stringLabel, but leave alone any underscores that occur after the prefix "Parameters."?

I am currently using the following which obviously does not handle excluding certain prefixes:

public String stripUnderscores(String line) { 
  Pattern p = Pattern.compile("_(.)");
  Matcher m = p.matcher(line);         
  StringBuffer sb = new StringBuffer(); 
  while(m.find()) { 
    m.appendReplacement(sb, m.group(1).toUpperCase()); 
  } 
  m.appendTail(sb); 
  return sb.toString(); 
}

Solution

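  • You could try something like:

    Pattern.compile("(?<!(?:class\\s{1,10}Parameters.{1,200}|Parameters\\.\\w{1,200}))_(.)")
    

    which uses a negative lookbehind. The inner group is non-capturing so that m.group(1) still refers to the character after the underscore, and the quantifiers are bounded because older Java versions reject an unbounded + or * inside a lookbehind.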
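    As a rough illustration of the lookbehind idea, here is a minimal sketch that plugs a simplified pattern into the method from the question. The class name LookbehindDemo and the {1,100} bound are assumptions made for this example, and it only covers the "Parameters." prefix, not the class declaration case.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookbehindDemo {
    // Skip an underscore when "Parameters." plus some word characters sits right before it.
    // The {1,100} bound is an assumption that keeps the lookbehind finite in length.
    private static final Pattern UNDERSCORE =
            Pattern.compile("(?<!Parameters\\.\\w{1,100})_(.)");

    public static String stripUnderscores(String line) {
        Matcher m = UNDERSCORE.matcher(line);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // quoteReplacement guards against '$' or '\' in the captured character.
            m.appendReplacement(sb, Matcher.quoteReplacement(m.group(1).toUpperCase()));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints: stringLabel = Parameters.MAX_SIZE;
        System.out.println(stripUnderscores("string_label = Parameters.MAX_SIZE;"));
    }
}
```

    Note that this sketch would also skip something like otherParameters.foo_bar, since the lookbehind does not check what comes before "Parameters".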

    You would probably be better served using some kind of refactoring tool that understood scoping semantics.

    If all you check for is a qualified name like Parameters.is_module_installed then you will replace

    class Parameters {
        static boolean is_module_installed;
    }
    

    by mistake. And there are more corner cases like this. (import static Parameters.*;, etc., etc.)

    Using regular expressions alone seems troublesome to me. One way you can make the routine smarter is to use regex just to capture an expression of identifiers and then you can examine it separately:

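    import java.util.Arrays;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
    
    // Qualifiers whose members must keep their underscores.
    static List<String> exclude = Arrays.asList("Parameters");
    
    static String getReplacement(String in) {
        // Leave the whole expression alone if it starts with an excluded qualifier.
        for(String ex : exclude) {
            if(in.startsWith(ex + "."))
                return in;
        }
    
        StringBuffer b = new StringBuffer();
        Matcher m = Pattern.compile("_(.)").matcher(in);
        while(m.find()) {
            // quoteReplacement keeps a '$' from being read as a group reference.
            m.appendReplacement(b, Matcher.quoteReplacement(m.group(1).toUpperCase()));
        }
    
        m.appendTail(b);
        return b.toString();
    }
    
    static String stripUnderscores(String line) {
        // One identifier, optionally followed by more identifiers joined with dots.
        // The lookbehind stops a match from starting inside a word or a number like 1_000.
        Pattern p = Pattern.compile("(?<![\\w$])[A-Za-z_$][\\w$]*(?:\\.[A-Za-z_$][\\w$]*)*");
        Matcher m = p.matcher(line);
        StringBuffer sb = new StringBuffer();
        while(m.find()) {
            m.appendReplacement(sb, Matcher.quoteReplacement(getReplacement(m.group())));
        }
        m.appendTail(sb);
        return sb.toString();
    }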
    

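    But that will still fail for an unqualified declaration, e.g. class Parameters { static boolean is_module_installed; }, because the field name there is not preceded by "Parameters.".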
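    One possible way to cover the declaration case is to track brace depth while reading the file line by line and leave everything inside an excluded class body untouched. The following is only a sketch under several assumptions: the class and method names (ExcludedClassScanner, convert, camel) are made up for this example, braces inside string literals or comments are not handled, and the plain "_(.)" replacement is used outside the excluded class, so qualified references would still need the expression-level check.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExcludedClassScanner {
    // Hypothetical list of classes whose bodies are left as they are.
    static final List<String> EXCLUDE = Arrays.asList("Parameters");
    static final Pattern CLASS_DECL = Pattern.compile("\\bclass\\s+(\\w+)");
    static final Pattern UNDERSCORE = Pattern.compile("_(.)");

    static String camel(String line) {
        Matcher m = UNDERSCORE.matcher(line);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(sb, Matcher.quoteReplacement(m.group(1).toUpperCase()));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static List<String> convert(List<String> lines) {
        List<String> out = new ArrayList<>();
        int depth = 0;       // current brace nesting
        int skipDepth = -1;  // depth at which an excluded class was declared, or -1
        for (String line : lines) {
            Matcher decl = CLASS_DECL.matcher(line);
            if (skipDepth < 0 && decl.find() && EXCLUDE.contains(decl.group(1))) {
                skipDepth = depth;
            }
            boolean skip = skipDepth >= 0;
            out.add(skip ? line : camel(line));
            for (char c : line.toCharArray()) {
                if (c == '{') depth++;
                else if (c == '}') depth--;
            }
            // The excluded body ends once a closing brace brings us back to its starting depth.
            if (skip && depth <= skipDepth && line.indexOf('}') >= 0) {
                skipDepth = -1;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList(
                "class Parameters {",
                "    static boolean is_module_installed;",
                "}",
                "class Other {",
                "    int string_label;",
                "}");
        // Only the field in Other is renamed (to stringLabel).
        convert(source).forEach(System.out::println);
    }
}
```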

    It could be made more robust by further breaking down each expression:

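    static String getReplacement(String in) {
        if(in.contains(".")) {
            StringBuilder result = new StringBuilder();
    
            // A limit of -1 keeps trailing empty parts, so a trailing dot is not lost.
            String[] parts = in.split("\\.", -1);
    
            for(int i = 0; i < parts.length; ++i) {
                if(i > 0) {
                    result.append(".");
                }
    
                String part = parts[i];
    
                // Convert a part unless the part right before it is an excluded qualifier.
                if(i == 0 || !exclude.contains(parts[i - 1])) {
                    part = getReplacement(part);
                }
    
                result.append(part);
            }
    
            return result.toString();
        }
    
        // Single identifier: drop each underscore and upper-case the next character.
        StringBuffer b = new StringBuffer();
        Matcher m = Pattern.compile("_(.)").matcher(in);
        while(m.find()) {
            // quoteReplacement keeps a '$' from being read as a group reference.
            m.appendReplacement(b, Matcher.quoteReplacement(m.group(1).toUpperCase()));
        }
    
        m.appendTail(b);
        return b.toString();
    }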
    

    That would handle a situation like

    Parameters.a_b.Parameters.a_b.c_d
    

    and output

    Parameters.a_b.Parameters.a_b.cD
    

    That's not realistic Java, but I hope you see what I mean. Doing a little parsing yourself goes a long way.
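    For anyone who wants to try the approach end to end, here is a self-contained sketch that combines the pieces with a small main method. The class name CamelCaseRewriter and the sample input are assumptions for this example, and the identifier pattern is a simplification (ASCII identifiers only, and text inside string literals or comments is converted as well).

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CamelCaseRewriter {
    // Qualifiers whose direct members keep their underscores.
    static final List<String> EXCLUDE = Arrays.asList("Parameters");

    // A dotted chain of identifiers that does not start inside a word or number.
    static final Pattern EXPRESSION = Pattern.compile(
            "(?<![\\w$])[A-Za-z_$][\\w$]*(?:\\.[A-Za-z_$][\\w$]*)*");
    static final Pattern UNDERSCORE = Pattern.compile("_(.)");

    // Drops each underscore and upper-cases the character that follows it.
    static String camel(String identifier) {
        Matcher m = UNDERSCORE.matcher(identifier);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(sb, Matcher.quoteReplacement(m.group(1).toUpperCase()));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    // Converts every part of a dotted expression except those right after an excluded qualifier.
    static String getReplacement(String expression) {
        String[] parts = expression.split("\\.", -1);
        StringBuilder result = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                result.append('.');
            }
            boolean keep = i > 0 && EXCLUDE.contains(parts[i - 1]);
            result.append(keep ? parts[i] : camel(parts[i]));
        }
        return result.toString();
    }

    public static String stripUnderscores(String line) {
        Matcher m = EXPRESSION.matcher(line);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(sb, Matcher.quoteReplacement(getReplacement(m.group())));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints: int stringLabel = Parameters.MAX_SIZE + other.itemCount;
        System.out.println(stripUnderscores(
                "int string_label = Parameters.MAX_SIZE + other.item_count;"));
    }
}
```

    This still does not know about scoping, so declarations inside class Parameters and static imports remain unhandled here.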