javascript regex uppercase lowercase braille

Regex for complex uppercase-lowercase scenarios


I'm working on an app that adapts text to braille specifications, and it has some tricky rules for handling uppercase letters. I'd like some help. The rules are:

  1. Before a single uppercase letter, add ":"

:This is an :Example

  2. Before multiple uppercase letters and all-caps words, add another ":"

:This is ::ANOTHER ex::AMple, ::ALRIGHT

  3. If more than three all-caps words appear in a row, add "-" at the beginning of the sequence and delete every "::" within that sequence except the last one

:This is -::A VERY LONG SENTENCE WITH A SEQUENCE OF ALL ::CAPS to serve ::AS ::AN :Example

  4. Finally, if a word goes from uppercase to lowercase mid-word (except right after an initial capital letter), add ";"

:This is my fin:A;l ::EXAM;ple

Working with regex, I was able to handle the simple rules but not all of them.

// adds : before each run of uppercase letters
var firstChange = text.replace(/[A-Z]+/g, ':$&');

// adds a second : before runs of two or more uppercase letters
var secondChange = firstChange.replace(/[A-Z]{2,}/g, ':$&');

// adds ; where a word switches from uppercase to lowercase
var thirdChange = secondChange.replace(/\B[A-Z]+(?=[a-z])/g, '$&;');

I tried building up from simple to complex, then the other way around, then merging some rules; either way, the rules conflict. I'm new to regex and could use any insight on how to solve this. Thank you.
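To see where the rules collide, here is a minimal sketch that runs the three replacements in order on a short made-up sentence. It uses a plain `[A-Z]` character class and a closed lookahead; the wrapper name `firstAttempt` and the sample text are hypothetical, not part of the original attempt.

```javascript
// Chains the three replacements from the first attempt (hypothetical wrapper).
function firstAttempt(text) {
  return text
    .replace(/[A-Z]+/g, ':$&')               // one ':' before each uppercase run
    .replace(/[A-Z]{2,}/g, ':$&')            // second ':' before runs of 2+ capitals
    .replace(/\B[A-Z]+(?=[a-z])/g, '$&;');   // ';' where upper switches to lower
}

console.log(firstAttempt('This is A finAl EXAMple'));
// prints:            :This is :A fin:Al ::EXAM;ple
// the rules ask for: :This is ::A fin:A;l ::EXAM;ple
```

Two things go wrong here. The one-letter word "A" never gets its second colon, because `{2,}` needs at least two capitals. The "A" in "finAl" gets no ";", because the colon inserted in front of it turns that position into a word boundary, so `\B` no longer matches there.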

Edit: To make it clearer, I made a final example that combines all rules.

This is an Example. This is ANOTHER exAMple, ALRIGHT? This is A VERY LONG SENTENCE WITH A SEQUENCE OF ALL CAPS to serve AS AN Example. This is my finAl EXAMple.

Should become:

:This is an :Example. :This is ::ANOTHER ex::AM;ple, ::ALRIGHT? :This is -::A VERY LONG SENTENCE WITH A SEQUENCE OF ALL ::CAPS to serve ::AS ::AN :Example. :This is my fin:A;l ::EXAM;ple.


SOLVED: With the help of @ChrisMaurer and @SaSkY, here is the code to solve the above problem:

(edit: fixed fourth change thanks to @Sasky)

var original = document.getElementById("area1");
var another = document.getElementById("area2");

function MyFunction() {

  // add : before every run of uppercase letters
  var firstChange = original.value.replace(/[A-Z]+/g, ':$&');

  // add one more : before multiple uppercase letters and all-caps words,
  // including one-letter words such as A or I
  var secondChange = firstChange.replace(/([A-Z]{2,}|\b[A-Z]+\b)/g, ':$&');

  // add - to the beginning of a sequence of more than three all-caps words
  var thirdChange = secondChange.replace(/\B(::[A-Z]+(\s+::[A-Z]+){3,})/g, '-$&');

  // removes extra :: before words within long uppercase sequence
  var fourthChange = thirdChange.replace(/(?<=-::[A-Z]+\s(?:::[A-Z]+\s)*)::(?=[A-Z]+\s)(?![A-Z]+\s(?!::[A-Z]+\b))/g, '');

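  // add a lowercase symbol when a word changes from uppercase to lowercase mid-word;
  // the lookbehind (rather than \B) also catches a capital that follows an inserted ":", as in fin:Al
  var fifthChange = fourthChange.replace(/(?<=[A-Za-z]:*)[A-Z](?=[a-z])/g, '$&;');

  // update the output textarea
  another.value = fifthChange;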
}
<html>
<body>
<textarea id="area1"  rows="4" cols="40" onkeyup="MyFunction()">
</textarea>
<textarea id="area2" rows="4" cols="40"></textarea>
</body>
</html>
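For checking the rules outside the browser, the same chain can be wrapped in a function that does not touch the DOM. The sketch below is one way to do that; the name `markCapitals` is hypothetical. Its last step uses a lookbehind instead of `\B`, which is an assumption on my part, so that a capital following an inserted ":" (as in "fin:Al") still receives its ";".

```javascript
// Hypothetical DOM-free wrapper around the five replacements.
function markCapitals(text) {
  return text
    // 1. one ':' before every run of uppercase letters
    .replace(/[A-Z]+/g, ':$&')
    // 2. a second ':' before runs of 2+ capitals and before all-caps words (incl. "A", "I")
    .replace(/[A-Z]{2,}|\b[A-Z]+\b/g, ':$&')
    // 3. '-' before a sequence of more than three ::WORD groups
    .replace(/\B::[A-Z]+(?:\s+::[A-Z]+){3,}/g, '-$&')
    // 4. drop '::' inside such a sequence, keeping the first and the last
    .replace(/(?<=-::[A-Z]+\s(?:::[A-Z]+\s)*)::(?=[A-Z]+\s)(?![A-Z]+\s(?!::[A-Z]+\b))/g, '')
    // 5. ';' after a mid-word capital that is followed by a lowercase letter
    .replace(/(?<=[A-Za-z]:*)[A-Z](?=[a-z])/g, '$&;');
}

console.log(markCapitals('This is A VERY LONG SENTENCE IN CAPS to serve AS AN Example of my finAl EXAMple.'));
// :This is -::A VERY LONG SENTENCE IN ::CAPS to serve ::AS ::AN :Example of my fin:A;l ::EXAM;ple.
```

Keeping the logic in a pure function means the `onkeyup` handler only has to read one textarea and write the other, and the rules can be exercised with plain strings. Lookbehind assertions need a reasonably recent JavaScript engine.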


Solution

  • So I think your approach is good, and the first replace seems to get the single colons into the right place. The second one screws up on single letter words like A and I. I would fix that with an added alternation:

    /([A-Z]{2,}|\b[A-Z]+\b)/g
    

    Now you need to add two more replacements; one to add the hyphen, and the other to remove the double colons.

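    For the hyphen, you just search for three or more ::ALLCAPS groups separated by whitespace (change {2,} to {3,} if the sequence must be longer than three words, as your rule states), like this: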

    /\B(::[A-Z]+(\s+::[A-Z]+){2,})/g
    

    The \B handles caps at the very beginning of the string. I replaced the match with a hyphen followed by $1.

    To remove the double colons, I got a little trickier with a lookbehind and a lookahead:

    /(?<=::[A-Z]+\s*)::([A-Z]+)(?=\s*::[A-Z]+)/g
    

    This one is just replaced with $1. Luckily, JavaScript supports variable-length lookbehinds.

    It works when tested on Regex101.

    I did not look at your last replacement. Superficially it seemed to be OK.
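To make the three patterns in this answer easier to check, here is a small sketch that applies them one after another. It assumes the first replacement has already added one ":" before each uppercase run and uses a plain `[A-Z]` class in the alternation. The sample sentence and variable names are made up for illustration.

```javascript
// Made-up sample text that already carries one ':' per uppercase run.
const afterFirstPass = ':This is :A :VERY :LONG :SEQUENCE here';

// Two or more capitals anywhere, or a whole word made only of capitals ("A", "I").
const doubled = afterFirstPass.replace(/([A-Z]{2,}|\b[A-Z]+\b)/g, ':$&');
console.log(doubled);    // :This is ::A ::VERY ::LONG ::SEQUENCE here

// Hyphen before a run of three or more ::WORD groups; \B also matches at index 0.
const hyphenPattern = /\B(::[A-Z]+(\s+::[A-Z]+){2,})/g;
const hyphenated = doubled.replace(hyphenPattern, '-$1');
console.log(hyphenated); // :This is -::A ::VERY ::LONG ::SEQUENCE here

// Drop '::' from words that sit between two other ::WORD groups.
const trimmed = hyphenated.replace(/(?<=::[A-Z]+\s*)::([A-Z]+)(?=\s*::[A-Z]+)/g, '$1');
console.log(trimmed);    // :This is -::A VERY LONG ::SEQUENCE here
```

The first word keeps its "::" because the character before it is "-", so the lookbehind fails. The last word keeps its "::" because no further "::WORD" follows it, so the lookahead fails.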