regex sublimetext3 camelcasing sublimetext-snippet

Sublime Text 3 snippet: camel-case a field and remove special characters or convert them to words

I am updating some Sublime Text snippets to help automate some page development, but I have run into a hurdle.

In this snippet, I am trying to transform the metrics string (the main_fp_tx attribute, which is based on the alt text). Here is the current snippet:

<snippet>
    <content><![CDATA[  
    <div class="div-block">
        <!-- Set A -->
        <div class="place_10 start">
            <a href="LinkGoesHere" main_fp_tx="PLACE-_-AH-_-${1:0}_${4:year}_${6/[^a-z0-9]+//ig}">
                <img src="http://images/sv/800/set${2:##}_${3:####}_${4:year}_img${5:#}?\$P_CONTENT\$" alt="${6:ImageAltText}" />
            </a>
        </div>
    </div>]]></content>
    <!-- Optional: Tab trigger to activate the snippet -->
    <tabTrigger>IMGT</tabTrigger>
    <!-- Optional: Scope the tab trigger will be active in -->
</snippet>

Which gives you this when typing IMGT and pressing Tab:

<div class="div-block">
    <!-- Set A -->
    <div class="place_10 start">
        <a href="LinkGoesHere" main_fp_tx="PLACE-_-AH-_-0_year_ImageAltText">
            <img src="http://images/sv/800/set##_####_year_img#?$P_CONTENT$" alt="ImageAltText" />
        </a>
    </div>
</div>

Currently, when I type "This is an example of content.  Special character examples: 20% & &amp; &#37; registered trademark &reg;" into the ImageAltText field, here is what appears in the metrics string:

<div class="div-block">
    <!-- Set A -->
    <div class="place_10 start">
        <a href="LinkGoesHere" main_fp_tx="PLACE-_-AH-_-0_year_ThisisanexampleofcontentSpecialcharacterexamples20amp37registeredtrademarkreg">
            <img src="http://images/sv/800/set##_####_year_img#?$P_CONTENT$" alt="This is an example of content.  Special character examples: 20% & &amp; &#37; registered trademark &reg;" />
        </a>
    </div>
</div>

As you can see, the example currently becomes this metrics string:

"ThisisanexampleofcontentSpecialcharacterexamples20amp37registeredtrademarkreg"

Here is what I am trying to get it to do:

It needs to be camel cased, meaning the first letter of each word needs to be capitalized.
Certain symbols need to be translated to a word. For example, "%" needs to say "Percent" and "&" (ampersand) should say "And".
HTML entity names should not appear. For example, "&reg;" should not appear in the metrics at all; right now it shows up as the word "reg". So basically anything between "&" and ";" should be removed, along with the "&" and ";" themselves. Here is where it gets difficult, though: "&amp;" should also not appear, or at least it can say "And".

All that said - here is what I am trying to achieve.

Current version:
ThisisanexampleofcontentSpecialcharacterexamples20amp37registeredtrademarkreg

What I am trying to get it to look like:
ThisIsAnExampleOfContentSpecialCharacterExamples20PercentAndRegisteredTrademark
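
To make the gap between the two strings concrete, here is a minimal Python sketch of what the current substitution ${6/[^a-z0-9]+//ig} does. It is only an emulation: Sublime Text uses its own regex engine, and the function name strip_non_alphanumerics is made up for this illustration. It shows that the pattern merely deletes runs of non-alphanumeric characters, so entity names such as "amp" and "reg" survive and nothing gets capitalized.

```python
import re

def strip_non_alphanumerics(alt_text):
    # Same pattern and flags as the snippet: [^a-z0-9]+ with "i" (ignore case).
    # The "g" flag needs no equivalent because re.sub replaces every match.
    return re.sub(r"[^a-z0-9]+", "", alt_text, flags=re.IGNORECASE)

alt_text = ("This is an example of content.  Special character examples: "
            "20% & &amp; &#37; registered trademark &reg;")

print(strip_non_alphanumerics(alt_text))
# ThisisanexampleofcontentSpecialcharacterexamples20amp37registeredtrademarkreg
```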

I've tried combining functions and got it partly working, but not completely.

So I was hoping some of you fine developers might know a way to make this happen.

Solution

Borrowing from the answer here a little:

With the following snippet, when you type "This is an example of content.  Special character examples: 20% & &amp; &#37; registered trademark &reg;" into the alt attribute of the img, it will appear in the main_fp_tx attribute of the a element as: PLACE-_-AH-_-0_year_ThisIsAnExampleOfContentSpecialCharacterExamples20PercentAndRegisteredTrademark:

<snippet>
    <content><![CDATA[  
    <div class="div-block">
        <!-- Set A -->
        <div class="place_10 start">
            <a href="LinkGoesHere" main_fp_tx="PLACE-_-AH-_-${1:0}_${4:year}_${6/(&amp;)|(&[^; ]+;)|(&)|(\b\w)|(%)|(\W)/(?1And:)(?2:)(?3And:)(?4\u\4:)(?5Percent:)(?6:)/g}">
                <img src="http://images/sv/800/set${2:##}_${3:####}_${4:year}_img${5:#}?\$P_CONTENT\$" alt="${6:ImageAltText}" />
            </a>
        </div>
    </div>]]></content>
    <!-- Optional: Tab trigger to activate the snippet -->
    <tabTrigger>IMGT</tabTrigger>
    <!-- Optional: Scope the tab trigger will be active in -->
</snippet>

As mentioned in the linked answer, this relies on regex conditionals and alternations.

/(&amp;)|(&[^; ]+;)|(&)|(\b\w)|(%)|(\W)/(?1And:)(?2:)(?3And:)(?4\u\4:)(?5Percent:)(?6:)/g

(&amp;) match the ampersand HTML entity into capture group 1. (?1And:) if the capture group (1) was matched, replace it with And. If it wasn't matched, do nothing.
(&[^; ]+;) match HTML entities. (?2:) if the capture group (2) was matched, replace it with nothing. If it wasn't matched, do nothing.
(&) bare ampersand. (?3And:) if the capture group (3) was matched, replace it with And. If it wasn't matched, do nothing.
(\b\w) first letter of a word. (?4\u\4:) if the capture group (4) was matched, replace the first letter with its uppercase equivalent. If it wasn't matched, do nothing.
(%) bare percent. (?5Percent:) if the capture group (5) was matched, replace it with Percent. If it wasn't matched, do nothing.
(\W) any non-word character (anything other than a letter, digit or underscore), including whitespace. (?6:) if the capture group (6) was matched, replace it with nothing. If it wasn't matched, do nothing.
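
If it helps to see the conditionals spelled out, here is a rough Python emulation of the same substitution. Python's re module has no (?1And:) replacement syntax, so a callback checks which capture group matched instead. The names PATTERN, replace_match and to_metrics are made up for this sketch, and Python's \w is Unicode-aware, which may not match Sublime's engine exactly.

```python
import re

# Same alternation as the snippet, with the groups in the same order.
PATTERN = re.compile(r"(&amp;)|(&[^; ]+;)|(&)|(\b\w)|(%)|(\W)")

def replace_match(match):
    # Exactly one of the six groups is set for any given match.
    if match.group(1) is not None:   # (?1And:)     &amp; entity
        return "And"
    if match.group(2) is not None:   # (?2:)        any other HTML entity
        return ""
    if match.group(3) is not None:   # (?3And:)     bare ampersand
        return "And"
    if match.group(4) is not None:   # (?4\u\4:)    first letter of a word
        return match.group(4).upper()
    if match.group(5) is not None:   # (?5Percent:) bare percent sign
        return "Percent"
    return ""                        # (?6:)        remaining non-word character

def to_metrics(alt_text):
    # re.sub replaces every match, which corresponds to the /g modifier.
    return PATTERN.sub(replace_match, alt_text)

print(to_metrics("This is an example of content."))
# ThisIsAnExampleOfContent
print(to_metrics("20% &amp; registered trademark &reg;"))
# 20PercentAndRegisteredTrademark
```

Characters that match none of the alternatives, such as the letters after the first one in each word, are copied through unchanged.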

/g is the global modifier, i.e. don't just stop at the first match. I removed the i (case-insensitive) modifier as it isn't needed in my regex.

Because the alternatives are tried from left to right at each position, HTML entities are matched before bare ampersands, and any remaining non-word characters are stripped last.
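
To show why the order matters, here is a small Python sketch that runs the same alternatives in two different orders. This is an emulation rather than Sublime's own engine; the group names and the helper convert are invented for the illustration. When the bare ampersand is tried first, it consumes the "&" of every entity, and the entity name then leaks through as a capitalized word.

```python
import re

# What to emit for each alternative, keyed by a (made-up) group name.
REPLACEMENTS = {
    "amp": lambda s: "And",
    "entity": lambda s: "",
    "bare": lambda s: "And",
    "first": lambda s: s.upper(),
    "pct": lambda s: "Percent",
    "other": lambda s: "",
}

# Entities before the bare ampersand, as in the snippet.
ENTITIES_FIRST = (r"(?P<amp>&amp;)|(?P<entity>&[^; ]+;)|(?P<bare>&)"
                  r"|(?P<first>\b\w)|(?P<pct>%)|(?P<other>\W)")

# Same alternatives, but the bare ampersand is tried first.
BARE_FIRST = (r"(?P<bare>&)|(?P<amp>&amp;)|(?P<entity>&[^; ]+;)"
              r"|(?P<first>\b\w)|(?P<pct>%)|(?P<other>\W)")

def convert(pattern, text):
    # match.lastgroup is the name of the alternative that matched.
    return re.sub(pattern, lambda m: REPLACEMENTS[m.lastgroup](m.group()), text)

sample = "Tom &amp; Jerry &reg;"
print(convert(ENTITIES_FIRST, sample))  # TomAndJerry
print(convert(BARE_FIRST, sample))      # TomAndAmpJerryAndReg
```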

For more information on the replacement syntax, see http://www.boost.org/doc/libs/1_61_0/libs/regex/doc/html/boost_regex/format/boost_format_syntax.html