Unable to determine reason for inappropriate formatting

The text formatting breaks when text is copy-pasted from an MS Word document. However, when the same text is typed into the text box, the issue does not occur.

A JSF inputText tag is used to receive text containing a maximum of 140 characters. This text is then processed server-side to remove whitespace and insert HTML <br> tags when a word inside the text is too long to be wrapped by the browser.

The text is then displayed on a JSF page using an outputText tag inside an <li> element.

<ul> 
    <hx:viewFragment id="msgFragPrn" rendered="#{pc_Pagecode.enabled}"> 
        <f:verbatim><li class="datarowPrintOnly"><span class="row"></f:verbatim>
        <h:outputText value="Message"></h:outputText>
        <f:verbatim></span></f:verbatim>
        <h:outputText value=":" styleClass="printOnly"></h:outputText>
        <h:outputText escape="false" styleClass="printInline" value="#{pc_Pagecode.transObj.formattedMsg}">
        </h:outputText>
        <f:verbatim> </li></f:verbatim>
    </hx:viewFragment>
</ul>

CSS Styles used

.printInline{display:inline-block; word-wrap:break-word; padding-left:2px; }
.printOnly{padding-right:4px; width:2%; vertical-align:top;}

The above code and styles produce the following display:

Display format when text is typed into text box

Message : The text is displayed properly when typed into the text box. However, the same is not the case when the text is copy pasted from a Word doc.

Display format when text is copy-pasted from a Word document

Message :
The text is displayed properly when typed into the text box. However, the same is not the case when the text is copy pasted from a Word doc.

Since the only difference between the two text inputs could be CR/LF or other invisible characters, I tried the following:

String whitespace_chars = "" /* dummy empty string for homogeneity */
        + "\\u0009" // CHARACTER TABULATION
        + "\\u000A" // LINE FEED (LF)
        + "\\u000B" // LINE TABULATION
        + "\\u000C" // FORM FEED (FF)
        + "\\u000D" // CARRIAGE RETURN (CR)
        + "\\u0085" // NEXT LINE (NEL)
        + "\\u001C" // FILE SEPARATOR
        + "\\u001D" // GROUP SEPARATOR
        + "\\u001E" // RECORD SEPARATOR
        + "\\u001F" // UNIT SEPARATOR
        + "\\u00A0" // NO-BREAK SPACE
        + "\\u1680" // OGHAM SPACE MARK
        + "\\u180E" // MONGOLIAN VOWEL SEPARATOR
        + "\\u2000" // EN QUAD
        + "\\u2001" // EM QUAD
        + "\\u2002" // EN SPACE
        + "\\u2003" // EM SPACE
        + "\\u2004" // THREE-PER-EM SPACE
        + "\\u2005" // FOUR-PER-EM SPACE
        + "\\u2006" // SIX-PER-EM SPACE
        + "\\u2007" // FIGURE SPACE
        + "\\u2008" // PUNCTUATION SPACE
        + "\\u2009" // THIN SPACE
        + "\\u200A" // HAIR SPACE
        + "\\u2028" // LINE SEPARATOR
        + "\\u2029" // PARAGRAPH SEPARATOR
        + "\\u202F" // NARROW NO-BREAK SPACE
        + "\\u205F" // MEDIUM MATHEMATICAL SPACE
        + "\\u3000" // IDEOGRAPHIC SPACE
        ;

Using Pattern and Matcher, I tried to replace every occurrence of the above characters with " " (a blank space) via replaceAll. However, Matcher.find() could not find any occurrence of these characters in the pasted text.
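One thing that may be worth ruling out here: if the escapes listed above are concatenated and compiled as-is, the resulting regex only matches all of those characters appearing consecutively in that exact order, so find() would return false on ordinary text. The sketch below wraps them in a character class instead. The class and method names (WhitespaceNormalizer, normalize) are hypothetical, and whether this differs from how the original Pattern was built is an assumption.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original page code.
final class WhitespaceNormalizer {

    // Same code points as the list above, written as regex escapes.
    // The doubled backslash keeps the escape in the string so that the
    // regex engine (not the Java compiler) interprets it.
    private static final String WHITESPACE_CHARS =
            "\\u0009\\u000A\\u000B\\u000C\\u000D\\u0085"
            + "\\u001C\\u001D\\u001E\\u001F"
            + "\\u00A0\\u1680\\u180E"
            + "\\u2000-\\u200A"
            + "\\u2028\\u2029\\u202F\\u205F\\u3000";

    // The square brackets turn the list into "any one of these characters";
    // the trailing + collapses a run of them into a single match.
    private static final Pattern ANY_WHITESPACE =
            Pattern.compile("[" + WHITESPACE_CHARS + "]+");

    private WhitespaceNormalizer() {
    }

    // Replaces every run of the listed characters with one ordinary space.
    static String normalize(String input) {
        return ANY_WHITESPACE.matcher(input).replaceAll(" ");
    }
}
```

Calling WhitespaceNormalizer.normalize(text) before the long-word handling would rule out these particular characters; if the output still breaks, the cause is probably something outside this list.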

I then tried replacing every match of the pattern below with " ".

"\\r\\n|\\r|\\n"

The same issue was also observed when the text is a continuous string of 140 characters without spaces, or when it contains a word longer than 50 characters. In such cases I tokenized the text with [\s+] as the delimiter and inserted <br/> tags. After the break tags are inserted, the display format is as expected in both cases.

I'm still wondering why the display format breaks when the text is copy-pasted. Any pointers, please?
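A possible way to narrow this down is to log exactly which code points arrive on the server for the pasted text and compare them with the typed version. The sketch below lists every character outside printable ASCII together with its index. The class and method names (CodePointInspector, findSuspicious) are hypothetical, and treating everything outside U+0020 to U+007E as suspicious is an assumption made only for debugging; Word's autocorrected quotes and dashes, for example, would show up here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical debugging helper, not part of the original page code.
final class CodePointInspector {

    private CodePointInspector() {
    }

    // Returns one entry per character outside printable ASCII,
    // formatted as "index N: U+XXXX".
    static List<String> findSuspicious(String input) {
        List<String> found = new ArrayList<String>();
        for (int i = 0; i < input.length(); ) {
            int cp = input.codePointAt(i);
            if (cp < 0x20 || cp > 0x7E) {
                found.add(String.format("index %d: U+%04X", i, cp));
            }
            // Advance by one or two chars so surrogate pairs stay intact.
            i += Character.charCount(cp);
        }
        return found;
    }
}
```

Logging CodePointInspector.findSuspicious(text) for both the typed and the pasted input should show whether the two strings really differ, and if so, by which characters.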

Solution

The only thing that seems to work is inserting a <br/> tag into the text when the text content is longer than a certain length.
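The approach described above could look roughly like the following. This is a minimal sketch, not the code actually used: the class and method names (LongWordBreaker, breakLongWords) and the maxWordLength parameter are hypothetical, it splits on \s+ rather than [\s+], and it collapses runs of whitespace into single spaces. Because the outputText is rendered with escape="false", HTML-escaping the user input is left to the caller.

```java
// Hypothetical helper, not part of the original page code.
final class LongWordBreaker {

    private LongWordBreaker() {
    }

    // Splits the text on whitespace and inserts <br/> inside any word that
    // is longer than maxWordLength, so the browser always has a break point.
    static String breakLongWords(String text, int maxWordLength) {
        if (maxWordLength <= 0) {
            throw new IllegalArgumentException("maxWordLength must be positive");
        }
        StringBuilder out = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            if (out.length() > 0) {
                out.append(' ');
            }
            // Emit the word in chunks of at most maxWordLength characters.
            for (int start = 0; start < word.length(); start += maxWordLength) {
                if (start > 0) {
                    out.append("<br/>");
                }
                int end = Math.min(word.length(), start + maxWordLength);
                out.append(word, start, end);
            }
        }
        return out.toString();
    }
}
```

With the 50-character threshold mentioned in the question, the call would be LongWordBreaker.breakLongWords(text, 50); words of 50 characters or fewer pass through unchanged.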