Problem:
My site allows users to copy/paste content from other files and documents, such as MS Word and websites (e.g. CNN.com), into the Rich TextEditor we provide. This Rich TextEditor supports pasting content with embedded styles (and we have to support it too), which brings in random styles, tags, and inline styles from the content's origin.
E.g.: if you paste from any MS Word document, it brings in H1, H2, P, UL/OL/LI, STRONG, I, EM, TABLE, etc., each with its own styles. The same happens when you copy and paste from other web pages.
How to format? I am looking for the best way to handle the formatting of this kind of user-generated content. First, I need to keep the copied tags intact. Say an H1 was brought in by the user from MS Word: I have to keep that tag, yet style it myself according to our corporate branding.
Another problem is that when you copy/paste from an external source, some tags are not properly closed, which breaks my layout. How do we handle this?
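A rough sketch of one way to approach the unclosed-tag problem, assuming a simple stack-based pass over the markup is acceptable. The helper name balanceTags and the VOID_TAGS set are hypothetical, not from any library. It closes tags that were left open and drops closing tags that have no opener. It is not a full HTML parser: it does not handle a ">" inside an attribute value or HTML's optional-end-tag rules. A real parser (for example the browser's DOMParser) is likely more robust.

```javascript
// Elements that never take a closing tag.
const VOID_TAGS = new Set([
  "area", "base", "br", "col", "embed", "hr", "img",
  "input", "link", "meta", "source", "track", "wbr",
]);

// Hypothetical helper: closes tags left open and drops stray closing tags.
function balanceTags(html) {
  const tagPattern = /<(\/?)([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>/g;
  const open = []; // stack of currently open tag names
  let out = "";
  let last = 0;
  let m;
  while ((m = tagPattern.exec(html)) !== null) {
    out += html.slice(last, m.index); // copy text between tags unchanged
    last = tagPattern.lastIndex;
    const isClosing = m[1] === "/";
    const name = m[2].toLowerCase();
    if (!isClosing) {
      out += m[0];
      // Void and self-closed tags are not pushed on the stack.
      if (!VOID_TAGS.has(name) && !m[0].endsWith("/>")) open.push(name);
    } else {
      const idx = open.lastIndexOf(name);
      if (idx === -1) continue; // stray closing tag: drop it
      // Close anything opened after the matching opener first.
      while (open.length > idx + 1) out += "</" + open.pop() + ">";
      open.pop();
      out += m[0];
    }
  }
  out += html.slice(last);
  // Close whatever is still open at the end of the input.
  while (open.length > 0) out += "</" + open.pop() + ">";
  return out;
}

console.log(balanceTags("<div><p>Hello <b>world</p>"));
```

Running the pasted markup through such a pass before inserting it into the page should keep an unclosed tag from swallowing the rest of the layout.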
For styles, I am applying:
.article * {
  allKnownCSSProperties: myValues !important;
}
Any method would work; JavaScript or C# is preferred.
To strip out unwanted styles, a simple regex would suffice. In JavaScript:
/( style=['"][^'"]*['"])/g
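As a minimal sketch of how such a regex might be applied, here is a variant wrapped in a function. The name stripInlineStyles is hypothetical. The pattern differs slightly from the one above: it uses a backreference so the closing quote must match the opening one, because the original pattern stops early on values like style="font-family: 'Arial'". It is still a regex and not a parser, so it would also remove text that merely looks like a style attribute outside of a tag.

```javascript
// Hypothetical helper: removes inline style attributes from an HTML string.
function stripInlineStyles(html) {
  // Whitespace, then style=, then a quoted value.
  // \1 requires the same quote character that opened the value.
  const styleAttr = /\s+style\s*=\s*(["'])[\s\S]*?\1/gi;
  return html.replace(styleAttr, "");
}

const pasted =
  '<h1 style="color: red; font-family: \'Arial\'">Title</h1>' +
  "<p style='margin:0'>Body</p>";

console.log(stripInlineStyles(pasted));
// prints: <h1>Title</h1><p>Body</p>
```

The tags themselves (H1, P, and so on) are left in place, so they can then be styled by the site's own stylesheet.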