I scraped some HTML content from the internet; below is only the beginning of it:
<p style="max-width: 100%;min-height: 1em;letter-spacing: 0.544px;text-align: center;box-sizing: border-box !important;word-wrap: break-word !important;"><strong style="max-width: 100%;letter-spacing: 0.544px;font-size: 24px;box-sizing: border-box !important;word-wrap: break-word !important;"><strong style="max-width: 100%;letter-spacing: 0.544px;box-sizing: border-box !important;word-wrap: break-word !important;"><span style="max-width: 100%;color: rgb(255, 41, 65);box-sizing: border-box !important;word-wrap: break-word !important;"><strong style="max-width: 100%;letter-spacing: 0.544px;color: rgb(0, 0, 0);font-size: 18px;box-sizing: border-box !important;word-wrap: break-word !important;"><span style="max-width: 100%;font-size: 24px;letter-spacing: 0.544px;box-sizing: border-box !important;word-wrap: break-word !important;"><strong style="max-width: 100%;letter-spacing: 0.544px;box-sizing: border-box !important;word-wrap: break-word !important;"><span style="max-width: 100%;letter-spacing: 0.544px;box-sizing: border-box !important;word-wrap: break-word !important;"><strong style="max-width: 100%;box-sizing: border-box !important;word-wrap: break-word !important;"><strong style="max-width: 100%;letter-spacing: 0.544px;box-sizing: border-box !important;word-wrap: break-word !important;"><span style="max-width: 100%;letter-spacing: 0.544px;color: rgb(61, 167, 66);box-sizing: border-box !important;word-wrap: break-word !important;"><strong style="max-width: 100%;box-sizing: border-box !important;word-wrap: break-word !important;">...
I am using:
body_html = bleach.clean(markdown(value, output_format='html'), tags=['SOME_ALLOWED_TAGS'], attributes=['SOME_ALLOWED_ATTRIBUTES'], styles=['SOME_ALLOWED_STYLES'], strip=True, strip_comments=True)
but the result is not what I expected:
<pre><code> <p style="max-width: 100%;min-height: 1em;letter-spacing: 0.544px;text-align: center;box-sizing: border-box !important;word-wrap: break-word !important;"><strong style="max-width: 100%;letter-spacing: 0.544px;font-size: 24px;box-sizing: border-box !important;word-wrap: break-word !important;"><strong style="max-width: 100%;letter-spacing: 0.544px;box-sizing: border-box !important;word-wrap: break-word !important;"><span style="max-width: 100%;color: rgb(255, 41, 65);box-sizing: border-box !important;word-wrap: break-word !important;"><strong style="max-width: 100%;letter-spacing: 0.544px;color: rgb(0, 0, 0);font-size: 18px;box-sizing: border-box !important;word-wrap: break-word !important;"><span style="max-width: 100%;font-size: 24px;letter-spacing: 0.544px;box-sizing: border-box !important;word-wrap: break-word !important;"><strong style="max-width: 100%;letter-spacing: 0.544px;box-sizing: border-box !important;word-wrap: break-word !important;"><span style="max-width: 100%;letter-spacing: 0.544px;box-sizing: border-box
What is wrong with bleach.clean? Is it because there are too many tags and styles to clean, so it just added "<pre><code>" at the beginning and closed it at the end?
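Figured it out. The content to be cleaned starts with \n \n\n \n\n \n \n (blank and whitespace-only lines), and that whitespace should be removed before calling markdown(). The <pre><code> wrapper most likely comes from markdown() rather than from bleach.clean: Markdown treats text indented by four or more spaces (or a tab) as a code block, so the indented HTML was rendered as an escaped code sample instead of being passed through as raw HTML.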
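A minimal sketch of that preprocessing step, using only the standard library. The helper names `prepare_for_markdown` and `looks_like_indented_code` are hypothetical, and the sample string assumes the scraped `<p>` line was indented by five spaces (which would match the single leading space left inside the `<pre><code>` output). `textwrap.dedent()` removes the indentation common to all non-blank lines, and `strip()` then drops the leading and trailing blank lines.

```python
import textwrap


def prepare_for_markdown(value: str) -> str:
    """Hypothetical helper: remove indentation and surrounding blank lines.

    textwrap.dedent() strips the whitespace shared by every non-blank line
    (whitespace-only lines are ignored when computing it); strip() then
    removes the leading and trailing blank lines.
    """
    return textwrap.dedent(value).strip()


def looks_like_indented_code(value: str) -> bool:
    """Hypothetical rough check: is the first non-blank line indented 4+ spaces or a tab?

    Markdown renders such a line as an indented code block (<pre><code>)
    and escapes any HTML inside it.
    """
    for line in value.splitlines():
        if line.strip():
            return line.startswith("    ") or line.startswith("\t")
    return False


# Assumed sample: blank/whitespace-only lines followed by HTML indented 5 spaces.
scraped = '\n \n\n \n\n \n \n     <p style="text-align: center;"><strong>Title</strong></p>\n'

print(looks_like_indented_code(scraped))                        # True
print(looks_like_indented_code(prepare_for_markdown(scraped)))  # False
print(prepare_for_markdown(scraped))
```

The cleaned string would then go into the same pipeline as in the question, e.g. `markdown(prepare_for_markdown(value), output_format='html')` before `bleach.clean(...)`. Dedenting rather than only stripping the start is a design choice: if every line of a multi-line snippet is indented, removing whitespace from the first line alone could still leave later indented chunks to be treated as code. Note that dedenting would also shift whitespace inside any `<pre>` elements in the scraped HTML.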