HTML text to plain text in Swift


I am working on an app that has lots of messages in HTML format. I am using the function below to parse the HTML into an NSAttributedString, but it is very slow when there are lots of long messages.

    func attributedStringFromHtml(completion: @escaping (NSAttributedString?) -> Void) {
        guard let data = self.data(using: .utf8, allowLossyConversion: true) else {
            return completion(nil)
        }

        let options: [NSAttributedString.DocumentReadingOptionKey: Any] = [
            .documentType: NSAttributedString.DocumentType.html,
            .characterEncoding: String.Encoding.utf8.rawValue
        ]

        DispatchQueue.main.async {
            if let attributedString =
                try? NSAttributedString(data: data, options: options, documentAttributes: nil)
            {
                completion(attributedString)
            } else {
                completion(nil)
            }
        }
    }

Can you help me with something that converts the HTML to plain text very quickly?

Thanks!


Solution

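  • To convert HTML text to plain text in Swift quickly, try this approach: a regular expression removes the HTML tags and leaves the plain text. It does not build an attributed string at all, which is why it is very fast.

    With this approach, of course, you don't get the string attributes, and HTML entities such as &amp; are left as they are.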

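    import SwiftUI

    extension String {
        // Removes anything that looks like an HTML tag:
        // "<", one or more characters other than ">", then ">".
        func toPlainText() -> String {
            self.replacingOccurrences(of: #"<[^>]+>"#, with: "", options: .regularExpression)
        }
    }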
    
    struct ContentView: View {
        let msg = """
     <p><b>E pluribus unum</b></p><b>Instructions.</b> Latin for “Out of many one”, is a motto requested by <i>Pierre Eugene du Simitiere</i> (originally Pierre-Eugène Ducimetière) and found in 1776 on the Seal of the United States, along with Annuit cœptis and Novus ordo seclorum, and adopted by an Act of Congress in 1782.</p><p>
     """
        
        var body: some View {
            Text(msg.toPlainText())
        }
    }
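
The tag-stripping regex above leaves entities such as &amp; untouched and joins paragraphs with no separator. If that matters for your messages, a slightly extended variant might look like the sketch below. The name toPlainTextDecodingEntities and the short entity list are assumptions made for illustration, not part of the original answer. It handles only a handful of named entities and is not a full HTML parser.

```swift
import Foundation

extension String {
    /// Hypothetical helper: strips tags, turns <br> and </p> into newlines,
    /// and decodes a small, fixed set of HTML entities.
    func toPlainTextDecodingEntities() -> String {
        // Replace line-break tags and closing paragraph tags with a newline
        // so that paragraphs do not run together once the tags are gone.
        var text = replacingOccurrences(
            of: #"<br\s*/?>|</p\s*>"#,
            with: "\n",
            options: [.regularExpression, .caseInsensitive]
        )

        // Remove every remaining tag (same pattern as toPlainText()).
        text = text.replacingOccurrences(
            of: #"<[^>]+>"#,
            with: "",
            options: .regularExpression
        )

        // Decode entities only after the tags are stripped, so a decoded "<"
        // is never mistaken for markup. "&amp;" goes last so that "&amp;lt;"
        // ends up as the literal text "&lt;" rather than "<".
        let entities: [(String, String)] = [
            ("&nbsp;", " "), ("&lt;", "<"), ("&gt;", ">"),
            ("&quot;", "\""), ("&#39;", "'"), ("&amp;", "&")
        ]
        for (entity, replacement) in entities {
            text = text.replacingOccurrences(of: entity, with: replacement)
        }

        return text.trimmingCharacters(in: .whitespacesAndNewlines)
    }
}
```

For example, "<p>Tom &amp; Jerry<br/>1 &lt; 2</p>".toPlainTextDecodingEntities() gives "Tom & Jerry", a newline, then "1 < 2". The extra passes cost a little more than the single regex, but the work is still plain string replacement with no attributed-string parsing.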