I am trying to parse some not-complicated RSS html content in iphone.
So I don't need a heavy HTML parser.
I have searched here and found these two:
https://github.com/topfunky/hpple
https://github.com/zootreeves/Objective-C-HMTL-Parser
Both are simple to use. But I guess they have their problems for my purpose.
For TFHpple, it is good, but for every element, it does not have the complete HTML <> with itself. for example, element doesn't have this complete tag string. I need this complete tag string, because I need to remove it from the whole HTML string. I would be more convenient for me if element has that.
For zootreeves HTML-Parser, it is also simple and good. And it has the complete tag string with every element. I am very happy. However, it seems to be a big memory-comsumer. I monitored it. If I try to parse a big number of HTML fragments (say, 1000), the memory it will cost and stays occupied is like 40MB. It is not applicable for ios devices. zootreeves is using pure C codes and linked-list to organise the tree structures of the HTML, I guess. and it uses pure malloc and free for memory. I don't know whether that will affect ios memory.
So, anyone can recommend a state-of-art better and fast and simple HTML parser for iOs for me?
Thanks
I'd use libxml2. It's not just for xml; it has an HTML parser too. It's fast and low-memory and is available in iOS. The only drawback is that it's a C-based API, but for all that it's not terribly difficult to work with.
Update
In response to the first comment below: It's been awhile, so I'm not sure, but I don't think so. What you get is a data structure with lots of information about the document structure, and each tag has a list of attribute/value pairs. Nowhere is the original html string stored (I presume that this is considered redundant and is not done to save memory).
However, it doesn't seem like you actually need it for what you want to do. It seems to me that you are using information from the parser to modify the original string, stripping out HTML tags. What you want to do instead is to rebuild the document using information from the parse tree, and when you do this, leave out the tags you want omitted.