objective-c cocoa macos osx-snow-leopard nsxmldocument

Manipulating HTML


I need to read an HTML file and search for some tags in it. Based on the results, some tags need to be removed, others changed, and some attributes adjusted, and then the file has to be written back.

Is NSXMLDocument the way to go? I don't think a parser is really needed in this case; it could even mean more work. I don't want to touch the entire file. All I need to do is load the file into memory, change a few things, and save it again.

Note that I'll be dealing with HTML, not XHTML. Could that be a problem for NSXMLDocument? Unmatched or unclosed tags might make it stop working.


Solution

  • NSXMLDocument is the way to go, because it lets you use XPath/XQuery to find the tags you want. Malformed HTML might be a problem, but you can pass the NSXMLDocumentTidyHTML option when creating the document, and it should be fine unless the markup is really bad.
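
A minimal sketch of that approach, assuming ARC and a modern SDK (on Snow Leopard you would add manual retain/release). The helper name `CleanHTML` and the specific tags it touches (`font`, `b`, the `target` attribute) are hypothetical and only illustrate the remove/change/adjust steps from the question:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: parses possibly sloppy HTML, edits a few tags,
// and returns the serialized result (nil on failure).
static NSString *CleanHTML(NSString *html, NSError **error) {
    // NSXMLDocumentTidyHTML asks the parser to repair the markup first.
    // Tidy may report warnings through `error` even when parsing succeeds,
    // so test the returned document, not the error.
    NSXMLDocument *doc = [[NSXMLDocument alloc] initWithXMLString:html
                                                          options:NSXMLDocumentTidyHTML
                                                            error:error];
    if (doc == nil) return nil;

    // Remove: detach every <font> element (and its contents) from the tree.
    NSArray *fonts = [doc nodesForXPath:@"//font" error:error];
    if (fonts == nil) return nil;
    for (NSXMLNode *node in fonts) {
        [node detach];
    }

    // Change: rename every <b> element to <strong>.
    NSArray *bolds = [doc nodesForXPath:@"//b" error:error];
    if (bolds == nil) return nil;
    for (NSXMLNode *node in bolds) {
        [node setName:@"strong"];
    }

    // Adjust attributes: drop the target attribute from links that have one.
    NSArray *links = [doc nodesForXPath:@"//a[@target]" error:error];
    if (links == nil) return nil;
    for (NSXMLNode *node in links) {
        if ([node kind] == NSXMLElementKind) {
            [(NSXMLElement *)node removeAttributeForName:@"target"];
        }
    }

    return [doc XMLString];
}
```

To work on a file, you could read it with `+[NSString stringWithContentsOfFile:encoding:error:]`, pass the string to the helper, and write the result back with `-[NSString writeToFile:atomically:encoding:error:]`. Keep in mind that the output is the tidied, XHTML-style serialization of the whole document, so the saved file will not be byte-for-byte identical to the input outside the edited tags.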