I'm receiving XML files which may not be well-formed, in which case I need to ignore them.
I'm using SweetXml which wraps xmerl.
I have an example of badly formed XML that lacks a space between two attributes.
There is no is_well_formed function - one with a simple boolean response would be great.
Xmerl attempts to parse the file, doesn't like it, and so sends an exit.
I haven't yet learnt about supervisors, but this looks to me like a case for them.
Is there a rookie or simple way of handling that exit signal?
defmodule XmlIsWellFormed.WellFormed do
  def is_well_formed(xml) do
    import SweetXml
    xml_string = to_string xml
    result = xml_string |> parse # parse sends exit.
    # FYI - SweetXml.parse :
    # def parse(doc) do
    #   {parsed_doc, _} = :xmerl_scan.string(doc)
    #   parsed_doc
    # end
    # Note: inspecting result is no use because xmerl sends an exit with:
    # "whitespace_required_between_attributes"
    # Something like this would be handy:
    # try do
    #   result = :xmerl_scan.string(xml)
    # rescue
    #   :exit, _ -> nil
    # end
  end
end
rubbish_xml = '<rubbishml><html xmlns="http://www.w3.org/1999/xhtml" dir="ltr" lang="en-US"xmlns:og="http://ogp.me/ns#" xmlns:fb="http://www.facebook.com/2008/fbml"></rubbishml>'
XmlIsWellFormed.WellFormed.is_well_formed rubbish_xml
You used try/rescue, which only intercepts exceptions. Exits, on the other hand, can be intercepted with the try/catch construct:
def is_well_formed(xml) do
  try do
    xml |> to_string |> parse
    true
  catch
    :exit, _ -> false
  end
end
In IEx the error message is still printed to the console, but the program continues to execute:
iex> XmlIsWellFormed.WellFormed.is_well_formed ~s(<a b=""c=""/>)
3437- fatal: {whitespace_required_between_attributes}
false
iex> XmlIsWellFormed.WellFormed.is_well_formed ~s(<a b="" c=""/>)
true
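For anyone who wants to try this without SweetXml, here is a minimal sketch that calls :xmerl_scan.string/2 directly. The module name WellFormedSketch and the function well_formed?/1 are made up for illustration and are not part of SweetXml or xmerl. It assumes xmerl's quiet option, which should suppress the message printed to the console, and it converts the input to a charlist because that is what :xmerl_scan.string expects.

```elixir
defmodule WellFormedSketch do
  # Hypothetical helper, not part of SweetXml or xmerl.
  # Returns true if xmerl can scan the document, false if xmerl exits.
  def well_formed?(xml) do
    try do
      # :xmerl_scan.string/2 works on charlists, so convert first.
      # quiet: true asks xmerl not to print its error message.
      xml |> to_charlist() |> :xmerl_scan.string(quiet: true)
      true
    catch
      # xmerl signals fatal parse errors with an exit, not an exception,
      # so this must be catch :exit rather than rescue.
      :exit, _reason -> false
    end
  end
end
```

Called with ~s(<a b=""c=""/>) this should return false, and with ~s(<a b="" c=""/>) it should return true, matching the IEx session above.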
However, catching and rescuing exceptions is very uncommon in Elixir. You should rather design your application with a supervision tree, so that it knows how to respawn itself properly. Then you can just let it crash™, and the supervisor will take care of the rest.
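As a rough illustration of the "let it crash" idea, the parse can run in its own supervised task, so that a crash inside xmerl only takes down that task and not the caller. This is only a sketch under stated assumptions: IsolatedParseSketch and parse/2 are hypothetical names, the 5-second timeout is arbitrary, and a real application would normally start the Task.Supervisor from its own supervision tree rather than inline. A crash report may still be logged when the task exits.

```elixir
defmodule IsolatedParseSketch do
  # Hypothetical helper: runs the parse in a separate, unlinked task so a
  # crash in xmerl cannot take the calling process down with it.
  def parse(supervisor, xml) do
    task =
      Task.Supervisor.async_nolink(supervisor, fn ->
        {doc, _rest} = :xmerl_scan.string(to_charlist(xml), quiet: true)
        doc
      end)

    # Wait up to 5 seconds (arbitrary), then shut the task down if needed.
    case Task.yield(task, 5_000) || Task.shutdown(task) do
      {:ok, doc} -> {:ok, doc}
      {:exit, reason} -> {:error, reason}
      nil -> {:error, :timeout}
    end
  end
end

# Started inline here only to keep the sketch self-contained.
{:ok, sup} = Task.Supervisor.start_link()
IsolatedParseSketch.parse(sup, ~s(<a b=""c=""/>))
```

With the malformed input the call should return an {:error, reason} tuple instead of crashing the caller, and a well-formed document should come back as {:ok, doc}.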