rebol rebol3 red-lang

Parse HTML with single or double quotes


When using the parse dialect, how can I parse tags whose attribute values are enclosed in either single quotes (') or double quotes ("), as in:

thru <h2 class="txt-medium txt-bold">

thru <h2 class='txt-medium txt-bold'>

One workaround that works is:

thru {<h2 class=} thru {txt-medium txt-bold} thru ">"

I tried to use the | (alternation) operator, but without success. Can the | operator be used to parse the tag?


Solution

  • Yes, you can use the | operator, but defining a charset is simpler in this case, because a single rule then matches either quote character:

    delimiter: charset [#"^"" #"'"]
    single: {<h2 class='txt-medium txt-bold'>}
    double: {<h2 class="txt-medium txt-bold">}
    
    >> parse single [thru "class=" delimiter copy values to delimiter thru ">"] values
    == "txt-medium txt-bold"
    
    >> parse double [thru "class=" delimiter copy values to delimiter thru ">"] values 
    == "txt-medium txt-bold"
    

    The golden rule is to avoid the to and thru keywords where possible and instead define exactly what should be matched.
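Since the question asks specifically about |, here is a minimal sketch of both ideas, written for Red and assuming the same two input strings (it should also run in Rebol 3 with a REBOL [] header, but that has not been verified). The rule names alt-rule and strict-rule and the word non-quote are illustrative, not part of the original answer. alt-rule uses | so that each branch copies up to the same kind of quote it opened with; strict-rule follows the golden rule by describing every character instead of using to/thru.

```red
Red []

single: {<h2 class='txt-medium txt-bold'>}
double: {<h2 class="txt-medium txt-bold">}

; Alternation with |: each branch consumes an opening quote and copies
; up to the SAME kind of quote, so a mismatched pair is rejected.
alt-rule: [
    thru "class="
    [#"'" copy values to #"'" | #"^"" copy values to #"^""]
    thru ">"
]

; Golden-rule style: no to/thru, every character is described.
; non-quote matches any character except ' and ".
delimiter: charset [#"^"" #"'"]
non-quote: complement delimiter
strict-rule: [
    "<h2" some #" " "class="
    delimiter copy values any non-quote delimiter
    ">"
]

print parse single alt-rule           ; true
print values                          ; txt-medium txt-bold
print parse double strict-rule        ; true
print values                          ; txt-medium txt-bold
print parse {<h2 class="a'>} alt-rule ; false (quotes do not match)
```

The charset version in the answer accepts a mismatched pair such as class="a' because to delimiter stops at either quote; the | version is slightly longer but only succeeds when the closing quote matches the opening one. strict-rule still uses the shared delimiter charset, so it does not enforce matching pairs either.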