
Regular expression alternative for scraping certain syntax inside html

I have functions placed inside HTML code. These functions follow these syntax rules:

  1. A '#' symbol serves as the opening tag.
  2. A function name follows the opening '#' tag. The function name can contain digits (1,2,3), letters (a,b,c), and underscores (_).
  3. After the function name, there is a pair of brackets containing the parameter. The parameter can contain anything, including alphanumeric characters, comparison operators (<,>,=,!), and these characters: @,#,$,%,^,&,(,),?,*,/,[,]
  4. After the parameter, there is HTML code enclosed in curly brackets.
  5. Finally, the function is closed with a '#' tag.

This is not my real function, but it gives the general idea of the rules above:


So far, I have been using this regex to capture all of the function names, parameters, and HTML strings inside the functions:


This is the result:


You can see the details in the regex tester:

I recently found that PHP's preg_match_all function does not work on long strings, so I cannot use this regex when the HTML code inside the function is too long. I need to capture the function name, the function parameter, and the HTML string inside the function. Is there an alternative to this regex, perhaps using PHP string functions such as substr and strpos?


  • Here is an improved version of your regex that is a little more efficient:


    Demo & Explanation
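
  • On the question of a regex-free alternative: a plain string scan along the lines of strpos/substr might look like the sketch below. It is written in Python for brevity; `str.find` plays the role of PHP's `strpos` and slicing plays the role of `substr`. The function name `scan_functions` is hypothetical, and the sketch rests on assumptions the question does not confirm: the parameter ends at the first `){`, the HTML body ends at the first `}#`, and functions are not nested inside one another.

```python
import string

# Characters allowed in a function name (rule 2): letters, digits, underscore.
NAME_CHARS = set(string.ascii_letters + string.digits + "_")


def scan_functions(text):
    """Return a list of (name, params, body) tuples found in text.

    Assumptions (not confirmed by the question): params end at the first
    "){", the body ends at the first "}#", and functions are not nested.
    """
    results = []
    pos = 0
    while True:
        start = text.find("#", pos)  # PHP: strpos($text, '#', $pos)
        if start == -1:
            break

        # Read the function name that follows the opening '#'.
        i = start + 1
        while i < len(text) and text[i] in NAME_CHARS:
            i += 1
        name = text[start + 1:i]  # PHP: substr($text, $start + 1, $i - $start - 1)

        # Not a function unless a non-empty name is followed by '('.
        if not name or i >= len(text) or text[i] != "(":
            pos = start + 1
            continue

        params_end = text.find("){", i + 1)
        if params_end == -1:
            break  # no "){" remains, so no later candidate can match either

        body_end = text.find("}#", params_end + 2)
        if body_end == -1:
            break  # no closing "}#" remains anywhere after this point

        params = text[i + 1:params_end]
        body = text[params_end + 2:body_end]
        results.append((name, params, body))
        pos = body_end + 2  # continue scanning after the closing '#'
    return results


if __name__ == "__main__":
    sample = '<div>#show_if(a>1){<p>Hi</p>}#</div>'
    for name, params, body in scan_functions(sample):
        print(name, params, body)
```

    Because every step is a forward `find` from the current position, the scan makes no use of backtracking and should not be sensitive to the length of the HTML body in the way the regex is. A stray '#' in ordinary HTML (for example `href="#"` or a color such as `#fff;`) is skipped because it is not followed by a name and an opening bracket. If your real bodies can contain `}#`, or your parameters can contain `){`, the two end markers would need a stricter rule than this sketch uses.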