Tags: php, arrays, html-parsing, html-select, text-extraction

Get all <option> text values from an HTML document


I have HTML text in the following format:

<option value="http://www.torontoairportlimoflatrate.com/aurora-limousine-service.html">Aurora</option>
<option value="http://www.torontoairportlimoflatrate.com/alexandria-limousine-service.html">Alexandria</option>

I have tried:

preg_match_all("#>\w*#",$data,$result);

This returns the following result:

Array
(
    [0] => Array
        (
            [0] => >Ajax
            [1] => >
            [2] => >Aurora
            [3] => >
            [4] => >Alexandria
            [5] => >
            [6] => >Alliston

I only want a flat array of the option text values (the cities):

[0] => Ajax
[1] => Aurora
...

Solution

  • If you'd prefer not to use an HTML parser, a regex can do the job, but keep in mind that you will probably need to adjust it if the format of your input changes. For the markup in your question, this pattern works:

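    <?php
        // Group 1 captures the URL in the value attribute; group 2 captures the option text.
        preg_match_all('/<option\svalue="([a-zA-Z0-9-_.\/:]+)">([a-zA-Z\s]+)<\/option>/', $data, $result);

        // $result[2] is the flat array of option texts (the cities).
        var_dump($result[2]);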

    Note:

    If you want to match every URL, replace ([a-zA-Z0-9-_.\/:]+) with a more capable URL-matching pattern. You can find several on Stack Overflow; which one to pick is largely a matter of personal taste.
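
For comparison, the HTML-parser route mentioned at the start of this answer could look roughly like the sketch below. It assumes PHP's bundled DOM extension (DOMDocument) is available. The optionTexts() helper and the <select> wrapper around the sample input are illustrative assumptions, not part of the original question.

```php
<?php
// Hypothetical sample input: the two options from the question,
// wrapped in a <select> element (an assumption for this sketch).
$data = '<select>'
    . '<option value="http://www.torontoairportlimoflatrate.com/aurora-limousine-service.html">Aurora</option>'
    . '<option value="http://www.torontoairportlimoflatrate.com/alexandria-limousine-service.html">Alexandria</option>'
    . '</select>';

// Illustrative helper: returns the text of every <option> as a flat array.
function optionTexts(string $html): array
{
    $doc = new DOMDocument();

    // Collect libxml warnings internally instead of emitting them,
    // since real-world HTML is often not perfectly well-formed.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $texts = [];
    foreach ($doc->getElementsByTagName('option') as $option) {
        // textContent is the text inside the element, without the tags.
        $texts[] = trim($option->textContent);
    }
    return $texts;
}

$cities = optionTexts($data);
print_r($cities);
```

Unlike the regex, this approach should not depend on which characters appear in the value attribute or in the city name, so names containing hyphens, periods, or apostrophes would not need a pattern change. If the input contains non-ASCII text, the document encoding may need to be declared before parsing.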