I was using this regex:
(?<=alt)[\w\s\,\/\(\)\.]*
to extract the first alt text. This works well, but there are multiple alt texts that I would like to extract.
I am using the regex inside Visual Web Ripper.
The code I am extracting from is:
<DIV id=ctl00_ContentRightColumn_CustomFunctionalityFieldControl1_ctl00_ctl00_woodFeatures class="woodFeaturesPanel woodFeaturesPanelSingle" sizcache="23614" sizset="0"><H2>Features:</H2> <DIV sizcache="23614" sizset="0"> <UL sizcache="23614" sizset="0"> <LI sizcache="23386" sizset="0"><IMG alt="Information board at site" src="/PublishingImages/icon_infoboard.gif"> <LI sizcache="20558" sizset="0"><IMG alt="Parking nearby" src="/PublishingImages/icon_carparknear.gif"> <LI sizcache="23614" sizset="0"><IMG alt=Grassland src="/PublishingImages/icon_grassland.giF"> <LI sizcache="17694" sizset="0"><IMG alt="Is woodland creation site" src="/PublishingImages/icon_woodlandcreation.gif"> <LI sizcache="21680" sizset="0"><IMG alt="Mainly broadleaved woodland" src="/PublishingImages/icon_mainlybroadleaved.gif"> <LI sizcache="20704" sizset="0"><IMG alt="Mainly young woodland" src="/PublishingImages/icon_mainlyyoung.gif"> <LI> <LI></LI></UL></DIV></DIV>
Without knowing the language it is difficult to say for certain, but using capture groups (memory patterns) you can capture what you need:
/alt=(\w\S*|"([^"]*)")/
With PHP's preg_match_all(), it gives the following results:
Array
(
    [0] => Array
        (
            [0] => alt="Information board at site"
            [1] => alt="Parking nearby"
            [2] => alt=Grassland
            [3] => alt="Is woodland creation site"
            [4] => alt="Mainly broadleaved woodland"
            [5] => alt="Mainly young woodland"
        )

    [1] => Array
        (
            [0] => "Information board at site"
            [1] => "Parking nearby"
            [2] => Grassland
            [3] => "Is woodland creation site"
            [4] => "Mainly broadleaved woodland"
            [5] => "Mainly young woodland"
        )

    [2] => Array
        (
            [0] => Information board at site
            [1] => Parking nearby
            [2] =>
            [3] => Is woodland creation site
            [4] => Mainly broadleaved woodland
            [5] => Mainly young woodland
        )

)
The second capture group holds the contents of double-quoted values. When it is empty (as for alt=Grassland, which has no quotes), use the first capture group instead.
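Since the language was not stated, here is a rough sketch of the same idea in Python, purely as an illustration. It uses the same pattern with the standard re module; the function name extract_alt_texts and the shortened html sample are made up for this example. Note one difference from the PHP output: Python reports a group that did not take part in the match as None rather than an empty string, so the fallback checks for None.

```python
import re

# Shortened sample of the markup from the question (assumption: trimmed for brevity).
html = (
    '<LI><IMG alt="Information board at site" src="/PublishingImages/icon_infoboard.gif"> '
    '<LI><IMG alt="Parking nearby" src="/PublishingImages/icon_carparknear.gif"> '
    '<LI><IMG alt=Grassland src="/PublishingImages/icon_grassland.giF">'
)

# Group 1: the whole value (unquoted word, or quoted string including the quotes).
# Group 2: the text inside double quotes, only set when the value is quoted.
ALT_PATTERN = re.compile(r'alt=(\w\S*|"([^"]*)")')


def extract_alt_texts(markup):
    """Return every alt value in the markup, with surrounding quotes removed."""
    results = []
    for match in ALT_PATTERN.finditer(markup):
        whole, quoted = match.group(1), match.group(2)
        # Prefer the quoted contents; fall back to group 1 for unquoted values.
        results.append(quoted if quoted is not None else whole)
    return results


alt_texts = extract_alt_texts(html)
print(alt_texts)
# ['Information board at site', 'Parking nearby', 'Grassland']
```

Whether Visual Web Ripper exposes all matches or only the first depends on how the tool applies the pattern, so the loop above may need to be translated into whatever scripting hook it offers.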