I am working on scraping and then parsing an HTML string to get the two URL parameters inside the href. After scraping the element I need, $description, the full string ready for parsing is:
<a target="_blank" href="CoverSheet.aspx?ItemID=18833&MeetingID=773">Description</a><br>
Below I use the explode parameter to split the $description variable string based on the = delimiter. I then further explode based on the double quote delimiter.
Problem I need to solve: I want to only print the numbers for MeetingID parameter before the double quote, "773".
<?php
echo "Description is: " . htmlentities($description); // prints the string referenced above
$htarray = explode('=', $description); // explode the $description string which includes the link. ... then, find out where the MeetingID is located
echo $htarray[4] . "<br>"; // this will print the string which includes the meeting ID: "773">Description</a><br>"
$meetingID = $htarray[4];
echo "Meeting ID is " . substr($meetingID,0,3);
?>
The above echo statement using substr works to print the meeting ID, 773.
However, I want to make this bulletproof in the event MeetingID parameter exceeds 999, then we would need 4 characters. So that's why I want to delimit it by the double quotes, so it prints all numbers before the double quotes.
I try below to isolate all of the amount before the double quotes... but it isn't seeming to work correctly yet.
<?php
$htarray = explode('"', $meetingID); // split the $meetingID string based on the " delimiter
echo "Meeting ID0 is " . $meetingID[0] ; // this prints just the first number, 7
echo "Meeting ID1 is " . $meetingID[1] ; // this prints just the second number, 7
echo "Meeting ID2 is " . $meetingID[2] ; // this prints just the third number, 3
?>
Question, why is the array $meetingID[0] not printing the THREE numbers before the delimiter, ", but rather just printing a single number? If the explode function works properly, shouldn't it be splitting the string referenced above based on the double quotes, into just two elements? The string is
"773">Description</a><br>"
So I can't understand why when echoing after the explode with double quote delimiter, it's only printing one number at a time..
The reason you're getting the wrong response is because you're using the wrong variable.
$htarray = explode('"', $meetingID);
echo "Meeting ID0 is " . $meetingID[0] ; // this prints just the first number, 7
echo "Meeting ID1 is " . $meetingID[1] ; // this prints just the second number, 7
echo "Meeting ID2 is " . $meetingID[2] ; // this prints just the third number, 3
echo "Meeting ID is " . $htarray[0] ; // this prints 773
There's an easier way to do this though, using regular expressions:
$description = '<a target="_blank" href="CoverSheet.aspx?ItemID=18833&MeetingID=773">Description</a><br>';
$meetingID = "Not found";
if (preg_match('/MeetingID=([0-9]+)/', $description, $matches)) {
$meetingID = $matches[1];
}
echo "Meeting ID is " . $meetingID;
// this prints 773 or Not found if $description does not contain a (numeric) MeetingID value