So I've seen questions asked before along the lines of finding the most frequent occurrence of a string within a file, but all of those rely on knowing what to look for.
I have what you might almost call a flat file database that grabs a bunch of input data and basically wraps different parts of it in html span tags with referencing ids.
Each line comes out in this kind of fashion:
<p>
<span class="ip">58.106.**.***</span>
Wrote <span class='text'>some text</span>
<span class='effect1'> and caused seizures </span>
<span class='time'>23:47</span>
</p>
How would I then go about finding the contents of the 'text' span that occur the most times?
i.e. if I had
<p>
<span class="ip">58.106.**.***</span>
Wrote <span class='text'>woof</span>
<span class='effect1'> and caused seizures </span>
<span class='time'>23:47</span>
</p>
<p>
<span class="ip">58.106.**.***</span>
Wrote <span class='text'>meow</span>
<span class='effect1'> and caused mind-splosion </span>
<span class='time'>23:47</span>
</p>
<p>
<span class="ip">58.106.**.***</span>
Wrote <span class='text'>meow</span>
<span class='effect1'> and used no effect </span>
<span class='time'>23:47</span>
</p>
<p>
<span class="ip">58.106.**.***</span>
Wrote <span class='text'>meow</span>
<span class='effect1'> and used no effect </span>
<span class='time'>23:47</span>
</p>
the output would be 'meow'.
How would I accomplish this in PHP?
First off: Your format is not conducive to this type of data manipulation; you might want to consider changing it.
That said, based on this structure the logical solution would be to leverage DOMXPath, as Dani says. This could have been problematic because of all the duplicate ids in there, but in practice it works (after emitting a boatload of warnings, which is one more reason that the data structure warrants revision).
Here's some code to go with the idea:
$input = '<body>'.get_input().'</body>';
$doc = new DOMDocument;
$doc->loadHTML($input); // lots of warnings, duplicate ids!
$xpath = new DOMXPath($doc);
$result = $xpath->query("//*[@id='text']/text()");
$occurrences = array();
foreach ($result as $item) {
    if (!isset($occurrences[$item->wholeText])) {
        $occurrences[$item->wholeText] = 0;
    }
    $occurrences[$item->wholeText]++;
}
// Sort the results and produce final answer
arsort($occurrences);
reset($occurrences);
echo "The most common text is '".key($occurrences).
     "', which occurs ".current($occurrences)." times.";
Update (seeing as you fixed the duplicate id issue): You would simply change the XPath query to "//*[@class='text']/text()"
so that it continues to match. However, this way of doing things remains inefficient, so if one or more of these apply:
then changing the data format is a good idea.
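For reference, here is a minimal sketch of the class-based variant from the update, wrapped in a function. The name most_common_span_text and the $sample string are made up for illustration, and the sketch assumes PHP 7.1+ and that $class is a plain, trusted class name (it is spliced directly into the XPath expression). It uses libxml_use_internal_errors() to keep parser warnings out of the output and array_count_values() in place of the manual counting loop.

```php
<?php
// Hypothetical helper: returns array(text, count) for the most frequent
// content of <span class="..."> elements, or null if none are found.
function most_common_span_text(string $html, string $class = 'text'): ?array
{
    // Collect parser complaints internally instead of emitting warnings.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument;
    $doc->loadHTML('<body>' . $html . '</body>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Gather the trimmed text of every span whose class attribute is exactly $class.
    $xpath = new DOMXPath($doc);
    $texts = array();
    foreach ($xpath->query("//span[@class='" . $class . "']") as $span) {
        $texts[] = trim($span->textContent);
    }
    if (count($texts) === 0) {
        return null;
    }

    // Tally each distinct string, then sort so the highest count comes first.
    $counts = array_count_values($texts);
    arsort($counts);
    reset($counts);
    $top = key($counts);

    // Numeric-looking strings become integer keys, so cast back to string.
    return array((string) $top, $counts[$top]);
}

// Made-up sample mirroring the records in the question.
$sample = "<p>Wrote <span class='text'>woof</span></p>"
        . "<p>Wrote <span class='text'>meow</span></p>"
        . "<p>Wrote <span class='text'>meow</span></p>"
        . "<p>Wrote <span class='text'>meow</span></p>";

list($text, $count) = most_common_span_text($sample);
echo "The most common text is '$text', which occurs $count times.\n";
// prints: The most common text is 'meow', which occurs 3 times.
```

Note that this still parses the whole file on every call, so it does nothing about the inefficiency mentioned above; it only tidies up the counting. If two strings tie for first place, which one is returned depends on the sort order.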