php html parsing simple-html-dom

Get all children of a p tag with a given class using a PHP parser


I am using the Simple HTML DOM parser for PHP. The page has many div elements with the class faq-d-item-content easeParent, and each one contains a question in an h5 tag plus a p tag with the class faq-popup. The answers do not use a uniform class or tag: some are in a div tag and some in a ul, inside the p tag with class faq-popup. The problem is that I get all the questions but cannot get the full answers, because of the "&nbsp;" entities inside the answers.

This is an example of the HTML:

<div class="faq-d-item-content easeParent">
    <h5>What are the KYC requirements for opening a bank account?</h5>
    <p class="faq-popup">
        <p> To open a bank account, one needs to submit a ‘proof of identity and proof of address' together with a recent photograph.</p>
    </p>
</div>
<div class="faq-d-item-content easeParent">
    <h5>What are the documents to be given as ‘proof of identity' and ‘proof of address'?</h5>
    <p class="faq-popup">
        <p> The Government of India has notified six documents as ‘Officially Valid Documents (OVDs) for the purpose of producing proof of identity. These six documents are Passport, Driving Licence, Voters' Identity Card, PAN Card, Aadhaar Card issued by UIDAI and NREGA Card. You need to submit any one of these documents as proof of identity. If these documents also contain your address details, then it would be accepted as ‘proof of address'. If the document submitted by you for proof of identity does not contain address details, then you will have to submit another officially valid document which contains address details.</p>
    </p>
</div>
<div class="faq-d-item-content easeParent">
    <h5>If I do not have any of the documents listed above to show my ‘proof of identity', can I still open a bank account?</h5>
    <p class="faq-popup">
        <p> Yes. You can still open a bank account known as ‘Small Account' by submitting your recent photograph and putting your signature or thumb impression in the presence of the bank official.</p>
    </p>
</div>
<div class="faq-d-item-content easeParent">
    <h5>Is there any difference between such ‘small accounts' and other accounts ?</h5>
    <p class="faq-popup">
        <p> Yes. The ‘Small Accounts' have certain limitations such as:</p> 
        <p style="margin-left:1.0cm;"> ·&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; balance in such accounts at any point of time should not exceed Rs.50,000</p> <p style="margin-left:1.0cm;"> ·&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; total credits in one year should not exceed Rs.1,00,000</p> <p style="margin-left:1.0cm;"> ·&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; total withdrawal and transfers should not exceed Rs.10,000 in a month.</p> <p style="margin-left:1.0cm;"> ·&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Foreign remittances cannot be credited to such accounts.</p> <p> Such accounts remain operational initially for a period of twelve months and thereafter, for a further period of twelve months, if the holder of such an account provides evidence to the bank of having applied for any of the officially valid documents within twelve months of the opening of such account.&nbsp;</p>
    </p>
</div>    

The PHP code is:

$html = file_get_html($url); // $url: placeholder for the FAQ page address
$content = $html->find('div[class=faq-d-item]', 0);

// Collect every question heading.
$questions = $content->find('div[class=faq-d-item-content easeParent] h5');
foreach ($questions as $k => $question) {
    $question_array[] = $question->innertext;
}

// Collect the answers; plaintext comes back incomplete here.
$answers = $html->find('div[class=faq-d-item-content easeParent] p[class=faq-popup]');
foreach ($answers as $ky => $node) {
    $answer_array[] = $node->plaintext;
}

I want to get all the children of the p tag with class faq-popup as an array. I also need a solution for the null value that is returned when "&nbsp;" gets parsed.


Solution

  • When collecting the answers, I emptied each h5 tag, took the content of its parent div as HTML, and then stripped the tags from the result.

    The PHP code changed to:

    $html = file_get_html($url); // $url: placeholder for the FAQ page address
    $content = $html->find('div[class=faq-d-item]', 0);

    // Collect the question text from each h5.
    $questions = $content->find('div[class=faq-d-item-content easeParent] h5');
    foreach ($questions as $k => $question) {
        $question_array[] = $question->innertext;
    }

    // Empty every h5 so that only the answer remains inside its parent div.
    $questions1 = $content->find('div[class=faq-d-item-content easeParent] h5');
    foreach ($questions1 as $qs) {
        $qs->innertext = '';
    }

    // Take each parent div as HTML, strip the tags, and collapse whitespace.
    $answers = $content->find('div[class=faq-d-item-content easeParent]');
    foreach ($answers as $ky => $answer) {
        // Blank out a block whose only content is a non-breaking space entity.
        if (trim($answer->innertext) == '&nbsp;') {
            $answer->outertext = '';
        }
        $stripped = trim(preg_replace('/\s+/', ' ', strip_tags($answer)));
        $answer_array[] = $stripped;
    }
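
The code above yields one string per answer and leaves any "&nbsp;" entities in the text, since strip_tags does not decode entities. If each paragraph of an answer is wanted as a separate array element, a rough sketch along the following lines might help. It uses only built-in PHP string functions and expects the inner HTML of one answer div as a UTF-8 string (for example its innertext). The helper names clean_fragment and answer_parts and the $sample string are made up for illustration and are not part of Simple HTML DOM.

```php
<?php
// Hypothetical helper (not part of Simple HTML DOM): turn one HTML fragment
// into clean text. It drops the tags, decodes entities such as &nbsp;, and
// collapses every run of whitespace (including U+00A0) into a single space.
function clean_fragment(string $html): string
{
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES | ENT_HTML5, 'UTF-8');
    $text = preg_replace('/[\s\x{00A0}]+/u', ' ', $text);
    return trim($text);
}

// Hypothetical helper: split an answer's inner HTML at each closing </p>
// and return the non-empty cleaned pieces as a re-indexed array.
function answer_parts(string $innerHtml): array
{
    $pieces = preg_split('#</p\s*>#i', $innerHtml);
    $parts  = array_map('clean_fragment', $pieces);
    return array_values(array_filter($parts, fn ($p) => $p !== ''));
}

// Shortened sample shaped like the FAQ markup in the question (h5 already emptied).
$sample = '<h5></h5><p class="faq-popup"><p> Yes. Limits:</p> '
        . '<p style="margin-left:1.0cm;"> ·&nbsp;&nbsp; balance should not exceed Rs.50,000</p>'
        . '<p> Such accounts remain operational.&nbsp;</p></p>';

$parts = answer_parts($sample);
print_r($parts);
```

Inside the foreach over $answers, something like $answer_array[] = answer_parts($answer->innertext); would then store an array of paragraphs per question instead of a single string. Splitting on </p> is a simplification that suits the sample markup; answers built from ul or div elements would need their own split points.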