
Python / Beautiful Soup: Extracting Data from Items Without Classes


I am trying to extract some data from a website, but the page source does not have a class for each item. I need the price, quantity, and size of the products. Can you please guide me to a solution?

I thought I could use the scroll menu to extract the data for each product, because that is the only class I saw in the page source. To sum up, I need the values of the data-comprice, data-quantity, and data-size attributes, but I have not found a solution yet. I am sharing my basic code and part of the page source. Thanks in advance!

Source:

 <div class="scrollmenu">
               
               

                
    
  <div data-value="2&#39; x 3&#39;" class="swatch-element 2-x-3 soldout ">
         <input data-comprice="75.01" data-curprice="30.00" data-size="2' x 3'" data-quantity="0" data-sku="AAAA0536-EPERNAY-23" data-price="30.00" data-title="2&#39; x 3&#39;" type="radio" name="id" value="31781284839506" id="radio_31781284839506"/>
        <label style="height:75px!important; min-width:135px!important; padding: 0 0px!important;"  for="radio_31781284839506">
          <p style="color: black; margin-bottom:0; font-size:15px; font-weight: bold;"> 2' x 3'</p> <br> <p style="color: #535258; margin-bottom:0; margin-top:-45px; text-decoration:line-through;"> $75.01  </p> <br> <p style="margin-top:-48px; margin-bottom:2px; color:#584c98; font-weight:bold; font-size: 20px;"> $30.00 </p>
        </label>
      </div>

 
    
    
    
              
                
    
  <div data-value="2&#39;7&quot; x 7&#39;3&quot;" class="swatch-element 27-x-73 soldout ">
         <input data-comprice="134.81" data-curprice="53.92" data-size="2'7" x 7'3"" data-quantity="0" data-sku="AAAA0536-EPERNAY-2773" data-price="53.92" data-title="2&#39;7&quot; x 7&#39;3&quot;" type="radio" name="id" value="31781284872274" id="radio_31781284872274"/>
        <label style="height:75px!important; min-width:135px!important; padding: 0 0px!important;"  for="radio_31781284872274">
          <p style="color: black; margin-bottom:0; font-size:15px; font-weight: bold;"> 2'7" x 7'3"</p> <br> <p style="color: #535258; margin-bottom:0; margin-top:-45px; text-decoration:line-through;"> $134.81  </p> <br> <p style="margin-top:-48px; margin-bottom:2px; color:#584c98; font-weight:bold; font-size: 20px;"> $53.92 </p>
        </label>
      </div>

 

My initial code block:

from bs4 import BeautifulSoup
import requests
import pandas as pd

webpage = requests.get('https://markandday.com/products/epernay-cottage-denim-rug')

sp = BeautifulSoup(webpage.content, 'html.parser')

for datapage in sp.find('div',attrs={'class':'scrollmenu'}):
   
    
 
  Result=print (datapage)
  
  type(Result)

Solution

  • Call find_all("input") on the scrollmenu div to collect the input tags, then read each data-* attribute from a tag with its .get() method.

    from bs4 import BeautifulSoup
    html=""" <div class="scrollmenu">
                
      <div data-value="2&#39; x 3&#39;" class="swatch-element 2-x-3 soldout ">
             <input data-comprice="75.01" data-curprice="30.00" data-size="2' x 3'" data-quantity="0" data-sku="AAAA0536-EPERNAY-23" data-price="30.00" data-title="2&#39; x 3&#39;" type="radio" name="id" value="31781284839506" id="radio_31781284839506"/>
            <label style="height:75px!important; min-width:135px!important; padding: 0 0px!important;"  for="radio_31781284839506">
              <p style="color: black; margin-bottom:0; font-size:15px; font-weight: bold;"> 2' x 3'</p> <br> <p style="color: #535258; margin-bottom:0; margin-top:-45px; text-decoration:line-through;"> $75.01  </p> <br> <p style="margin-top:-48px; margin-bottom:2px; color:#584c98; font-weight:bold; font-size: 20px;"> $30.00 </p>
            </label>
          </div>
      <div data-value="2&#39;7&quot; x 7&#39;3&quot;" class="swatch-element 27-x-73 soldout ">
             <input data-comprice="134.81" data-curprice="53.92" data-size="2'7" x 7'3"" data-quantity="0" data-sku="AAAA0536-EPERNAY-2773" data-price="53.92" data-title="2&#39;7&quot; x 7&#39;3&quot;" type="radio" name="id" value="31781284872274" id="radio_31781284872274"/>
            <label style="height:75px!important; min-width:135px!important; padding: 0 0px!important;"  for="radio_31781284872274">
              <p style="color: black; margin-bottom:0; font-size:15px; font-weight: bold;"> 2'7" x 7'3"</p> <br> <p style="color: #535258; margin-bottom:0; margin-top:-45px; text-decoration:line-through;"> $134.81  </p> <br> <p style="margin-top:-48px; margin-bottom:2px; color:#584c98; font-weight:bold; font-size: 20px;"> $53.92 </p>
            </label>
          </div>
    
     """
    soup=BeautifulSoup(html,"html.parser")
    inps=soup.find("div",class_="scrollmenu").find_all("input")
    for inp in inps:
        print(inp)
        # inp['data-comprice'] also works, but raises KeyError if the attribute is missing
        print(inp.get("data-comprice"))
        print(inp.get("data-curprice"))
        print(inp.get("data-quantity"))
        print(inp.get("data-size"))
    

    Output:

    <input data-comprice="75.01" data-curprice="30.00" data-price="30.00" data-quantity="0" data-size="2' x 3'" data-sku="AAAA0536-EPERNAY-23" data-title="2' x 3'" id="radio_31781284839506" name="id" type="radio" value="31781284839506"/>
    75.01
    30.00
    0
    2' x 3'
    <input 7'3""="" data-comprice="134.81" data-curprice="53.92" data-price="53.92" data-quantity="0" data-size="2'7" data-sku="AAAA0536-EPERNAY-2773" data-title="2'7&quot; x 7'3&quot;" id="radio_31781284872274" name="id" type="radio" value="31781284872274" x=""/>
    134.81
    53.92
    0
    2'7
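
    Note that the second item's data-size comes back as 2'7 rather than the full size. The page's markup puts unescaped double quotes inside that attribute value, so the parser ends the value at the first inner quote and treats the rest as stray attributes. The data-title attribute carries the same size with the quotes escaped as &quot;, so reading it instead is one possible workaround. A minimal sketch, assuming data-title always mirrors data-size as it does in the two items shown (the trimmed HTML string below is a shortened copy of the second input from the question):

```python
from bs4 import BeautifulSoup

# Shortened copy of the second <input> from the question, keeping the
# unescaped inner quotes in data-size that cause the truncation.
html = """<div class="scrollmenu">
  <input data-comprice="134.81" data-size="2'7" x 7'3"" data-quantity="0" data-title="2&#39;7&quot; x 7&#39;3&quot;" type="radio"/>
</div>"""

soup = BeautifulSoup(html, "html.parser")
inp = soup.find("div", class_="scrollmenu").find("input")

# data-size is cut off at the first inner double quote
truncated_size = inp.get("data-size")
# data-title uses escaped quotes, so the whole value survives parsing
full_size = inp.get("data-title")

print(truncated_size)
print(full_size)
```

    If data-title is ever missing on some items, inp.get("data-title") returns None, so falling back to data-size in that case may be worth considering.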
    

    For the live website:

    from bs4 import BeautifulSoup
    import requests

    # Fetch the product page and parse the returned HTML
    response = requests.get('https://markandday.com/products/epernay-cottage-denim-rug')
    soup = BeautifulSoup(response.text, "html.parser")

    # Each size variant is an <input> inside the scrollmenu div
    inps = soup.find("div", class_="scrollmenu").find_all("input")
    for inp in inps:
        print(inp.get("data-comprice"))
        print(inp.get("data-curprice"))
        print(inp.get("data-quantity"))
        print(inp.get("data-size"))
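
    Since the question asks for price, quantity, and size per product, it may be more convenient to collect the values into records than to print them. The sketch below is one way to do that; the function name extract_variants, the dictionary keys, and the choice to skip inputs that lack any of the four attributes are assumptions made for illustration, not part of the original answer. It also returns an empty list when the scrollmenu div is not found, which avoids an AttributeError if the page layout differs.

```python
from bs4 import BeautifulSoup

# Attributes each <input> must carry to be treated as a product variant
REQUIRED = ("data-comprice", "data-curprice", "data-quantity", "data-size")


def extract_variants(html):
    """Return one dict per variant <input> inside the div.scrollmenu container."""
    soup = BeautifulSoup(html, "html.parser")
    menu = soup.find("div", class_="scrollmenu")
    if menu is None:
        # Container not found: nothing to extract
        return []

    rows = []
    for inp in menu.find_all("input"):
        # Skip inputs that do not carry all the expected data-* attributes
        if not all(inp.has_attr(name) for name in REQUIRED):
            continue
        rows.append({
            "size": inp["data-size"],
            "compare_price": float(inp["data-comprice"]),
            "current_price": float(inp["data-curprice"]),
            "quantity": int(inp["data-quantity"]),
        })
    return rows


# Small inline sample based on the first item from the question
sample = """<div class="scrollmenu">
  <input data-comprice="75.01" data-curprice="30.00" data-size="2' x 3'" data-quantity="0" type="radio"/>
</div>"""
rows = extract_variants(sample)
print(rows)
```

    For the live page, the same function could be called with the text of the requests response. Because the question already imports pandas, the resulting list of dicts can be passed to pd.DataFrame(rows) to get a table.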