I am trying to scrape a site but the results returned for just the links is different from when I inspect it with the browser.
In my browser I get normal links but all the a HREF links all become javascript:void(0);
from Nokogiri.
Here is the site:
https://www.ctgoodjobs.hk/jobs/part-time
Here is my code:
url = "https://www.ctgoodjobs.hk/jobs/part-time"
response = open(url) rescue nil
next unless response
doc = Nokogiri::HTML(open(url))
links = doc.search('.job-title > a').text
is not that easy, urls are "obscured" using a js function, that's why you're getting javascript: void(0)
when asking for the hrefs... looking at the html, there are some hidden inputs for each link, and, there is a preview url that you can use to build the job preview url (if that's what you're looking for), so you have this:
<div class="result-list-job current-view">
<input type="hidden" name="job_id" value="04375145">
<input type="hidden" name="each_job_title_url" value="barista-senior-barista-咖啡調配員">
<h2 class="job-title"><a href="javascript:void(0);">Barista/ Senior Barista 咖 啡 調 配 員</a></h2>
<h3 class="job-company"><a href="/company-jobs/pacific-coffee-company/00028652" target="_blank">PACIFIC COFFEE CO. LTD.</a></h3>
<div class="job-description">
<ul class="job-desc-list clearfix">
<li class="job-desc-loc job-desc-small-icon">-</li>
<li class="job-desc-work-exp">0-1 yr(s)</li>
<li class="job-desc-salary job-desc-small-icon">-</li>
<li class="job-desc-post-date">09/11/16</li>
</ul>
</div>
<a class="job-save-btn" title="save this job" style="display: inline;"> </a>
<div class="job-batch-apply"><span class="checkbox" style="background-position: 0px 0px;"></span><input type="checkbox" class="styled" name="job_checkbox" value="04375145"></div>
<div class="job-cat job-cat-de"></div>
</div>
then, you can retrieve each job_id from those inputs, like:
inputs = doc.search('//input[@name="job_id"]')
and then build the urls (i found the base url at joblist_preview.js
:
urls = inputs.map do |input|
"https://www.ctgoodjobs.hk/english/jobdetails/details.asp?m_jobid=#{input['value']}&joblistmode=previewlist&ga_channel=ct"
end