So, I have a table with multiple rows and columns.
<table>
<tr>
<th>Employee Name</th>
<th>Reg Hours</th>
<th>OT Hours</th>
</tr>
<tr>
<td>Employee 1</td>
<td>10</td>
<td>20</td>
</tr>
<tr>
<td>Employee 2</td>
<td>5</td>
<td>10</td>
</tr>
</table>
There is also another table:
<table>
<tr>
<th>Employee Name</th>
<th>Revenue</th>
</tr>
<td>Employee 2</td>
<td>$10</td>
</tr>
<tr>
<td>Employee 1</td>
<td>$50</td>
</tr>
</table>
Notice that the employee order may be random between the tables.
How can I use nokogiri to create a json file that has each employee as an object, with their total hours and revenue?
Currently, I'm able to just get the individual table cells with some xpath. For example:
puts page.xpath(".//*[@id='UC255_tblSummary']/tbody/tr[2]/td[1]/text()").inner_text
Edit:
Using the page-object gem and the link from @Dave_McNulla, I tried this piece of code just to see what I get:
class MyPage
include PageObject
table(:report, :id => 'UC255_tblSummary')
def get_some_information
report_element[1][2].text
end
end
puts get_some_information
Nothing's being returned, however.
Data: https://gist.github.com/anonymous/d8cc0524160d7d03d37b
There's a duplicate of the hours table. The first one is fine. The other table needed is the accessory revenue table. (I'll also need the activations table, but I'll try to merge that from the code that merges the hours and accessory revenue tables.
I think the general approach is:
Create a hash for each table where the key is the employee
This part you can do in Watir or Nokogiri. It only makes sense to use Nokogiri if Watir is giving poor performance due large tables.
Watir:
#I assume you would have a better way to identify the tables than by index
hours_table = browser.table(:index, 0)
wage_table = browser.table(:index, 1)
#Turn the tables into a hash
employee_hours = {}
hours_table.trs.drop(1).each do |tr|
tds = tr.tds
employee_hours[ tds[0].text ] = {"Reg Hours" => tds[1].text, "OT Hours" => tds[2].text}
end
#=> {"Employee 1"=>{"Reg Hours"=>"10", "OT Hours"=>"20"}, "Employee 2"=>{"Reg Hours"=>"5", "OT Hours"=>"10"}}
employee_wage = {}
wage_table.trs.drop(1).each do |tr|
tds = tr.tds
employee_wage[ tds[0].text ] = {"Revenue" => tds[1].text}
end
#=> {"Employee 2"=>{"Revenue"=>"$10"}, "Employee 1"=>{"Revenue"=>"$50"}}
Nokogiri:
page = Nokogiri::HTML.parse(browser.html)
hours_table = page.search('table')[0]
wage_table = page.search('table')[1]
employee_hours = {}
hours_table.search('tr').drop(1).each do |tr|
tds = tr.search('td')
employee_hours[ tds[0].text ] = {"Reg Hours" => tds[1].text, "OT Hours" => tds[2].text}
end
#=> {"Employee 1"=>{"Reg Hours"=>"10", "OT Hours"=>"20"}, "Employee 2"=>{"Reg Hours"=>"5", "OT Hours"=>"10"}}
employee_wage = {}
wage_table.search('tr').drop(1).each do |tr|
tds = tr.search('td')
employee_wage[ tds[0].text ] = {"Revenue" => tds[1].text}
end
#=> {"Employee 2"=>{"Revenue"=>"$10"}, "Employee 1"=>{"Revenue"=>"$50"}}
Merge the results from both tables together
You want to merge the two hashes together so that for a specific employee, the hash will include their hours as well as their revenue.
employee = employee_hours.merge(employee_wage){ |key, old, new| new.merge(old) }
#=> {"Employee 1"=>{"Revenue"=>"$50", "Reg Hours"=>"10", "OT Hours"=>"20"}, "Employee 2"=>{"Revenue"=>"$10", "Reg Hours"=>"5", "OT Hours"=>"10"}}
Convert to JSON
Based on this previous question, you can then convert the hash to json.
require 'json'
employee.to_json