I'm using find_by_xpath in splinter to retrieve all the values of a table. It works well for the non-blank values and is fast. However, some cells of the table are blank, and the following code skips those cells. I also need a delimiter (perhaps a pipe, '|') between the values.
browser.find_by_xpath("//*[contains(text(),'Table of Data')]/..").value
Here's a sample result from the first row:
'col1 data col2 data col3 data'
What I need is this, because the 4th column (but sometimes other columns) has an empty cell:
'col1 data|col2 data|col3 data|""'
Thanks in advance!
HTML:
<td class="padtd" height="150" valign="top" width="75%" colspan="2">
<div class="headingSum">Table of Data </div>
<table style="width:100%;height=10;valign:top">
<tbody>
<tr>
<td height="15" width="50%" class="selTabSum">
<div>
<table style="width:100%;" valign="top">
<tbody>
<tr>
<td width="10%" class="tableheading">Column 1</td>
<td width="15%" class="tableheading">Column 2 </td>
<td width="25%" class="tableheading">Column 3 </td>
<td width="50%" class="tableheading">Column 4 </td>
</tr>
<tr>
<td width="10%" valign="top" class="tableCell"><a href=""><span class="data" id="160042">col1 data</span></a></td>
<td width="15%" valign="top" class="tableCell">col2 data</td>
<td width="25%" valign="top" class="tableCell">col3 data</td>
<td width="50%" class="tableCell"></td>
</tr>
<tr>
<td width="10%" valign="top" class="tableCell"><a href=""><span class="data" id="160042">col1 data</span></a></td>
<td width="15%" valign="top" class="tableCell">col2 data</td>
<td width="25%" valign="top" class="tableCell">col3 data</td>
<td width="50%" class="tableCell"></td>
</tr>
<tr>
<td width="10%" valign="top" class="tableCell"><a href=""><span class="data" id="97851">col1 data</span></a></td>
<td width="15%" valign="top" class="tableCell">col2 data</td>
<td width="25%" valign="top" class="tableCell">col3 data</td>
<td width="50%" class="tableCell">
col4 data
<table width="100%">
<tbody>
<tr></tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</td>
I ended up getting the HTML from the table (via xpath) and passing it into pandas via pd.read_html.
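import pandas as pd
from splinter import Browser
...
# Parent element of the node whose text contains 'Table of Data'
xp = "//*[contains(text(),'Table of Data')]/.."
# read_html returns a list of DataFrames, one per <table> it finds;
# [1] picks the second one (here, presumably the inner data table)
df = pd.read_html(browser.find_by_xpath(xp).html)[1]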
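For anyone who wants the exact pipe-delimited strings from the question without pandas, here is a rough sketch that uses only the standard library's html.parser. This is a swapped-in technique, not what I ran above. RowCollector and rows_to_pipe are names made up for illustration, the sample HTML is a trimmed copy of one row from the question, and it assumes a flat table (no tables nested inside cells).

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Hypothetical helper: collect the text of every <td>, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the <tr> currently being read
        self._cell = None  # text fragments of the <td> currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside of a <td> (e.g. whitespace between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None and self._row is not None:
            # Collapse runs of whitespace; an empty cell becomes ''.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def rows_to_pipe(html, empty='""'):
    """Return one pipe-delimited string per row; blank cells become `empty`."""
    parser = RowCollector()
    parser.feed(html)
    parser.close()
    return ["|".join(cell or empty for cell in row) for row in parser.rows]


# Trimmed sample row; in practice the string would come from
# browser.find_by_xpath(xp).html
sample = """
<table><tbody>
<tr><td class="tableCell"><a href=""><span id="160042">col1 data</span></a></td>
<td class="tableCell">col2 data</td>
<td class="tableCell">col3 data</td>
<td class="tableCell"></td></tr>
</tbody></table>
"""
lines = rows_to_pipe(sample)
print(lines[0])  # col1 data|col2 data|col3 data|""
```

The blank 4th cell is kept because every closing td appends a value, even when no text was seen inside it. If you already have the DataFrame, something like df.to_csv(sep='|', index=False) may get you close to the same thing, though I have not checked how it renders the blank cells for this page.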