I am trying to parse text from a bibliographic database whose pages contain non-standard tables. An article's fields may or may not be present, but when a field is present it always uses the same tags. For example, every article has a title, but only some have a keywords section. When a section exists, it is shown with standard tags like this:
<tr>
<td align="right" valign="top" nowrap="nowrap">Database Name: </td>
<td>Social Science Database</td>
</tr>
<tr>
<td align="right" valign="top" nowrap="nowrap">Journal: </td>
<td>Social Science and Education, 2011,8(4):29-42</td>
</tr>
<tr>
<td align="right" valign="top" nowrap="nowrap">Author: </td>
<td>James H.; Chaomei C.</td>
<td align="right" valign="top" nowrap="nowrap">Type: </td>
<td>Journal</td>
</tr>
<tr>
<td align="right" valign="top" nowrap="nowrap">Article Type: </td>
<td>Research Article</td>
</tr>
<tr>
<td align="right" valign="top" nowrap="nowrap">Retrieve Type: </td>
<td>Bibliographic</td>
</tr>
<tr>
<td align="right" valign="top" nowrap="nowrap">Language: </td>
<td>En</td>
</tr>
<tr>
<td align="right" valign="top" nowrap="nowrap">Abstract Language: </td>
<td>En</td>
</tr>
Here is my question. I am trying to parse this text in KNIME using XPath, but I couldn't get the result I want. I want to find the <tr> elements that contain specific text and take the second <td> of each such row. For example, for "Database Name:" the XPath should return "Social Science Database".
I tried this code:
.//dns:tr//text()[contains(., 'Database Name:')]
But the result contains just the first <td>; I need the second one. I also tried this code, but it returns nothing:
.//dns:tr//text()[contains(., 'Database Name:')]/dns:td[*]
You can try this:
.//dns:tr//text()[contains(., 'Database Name:')]/../../dns:td[2]
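.. takes you to the parent node. The matched text node's parent is the <td>, and that <td>'s parent is the <tr>, so you need to go two levels up and then select the second <td> of that row. Your second attempt returns nothing because a text node has no child elements, so /dns:td[*] has nothing to select.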
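One caveat: td[2] always picks the second cell of the row, so in a row that holds two label/value pairs (like the Author / Type row) a lookup for "Type:" would give you the author instead. Selecting the cell right after the label, for example with .//dns:td[normalize-space(.) = 'Type:']/following-sibling::dns:td[1], should avoid that, and the exact match also keeps "Type:" from matching "Article Type:" or "Retrieve Type:". I have not run this in KNIME, so treat it as a suggestion. If you want to check the logic outside KNIME, here is a minimal Python sketch of the same idea. It uses only the standard library, and because ElementTree supports just a subset of XPath it walks the rows by hand rather than evaluating the expression above. The sample markup is a cut-down, namespace-free copy of your rows, and the helper name value_for is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: a cut-down copy of the rows from the question,
# wrapped in <table> so it parses as well-formed XML (no namespace).
SAMPLE = """
<table>
  <tr>
    <td align="right" valign="top" nowrap="nowrap">Database Name: </td>
    <td>Social Science Database</td>
  </tr>
  <tr>
    <td align="right" valign="top" nowrap="nowrap">Author: </td>
    <td>James H.; Chaomei C.</td>
    <td align="right" valign="top" nowrap="nowrap">Type: </td>
    <td>Journal</td>
  </tr>
</table>
"""


def value_for(root, label):
    """Return the text of the <td> that directly follows the <td> whose
    whitespace-trimmed text equals `label`, or None if there is no match."""
    for tr in root.iter("tr"):
        cells = tr.findall("td")
        # Skip the last cell: it has no following sibling to return.
        for i, cell in enumerate(cells[:-1]):
            if (cell.text or "").strip() == label:
                return (cells[i + 1].text or "").strip()
    return None


root = ET.fromstring(SAMPLE)
print(value_for(root, "Database Name:"))  # Social Science Database
print(value_for(root, "Type:"))           # Journal
```

If the real pages are in the XHTML namespace (which is what the dns: prefix suggests), the tag names in the sketch would need the namespace added as well.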