Two years ago I asked the question
@Skin provided the answer and it continues to work flawlessly.
Recently however, I started a project to extract table data from some webpages. I created a new function and copied the code from the from the first function. The code that @Skin provided was looking for span
elements with an attribute of class="value"
in order to identify and spit the data.
I was able to modify the code to work with one of the tables I need to convert by removing one of the variables and changing the line var values = xmlRow.SelectNodes(".//span[@class]");
to var values = xmlRow.SelectNodes(".//td");
.
This worked for the table since it only had two values for each data set, Name and Start Date. Using a Compose Action I reworked the table to be very simple:
<table>
<tbody>
<tr>
<td>John Dodo 1</td>
<td>Some Date 1</td>
</tr>
<tr>
<td>John Dodo 2</td>
<td>Some Date 2</td>
</tr>
<tr>
<td>John Dodo 3</td>
<td>Some Date 3</td>
</tr>
</tbody>
</table>
The returned JSON is
{
"name": "John Dodo 1",
"startDate": "Some Date 1"
},
{
"name": "John Dodo 2",
"startDate": "Some Date 2"
},
{
"name": "John Dodo 3",
"startDate": "Some Date 3"
}
]
However, When a table has three values like Name, Company, Address, and I add the appropriate variable to the code, I get a server error and I can't figure out why. I could try and modify the table to add span
elements class="value"
attributes to match the table that @Skin helped with, but I would like to understand why its won't work.
Here is the modified code that won't work:
#r "Newtonsoft.Json"
using System.Net;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Extensions.Primitives;
using System.Collections.Generic;
using Newtonsoft.Json;
using System.Xml;
public class TableData
{
public string PmRep { get; set; }
public string PmComp { get; set; }
public string PmAddr { get; set; }
}
public static async Task<IActionResult> Run(HttpRequest req, ILogger log)
{
var outputTable = new List<TableData>();
string requestBody = String.Empty;
using (StreamReader streamReader = new StreamReader(req.Body))
{
requestBody = await streamReader.ReadToEndAsync();
}
dynamic data = JsonConvert.DeserializeObject(requestBody);
string xmlString = System.Text.Encoding.UTF8.GetString(Convert.FromBase64String((string)data?.Content));;
var xmlDocument = new XmlDocument();
xmlDocument.LoadXml(xmlString);
var xmlRows = xmlDocument.DocumentElement.SelectNodes("//tr");
foreach (XmlNode xmlRow in xmlRows)
{
var values = xmlRow.SelectNodes(".//td");
if (values.Count > 0)
{
var row = new TableData()
{
PmRep = values[0].InnerText,
PmComp = values[1].InnerText
PmAddr = values[2].InnerText
};
outputTable.Add(row);
}
}
return new OkObjectResult(outputTable);
}
and here is the sample table:
<table>
<tbody>
<tr>
<td>Name 1</td>
<td>Comp 1</td>
<td>Address 1</td>
</tr>
<tr>
<td>Name 2</td>
<td>Comp 2</td>
<td>Address 2</td>
</tr>
<tr>
<td>Name 3</td>
<td>Comp 3</td>
<td>Address 3</td>
</tr>
</tbody>
</table>
You can simply use Azure Logic Apps itself as an alternative to azure function and, by using integration account and inline java script action:
Input:
<table>
<tbody>
<tr>
<td>Name</td>
<td>Company</td>
<td>Address</td>
</tr>
<tr>
<td>Riithwik</td>
<td>TextCom</td>
<td>Address x</td>
</tr>
<tr>
<td>Bojja</td>
<td>Test Com 2</td>
<td>Address y</td>
</tr>
<tr>
<td>Chotu</td>
<td>Test Com 3</td>
<td>Address z</td>
</tr>
</tbody>
</table>
Design:
Then:
Inline Script:
var rithinput = workflowContext.actions.Compose.outputs;
var rithtest = new RegExp("<tr>(.*?)</tr>", "gs");
var rithm = rithinput.match(rithtest);
var rithout = [];
var headerRowMatches = rithm[0].match(/<td>(.*?)<\/td>/gs);
var columnHeaders = headerRowMatches.map(cell => cell.replace(/<\/?td>/g, '').trim());
for (var i = 1; i < rithm.length; i++) {
var row = rithm[i];
var rb = row.match(/<td>(.*?)<\/td>/gs);
var rowData1 = {};
for (var j = 0; j < rb.length; j++) {
var cellData1 = rb[j].replace(/<\/?td>/g, '').trim();
rowData1[columnHeaders[j]] = cellData1;
}
rithout.push(rowData1);
}
return rithout;
Output:
This design works with 2 or 3 or n columns in a row, first row should be column names in input.