I have a scraped dataset that contains a column of data like below:
<td>1,968</td>
<td>185</td>
<td>1,285<sup id="cite_ref-4" class="reference"><a href="#cite_note-4">[4]</a></sup></td>
I am using Alteryx to process the data, and I want to use regex to extract the number between the HTML tags <td> and </td>. So in the above case, I should get back 1968, 185 and 1285. I tried the following regular expressions, but neither worked using this tester. I believe the regex flavor should be R for Alteryx, but I am not sure.
>([0-9]+)<
>[0-9]+<
Can someone please shed some light on this? Thanks!
An alternate Alteryx approach: use a Formula tool to remove <td> as well as commas and spaces, then use a Select tool to cast what remains to the numeric type of your choice. The cast will automatically take everything up to the first non-numeric character.
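For what it's worth, the two patterns in the question most likely fail on the first and third rows because [0-9]+ cannot match the comma in 1,968 or 1,285, so there is never an unbroken run of digits between > and <. Below is a minimal Python sketch of both routes: a regex that tolerates commas, and an emulation of the strip-then-cast idea above. The names rows, TD_NUMBER, extract_with_regex and extract_by_stripping are made up for illustration, and the "keep the leading digits" step only approximates the cast behaviour described above; it has not been checked against Alteryx itself.

```python
import re

# Sample cells copied from the question.
rows = [
    "<td>1,968</td>",
    "<td>185</td>",
    '<td>1,285<sup id="cite_ref-4" class="reference"><a href="#cite_note-4">[4]</a></sup></td>',
]

# Regex route: capture digits AND commas right after <td>, then drop the commas.
TD_NUMBER = re.compile(r"<td>\s*([0-9][0-9,]*)")


def extract_with_regex(cell):
    """Return the number following <td> as an int, or None if there is none."""
    match = TD_NUMBER.search(cell)
    if match is None:
        return None
    return int(match.group(1).replace(",", ""))


def extract_by_stripping(cell):
    """Remove <td>, commas and spaces, then keep only the leading digits.

    The final step stands in for the numeric cast described above (assumption).
    """
    cleaned = cell.replace("<td>", "").replace(",", "").replace(" ", "")
    leading = re.match(r"[0-9]+", cleaned)
    return int(leading.group(0)) if leading else None


regex_values = [extract_with_regex(cell) for cell in rows]
stripped_values = [extract_by_stripping(cell) for cell in rows]
print(regex_values)
print(stripped_values)
```

Both lists come out as 1968, 185 and 1285 for the sample rows. The pattern <td>\s*([0-9][0-9,]*) uses only basic syntax, so it should carry over to other regex flavors, though that is untested here.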