I'm using Visual Web Ripper to extract the names and prices of products on a website.
When I extract the price from a table, it comes in a form like this:
Kr. 129,30
I need to extract the 129,30 and then turn the comma into a dot (129.30).
Visual Web Ripper can use scripts to modify the extracted content. It can use standard Regex, C# and VB.NET.
In the Regex tab I have found that
(\d+.)?(\d+)(.\d+)?
gives me 129,30, but then I can't change the comma into a dot.
Therefore I have to use C#. It comes with this standard script:
using System;
using VisualWebRipper.Internal.SimpleHtmlParser;
using VisualWebRipper;

public class Script
{
    //See help for a definition of WrContentTransformationArguments.
    public static string TransformContent(WrContentTransformationArguments args)
    {
        try
        {
            //Place your transformation code here.
            //This example just returns the input data
            return args.Content;
        }
        catch(Exception exp)
        {
            //Place error handling here
            args.WriteDebug("Custom script error: " + exp.Message);
            return "Custom script error";
        }
    }
}
How do I modify it to extract the number and then replace the comma with a dot?
This is obviously Krona, so we should use the Swedish culture info to translate it. First we start with the input:
var original = "Kr. 129,30";
Get the culture:
using System.Globalization;
var culture = CultureInfo.GetCultureInfo("sv-SE");
This culture expects the currency string to be kr (case insensitive), but we have Kr. So let's update it:
var format = (NumberFormatInfo)culture.NumberFormat.Clone();
format.CurrencySymbol = "Kr.";
And now the culture aware parse:
var number = Decimal.Parse(original, NumberStyles.Currency, format);
Now number contains a decimal that has been parsed correctly.
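To get back to the string with a dot that the question asks for, the parsed decimal can be formatted with the invariant culture. Here is a minimal sketch that puts the steps above together. PriceConverter and ToInvariantPrice are made-up names for illustration, not part of Visual Web Ripper, and the sketch assumes that sv-SE culture data is available on the machine and that the scraped text always looks like "Kr. 129,30".

```csharp
using System;
using System.Globalization;

// Hypothetical helper (not part of Visual Web Ripper).
public static class PriceConverter
{
    public static string ToInvariantPrice(string content)
    {
        // Clone the Swedish number format so it can be modified;
        // it uses "," as the decimal separator.
        var format = (NumberFormatInfo)CultureInfo
            .GetCultureInfo("sv-SE").NumberFormat.Clone();

        // The scraped text uses "Kr." instead of the culture's default symbol.
        format.CurrencySymbol = "Kr.";

        // Culture-aware parse of e.g. "Kr. 129,30" into a decimal.
        var number = decimal.Parse(content.Trim(), NumberStyles.Currency, format);

        // "F2" with the invariant culture writes two decimals and a "." separator.
        return number.ToString("F2", CultureInfo.InvariantCulture);
    }
}
```

Inside the standard script, the line return args.Content; in the try block would then become return PriceConverter.ToInvariantPrice(args.Content); (with the helper pasted into the same script). With the input "Kr. 129,30" this returns "129.30". If the text cannot be parsed, decimal.Parse throws a FormatException, which the existing catch block already handles.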