The initial string can be in many different formats. It may or may not contain a dollar sign, it can contain words like 'bn' (meaning billion), thousand(s), million(s) or billion(s), and it can have decimal points and thousand separators (commas). In addition, the string can contain extra text that is not directly connected to the value.
I'd like this string to be converted to an integer.
For example, the string US$ 2,864.773 million in 2014
should be converted to the integer 2864773000.
If it is possible to detect the currency name as well, that would be just perfect!
Are there any off-the-shelf solutions, like a PHP class or something similar?
I would use a regular expression, something like:
(\w*\$)\s*([0-9,.]+)\s*(thousand|million|billion|bn)?
which captures both the currency and the value. In PHP:
if (preg_match('/(\w*\$)\s*([0-9,.]+)\s*(thousand|million|billion|bn)?/i', $input, $matches)) {
    $currency = $matches[1];                     // e.g. "US$"
    $value = str_replace(',', '', $matches[2]);  // strip thousand separators
    $multiplier = null;
    // Group 3 is missing from $matches when no multiplier word was found
    if (isset($matches[3])) {
        $multiplier = $matches[3];
    }
}
Explaining the regex a bit:
(\w*\$) captures the currency prefix and dollar sign, e.g. US$
\s* allows for any whitespace between the currency and the value
([0-9,.]+) captures the value
(thousand|million|billion|bn)? captures thousand/million/billion/bn; the trailing ? makes it optional
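To get from the captured pieces to the integer the question asks for, the multiplier word still has to be applied to the value. Here is a minimal sketch of that last step. The function name parseMoney and the multiplier table are my own assumptions, not part of any library, and it assumes commas are thousand separators, the dot is the decimal point, and a 64-bit PHP build so that billions fit in an int.

```php
<?php
// Hypothetical helper (name and return shape are assumptions for illustration).
// Returns ['currency' => string, 'amount' => int], or null when nothing matches.
function parseMoney(string $input): ?array
{
    // Assumed mapping from multiplier word to factor; extend as needed.
    $multipliers = [
        'thousand' => 1000,
        'million'  => 1000000,
        'billion'  => 1000000000,
        'bn'       => 1000000000,
    ];

    $pattern = '/(\w*\$)\s*([0-9,.]+)\s*(thousand|million|billion|bn)?/i';
    if (!preg_match($pattern, $input, $matches)) {
        return null;
    }

    // Assumes "," is a thousand separator and "." is the decimal point.
    $number = (float) str_replace(',', '', $matches[2]);

    // Group 3 is only present when a multiplier word was matched.
    $factor = isset($matches[3]) ? $multipliers[strtolower($matches[3])] : 1;

    return [
        'currency' => $matches[1],
        // round() guards against float error before the cast to int.
        'amount'   => (int) round($number * $factor),
    ];
}

$result = parseMoney('US$ 2,864.773 million in 2014');
// $result['currency'] is "US$" and $result['amount'] is 2864773000
```

Note that this still requires a dollar sign to be present, because the pattern does. Strings without any currency symbol would need a looser first group.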