I have a set of sentences giving conversion ratios, such as
All of these sentences give ratios of the fictional currency something (SMTH) to the fictional unit ∫ (INTEGRAL). I need some way of extracting the conversion ratio between these two units. The difficulty is that numbers can be formatted in different ways (10,000 or 10000 or 10k), units can be written differently (something, SMTH, with varying capitalization), the order of the units varies ("x SMTH for ∫x" or "∫x for x SMTH"), and the ∫ sign sometimes comes before the number and sometimes after it (∫x or x∫).
TL;DR: I need to turn the above strings into mathematical relationships while coping with many different formats.
I know this is a lot to ask and it is quite complicated. If a similar question already exists, I would gladly look at it.
What language, you ask? Preferably PHP or JS, but pseudo-code is a good start.
EDIT:
    // getSentence() is a placeholder for however the sentence is obtained
    var val = getSentence(),
        integral,
        something;
    // regexes with the g flag replace every occurrence, not only the first
    val = val.replace(/,/g, "").replace(/k /g, "000 ").replace(/m /g, "000000 ").replace(/ ?million /g, "000000 ").replace(/ something/gi, "SMTH").replace(/ smth/gi, "SMTH");
    var words = val.split(" ");
    for (var i = 0; i < words.length; i++) {
        if (words[i].indexOf("∫") !== -1) {
            integral = words[i].replace("∫", "");
        } else if (words[i].indexOf("SMTH") !== -1) {
            something = words[i].replace("SMTH", "");
        }
    }
(Simplified JavaScript/pseudo-code)
All of your examples separate the two sides of the conversion with the word "for", so there aren't that many combinations. What you can do is keep a list of words that identify each currency and a regular expression that matches numbers; each phrase then has a left side and a right side separated by "for". To process each phrase, you would execute the following pseudo-code:
    for each word:
        if it's a known currency identifier
            Store what is the currency
        else if it's a number
            Store the value
        else if it's the "for" word
            Change side
        end if
    end for
After this loop finishes, you'll have a data structure holding the currency and the amount on each side.
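The loop described here could look roughly like this in JavaScript. Treat it as a sketch under assumptions, not a finished parser: the word lists, the multiplier suffixes, the regular expression, and the function names `parseSentence` and `smthPerIntegral` are all made up for illustration, and the test sentences are invented examples. It also assumes that "for" appears exactly once, as the separator.

```javascript
// Assumed spellings; extend these lists to match the real data.
const SMTH_WORDS = new Set(["smth", "something", "somethings"]);
const INTEGRAL_WORDS = new Set(["∫", "integral", "integrals"]);
// Assumed suffixes: "10k", "2m", "2 million".
const MULTIPLIERS = new Map([["k", 1e3], ["m", 1e6], ["million", 1e6]]);

// A number with optional thousands separators and decimals, an optional
// multiplier suffix, and an optional ∫ before or after it.
const NUMBER_RE = /^(∫?)(\d[\d,]*(?:\.\d+)?)(k|m|million)?(∫?)$/;

// Returns [left, right]: the unit and amount found on each side of "for".
function parseSentence(sentence) {
  const sides = [{}, {}];
  let side = 0;
  for (const raw of sentence.toLowerCase().split(/\s+/)) {
    const word = raw.replace(/[.,!?;:]+$/, ""); // drop trailing punctuation
    const current = sides[side];
    const match = NUMBER_RE.exec(word);
    if (word === "for") {
      side = 1; // change side
    } else if (SMTH_WORDS.has(word)) {
      current.unit = "SMTH";
    } else if (INTEGRAL_WORDS.has(word)) {
      current.unit = "INTEGRAL";
    } else if (MULTIPLIERS.has(word) && current.amount !== undefined) {
      current.amount *= MULTIPLIERS.get(word); // "2 million" as two words
    } else if (match) {
      const multiplier = MULTIPLIERS.get(match[3]) || 1;
      current.amount = parseFloat(match[2].replace(/,/g, "")) * multiplier;
      if (match[1] || match[4]) current.unit = "INTEGRAL"; // ∫x or x∫
    }
    // any other word is ignored
  }
  return sides;
}

// SMTH per one ∫, or null if either side is missing a unit or an amount.
function smthPerIntegral(sentence) {
  const [a, b] = parseSentence(sentence);
  const smth = a.unit === "SMTH" ? a : b;
  const integral = a.unit === "INTEGRAL" ? a : b;
  if (smth.unit !== "SMTH" || integral.unit !== "INTEGRAL") return null;
  if (!smth.amount || !integral.amount) return null;
  return smth.amount / integral.amount;
}

console.log(smthPerIntegral("10,000 SMTH for ∫5"));             // 2000
console.log(smthPerIntegral("5∫ for 10k something"));           // 2000
console.log(smthPerIntegral("Selling 2 million Smth for ∫400")); // 5000
```

Because each side records its own unit, the order of the two sides does not matter when the ratio is computed. A sentence that uses "for" in another sense (for example "looking for ...") would need extra handling.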