javascript php regex parsing string-parsing

Parse sentences and extract conversion values


I have a set of sentences giving conversion ratios, such as

  • 10,000 something for ∫1
  • ∫1 for 10k SMTH
  • 1200 Something for ∫0.1
  • Selling 3000 Smth for 3∫

All of these sentences give a ratio between the fictional currency something (SMTH) and the fictional unit ∫ (INTEGRAL). I need some way of extracting the conversion ratio between these two units. The difficulty is that numbers can be formatted in different ways (10,000, 10000, or 10k), units can be written differently (something, SMTH, and with varying capitalization), the order of the units varies ("x SMTH for ∫x" or "∫x for x SMTH"), and the ∫ sign is sometimes written before the number (∫x) and sometimes after it (x∫).

TL;DR: Somehow turn the strings above into mathematical relationships while handling the many different formats.

I know this is a lot to ask and it is quite complicated. If there is a similar question out already, I would gladly look at it.

What language, you ask? Preferably PHP or JS, but pseudo-code is a good start.

EDIT:

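var val = sentence.toLowerCase(), // `sentence` holds the input string
    integral,
    something;
// Remove thousands separators and expand the k / m / million suffixes.
val = val.replace(/,/g, "")
         .replace(/(\d)\s*k\b/g, function (m, d) { return d + "000"; })
         .replace(/(\d)\s*(?:million|m)\b/g, function (m, d) { return d + "000000"; })
         // Glue the unit onto its number: "10000 something" -> "10000SMTH".
         .replace(/\s*\b(?:something|smth)\b/g, "SMTH");
var words = val.split(" ");
for (var i = 0; i < words.length; i++) {
  if (words[i].indexOf("∫") !== -1) { // originally tested for "$", which never matched
    integral = parseFloat(words[i].replace("∫", ""));
  } else if (words[i].indexOf("SMTH") !== -1) {
    something = parseFloat(words[i].replace("SMTH", ""));
  }
}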

Simplified JavaScript/pseudo-code


Solution

  • All of your examples separate the two sides of the conversion with the word "for", so there aren't that many combinations. Keep a list of words that identify each currency and a regular expression that matches numbers; each phrase then has a left side and a right side separated by "for". To process a phrase, you would execute the following pseudo-code:

    for each word:
        if it's a known currency identifier
            Store the currency for the current side
        else if it's a number
            Store the value for the current side
        else if it's the "for" word
            Switch to the other side
        end if
    end for
    

    After the loop finishes, you'll have a data structure recording which currency and what amount appear on each side. Dividing one side's amount by the other's gives the conversion ratio.
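
The pseudo-code above could be sketched in JavaScript roughly as follows. This is only a minimal sketch under a few assumptions: the alias table, the suffix multipliers, and the function names (parseAmount, parseOffer, smthPerIntegral) are all made up for illustration, every sentence is assumed to contain exactly one "for", and tokens with trailing punctuation (such as "smth.") are not handled.

```javascript
// Hypothetical alias table: every spelling that identifies a currency.
const CURRENCY_ALIASES = new Map([
  ["something", "SMTH"],
  ["smth", "SMTH"],
  ["integral", "INTEGRAL"],
  ["∫", "INTEGRAL"],
]);

// Assumed suffix multipliers, so "10k" means 10 * 1000.
const SUFFIXES = { k: 1e3, m: 1e6 };

// Convert "10,000", "10k" or "0.1" to a number; NaN if the token is not a number.
function parseAmount(token) {
  const match = /^(\d+(?:\.\d+)?)([km])?$/.exec(token.replace(/,/g, ""));
  if (!match) return NaN;
  return parseFloat(match[1]) * (match[2] ? SUFFIXES[match[2]] : 1);
}

// Walk the words once, filling in the left side until "for", then the right side.
function parseOffer(sentence) {
  const tokens = sentence
    .toLowerCase()
    .replace(/(\d)\s+million\b/g, "$1m") // "2 million" -> "2m"
    .replace(/∫/g, " ∫ ")                // "∫1" and "3∫" -> separate tokens
    .split(/\s+/)
    .filter(Boolean);

  const sides = [
    { currency: null, amount: null },
    { currency: null, amount: null },
  ];
  let side = 0;
  for (const token of tokens) {
    if (CURRENCY_ALIASES.has(token)) {
      sides[side].currency = CURRENCY_ALIASES.get(token);
    } else if (!Number.isNaN(parseAmount(token))) {
      sides[side].amount = parseAmount(token);
    } else if (token === "for") {
      side = 1; // everything after "for" belongs to the right side
    }
    // Any other word ("selling", ...) is ignored.
  }
  return sides;
}

// How many SMTH one ∫ buys, or null if the sentence could not be understood.
function smthPerIntegral(sentence) {
  const sides = parseOffer(sentence);
  const smth = sides.find((s) => s.currency === "SMTH");
  const integral = sides.find((s) => s.currency === "INTEGRAL");
  if (!smth || !integral || !smth.amount || !integral.amount) return null;
  return smth.amount / integral.amount;
}

console.log(smthPerIntegral("Selling 3000 Smth for 3∫"));
```

Splitting the ∫ sign away from its number before tokenizing is what lets "∫1" and "3∫" go through the same loop. If your sentences use separators other than "for", or more than two currencies, the side-switching logic would need to be extended.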