asp.net, arrays, regex, string, str-replace

Correctly extracting strings from a sentence


I've been working on this problem for a while now, and I hope someone can help me resolve it.

I have a sentence that contains key information that I would like extracted.

Once I extract these strings, I will save them in an array and then compare them against the values stored in my database.

What I'm having trouble with is extracting the strings correctly.

For example, I have this sentence:

These are a list of automobiles along with their production dates: AC 3000ME (1979-1984), AC Shelby-Cobra (1961-2004), AC Frua (1965-1973).

I would like to extract 3000ME, Shelby-Cobra, and Frua.
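For the comparison step described earlier, a set lookup is one option once the names are extracted. This is a minimal sketch, assuming the database values have already been loaded into memory; knownModels and its contents are hypothetical placeholders for the CarTable data, not real query results. It uses top-level statements, so .NET 6 or later is assumed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// The names the question wants to extract from the sentence.
string[] extracted = { "3000ME", "Shelby-Cobra", "Frua" };

// Hypothetical stand-in for the model names stored in the database.
// OrdinalIgnoreCase makes the lookup case-insensitive ("shelby-cobra" matches "Shelby-Cobra").
var knownModels = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
{
    "Ace", "Aceca", "3000ME", "shelby-cobra"
};

// Keep only the extracted names that exist in the known set.
List<string> found = extracted.Where(name => knownModels.Contains(name)).ToList();

// Prints: 3000ME, Shelby-Cobra
Console.WriteLine(string.Join(", ", found));
```

With EF Core the same idea is usually written as Where(x => names.Contains(x.CarModelName)), which is translated to a SQL IN clause; whether that comparison is case-insensitive then depends on the database collation.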

Here is my current code:

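    using System.Linq;
    using Microsoft.EntityFrameworkCore;

    public string CarModelMethod()
    {
        string sentence = "These are a list of automobiles along with their production dates: AC 3000ME (1979-1984), AC Shelby-Cobra (1961-2004), AC Frua (1965-1973)";

        // Splitting on ',' leaves the surrounding text in each piece, e.g.
        // "These are a list ... AC 3000ME (1979-1984)", so LIKE '%piece%' never matches a model name.
        string[] array = sentence.Split(',');

        foreach (var item in array)
        {
            // Load the matching IDs into memory before adding new entities.
            var carIds = _context.CarTable
                .Where(x => EF.Functions.Like(x.CarModelName, $"%{item}%"))
                .Select(x => x.CarId)
                .ToList();

            foreach (var id in carIds)
            {
                // Create a new entity per match; re-adding one shared instance
                // does not give one row per match.
                _context.CarModel.Add(new CarModel { CarId = id });
            }
        }

        // Save all inserts in one call.
        _context.SaveChanges();
        return null;
    }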
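A quick way to see why the lookup finds nothing is to print what Split(',') returns. This minimal console sketch (top-level statements, .NET 6 or later assumed) uses the same sentence as the code above; the LIKE pattern is then built from these whole pieces rather than from the bare model names.

```csharp
using System;

// Same sample sentence as in the question.
string sentence = "These are a list of automobiles along with their production dates: AC 3000ME (1979-1984), AC Shelby-Cobra (1961-2004), AC Frua (1965-1973)";

// Split(',') only cuts at the commas; each piece keeps the intro text,
// the "AC " prefix, the production years and any leading space.
string[] pieces = sentence.Split(',');

foreach (string piece in pieces)
{
    // Brackets make the leading spaces visible. Prints:
    // [These are a list of automobiles along with their production dates: AC 3000ME (1979-1984)]
    // [ AC Shelby-Cobra (1961-2004)]
    // [ AC Frua (1965-1973)]
    Console.WriteLine($"[{piece}]");
}
```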

Solution

  • You may use a regular expression that captures the text between AC and the next opening parenthesis:

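    using System.Linq;
    using System.Text.RegularExpressions;

    // results holds "3000ME", "Shelby-Cobra", "Frua" for the question's sentence
    var results = Regex.Matches(sentence, @"\bAC\s+(.*?)\s+\(")
        .Cast<Match>()
        .Select(x => x.Groups[1].Value)
        .ToList();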
    

    Details

    • \b - word boundary
    • AC - the literal text AC
    • \s+ - one or more whitespace characters
    • (.*?) - Group 1 (.Groups[1].Value holds this submatch): zero or more characters other than a newline, as few as possible (*? is a lazy quantifier)
    • \s+ - one or more whitespace characters
    • \( - a literal ( character

    [Screenshot: regex demo]
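The answer's pattern can be tried end to end in a small console program. The sketch below (top-level statements, .NET 6 or later assumed; Extract is a hypothetical helper name) runs it on the question's sentence and, for comparison, runs a greedy variant to show why the lazy *? matters.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string sentence = "These are a list of automobiles along with their production dates: AC 3000ME (1979-1984), AC Shelby-Cobra (1961-2004), AC Frua (1965-1973)";

// Hypothetical helper: returns the Group 1 value of every match of the pattern.
static string[] Extract(string input, string pattern) =>
    Regex.Matches(input, pattern)
        .Cast<Match>()
        .Select(m => m.Groups[1].Value)
        .ToArray();

// Lazy (.*?): each match stops at the first whitespace + "(" after "AC ".
string[] lazy = Extract(sentence, @"\bAC\s+(.*?)\s+\(");

// Greedy (.*): a single match that runs to the last whitespace + "(" in the string.
string[] greedy = Extract(sentence, @"\bAC\s+(.*)\s+\(");

// Prints: 3000ME | Shelby-Cobra | Frua
Console.WriteLine(string.Join(" | ", lazy));

// Prints: 3000ME (1979-1984), AC Shelby-Cobra (1961-2004), AC Frua
Console.WriteLine(string.Join(" | ", greedy));
```

On .NET Core 2.0 and later MatchCollection implements IEnumerable<Match>, so the Cast<Match>() call should be optional there; it is kept here so the same code also works on .NET Framework.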