Tags: c#, algorithm, linguistics

Create short human-readable string from longer string


I have a requirement to contract a string such as...

"Would you consider becoming a robot? You would be provided with a free annual oil change."

...to something much shorter but still recognisable to a human. It will need to be found in a select list; my current solution has users entering an arbitrary title for the sole purpose of selection.

I would like to extract only the portion of the string that forms a question (if possible) and then somehow reduce it to something like:

WouldConsiderBecomingRobot
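A minimal sketch of the first step, pulling out the question sentence. It assumes a "sentence" is any run of text containing no `.`, `!` or `?` that ends in `?`, and falls back to the whole input when there is no question mark. `QuestionExtractor` and `ExtractQuestion` are hypothetical names, and abbreviations such as "e.g." will fool this simple pattern.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuestionExtractor
{
    // Hypothetical helper: returns the first sentence that ends in '?',
    // or the whole (trimmed) input when the text contains no '?'.
    public static string ExtractQuestion(string text)
    {
        // Assumption: a sentence is a run of characters with no '.', '!' or '?',
        // immediately followed by a '?'.
        Match m = Regex.Match(text, @"[^.!?]*\?");
        return m.Success ? m.Value.Trim() : text.Trim();
    }
}
```

For the example above this returns `Would you consider becoming a robot?`, and the remaining words can then be reduced to a key.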

Are there any grammatical algorithms out there that might help me with this? I'm thinking there might be something that allows me to pick out just the verbs and nouns.

As this only has to act as a key, it doesn't have to be perfect; I'm not trying to trivialise the inherent complexity of the English language.
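Properly picking out verbs and nouns needs a part-of-speech tagger, which the sketch below is not. As a much cruder stand-in, it drops a short hard-coded list of function words and joins what is left in PascalCase. The stop-word list, `KeyBuilder` and `ToKey` are assumptions for illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class KeyBuilder
{
    // Hypothetical, deliberately tiny stop-word list (articles, some pronouns,
    // prepositions and conjunctions); a usable list would be much longer.
    private static readonly HashSet<string> StopWords = new HashSet<string>(
        new[] { "a", "an", "the", "you", "i", "we", "it", "to", "of", "in", "on", "for", "and", "or", "with" },
        StringComparer.OrdinalIgnoreCase);

    // Drops stop words and joins the remaining words, in their original order, as PascalCase.
    public static string ToKey(string sentence)
    {
        IEnumerable<string> words = sentence
            .Replace("'", "") // "You've" -> "Youve"
            .Split(new[] { ' ', ',', '.', '?', '!', ';', ':', '"' }, StringSplitOptions.RemoveEmptyEntries)
            .Where(w => !StopWords.Contains(w));

        // Upper-case the first letter of each kept word and concatenate.
        return string.Concat(words.Select(w => char.ToUpperInvariant(w[0]) + w.Substring(1)));
    }
}
```

`ToKey("Would you consider becoming a robot?")` yields `WouldConsiderBecomingRobot`, but only because "you" and "a" happen to be on the list; nothing here limits the key's length.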


Solution

  • I ended up creating the following extension method, which works surprisingly well. Thanks to Joe Blow for his excellent and effective suggestions:

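        using System;
        using System.Collections.Generic;
        using System.Linq;

        public static class StringExtensions
        {
            public static string Contract(this string e, int maxLength)
            {
                if (string.IsNullOrEmpty(e)) return e;

                // If the text contains a question, keep only that sentence: from just
                // after the preceding '.' (if any) up to and including the first '?'.
                // Otherwise the whole string is used.
                string question = e;
                int questionMarkIndex = e.IndexOf('?');
                if (questionMarkIndex != -1)
                {
                    int lastPeriodIndex = questionMarkIndex > 0
                        ? e.LastIndexOf('.', questionMarkIndex - 1)
                        : -1;
                    int start = lastPeriodIndex + 1; // 0 when no '.' precedes the '?'
                    question = e.Substring(start, questionMarkIndex - start + 1);
                }

                var punctuation =
                    new[] { ",", ".", "!", "?", ";", ":", "/", "...", "...,", "-,", "(", ")", "{", "}", "[", "]", "'", "\"" };

                question = punctuation.Aggregate(question.Trim(), (current, t) => current.Replace(t, ""));

                // Distinct words in sentence order; picked[i] records whether
                // words[i] is already part of the key.
                List<string> words = question
                    .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
                    .Distinct()
                    .ToList();
                var picked = new bool[words.Count];

                string mash = string.Empty;
                while (picked.Contains(false) && mash.Length < maxLength)
                {
                    // Pick the longest word not yet used; on ties, the last one.
                    int best = -1;
                    for (int i = 0; i < words.Count; i++)
                        if (!picked[i] && (best == -1 || words[i].Length >= words[best].Length))
                            best = i;
                    picked[best] = true;

                    // Rebuild the key from the picked words in sentence order,
                    // upper-casing the first letter of each.
                    mash = string.Concat(words
                        .Where((w, i) => picked[i])
                        .Select(w => char.ToUpperInvariant(w[0]) + w.Substring(1)));
                }

                return mash;
            }
        }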
    

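    With maxLength set to 15, this contracts the following (the loop stops as soon as the key reaches 15 characters, so a result can run slightly over):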

    • "This does not have any prereqs - write an essay..." → PrereqsWriteEssay
    • "You've selected a car" → YouveSelectedCar