
Separate words according to delimiter


I have a search box that supports two kinds of search. One searches all of the content in a table (terms separated by spaces). The other searches a specific field in the table (field name and value separated by a colon).

The problem is that both kinds of search can appear in the same query. Examples:

  1. Type:Non-Fiction Murder
  2. Non ISBN:000000001
  3. Fiction ISBN:02 Plane

In example 1, Type is the field name, Non-Fiction is its content, and Murder is content that can match any field. I am looking for a Regex.Split that puts each field:value pair into a Dictionary and every other term into an array.

I have managed to make each kind of search work on its own, but not when they are mixed:

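    // Field search: even-indexed pieces around ':' are treated as names, odd-indexed pieces as contents
    var columnSearch_FieldNames = inSearch.ToUpper().Trim().Split(':').Where((x, i) => i % 2 == 0).ToArray();
    var columnSearch_FieldContent = inSearch.ToUpper().Trim().Split(':').Where((x, i) => i % 2 != 0).ToArray();
    // Ad hoc search: every space-delimited term
    var adhocSearch_FieldContent = inSearch.ToUpper().Trim().Split(' ');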

Example 4: Type:Non-Fiction Murder Non ISBN:000000001 Kill

Expected output for example 4:
  Dictionary: {Type, Non-Fiction}, {ISBN, 000000001}
  Array: {Murder, Non, Kill}
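To see why the two separate approaches break on a mixed query, the sketch below wraps the question's own three expressions in a hypothetical MixedQueryDemo class so they can be run on example 4.

```csharp
using System.Linq;

// Hypothetical wrapper around the question's expressions, for inspection only.
static class MixedQueryDemo
{
    // Pieces at even positions after splitting on ':' (intended as field names).
    public static string[] FieldNames(string inSearch) =>
        inSearch.ToUpper().Trim().Split(':').Where((x, i) => i % 2 == 0).ToArray();

    // Pieces at odd positions after splitting on ':' (intended as field contents).
    public static string[] FieldContents(string inSearch) =>
        inSearch.ToUpper().Trim().Split(':').Where((x, i) => i % 2 != 0).ToArray();

    // Every space-delimited term.
    public static string[] AdhocTerms(string inSearch) =>
        inSearch.ToUpper().Trim().Split(' ');
}
```

On example 4, Split(':') yields only three pieces (TYPE, NON-FICTION MURDER NON ISBN, and 000000001 KILL), so the "field names" come out as TYPE and 000000001 KILL. Split(' ') keeps TYPE:NON-FICTION and ISBN:000000001 as single terms. Neither split alone can separate the two kinds of term.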


Solution

I don't see why using Regex would be faster. And IMHO, using Regex doesn't improve the readability or maintainability of the code; if anything, it makes it more complicated. But if you really want to use Regex.Split(), something like this would work:

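    using System;
    using System.Collections.Generic;
    using System.Text.RegularExpressions;

    class Program
    {
        static void Main(string[] args)
        {
            string input = "Type:Non-Fiction Murder Non ISBN:000000001 Kill", key = null, value = null;
            Dictionary<string, string> namedFields = new Dictionary<string, string>();
            List<string> anyField = new List<string>();
            // The capturing groups make Regex.Split keep each delimiter as its own token,
            // so the loop can tell a field separator (":") from a term separator (" ").
            Regex regex = new Regex("( )|(:)", RegexOptions.Compiled);

            foreach (string field in regex.Split(input))
            {
                switch (field)
                {
                    case " ":
                        // End of a term: store it as a field:value pair or as a lone term.
                        _AddParameter(ref key, ref value, namedFields, anyField);
                        break;
                    case ":":
                        // The token just read was a field name, not a value.
                        key = value;
                        value = null;
                        break;
                    case "":
                        // Repeated, leading or trailing spaces produce empty tokens; skip them.
                        break;
                    default:
                        value = field;
                        break;
                }
            }
            // The last term has no trailing space, so flush it explicitly.
            _AddParameter(ref key, ref value, namedFields, anyField);

            foreach (KeyValuePair<string, string> pair in namedFields)
                Console.WriteLine(pair.Key + " = " + pair.Value);   // Type = Non-Fiction and ISBN = 000000001
            Console.WriteLine(string.Join(", ", anyField));          // Murder, Non, Kill
        }

        private static void _AddParameter(ref string key, ref string value, Dictionary<string, string> namedFields, List<string> anyField)
        {
            if (key != null)
            {
                namedFields.Add(key, value);
                key = null;
                value = null;   // the value has been consumed by the pair
            }
            else if (value != null)
            {
                anyField.Add(value);
                value = null;
            }
        }
    }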
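The switch statement above depends on the exact token stream that Regex.Split produces. The small sketch below (SplitTokens is a hypothetical name) returns that stream for the pattern "( )|(:)". Because both alternatives are capturing groups, each matched delimiter is kept in the output as its own token, and the group that did not take part in a match contributes nothing.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper that exposes the tokens the answer's loop iterates over.
static class SplitTokens
{
    // Split on a space or a colon, keeping the captured delimiter in the result.
    public static string[] Tokenize(string input) =>
        new Regex("( )|(:)").Split(input);
}
```

For "Type:Non-Fiction Murder" this yields Type, ":", Non-Fiction, " ", Murder. Two consecutive spaces, as in "A  B", yield an empty string between the two " " tokens, so the loop sees an empty value token there.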
    

If you're willing to use a plain Regex match instead of the Regex.Split() method, one might argue this version is marginally more readable and maintainable:

    // Needs: using System.Collections.Generic; using System.Text.RegularExpressions;
    private static void UsingRegex(string input, out Dictionary<string, string> namedFields, out List<string> anyField)
    {
        namedFields = new Dictionary<string, string>();
        anyField = new List<string>();
        // Each match is either a key:value pair (the key cannot contain ':' or a space)
        // or a lone space-delimited term.
        Regex regex = new Regex("(?:(?<key>[^ :]+):(?<value>[^ ]+))|(?<loneValue>[^ ]+)", RegexOptions.Compiled);

        foreach (Match match in regex.Matches(input))
        {
            // Group.Success tells which alternative matched.
            if (match.Groups["key"].Success)
            {
                namedFields.Add(match.Groups["key"].Value, match.Groups["value"].Value);
            }
            else
            {
                anyField.Add(match.Groups["loneValue"].Value);
            }
        }
    }
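Since the answer argues that Regex adds complexity rather than removing it, here is a sketch of a Regex-free version for comparison. QueryParser and Parse are hypothetical names, and treating field names case-insensitively is an assumption based on the question's use of ToUpper().

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: parses a query such as
// "Type:Non-Fiction Murder Non ISBN:000000001 Kill" with plain string methods.
static class QueryParser
{
    public static (Dictionary<string, string> Fields, List<string> Terms) Parse(string input)
    {
        // Assumption: field names are matched case-insensitively.
        var fields = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        var terms = new List<string>();

        // Terms are space-delimited; RemoveEmptyEntries skips repeated spaces.
        foreach (string token in input.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
        {
            int colon = token.IndexOf(':');
            // A colon that is neither the first nor the last character marks a field:value pair.
            if (colon > 0 && colon < token.Length - 1)
            {
                // Indexer assignment: a repeated field name overwrites the earlier value.
                fields[token.Substring(0, colon)] = token.Substring(colon + 1);
            }
            else
            {
                terms.Add(token);
            }
        }
        return (fields, terms);
    }
}
```

Unlike Dictionary.Add in the Regex versions, the indexer assignment lets a repeated field name overwrite its earlier value instead of throwing. A token whose colon is the first or last character, such as "Type:", is treated as a plain term.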