Search code examples
c#.netstringlinq

Get a specific string form a list of strings that mathces a tricky criteria


I have an object that has a list of strings where each string represents a territory (NUTS codes). ex.

["SE","SE12","SE124"]

what I'm trying to do is to get the most general and the most specific one (I don't know If I made sense) I'll write some input examples and what the expected output to be so it gets clearer about what I mean.

input1 : ["SE", "SE123", "SE124", "SE123456", "SE123456789"],
input2 : ["SE", "SE2", "SE123", "SE123456", "SE123456789"],
input3 : ["SE", "SE123", "SE123456", "SE123456789"],
input4 : ["SE","FI", "SE2"]

the expected output should be : output1 =>"SE12" , output2 => "SE", ouptut3 => "SE123456789", output => "".

I used different approaches but it seems to be trickier than I thought.

my method currently looks like this:

 public static string GetSpecificNuts(IList<string> nuts)
{
    var outNuts = "";
    var annNuts = nuts.Distinct().ToList();
    if (annNuts.Any())
    {
        if (annNuts.Count() == 1)
        {
            outNuts = annNuts.SingleOrDefault();
        }
        else
        {
            var grouped = annNuts.GroupBy(n => n.Length).OrderByDescending(n=>n.Key).ToList();
            var highest = grouped.Select(g => g.Key).FirstOrDefault();

            var highestGroup = grouped?.SingleOrDefault(g => g.Key == highest)?.ToList();
            var length = highestGroup?.Count;

            if (length == 1)
            {
                var highestNuts = highestGroup?.SingleOrDefault();
                var contained = grouped?.Where(n => n.Key != highest).SelectMany(g => g.ToList()).Where(s => highestNuts.StartsWith(s)).OrderByDescending(s=>s.Length);
                var firstContained = contained.FirstOrDefault();
                if (!string.IsNullOrWhiteSpace(firstContained))
                {
                    outNuts = firstContained;
                }
            }
            while (length > 1)
            {
                var deducted = new List<string>();
                highestGroup?.ForEach(i => { deducted.Add(i.Length > 2 ? i.Remove(i.Length - 1, 1) : i); });
                var distinct = deducted?.Distinct().ToList();
                length = distinct?.Count;
                highestGroup = distinct;
                if (length == 1)
                {
                    outNuts = distinct?.SingleOrDefault();
                }
            }
        }
    }

    return outNuts;
}

any thoughts?

EDIT FOR MORE EXPLANATION: consider the numbers after the first 2 letters as a tree view. first number represents a group of states, 2nd represents a state, 3rd represents a district and the 4th represents municipalities ..and so on. I need to get the most specific area and I Achieved that at input3. but if the list has ex. 2 or more different districts then I need to get the number that represent the state. 2 more different states then I need to get the number that represents the group of states. 2 or more different groups of states then I need to get the first 2 letters that represent the country. 2 or country codes ex ("SE","FI") then the output should be an empty string.


Solution

  • To clarify what I believe you're looking for: you want to find the NUTS code that represents the most specific division that is either common for all in the list or the division for which all other NUTS codes is a more general division of. Essentially, find the deepest node in a k-ary tree for which the paths to all leaf nodes pass through:

     SE
      \
       5
        \
         7 <-- Most specific in the common branch (SE57)
        / \
       4   2
    
     SE
      \
       5 
        \
         7
          \
           2 <-- Most specific in the common branch (SE572)
    

    One solution for this could be as follows (giving the method a good name is a little tricky):

    public static string GetDeepestNodeInCommonBranch(IList<string> nutsCodes)
    {
        // Build a tree with the depth being the specificity (index in the code) of
        // the country subdivisions and the nodes at each level being the different
        // divisions at that level/specificity (different numbers at the same index).
        
        var levels = nutsCodes.Distinct(StringComparer.OrdinalIgnoreCase)
            .SelectMany(nc => nc.Select((c, i) => new {Character = c, Index = i}))
            .GroupBy(obj => obj.Index)
            // Walk the tree from the top
            .OrderBy(g => g.Key);
        
        var commonDiv = string.Empty;
        foreach (var level in levels)
        {
            var divisions = level.Select(obj => obj.Character).Distinct().ToList();
            // If the tree branches out here, the division at the last level is the
            // most specific division common for all
            if (divisions.Count > 1)
            {
                // Only return full country codes
                return commonDiv.Length >= 2 ? commonDiv : string.Empty;
            }
            commonDiv += divisions.Single();
        }
        return commonDiv;
    }