Tags: c#, diff, lcs

LCS Algorithm in C# - Get the deleted lines from the previous text and the added lines in the current text


I have to implement the LCS algorithm myself because of company policies.

I have already implemented the LCS algorithm, but the next step is to identify which lines were removed from the previous text and which ones were added in the current text.

I tried a simple check through the lines, but it doesn't work when the text contains duplicated lines.

Here is my code. LCS method:

using System;

private static string[] LcsLineByLine(string previous, string current)
{
    string[] previousLines = previous.Split(new[] { Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries);
    string[] currentLines = current.Split(new[] { Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries);

    int a = previousLines.Length;
    int b = currentLines.Length;

    // table[i, j] = length of the LCS of the first i previous lines and the first j current lines.
    // C# zero-initializes arrays, so row 0 and column 0 already hold 0.
    int[,] table = new int[a + 1, b + 1];

    // Fill the table (lines are compared trimmed and case-insensitively)
    for (int i = 1; i <= a; i++)
    {
        for (int j = 1; j <= b; j++)
        {
            if (string.Equals(previousLines[i - 1].Trim(), currentLines[j - 1].Trim(), StringComparison.InvariantCultureIgnoreCase))
            {
                table[i, j] = table[i - 1, j - 1] + 1;
            }
            else
            {
                table[i, j] = Math.Max(table[i, j - 1], table[i - 1, j]);
            }
        }
    }

    // Walk back from table[a, b] to collect the common lines (the LCS itself, not the differences).
    // No trailing "0" sentinel: the array holds exactly the common lines.
    int index = table[a, b];
    string[] lcs = new string[index];

    while (a > 0 && b > 0)
    {
        if (string.Equals(previousLines[a - 1].Trim(), currentLines[b - 1].Trim(), StringComparison.InvariantCultureIgnoreCase))
        {
            lcs[index - 1] = previousLines[a - 1].Trim();
            a--;
            b--;
            index--;
        }
        else if (table[a - 1, b] > table[a, b - 1])
            a--;
        else
            b--;
    }

    return lcs;
}
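A minimal sketch of one way to get the deleted and added lines directly from the LCS table, instead of comparing lines by value afterwards: walk the table with one index into each text. Every step consumes one specific line, so a duplicated line can never be matched twice. This is not the question's code; `LcsLineDiff`, `LineStatus` and `LineDiff` are hypothetical names. The table is built over suffixes so the walk runs front to back and emits items in document order, and it uses the same trimmed, case-insensitive comparison as `LcsLineByLine`.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical result types for this sketch (not the question's DiffItem/DiffStatus).
public enum LineStatus { Equal, Deleted, Added }

public sealed class LineDiff
{
    public LineDiff(LineStatus status, string text) { Status = status; Text = text; }
    public LineStatus Status { get; }
    public string Text { get; }
}

public static class LcsLineDiff
{
    // Same comparison rule as LcsLineByLine: trimmed, invariant culture, ignore case.
    private static bool Same(string x, string y) =>
        string.Equals(x.Trim(), y.Trim(), StringComparison.InvariantCultureIgnoreCase);

    public static List<LineDiff> Diff(string[] previous, string[] current)
    {
        int n = previous.Length, m = current.Length;

        // table[i, j] = LCS length of previous[i..] and current[j..] (suffix form).
        int[,] table = new int[n + 1, m + 1];
        for (int i = n - 1; i >= 0; i--)
            for (int j = m - 1; j >= 0; j--)
                table[i, j] = Same(previous[i], current[j])
                    ? table[i + 1, j + 1] + 1
                    : Math.Max(table[i + 1, j], table[i, j + 1]);

        // Walk forward: a match is always safe to take; otherwise step in the
        // direction that keeps the LCS length (ties emit the deletion first).
        var result = new List<LineDiff>();
        int p = 0, c = 0;
        while (p < n && c < m)
        {
            if (Same(previous[p], current[c]))
            {
                result.Add(new LineDiff(LineStatus.Equal, current[c]));
                p++;
                c++;
            }
            else if (table[p + 1, c] >= table[p, c + 1])
            {
                result.Add(new LineDiff(LineStatus.Deleted, previous[p]));
                p++;
            }
            else
            {
                result.Add(new LineDiff(LineStatus.Added, current[c]));
                c++;
            }
        }

        // Whatever is left on either side has no partner.
        while (p < n) result.Add(new LineDiff(LineStatus.Deleted, previous[p++]));
        while (c < m) result.Add(new LineDiff(LineStatus.Added, current[c++]));
        return result;
    }
}
```

For `previous = { "a", "b", "a", "c" }` and `current = { "a", "c", "a" }` this yields Equal "a", Deleted "b", Deleted "a", Equal "c", Added "a": one copy of the duplicated "a" is reported as removed and another as added.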

And this is the code that does not work when the text contains duplicated lines with the same value.

Method to get all deleted items in the previous text:

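using System;
using System.Collections.Generic;

private List<DiffItem> GetDiffPrevious(string[] previous, string[] diff)
{
    List<DiffItem> differences = new List<DiffItem>();

    // Check for deleted items
    int line = 0;
    for (int i = 0; i < previous.Length; i++)
    {
        bool isAbsent = false;

        // Searches the whole LCS (diff) for this line. Because the search always starts at
        // diff[0], every copy of a duplicated line matches the same LCS entry, so a copy
        // that was actually removed is reported as Equal instead of Deleted.
        for (int j = 0; j < diff.Length; j++)
        {
            if (string.Equals(previous[i].Trim(), diff[j].Trim(), StringComparison.InvariantCultureIgnoreCase))
            {
                differences.Add(new DiffItem() { Position = line, Text = diff[j], Status = DiffStatus.Equal });
                line++;
                isAbsent = false;
                break;
            }
            else
            {
                isAbsent = true;
            }
        }

        // Mark as deleted (if diff is empty, isAbsent stays false and the line is skipped)
        if (isAbsent)
        {
            differences.Add(new DiffItem() { Position = line, Text = previous[i].Trim(), Status = DiffStatus.Deleted });
            line++;
        }
    }

    return differences;
}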
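To keep the two-step structure (LCS first, then classify each text), a minimal sketch is to walk each text and the LCS array together with two indices instead of searching the whole LCS for every line. Since the LCS is a subsequence of each text, every LCS entry is consumed exactly once, so duplicates cannot all match the same entry. It assumes `lcs` holds only the common lines in order (if it still ends with the `"0"` entry written at `lcs[index]` in `LcsLineByLine`, drop that element first). `DiffItem` and `DiffStatus` below are hypothetical stand-ins for the question's types, including an `Added` value the question does not show.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for the question's DiffItem / DiffStatus types.
public enum DiffStatus { Equal, Deleted, Added }

public sealed class DiffItem
{
    public int Position { get; set; }
    public string Text { get; set; } = string.Empty;
    public DiffStatus Status { get; set; }
}

public static class LcsWalk
{
    // Classifies each line of `lines` as Equal (part of the LCS) or `missingStatus`
    // (Deleted for the previous text, Added for the current text).
    public static List<DiffItem> Classify(string[] lines, string[] lcs, DiffStatus missingStatus)
    {
        var differences = new List<DiffItem>();
        int k = 0; // index of the next LCS entry that has not been matched yet

        for (int i = 0; i < lines.Length; i++)
        {
            // Only the next unmatched LCS entry is a candidate, never an earlier one.
            bool inLcs = k < lcs.Length &&
                string.Equals(lines[i].Trim(), lcs[k].Trim(), StringComparison.InvariantCultureIgnoreCase);

            differences.Add(new DiffItem
            {
                Position = i,
                Text = lines[i].Trim(),
                Status = inLcs ? DiffStatus.Equal : missingStatus
            });

            if (inLcs) k++;
        }

        return differences;
    }
}
```

Call it with `DiffStatus.Deleted` for the previous lines and `DiffStatus.Added` for the current lines. With `previous = { "a", "b", "a", "c" }` and `lcs = { "a", "c" }`, the second "a" is now reported as Deleted instead of Equal. Matching each LCS entry to the earliest remaining equal line may pick a different copy of a duplicate than the backtracking did, but the result is still a valid alignment with the same number of deleted and added lines.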

Any help or feedback would be great. As a reminder, I cannot use third-party libraries.

Thanks in advance.


Solution

  • I found a solution. I rewrote the two line lists by translating them through a Hashtable, so that each line value is represented by a unique key. Then I ran the LCS method on the translated lists and got the expected result. I hope this helps somebody.
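The solution is described only briefly, so the following is one possible reading, sketched as an assumption: translate every line into an integer key through a hash table, so that equal lines (after the same trim and case-insensitive comparison as the LCS method) share a key and different lines get different keys. The LCS table can then compare `int` values instead of strings. `LineEncoder.Encode` is a hypothetical helper, and it uses `Dictionary<string, int>` in place of the non-generic `Hashtable`.

```csharp
using System;
using System.Collections.Generic;

public static class LineEncoder
{
    // Translates both texts into integer keys. One dictionary is shared by both
    // texts, so a line that appears in either text always gets the same key.
    public static (int[] Previous, int[] Current) Encode(string[] previous, string[] current)
    {
        // Same equality rule as the question's LCS: trimmed, invariant culture, ignore case.
        var ids = new Dictionary<string, int>(StringComparer.InvariantCultureIgnoreCase);

        int[] Translate(string[] lines)
        {
            var result = new int[lines.Length];
            for (int i = 0; i < lines.Length; i++)
            {
                string key = lines[i].Trim();
                if (!ids.TryGetValue(key, out int id))
                {
                    id = ids.Count; // next unused key
                    ids[key] = id;
                }
                result[i] = id;
            }
            return result;
        }

        return (Translate(previous), Translate(current));
    }
}
```

With this encoding, the LCS loops would compare keys with `==` instead of calling `string.Equals` on trimmed strings. Duplicated lines still share a key, so the deleted and added lines are still found by walking the table with indices rather than by looking lines up by value.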