Search code examples
c++algorithmrefactoringpseudocode

How to best follow a function to understand what it is doing?


This question is general but I am going to give a specific example.

The specific problem that needs to be solved is explained here. The description is long so I will not cut and paste but the basic idea is for input strings S and T (as they are called in the code below), find the minimum number of changes that needs to be done to S to produce T. One change can be :

  • Insert one letter to any end of the string.
  • Delete one letter from any end of the string.
  • Change one letter into any other one.

Below is a solution I am trying to track. What I am looking for are tips on how to best grok the solution. What are some methods I can use to read and understand the code (let's discard stepping through a debugger).

#include<iostream>
#include<cstring>
#include<stdio.h>
using namespace std;
char S[2010];
char T[2010];
int lens,lent;
int main()
{
    int i,j,ma,p;

    while(scanf("%s%s",S,T)!=EOF)
    {
        lens=strlen(S);
        lent=strlen(T);
        ma=0;p=0;
        for(i=0;i<lens;i++)
        {
            p=0;
            for(j=0;j<lent;j++)
            {
                if(i+j>=lens)
                    break;
                if(S[i+j]==T[j]){p++;}
            }
            if(ma<p)
                ma=p;
            if(ma==lent)
                break;
        }
        for(i=0;i<lent;i++)
        {
            p=0;
            for(j=0;j<lens;j++)
            {
                if(i+j>=lent)
                    break;
                if(T[i+j]==S[j]){p++;}
            }
            if(ma<p)
                ma=p;
            if(ma==lent)
                break;
        }
        printf("%d\n",lent-ma);
    }
    return 0;
}

Solution

  • Step 1: Explain to yourself what the variables represent:

    S: the string from which we want to extract a substring

    T: the string which we want to achieve in the end, after having modified the extracted substring with as few operations as possible

    lens: length of string S

    lent: length of string T

    i: the index in S from which the extracted substring starts

    j: the index in string T of a character which we want to match with a corresponding character in the substring

    p: the amount of matching chars found for the currently investigated substring

    ma : the maximum amount of matching chars for any of the substrings


    Step 2: Having established these meanings, it's rather simple to translate the first loop into words:

    for loop 1 :    selects a start position of the substring
    
        set the match counter to 0, since we start investigation of a new substring
    
    
        for loop 2 :    loops through the substring
    
            if 1 :      if there is no char left to read string S, stop looping
    
            if 2 :      if the current character in the extracted substring matches
                    a character in the "goal" string, increment the match counter (p)
    
    
        if 3 :      now, we finished looping through a substring,
                if the count of matching characters in the substring and the goal
                string was higher than for any of the previous counts,
                then store this value as the max count
    
        if 4 :      if the max count of matching characters is equal to the 
                length of the "goal string", dr Moriatry can receive the goal string
                with 0 substring changes, and hence, we can stop looping
    

    The next loop is similar. The roles of S and T have kind of been reversed. Notice though, that the roles of S and T have not been fully reversed (as some people have said). The end condition for the outer for loop uses the length of T in both cases, which makes sense.

    Here we extract substrings from string T (the "goal" string) and try to match them against string S. Why are we doing this?

    I expect that the person who wrote the code wanted to account for cases like the following, e.g.:

    S = "b"     T = "abc"
    

    If we'd only extract substrings from S and match them against the whole T string, starting at the first index (like the first loop does), we'd only compare "does b (in S) match a (the first char in T) and then we'd go on and say: "Since no substring matches, we need 3 string change operations to receive string T" (which is obviously wrong, as we can achieve it by choosing "b" as the substring to extract, and making 2 change operations to end up with T)