c# fortran fortran77

Find where variables are calculated in Fortran source files


I'm trying to write a C# program that takes a table of strings (variable names) from a database and searches a directory of ~30,000 Fortran 77 source files to determine where each variable is calculated. Each variable is typically calculated only once, in one of the Fortran files, but used many times in other files. The variables in the database table are all explicitly defined somewhere in the Fortran files. So far I've accomplished most of this by first building a list of the files each variable appears in, and then searching the files in that list line by line. I look for which side of the "=" sign the variable appears on by doing something like this:

// Requires: using System.Globalization;
// Variable, fullpaths, isCalculated and calculatedLine are defined elsewhere.
CompareInfo ci = CultureInfo.CurrentCulture.CompareInfo;
for (int k = 0; k < fullpaths.Count; k++)
{
    int counter = 0; // line number within the current file (first line is 1)
    string line;
    // Read the file line by line; "using" closes it even if an exception is thrown.
    using (var fortranFile = new System.IO.StreamReader(fullpaths[k]))
    {
        while ((line = fortranFile.ReadLine()) != null)
        {
            // Count every line up front so that "continue" cannot skip the increment.
            counter++;

            // Skip lines that do not mention the variable (IndexOf returns -1 if absent).
            if (ci.IndexOf(line, Variable, CompareOptions.IgnoreCase) < 0)
                continue;

            // Search for the equals sign
            int equalLocation = ci.IndexOf(line, "=");
            if (equalLocation <= 0)
                continue;

            // Is the line commented out?
            char first = line[0];
            if (first == 'C' || first == 'c' || first == '!' || first == '*')
                continue;

            // Left-hand side of the first "=", including the "=" itself
            string subLineLHS = line.Substring(0, equalLocation + 1);

            // Ignore the line if its left-hand side contains IF, DO, or WHILE,
            // to prevent reading IF [Variable] = xxxx as being calculated.
            if (ci.IndexOf(subLineLHS, "IF", CompareOptions.IgnoreCase) >= 0 ||
                ci.IndexOf(subLineLHS, "DO", CompareOptions.IgnoreCase) >= 0 ||
                ci.IndexOf(subLineLHS, "WHILE", CompareOptions.IgnoreCase) >= 0)
                continue;

            // Does the variable appear on the left-hand side?
            if (ci.IndexOf(subLineLHS, Variable, CompareOptions.IgnoreCase) >= 0)
            {
                isCalculated[k] = true;
                calculatedLine[k] = counter;
            }
        } // while
    } // using
}

The problem I'm having is with IF statements that continue onto a second line, e.g.:

      IF(something == xx .AND.
     1     variable == xx) THEN
...

With this method, the continuation line is reported as a line where the variable is calculated ("variable = xx"), because the IF is on the previous line. One-line IF statements such as IF(something) variable=xx are also ignored, even though they do calculate the variable. Lines with multiple = signs may give me problems too.

Any suggestions on how I could get around this? Is there a better method of doing this? Please go easy on me - I'm not a programmer.

Thanks!


Solution

  • The most error-proof approach would be to parse the Fortran code and work from the syntax tree.

    My suggestion: use ctags. See for instance Exuberant ctags, which supports Fortran. ctags generates an index of all named entities in a set of source code files. The index is stored in a file (tags) that most text editors and IDEs can read. If you import that tags file into your favourite text editor, you can place the cursor on a variable and jump to its definition.

    The tags file is also very easy to read and parse. It is structured like this:

    named_entity<Tab>file_where_it_is_defined<Tab>location_in_the_file
    

    For instance, from a set of Fortran files (this is on Linux, but Exuberant ctags offers Windows binaries):

    gpar    remlf90.f90 /^           xrank,npar,gpar,/;"    v   program:REMLF90
    hashia1 ../libs/sparse2.f   /^      subroutine hashia1(/;"  s
    hashv1  ../libs/sparse3.f   /^      integer function hashv1(/;" f
    hashvr_old  ../libs/sparse2.f   /^      integer function hashvr_old(/;" f
    

    We can observe that the gpar variable is defined in remlf90.f90, hashia1 is defined in ../libs/sparse2.f, etc.
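
    Since the question is about C#, here is a minimal sketch of how such a tags file might be read from a program. It is an illustration under assumptions, not a definitive implementation: the TagEntry and TagsReader names are hypothetical, the fields are assumed to be separated by real tab characters as described above, and any extra fields (such as the kind letter v shown for gpar) are assumed to follow the ;" marker. Names are indexed case-insensitively because Fortran identifiers are case-insensitive.

```csharp
using System;
using System.Collections.Generic;

// One entry of a ctags "tags" file. Hypothetical type, for illustration only.
public sealed class TagEntry
{
    public string Name;     // named entity, e.g. "gpar"
    public string File;     // file where ctags found it
    public string Address;  // search pattern (or line number) locating it in the file
    public string Kind;     // kind letter if present, e.g. "v" or "s"; "" otherwise
}

public static class TagsReader
{
    // Parses one line of a tags file; returns null for header or malformed lines.
    public static TagEntry ParseLine(string line)
    {
        // Header lines written by ctags start with "!_TAG_".
        if (string.IsNullOrEmpty(line) || line.StartsWith("!_TAG_", StringComparison.Ordinal))
            return null;

        // Name and file are the first two tab-separated fields;
        // everything after that is the address plus optional extra fields.
        string[] parts = line.Split(new[] { '\t' }, 3);
        if (parts.Length < 3)
            return null;

        string address = parts[2];
        string kind = "";

        // Assumption: extra fields, when present, follow the marker ;" and a tab.
        int marker = parts[2].LastIndexOf(";\"\t", StringComparison.Ordinal);
        if (marker >= 0)
        {
            address = parts[2].Substring(0, marker);
            string[] fields = parts[2].Substring(marker + 3).Split('\t');
            kind = fields[0].Trim();
            if (kind.StartsWith("kind:", StringComparison.Ordinal))
                kind = kind.Substring(5);
        }

        return new TagEntry { Name = parts[0], File = parts[1], Address = address, Kind = kind };
    }

    // Builds a case-insensitive index of all entries whose kind is "v".
    public static Dictionary<string, List<TagEntry>> LoadVariables(IEnumerable<string> lines)
    {
        var index = new Dictionary<string, List<TagEntry>>(StringComparer.OrdinalIgnoreCase);
        foreach (string line in lines)
        {
            TagEntry entry = ParseLine(line);
            if (entry == null || entry.Kind != "v")
                continue;

            List<TagEntry> list;
            if (!index.TryGetValue(entry.Name, out list))
            {
                list = new List<TagEntry>();
                index[entry.Name] = list;
            }
            list.Add(entry);
        }
        return index;
    }
}
```

    With this in place, something like TagsReader.LoadVariables(System.IO.File.ReadLines("tags")) would give a dictionary in which each variable name from the database table can be looked up to get the file and location that ctags recorded for it.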