I'm trying to write a program in C# that takes a table of strings (variable names) from a database and searches a directory of ~30,000 Fortran 77 source files to determine where each variable is calculated. Each variable is typically calculated only once, in a single Fortran file, but used many times in other files. The variables in the database table are all explicitly defined somewhere in the Fortran files. So far I've accomplished most of this by first building a list of the files that each variable appears in, and then searching the files in that list line by line. I've been checking which side of the "=" sign the variable appears on by doing something like this:
CompareInfo ci = CultureInfo.CurrentCulture.CompareInfo;
for (int k = 0; k < fullpaths.Count; k++)
{
    string line;
    // Read the file and display it line by line.
    System.IO.StreamReader FortranFile = new System.IO.StreamReader(fullpaths[k]);
    while ((line = FortranFile.ReadLine()) != null)
    {
        // Search the file line-by-line for the variable
        if (ci.IndexOf(line, Variable, CompareOptions.IgnoreCase) > 0)
        {
            // Search for the equals sign
            int equalLocation = ci.IndexOf(line, "=");
            if (equalLocation > 0)
            {
                // substring LHS
                string subLineLHS = line.Substring(0, equalLocation+1);
                // is the line commented out?
                if (Convert.ToString(subLineLHS[0]) == "C" ||
                    Convert.ToString(subLineLHS[0]) == "!" ||
                    Convert.ToString(subLineLHS[0]) == "c" ||
                    Convert.ToString(subLineLHS[0]) == "*")
                {
                    continue;
                }
                // ignore if the line contains a DO, IF, or WHILE loop,
                // to prevent reading IF [Variable] = xxxx as being calculated.
                else if ( (ci.IndexOf(subLineLHS, "IF", CompareOptions.IgnoreCase) > 0) ||
                          (ci.IndexOf(subLineLHS, "DO", CompareOptions.IgnoreCase) > 0) ||
                          (ci.IndexOf(subLineLHS, "WHILE", CompareOptions.IgnoreCase) > 0))
                {
                    continue;
                }
                // find where the variable is used in the line
                else if (ci.IndexOf(subLineLHS, Variable, CompareOptions.IgnoreCase) > 0 )
                {
                    isCalculated[k] = true;
                    calculatedLine[k] = counter;
                }
            }
        } //if loop
        counter++;
    } //while loop
    FortranFile.Close();
}
The problem I'm having is with IF statements, e.g.:
IF(something == xx .AND.
1 variable == xx) THEN
...
Because the continuation line does not itself contain IF, this method would tell me that the variable is calculated on that line ("variable = xx"). One-line IF statements such as IF(something) variable=xx are also ignored, even though they do assign the variable. Lines with multiple = signs may give me problems too.
Any suggestions on how I could get around this? Is there a better method of doing this? Please go easy on me - I'm not a programmer.
Thanks!
The most error-proof approach would be to parse the Fortran code and work from the syntax tree.
My suggestion: use ctags. See for instance Exuberant ctags; it has support for Fortran.
ctags generates an index of all named entities in a set of source code files. The index is stored in a data structure (tags) that can be read from most file editors/IDEs.
If you import that tags file in your favourite text editor, you will be able to jump to the definition of a variable when you position your cursor on it and take proper action.
The tags file is also very easy to read and parse: it is structured like this.
named_entity<Tab>file_where_it_is_defined<Tab>location_in_the_file
For instance, from a set of Fortran files (this is on Linux, but Exuberant ctags offers Windows binaries):
gpar remlf90.f90 /^ xrank,npar,gpar,/;" v program:REMLF90
hashia1 ../libs/sparse2.f /^ subroutine hashia1(/;" s
hashv1 ../libs/sparse3.f /^ integer function hashv1(/;" f
hashvr_old ../libs/sparse2.f /^ integer function hashvr_old(/;" f
We can observe that the gpar variable is defined in remlf90.f90 and hashia1 is defined in ../libs/sparse2.f, etc.
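Since you are already working in C#, here is a minimal sketch of how the tags file could be read from your program. It rests on a few assumptions: the fields are tab-separated as described above, lines starting with !_TAG_ are header metadata, and variables carry the kind letter v after the ;" marker, as in the sample output. ReadVariableTags is a hypothetical helper name, not part of ctags or .NET. Keep in mind that this tells you where ctags recorded the variable as defined, which is not necessarily the line where its value is assigned.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: maps each variable name to the files in which ctags
// recorded a tag for it. Names are compared case-insensitively because
// Fortran identifiers are case-insensitive.
static Dictionary<string, List<string>> ReadVariableTags(IEnumerable<string> tagLines)
{
    var definitions = new Dictionary<string, List<string>>(StringComparer.OrdinalIgnoreCase);
    foreach (string line in tagLines)
    {
        // Skip blank lines and the "!_TAG_" metadata header lines.
        if (line.Length == 0 || line.StartsWith("!_TAG_", StringComparison.Ordinal))
            continue;

        // name <Tab> file <Tab> everything else (search pattern plus extra fields)
        string[] parts = line.Split(new[] { '\t' }, 3);
        if (parts.Length < 3)
            continue;

        // Extra fields follow the ;" marker; the first one is assumed to be the kind.
        int marker = parts[2].IndexOf(";\"\t", StringComparison.Ordinal);
        if (marker < 0)
            continue;
        string kind = parts[2].Substring(marker + 3).Split('\t')[0];
        if (kind != "v" && kind != "kind:v")
            continue; // keep variables only (assumed kind letter "v")

        if (!definitions.TryGetValue(parts[0], out List<string> files))
        {
            files = new List<string>();
            definitions[parts[0]] = files;
        }
        files.Add(parts[1]);
    }
    return definitions;
}

// Usage with two lines from the sample above. For a real run, pass
// System.IO.File.ReadLines("tags") instead of the sample array.
string[] sample =
{
    "gpar\tremlf90.f90\t/^      xrank,npar,gpar,/;\"\tv\tprogram:REMLF90",
    "hashia1\t../libs/sparse2.f\t/^      subroutine hashia1(/;\"\ts",
};
Dictionary<string, List<string>> defs = ReadVariableTags(sample);
foreach (KeyValuePair<string, List<string>> entry in defs)
    Console.WriteLine(entry.Key + " -> " + string.Join(", ", entry.Value));
```

With the resulting dictionary you can look up each name from your database table and only open the files listed for it, rather than scanning every file that mentions the variable.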