I am new to programming; I am actually a mechanical engineer. For my research I have written a Fortran routine to model a process. The routine is quite slow, both because I wrote it myself (so it is not optimal from a computational point of view) and because it performs many iterations to reach convergence.
However, I have a 6-core CPU, and I think that if I could exploit all of the cores, the routine would run faster than it does now.
The routine looks like this:
PROGRAM my_routine
  INCLUDE 'dimensions_of_arrays.dim'
  INCLUDE 'subroutines.sub'
  INCLUDE 'subroutines2.sub'
  ! declaration of variables
  ! ...
  DO ! loop over many steps
    ! ...
    ! call many subroutines
    ! ...
    ! perform some iterations
  END DO
  ! ...
  ! write results
END PROGRAM my_routine
The file 'subroutines.sub' contains more than 20 subroutines like this one:
SUBROUTINE xxx(a, b)
  INCLUDE 'dimensions_of_arrays.dim'
  ! declaration of variables
  COMMON/PATH1/PATH2/G,J,K
  ! ...
  ! some calculation
  ! ...
END SUBROUTINE xxx
The file 'dimensions_of_arrays.dim' contains the COMMON blocks and the parameters used at compile time.
In your opinion, is it possible to make this routine use multiple processors without modifying it "heavily"?
I compile the code with Intel Composer XE 2011 in Visual Studio 2010.
Any help would be much appreciated. Thanks!
Since you are using Intel Fortran, I suggest that your first step should be to enable automatic parallelization. In Visual Studio on Windows this is the project property Fortran > Optimization > Parallelization > Yes. While you're at it, I also suggest setting the option /QxHost. I don't remember whether the old version you're using exposes this as a project property; if it does, it would be Fortran > Code Generation > Intel Processor-Specific Optimization > Same as the host processor. Of course, you should be building a Release configuration so that optimization is enabled.
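To illustrate what the auto-parallelizer looks for, here is a small hypothetical program (the name `auto_par_demo` and the loop contents are made up for illustration). The second loop has fully independent iterations, which is the pattern the compiler can split across cores without help. The command line in the comment is an assumed equivalent of the project settings above; /O2, /Qparallel and /QxHost are ifort options, but check your version's documentation.

```fortran
! Hypothetical example. Assumed command-line equivalent of the project settings:
!   ifort /O2 /Qparallel /QxHost auto_par_demo.f90
program auto_par_demo
  implicit none
  integer, parameter :: n = 1000000
  real(8), allocatable :: x(:), y(:)
  integer :: i

  allocate(x(n), y(n))

  ! Each iteration writes only its own x(i).
  do i = 1, n
    x(i) = dble(i) / dble(n)
  end do

  ! y(i) depends only on x(i): no iteration reads a value written by another
  ! iteration, so the compiler is free to run the iterations on different cores.
  do i = 1, n
    y(i) = sqrt(x(i)) * exp(-x(i))
  end do

  ! x(n) = 1, so y(n) should equal exp(-1).
  if (abs(y(n) - exp(-1.0d0)) > 1.0d-12) error stop 'unexpected y(n)'
  print '(a, es12.5)', 'sum(y) = ', sum(y)
end program auto_par_demo
```

Real code rarely looks this clean; loops that call many subroutines or touch COMMON variables are much harder for the compiler to prove safe.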
This may give you a large enough performance boost. If not, the next step I would suggest is to turn on the optimization diagnostics and see what they say about why certain loops could not be parallelized.
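As a sketch of what the diagnostics typically point at, the hypothetical program below contains a loop with a loop-carried dependence, which should be reported as not parallelized, and a reduction that computes the same final value without that ordering constraint. The report option in the comment, /Qpar-report3, is the one I believe compilers of the XE 2011 generation used; newer versions replace it with /Qopt-report, so treat the exact flag as an assumption to verify against your documentation.

```fortran
! Hypothetical example. Assumed diagnostic build for ifort of that era:
!   ifort /O2 /Qparallel /Qpar-report3 dependence_demo.f90
program dependence_demo
  implicit none
  integer, parameter :: n = 1000
  real(8) :: a(0:n), b(n), s
  integer :: i

  b = 1.0d0
  a(0) = 0.0d0

  ! Loop-carried dependence: iteration i reads a(i-1), which iteration i-1
  ! writes, so these iterations cannot run at the same time.
  do i = 1, n
    a(i) = a(i-1) + b(i)
  end do

  ! If only the final value is needed, it can be written as a sum reduction,
  ! where each core can accumulate a partial sum independently.
  s = 0.0d0
  do i = 1, n
    s = s + b(i)
  end do

  if (abs(a(n) - s) > 1.0d-9) error stop 'results differ'
  print *, 'a(n) =', a(n), '  s =', s
end program dependence_demo
```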
You are using quite an old version of the compiler; newer versions are much better at parallelization and optimization, and I'd recommend using the latest one you have access to. If none of this produces the results you want, then I agree you'll need to "get your hands dirty" and add OpenMP directives. That will require a good understanding of how the program works, in particular which variables should be shared between threads and which should be private to each thread. An intermediate step would be to use the Intel-specific parallelization directives, but these aren't very different from OpenMP.
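Here is a minimal, hypothetical OpenMP sketch of the shared/private decision described above. It assumes OpenMP is enabled with /Qopenmp (in Visual Studio this should be under Fortran > Language > Process OpenMP Directives). Without that option the `!$OMP` lines are ordinary comments and the program runs serially with the same result. DEFAULT(NONE) is used so the compiler rejects any variable whose sharing you forgot to declare.

```fortran
! Hypothetical example. Assumed build: ifort /Qopenmp omp_demo.f90
program omp_demo
  implicit none
  integer :: i, n
  real(8) :: x, f, total, exact

  n = 100000
  total = 0.0d0

  ! n     : SHARED, read-only, one copy for all threads
  ! x, f  : PRIVATE, scratch values, one copy per thread
  ! total : REDUCTION, per-thread partial sums combined at the end
  ! The loop index i is private automatically.
  !$OMP PARALLEL DO DEFAULT(NONE) SHARED(n) PRIVATE(x, f) REDUCTION(+:total)
  do i = 1, n
    x = dble(i) / dble(n)
    f = x * x
    total = total + f
  end do
  !$OMP END PARALLEL DO

  ! Closed form: sum over i = 1..n of (i/n)**2 = (n+1)(2n+1) / (6n).
  exact = dble(n + 1) * dble(2 * n + 1) / (6.0d0 * dble(n))
  if (abs(total - exact) > 1.0d-9 * exact) error stop 'unexpected total'
  print *, 'total =', total
end program omp_demo
```

The tolerance in the check is there because a parallel reduction adds the terms in a different order than the serial loop, so the last few bits of the result can differ between runs.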
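When converting a serial program to parallel, especially an old Fortran code, you have to be very careful with global variables (usually COMMON blocks). Because every thread sees the same copy of a COMMON variable, such variables can either prevent the compiler from parallelizing a loop or, once you add directives yourself, lead to incorrect results when threads overwrite each other's values. The Intel Inspector XE tool (included in the larger Intel Parallel Studio XE editions) can be good at finding these problems for you.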
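One standard OpenMP remedy for a COMMON block used as per-call scratch storage is the THREADPRIVATE directive, which gives each thread its own copy of the block. The sketch below is hypothetical (the names `scale_value` and `/scratch/` are made up) and assumes the block really holds scratch data; a COMMON block that carries state between loop iterations or steps needs a real redesign instead. Note that the directive must follow the COMMON statement in every program unit that declares that block.

```fortran
! Hypothetical example. Assumed build: ifort /Qopenmp common_demo.f90
subroutine scale_value(v, r)
  implicit none
  real(8), intent(in)  :: v
  real(8), intent(out) :: r
  real(8) :: tmp
  common /scratch/ tmp
  ! Without this line, all threads would share one tmp and could overwrite
  ! each other's intermediate value between the two assignments below.
  !$OMP THREADPRIVATE(/scratch/)
  tmp = 2.0d0 * v
  r = tmp + 1.0d0
end subroutine scale_value

program common_demo
  implicit none
  integer, parameter :: n = 1000
  real(8) :: r(n)
  integer :: i

  ! r is shared (each iteration writes a different element); i is private.
  !$OMP PARALLEL DO
  do i = 1, n
    call scale_value(dble(i), r(i))
  end do
  !$OMP END PARALLEL DO

  ! Each r(i) should be 2*i + 1 regardless of the number of threads.
  do i = 1, n
    if (abs(r(i) - (2.0d0 * i + 1.0d0)) > 1.0d-12) error stop 'unexpected result'
  end do
  print *, 'all results correct'
end program common_demo
```

A race on shared COMMON data does not show up on every run, so a passing check like the one above is no proof of correctness; that is where a tool such as Inspector XE helps.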