ORIGINAL Q: I'm wondering whether anyone has experience migrating a large Cobol/PL/I codebase to Java?
How automated was the process and how maintainable was the output?
How did the move from transactional to OO work out?
Any lessons learned along the way or resources/white papers that may be of benefit would be appreciated.
EDIT 7/7: The NACA approach is certainly interesting: the ability to keep making your BAU (business-as-usual) changes to the COBOL code right up to the point of releasing the Java version has merit for any organization.
The argument for generating procedural Java that keeps the same layout as the COBOL, so that coders feel comfortable while they familiarize themselves with Java, is valid for a large organisation with a large code base. As @Didier points out, the $3 million annual saving leaves room for generous padding on future BAU changes, so the code can be refactored on an ongoing basis. As he puts it, if you care about your people, you find a way to keep them happy while gradually challenging them.
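To make "procedural Java in the same layout as the COBOL" concrete, here is a hypothetical, hand-written sketch of how a small COBOL program might map onto Java in that style. It is not actual NACA output; the program, field names, and overtime rule are invented for illustration. WORKING-STORAGE items become mutable fields, each paragraph becomes a method, and each PERFORM becomes a method call.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical hand-written illustration of COBOL-shaped Java; NOT actual NACA output.
// Roughly mirrors:
//   WORKING-STORAGE SECTION.
//   01 WS-HOURS  PIC 9(3).
//   01 WS-RATE   PIC 9(3)V99.
//   01 WS-GROSS  PIC 9(7)V99.
//   PROCEDURE DIVISION.
//   MAIN-PARA.
//       PERFORM COMPUTE-GROSS.
//       PERFORM APPLY-OVERTIME.
public class PayCalc {
    // WORKING-STORAGE items become mutable fields shared by every "paragraph".
    int wsHours;
    BigDecimal wsRate = BigDecimal.ZERO;
    BigDecimal wsGross = BigDecimal.ZERO;

    // MAIN-PARA: each PERFORM becomes a plain method call.
    public void mainPara() {
        computeGross();
        applyOvertime();
    }

    // COMPUTE-GROSS. COMPUTE WS-GROSS = WS-HOURS * WS-RATE.
    // Without ROUNDED, COBOL truncates to the receiving field's two decimals.
    void computeGross() {
        wsGross = wsRate.multiply(BigDecimal.valueOf(wsHours))
                        .setScale(2, RoundingMode.DOWN);
    }

    // APPLY-OVERTIME. IF WS-HOURS > 40, add half the rate for each extra hour (invented rule).
    void applyOvertime() {
        if (wsHours > 40) {
            BigDecimal extra = wsRate.multiply(BigDecimal.valueOf(wsHours - 40))
                                     .multiply(new BigDecimal("0.5"));
            wsGross = wsGross.add(extra).setScale(2, RoundingMode.DOWN);
        }
    }

    public static void main(String[] args) {
        PayCalc program = new PayCalc();
        program.wsHours = 45;
        program.wsRate = new BigDecimal("10.00");
        program.mainPara();
        System.out.println(program.wsGross); // prints 475.00
    }
}
```

Every method reads and writes shared fields. That is what makes such code feel familiar to COBOL programmers, and also what makes it hard to refactor into real objects later.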
The problem, as I see it, with @duffymo's suggestion to
Best to try and really understand the problem at its roots and re-express it as an object-oriented system
is that if BAU changes are ongoing, then during the LONG lifetime of the project to code your new OO system you end up coding and testing every change twice. Avoiding that double effort is a major benefit of the NACA approach. I've had some experience migrating client-server applications to a web implementation, and this was one of the major issues we encountered: requirements kept shifting because of BAU changes. It made project management and scheduling a real challenge.
Thanks to @hhafez, whose situation is nicely summed up as "similar but slightly different", and who had a reasonably satisfactory experience with an automatic code migration from Ada to Java.
Thanks, @Didier, for contributing. I'm still studying your approach, and if I have any questions I'll drop you a line.
Update 6/25: A friend just ran across the NACA Cobol to Java converter. It looks quite interesting: it was used to translate 4 million lines of Cobol with 100% accuracy. Here's the NACA open source project page. The other converters I've seen were proprietary, and their materials were conspicuously lacking in success stories and detailed example code. NACA is worth a long look.
Update 7/4: @Ira Baxter reports that the Java output looks very Cobol-esque, which it absolutely does. To me, this is the natural result of automatic translation. I doubt we'll ever find a much better translator. This perhaps argues for a gradual re-write approach.
Update 2/7/11: @spgennard points out that there are some Cobol compilers on the JVM, for example Veryant's isCobol Evolve. These could be used to help gradually transition the code base, though I think the OP was more interested in automated source conversion.
I'd be very cautious about this. (I used to work for a company that automatically corrected Cobol and PL/I programs for Y2K, and I built the front-end compiler that converted many dialects of Cobol into our intermediate analytic form, as well as a code generator.) My sense is that you'd wind up with a Java code base that would still be inelegant and unsatisfying to work with. You may wind up with performance problems, dependencies on vendor-supplied libraries, buggy generated code, and so on. You'll certainly incur a huge testing bill.
Starting from scratch with a new object-oriented design can be the right approach, but you also have to carefully consider the decades of stored knowledge represented by the code base. Often there are many subtleties that your new code may miss. On the other hand, if you're having a hard time finding staff to maintain the legacy system, you may not have a choice.
One gradual approach would be to first upgrade to object-oriented Cobol (the standard long referred to as Cobol 97, eventually published as ISO Cobol 2002). This adds object orientation, so you can rewrite and refactor subsystems individually as you add new functionality. Or you could replace individual subsystems with freshly written Java.
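A minimal sketch of replacing one subsystem at a time with freshly written Java, assuming callers can be pointed at a Java interface and that some bridge to the still-running legacy program exists (messaging, a transaction gateway, a JVM Cobol runtime, and so on). The bridge is stubbed here as a plain function, and every name is hypothetical.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.function.BinaryOperator;

// Hypothetical example: all names are illustrative, not from any real system.
public class InterestSubsystem {

    /** Contract that callers depend on; they never know which implementation is behind it. */
    public interface InterestCalculator {
        BigDecimal monthlyInterest(BigDecimal balance, BigDecimal annualRatePercent);
    }

    /**
     * Wraps the still-running legacy program. The real bridge to Cobol/PL/I
     * is assumed and passed in as a plain function.
     */
    public static final class LegacyInterestCalculator implements InterestCalculator {
        private final BinaryOperator<BigDecimal> legacyBridge;

        public LegacyInterestCalculator(BinaryOperator<BigDecimal> legacyBridge) {
            this.legacyBridge = legacyBridge;
        }

        @Override
        public BigDecimal monthlyInterest(BigDecimal balance, BigDecimal annualRatePercent) {
            return legacyBridge.apply(balance, annualRatePercent);
        }
    }

    /** Freshly written Java replacement for the same subsystem. */
    public static final class JavaInterestCalculator implements InterestCalculator {
        // 12 months * 100 (to turn a percentage into a fraction)
        private static final BigDecimal MONTHS_TIMES_100 = new BigDecimal("1200");

        @Override
        public BigDecimal monthlyInterest(BigDecimal balance, BigDecimal annualRatePercent) {
            // balance * rate% / 12, rounded half-up to cents
            return balance.multiply(annualRatePercent)
                          .divide(MONTHS_TIMES_100, 2, RoundingMode.HALF_UP);
        }
    }

    /** Switch a single subsystem over once its Java version has passed testing. */
    public static InterestCalculator select(boolean migrated, BinaryOperator<BigDecimal> legacyBridge) {
        return migrated ? new JavaInterestCalculator() : new LegacyInterestCalculator(legacyBridge);
    }
}
```

Because callers depend only on the interface, each subsystem can be switched individually, and during testing both implementations can be run on the same inputs and their results compared.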
Sometimes you'll be able to replace components with off-the-shelf software: we helped one very large insurance company that still had 2 million lines of code in a legacy language it had created in the 1950s. We converted half of it to Y2K-compliant code in that legacy language, and they replaced the other half with a modern payroll system they bought from an outside vendor.