COBOL to Java Automatic Migration with GPL'ed Tools
During the NACA project run by Publicitas Ltd., 4 million lines of COBOL were automatically transcoded (migrated) to their Java equivalent. The company says that the recurring annual cash savings total 3 million euros, and it has released the tools from the NACA project under the GPL.
- many levels of cache to maximize the performance of the new Java version of the old application. Thanks to them, our Java-transcoded transactions and batches perform better than their COBOL ancestors did on the mainframe.
- pre-allocation of all program variable structures (COMMAREA of COBOL) to further improve performance and also to minimize the garbage collection pauses that freeze the system while it is running.
- strongly object-oriented architecture of the resulting Java objects in order to maximize the effect of all the checks done by the compiler. As an example, each old COBOL program becomes a Java class whose existence is checked at compile time rather than at runtime. This is very useful when your application is 4 million lines of code like ours and when you want to track down every typing mistake in a continuous integration architecture like ours.
- strong integration with the Eclipse IDE for the highest developer productivity: we even developed a plug-in to facilitate debugging and editing of old COBOL programs from Eclipse.
- line-by-line equivalence between old COBOL programs and newly transcoded Java classes. The in-house developers don't get lost: they end up with a Java application that has the exact same structure as the original COBOL version.
- support for the IBM JVM as well as the Sun JVM, so that stored procedures can also be transcoded.
- support for the distinct character sets and encoding schemes (EBCDIC) of the mainframe and Linux, and for all the resulting data-sorting possibilities.
- full management of multi-level COBOL data structures in Java, independently of the UTF-16 encoding (2 bytes per char) used by Java.
- transparency of the wrapping framework (raw JVM, Apache Tomcat, etc.) for the application.
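To make the pre-allocation and multi-level data structure points more concrete, here is a minimal sketch of how a COBOL-style group item could be modelled in Java: one buffer is allocated once, and the group and its child fields are just views over slices of it. All names (RecordSketch, Field, customer, id, name) are hypothetical and chosen for illustration; this is not NacaRT or JLib code, and the real runtime may work quite differently.

```java
import java.util.Arrays;

/** Hypothetical sketch; none of these names come from NacaRT or JLib. */
final class RecordSketch {

    /** Fixed-width alphanumeric view (think PIC X(n)) over a slice of a shared buffer. */
    static final class Field {
        private final char[] buffer;
        private final int offset;
        private final int length;

        Field(char[] buffer, int offset, int length) {
            this.buffer = buffer;
            this.offset = offset;
            this.length = length;
        }

        /** COBOL-like alphanumeric MOVE: left-justify, pad with spaces, truncate on the right. */
        void set(String value) {
            for (int i = 0; i < length; i++) {
                buffer[offset + i] = i < value.length() ? value.charAt(i) : ' ';
            }
        }

        /** Reading copies the slice into a new String. */
        String get() {
            return new String(buffer, offset, length);
        }
    }

    // Storage is allocated once per record and reused, so writes create no garbage.
    private final char[] storage = new char[14];

    final Field customer = new Field(storage, 0, 14); // 01 CUSTOMER (group item)
    final Field id = new Field(storage, 0, 4);        // 05 CUST-ID   PIC X(4)
    final Field name = new Field(storage, 4, 10);     // 05 CUST-NAME PIC X(10)

    RecordSketch() {
        Arrays.fill(storage, ' ');
    }

    public static void main(String[] args) {
        RecordSketch r = new RecordSketch();
        r.id.set("A1");
        r.name.set("Publicitas Ltd.");   // longer than 10 chars, so it is truncated
        System.out.println("[" + r.customer.get() + "]");
        r.customer.set("ZZZZNew name");  // writing the group item changes both children
        System.out.println("[" + r.id.get() + "][" + r.name.get() + "]");
    }
}
```

Because the group and its children share the same storage, writing at one level is immediately visible at the other, which is the behaviour COBOL programs rely on. The sketch works on Java chars, so the layout is expressed in character positions and does not depend on how many bytes each char occupies.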
Didier emphasized the key aspects of such a project:
- economic motivation as core driver: move from a multi-million (CHF or euros) mainframe environment to an incredibly cheap and nimble farm of Intel-based Linux servers. The massive savings (3 million euros / year in our case) allow the project to finance itself well before its end. The main virtue of Open Source for a company like ours clearly remains its very, very low price.
- migrate people with technology: we believe that we succeeded in our project because we clearly demonstrated very early on to the people in place that they would find a new interesting job in the final constellation. That generated their full commitment to the project!
- iso-functionality as a must: migrating in such a manner prevents months of discussion about the final target. But, mostly, it allows for 100% automatic migration, a key factor for quality in the transcoding.
- no big-bang but numerous reversible steps: such a total migration with (tens of) thousands of new steps can never successfully reach the end if you try big steps. Permanent incremental progress toward the goal is a much better approach. The nice consequence: small steps cause smaller, local trouble when problems arise. Your users remain much more patient this way! That was our experience...
The tools that we deliver today (v1.0) in the zip package:
- Doc: a set of documents explaining the tools and libraries in detail. Your feedback on this documentation, its missing points, etc. is essential in order to improve it.
- NacaTrans (license GPL - approx. 83′000 lines of code & 690 Java classes): our transcoder, which allowed us to convert the 4 million lines of our PUB 2000 COBOL application to Java 100% automatically. It is heavily based on compiler technology. It analyzes the structure of the initial COBOL programs (assumed to be 100% valid) and brings them into an intermediate XML structure before generating the final Java code, which extensively calls functions and uses classes of the runtime library NacaRT, itself dependent on JLib. This new Java source code was very carefully designed: each line of COBOL deliberately generates a single corresponding line of Java. The aim is to look as much as possible like the original COBOL code in order to ease maintenance by the original developers / maintainers, who master the structure of their original COBOL programs. Complete coverage of the syntax of all COBOL variants is of course not guaranteed. But our own 4 million lines of code, as well as additional tests on other external applications, tend to show that NacaTrans's current coverage of COBOL is already very high. We want to improve this coverage through your feedback on valid constructs that we don’t support yet.
- NacaRT & JLib (license LGPL - approx. 153′000 lines of code & 890 Java classes): these are the two runtime libraries that provide all the runtime transactional services for the application. They emulate all the functions of a classical transaction monitor like CICS from IBM. At the same time, they also support all the COBOL constructs (for example, the COMMAREA structure with multiple data representation masks, management of specific data formats like COMP-X, etc.).
- NacaRTTest (license GPL): this is a test suite that lets us check the output of the transcoder on a set of reference COBOL constructs. It’s the way to validate part of our transcoding algorithms. For a new user of NACA, this is definitely the place to start: when this runs on your infrastructure, you can feel pretty confident about your installation of the package.
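The NacaTrans description above (analyze each COBOL statement, hold it in an intermediate structure, then emit exactly one Java line per COBOL line) can be illustrated with a toy sketch. It only understands a simple MOVE statement, uses a plain object where the real tool uses XML, and the emitted call shape move(source, target) is an assumption made for this example rather than the documented NacaRT API. Every name in it is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy sketch only; class, method and emitted call names are hypothetical, not NacaTrans code. */
final class LineTranscoderSketch {

    /** Intermediate form: one node per COBOL line (a plain object stands in for the XML stage). */
    static final class Node {
        final String kind;
        final String source;
        final String target;

        Node(String kind, String source, String target) {
            this.kind = kind;
            this.source = source;
            this.target = target;
        }
    }

    // Recognizes only: MOVE <"literal" | NAME> TO <NAME> [.]
    private static final Pattern MOVE = Pattern.compile(
            "^\\s*MOVE\\s+(\"[^\"]*\"|[A-Z0-9-]+)\\s+TO\\s+([A-Z0-9-]+)\\s*\\.?\\s*$");

    /** Stage 1: turn one COBOL line into one intermediate node. */
    static Node parse(String cobolLine) {
        Matcher m = MOVE.matcher(cobolLine);
        if (m.matches()) {
            return new Node("MOVE", m.group(1), m.group(2));
        }
        return new Node("UNSUPPORTED", cobolLine.trim(), "");
    }

    /** COBOL names may contain hyphens, which are not legal in Java identifiers. */
    static String javaName(String cobolName) {
        return cobolName.replace('-', '_');
    }

    /** Stage 2: turn one node into exactly one line of Java text. */
    static String emit(Node n) {
        if (n.kind.equals("MOVE")) {
            String src = n.source.startsWith("\"") ? n.source : javaName(n.source);
            return "move(" + src + ", " + javaName(n.target) + ");";
        }
        return "// TODO unsupported: " + n.source;
    }

    /** One output line per input line, so line numbers stay aligned. */
    static List<String> transcode(List<String> cobolLines) {
        List<String> javaLines = new ArrayList<>();
        for (String line : cobolLines) {
            javaLines.add(emit(parse(line)));
        }
        return javaLines;
    }

    public static void main(String[] args) {
        List<String> cobol = new ArrayList<>();
        cobol.add("MOVE \"9876543210\" TO VX10.");
        cobol.add("MOVE WS-A TO WS-B.");
        cobol.add("DISPLAY VX10.");
        for (String javaLine : transcode(cobol)) {
            System.out.println(javaLine);
        }
    }
}
```

Unsupported statements become a comment line instead of being dropped, so the output keeps the same number of lines as the input. That is the property that lets a COBOL developer find their way around the generated Java.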
With a legacy of 50 years of COBOL and around 250bn LOCs in production, it seems there is a considerable market for similar tools.
Is it really what we want to have?
"[...] they receive afterwards a Java application with the exact same structure as the original COBOL version [...]" - in other words assuming that you start migrating the COBOL application that cannot be written better (in terms of a language capabilities, available libraries and the system architecture) - you end up with the worst nightmare you can think of in the Java world.
Thanks for reporting about NACA
Thanks a lot for reporting about NACA.
People interested can get in contact with me: firstname.lastname@example.org or email@example.com
Re: Is it really what we want to have?
The 1-to-1 line matching between COBOL and Java was a deliberate move to keep the developers "at home" after the migration: afterwards they find the exact same code structure that they had before.
I understand that this strategy may look odd / wrong to pure OO developers, but for us it made it possible to keep our long-time developers on board: they have to learn Java (remember: they come from COBOL). So helping them by providing "close" source code in Java was our deliberate choice. Now that they are comfortable with Java, they are progressively re-objectifying the application (from a business perspective).
This way of doing things probably saved us 12 to 24 months on our project, so an additional 3 to 6 million euros. That leaves quite a lot of financial means for post-refactoring... ;-)
Re: Is it really what we want to have?
My company (much smaller) is trapped in an MF COBOL ERP running on Linux and, feature-wise, the dinosaur is clearly constraining our growth. We still have some quick gains to achieve in COBOL land, like migrating data into SQL instead of using ISAM. But I know that eventually such a change will be necessary.
Thank you for this contribution to the OSS world.
Commercial COBOL to Java/C# tool generating much higher quality of OO code
Here is a sample of code from NACA's examples concerning COBOL's "MOVE" statement
Var W3 = declare.level(1).occurs(10).var();
Var VX10 = declare.level(5).picX(10).var();
SoftwareMining's conversion tool performs a full analysis of how each variable is used. That is, it can identify whether "formatting" information is required for a variable; if not, the VX10 variable is translated into a plain "String" in Java (or "string" in C#). It also creates data classes: e.g. Vx10 should belong to a class called "orderClass" or "invoiceClass", or, in NACA's example, W3.
In SoftwareMining's conversion utility, the "Move" statement will hence be translated to:
owningClass.setVx10("9876543210");
or in C#:
owningClass.Vx10 = "9876543210";
Furthermore, the SoftwareMining solution can also "re-engineer" the code during the translation. For example, if most of a method's operations involve a data class called "Order", it will automatically move that method to the "Order" class. That is, the new code will adhere to the principles of object orientation.
All in all, significantly higher-quality code is generated, which helps the long-term maintenance of the system. SoftwareMining has many large references.
For more information see www.softwaremining.com