InfoQ

News

COBOL to Java Automatic Migration with GPL'ed Tools

Posted by Dionysios G. Synodinos on Jul 03, 2009


During the NACA project run by Publicitas Ltd., 4m lines of COBOL were automatically transcoded (migrated) to their Java equivalent. The company claims recurring annual cash savings of 3m euros in total and has released the tools from the NACA project under GPL.

Didier Durand and Pierre-Jean Ditscheid gave a presentation about it at Jazoon09 last week, which is available online.

Pierre described the architecture of the "transcoding compiler":

  • many levels of cache to maximize the performance of the new Java version of the old application. Thanks to them, our Java-transcoded transactions and batches perform better than their COBOL ancestors did on the mainframe.
  • pre-allocation of all program variable structures (COMMAREA of COBOL) to further improve performance but also to minimize garbage collection, which freezes the system while running.
  • strongly object-oriented architecture of the resulting Java objects in order to maximize the effect of all the checks done by the compiler. As an example, each old COBOL program becomes a Java class whose existence is checked at compile time rather than at runtime. Very useful when your application is 4 million lines of code like ours and when you want to track down every typing mistake in a continuous integration architecture like ours.
  • strong integration with the Eclipse IDE for the highest developer productivity: we even developed a plug-in to facilitate debugging and editing of old COBOL programs from Eclipse
  • line-by-line equivalence between old COBOL programs and newly transcoded Java classes. The in-house developers don't get lost: afterwards they receive a Java application with the exact same structure as the original COBOL version
  • support of IBM JVM as well as Sun JVM in order to also allow for the transcoding of stored procedures
  • support of distinct character sets and encoding schemes (EBCDIC) between mainframe & Linux. Support of all resulting possibilities for data sorting.
  • full management of multi-level COBOL data structures in Java independently of the UTF-16 encoding (2 bytes per char) used by Java
  • transparency of wrapping framework (raw JVM, Apache Tomcat, etc...) for the application
  • etc...
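To make the pre-allocation and multi-level data structure points concrete, here is a minimal sketch of how a COBOL record could be held in a single byte buffer that is allocated once, with group and elementary items defined as overlapping views on it. All class and method names (RecordBuffer, Field, field, set, get) are hypothetical and are not taken from NacaRT or JLib; the one-byte-per-character ISO-8859-1 storage is likewise an assumption chosen to keep field lengths independent of Java's 2-byte chars.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Hypothetical sketch (not NacaRT code): a COBOL-style record stored in one
 * byte array allocated up front, with fields as (offset, length) views.
 */
final class RecordBuffer {
    private final byte[] storage;

    RecordBuffer(int length) {
        storage = new byte[length];
        Arrays.fill(storage, (byte) ' '); // start blank, as after MOVE SPACES
    }

    /** Defines a view; a group item is simply a view spanning its children. */
    Field field(int offset, int length) {
        if (offset < 0 || length < 0 || offset + length > storage.length) {
            throw new IllegalArgumentException("field outside record");
        }
        return new Field(offset, length);
    }

    /** A fixed-length alphanumeric item that shares the record's storage. */
    final class Field {
        private final int offset;
        private final int length;

        Field(int offset, int length) {
            this.offset = offset;
            this.length = length;
        }

        /** Alphanumeric MOVE: left-justify, pad with spaces, truncate on the right. */
        void set(String value) {
            byte[] src = value.getBytes(StandardCharsets.ISO_8859_1);
            int n = Math.min(src.length, length);
            System.arraycopy(src, 0, storage, offset, n);
            Arrays.fill(storage, offset + n, offset + length, (byte) ' ');
        }

        String get() {
            return new String(storage, offset, length, StandardCharsets.ISO_8859_1);
        }
    }
}

final class RecordBufferDemo {
    public static void main(String[] args) {
        // 01 CUSTOMER.               (group item, 12 bytes)
        //    05 CUST-ID    PIC X(4).
        //    05 CUST-NAME  PIC X(8).
        RecordBuffer rec = new RecordBuffer(12);
        RecordBuffer.Field customer = rec.field(0, 12);
        RecordBuffer.Field custId = rec.field(0, 4);
        RecordBuffer.Field custName = rec.field(4, 8);

        custId.set("A12");
        custName.set("DURAND");
        System.out.println("[" + customer.get() + "]"); // group view sees both children
    }
}
```

Because the views only hold an offset and a length, writing through the group item is immediately visible through the elementary items, and no new buffer is allocated after the record is created.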

Didier, for his part, emphasized the key aspects of such a project:

  • economic motivation as core driver: move from a multi-million (CHF or euros) mainframe environment to an incredibly cheap and nimble farm of Linux Intel-based servers. The massive savings (3 million euros / year in our case) allow the project to finance itself long before its end. The main virtue of Open Source for a company like ours clearly remains its very low price.
  • migrate people with technology: we believe that we succeeded in our project because we clearly demonstrated very early on to the people in place that they would find a new interesting job in the final constellation. That generated their full commitment to the project!
  • iso-functionality as a must: migrating in such a manner prevents months of discussion about the final target. But, mostly, it allows for 100% automatic migration, a key factor for quality in the transcoding.
  • no big-bang but numerous reversible steps: such a total migration with (tens of) thousands of new steps can never successfully reach the end if you try big steps. Permanent incremental progress toward the goal is a much better approach. The nice consequence: small steps generate smaller, local trouble when problems arise. Your users remain much more patient this way! Our experience was so...

The tools can be downloaded. The project describes the contents of the distribution as follows:

The tools that we deliver today (v1.0) in the zip package:

  • Doc: a set of documents explaining the tools and libraries in detail. Your feedback on this documentation, its missing points, etc. is essential in order to improve it.
  • NacaTrans (license GPL - approx. 83′00 lines of code & 690 Java classes): our transcoder, which allowed us to convert the 4 million lines of our COBOL application PUB 2000 to Java 100% automatically. It relies heavily on compiler technologies. It analyzes the structure of the initial COBOL programs (assumed to be 100% valid) and brings them into an intermediate XML structure before generating the final Java code, which makes extensive use of the functions and classes of the runtime library NacaRT, itself dependent on JLib. This new Java source code was very carefully designed: each line of COBOL intentionally generates a single corresponding line of Java. The aim is to look as much as possible like the original COBOL code in order to ease maintenance by the original developers / maintainers, who have mastered the structure of their original COBOL programs. Complete coverage of the syntax of all COBOL variants is of course not guaranteed. But our own 4 million lines of code, as well as additional tests on other external applications, tend to show that NacaTrans's current coverage of COBOL is already very high. We want to improve this coverage through your feedback on valid constructs that we don't support yet.
  • NacaRT & JLib (license LGPL - approx. 153′000 lines of code & 890 Java classes): these are the 2 runtime libraries that provide all the runtime transactional services for the application. They emulate all the functions of a classical transactional monitor like CICS from IBM. At the same time, they also support all the COBOL constructs (for example, the COMMAREA structure with multiple data representation masks, management of specific data formats like COMP-X, etc.)
  • NacaRTTest (license GPL): this is a test suite allowing us to test the correct output of the transcoder on a set of reference COBOL constructs. It's the way to validate part of our transcoding algorithms. For a new user of NACA, this is definitely the place to start: when this runs on your infrastructure, you can feel pretty confident about your installation of the package.
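The "one line of COBOL, one line of Java" rule described for NacaTrans can be pictured with a small sketch. The COBOL fragment in the comments, the variable classes (NumVar, CharVar), the helper methods (move, add, isGreater) and the called program TaxCalc are all invented for illustration; the real generated code relies on NacaRT classes whose names and signatures are not shown in this article.

```java
/** Hypothetical mutable numeric variable; not a NacaRT class. */
final class NumVar {
    long value;
    NumVar(long initial) { value = initial; }
}

/** Hypothetical one-character flag; not a NacaRT class. */
final class CharVar {
    char value;
    CharVar(char initial) { value = initial; }
}

/** Hypothetical called program: if this class were missing, javac would reject the caller. */
final class TaxCalc {
    static void call(NumVar total) {
        total.value += total.value / 10; // adds 10% using integer division
    }
}

/** One Java class per COBOL program; each COBOL statement maps to one Java line. */
final class OrderTotal {
    final NumVar price = new NumVar(120);        // 05 PRICE    PIC 9(5) VALUE 120.
    final NumVar total = new NumVar(0);          // 05 TOTAL    PIC 9(5).
    final CharVar discount = new CharVar('N');   // 05 DISCOUNT PIC X    VALUE 'N'.

    void procedureDivision() {
        move(0, total);                          // MOVE 0 TO TOTAL
        add(price, total);                       // ADD PRICE TO TOTAL
        if (isGreater(total, 100)) {             // IF TOTAL > 100
            move('Y', discount);                 //     MOVE 'Y' TO DISCOUNT
        }                                        // END-IF
        TaxCalc.call(total);                     // CALL 'TAXCALC' USING TOTAL
    }

    // Small helpers that keep the Java statements shaped like the COBOL verbs.
    static void move(long v, NumVar to) { to.value = v; }
    static void move(char v, CharVar to) { to.value = v; }
    static void add(NumVar from, NumVar to) { to.value += from.value; }
    static boolean isGreater(NumVar a, long b) { return a.value > b; }

    public static void main(String[] args) {
        OrderTotal program = new OrderTotal();
        program.procedureDivision();
        System.out.println(program.total.value + " " + program.discount.value);
    }
}
```

In this sketch a COBOL developer can read the Java method top to bottom against the original listing, and a misspelled program name in a CALL would surface as a compile error rather than a runtime failure, which is the property the presenters highlight.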

With a legacy of 50 years of COBOL and around 250bn lines of code in production, there seems to be a considerable market for similar tools.

  1. Is it really what we want to have?

    Jul 3, 2009 2:37 PM by Marcin Niebudek

    If I understand it correctly NacaTrans produced 4m lines of Java code out of 4m lines of COBOL code. Do we really want those other 250bn LOCs of COBOL code to become 250bn LOCs of Java code? I don't think so.



    "[...] they receive afterwards a Java application with the exact same structure as the original COBOL version [...]" - in other words, assuming that you start by migrating a COBOL application that could not be written any better (in terms of language capabilities, available libraries and system architecture), you end up with the worst nightmare you can think of in the Java world.



    Regards,

    Marcin

  2. Thanks for reporting about NACA

    Jul 4, 2009 1:56 AM by Didier DURAND

    Hi Dionysios,
    Thanks a lot for reporting about NACA.
    People interested can get in contact with me: didier.durand@publicitas.com or mediaandtech@gmail.com

  3. Re: Is it really what we want to have?

    Jul 4, 2009 2:02 AM by Didier DURAND

    Hi Marcin,

    1-to-1 line matching between COBOL & Java was a deliberate move to keep the developers "at home" after migration: afterwards they find the exact same code structure that they had before.

    I understand that this strategy may look odd / wrong to pure OO developers, but for us it made it possible to keep our long-time developers on board: they have to learn Java (remember: they come from COBOL). So helping them by providing "close" source code in Java was our deliberate choice. Now that they are comfortable with Java, they progressively re-objectify (from a business perspective) the application.

    This approach probably saved us 12 to 24 months in our project, so 3 to 6 million additional euros. That then gives us quite a lot of financial means for post-refactoring... ;-)
    regards
    didier

  4. Re: Is it really what we want to have?

    Jul 6, 2009 5:17 PM by Cloves Almeida

    I'm truly impressed by the accomplishment. The beauty is that the decisions were made using a good mix of "tech" and "business" arguments. While writing 1:1 procedural code makes no sense from a purely technological perspective, it clearly was the best decision from a strategic POV. Since you've created lots of test cases, you're free to refactor your code with little risk.

    My company (much smaller) is trapped in an MF COBOL ERP running on Linux and, feature-wise, the dinosaur is clearly constraining our growth. We still have some quick gains to achieve in COBOL land, like migrating data into SQL instead of using ISAM. But I know that eventually such a change will be necessary.

    Thank you for this contribution to the OSS world.

  5. subject

    Jul 7, 2009 3:52 PM by Neil Murphy

    What's good is that the decisions were pragmatic, business-driven ones, not ones made by propeller-headed purists. Too many purists treat software development as almost a religious experience and defend technical positions with religious fanaticism, but totally forget that they build systems to meet a business need. Well done Didier and team.

  6. Re: Is it really what we want to have?

    Jul 8, 2009 5:54 PM by Fiona Oliver

    Hi Cloves, I saw your comment about migrating COBOL data to SQL. I hope you won't mind if I suggest a product that could meet your requirement: www.transoft.com/products/dbpronto. This product allows you to replace your MF COBOL data with any RDBMS with no code changes.
