InfoQ

COBOL to Java Automatic Migration with GPL'ed Tools

Posted by Dionysios G. Synodinos on Jul 03, 2009

Sections: Process & Practices, Development
Topics: Java, Change
Tags: migration

During the NACA project run by Publicitas Ltd., 4m lines of COBOL were automatically transcoded (migrated) to their Java equivalent. The company claims that the recurring annual savings in cash expenses total 3m euros, and it has released the tools from the NACA project under the GPL.

Didier Durand and Pierre-Jean Ditscheid gave a presentation about it at Jazoon09 last week, which is available online.

Pierre described the architecture of the "transcoding compiler":

  • many levels of cache to maximize the performance of the new Java version of the old application. Thanks to them, our Java-transcoded transactions and batches perform better than their COBOL ancestors did on the mainframe.
  • pre-allocation of all program variable structures (COMMAREA of COBOL) to further improve performance and to minimize garbage collection, which freezes the system while it runs.
  • strongly object-oriented architecture of the resulting Java objects in order to maximize the effect of all checks done by the compiler. For example, each old COBOL program becomes a Java class whose existence is checked at compile time rather than at runtime. This is very useful when your application is 4 million lines of code and you want to track down every typing mistake in a continuous integration architecture like ours
  • strong integration with the Eclipse IDE for the highest developer productivity: we even developed a plug-in to facilitate debugging and editing of old COBOL programs from Eclipse
  • line-by-line equivalence between old COBOL programs and newly transcoded Java classes. The in-house developers don't get lost: afterwards they receive a Java application with the exact same structure as the original COBOL version
  • support of IBM JVM as well as Sun JVM in order to also allow for the transcoding of stored procedures
  • support of distinct character sets and encoding schemes (EBCDIC) between mainframe & Linux. Support of all resulting possibilities for data sorting.
  • full management of multi-level COBOL data structures in Java independently of the UTF-16 encoding (2 bytes per char) used by Java
  • transparency of wrapping framework (raw JVM, Apache Tomcat, etc...) for the application
  • etc...
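
The pre-allocation and multi-level data structure points above can be illustrated with a minimal sketch. It is not the NacaRT API: the class `RecordSketch`, the `Field` helper, and the `CUSTOMER` layout are hypothetical, and ISO-8859-1 stands in for whatever single-byte encoding (such as EBCDIC) the real runtime handles. The idea shown is that a COBOL group item and its elementary items can all be views over one byte buffer allocated once, so writing a child field changes what the parent sees and no new storage is created per update.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch, NOT the NacaRT API: a COBOL-style record kept in one
// pre-allocated byte buffer, with fields as fixed (offset, length) views.
class RecordSketch {

    /** A fixed-width alphanumeric field (like PIC X(n)) viewed over a shared buffer. */
    static final class Field {
        private final byte[] buffer;
        private final int offset;
        private final int length;

        Field(byte[] buffer, int offset, int length) {
            this.buffer = buffer;
            this.offset = offset;
            this.length = length;
        }

        /** Stores text left-justified, space-padded or truncated, similar to a COBOL MOVE. */
        void set(String value) {
            int n = Math.min(value.length(), length);
            for (int i = 0; i < n; i++) {
                // Assumes single-byte (Latin-1) characters: one byte per character.
                buffer[offset + i] = (byte) value.charAt(i);
            }
            Arrays.fill(buffer, offset + n, offset + length, (byte) ' ');
        }

        /** Decodes the field's bytes; this is the only place a String is created. */
        String get() {
            return new String(buffer, offset, length, StandardCharsets.ISO_8859_1);
        }
    }

    // 01 CUSTOMER.              (group item, 18 bytes)
    //    05 CUST-ID   PIC X(6).
    //    05 CUST-NAME PIC X(12).
    private final byte[] storage = new byte[18]; // allocated once per record
    final Field customer = new Field(storage, 0, 18);
    final Field custId = new Field(storage, 0, 6);
    final Field custName = new Field(storage, 6, 12);

    public static void main(String[] args) {
        RecordSketch r = new RecordSketch();
        r.customer.set("");        // blank the whole group item
        r.custId.set("A00042");
        r.custName.set("Durand");
        // The group item reflects writes made through its children.
        System.out.println("[" + r.customer.get() + "]"); // prints "[A00042Durand      ]"
    }
}
```

Because the byte layout is fixed and independent of Java's internal `String` representation, the same record could be decoded with a different single-byte charset without touching the field offsets.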

Didier, for his part, emphasized the key aspects of such a project:

  • economic motivation as core driver: move from a multi-million (CHF or euros) mainframe environment to an incredibly cheap and nimble farm of Intel-based Linux servers. The massive savings (3 million euros / year in our case) allow the project to finance itself well before its end. The main virtue of Open Source for a company like ours clearly remains its very low price.
  • migrate people with technology: we believe that we succeeded in our project because we clearly demonstrated very early on to the people in place that they would find a new interesting job in the final constellation. That generated their full commitment to the project!
  • iso-functionality as a must: migrating in such a manner prevents months of discussion about the final target. But, mostly, it allows for 100% automatic migration, a key factor for quality in the transcoding.
  • no big bang but numerous reversible steps: such a total migration, with (tens of) thousands of new steps, can never successfully reach its end if you try big steps. Permanent incremental progress toward the goal is a much better approach. The nice consequence: small steps generate smaller, local trouble when problems arise. Your users remain much more patient this way! That was our experience...

The tools can be downloaded. The project describes the contents of the distribution as follows:

The tools that we deliver today (v1.0) in the zip package:

  • Doc: a set of documents explaining the tools and libraries in detail. Your feedback on this documentation, its missing points, etc. is essential in order to improve it.
  • NacaTrans (license GPL - approx. 83′000 lines of code & 690 Java classes): our transcoder, which allowed us to convert 100% automatically the 4 million lines of our PUB 2000 application from COBOL to Java. It is very much based on compiler technologies. It analyzes the structure of the initial COBOL programs (assumed to be 100% valid) and brings them into an intermediate XML structure before generating the final Java code, which extensively calls functions and uses classes of the runtime library NacaRT, itself depending on JLib. This new Java source code was very carefully designed: each line of COBOL very intentionally generates a single corresponding line of Java. The aim is to look as much as possible like the original COBOL code in order to ease maintenance by the original developers / maintainers, who master the structure of their original COBOL programs. The completeness of the accepted syntax for all variants of COBOL is of course not guaranteed. But our own 4 million lines of code, as well as additional tests on other external applications, tend to prove that the current coverage of COBOL by NacaTrans is already very high. We want to improve this coverage through your feedback on valid constructs that we don’t support yet.
  • NacaRT & JLib (license LGPL - approx. 153′000 lines of code & 890 Java classes): these are the 2 runtime libraries that provide all the runtime transactional services for the application. They emulate all the functions of a classical transactional monitor like CICS from IBM. At the same time, they also support all the COBOL constructs (for example, the COMMAREA structure with multiple data representation masks, management of specific data formats like COMP-X, etc.)
  • NacaRTTest (license GPL): this is a test suite allowing us to test the correct output of the transcoder on a set of reference COBOL constructs. It’s the way to validate part of our transcoding algorithms. For a new user of NACA, this is definitely the place to start: when this runs on your infrastructure, you can feel pretty confident about your installation of the package.
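
The "one line of COBOL, one line of Java" principle described for NacaTrans can be illustrated with a toy sketch. It is not NacaTrans and it skips the intermediate XML stage entirely; the class name `LineByLineSketch`, the three supported statements, and the emitted helper calls `move`, `add`, and `display` are all assumptions made for illustration, not names from NacaRT.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy illustration only, NOT NacaTrans: emits exactly one Java line per COBOL line.
// The emitted helper names (move, add, display) are hypothetical runtime calls.
class LineByLineSketch {
    private static final Pattern MOVE = Pattern.compile("MOVE\\s+(\\S+)\\s+TO\\s+(\\S+)\\.");
    private static final Pattern ADD = Pattern.compile("ADD\\s+(\\S+)\\s+TO\\s+(\\S+)\\.");
    private static final Pattern DISPLAY = Pattern.compile("DISPLAY\\s+(\\S+)\\.");

    /** Turns a COBOL data name such as WS-TOTAL into a Java identifier such as wsTotal. */
    static String javaName(String cobolName) {
        if (cobolName.matches("\\d+")) {
            return cobolName; // numeric literal, passed through unchanged
        }
        StringBuilder sb = new StringBuilder();
        boolean upperNext = false;
        for (char c : cobolName.toLowerCase(Locale.ROOT).toCharArray()) {
            if (c == '-') {
                upperNext = true;
            } else {
                sb.append(upperNext ? Character.toUpperCase(c) : c);
                upperNext = false;
            }
        }
        return sb.toString();
    }

    /** Maps one COBOL statement to one Java statement; unknown input becomes a comment. */
    static String transcodeLine(String cobol) {
        String line = cobol.trim();
        Matcher m = MOVE.matcher(line);
        if (m.matches()) {
            return "move(" + javaName(m.group(1)) + ", " + javaName(m.group(2)) + ");";
        }
        m = ADD.matcher(line);
        if (m.matches()) {
            return "add(" + javaName(m.group(1)) + ", " + javaName(m.group(2)) + ");";
        }
        m = DISPLAY.matcher(line);
        if (m.matches()) {
            return "display(" + javaName(m.group(1)) + ");";
        }
        return "// TODO unsupported: " + line;
    }

    /** Output list always has the same number of lines as the input list. */
    static List<String> transcode(List<String> cobolLines) {
        List<String> out = new ArrayList<>(cobolLines.size());
        for (String l : cobolLines) {
            out.add(transcodeLine(l));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> cobol = List.of(
                "MOVE WS-A TO WS-TOTAL.",
                "ADD 1 TO WS-TOTAL.",
                "DISPLAY WS-TOTAL.");
        List<String> java = transcode(cobol);
        for (int i = 0; i < cobol.size(); i++) {
            System.out.println(cobol.get(i) + "  ->  " + java.get(i));
        }
    }
}
```

Keeping the mapping strictly line-for-line, even for unsupported statements, is what lets a maintainer read the generated file side by side with the original program, which is the maintenance benefit the description above emphasizes.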

With a legacy of 50 years of COBOL and around 250bn LOCs in production, it seems there is a considerable market for similar tools.

Dionysios G. Synodinos is a Web Engineer and a freelance consultant, focusing on Web technologies

  1. Is it really what we want to have?

    by Marcin Niebudek

    If I understand it correctly NacaTrans produced 4m lines of Java code out of 4m lines of COBOL code. Do we really want those other 250bn LOCs of COBOL code to become 250bn LOCs of Java code? I don't think so.

    "[...] they receive afterwards a Java application with the exact same structure as the original COBOL version [...]" - in other words, assuming that you start by migrating a COBOL application that cannot be written better (in terms of language capabilities, available libraries and the system architecture), you end up with the worst nightmare you can think of in the Java world.

    Regards,

    Marcin

  2. Thanks for reporting about NACA

    by Didier DURAND

    Hi Dionysios,
    Thanks a lot for reporting about NACA.
    People interested can get in contact with me: didier.durand@publicitas.com or mediaandtech@gmail.com

  3. Re: Is it really what we want to have?

    by Didier DURAND

    Hi Marcin,

    The 1-to-1 line matching between COBOL & Java was a deliberate move to keep the developers "at home" after migration: afterwards they find the exact same code structure that they had before.

    I understand that this strategy may look odd / wrong to pure OO developers, but for us it made it possible to keep our long-time developers on board: they have to learn Java (remember: they come from COBOL). So helping them by providing "close" source code in Java was our deliberate choice. Now that they are comfortable with Java, they progressively re-objectify (from a business perspective) the application.

    This way of doing things probably saved us 12 to 24 months in our project, so 3 to 6 million additional euros. That then provides quite a lot of financial means for post-refactoring... ;-)
    regards
    didier

  4. Re: Is it really what we want to have?

    by Cloves Almeida

    I'm truly impressed by the accomplishment. The beauty is that the decisions were made using a good mix of "tech" and "business" arguments. While writing 1:1 procedural code makes no sense from a purely technological perspective, it clearly was the best decision from a strategic POV. Since you've created lots of test cases, you're free to refactor your code with little risk.

    My company (much smaller) is trapped in an MF COBOL ERP running on Linux and, feature-wise, the dinosaur is clearly constraining our growth. We still have some quick gains to achieve in COBOL land - like migrating data into SQL instead of using ISAM. But I know that eventually such a change will be necessary.

    Thank you for this contribution to the OSS world.

  5. subject

    by Neil Murphy

    What's good is that the decisions were pragmatic, business-driven ones, not ones made by propeller-headed purists. Too many purists treat software development as almost a religious experience and defend technical positions with religious fanaticism, but totally forget that they build systems to meet a business need. Well done Didier and team.
