InfoQ ホームページアーティクル Why BPEL is not the holy grail for BPM

Why BPEL is not the holy grail for BPM

2009年4月12日読了時間 39 分

Introduction

Looking at recent articles and various BPM solutions, it would be easy to assume that BPEL is now the defacto standard to be used when implementing a workflow engine. From a technical perspective this may well be correct, however few people will claim that BPEL can be easily understood by the end-user, a.k.a the business analyst, who definitely prefers a graph based notation such as BPMN. This article will provide guidance in understanding the discrepancy between the technical point of view (pro-BPEL) and analyst's (pro-BPMN). Going further, even if most BPEL-based BPM solutions agree on the discrepancy (since they usually provide a BPMN to BPEL mapping) this article will explain why it is not currently the solution to BPM problems. A real-world example will be used to illustrate our arguments.

Parallelism and Structuredness in Programming Languages

Developers and BPM users may believe that BPEL [BPEL07] is a structured language since it is basically based on blocks, much as traditional languages such as Java and C amongst others. This comes in part from its origin: Microsoft's XLANG, which was block based. However, BPEL origins also include IBM's WSFL, and this is of great importance for the following discussion as it was graph based (hence unstructured). We find in BPEL a mix of structuredness (blocks) and unstructuredness (control-links and events). Those last constructs introduce a bit of unstructuredness into a world of structuredness... The conclusion is that BPEL is not a structured languages even if it looks like a structured language.

On the other hand, BPMN is a flow-chart notation which is naturally unstructured. No doubt about this. In chapter 11 (page 137) of the BPMN specification [BPMN06], a direct mapping from BPMN to BPEL is provided. Some BPMN editors (and users) believe that BPMN is a simple GUI for the underlying BPEL language. This is not quite true as explained in the BPMN FAQ:

"By design there are some limitations on the process topologies that can be described in BPEL, so it is possible to represent processes in BPMN that cannot be mapped to BPEL".

This article, will provide some fundamental details to that very important statement. But let's first focus on structured versus unstructured languages. Why is it so important? The main point is that it is much harder to perform code analysis on an unstructured language than a structured one (such as Java, C, and most --- if not all --- widely used programming languages). Code analysis has a wide range of applications, from error checking (e.g. compilers), to bug detection (e.g. findbugs, deadlock detection, ...) and quality checking (e.g. check style).

An important theorem from Böhm and Jacopini [BOHM66] (and vulgarized on Wikipedia) states that every computable function can be implemented in a programming language that combines subprograms in only three specific ways. These three control structures are:

executing one subprogram, and then another subprogram (sequence);
executing one of two subprograms according to the value of a boolean variable (selection);
executing a subprogram until a boolean variable is true (iteration).

This basically means that any (unstructured) flow-chart can be transformed into a structured one. This formed the basis of Dijkstra' paper "Go To Statement Considered Harmful" [DIJKSTRA68].

There is still a debate on whether we should allow unstructured programming language or not. Nevertheless, the facts are:

most students in the world are taught structured programming;
the most widely used programming languages are (non-strict) structured programming languages;
most unstructured programming languages have introduced some structured constructions (BASIC, COBOL, FORTRAN).

So in general, most programmers do focus on structured programming but sometimes use unstructured constructions (goto, jump, break, exceptions) for various reasons (mainly readability, maintenance, sometimes performance).

Business Analysts Write Parallel and Unstructured Processes

A business analyst (BPM end-user) has to deal with the real world¹, which is in essence not only unstructured but highly parallel. This has two implications:

BPM end-users are usually not computer engineers, neither computer scientists: they design business processes using flow-chart notation (this is the most natural) for them, hence, unstructured (and parallel, this is important);
in the face of parallelism, unstructuredness is more expressive than structuredness.

Point 2 is of great importance and has been formally proven by Kiepuszewski et al. [KIEPUSZEWSKI00]. The fact is that there are parallel unstructured, workflows that cannot be expressed into a parallel, structured one. Actually, those cases are quite simple to find. Consider this example using the BPMN notation (created with the Intalio BPMN designer):

In a tool that uses BPEL as its underlying format, a separate pool has to be created in order to validate that diagram. We will discuss this problem later, but the main point here is to focus on the pool called 'Employer' and the 6 activities it contains. When a new employee arrives at a company, a workflow is instantiated. First a record needs to be created in the Human Resources database. Simultaneously, an office has to be provided. As soon as the Human Resources activity has been completed, the employee can undergo a medical check. During that time, a computer is provided. This is only possible when both an office has been set up and an account created in the information system from the human resource database. When both the computer and the medical check have been performed, the employee is ready to work. Of course, you may want to model this simple process differently, but the point I want to make here is that you cannot define a structured parallel workflow that is equivalent ² to this one, I mean, you will never be able to! We will use this simple example throughout this document.

As a first conclusion from this study we have:

developers naturally write their programs using sequential structured constructions (blocks);
BPM users naturally design their processes using unstructured and parallel constructions (graph);
unstructured and parallel workflows are more expressive than structured parallel ones.

There are workflows that BPM users will design that you definitely cannot express equivalently into a structured parallel workflow. Worse, also in [KIEPUSZEWSKI00], the author shows that even when the transformation from a parallel unstructured workflow to a parallel structured one is possible, it requires the addition of several variables and/or nodes, so that the final result is almost unreadable to the end user. We will speak soon about that readability problem.

Transforming BPMN to BPEL

In the article [OUYANG06], the authors proposed an automatic translation from BPMN to (readable) BPEL. They define a subset of BPMN due to unclear semantic in the specification. Hence, the OR Gateway and Error Intermediate Events are just not considered in their paper. This is a problem if the process being designed contains multiple end events since they are not considered by the transformation tool, and the standard way to convert from a multiple end events workflow to a single one is by using OR Gateways. The algorithm basically works like this:

Try to find well known patterns that map directly to BPEL (sequence, flow, pick, while, ...) in the graph. For each of these, replace the component by a simple task containing the mapped BPEL code.
Then, try to find some quasi-structured component and transforms them in such a way that most of the component is mapped directly to a BPEL structure.
Then, search for acyclic BPMN sub-models (containing only sequence flows and parallel gateways). For these, use BPEL control-links.
Finally, BPEL events are used for the rest.

The result is an "as readable as possible" equivalent BPEL process. Note that using events in BPEL we can always transform BPMN to BPEL. The problem is that the generated code is not readable at all. The conclusion is that an algorithm for transforming any BPMN diagram into an equivalent ³ BPEL process that is "as readable as possible" ⁴ does exist.

The Intalio Use Case

Note that this is certainly not the algorithm used by the Intalio BPM v2.0 solution for their transformation from BPMN to BPEL as shown by the simple example below. Taking the previous parallel unstructured BPMN diagram, the Intalio solution transforms it into the BPEL process, some of which is shown below:

To get a better picture of the transformation output, we opened the process into the Eclipse BPEL designer:

For some reason, labels for the activities 'Fill HR Db', 'Medical Check', and so on are missing, but in any case we can see from the BPEL source code that the BPMN activity 'Employee Arrival' has been transformed into a 'Receive' BPEL operation. To the business analyst, it is strange to now see 7 activities ('Receive' and 6 other 'Empty' activities) while our original process contained only 6. Looking at the BPEL source code the puzzle is solved: we can see that the activity 'Provide Computer' has been duplicated. In some ways, this is good for the employee: they will get two computers in their office!

Whether the Intalio BPMN2BPEL transformation algorithm produces a readable BPEL or not, is not the issue here: the problem is that the transformation is simply wrong. You can hardly imagine the result for diagrams produced by professional BPM designers focusing on real business processes since they are highly parallel and unstructured in essence.

The Readability problem

The claim in [AALST_UNKNOWNDATE] is that BPEL is not readable by the end-user, so there is a need for a high-level language in which processes are designed. Typically though, runtime information is required before the process can actually be executed. How will this be entered in into the resulting BPEL code? One could argue this is the reason it is important to generate a readable BPEL code. In the case where all the information is entered directly into the editor/designer, at least for debugging purposes, the BPEL code should be as readable as possible where readable means: using BPEL direct straightforward structure (sequence, flow, pick, wait, etc...) as much as possible.

On this readability issue, let's introduce the eclipse JWT sub-project whose aim is to provide a toolkit for Workflow management (designer, transformations, simulations and connections to engine). JWT currently uses the UML Activity Diagram notation (UML-AD for short) for designing workflows. UML-AD is strictly equivalent (in terms of expressive power) to BPMN (see [WHITE_UNKNOWN_DATE] for details). So, using an UML-AD notation, we can represent the previous BPMN diagram in JWT:

JWT has been developed to be extensible, and provides different transformation plugins. One of them is a UML-AD2BPEL transformation provided as a research project from the University of Augsburg. The BPEL transformation plugin outputs a BPEL-WS-1.1 document that is 518 lines long (provided at the end of this document). Note that neither the Eclipse BPEL Designer nor the Intalio Designer were able to properly display this BPEL file. We used Netbeans for that purpose. The diagram produced is too complex to be presented here (see the resources section for a partial rendering of the diagram). The JWT2BPEL transformation tool uses BPEL events extensively in order to get the equivalent ⁵ BPEL representation. For the courageous readers who tried to read and understand the resulting BPEL file, it should be obvious that it is very hard.

So the simple parallel and unstructured process expressed using pure workflow notation such as BPMN or UML-AD is hardly representable into a “readable” BPEL format. This is a general fact, and not specific to the simple process we are discussing. The situation is worse when BPMN-to-BPEL round-trip engineering problems are taken into consideration: (from Wikipedia) “generating BPEL code from BPMN diagrams and maintaining the original BPMN model and the generated BPEL code synchronized, in the sense that any modification to one is propagated to the other”. Definitely, using BPEL as the final format for executing business processes expressed using a natural workflow notation such as BPMN is asking for trouble.

Note that making a BPMN diagram out of a BPEL process seems to be much easier than the reverse: transforming structured elements to unstructured elements is straightforward. Note that such a transformation --- BPEL2BPMN --- is provided in Intalio in the Java class available here. It seems to be in the core of STP and it is (currently) not fully BPEL compliant seeing the comments in the class:

/*
 * Very basic sample that generates BPMN out of a BPEL file.
 * The BPEL parsed is a small subset of the bpel spec:
 * scope, assign, receive, reply, invoke, flow, sequence.
 * ...
 */

Still, importing the BPEL file generated by the Intalio Designer itself does not work for obscure reasons. Nevertheless, the round-trip problem is about synchronizing two quite different representations of the same process: one in BPMN and one in BPEL.

Conclusion

First, we have clarified a common misunderstanding: BPEL is not a structured language, but it is based on a structured language (block-based). In some ways, BPEL is much closer to a standard language such as Java than to a natural workflow notation such as BPMN (which is graph-based). Until now, programmers have dealt directly with their language. Integrated Development Environments are used to simplify several recurrent steps such as compiling, refactoring, testing and so on. But programmers “speak” their language directly. We claim that it should be the same with BPEL. An IDE can only simplify programming (note that we don't use the term “designing” here). But BPEL programmers will have to “speak” BPEL in order to use it and make something useful out of it. The question of whether BPEL is “speakable” or not by general technicians --- as Java can be --- is out of scope of this article, but is definitely an interesting question.

For the business analyst however, it is clear that BPEL is not user-friendly. BPEL is hard to read, hard to learn, hard to implement, and most of all as this is the major end-user concern: hard to hide. We have already noticed that when creating the “Employer” pool in the simple example used in this document we were constrained to having to create another “non-executable” pool in order to generate the BPEL file. Many other BPEL related baggage is present in the Intalio designer that is irrelevant from the point of view of the BPMN analyst: e.g. namespaces, web-service invocations, and XML data type amongst others.

Therefore, we consider BPMN notation as the only currently viable solution for Business Analysts ⁶. Nevertheless, many execution details, absent from the BPMN specification --- and unknown at design time from the analyst --- will have to be specified before the process can actually get executed. This information is usually technical in nature and site (e.g. mail server address, task repository) or implementation dependent (e.g. web service, J2EE service or .NET service). It is therefore of a great importance that the process skeleton on which the technician will enter the environmental context is both equivalent to the original BPMN process in terms of execution semantics (bisimulation), and easy to read to ensure that modifications made do not change the process behaviour.

Transformation from BPMN to (readable) BPEL is quite hard to implement, and produces --- when correct --- hardly readable code. By the way, the round-trip problem is even harder. This last problem unless resolved, makes BPEL a very difficult target for the output of a process designed by a real business analyst.

Therefore, we may wonder why BPMN is transformed to BPEL since there exist a graph-based standard that maps directly BPMN constructs --- namely XPDL v2.0. With this mapping, XPDL v2.0 becomes the natural BPMN persistence file format. Moreover, it specifies behaviours that were only available previously in BPEL such as Web Service invocations and compensations. Of course, one may claim that XPDL 2.0 lacks some execution specifications that makes him unsuitable for direct execution. We believe that using the BPEL semantic wherever XPDL is under-specified makes room for an engine that can be fully BPMN v2.0, XPDL v2.0 and BPEL compliant. This is how Bonita and Orchestra team will implement their next generation of BPM engine. But this is another story, that requires an article of its own... Stay tuned!

I would like to thank the Bonita & Orchestra team for their help and support during the writing of this article, and especially Miguel Valdes-Faura for his review and suggestions. I would also like to thank Gavin Terrill for his help on corrections and final touches.

Pierre Vignéras

Bull, Architect of an Open World™

*BPM Team*, Bull R & D

Bibliography

[BPEL07] OASIS, The BPEL Specification; (2007)
[BPMN06] OMG, The BPMN Specification; (2006)
[BOHM66] Bohm, Corrado and Giuseppe Jacopini, Flow diagrams, Turing machines and languages with only two formation rules; (1966)
[DIJKSTRA68] Dijkstra, Edsger, Go To Statement Considered Harmful; (1968)
[KIEPUSZEWSKI00] B. Kiepuszewski and A. H. M. Ter Hofstede and C. Bussler, On Structured Workflow Modelling; (2000)
[MILNER89] Milner, R. 1989 Communication and Concurrency. Prentice-Hall, Inc.
[EKIE89] Eike Best and Raymond Devillers and Astrid Kiehn and Lucia Pomello, Concurrent bisimulations in Petri nets; (1991)
[OUYANG06] Chun Ouyang and Marlon Dumas and Arthur H. M. Ter Hofstede, From Business Process Models to Process-oriented Software Systems: The BPMN to BPEL Way; (2006)
[AALST_UNKNOWNDATE] Wil M.P. van der Aalst1,2 and Kristian Bisgaard Lassen2, Translating Workflow Nets to BPEL
[WHITE_UNKNOWN_DATE] Stephen A. White, Process Modeling Notations and Workflow Patterns

Footnotes

One may say that programmers also have to deal with the real world. But they are definitely not at the same abstraction level (what would be the point otherwise defining specific notations for analysts?).
Of course we have to define what 'equivalent' means. Usually, in the processes world, the formally defined bisimulation [MILNER89] is used. But since it cannot make a difference between a parallel execution and its sequential simulation -- something that BPM users want, the notion of fully parallel bisimulation [EKIE89] is used instead.
According to the notion of fully parallel bisimulation
According to the common idea of a human readable BPEL, i.e. made of block wherever possible.
By the way, proving formally that the resulting BPEL file is really equivalent (bisimulation) to the original BPMN diagram is left as an exercise... ;-)
UML-AD notation may also be used as its expressive power is equivalent to BPMN. Nevertheless, we think that BPMN is closer to BPM analyst needs.

Resources

1. The JWT2BPEL transformation outputs this BPEL file.

2. Using NetBeans to get a graphical representation of the BPEL process presented in the previous resource, we get this diagram. This diagram represents the whole process, with the central part collapsed (the node with a '+' symbol). If we expand this node, we get an expanded diagram. On that diagram, we can see two nodes with a '+' symbol inside. They are respectively the 'Employee Arrival' activity and the 'Ready to work' activity of the BPMN original diagram. From the 'Employee Arrival' node, we see a BPEL scope labelled C5 from which events are attached. Those events are seen on its right side. Those events are used to implement in BPEL the parallel and unstructured flow specified in the original BPMN diagram.

Why BPEL is not the holy grail for BPM

Introduction

Parallelism and Structuredness in Programming Languages

関連スポンサーコンテンツ

Business Analysts Write Parallel and Unstructured Processes

Transforming BPMN to BPEL

The Intalio Use Case

The Readability problem

Conclusion

この記事に星をつける

このコンテンツのトピックは エンタープライズアーキテクチャ です。

関連記事:

関連記事

トレンド

特集コンテンツ一覧

InfoQ ニュースレター

このコンテンツのトピックはエンタープライズアーキテクチャです。