Introduction to Data Services
Data services are software services that encapsulate operations on key data entities of relevance to the enterprise. Enterprise data is stored in multiple systems and requires multiple interfaces or mechanisms to interact with it. There are also varying channels (branch, online, call center) and mechanisms (event driven, on demand, batch process) that need to be served, which adds further challenges for data services. Without an abstraction layer that insulates data consumers from this complexity, the enterprise will end up with a spaghetti of point-to-point integrations between data sources and data consumers.
Data services abstract the consumer from having to access or update multiple data sources and are critical in helping maintain data integrity when a consumer needs to work with multiple data sources. Additionally, they help build reusable data services that can be leveraged for multiple projects and initiatives. Data services also perform a critical governance function - they help centralize metrics, monitoring, version management, reuse of data types, and enforce data visibility and access rules.
Data services provide several additional benefits - data source abstraction, aggregation of data providers, reuse (generic, interoperable, flexible consumption patterns), alignment with logical data models, support for multiple service versions, value added features, and a single point of interaction. Consequently, they serve as the foundation on which an enterprise can meet evolving business requirements on a continual basis.
a. Rationale for data services: As stated briefly above, data services have several key benefits to the enterprise. In this section each benefit is explained in more detail:
- Data source abstraction: Data services abstract the sources of physical data from consumers. This enables the data providers to change data structure (adding/dropping tables, columns, or other database objects), data format (going from plain text to XML), data persistence mechanism (changing from a single database to multiple, changing database vendors, adding table partitions), data exchange protocol (ODBC driver to OLE DB driver) and not impact every consumer in an adverse manner. The change to any of these parameters will only impact the data service code and will not force every single consumer to change their data access logic.
- Aggregation: data services allow the provider to use one or more data sources in order to construct a business entity. This idea applies not only to homogeneous data sources but to heterogeneous ones as well. E.g. the aggregation could be done across two databases or across a database and an XML document. This allows the data service to combine structured data with semi-structured or unstructured data. E.g. the data service could aggregate textual data such as disclaimer information from one data source along with Party Profile data from another source. The provider is able to construct a Party or an Account data message by aggregating a variety of data sources into the target message for the consumer. The consumer therefore does not have to query/access multiple data sources and perform the aggregation itself. Additionally, by performing the aggregation, the data service gives the consumer a simpler programming interface to access, handle errors against, and maintain.
- Reuse: Data services serve as reusable building blocks of operations on enterprise data. Data services that perform CRUD (Create, Read, Update, and Delete) and find operations on enterprise data are inherently reusable across multiple projects. Data services - also known as Entity Services - are reusable because of several characteristics - generic nature, platform interoperability, and support for multiple consumption patterns. Data services logic can apply to several business processes greatly facilitating reuse. For example - a Find Party service is applicable for finding a party while assigning an authorized individual on a retail account as well as looking up a party for merging duplicate profiles on a data-merge activity. They are inter-operable across multiple platforms and could be made available via multiple transport protocols such as HTTP and JMS. The data service can be accessed via the predominant message exchange patterns - fire/forget, request/reply, and publish/subscribe. Hence the consumer can access the service on demand via a user interface or via an asynchronous process requiring a reliable messaging transport.
- Alignment with logical data models: data services provide alignment with logical data model entities such as Profile, Account, etc. by retaining consistency in data structure and behavior of various data attributes. Without data services, each consumer might interpret physical data attributes in their own unique manner with the additional risk of being directly coupled with the underlying data structure of the provider. Data services allow reuse of data types across schemas and reuse of schemas across service operations. Since the schemas are defined against the logical data models defined/guided by information architecture the data services provide alignment with these logical models. This allows data services to leverage logical data values and logical data attributes (as opposed to system specific values and attributes). Why is this important? One of the biggest hurdles in decommissioning legacy systems is the pervasive propagation and use of physical legacy values and attributes across consuming applications. This tightly couples the legacy system with every consumer of the data. Data services effectively decouple consumers from legacy values and attributes by aligning with the logical data models.
- Support for multiple service versions: Data services allow the provider the option to expose one or more service versions. This enables the data service provider to pilot or offer a new version of a service to a smaller target consumer population. This also makes it convenient for the data service provider to offer new features in newer service versions and not force all consumers to upgrade simultaneously. In the same vein, it allows the data service consumers to gracefully migrate to a newer version of a data service.
- Provide value added features: in addition to its primary function of operating on a data entity, the data service could provide value added services such as data caching (providing efficient/faster access to frequently accessed data), filtering (e.g. for clients that want to only receive a subset of publication messages related to a data entity), and subscription management (registration management of publications to clients).
- Single point of interaction: Data services act as a single point of interaction to data entities for consumers. They can use consistent metaphors/mechanisms to operate on enterprise data across different data domains (profile data, account data, cross references, relationships, etc.). This also facilitates enforcement of authentication (validate a consumer’s credentials) and entitlements (e.g. does the consumer have the entitlements to execute a service? Does the consumer have visibility into a confidential data attribute?). The single point of interaction also allows the organization to have a repeatable consumer integration process across data services.
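The abstraction, aggregation, and logical-model benefits described above can be illustrated with a minimal sketch. It assumes two in-memory stand-ins for real data sources; every name in it (`PartyService`, `InMemoryProfileSource`, `InMemoryDisclaimerSource`, the physical column names, and the legacy status codes) is hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Protocol


class ProfileSource(Protocol):
    """Any provider that can return raw profile fields for a party id."""
    def fetch_profile(self, party_id: str) -> dict: ...


class DisclaimerSource(Protocol):
    """Any provider that can return disclaimer text for a party type."""
    def fetch_disclaimer(self, party_type: str) -> str: ...


@dataclass(frozen=True)
class Party:
    """Logical Party entity seen by consumers (no physical column names)."""
    party_id: str
    name: str
    status: str
    disclaimer: str


# Hypothetical mapping from legacy physical codes to logical values.
LEGACY_STATUS_TO_LOGICAL = {"A": "ACTIVE", "I": "INACTIVE"}


class InMemoryProfileSource:
    """Stand-in for a structured source such as a relational database."""
    def __init__(self, rows: dict):
        self._rows = rows

    def fetch_profile(self, party_id: str) -> dict:
        return self._rows[party_id]


class InMemoryDisclaimerSource:
    """Stand-in for a semi-structured or unstructured text source."""
    def __init__(self, texts: dict):
        self._texts = texts

    def fetch_disclaimer(self, party_type: str) -> str:
        return self._texts.get(party_type, "")


class PartyService:
    """Single point of interaction: aggregates both sources into one Party."""
    def __init__(self, profiles: ProfileSource, disclaimers: DisclaimerSource):
        self._profiles = profiles
        self._disclaimers = disclaimers

    def get_party(self, party_id: str) -> Party:
        row = self._profiles.fetch_profile(party_id)
        return Party(
            party_id=party_id,
            name=row["PTY_NM"],
            # Consumers see the logical value, never the legacy code.
            status=LEGACY_STATUS_TO_LOGICAL[row["STAT_CD"]],
            disclaimer=self._disclaimers.fetch_disclaimer(row["PTY_TYP"]),
        )


profiles = InMemoryProfileSource(
    {"P-1": {"PTY_NM": "Jane Doe", "STAT_CD": "A", "PTY_TYP": "RETAIL"}}
)
disclaimers = InMemoryDisclaimerSource({"RETAIL": "Retail terms apply."})
party = PartyService(profiles, disclaimers).get_party("P-1")
```

Because the consumer depends only on `PartyService` and the logical `Party` type, either stand-in could be replaced by a different provider, or the physical column names changed, without touching consumer code.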
b. Scope of data services: data services are primarily concerned with actions on data entities - period. Thus the scope of data services includes various manipulations of data entities, aggregation of data across multiple disparate data sources, a facility to consume data interfaces from a variety of platforms using a variety of transport protocols, mapping between the logical interface and physical provider interfaces, and graceful handling of data service errors. Data sourcing and transfer of very large data extracts could use data services as well, although traditionally those areas use ETL and data profiling tools. Business process orchestration logic and execution of line of business rules are out of scope for data services since they inhibit reuse. Logic that is specific to a particular user interface screen/application is also out of scope for a data service.
c. Data services development: In a “contract-first” approach to data services development, the service contract - the input schemas and output schemas - is developed first, based on the requirements. Schema design needs to follow several guidelines and best practices, and it is important to review the key ones here.
- Schema attributes/types/elements are designed in conjunction with the logical data model published and governed by the information architecture function. This ensures that system or technology specific identifiers/values are not exposed in the public schema contract. Standard code values need to be used and new ones added wherever applicable to ensure that the schema is aligned with where the organization wants to go strategically.
- Existing contracts are examined to consider reuse opportunities. Business entity schemas are also governed by the information architecture function and standard schemas for enterprise data entities need to be reused wherever possible. E.g. Get Product and Create Product data service web methods could both use the same Product schema. This is also applicable when designing publication services - reuse schemas when publishing a data entity as opposed to exposing for an on-demand invocation. This will not only save time but also ensure that the data service consumer has a consistent definition of the data entity when preparing inputs or parsing outputs from data services.
- When designing WSDL contracts for data service consumers, schemas need to be imported in order for the WSDL documents to be consistent with the interfaces used to implement the data orchestration. It also ensures that WSDL documents are lightweight and modular.
- Schemas and WSDL documents should be validated with tools such as the Web Services Interoperability (WS-I) WSDL Validator to ensure that the data service contract does not use platform/vendor/technology specific constructs that inhibit interoperability and data service reuse.
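The schema reuse guideline above (Get Product and Create Product sharing one Product schema) can be sketched as follows. This is only an analogy: Python dataclasses stand in for XML schema types, and the field names, the `ProductService` class, and its in-memory store are assumptions made for illustration rather than part of any real contract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Product:
    """Shared business entity type, defined once and reused by every operation."""
    product_id: str
    name: str
    category: str  # a logical code value, not a system specific one


@dataclass(frozen=True)
class GetProductRequest:
    product_id: str


@dataclass(frozen=True)
class GetProductResponse:
    product: Product  # reuses the shared type


@dataclass(frozen=True)
class CreateProductRequest:
    product: Product  # reuses the same shared type


@dataclass(frozen=True)
class CreateProductResponse:
    product_id: str


class ProductService:
    """Hypothetical implementation written after the contract types exist."""
    def __init__(self) -> None:
        self._store: dict = {}

    def create_product(self, request: CreateProductRequest) -> CreateProductResponse:
        self._store[request.product.product_id] = request.product
        return CreateProductResponse(product_id=request.product.product_id)

    def get_product(self, request: GetProductRequest) -> GetProductResponse:
        return GetProductResponse(product=self._store[request.product_id])


service = ProductService()
created = service.create_product(
    CreateProductRequest(Product("PR-1", "Savings Account", "DEPOSIT"))
)
fetched = service.get_product(GetProductRequest(created.product_id))
```

Since both operations exchange the same `Product` type, a consumer that prepares input for one operation can parse output from the other without a second definition of the entity.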
Once the contracts are determined, the data orchestration is designed in order to implement the contract. The data orchestration will consist of modular data service provider components being executed in sequence, in parallel, or a combination of both. This step in the development process decides exactly which calls are needed, in what order, which calls can be made concurrently, and which ones depend on prior calls.
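A data orchestration that mixes sequential and parallel steps might look like the following minimal sketch. The three provider functions (`resolve_party_id`, `fetch_profile`, `fetch_accounts`) and their canned data are hypothetical stand-ins for real provider components; only the ordering of the calls is the point.

```python
from concurrent.futures import ThreadPoolExecutor


def resolve_party_id(tax_id: str) -> str:
    """Hypothetical cross-reference lookup; later calls depend on its result."""
    return {"123-45-6789": "P-1"}[tax_id]


def fetch_profile(party_id: str) -> dict:
    """Hypothetical profile provider."""
    return {"party_id": party_id, "name": "Jane Doe"}


def fetch_accounts(party_id: str) -> list:
    """Hypothetical account provider."""
    return [{"account_id": "A-10", "owner": party_id}]


def get_party_with_accounts(tax_id: str) -> dict:
    # Step 1 (sequential): every later call needs the party id.
    party_id = resolve_party_id(tax_id)

    # Step 2 (parallel): the two lookups do not depend on each other.
    with ThreadPoolExecutor(max_workers=2) as pool:
        profile_future = pool.submit(fetch_profile, party_id)
        accounts_future = pool.submit(fetch_accounts, party_id)
        profile = profile_future.result()
        accounts = accounts_future.result()

    # Step 3 (sequential): assemble the response defined by the contract.
    return {**profile, "accounts": accounts}


result = get_party_with_accounts("123-45-6789")
```

The dependency on `party_id` forces the first call to complete before the others start, while the independent lookups can overlap; a real orchestration would also need timeouts and error handling for each provider call.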
d. Data services consumption patterns: Data services consumption needs to be examined from several perspectives:
- Computing environment: data services could be consumed from a plethora of platforms. The majority of consumers will use one of the following: the .NET common language runtime (CLR), the Java runtime environment (JRE), a mainframe system, or Unix/Linux. The bottom line is that the computing environment could be any one from which a web service call can be executed or a message can be sent to or received from a reliable queue.
- Transport protocol: data services could be consumed via reliable (such as JMS via MQ Series) or unreliable transport messaging protocols (such as HTTP). Some data services might be offered only via a certain transport based on the functionality.
- Message exchange pattern: data services could be accessed via the four primary message exchange patterns - request/reply (tight SLA), request/reply (relaxed SLA), fire/forget, and publish/subscribe.
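The message exchange patterns listed above can be contrasted with a small in-process sketch. It is not tied to any real transport: a Python `queue.Queue` stands in for a reliable queue, plain callbacks stand in for subscribers, and the class name, method names, and the `party.updated` topic are all assumptions made for illustration.

```python
import queue
from collections import defaultdict
from typing import Callable, Optional


class PartyDataService:
    """Toy data service exposing three message exchange patterns."""

    def __init__(self) -> None:
        self._parties = {"P-1": {"party_id": "P-1", "name": "Jane Doe"}}
        self._inbox: queue.Queue = queue.Queue()
        self._subscribers = defaultdict(list)

    def find_party(self, party_id: str) -> Optional[dict]:
        """Request/reply: the caller waits for and receives the result."""
        return self._parties.get(party_id)

    def submit_update(self, party_id: str, name: str) -> None:
        """Fire/forget: the request is queued and nothing is returned."""
        self._inbox.put((party_id, name))

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        """Publish/subscribe: register interest in future publications."""
        self._subscribers[topic].append(handler)

    def process_pending(self) -> int:
        """Apply queued updates and publish each change to subscribers."""
        processed = 0
        while not self._inbox.empty():
            party_id, name = self._inbox.get()
            self._parties[party_id] = {"party_id": party_id, "name": name}
            for handler in self._subscribers["party.updated"]:
                handler(self._parties[party_id])
            processed += 1
        return processed


service = PartyDataService()
received = []
service.subscribe("party.updated", received.append)
service.submit_update("P-1", "Jane Q. Doe")   # returns immediately
processed_count = service.process_pending()   # normally run by the provider
reply = service.find_party("P-1")             # synchronous lookup
```

In this sketch the fire/forget caller never learns whether the update succeeded, which is why that pattern is usually paired with a reliable transport, while the subscriber is notified without ever calling the service itself.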
I am Vijay Narayanan, a software development team lead building reusable data services and business process automation components working for a financial services firm. I have worked on several software projects ranging from single user systems to large, distributed, multi-user platforms with several services. I blog about Software Reuse at http://softwarereuse.wordpress.com/.
Data Services are an SOA anti-pattern
Re: Data Services are an SOA anti-pattern
Are data services an anti-pattern? Just like everything else in architecture, the answer depends on the technology, business, and organizational contexts. The number of data sources, the complexity of data entity structure and data access patterns, the volume of data exchange, performance, and the degree of decoupling needed between consumers and providers all play into choosing the implementation characteristics of data services. If your organization has several silos of data, including a mixture of legacy repositories, and each data source integration is unique, then the cost and resources needed to build and test become factors as well. There are specific concerns and techniques when implementing read-only services vs. read-write services. Finally, data services are not always accessed in synchronous request/reply fashion, nor are they always accessed from a task or business service.
Personally I totally agree with Bill Poole's analysis.
Re: Data Services are an SOA anti-pattern
Don't confuse SOA with unified data tier
The inherent inefficiencies of XML when handling large data, and the fact that almost all SOA suites are built upon Java containers, necessitate the use of a highly optimized caching server within the MW tier to federate IA using SOA. The added setup and maintenance cost of the caching server basically steers us toward using SOA data services alongside a data federation tier.