Agile View of Big Data

An agile view of Big Data, in which data is treated as a real-time stream, offers a new way of looking at how data is managed. With an agile data infrastructure, organizations can address Big Data challenges with greater ease, flexibility, and performance. A white paper by codeFutures describes this agile view of Big Data.

The white paper states that companies want a real-time view of Big Data that shows what is happening now. With this type of visibility, organizations can respond immediately to changes in areas such as the market, user patterns, fraud detection, and many others. Companies therefore need a different view of Big Data, one that is flexible and agile in nature. They need to be able to access fast-growing Big Data in a way that is natural to their applications and needs.

If being agile is the goal for software development processes and organizational management, why do data infrastructures continue to be a barrier to an agile approach? It can be stated simply:

Your process is agile…Why isn’t your data?

According to the white paper, the current view of the data infrastructure is not effective because of how we treat Big Data:

Considering a database as a static repository.
Relying on fixed schemas that do not directly reflect application needs.
Continuing the vicious cycle of adding complexity to the data architecture, placing an increasing burden on application functionality and developers.

Cory Isaacson, CEO and CTO at codeFutures, delivered a session at Database Month NYC on what it really takes to achieve an agile data infrastructure. Cory says that what is needed is an agile view of Big Data: instead of treating a database as a static repository, an agile Big Data infrastructure treats data as a dynamic stream. The key to such an infrastructure is its ability to create dynamic views of data as needed to meet application requirements.

The white paper lists the following key capabilities for the ideal agile Big Data infrastructure:

An agile Big Data platform must handle data in a continuous, real-time stream.
It must be capable of dynamically constructing and maintaining any number of agile views of the data in order to satisfy application requirements.
Views must maintain data integrity.
The agile platform should isolate schema complexity from application developers, simplifying the application code.
The infrastructure must be fully scalable and reliable, both in terms of stream processing and dynamic views.
An agile data platform must support event processing, enabling a new class of real-time application capabilities.
Views must be accessible in real time, supporting advanced dashboards and similar capabilities.
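
Several of these capabilities (a continuous stream, any number of views, and event processing) can be illustrated with a small in-process sketch. It is not codeFutures' implementation: the `StreamHub` class, its method names, and the record fields `customer` and `amount` are all hypothetical, and a real platform would also have to deal with scaling, reliability, and data integrity, which this sketch ignores.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical record shape: a plain dict such as {"customer": "a", "amount": 50}.
Record = Dict[str, Any]


class StreamHub:
    """Illustrative stand-in for a data tier that fans records out to views and event rules."""

    def __init__(self) -> None:
        self._views: Dict[str, Callable[[Record], None]] = {}
        self._rules: List[Tuple[Callable[[Record], bool], Callable[[Record], None]]] = []

    def add_view(self, name: str, update: Callable[[Record], None]) -> None:
        # Register a named view; `update` is called once for every published record.
        self._views[name] = update

    def on_event(self, predicate: Callable[[Record], bool],
                 handler: Callable[[Record], None]) -> None:
        # Register an event rule; `handler` runs only for records matching `predicate`.
        self._rules.append((predicate, handler))

    def publish(self, record: Record) -> None:
        # Push one record from the stream to every view, then evaluate every event rule.
        for update in self._views.values():
            update(record)
        for predicate, handler in self._rules:
            if predicate(record):
                handler(record)
```

A caller might register one view that counts orders per customer and one rule that collects unusually large orders for review, then call `publish` for each incoming record. Views and rules are registered independently, so adding a new one does not require changing the existing ones.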

InfoQ spoke with Cory on Agile Big Data.

InfoQ: Could you explain how we can create an agile view for big data?

A view is created by listening to the real-time stream exposed by the Agile Data Tier. If you consider a simple MySQL order entry application, a view can be built by summarizing the data stream for orders (for example revenue by product). This real-time capability to keep downstream views in sync is the real power of the Agile Data Tier concept.
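
The revenue-by-product example can be sketched as a view that is updated one order at a time rather than recomputed by a query. This is a minimal illustration only: the `OrderEvent` fields and the `RevenueByProductView` class are assumptions made for the example, and the stream is modeled as any Python iterable rather than a real Agile Data Tier feed.

```python
from collections import defaultdict
from decimal import Decimal
from typing import DefaultDict, Iterable, NamedTuple


class OrderEvent(NamedTuple):
    """Hypothetical shape of one order arriving on the stream."""
    product: str
    quantity: int
    unit_price: Decimal


class RevenueByProductView:
    """Keeps a running revenue total per product, updated incrementally."""

    def __init__(self) -> None:
        # Decimal() is zero, so unseen products start at 0.
        self.totals: DefaultDict[str, Decimal] = defaultdict(Decimal)

    def apply(self, event: OrderEvent) -> None:
        # Fold a single order into the summary without touching the source tables.
        self.totals[event.product] += event.quantity * event.unit_price

    def listen(self, stream: Iterable[OrderEvent]) -> None:
        # Consume events in arrival order; a live feed would simply never end.
        for event in stream:
            self.apply(event)
```

Because each event only adjusts one total, the view stays in sync with the order stream at a cost proportional to the number of new orders, not the size of the order history.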

InfoQ: How can we manage creating views from different databases? Can we include some historical data mapping also while creating views?

Views can be created in any appropriate data engine, matching the requirements of the view to the ideal engine implementation. This includes key-value engines, relational engines, or specialized index engines. If there are multiple data engines feeding the Agile Data Tier, that is easily supported as well; views "listen" to each of the different sources. If a source database has historical data, the view can be created by running an initial query against the data source as a starting point for the view. If full historical data is not required, the view can start from "now" and go forward from the current point.
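
The two starting points described here, an initial query over historical data or an empty view that starts from "now", might look like the following sketch. It uses an in-memory SQLite database purely as a stand-in for a source database; the `orders` table, its columns, and all function names are hypothetical, and prices are stored as integer cents to keep the arithmetic exact.

```python
import sqlite3
from collections import defaultdict
from typing import DefaultDict, Iterable, Tuple

# Hypothetical live event shape: (product, quantity, unit_price_cents).
LiveEvent = Tuple[str, int, int]


def build_demo_source() -> sqlite3.Connection:
    """Create a tiny in-memory source database holding historical orders."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (product TEXT, quantity INTEGER, unit_price_cents INTEGER)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [("widget", 2, 1000), ("gadget", 1, 2500)],
    )
    conn.commit()
    return conn


def seed_view_from_history(conn: sqlite3.Connection) -> DefaultDict[str, int]:
    """Run one initial query against the source as the starting point for the view."""
    view: DefaultDict[str, int] = defaultdict(int)
    rows = conn.execute(
        "SELECT product, SUM(quantity * unit_price_cents) FROM orders GROUP BY product"
    )
    for product, revenue_cents in rows:
        view[product] = revenue_cents
    return view


def apply_live_events(view: DefaultDict[str, int], events: Iterable[LiveEvent]) -> None:
    """Keep the view current by folding in events that arrive after the seed."""
    for product, quantity, unit_price_cents in events:
        view[product] += quantity * unit_price_cents
```

To start from "now" instead, the view would begin as an empty `defaultdict(int)` and only `apply_live_events` would be used. One point the sketch leaves out is that orders written between the initial query and the moment the view begins listening must be neither missed nor counted twice.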

InfoQ: How easy is it to create and maintain agile big data views?

This is a function of the tool used to provide the Agile Data Tier implementation. With CodeFutures' Agile Data technology, views will be very easy to create and maintain. That is the purpose of the entire product: to make dynamic views and real-time streams easy to maintain and take advantage of.

InfoQ: What are the pros and cons of this approach? Are there any restrictions for choosing the suitability of the application?

The only real drawback is that an Agile Data Tier does add some latency to the system, so changes are not propagated to every view immediately. For most applications this minimal latency is not an issue, but for something like a real-time trading system it would be an unacceptable cost. Otherwise, an Agile Data Tier can support all types of big data needs, from transactional workloads to real-time analytics to complex reporting.

InfoQ: Would you like to share your experience working with agile big data?

We have been able to quickly solve a variety of application problems with the Agile Data Tier approach. It becomes very straightforward to satisfy requirements by understanding the desired real-time stream flow and the views needed at intermediate stages. Because views are simple in nature, with straightforward schemas dedicated to a given purpose, they are much easier to work with than a traditional, complex, monolithic enterprise data model. A data architect need only be concerned with a given view for a given purpose; it is easy to set up as many additional views as needed to satisfy other requirements, each in its own package.
