Breaking Down Data Silos with Foreign Data Wrappers

The digital revolution is wreaking havoc on data management systems. The rapid growth of data has made it more difficult than ever for companies to store, manage and make sense of the information they collect. At the same time, as data becomes more varied, enterprises are harnessing not only massive amounts of structured data from a growing network of connected devices, but also semi-structured and unstructured data.

As a result, the need for solutions that can support multiple data types has turned the typical data center into a patchwork of data management technologies used to handle the volume, velocity and variety of big data. These include relational databases, standalone NoSQL solutions and specialized extensions to handle geographic data, to name a few. In fact, recent research from Dell found that database administrators are responsible for infrastructure far more complex than ever before, with 72 percent saying the number of databases they must support is rising. The survey also found that 80 percent of respondents must support multiple apps, 45 percent have to support multiple user groups and 72 percent said their responsibilities are increasing overall.

At the same time, developers are facing added pressure to speed application delivery and respond more directly to line-of-business needs. This has given rise to ‘shadow IT’ efforts, a term used to describe IT systems and solutions built or used within an organization without the involvement of IT leadership. Many developers have turned to NoSQL-only database solutions. But using these technologies to spin up clusters, on and off premises, for new specialized applications that analyze unstructured data from sources like social media, mobile, images and graphics introduces data silos, making it ever more difficult to gain insight across the holistic data landscape. It also makes it more difficult for enterprises to manage the data and ensure its long-term integrity.

Integrating Foreign Data Types in Postgres

Postgres has a solution for this dilemma: a feature called a Foreign Data Wrapper (FDW), the first of which was developed four years ago by PostgreSQL community leader Dave Page, based on the SQL/MED (SQL Management of External Data) standard. FDWs provide a SQL interface for accessing large data objects in remote data stores, enabling DBAs to integrate data from disparate sources and bring it into a common model under the Postgres database.

Through this, DBAs can access and manipulate data managed by other systems as if it lived in a native Postgres table. For example, using the FDW for MongoDB, database administrators can query data in the document database and join it with data from a local Postgres table using SQL. Using this method, users can view, sort and group the data as rows and columns or as JSON documents. They can even write (insert, update or delete) data directly in the source document database from Postgres as if it were a single seamless deployment. The same can be done with a Hadoop cluster or a MySQL deployment. FDWs allow Postgres to function as a central federated database, or ‘hub,’ for enterprises.
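As a rough sketch of what this looks like in practice, the example below joins a MongoDB-backed foreign table with an ordinary Postgres table. The table, column and collection names are hypothetical, and the example assumes the mongo_fdw server and user mapping shown later in this article have already been created:

-- hypothetical local Postgres table
CREATE TABLE customers(customer_id int PRIMARY KEY, customer_name text);

-- hypothetical foreign table mapped to a MongoDB collection of customer events
-- (the first column must be "_id" of type "name", per the mongo_fdw convention)
CREATE FOREIGN TABLE customer_events(
    _id name,
    customer_id int,
    event_type text,
    occurred_at timestamptz
) SERVER mongo_server OPTIONS (database 'db', collection 'customer_events');

-- join the MongoDB-backed data with the local Postgres table using plain SQL
SELECT c.customer_name, e.event_type, e.occurred_at
FROM customers c
JOIN customer_events e ON e.customer_id = c.customer_id
WHERE e.occurred_at > now() - interval '7 days'
ORDER BY e.occurred_at DESC;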

With these capabilities in mind, FDWs can ease the numerous pains associated with managing the diverse data silos deployed by NoSQL solutions. Firstly, if not properly handled (and many are not), applications using NoSQL-only solutions can erode the long-term value of enterprise data. According to Gartner, “By 2017, 50% of data stored in NoSQL DBMSs will be damaging to the business due to a lack of applied information governance policies and programs.”[i]

Postgres FDWs can help DBAs and developers to ensure the integrity of their data, increase developer productivity and extend the value of current resources.

Ensuring Data Integrity

NoSQL-only databases are not ACID compliant; as a result, achieving the robust data integrity that enterprises typically need across their data sets requires complex application development. Also, because NoSQL-only solutions only store data and do not process it, the data must be brought to the application for analysis. This means the application (and each individual application developer) is responsible for efficiently accessing data, implementing business rules and maintaining data consistency, which limits uniformity across the system and increases complexity for the DBAs who support those applications.

Further complications can arise because each NoSQL database product uses a different representation for its data and its own data access and manipulation language, so organizations may find themselves with multiple but incompatible NoSQL solutions. Thus, pushing processing out of the database and into the applications, especially in the absence of ACID compliance, can be problematic for two reasons:

  • The creation of more data silos. As each application requires its own data store, enterprise data becomes fragmented, separated into silos and loosely governed. This makes it difficult, or nearly impossible, for enterprises to effectively consolidate, trust and make use of the data stored in their multiple NoSQL-only database silos. It also makes changes to individual data objects difficult, as doing so requires identifying and updating them across multiple applications.
  • The loss of control. Enterprises can lose control over application logic because each developer is using their solution’s individual processing language and applying their own preferred techniques. This can further erode the quality and usability of data from NoSQL-only applications.

Increasing Developer Productivity

Developers today have the ability to reuse code and tap stored functions, speeding development to just days rather than weeks. However, NoSQL-only solutions are unable to support stored procedures, and because they do not represent one single technology, they can inhibit the ability to reuse code, establish standards and find talented resources. Postgres FDWs can streamline efforts for developers in three ways:

  • Support of structured and unstructured data types. Developers using Postgres are no longer bound by the confines of a rigid data model. With JSON/JSONB for supporting and processing document data and the HStore data type for key/value pairs, Postgres provides the flexible data models that developers need to build applications capable of evolving with changing business objectives (a brief sketch follows this list).
  • Flexibility. Postgres supports unstructured data stores but then enables developers to apply schema rules to selected data according to business needs. Postgres is also capable of providing enhanced performance using the same eventual consistency techniques used by NoSQL solutions.
  • Centralized data management. Developers can integrate external structured and unstructured data within Postgres and use SQL to read from and write to foreign data sources. There are FDWs for MongoDB, CouchDB, MySQL, Redis, Neo4j and even Twitter, among others.
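As a minimal sketch of the first point, the example below (with hypothetical table and column names) keeps relational, document and key/value data side by side in a single Postgres table and queries them with ordinary SQL. It assumes PostgreSQL 9.4 or later for JSONB and the hstore contrib extension for key/value pairs:

-- hstore ships as a contrib extension; JSONB is built in as of PostgreSQL 9.4
CREATE EXTENSION IF NOT EXISTS hstore;

-- hypothetical table mixing relational, document and key/value data
CREATE TABLE product_catalog(
    product_id serial PRIMARY KEY,
    product_name text NOT NULL,
    spec jsonb,        -- semi-structured document data
    attributes hstore  -- simple key/value pairs
);

INSERT INTO product_catalog (product_name, spec, attributes)
VALUES ('widget',
        '{"dimensions": {"width": 10, "height": 4}, "tags": ["metal", "blue"]}',
        'color => blue, finish => matte');

-- query document fields and key/value pairs with plain SQL
SELECT product_name,
       spec -> 'dimensions' ->> 'width' AS width,
       attributes -> 'color' AS color
FROM product_catalog
WHERE spec @> '{"tags": ["metal"]}';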

New solutions bring benefits but also challenges, requiring new skill sets, maintenance requirements, operational processes and deployment, tuning and upgrade demands. Utilizing NoSQL-only solutions for application deployment may help developers speed delivery, but the costs to the overall environment and enterprise data strategies are significant. Postgres FDWs enable organizations to leverage existing data deployments with the balance of an enterprise-ready relational DBMS combined with NoSQL capabilities in a unified platform. By bringing disparate and potentially siloed data sources together, FDWs allow IT departments to obtain a comprehensive and holistic view of the information and the goods, services, stakeholders or other objects the information represents. With this, enterprises can make better-informed, more intelligent decisions and recommendations to achieve business goals and drive strategic initiatives.

MongoDB FDW in Action

Given the popularity of MongoDB, an updated FDW for MongoDB has been created that incorporates the new write capabilities enabled by enhancements in PostgreSQL 9.3. Initially, FDWs supported only read operations. With this expanded utility, along with the performance and stability improvements EDB developed for the FDW, it is now easy to manipulate data in MongoDB using simple SQL statements (SELECT / INSERT / UPDATE / DELETE).

The FDW can be downloaded from its GitHub repository.

Following are some examples of using mongo_fdw, alongside the equivalent MongoDB statements.

-- load extension first time after install
CREATE EXTENSION mongo_fdw;
-- create server object
CREATE SERVER mongo_server FOREIGN DATA WRAPPER mongo_fdw OPTIONS (address '127.0.0.1', port '27017');
-- create user mapping
CREATE USER MAPPING FOR Postgres SERVER mongo_server OPTIONS (username 'mongo_user', password 'mongo_pass');
-- create foreign table (Note: the first column of the table must be "_id" of type "name")
CREATE FOREIGN TABLE warehouse(
    _id name,
    warehouse_id int,
    warehouse_name text,
    warehouse_created timestamptz
) SERVER mongo_server OPTIONS (database 'db', collection 'warehouse');
-- select from table
SELECT * FROM warehouse WHERE warehouse_id = 1;

           _id            | warehouse_id | warehouse_name |     warehouse_created
--------------------------+--------------+----------------+---------------------------
 53720b1904864dc1f5a571a0 |            1 | UPS            | 12-DEC-14 12:12:10 +05:00
-- corresponding find statement in MongoDB
db.warehouse.find({ "warehouse_id" : 1 }).pretty()
{
    "_id" : ObjectId("53720b1904864dc1f5a571a0"),
    "warehouse_id" : 1,
    "warehouse_name" : "UPS",
    "warehouse_created" : ISODate("2014-12-12T07:12:10Z")
}
-- insert row in table
INSERT INTO warehouse VALUES ('0', 1, 'UPS', '2014-12-12T07:12:10Z');
-- corresponding insert statement in MongoDB
db.warehouse.insert ( { "warehouse_id" : NumberInt(1), "warehouse_name" : "UPS", "warehouse_created" : ISODate("2014-12-12T07:12:10Z") })
-- delete row from table
DELETE FROM warehouse where warehouse_id = 3;
-- corresponding delete statement in MongoDB
db.warehouse.remove( { "warehouse_id" : 3 })
-- update a row of table
UPDATE warehouse set warehouse_name = 'UPS_NEW' where warehouse_id = 1;
-- corresponding update statement in MongoDB
db.warehouse.update(
    { "warehouse_id" : 1 },
    { "warehouse_id" : 1, "warehouse_name" : "UPS_NEW" }
)
-- explain a table
EXPLAIN SELECT * FROM warehouse WHERE warehouse_id = 1;

                           QUERY PLAN
-----------------------------------------------------------------
 Foreign Scan on warehouse  (cost=0.00..0.00 rows=1000 width=44)
   Filter: (warehouse_id = 1)
   Foreign Namespace: db.warehouse
(3 rows)
-- collect data distribution statistics
ANALYZE warehouse;

EnterpriseDB Database Architect Ibrar Ahmed wrote the sample code contained in this article. 

About the Author

Lenley Hensarling is Vice President of Strategy and Product Management at EnterpriseDB. He plays a major role in setting EDB’s strategic direction through product development and customer and partner interactions. Lenley has more than two decades of experience in the software industry at large enterprise technology organizations. A longtime J.D. Edwards executive, Lenley went on to leadership roles at PeopleSoft and then Oracle. His roots are in engineering, and he rose quickly to vice president of engineering at Novell. Lenley received his BA in economics from the University of Texas at Austin.
