
PayPal's Gimel Analytics Platform Provides Unified Data API and GSQL

by Srini Penchikala on Apr 17, 2018


At PayPal, data engineers, analysts and data scientists work with a variety of data sources, compute engines, languages and execution models (stream, batch, interactive). As a result, engineers spend a lot of time managing these different data sources, which slows the time-to-market of their products.

The PayPal data team developed a new analytics platform called Gimel, which provides access to any data store through a single data API and SQL, along with a centralized data catalog.

Romit Mehta and Deepak Chandramouli from PayPal spoke at the recent QCon.ai conference about the platform and how it can be used to commoditize data access. They described Gimel's components: the compute platform, Data API, PCatalog, GSQL, and notebooks. They also announced the open-source release of the framework.

InfoQ spoke with Mehta and Chandramouli about the Gimel data platform, its support for security and data store versioning, and its future roadmap.

InfoQ: Are there any differences in managing the data catalog (PCatalog) for transactional and analytics use cases?

Mehta & Chandramouli: The Gimel API and SQL implementation are currently focused on analytics platforms. Regardless of whether the storage type is Kafka, NoSQL, relational, or document-based, the data API remains the same and SQL provides the language abstraction. Within PayPal, we're seeing requests from online/live systems for a similar layer of abstraction. We are currently thinking through how to bring a similar layer to online systems that require sub-second responses.
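To make the idea of "one data API regardless of storage type" concrete, here is a minimal sketch, written in Python for brevity, of a catalog-driven read path. It is not Gimel's actual API (which is Scala on Spark): `CatalogEntry`, `DataSet`, `register_reader` and the dataset names are hypothetical, and in-memory dictionaries stand in for a Kafka topic and a Hive table. The caller supplies only a logical dataset name; the catalog entry says which storage-specific reader to use.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass(frozen=True)
class CatalogEntry:
    """Hypothetical catalog record describing where a logical dataset lives."""
    storage_type: str           # e.g. "kafka", "hive", "hbase"
    properties: Dict[str, str]  # connection details the storage reader needs


class DataSet:
    """Hypothetical single entry point for reading any cataloged dataset."""

    def __init__(self, catalog: Dict[str, CatalogEntry]) -> None:
        self._catalog = catalog
        self._readers: Dict[str, Callable[[CatalogEntry], List[Row]]] = {}

    def register_reader(self, storage_type: str,
                        reader: Callable[[CatalogEntry], List[Row]]) -> None:
        # Storage-specific logic is plugged in once, behind the common API.
        self._readers[storage_type] = reader

    def read(self, dataset_name: str) -> List[Row]:
        # Callers pass only a logical name; the catalog supplies the rest.
        entry = self._catalog[dataset_name]
        reader = self._readers.get(entry.storage_type)
        if reader is None:
            raise ValueError(f"no reader for storage type {entry.storage_type!r}")
        return reader(entry)


# Usage with in-memory stand-ins for a Kafka topic and a Hive table.
fake_kafka = {"payments": [{"id": 1, "amount": 10.0}]}
fake_hive = {"db.customers": [{"id": 1, "name": "Ana"}]}

catalog = {
    "payments_kafka": CatalogEntry("kafka", {"topic": "payments"}),
    "customers_hive": CatalogEntry("hive", {"table": "db.customers"}),
}
ds = DataSet(catalog)
ds.register_reader("kafka", lambda e: fake_kafka[e.properties["topic"]])
ds.register_reader("hive", lambda e: fake_hive[e.properties["table"]])

payments = ds.read("payments_kafka")    # same call...
customers = ds.read("customers_hive")   # ...regardless of the backing store
```

Because storage details live in the catalog rather than in client code, moving a dataset to a different store would, under this design, change only its catalog entry and not the callers.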

InfoQ: How did you address the security and access control requirements for the data access on the Gimel platform?

Mehta & Chandramouli: Since all queries are submitted to the underlying systems as the user who has logged in, and because all queries are eventually executed by those underlying systems, all existing security policies and controls are maintained.

In addition, through its logging framework, Gimel logs every query executed, including the query text and whether data was downloaded locally. In the future, it will also tag whether any classified data was accessed.
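The two ideas in this answer (run every query as the logged-in user, and log every query with a download flag) can be sketched as a thin wrapper. This is a hypothetical illustration in Python, not Gimel's logging framework: `QueryAuditRecord`, `AuditedExecutor` and `fake_backend` are invented names, and the "classified data" tag is left as `None` because the source describes it only as a future feature. Access decisions stay in the backend, mirroring how existing policies (Ranger, at PayPal) remain in force.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Callable, List, Optional


@dataclass
class QueryAuditRecord:
    user: str                 # the logged-in user the query runs as
    query: str                # full query text
    downloaded_to_local: bool
    classified_data_accessed: Optional[bool]  # None = not tagged (a roadmap item)
    submitted_at: str         # ISO-8601 UTC timestamp


class AuditedExecutor:
    """Hypothetical wrapper: audit-log every query, then run it as the user."""

    def __init__(self, backend: Callable[[str, str], list], sink: List[str]) -> None:
        self._backend = backend  # underlying system; it enforces its own policies
        self._sink = sink        # append-only log, one JSON line per query

    def run(self, user: str, query: str, download: bool = False) -> list:
        record = QueryAuditRecord(
            user=user,
            query=query,
            downloaded_to_local=download,
            classified_data_accessed=None,
            submitted_at=datetime.now(timezone.utc).isoformat(),
        )
        # Log before executing so that denied or failed queries are recorded too.
        self._sink.append(json.dumps(asdict(record)))
        # No privilege escalation: the backend sees the real user identity.
        return self._backend(user, query)


def fake_backend(user: str, query: str) -> list:
    # Stand-in for a policy-governed store: access is decided here, not in the wrapper.
    if user != "alice":
        raise PermissionError(f"{user} is not allowed to run: {query}")
    return [("ok",)]


audit_log: List[str] = []
executor = AuditedExecutor(fake_backend, audit_log)
rows = executor.run("alice", "SELECT * FROM payments", download=True)
```

Writing the record before execution is a design choice of this sketch: it means a query denied by the underlying system still leaves an audit trail.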

At PayPal, Gimel also honors Apache Ranger policies and integrates tightly with Kerberized clusters.

InfoQ: How do you manage the data store versioning?

Mehta & Chandramouli: We partner with storage admins at PayPal to ensure our APIs fully support the storage versions supported by the infrastructure team. In addition, if the storage teams need something such as new instrumentation, we build it into our API so that all clients inherit the implementation transparently. As a result, when a version upgrade happens, clients in most cases do not need to change their code.

InfoQ: Can you talk about GSQL query language and how it differs from other similar frameworks like Spark SQL or Neo4j's Cypher?

Mehta & Chandramouli: GSQL today is a lightweight implementation that intercepts user SQL, generates the corresponding data API code behind the scenes for Gimel datasets, and then passes it on to the Spark SQL interpreter. In the longer term, we are working on adding push-down optimizations for SQL queries that blend or join data from multiple storage types, for example joining Kafka, Hive, and HBase data and writing the results to Elasticsearch.
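The interception flow described here (find the datasets a query mentions, load each through the data API, register it with the SQL engine, then run the user's SQL unchanged) can be sketched as follows. This is a hypothetical Python illustration, not GSQL: `run_gsql` and the dataset names are invented, `sqlite3` stands in for the Spark SQL interpreter, plain loader functions stand in for generated data API calls, and the regex that finds table references after `FROM`/`JOIN` is deliberately naive (it ignores subqueries and CTEs).

```python
import re
import sqlite3
from typing import Callable, Dict, List

Row = Dict[str, object]

# Naive table-reference finder: identifiers that follow FROM or JOIN.
_TABLE_REF = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_]\w*)", re.IGNORECASE)


def run_gsql(sql: str, loaders: Dict[str, Callable[[], List[Row]]]) -> list:
    """Hypothetical GSQL-style interceptor using sqlite3 in place of Spark SQL."""
    conn = sqlite3.connect(":memory:")
    try:
        for name in set(_TABLE_REF.findall(sql)):
            load = loaders.get(name)
            if load is None:
                continue  # not a cataloged dataset; leave it to the engine
            rows = load()  # stands in for the generated data API call
            if not rows:
                continue  # schema is inferred from the first row, so skip empties
            cols = list(rows[0])
            col_defs = ", ".join(f'"{c}"' for c in cols)
            conn.execute(f'CREATE TEMP TABLE "{name}" ({col_defs})')
            marks = ", ".join("?" for _ in cols)
            conn.executemany(
                f'INSERT INTO "{name}" VALUES ({marks})',
                [tuple(r[c] for c in cols) for r in rows],
            )
        # The user's SQL runs unchanged once every dataset is registered.
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


# Usage: join a "Kafka" dataset with a "Hive" dataset in one query.
loaders = {
    "payments_kafka": lambda: [{"customer_id": 1, "amount": 10.0},
                               {"customer_id": 2, "amount": 5.5}],
    "customers_hive": lambda: [{"customer_id": 1, "name": "Ana"},
                               {"customer_id": 2, "name": "Raj"}],
}
result = run_gsql(
    "SELECT c.name, p.amount FROM payments_kafka p "
    "JOIN customers_hive c ON p.customer_id = c.customer_id ORDER BY c.name",
    loaders,
)
```

Note that this sketch pulls every row of each dataset into one engine before filtering or joining; that is exactly the cost the push-down optimizations mentioned above would aim to avoid.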

From a roadmap perspective, besides working on incremental features and updates, the team is also planning the following for Gimel:

  • Query optimization
  • Open-source PCatalog (including metadata services, discovery services, and the catalog UI)
  • Add support for Python (currently only Scala is supported)
  • Open-source the features added to Jupyter and Livy

If you would like to learn more about the Gimel platform or have questions about its features, check out the documentation, Slack channel, user forum, and developer forum. You can also try Gimel first-hand by following these instructions.
 
