Data Solutions Framework: an Open Source Project for Building Data Solutions on AWS

AWS recently released the Data Solutions Framework (DSF), an opinionated open-source framework designed to accelerate the creation of data solutions on AWS. Built using the AWS CDK, the framework exposes abstractions and patterns as building blocks for constructing data solutions and is available in TypeScript (npm) and Python (PyPI).

DSF provides building blocks packaged as standard L3 AWS CDK Constructs to compose data solutions on AWS. These building blocks offer customization capabilities and can be composed with any other CDK Construct, including those available through Construct Hub, a collection of open-source CDK libraries. Lotfi Mouhib, principal solutions architect at AWS, Dženan Softić, senior solutions architect at AWS, and Vincent Gromakowski, principal solutions architect at AWS, write:

With DSF, data (platform) engineers can focus on their use case and business logic, and instead create a data platform from building blocks that represent common abstractions in data solutions such as a data lake. (...) While DSF is an opinionated framework, it provides deep customization capabilities for developers to adapt what they build to their specific needs.

The AWS CDK is an open-source software development framework for defining cloud infrastructure in code and provisioning it through CloudFormation. L1 constructs, known as CFN resources, are the lowest-level constructs and offer no abstraction. L2 constructs, known as curated constructs, also map directly to single CloudFormation resources but add sensible defaults and an intent-based API. L3 constructs, known as patterns, offer the highest level of abstraction and contain multiple resources configured to work together to accomplish a specific task or service.
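The difference between the three levels can be illustrated with a toy model in plain Python. This is only a sketch and does not use the AWS CDK: the classes `RawBucket`, `CuratedBucket`, and `DataLakePattern`, as well as the "Raw" and "Curated" stage names, are hypothetical stand-ins chosen for illustration, and the generated dictionary is a simplified imitation of a CloudFormation template.

```python
from dataclasses import dataclass


@dataclass
class RawBucket:
    """L1-style: one resource, every property spelled out by the caller."""

    logical_id: str
    properties: dict

    def to_template(self) -> dict:
        return {
            self.logical_id: {
                "Type": "AWS::S3::Bucket",
                "Properties": self.properties,
            }
        }


class CuratedBucket:
    """L2-style: still a single resource, but with defaults and intent-based flags."""

    def __init__(self, logical_id: str, encrypted: bool = True, versioned: bool = False):
        properties = {}
        if encrypted:
            properties["BucketEncryption"] = {
                "ServerSideEncryptionConfiguration": [
                    {"ServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}
                ]
            }
        if versioned:
            properties["VersioningConfiguration"] = {"Status": "Enabled"}
        # The low-level resource stays reachable for later customization.
        self.resource = RawBucket(logical_id, properties)

    def to_template(self) -> dict:
        return self.resource.to_template()


class DataLakePattern:
    """L3-style: several resources configured together behind one object."""

    def __init__(self, prefix: str):
        # Hypothetical stages; a real pattern would wire up far more than buckets.
        self.buckets = [
            CuratedBucket(f"{prefix}{stage}", versioned=True)
            for stage in ("Raw", "Curated")
        ]

    def to_template(self) -> dict:
        resources = {}
        for bucket in self.buckets:
            resources.update(bucket.to_template())
        return {"Resources": resources}


template = DataLakePattern("Lake").to_template()
print(sorted(template["Resources"]))  # ['LakeCurated', 'LakeRaw']
```

In this model a single `DataLakePattern("Lake")` call yields two encrypted, versioned buckets, which is the kind of leverage a pattern-level construct is meant to provide.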

According to the authors, DSF is ready for production workloads and follows data analytics best practices as described in the Data Analytics Lens of the Well-Architected Framework. DSF uses cdk-nag to enforce security and compliance, validating that the state of constructs complies with a given set of rules. Mouhib, Softić and Gromakowski add:

In DSF, we expose all resources that constructs create, so you can either use those directly in your AWS CDK application, or leverage AWS CDK escape hatches to customize, or decide to override AWS CloudFormation resources.
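How rule-based validation and property overrides interact can be sketched in plain Python. The code below is a simplified model, not the cdk-nag or AWS CDK API: `Rule`, `check_template`, and the rule identifier `EXAMPLE-S3-VERSIONING` are hypothetical, and `add_property_override` only imitates the idea behind the CDK escape hatch of the same name by writing a value at a dotted property path.

```python
from typing import Callable, NamedTuple


class Rule(NamedTuple):
    rule_id: str                           # hypothetical identifier
    resource_type: str                     # resource type the rule applies to
    is_compliant: Callable[[dict], bool]   # receives the resource properties


def add_property_override(resource: dict, path: str, value) -> None:
    """Set a nested property by dotted path, creating parents as needed."""
    node = resource.setdefault("Properties", {})
    *parents, leaf = path.split(".")
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value


def check_template(template: dict, rules: list[Rule]) -> list[str]:
    """Return one finding per (rule, resource) pair that is not compliant."""
    findings = []
    for logical_id, resource in template["Resources"].items():
        for rule in rules:
            if resource["Type"] != rule.resource_type:
                continue
            if not rule.is_compliant(resource.get("Properties", {})):
                findings.append(f"{rule.rule_id}: {logical_id}")
    return findings


rules = [
    Rule(
        "EXAMPLE-S3-VERSIONING",
        "AWS::S3::Bucket",
        lambda props: props.get("VersioningConfiguration", {}).get("Status") == "Enabled",
    )
]

template = {
    "Resources": {
        "LakeRaw": {
            "Type": "AWS::S3::Bucket",
            "Properties": {"VersioningConfiguration": {"Status": "Enabled"}},
        }
    }
}

before = check_template(template, rules)
# Customize the underlying resource, as an escape hatch would allow.
add_property_override(
    template["Resources"]["LakeRaw"], "VersioningConfiguration.Status", "Suspended"
)
after = check_template(template, rules)
print(before, after)  # [] ['EXAMPLE-S3-VERSIONING: LakeRaw']
```

The point of the sketch is that customization and validation are complementary: an override can move a resource out of compliance, and a rule check run over the final state reports it.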

The Spark Data Lake example builds a data lake and processes data with Apache Spark, providing a multi-environment CI/CD pipeline with support for integration tests. Sebastian Gebski, principal solutions architect at AWS, comments:

It is dedicated strictly to building composable data analytics solutions - I believe that category of solutions didn't get as much love as it deserves so far (...) The initial release is heavily skewed towards data lakes (good!), but as we're talking about an Open Source project here, the directions of future development will heavily depend on what's interesting to the community.

Source: AWS blog

DSF is not the only framework extending the AWS CDK: the Open Constructs Foundation recently announced an initiative for a community-driven CDK construct library.

DSF is open source under the Apache 2.0 license and provides a public roadmap.
