InfoQ Homepage Data Warehouse Content on InfoQ
-
How Data Mesh Platforms Connect Data Producers and Consumers
A challenge that companies often face when exploiting their data in data warehouses or data lakes is that ownership of analytical data is weak or non-existent, and quality can suffer as a result. A data mesh is an organizational paradigm shift in how companies create value from data where responsibilities go back into the hands of producers and consumers.
-
Grammarly Replaces its in-House Data Lake with Databricks Platform Using Medallion Architecture
Grammarly adopted the medallion architecture while migrating from their in-house data lake, storing Parquet files in AWS S3, to the Delta Lake lakehouse. The company created a new event store for over 6000 event types from 40 internal and external clients and, in the process, improved data quality and reduced the data-delivery time by 94%.
-
Databricks Unveils Lakehouse AI and MosaicML Acquisition at Data + AI Summit
The Data and AI company Databricks recently unveiled Lakehouse AI, a suite of tools for building and governing generative AI models, including large language models (LLMs), within the Databricks platform. Among the tools were LakehouseIQ, a "knowledge engine" that uses AI to understand a company's unique data, culture, and language in order to improve natural language interfaces like chatbots.
-
Amazon Redshift Serverless Generally Available to Automatically Scale Data Warehouse
Amazon recently announced the general availability of Redshift Serverless, an elastic option to scale data warehouse capacity. The new service allows data analysts, developers and data scientists to run and scale analytics without provisioning and managing data warehouse clusters.
-
AWS Introduces Amazon Redshift Serverless
As part of a trend towards serverless analytics options, AWS announced the public preview of Amazon Redshift Serverless. The latest version of the managed data warehouse service targets deployments where it is difficult to manage capacity due to variable workloads or unpredictable spikes.
-
AWS Announces the Public Preview of AWS Data Exchange for Amazon Redshift
Recently AWS announced the public preview of AWS Data Exchange for Amazon Redshift. This new feature enables customers to find and subscribe to third-party data in AWS Data Exchange to query in an Amazon Redshift data warehouse.
-
Data Collection, Standardization and Usage at Scale in the Uber Rider App
Uber Engineering recently published how it collects, standardises and uses data from the Uber Rider app. Rider data comprises all the rider's interactions with the Uber app. This data accounts for billions of events from Uber's online systems every day. Uber uses this data to deal with top problem areas such as increasing funnel conversion, user engagement, etc.
-
Amazon Redshift Data Sharing Now Generally Available
Amazon has recently announced the general availability of the Amazon Redshift Data Sharing functionality to share live data across Amazon Redshift clusters. This allows the use of a single data warehouse cluster for multi-cluster deployments and sharing data instantly without the need to copy or move them.
-
The Distributed Data Mesh as a Solution to Centralized Data Monoliths
Instead of building large, centralized data platforms, corporations and data architects should create distributed data meshes.
-
Microsoft Announces Azure Synapse for Data Warehousing and Analytics
During Microsoft's annual Ignite conference the company announced a new analytics service called Azure Synapse. The service, which is a continuation of Azure SQL Data Warehouse, focuses on bringing enterprise data warehousing and big data analytics into a single service.
-
The Future of Data Engineering: Chris Riccomini at QCon San Francisco
At QCon San Francisco 2019, Chris Riccomini presented “The Future of Data Engineering”. The key takeaway of his talk is about reaching an end goal with data engineering, which is having a fully automated decentralized data warehouse.
-
Databricks Open Sources Delta Lake to Make Data Lakes More Reliable
Databricks recently announced open sourcing Delta Lake, their proprietary storage layer, to bring ACID transactions to Apache Spark and big data workloads. Databricks is the company behind the creators of Apache Spark, while Delta Lake is already being used in several companies like McAffee, Upwork etc . Delta Lake is addressing the heterogeneous data problem that data lakes often have...
-
William McKnight on Data Platforms and Creating a Modern Data Architecture
William McKnight gave a keynote presentation last week at Data Architecture Summit 2018 Conference on creating a modern data architecture using different data platforms.
-
Data Workflow Management Using Airbnb's Airflow
Airbnb recently opensourced Airflow, its own data workflow management framework. Airflow is being used internally at Airbnb to build, monitor and adjust data pipelines. Airflow’s creator, Maxime Beauchemin and Agari’s Data Architect and one of the framework’s early adopters Siddharth Anand discuss about Airflow, where it can be of use and future plans.
-
Snowflake Announces General Availability of their Cloud Data Warehouse Offering
Snowflake Computing has announced the general availability of their Snowflake Elastic Data Warehouse, a software as a service offering that provides a SQL data warehouse on top of Amazon Web Services.