Data Warehousing Content on InfoQ
-
Instacart Creates Real-Time Item Availability Architecture with ML and Event Processing
Instacart combined machine learning with event-based processing to create an architecture that provides customers with an indication of item availability in near real-time. The new solution helped to improve user satisfaction and retention by reducing order cancellations due to out-of-stock items. The team also created a multi-model experimentation framework to help enhance model quality.
-
Next Generation of Data Movement and Processing Platform at Netflix
Netflix engineering recently described in a tech blog post how it used data mesh architecture and principles to build its next-generation data movement and processing platform, unlocking more business use cases and opportunities. Data mesh is a paradigm shift in data management that enables users to easily import and use data without transporting it to a centralized location such as a data lake.
-
Uber Open-Sourced Its Highly Scalable and Reliable Shuffle as a Service for Apache Spark
Uber engineering has recently open-sourced its highly scalable and reliable shuffle as a service for Apache Spark. Spark is one of the most important tools and platforms in data engineering and analytics. By default, it shuffles data on local machines, which causes challenges as the scale grows very large. Shuffle as a service is the solution Uber developed for this problem.
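The idea behind an external shuffle service can be illustrated with a toy word count in Python. This is a conceptual model only, not Uber's implementation or Spark's API: map tasks hash each key to a reducer partition and push records to a shared service instead of keeping them on their own local disk, and each reduce task then fetches one whole partition. The class `InMemoryShuffleService` and all helper functions are hypothetical, and `zlib.crc32` is an assumed stand-in for whatever partitioning hash a real system uses.

```python
import zlib
from collections import defaultdict


def partition_for(key: str, num_partitions: int) -> int:
    # Stable hash partitioning: the same key always lands in the same partition.
    # (crc32 is an illustrative choice; Python's built-in hash() is salted per process.)
    return zlib.crc32(key.encode("utf-8")) % num_partitions


class InMemoryShuffleService:
    """Hypothetical stand-in for a remote shuffle server.

    Mappers push (key, value) records grouped by partition; reducers fetch a
    whole partition instead of reading shuffle files from many mapper hosts.
    """

    def __init__(self, num_partitions: int):
        self.num_partitions = num_partitions
        self._partitions = defaultdict(list)

    def push(self, key: str, value: int) -> None:
        self._partitions[partition_for(key, self.num_partitions)].append((key, value))

    def fetch(self, partition: int) -> list:
        return list(self._partitions[partition])


def map_task(lines, service) -> None:
    # Word-count map phase: emit (word, 1) for every word in this mapper's split.
    for line in lines:
        for word in line.split():
            service.push(word, 1)


def reduce_task(records) -> dict:
    # Reduce phase: sum the values for each key within one partition.
    counts = defaultdict(int)
    for key, value in records:
        counts[key] += value
    return dict(counts)


def word_count_via_shuffle(mapper_inputs, num_partitions: int = 3) -> dict:
    service = InMemoryShuffleService(num_partitions)
    for lines in mapper_inputs:          # each element is one mapper's input split
        map_task(lines, service)
    result = {}
    for p in range(num_partitions):      # each partition is handled by one reducer
        result.update(reduce_task(service.fetch(p)))
    return result


if __name__ == "__main__":
    print(word_count_via_shuffle([["a b a", "c b a"], ["b c"]]))
```

Because every record for a given key goes to the same partition, each reducer can produce final counts independently; moving the shuffle into a service changes only where the partitioned data lives between the map and reduce phases.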
-
Amazon Redshift Serverless Generally Available to Automatically Scale Data Warehouse
Amazon recently announced the general availability of Redshift Serverless, an elastic option to scale data warehouse capacity. The new service allows data analysts, developers and data scientists to run and scale analytics without provisioning and managing data warehouse clusters.
-
Google Launches a New Cross-Platform Data Storage Engine BigLake in Preview
At the recent Cloud Data Summit, Google announced the preview of BigLake, a new data lake storage engine that makes it easier for enterprises to analyze the data in their data warehouses and data lakes.
-
AWS Announces the Public Preview of AWS Data Exchange for Amazon Redshift
AWS recently announced the public preview of AWS Data Exchange for Amazon Redshift. This new feature enables customers to find and subscribe to third-party data in AWS Data Exchange and query it in an Amazon Redshift data warehouse.
-
Amazon Redshift Data Sharing Now Generally Available
Amazon has recently announced the general availability of Amazon Redshift Data Sharing, which shares live data across Amazon Redshift clusters. Multi-cluster deployments can work from the data of a single data warehouse cluster, with the data shared instantly and without the need to copy or move it.
-
The Future of Data Engineering: Chris Riccomini at QCon San Francisco
At QCon San Francisco 2019, Chris Riccomini presented “The Future of Data Engineering”. The key takeaway of his talk concerned the end goal of data engineering: a fully automated, decentralized data warehouse.
-
Databricks Open Sources Delta Lake to Make Data Lakes More Reliable
Databricks recently announced that it is open sourcing Delta Lake, its proprietary storage layer, to bring ACID transactions to Apache Spark and big data workloads. Databricks is the company founded by the creators of Apache Spark, and Delta Lake is already used by several companies, such as McAfee and Upwork. Delta Lake addresses the heterogeneous data problem that data lakes often have...
-
Data Workflow Management Using Airbnb's Airflow
Airbnb recently open-sourced Airflow, its own data workflow management framework. Airflow is used internally at Airbnb to build, monitor and adjust data pipelines. Airflow’s creator, Maxime Beauchemin, and Siddharth Anand, Data Architect at Agari and one of the framework’s early adopters, discuss Airflow, where it can be of use, and future plans.
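Workflow managers such as Airflow model a pipeline as a directed acyclic graph (DAG) of tasks and start each task only after its upstream dependencies have finished. The sketch below illustrates that scheduling idea with Python's standard-library `graphlib` (Python 3.9+); it is not Airflow's API, and the pipeline and task names are hypothetical. Tasks that become ready at the same time form a "wave" that a scheduler could run in parallel.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
PIPELINE = {
    "extract_orders": set(),
    "extract_users": set(),
    "transform": {"extract_orders", "extract_users"},
    "load_warehouse": {"transform"},
    "refresh_dashboard": {"load_warehouse"},
}


def execution_waves(dag):
    """Group tasks into waves; every task in a wave has all dependencies done.

    prepare() raises graphlib.CycleError if the graph is not acyclic.
    """
    sorter = TopologicalSorter(dag)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks whose upstream tasks have all finished
        waves.append(ready)
        sorter.done(*ready)                 # pretend each task ran successfully
    return waves


if __name__ == "__main__":
    print(execution_waves(PIPELINE))
    # [['extract_orders', 'extract_users'], ['transform'], ['load_warehouse'], ['refresh_dashboard']]
```

In Airflow itself, dependencies are declared between operators inside a DAG definition, and the scheduler, rather than user code, decides when each task runs.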
-
Software Defined Data Mart In The Enterprise Using Metanautix Quest
Metanautix recently announced the newest version of its product, Quest. Quest allows enterprises to build software-defined data marts that can run on virtualized servers. Designed from the ground up with security and auditability in mind, Quest can handle Big Data workloads and encapsulate them in different logical views, ready for consumption by different departments in the enterprise.
-
ThoughtWorks Technology Radar March 2012
ThoughtWorks recently published the latest update to its Technology Radar, a report produced to help technology decision makers understand emerging trends in software development techniques, tools, languages and platforms. It includes several observations of interest to Agile software development teams.
-
What’s New in SQL Server 2012 RC0
Microsoft has released SQL Server 2012 Release Candidate 0. There are many new features, including: AlwaysOn, better performance management, more reporting and visualization tools, Columnstore index, and FileTables. The product will come in 3 main editions: Standard, Business Intelligence and Enterprise.
-
Olap4j 1.0: a Java API for OLAP Servers
Business Intelligence vendor Pentaho has announced the release of olap4j 1.0, a new, common Java API for any online analytical processing (OLAP) server.
-
Column-based Storage in SQL Server 2011
Imagine ad hoc data mining queries against a single table with 1 TB of data and 1.44 billion rows returning in roughly a second. This is the scenario Microsoft intends to support using 32-core machines and its new column-based storage engine.
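Column-based storage helps with queries like this because an aggregate over one column reads only that column's values, and runs of repeated values within a column compress well. The Python sketch below contrasts a row-oriented table with the same data stored column by column; it is a conceptual illustration only, not SQL Server's storage format, and the table and helper names are hypothetical.

```python
from array import array
from itertools import groupby

# Hypothetical sales table in row-oriented form: (region, product_id, amount)
ROWS = [
    ("east", 1, 10.0),
    ("east", 2, 5.0),
    ("west", 1, 7.5),
    ("west", 3, 2.5),
]


def to_columns(rows):
    # Pivot rows into one contiguous, typed sequence per column.
    regions, products, amounts = zip(*rows)
    return {
        "region": list(regions),
        "product_id": array("q", products),  # 64-bit signed integers
        "amount": array("d", amounts),       # doubles
    }


def total_amount_rows(rows):
    # Row store: every full row tuple is visited to read one field.
    return sum(row[2] for row in rows)


def total_amount(columns):
    # Column store: only the 'amount' column is scanned.
    return sum(columns["amount"])


def run_length_encode(values):
    # Columns with long runs of equal values compress into (value, count) pairs.
    return [(value, len(list(group))) for value, group in groupby(values)]


if __name__ == "__main__":
    cols = to_columns(ROWS)
    print(total_amount(cols))                  # 25.0
    print(run_length_encode(cols["region"]))   # [('east', 2), ('west', 2)]
```

Both totals are equal; the difference is how much data must be read to compute them, which grows with the number of columns a row-oriented table carries.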