Big Data Content on InfoQ

News

AI, ML & Data Engineering

How Twitter Automated Data Quality Check Process

Twitter engineering recently shared a blog post on how it architected and developed a data quality automation platform. Twitter digests and creates thousands of data sets for different data products and applications, so the natural next step is to ensure the quality of that data through automation. This news post explores the architecture in more detail.

Reza Rahimi
on Dec 20, 2022
AI, ML & Data Engineering

Uber Freight Near-Real-Time Analytics Architecture

Uber Freight is the Uber platform dedicated to connecting shippers with carriers. Providing reliable service to shippers is crucial for Uber Freight. This is why the Carrier Scorecard was developed, with several metrics including on-time pickup/delivery, tracking automation, and late cancellations.

Claudio Masolo
on Nov 08, 2022
AI, ML & Data Engineering

Snap Way to Design Ads Ranking Service Using Deep Learning

Snap engineering recently published a blog post on how it designed its ads ranking and targeting service using deep learning. Showing ads to users is the mainstay of social network platform monetization. Snap's ad ranking system is designed to target the right user at the right time, providing a good user experience while preserving user privacy and security.

Reza Rahimi
on Oct 23, 2022
Cloud

Azure Data Explorer Supports Native Ingestion from Amazon S3

Microsoft recently announced the ability to natively ingest data from Amazon S3 into Azure Data Explorer (ADX). The new feature simplifies multi-cloud data analytics deployments by bringing data from Amazon S3 to Azure without relying on custom ETL pipelines.

Renato Losio
on Sep 07, 2022
AI, ML & Data Engineering

Next Generation of Data Movement and Processing Platform at Netflix

Netflix engineering recently published a tech blog post on how it used data mesh architecture and principles to build its next-generation data movement and processing platform, opening up more business use cases and opportunities. Data mesh is a paradigm shift in data management that enables users to easily import and use data without transporting it to a centralized location like a data lake.

Reza Rahimi
on Aug 29, 2022
Cloud

Google Introduces Zero-ETL Approach to Analytics on Bigtable Data Using BigQuery

Google recently announced the general availability of Bigtable federated queries with BigQuery, allowing customers to query data residing in Bigtable via BigQuery faster. Queries run without moving or copying the data, are available in all Google Cloud regions, and come with increased federated query concurrency limits, closing the longstanding gap between operational data and analytics.

Steef-Jan Wiggers
on Aug 11, 2022
Cloud

Amazon Redshift Serverless Generally Available to Automatically Scale Data Warehouse

Amazon recently announced the general availability of Redshift Serverless, an elastic option to scale data warehouse capacity. The new service allows data analysts, developers and data scientists to run and scale analytics without provisioning and managing data warehouse clusters.

Renato Losio
on Jul 23, 2022
AI, ML & Data Engineering

Shopify’s Practical Guidelines from Running Airflow for ML and Data Workflows at Scale

Shopify engineering shared, in a company blog post, its experience scaling and optimizing Apache Airflow for running ML and data workflows. The team described practical solutions to the challenges it faced, such as slow file access, insufficient control over DAGs, irregular levels of traffic, resource contention among workloads, and more.

Reza Rahimi
on Jul 22, 2022
Architecture & Design

Fitting Presto to Large-Scale Apache Kafka at Uber

The need for ad-hoc real-time data analysis has been growing at Uber. They run a large Apache Kafka deployment and need to analyse data going through the many workflows it supports. Solutions like stream processing and OLAP datastores were deemed unsuitable. An article was published recently detailing why Uber chose Presto for this purpose and what it had to do to make it performant at scale.

Vasco Veloso
on Jun 20, 2022
Cloud

Amazon Elastic MapReduce Now Generally Available as a Serverless Offering

AWS recently announced that Amazon Elastic MapReduce (EMR) Serverless is generally available (GA). The offering is a serverless deployment option for customers to run big data analytics applications using open-source frameworks like Apache Spark and Hive without configuring, managing, and scaling clusters or servers.

Steef-Jan Wiggers
on Jun 07, 2022
Cloud

Google Introduces Autoscaling for Cloud Bigtable for Optimizing Costs

Cloud Bigtable is a fully managed, scalable NoSQL database service for large operational and analytical workloads on the Google Cloud Platform (GCP). The public cloud provider recently announced the general availability of Bigtable Autoscaling, which automatically adds or removes capacity in response to changing application demand, allowing cost optimization.

Steef-Jan Wiggers
on Jan 31, 2022
Cloud

Amazon OpenSearch Adds Anomaly Detection for Historical Data

Amazon OpenSearch recently introduced support for anomaly detection on historical data. The machine-learning-based feature helps identify trends, patterns, and seasonality in OpenSearch data.

Renato Losio
on Jan 29, 2022
Cloud

AWS Announces the Public Preview of AWS Data Exchange for Amazon Redshift

AWS recently announced the public preview of AWS Data Exchange for Amazon Redshift. This new feature enables customers to find and subscribe to third-party data in AWS Data Exchange and query it in an Amazon Redshift data warehouse.

Steef-Jan Wiggers
on Oct 27, 2021
Cloud

AWS Announces the General Availability and Open Sourcing of the Amazon Genomics CLI

Amazon Genomics CLI is a tool that makes it easier to process genomics data at petabyte scale on AWS. Earlier this year, the public cloud vendor shared a preview of the tool, and it is now open source and generally available.

Steef-Jan Wiggers
on Oct 06, 2021
Cloud

Hazelcast Jet 4.4 Released - the Four-Year Anniversary Release as Seen by Scott McMahon

Hazelcast Jet recently celebrated its four-year anniversary with the release of version 4.4. Besides the normal bug fixes and performance enhancements, this new version ships with new features such as the unified file connector and the first beta version of the SQL interface. InfoQ spoke to Scott McMahon, technical director of field engineering at Hazelcast, about this new release.

Olimpiu Pop
on Mar 19, 2021
