Data Analytics Content on InfoQ
-
ClickHouse Keeper: Efficient Apache ZooKeeper Alternative Created with C++ and Raft
The ClickHouse project team created an in-house replacement for Apache ZooKeeper because it needed a more efficient implementation that would also address some of ZooKeeper's shortcomings. ClickHouse Keeper is now an essential part of the ClickHouse project and a cornerstone of this open-source analytical database, but it can also be used independently for many distributed coordination use cases.
-
KubeCon NA 2023: Kubernetes Storage Platform to Run Real-Time Analytic Databases
A Kubernetes storage platform provides a portable and flexible foundation for data management, helping developers build their own data solutions. Robert Hodges spoke last week at the KubeCon + CloudNativeCon North America 2023 conference about the techniques his teams developed to build their own data platform.
-
Running Apache Flink Applications on AWS KDA: Lessons Learnt at Deliveroo
Deliveroo introduced Apache Flink into its technology stack to enrich and merge events consumed from Apache Kafka or Kinesis Streams. The company opted to use the AWS Kinesis Data Analytics (KDA) service to manage Apache Flink clusters on AWS and shared its experiences from running Flink applications on KDA.
-
AWS Introduces New Clickstream Analytics on AWS Solution for Mobile and Web Applications
AWS recently announced Clickstream Analytics on AWS, a new end-to-end solution to collect, ingest, analyze, and visualize clickstream data from organizations' web and mobile applications.
-
Unified Analytics Platform: Microsoft Fabric
At its recent annual Build conference, Microsoft introduced Microsoft Fabric, a unified analytics platform that brings together the data and analytics capabilities organizations need.
-
AWS Introduces Athena Provisioned Capacity
AWS recently announced Provisioned Capacity for Athena, a new feature that allows users to run SQL queries on fully managed compute capacity for a fixed price with no long-term commitments.
-
Netflix Built a Scalable Annotation Service Using Cassandra, Elasticsearch and Iceberg
Netflix recently published how it built Marken, a scalable annotation service using Cassandra, Elasticsearch and Iceberg. Marken allows storing and querying annotations, or tags, on arbitrary entities. Users define versioned schemas for their annotations, which include out-of-the-box support for temporal and spatial objects.
-
Apache Druid 25.0 Delivers Multi-Stage Query Engine and Kubernetes Task Management
Apache Druid is a high-performance, real-time datastore, and its latest release, version 25.0, brings many improvements and enhancements. The main new features are that the multi-stage query (MSQ) task engine used for SQL-based ingestion is now production-ready, and that Kubernetes can be used to launch and manage tasks, eliminating the need for middle managers...
-
AWS Announces Clean Rooms for Secure Collaboration with Analytics Data
During the recent re:Invent conference, AWS announced the preview of Clean Rooms for analytics data. The new service provides safe environments where multiple customers can securely share and analyze data while controlling how that data is used, reducing the risk of sharing personal data.
-
AWS Glue Now Supports Crawler History
AWS recently launched crawler history for AWS Glue, which lets users inspect crawler runs and the associated schema changes for the last 12 months.
-
Austrian DPA Ruling against Google Analytics Paves the Way to EU-based Cloud Services
In a recent ruling, the Austrian data protection regulator declared the use of Google Analytics unlawful under the EU's GDPR. While the ruling is very specifically argued and worded, its implications go well beyond this particular case.
-
Data Collection, Standardization and Usage at Scale in the Uber Rider App
Uber Engineering recently published how it collects, standardises and uses data from the Uber Rider app. Rider data comprises all of the rider's interactions with the Uber app and accounts for billions of events from Uber's online systems every day. Uber uses this data to address key problem areas such as funnel conversion and user engagement.
-
Microsoft Renames Its Azure API for FHIR to Azure Healthcare APIs
Microsoft recently announced that it is renaming its Cloud for Healthcare's Azure API for Fast Healthcare Interoperability Resources (FHIR) to "Azure Healthcare APIs." Alongside the renaming, the company is also expanding support for healthcare data to include patient health data via FHIR, medical imaging data via DICOM, and medical device data via the Azure IoT Connector for FHIR.
-
Amazon SNS Gains Message Archiving and Analytics via Amazon Kinesis Data Firehose
Amazon Web Services (AWS) recently announced that Amazon SNS now supports Amazon Kinesis Data Firehose subscriptions to send messages to "data lakes, data stores, and analytics services [...] without writing custom code". The new event destination also simplifies integration with third-party service providers.
-
AWS Announces a Data Management and Analytics Solution Called Amazon FinSpace
Recently, AWS announced a data management and analytics solution purpose-built for the Financial Services Industry (FSI) called Amazon FinSpace. The service aims to reduce the time it takes for financial analysts to find and access all types of financial data for analysis.