Big Data Content on InfoQ
-
Airbnb Builds Himeji - a Scalable Centralized Authorization System
Airbnb recently described how it built Himeji, a scalable centralized authorization system. Himeji stores permissions data and performs permission checks as a central source of truth. It uses a sharded and replicated in-memory cache to improve performance and lower latencies and has served checks in production for about a year.
-
Hazelcast Jet 4.4 Released - the Four-Year Anniversary Release as Seen by Scott McMahon
Hazelcast Jet recently celebrated its four-year anniversary with the release of version 4.4. Besides the normal bug fixes and performance enhancements, this new version ships with new features such as the unified file connector and the first beta version of the SQL interface. InfoQ spoke to Scott McMahon, technical director of field engineering at Hazelcast, about this new release.
-
Using Machine Learning in Testing and Maintenance
With machine learning, we can reduce maintenance efforts and improve the quality of products. It can be used in various stages of the software testing life-cycle, including bug management, which is an important part of the chain. Using machine learning algorithms, we can analyze large amounts of data to classify, triage, and prioritize bugs more efficiently.
-
DataStax Announces Astra Serverless Database-as-a-Service
DataStax, the company behind the Cassandra database, last week announced the general availability of Astra serverless, the open, multi-cloud serverless database-as-a-service (DBaaS).
-
Designing for Failure in the BBC's Analytics Platform
Last week at InfoQ Live, Blanca Garcia-Gil, principal systems engineer at the BBC, gave a session on "Evolving Analytics in the Data Platform." During this session, Garcia-Gil focused on how her team prepared and designed for two types of failure: "known unknowns" and "unknown unknowns."
-
Google Brings Databricks to Its Cloud Platform
Google recently announced a partnership with Databricks to bring Databricks' fully managed Apache Spark offering and data lake capabilities to Google Cloud. The offering will become available as Databricks on Google Cloud.
-
PayPal Standardizes on Apache Airflow and Apache Gobblin for Its Next-Gen Data Movement Platform
PayPal recently described how it standardized on Apache Airflow and Apache Gobblin for implementing its next-gen data movement platform. In a recent blog post, PayPal engineers detail how the existing data movement platform had evolved into a complex, unmanageable ecosystem of many tools and platforms, and how they shifted towards a new implementation.
-
Analyzing Large Amounts of Feedback to Learn from Users
Making it easy for users to give feedback and automating the collection of feedback help to get more feedback faster. Using artificial intelligence, you can analyze large amounts of feedback to get insights and visualize trends. Sharing this information widely supports taking action to enhance your product and solve issues that users are having.
-
Microsoft Releases .NET for Apache Spark 1.0
Last month, Microsoft released the first major version of .NET for Apache Spark, an open-source package that brings .NET development to the Apache Spark platform. The new release allows .NET developers to write Apache Spark applications using .NET user-defined functions, Spark SQL, and additional libraries such as Microsoft Hyperspace and ML.NET.
-
Google Announces a New, More Services-Based Architecture Called Runner V2 to Dataflow
Google Cloud Dataflow is a fully managed service for executing Apache Beam pipelines within the Google Cloud Platform (GCP). In a recent blog post, Google announced a new, more services-based architecture for Dataflow called Runner v2, which will include multi-language support for all of its language SDKs.
-
Accelerating Machine Learning Lifecycle with a Feature Store
A feature store is a core part of next-generation ML platforms that empowers data scientists to accelerate the delivery of ML applications. Mike Del Balso and Geoff Sims recently spoke at the Spark AI Summit 2020 conference about feature-store-driven ML development.
-
Spark AI Summit 2020 Highlights: Innovations to Improve Spark 3.0 Performance
The highlights of the recent Spark AI Summit 2020, held online for the first time, were innovations to improve Apache Spark 3.0 performance, including Spark SQL optimizations and GPU acceleration.
-
IBM Fully Homomorphic Encryption Toolkit Now Available for MacOS and iOS
IBM's Fully Homomorphic Encryption (FHE) Toolkit aims to allow developers to start using FHE in their solutions. According to IBM, FHE can have a dramatic impact on data security and privacy in highly regulated industries by enabling computing directly on encrypted data.
-
Splunk Launches New Release of SignalFx APM
Splunk, a platform for searching, monitoring, and examining machine-generated big data, has launched a new release of its application monitoring tool, SignalFx Microservices APM™. The new release combines NoSample™ tracing, open-standards-based instrumentation, and artificial intelligence (AI)-driven directed troubleshooting from SignalFx and Omnition into a single solution.
-
Boosting Apache Spark with GPUs and the RAPIDS Library
At the 2019 Spark AI Summit Europe conference, NVIDIA software engineers Thomas Graves and Miguel Martinez hosted a session titled "Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS Library." InfoQ recently talked with Jim Scott, head of developer relations at NVIDIA, to learn more about accelerating Apache Spark with GPUs and the RAPIDS library.