Spark Content on InfoQ

News

Architecture & Design

Agoda Builds Multimodal Content System to Bridge Images and Reviews in Travel Discovery

Agoda unifies hotel images and guest reviews using a shared topic taxonomy, enabling multimodal retrieval across 700M+ images and multilingual reviews with offline enrichment and low-latency serving.

Leela Kumili
on May 19, 2026
Architecture & Design

Pinterest Deploys Production-Scale Model Context Protocol Ecosystem for AI Agent Workflows

Pinterest engineering teams have deployed a production-ready Model Context Protocol (MCP) ecosystem that allows AI agents to automate complex engineering tasks and integrate diverse internal tools. Domain-specific MCP servers, a central registry, and human-in-the-loop approval improve security, governance, and developer productivity while saving thousands of hours per month.

Leela Kumili
on Apr 01, 2026
Cloud

AWS Announces a Data Management and Analytics Solution Called Amazon FinSpace

Recently, AWS announced a data management and analytics solution purpose-built for the Financial Services Industry (FSI) called Amazon FinSpace. The service aims to reduce the time it takes for financial analysts to find and access all types of financial data for analysis.

Steef-Jan Wiggers
on May 13, 2021
Cloud

Simplifying ETL in the Cloud, Microsoft Releases Azure Data Factory Mapping Data Flows

In a recent blog post, Microsoft announced the general availability (GA) of their serverless, code-free Extract-Transform-Load (ETL) capability inside of Azure Data Factory called Mapping Data Flows. This tool allows organizations to embrace a data-driven culture without the need to manage large infrastructure footprints while having the ability to dynamically scale data processing workloads.

Kent Weare
on Oct 21, 2019
Cloud

Google Introduces Cloud Storage Connector for Hadoop Big Data Workloads

In a recent blog post, Google announced a new Cloud Storage connector for Hadoop. This new capability allows organizations to substitute their traditional HDFS with Google Cloud Storage. Columnar file formats such as Parquet and ORC may realize increased throughput, and customers will benefit from Cloud Storage directory isolation, lower latency, increased parallelization and intelligent defaults.

Kent Weare
on Sep 09, 2019
AI, ML & Data Engineering

Dataiku's Latest Release Integrates Deep-Learning for Computer Vision

Collaborative data science platform Dataiku's latest release of its Data Science Studio includes pre-trained deep learning models for image processing. The DSS platform implements each step of a data-science project from data-sourcing and visualization to production deployment. Its machine-learning module supports standard libraries and it integrates with Hadoop and multiple Spark engines.

Alexis Perrier
on Apr 11, 2018
AI, ML & Data Engineering

Yahoo Open Sources TensorFlowOnSpark

Yahoo open sources TensorFlowOnSpark, allowing Spark-native TensorFlow runtime and integration for distributed training and serving on Spark or Hadoop.

Dylan Raithel
on Mar 20, 2017
AI, ML & Data Engineering

Google Cloud Machine Learning and Tensor Flow Alpha Release

Late last month Google released an alpha version of its TensorFlow (TF) integrated cloud machine learning service, responding to a growing need to run the TensorFlow library at scale on the Google Cloud Platform (GCP). Google describes several new feature sets aimed at scaling TF usage by integrating pieces of the GCP such as Dataproc, a managed Hadoop and Spark service.

Dylan Raithel
on Apr 18, 2016
IBM to Open Source 50 Projects

IBM has announced a new web portal called developerWorks Open, bringing together various projects it is open sourcing. The projects cover many domains including Analytics, Cloud, IoT, Mobile, Security, Social, Watson and others. So far, IBM has open sourced about 30 projects, and it plans to raise that number to 50 by the end of the year, with others possibly to come in the future.

Abel Avram
on Jul 23, 2015
MemSQL 4 Database Supports Community Edition, Geospatial Intelligence and Spark Integration

The latest version of MemSQL, an in-memory database with support for transactions and analytics, includes a new Community Edition for free use by organizations. MemSQL 4, released last week, also supports integration with Apache Spark, Hadoop Distributed File System (HDFS), and Amazon S3.

Srini Penchikala
on May 30, 2015
LinkedIn Open Sources Cubert With an Eye To Big Data Analytics

LinkedIn recently open sourced Cubert, its High Performance Computation Engine for Complex Big Data Analytics. Cubert is a framework written with analysts and data scientists in mind. Developed completely in Java and expressed as a scripting language, Cubert is designed for the complex joins and aggregations that frequently arise in the reporting world.

Alex Giamas
on Dec 17, 2014
Mahout to Get Self-Optimizing Matrix Algebra Interface with Pluggable Backends for Spark and Flink

At the recent GOTO conference in Berlin, Mahout committer Sebastian Schelter outlined recent advances in Mahout's ongoing effort to create a scalable foundation for data analysis that is as easy to use as R or Python.

Mikio Braun
on Nov 21, 2014
Apache Drill Included in MapR Latest Distribution Release

MapR recently announced that it is including Apache Drill in the latest release of its distribution. Apache Drill is the open source version of Google’s Dremel, the infrastructure on which BigQuery is based. Drill offers a low-latency SQL-on-Hadoop interface. While this puts it in the same space as several other technologies around Hadoop, Drill has some unique characteristics setting it

Alex Giamas
on Sep 30, 2014
DataBricks Announces Spark SQL for Manipulating Structured Data Using Spark

DataBricks, the company behind Apache Spark, has announced a new addition to the Spark ecosystem called Spark SQL. Spark SQL is separate from Shark, and does not use Hive under the hood. InfoQ reached out to Reynold Xin and Michael Armbrust, software engineers at DataBricks, to learn more about Spark SQL.

Matt Kapilevich
on Apr 19, 2014
A Roundup of Cloudera Distribution Containing Apache Hadoop 5

Cloudera recently released the latest version of its software distribution, CDH5, almost 20 months after the last major version, CDH4, which seems like ages in the Big Data world. We take a look at the new features this release brings and the future direction of Cloudera after the latest round of investment from Intel and Google Ventures.

Alex Giamas
on Apr 18, 2014
