Goldman Sachs is widely known as a leader in investment banking, but it is also a leading technology firm. Reladomo is the primary Java ORM used at GS, and it is now open source. In this article, GS Technology Fellow Mohammad Rezaei takes us on a deep dive into Reladomo.
In this article, author Srini Penchikala discusses the Apache Spark GraphX library, which is used for graph data processing and analytics. The article includes sample code for graph algorithms such as PageRank, Connected Components and Triangle Counting.
Clemens Szyperski (Microsoft), Martin Petitclerc (IBM), and Roger Barga (Amazon Web Services) talk about the challenges of building scalable big data systems and how to address them.
This article compares alternative techniques for preparing data, including extract-transform-load (ETL) batch processing, streaming ingestion and data wrangling.
Advice from London thought leaders on the best talks to attend at QCon London 2017.
InfoQ talked with Immuta’s Andrew Burt and Steve Touw to better understand the implications and challenges of the EU's General Data Protection Regulation (GDPR), which will come into effect in May 2018.
The book Cassandra: The Definitive Guide, 2nd Edition, by Jeff Carpenter and Eben Hewitt, covers version 3.0 of the Cassandra NoSQL database. InfoQ spoke with co-author Jeff Carpenter.
In this series, people who’ve been there and done it explore how to make sense of data science: understanding where it’s needed and where it’s not, and how to make it an asset for you.
Yahoo uses Hadoop for a variety of big data and machine learning use cases. InfoQ spoke with Peter Cnudde about how Yahoo leverages big data technologies.
The Internet of Things (IoT) is an emerging technology, and connected vehicles are one of its application areas. In this article, we'll use Spark and Kafka to analyse and process data from IoT-connected vehicles.
In this fifth installment of the Apache Spark article series, author Srini Penchikala discusses the Spark ML package and how to use it to create and manage machine learning data pipelines.