LinkedIn’s Joel Koshy details how the company uses Kafka and walks through the debugging and monitoring of two production incidents, showing how Kafka's core infrastructure concepts, semantics and behavioral patterns can be used to plan for and detect similar problems in the future.
Moving applications to the cloud has by now become something of a commodity, not only for big players but also for smaller companies that rely on flexibility and efficient resource utilization. In his presentation "Implementing Infrastructure as Code", Kief Morris, cloud practice lead at ThoughtWorks, shares some key principles and recommendations on how to leverage cloud-based infrastructure.
LinkedIn recently detailed its newly open-sourced Kafka Monitor service, which it uses to monitor production Kafka clusters, along with its extensive test automation, which has led it to identify bugs in the main Kafka trunk and contribute fixes back to the open-source community.
As part of the ongoing transition to the module system, CORBA and other Java EE modules will no longer be on the default class path from Java 9 onwards. These modules will still be available, but specific command-line flags (such as --add-modules) will be needed to use them. The change only affects non-modular applications targeting Java 9, since modular applications must already declare their dependencies.
Confluent, the company behind the Apache Kafka messaging framework, announced last week the general availability of Confluent Platform 3.0, the latest version of its open-source messaging platform, which adds support for Kafka Streams for real-time data processing.
Cloudera announced its partnership with the Broad Institute of MIT and Harvard and detailed some of its experience with the Genome Analysis Toolkit (GATK) pipeline.
Two years after the release of Apache Spark 1.0, Databricks announced a technical preview of Apache Spark 2.0, based on the upstream branch 2.0.0-preview. The preview is not ready for production, either in terms of stability or API; it is intended to gather feedback from the community ahead of general availability.
Amazon recently announced an update to its Amazon Kinesis service. The update adds three new features to Amazon Kinesis Streams and Amazon Kinesis Firehose: Amazon Elasticsearch Service integration, shard-level metrics and time-based iterators.
AWS engineers Christopher Crosbie and Ujjwal Ratan detail how they used Spark on EMR for precision-medicine data analysis on the ADAM platform, with data from the 1000 Genomes Project.
Supergiant is a container hosting platform built using Kubernetes for distributed, stateful applications.
A summary of the day 1 talks at DevOpsDays Kiel.
Genomic data sequencing and subsequent analysis face the challenge of very large data volumes, which several organizations are addressing with cloud services. Last month the Broad Institute detailed its experience with petabyte-scale sequencing pipelines on the Google Research Blog, and InfoQ covers it here.
After months of waiting for details about the partnership between the NHS and Google DeepMind, InfoQ looks into recent claims of widespread access to patient data.
Hadoop and other big data technologies have revolutionized the way organizations run data analytics, but those organizations still face high operating costs when using these technologies for on-premises data processing. Ashish Thusoo recently spoke at the Enterprise Data World conference about Hadoop-as-a-Service offerings that help organizations close these gaps.
Terracotta has released version 3 of its distributed caching technology Ehcache, with a number of important new features. The API has been refactored to leverage Java generics, overall performance has been improved, and support for the javax.cache API (JSR-107) and off-heap storage has been added.