After several years of development, MBrace 1.0 was released last week. MBrace is a programming model for scalable cloud data scripting and programming with F# and C#. The project consists mainly of code libraries and runtimes for cloud providers.
The second Microservices Conference arranged by Skills Matter is due in early November, with two days each in Stockholm and London. The list of speakers includes the conference program lead, Russ Miles; David A. Dawson; Björn Carlson, chief architect at Klarna; Viktor Klang, chief software architect at Typesafe; Ian Cooper; and Daniel Bryant.
When working with Hadoop, with or without Hunk, there are a number of ways you can accidentally kill performance. While some of the fixes require more hardware, sometimes the problems can be solved simply by changing the way you name your files.
Splunk can now store archived indexes on Hadoop. At the cost of performance, this offers a 75% reduction in storage costs without losing the ability to search the data. And with the new adapters, Hadoop tools such as Hive and Pig can process the Splunk-formatted data.
Splunk opened their big data conference with an emphasis on “making machine data accessible, usable, and valuable to everyone”. This is a shift from their original focus: indexing arbitrary big data sources. Reasonably happy with their ability to process data, they want to ensure that developers, IT staff, and normal people have a way to actually use all of the data their company is collecting.
Preparing for problems like partial failure is the best thing you can do when working with distributed systems, Vaughn Vernon explains in a conversation with InfoQ. He refers to a blog post by Jeff Hodges, noting its down-to-earth approach and practical advice, such as designing for partial availability and using capped exponential backoff to restore full operation when dependencies are unavailable.
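As a rough illustration of capped exponential backoff, here is a minimal Python sketch. The function names, default values, and the use of `ConnectionError` as the retryable failure are all assumptions made for the example, and the random jitter is a common addition rather than something taken from the blog post.

```python
import random
import time


def backoff_delays(base=1.0, cap=10.0, attempts=6):
    """Yield delays that double each time (base * 2**n) but never exceed cap."""
    for n in range(attempts):
        yield min(cap, base * (2 ** n))


def call_with_backoff(operation, base=1.0, cap=10.0, attempts=6, sleep=time.sleep):
    """Call operation, retrying on ConnectionError with capped, jittered delays.

    All names and defaults here are hypothetical; a real system would choose
    its own retryable error types and limits.
    """
    delays = list(backoff_delays(base, cap, attempts - 1))
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            # Sleep a random amount up to the capped delay so that many
            # clients do not all retry at the same instant.
            sleep(random.uniform(0, delays[attempt]))
```

The cap keeps the wait bounded, so a client that has been failing for a long time still notices quickly once the dependency comes back.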
GameAnalytics, maker of a free analytics platform, has recently open sourced gascheduler, an Erlang library that provides a generic scheduler for parallel execution of distributed tasks. InfoQ has spoken to Chris de Vries, one of gascheduler’s creators.
Viewed in a larger architectural context, Command Query Responsibility Segregation (CQRS) is one of several architectural styles available. Some database technologies solve the same problems in a simpler way, Udi Dahan states when looking into ways of approaching CQRS. When CQRS is really needed, there is also a way to fulfil many of its goals with fewer moving parts.
ELIoT (Extensible Language for the Internet of Things) is a simple and small programming language aiming to make distributed programming easier. A program in ELIoT may appear to be a single program, but it actually runs on different computers, so that, for example, a variable or function declared on one computer can be used transparently on another.
To make microservices awesome, Domain-Driven Design (DDD) is needed; the same mistakes that were made 5-10 years ago and solved by DDD are being made again in the context of microservices, David Dawson claimed in his presentation at this year’s DDD Exchange conference in London.
Twitter has replaced Storm with Heron, which provides up to 14 times more throughput and up to 10 times lower latency on a word count topology, and has helped them reduce the hardware needed to a third.
Apache Parquet, the open-source columnar storage format for Hadoop, recently graduated from the Apache Software Foundation Incubator and became a top-level project. Initially created by Cloudera and Twitter in 2012 to speed up analytical processing, Parquet is now openly available for Apache Spark, Apache Hive, Apache Pig, Impala, native MapReduce, and other key components of the Hadoop ecosystem.
In recent months Martin Fowler, among others, has claimed that a microservices architecture should always start with a monolith, but Stefan Tilkov is convinced this is wrong: building a well-structured monolith with cleanly separated modules that may later be pulled apart into microservices is extremely hard, if not impossible, in most cases.
The latest version of MemSQL, an in-memory database with support for transactions and analytics, includes a new Community Edition for free use by organizations. MemSQL 4, released last week, also supports integration with Apache Spark, Hadoop Distributed File System (HDFS), and Amazon S3.
The NASA Center for Climate Simulation (NCCS) is using Apache Hadoop for high-performance data analytics. Glenn Tamkin from the NASA team recently spoke at the ApacheCon conference and shared details of the platform they built for climate data analysis with Hadoop.