Databricks recently announced a new record in the Daytona GraySort contest using the Spark processing engine. The Daytona GraySort contest is a third-party benchmark that measures how fast a system can sort 100 terabytes of data. In its official run, Databricks posted a throughput of 4.27 TB/min on a cluster of 206 machines.
In Domain-Driven Design (DDD), where the concerns of a large system are separated into bounded contexts that each use their own data store, there is often a need to share some common data. One way of doing that is to let each context publish events about its changes that other contexts can listen to, Julie Lerman recently explained in MSDN Magazine.
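The event-publishing idea can be illustrated with a minimal sketch. It is not taken from Lerman's article: the `EventBus`, `CustomerRenamed`, `CustomerContext` and `OrderingContext` names are hypothetical, and an in-process bus stands in for whatever messaging infrastructure a real system would use.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, DefaultDict, Dict, List


@dataclass(frozen=True)
class CustomerRenamed:
    """Hypothetical event published when a customer's name changes."""
    customer_id: int
    new_name: str


class EventBus:
    """In-process stand-in for a message broker (an assumption of this sketch)."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[type, List[Callable]] = defaultdict(list)

    def subscribe(self, event_type: type, handler: Callable) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event: object) -> None:
        # Deliver the event to every handler registered for its type.
        for handler in self._handlers[type(event)]:
            handler(event)


class CustomerContext:
    """Owns customer data and announces changes instead of sharing its store."""

    def __init__(self, bus: EventBus) -> None:
        self._bus = bus
        self._names: Dict[int, str] = {}

    def rename(self, customer_id: int, new_name: str) -> None:
        self._names[customer_id] = new_name
        self._bus.publish(CustomerRenamed(customer_id, new_name))


class OrderingContext:
    """Keeps its own local copy of the customer names it needs."""

    def __init__(self, bus: EventBus) -> None:
        self.customer_names: Dict[int, str] = {}
        bus.subscribe(CustomerRenamed, self._on_customer_renamed)

    def _on_customer_renamed(self, event: CustomerRenamed) -> None:
        self.customer_names[event.customer_id] = event.new_name


bus = EventBus()
customers = CustomerContext(bus)
ordering = OrderingContext(bus)
customers.rename(42, "Ada Lovelace")
print(ordering.customer_names)
```

Each context writes only to its own store here; the ordering side learns about the change solely through the published event, which is the decoupling the approach aims for.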
Hortonworks Data Platform (HDP) version 2.2, with features based around Hadoop and YARN, also has better support for enterprise requirements such as security and compliance.
Microsoft recently published a case study describing how a massively multiplayer online (MMO) game used Microsoft Azure to support tens of thousands of players in a single space battle. The case study looks at how architectural considerations like connectivity, latency, and scale can be addressed in an elastic cloud environment that must respond quickly to unexpected bursts in demand.
Chad Fowler, CTO at 6Wunderkinder, the company behind Wunderlist, describes how they went from a large monolithic Rails application and a large monolithic database to a system with many microservices, and explains the architecture they ended up with. They started by adding new functionality as services and splitting the large database into smaller databases, and ended up doing a big rewrite that produced a new system.
Microsoft recently announced new machine learning capabilities for the Microsoft Azure platform. Developers can also create their own web services and publish them to Azure Marketplace. Microsoft also announced availability of Apache Storm for Azure. Azure Stream Analytics, Data Factory and Event Hubs for Azure were all announced in the past few weeks by Microsoft. In this article we explore more about
The success of the RICON conference is a testimony to the importance of big applications in industry today. InfoQ speaks to RICON host Basho Technologies about considerations in building distributed systems and technical lessons learned at the conference.
Basho Riak is emerging as -the- highly scalable NoSQL database. InfoQ talks with Basho CEO and President Adam Wray and VP of Product Peter Coppola about the RICON conference, and about Basho, Riak, and distributed systems.
Leslie Lamport is the author of some of the most cited computer science papers and won a Turing Award in 2013 for his seminal work in distributed and concurrent systems. This is a summary of an interview that Lamport gave to Software Engineering Radio, touching on themes such as his early work in distributed systems and the importance of precise thinking in programming.
Hunk is a relatively new product from Splunk for exploring and visualizing data in Hadoop and NoSQL data stores. New in this release is support for Amazon’s Elastic MapReduce.
MapR recently announced that it is including Apache Drill in the latest release of the MapR distribution. Apache Drill is an open source project inspired by Google’s Dremel, the infrastructure on which BigQuery is based. Drill offers a low-latency SQL-on-Hadoop interface. While this puts it in the same space as several other technologies around Hadoop, Drill has some unique characteristics setting it
Robert C. Martin's advice is to start with shared libraries and a plugin architecture, and to consider microservices only when that becomes insufficient. Giorgio Sironi argues against this, emphasising how different interactions between microservices are from interactions between objects and warning of the cost of retrofitting microservices onto an existing code base.
Following on from the Stinger initiative delivered in Apache Hive 0.13, Hortonworks has laid out the Stinger.next roadmap to provide fully ACID transactions, a sub-second query engine, and more complete SQL 2011 analytics support, all driving towards the goal of “enhancing the speed, scale and breadth of SQL support” in Hive.
Cloudera recently released an update on Project Rhino and data-at-rest encryption in Apache Hadoop. Project Rhino is an effort by Cloudera, Intel and the Hadoop community to bring a comprehensive security framework for data protection to Hadoop. InfoQ recently talked to Steven Ross from the Cloudera team to learn more about the project.
Differing views within the team on the benefits and drawbacks of a microservice architecture compared with a more traditional monolithic architecture were one of the major reasons for failure, Richard Clayton writes, sharing his experiences of implementing and maintaining a microservice architecture and the reasons it failed.