LinkedIn recently open sourced Cubert, its High Performance Computation Engine for Complex Big Data Analytics. Cubert is a framework written with analysts and data scientists in mind. Developed entirely in Java and exposed as a scripting language, Cubert is designed for the complex joins and aggregations that frequently arise in reporting workloads.
At the 2014 QCon San Francisco conference, LinkedIn's Lin Qiao gave a talk (also summarized in a blog post) on their Gobblin project, a unified data ingestion system for their internal and external data sources.
Stripe, the internet payments infrastructure company, recently announced that it is open sourcing a set of internally developed tools based on Apache Hadoop. Timberlake, Brushfire, Sequins and Herringbone all add to the tools available for building an Apache Hadoop stack.
Microservices are not a new idea, and unless we learn from our earlier mistakes and avoid repeating them, we will spend the next 3-5 years rebuilding WS-*, just as Web Services rebuilt everything from CORBA, Greg Young stated in a presentation at the Microservices Conference in London.
Microservices are valuable, but to break things up properly and create the right boundaries we need to understand our business and its processes, Jeppe Cramon stated in a presentation at the Microservices Conference in London.
Udi Dahan describes how looking for highly cohesive, loosely coupled microservices across the enterprise, rather than within a single system, can lead us to organise services around business capabilities that span the whole organisation, since this is what the business cares about.
When pushing microservices to the cloud, people often find the new architecture difficult to understand because it is a paradigm shift, Daniel Bryant explains in a presentation at the Microservices Conference in London. To help with designing and implementing cloud microservices, Daniel has created the DHARMA principles, which are intended to be used as a checklist.
Databricks has recently announced a new record in the Daytona GraySort contest using the Spark processing engine. The Daytona GraySort contest is a third-party benchmark measuring how fast a system can sort 100 terabytes of data. Databricks posted a throughput of 4.27 TB/min over a cluster of 206 machines for their official run.
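
As a rough sanity check on the figures quoted above, the following sketch derives the implied sort time and per-machine throughput. It assumes decimal units (1 TB = 1000 GB) and that the quoted 4.27 TB/min is the average over the full 100 TB; the variable names are illustrative only.

```python
# Figures quoted in the summary above.
DATA_TB = 100.0               # size of the GraySort input
THROUGHPUT_TB_PER_MIN = 4.27  # reported throughput
MACHINES = 206                # cluster size for the official run

# Implied wall-clock time if the throughput is an average over the run.
sort_minutes = DATA_TB / THROUGHPUT_TB_PER_MIN

# Implied average throughput per machine, assuming 1 TB = 1000 GB.
per_machine_gb_per_min = THROUGHPUT_TB_PER_MIN * 1000 / MACHINES

print(f"Implied sort time: {sort_minutes:.1f} minutes")
print(f"Implied per-machine throughput: {per_machine_gb_per_min:.1f} GB/min")
```

Under those assumptions the run works out to roughly 23 minutes in total, or about 21 GB per minute per machine.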
When using Domain-Driven Design (DDD) to separate the concerns of a large system into bounded contexts, each with its own data store, there is often a need to share some common data. One way of doing that is to let each context publish events about its changes, which other contexts can listen to, Julie Lerman recently explained in MSDN Magazine.
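
The idea of contexts sharing data through published events can be sketched as follows. This is a minimal in-process illustration, not the approach from the MSDN Magazine article: the `EventBus`, `CustomerContext`, `OrderingContext` and `CustomerRenamed` names are all hypothetical, and a real system would typically put a message broker where the bus is.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, DefaultDict, Dict, List


@dataclass(frozen=True)
class CustomerRenamed:
    """Hypothetical event describing a change in the customer context."""
    customer_id: int
    new_name: str


class EventBus:
    """In-process stand-in for a message broker (an assumption for this sketch)."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[type, List[Callable]] = defaultdict(list)

    def subscribe(self, event_type: type, handler: Callable) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event: object) -> None:
        # Deliver the event to every handler registered for its type.
        for handler in self._handlers[type(event)]:
            handler(event)


class CustomerContext:
    """Owns customer data and publishes an event whenever it changes."""

    def __init__(self, bus: EventBus) -> None:
        self._bus = bus
        self._customers: Dict[int, str] = {}  # this context's own data store

    def rename(self, customer_id: int, new_name: str) -> None:
        self._customers[customer_id] = new_name
        self._bus.publish(CustomerRenamed(customer_id, new_name))


class OrderingContext:
    """Keeps its own local copy of the customer names it needs."""

    def __init__(self, bus: EventBus) -> None:
        self.customer_names: Dict[int, str] = {}  # separate data store
        bus.subscribe(CustomerRenamed, self._on_customer_renamed)

    def _on_customer_renamed(self, event: CustomerRenamed) -> None:
        self.customer_names[event.customer_id] = event.new_name


bus = EventBus()
ordering = OrderingContext(bus)
customers = CustomerContext(bus)
customers.rename(1, "Ada")
print(ordering.customer_names)
```

Neither context reads the other's store directly; the ordering context only learns about the change through the event it subscribed to.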
Hortonworks Data Platform (HDP) version 2.2, built around Hadoop and YARN, also brings better support for enterprise requirements such as security and compliance.
Microsoft recently published a case study describing how a massively multiplayer online (MMO) game used Microsoft Azure to support tens of thousands of players in a single space battle. The case study looks at how architectural considerations like connectivity, latency, and scale can be addressed in an elastic cloud environment that must respond quickly to unexpected bursts in demand.
Chad Fowler, CTO at 6Wunderkinder, the company behind Wunderlist, describes how they went from a large monolithic Rails application and a large monolithic database to a system with many microservices, and the architecture they ended up with. They started by adding new functionality as services and splitting the large database into smaller databases, and ended up doing a big rewrite that produced a new system.
Microsoft recently announced new machine learning capabilities for the Microsoft Azure platform. Developers can also create their own web services and publish them to Azure Marketplace. Microsoft also announced the availability of Apache Storm for Azure. Azure Stream Analytics, Data Factory and Event Hubs for Azure were all announced by Microsoft in the past few weeks. In this article we explore more about
The success of the RICON conference is a testimony to the importance of big applications in industry today. InfoQ speaks to RICON host Basho Technologies about considerations in building distributed systems and technical lessons learned at the conference.
Basho Riak is emerging as -the- highly scalable NoSQL database. InfoQ talks with Basho CEO and President Adam Wray and VP of Product Peter Coppola about the RICON conference, Basho, Riak, and distributed systems.