Airbnb recently open-sourced Airflow, its own data workflow management framework. Airflow is used internally at Airbnb to build, monitor and adjust data pipelines. Airflow’s creator, Maxime Beauchemin, and Siddharth Anand, Agari’s Data Architect and one of the framework’s early adopters, discuss Airflow, where it can be of use, and future plans.
Metanautix recently announced the newest version of its product, Quest. Quest allows enterprises to build software-defined data marts that can run on virtualized servers. Designed from the ground up with security and auditability in mind, Quest can handle Big Data workloads and encapsulate them into different logical views, ready for consumption by different departments in the enterprise.
ThoughtWorks recently published the latest update to its Technology Radar, a report produced to help technology decision makers understand emerging trends in software development techniques, tools, languages and platforms. It includes several observations of interest to Agile software development teams.
Microsoft has released SQL Server 2012 Release Candidate 0. There are many new features, including: AlwaysOn, better performance management, more reporting and visualization tools, Columnstore index, and FileTables. The product will come in 3 main editions: Standard, Business Intelligence and Enterprise.
Business Intelligence vendor Pentaho has announced the release of olap4j 1.0, a new, common Java API for any online analytical processing (OLAP) server.
Imagine ad hoc data mining queries against a single table with 1 TB of data and 1.44 billion rows coming back in roughly a second. This is the scenario Microsoft intends to support using 32-core machines and its new column-based storage engine.
Eobjects.org's open-source Java framework MetaModel implements a unified API for accessing, exploring, and querying different datastores. Eobjects.org, both a website and an open source software organization dedicated to "the development of Open Source software related to Business Intelligence and Data Warehousing", has recently published version 1.5 of MetaModel.
We talk with Daniel Kirstenpfad, founder and CTO of sones GmbH, about graph databases and how they can better model some types of data, such as relations in a social networking application. A graph database can offer performance benefits over other types of databases because it explicitly represents a graph and is organized to have index-free adjacency.
Jay Kreps of LinkedIn presented some informative details of how the company processes data at the recent Hadoop Summit. Kreps described how LinkedIn crunches 120 billion relationships per day and blends large-scale data computation with high-volume, low-latency site serving.
The Hadoop Summit of 2010 included presentations from a number of large-scale users of Hadoop and related technologies. Notably, Facebook presented detailed information about its use of Hive for analytics. Mike Schroepfer, Facebook's VP of Engineering, delivered a keynote describing the scale of the company's data processing with Hadoop.
The GigaOM Structure conference a couple of weeks ago addressed many areas of cloud computing. One of the key themes of the event was the emergence of new data architectures. Throughout the panels, interviews, and presentations, many speakers identified significant upcoming changes in how data is handled.
The need for machine-learning techniques like clustering, collaborative filtering, and categorization has steadily increased over the last decade, along with the number of solutions needing quick and efficient algorithms to transform vast amounts of raw data into relevant information. Apache Mahout 0.3 was announced in March, adding more functionality, stability and performance.
Microsoft’s service codenamed “Dallas” is an information marketplace that brings together data, imagery and service providers and their consumers, facilitating information exchange through a single point of access.
Microsoft released a free MapPoint 2009 Add-In for SQL Server 2008 spatial data. The add-in can be used with MapPoint to build map graphics against queries on SQL Server 2008 spatial geography columns.
The NoSQL meeting tried to raise awareness of the opportunity to use non-relational databases, which promise to be cheaper, simpler to administer and maintain, and to offer superior scalability. Michael Stonebraker, co-creator of Ingres and Postgres, thinks that the end of the RDBMS era is close, while others think that we are not there yet.