
MindMeld’s Guide to Building Conversational Apps

by Abel Avram on  Feb 03, 2017

MindMeld, a conversational AI company, has published The Conversational AI Playbook, a guide outlining the challenges of creating conversational applications and the steps involved.

Apache HBase 1.3 Ships with Multiple Performance Improvements

by Alexandre Rodrigues on  Jan 30, 2017

Apache HBase 1.3.0 was released in mid-January 2017 and ships with support for date-based tiered compaction, improvements to areas such as the write-ahead log (WAL), and a new RPC scheduler, among other changes. The release includes almost 1,700 resolved issues in total.

Apache Eagle, Originally from eBay, Graduates to Top-Level Project

by Alexandre Rodrigues on  Jan 24, 2017

Apache Eagle, an open-source solution for identifying security and performance issues on big data platforms, graduated to an Apache top-level project on January 10, 2017. First open-sourced by eBay in October 2015, Eagle was created to instantly detect access to sensitive data or malicious activities and to take action in a timely fashion.

Improving Azure SQL Database Performance Using In-Memory Technologies

by Kent Weare on Jan 21, 2017

In late 2016, Microsoft announced the general availability of Azure SQL Database In-Memory technologies. In-Memory processing is available only in the Azure Premium database tiers and provides performance improvements for On-line Transactional Processing (OLTP), Clustered Columnstore Indexes, and Non-clustered Columnstore Indexes for Hybrid Transactional and Analytical Processing (HTAP) scenarios.

How 3rd Party Tools Nearly Killed Performance (and Culture) at Adidas

by Manuel Pais on  Jan 20, 2017

How the IT department of the shoe and clothing giant tamed an out-of-control proliferation of third-party tools on its global websites, which was killing performance and had led to a blame culture setting in between business and IT. A new third-party governance process focusing on performance data and user experience validation was key to stopping the bleeding.

Kuzzle – An On-Premises Document Back-End

by Abel Avram on  Jan 12, 2017

Kuzzle is a document back-end that can run on-premises or in the cloud. The company behind the platform recently announced the enterprise version of its solution at CES 2017.

Yelp Open-Sources Latest in Data Pipeline Project, Data Pipeline Client Library

by Dylan Raithel on  Jan 06, 2017

Yelp has open-sourced the latest component in its data pipeline initiative, a Python-based data pipeline client library.

Mathieu Ripert on Instacart's Machine Learning Optimizations

by Alexandre Rodrigues on  Jan 05, 2017

Instacart is an online service that delivers groceries in under one hour. Customers order items on the website or through the mobile app, and a group of Instacart’s shoppers go to local stores, purchase the items, and deliver them to the customer. InfoQ interviewed Mathieu Ripert, data scientist at Instacart, to find out how machine learning is leveraged to guarantee a better customer experience.

Google BigQuery Adds New Public Datasets

by Alex Giamas on  Jan 05, 2017

Stack Overflow recently announced that its dataset is available through Google’s BigQuery. Using regular SQL statements, developers can query the full set of Stack Overflow data, including posts, votes, tags, and badges. In this article we explore the datasets that are available through Google's BigQuery platform.

Neo4j 3.1 Supports Causal Clustering and Security Enhancements

by Srini Penchikala on  Dec 31, 2016

The latest version of the graph NoSQL database Neo4j introduces causal clustering and a new security architecture. The Neo4j team recently released version 3.1 of the graph database. Other new features include database kernel improvements and a Schema Viewer.

AFK-MC² Algorithm Speeds up k-Means Clustering Algorithm Seeding

by Alexandre Rodrigues on  Dec 23, 2016

“Fast and Provably Good Seedings for k-Means” by Olivier Bachem et al. was presented at the 2016 Neural Information Processing Systems (NIPS) conference and describes AFK-MC², an alternative method for generating initial seedings for the k-Means clustering algorithm that is several orders of magnitude faster than the state-of-the-art method, k-Means++.

Speedment Releases Stream ORM Version 3.0.1

by Michael Redlich on  Dec 16, 2016

Speedment released version 3.0.1 of their stream object-relational mapping Java toolkit and runtime application, featuring a new declarative Java 8 stream API, an improved user interface, and better code generation. InfoQ spoke to Per-Åke Minborg, co-founder and CTO of Speedment, about this latest release.

Julien Nioche on StormCrawler, Open-Source Crawler Pipelines Backed by Apache Storm

by Alexandre Rodrigues on  Dec 15, 2016

Julien Nioche, director of DigitalPebble, PMC member and committer of the Apache Nutch web crawler project, talks about StormCrawler, a collection of reusable components to build distributed web crawlers based on the streaming framework Apache Storm. InfoQ interviewed Nioche, main contributor of the project, to find out more about StormCrawler and how it compares to other similar technologies.

Facebook's Comparison of Apache Giraph and Spark GraphX for Graph Data Processing

by Srini Penchikala on  Dec 09, 2016

A Facebook team has recently published a comparison of the performance of their existing Giraph-based graph processing system with the newer GraphX, which is part of the popular Spark framework. Their conclusion is that GraphX is neither sufficiently scalable nor sufficiently performant to support their graph processing workloads.

Julien Le Dem on the Future of Column-Oriented Data Processing with Apache Arrow

by Alexandre Rodrigues on Dec 08, 2016

Julien Le Dem, the PMC chair of the Apache Arrow project, presented at Data Eng Conf NY on the future of column-oriented data processing. Apache Arrow is an open-source standard for columnar in-memory execution. InfoQ interviewed Le Dem to find out the differences between Arrow and Parquet.
