Adopting Big Data and Data Science technologies into an organisation is a transformative project similar to an agile transformation and with many similar challenges. In this article, the author describes such a project for a FTSE100 financial services company.
In their book "Relevant Search", Doug Turnbull and John Berryman focus on the challenge of providing search results by balancing the needs and intents of the user.
HTML editors work fine for general formatting, but they don’t have all the capabilities that some businesses require. In this article, Prasadu Babu Dandu shows how to convert Word documents to HTML.
Bikas Saha and Arun Murthy discuss Tez’s design, highlight some of its features and share some of the initial results obtained by making Hive use Tez instead of MapReduce.
Apache Samza is a stream processor LinkedIn recently open-sourced. Chris Riccomini shares Samza's feature set, how it integrates with YARN and Kafka, how it's used at LinkedIn and more.
MetaModel, an Apache Incubator project, is a Java library used to browse, query and update various types of data stores, such as RDBMS, CSV, Excel and NoSQL, in a uniform and programmatic way.
Apache Hadoop YARN, a new Hadoop resource manager, has just been promoted to a high-level Hadoop subproject. InfoQ had the chance to discuss YARN with Arun Murthy, founder of Hortonworks.
Citing a need to be able to respond faster to events, and disappointment in both feature set and timeframe for Java 7, the guardian.co.uk team is using Scala rather than Java for new projects.
Apache Avro, a new marshaling framework, provides a lot of interesting new features. In his new article, Boris Lublinsky takes it for a test drive and provides some suggestions on its proper usage.
"Tuscany SCA in Action" by Simon Laws, Mark Combellack, Raymond Feng, Haleh Mahbod and Simon Nash provides a simple step-by-step guide on how to develop applications leveraging SCA and Apache Tuscany.