The Data Scientist role has been getting a lot of attention lately as organizations start to use big data processing and analytics techniques to gain insights into their data. This post takes a closer look at the role of a Data Scientist in 2016.
In the book "Spark in Action", authors Petar Zecevic and Marko Bonaci discuss the Apache Spark framework for data processing, covering both batch and streaming use cases. They introduce the architecture of Spark and core concepts such as Resilient Distributed Datasets (RDDs). InfoQ spoke with them about Apache Spark, developer tools, and the features and enhancements planned for future releases.
Current enterprise data architectures include NoSQL databases co-existing with relational databases. However, mature methods and tools for managing NoSQL data are still lacking. In this article, the author discusses a solution for managing both NoSQL and relational databases using Unified Data Modeling techniques.
Sourcing Security Superheroes: Part II: How Policy Can Enhance, Rather Than Hinder, Breach Detection
In theory, security policies protect organizations, stakeholders, and users. But in practice, organizations become more concerned with meeting these standards than with protecting the business.
Our physical world is about to become digitally enabled, and according to various predictions, many billions of IoT devices will go online and start collecting data in the coming years.
In this article, the third installment of the Apache Spark series, the author discusses the Apache Spark Streaming framework for processing real-time streaming data, using a log analytics sample application.
In this article, the author discusses the survival prediction of colorectal cancer as a multi-class classification problem and how to solve that problem using Apache Spark's MLlib Java API.
This article summarizes the key takeaways and highlights from QCon San Francisco 2015 as blogged and tweeted by QCon's 1,300 attendees.
Data Lake-as-a-Service provides big data processing in the cloud for business outcomes in a cost-effective way. InfoQ spoke with Lovan Chetty & Hannah Smalltree from Cazena about how these solutions work.
In this article, the author discusses a bioinformatics software-as-a-service (SaaS) product built as a public data warehousing and analytical platform for mass spectrometry data.
Find out what's new in Kubernetes V1 with a Jenkins example in Google Container Engine. V1 brings enterprise-level capabilities such as self-healing, service discovery, dynamic DNS, and resource quotas.
A new Eclipse Oozie plugin significantly simplifies the implementation of Oozie processes by letting developers define them graphically. This article introduces the plugin and provides an example of its usage.