Big Data Content on InfoQ

Articles

AI, ML & Data Engineering

Christine Doig on Data Science as a Team Discipline

Christine Doig spoke at this year's OSCON Conference about data science as a team discipline and how to navigate the data science Python ecosystem. InfoQ spoke with Christine about challenges data science teams need to address to be more effective.

Srini Penchikala
on Aug 26, 2016
AI, ML & Data Engineering

Big Data Analytics with Spark Book Review and Interview

Big Data Analytics with Spark, a book by Mohammed Guller, is a practical guide to learning the Apache Spark framework for different types of big data analytics projects, including batch, interactive, graph, and stream data analysis as well as machine learning. InfoQ spoke with the author about the book and development tools for big data applications.

Srini Penchikala
on Jun 23, 2016
AI, ML & Data Engineering

Big Data Processing with Apache Spark - Part 4: Spark Machine Learning

In this fourth installment of the Apache Spark article series, author Srini Penchikala discusses machine learning concepts and the Spark MLlib library for running predictive analytics, using a sample application.

Srini Penchikala
on May 15, 2016
AI, ML & Data Engineering

The Role of a Data Scientist in 2016

The Data Scientist role has been getting a lot of attention lately as organizations start to use big data processing and analytics techniques to gain insights into their data. This post takes a closer look at the role of a Data Scientist in 2016.

Ed Jones
on Mar 27, 2016
AI, ML & Data Engineering

Unified Data Modeling for Relational and NoSQL Databases

Current enterprise data architectures include NoSQL databases co-existing with relational databases. However, NoSQL data management currently lacks mature methods and tools. In this article, the author discusses a solution for managing both NoSQL and relational databases using Unified Data Modeling techniques.

Allen Wang
on Feb 28, 2016
Mobile

Getting Ready for IoT’s Big Data Challenges with Couchbase Mobile

Our physical world is about to become digitally enabled, and according to various predictions, for example by Gartner or Cisco, many billions of IoT devices will go online and constantly gather data in the coming years. We got in touch with Wayne Carter and Ali LeClerc of Couchbase to discuss how Couchbase Mobile is ready for the upcoming era of the Internet of Things.

Ralph Winzinger
on Jan 20, 2016
AI, ML & Data Engineering

Big Data Processing with Apache Spark - Part 3: Spark Streaming

In this article, the third installment of the Apache Spark series, author Srini Penchikala discusses the Apache Spark Streaming framework for processing real-time streaming data, using a log analytics sample application.

Srini Penchikala
on Jan 07, 2016
AI, ML & Data Engineering

Health Informatics and Survival Prediction of Cancer with Apache Spark Machine Learning Library

In this article, the author discusses the survival prediction of colorectal cancer as a multi-class classification problem and how to solve that problem using Apache Spark's MLlib Java API.

Konur Unyelioglu
on Dec 22, 2015
AI, ML & Data Engineering

Data Lake-as-a-Service: Big Data Processing and Analytics in the Cloud

Data Lake-as-a-Service solutions provide big data processing in the cloud for faster business outcomes in a very cost-effective way. InfoQ spoke with Lovan Chetty and Hannah Smalltree from the Cazena team about how Data Lake-as-a-Service works.

Srini Penchikala
on Dec 10, 2015
AI, ML & Data Engineering

Real-time Data Processing in AWS Cloud

In this article, author Oleksii Tymchenko discusses a bioinformatics software-as-a-service (SaaS) product called Chorus, which was built as a public data warehousing and analytical platform for mass spectrometry data. Other features of the product include real-time visualization of raw mass-spec data.

Oleksii Tymchenko
on Nov 11, 2015
AI, ML & Data Engineering

Oozie Plugin for Eclipse

The Oozie Eclipse plugin is a new tool for editing Apache Oozie workflows graphically inside Eclipse. The plugin lets you skip writing process definitions in HPDL, which are hard to develop and maintain. Instead, a process graph is defined graphically by placing process actions on the palette and connecting them. This article introduces the Oozie Eclipse plugin and provides an example of its usage.

Ahmed Mahran
on Oct 30, 2015
AI, ML & Data Engineering

Big Data Solutions with MS SQL ColumnStore Index

Columnar data storage can offer significant performance improvements over the way database tables are traditionally stored, but it isn’t always faster. Aleksandr Shavlyuga explores the power and limitations of SQL Server’s ColumnStore Indexes.

Aleksandr Shavlyuga
on Oct 11, 2015
