Big Data Content on InfoQ

News

Continuous Development, is it our new maintenance reality?

The Internet of Things, Web APIs and Big Data will make continuous development a necessary reality and will tie developers down with maintenance work on completed applications, says Andrew Binstock of Dr. Dobb's. In that case, short sprints, continuous integration and deployment, and modern programming practices become even more important to ensure a developer's time is well utilized.

Jeevak Kasarkod
on Apr 21, 2014
DataBricks Announces Spark SQL for Manipulating Structured Data Using Spark

DataBricks, the company behind Apache Spark, has announced a new addition to the Spark ecosystem called Spark SQL. Spark SQL is separate from Shark, and does not use Hive under the hood. InfoQ reached out to Reynold Xin and Michael Armbrust, software engineers at DataBricks, to learn more about Spark SQL.

Matt Kapilevich
on Apr 19, 2014
A Roundup of Cloudera Distribution Containing Apache Hadoop 5

Cloudera recently released the latest version of its software distribution, CDH5. It arrives almost 20 months after the last major version, CDH4, which seems like ages in the Big Data world. We take a look at the new features this release brings and the future direction of Cloudera after the latest round of investment from Intel and Google Ventures.

Alex Giamas
on Apr 18, 2014
Hydra Takes On Hadoop

The social-networking company AddThis open-sourced Hydra under the Apache License, Version 2.0, in a recent announcement. Hydra grew from an in-house platform created to process semi-structured social data as live streams and to run efficient queries on those data sets.

Rags Srinivas
on Apr 11, 2014
Spark Gets a Dedicated Big Data Platform

Spark users can now use a new Big Data platform provided by intelligence company Atigeo, which bundles most of the UC Berkeley stack into a unified framework optimized for low-latency data processing that can provide significant improvements over more traditional Hadoop-based platforms.

Charles Menguy
on Apr 03, 2014
Rebecca Parsons on the ThoughtWorks Technology Radar

In January, ThoughtWorks released the latest version of its Technology Radar, which tracks what's interesting in the software development ecosystem. The big themes this year are (1) early warning systems and recovery in production, (2) the tension between privacy and big data, (3) the JavaScript ecosystem and (4) the blurring of the line between the physical and virtual worlds.

Shane Hastie
on Mar 28, 2014
HBase 0.98 Introduces Cell-based Security

Apache released HBase 0.98, which primarily addresses convergence with Apache Accumulo by adding cell-based security features modeled after Accumulo's, and which resolves over 230 JIRA issues.

Rags Srinivas
on Mar 21, 2014
Graph Processing Using Big Data Technologies

Processing extremely large graphs has been and remains a challenge, but recent advances in Big Data technologies have made this task more practical. Tapad, a startup based in NYC focused on cross-device content delivery, has made graph processing the heart of its business model, using Big Data technologies to scale to terabytes of data.

Charles Menguy
on Mar 17, 2014
Domino: Datascience-as-a-Service

Domino, a Platform-as-a-Service for data science, enables people to do analytical work using languages such as Python or R in the cloud (EC2).

Michael Hausenblas
on Mar 11, 2014
Big Data Hadoop Solutions, State of Affairs in Q1/2014

According to a new Forrester report, Hadoop’s momentum is unstoppable. Its usage in the enterprise is continuously growing due to its ability to offer companies new ways to store, process, analyze, and share big data. The report takes a look at Hadoop vendors and ranks them.

Boris Lublinsky
on Mar 04, 2014
IBM Launches Contest for Cognitive Mobile Apps using Watson

At the Mobile World Congress, IBM announced a contest for developers to create mobile consumer and business apps powered by the IBM Watson cognitive computing platform. The winners of the IBM Watson Mobile Developer Challenge will receive design consulting and support from IBM to gain access to the market.

Sergio De Simone
on Mar 03, 2014
Spark Officially Graduates From Apache Incubator

Recently, Spark graduated from the Apache Incubator. Spark claims up to 100x speed improvements over Apache Hadoop on in-memory datasets, gracefully falling back to a 10x speed improvement for on-disk workloads. Based on Scala, it can run SQL queries and be used directly from R. It provides machine learning, graph database capabilities and other features discussed further in the article.

Alex Giamas
on Feb 28, 2014
Elasticsearch 1.0.0 released

Elasticsearch released version 1.0.0 of its self-titled, open-source analytics tool. Elasticsearch is a distributed search engine which allows for real-time data analysis in big-data environments. The new version comes with various functional enhancements and changes to the API to make Elasticsearch more intuitive and powerful to use.

Ralph Winzinger
on Feb 14, 2014
Running Spark on R with SparkR

UC Berkeley’s AMPLab announced a developer preview of its new project SparkR, which lets users work with Apache Spark natively from R.

Charles Menguy
on Feb 11, 2014
Interactive SQL in Apache Hadoop with Impala and Hive

In the race for interactive SQL in Big Data environments, there are two open-source front-runners: Impala, and Hive with the Stinger project. Cloudera recently announced that Impala is up to 69 times faster than Hive 0.12 and can outperform a DBMS. Beyond raw speed, we take a look at other considerations in choosing a SQL engine for Hadoop, and also at Tez, an application framework for YARN.

Alex Giamas
on Feb 07, 2014
