Apache Spark Content on InfoQ
-
Modern Big Data Pipelines over Kubernetes
Container management technologies like Kubernetes make it possible to implement modern big data pipelines. Eliran Bivas, senior big data architect at Iguazio, spoke at the recent KubeCon + CloudNativeCon North America 2017 Conference about big data pipelines and how Kubernetes can help develop them.
-
Microsoft Updates AI Services and Tools for Data Scientists and Developers
At the recent Ignite conference, Microsoft released several updates related to its Artificial Intelligence (AI) services and tools. These updates include the release of the Azure ML Experimentation service, Azure ML Model Management service, Azure ML Workbench and the general availability of Microsoft Cognitive Services.
-
Emerging Technologies for the Enterprise Conference 2017: Day Two Recap
Day Two of the 12th annual Emerging Technologies for the Enterprise Conference was held in Philadelphia. This two-day event included keynotes by Blair MacIntyre (augmented reality pioneer) and Scott Hanselman (podcaster), and featured speakers Kyle Daigle (engineering manager at GitHub), Holden Karau (principal software engineer at IBM), and Karen Kinnear (JVM technical lead at Oracle).
-
Emerging Technologies for the Enterprise Conference 2017: Day One Recap
Day One of the 12th annual Emerging Technologies for the Enterprise Conference was held on Tuesday, April 18 in Philadelphia, PA. This two-day event included keynotes by Blair MacIntyre (augmented reality pioneer) and Scott Hanselman (podcaster), and featured speakers Monica Beckwith (JVM consultant at Oracle), Yehuda Katz (co-creator of Ember.js), and Jessica Kerr (lead engineer at Atomist).
-
Lightbend Speaks to InfoQ on Their Acquisition of OpsClarity
Nine months after acquiring BoldRadius, Lightbend announced their acquisition of OpsClarity, a company specializing in monitoring reactive applications. InfoQ interviewed Mark Brewer, president and CEO at Lightbend, and Alan Ngai, co-founder of OpsClarity and now VP of cloud services at Lightbend, to learn more about this new partnership.
-
Apache Eagle, Originally from eBay, Graduates to Top-Level Project
Apache Eagle, an open-source solution for identifying security and performance issues on big data platforms, graduated to an Apache top-level project on January 10, 2017. First open-sourced by eBay in October 2015, Eagle was created to instantly detect access to sensitive data or malicious activities, and to take action in a timely fashion.
-
Facebook's Comparison of Apache Giraph and Spark GraphX for Graph Data Processing
A Facebook team has recently published a comparison of the performance of their existing Giraph-based graph processing system with the newer GraphX, which is part of the popular Spark framework. Their conclusion is that GraphX is neither sufficiently scalable nor performant to support their graph processing workloads.
-
Julien Le Dem on the Future of Column-Oriented Data Processing with Apache Arrow
Julien Le Dem, the PMC chair of the Apache Arrow project, presented at Data Eng Conf NY on the future of column-oriented data processing. Apache Arrow is an open-source standard for columnar in-memory execution. InfoQ interviewed Le Dem to find out the differences between Arrow and Parquet.
-
Spark Summit EU Highlights: TensorFlow, Structured Streaming and GPU Hardware Acceleration
Apache Spark integration with deep learning library TensorFlow, online learning using Structured Streaming and GPU hardware acceleration were the highlights of Spark Summit EU 2016 held last week in Brussels.
-
Reactive Summit 2016 Conference: Reactive Microservices and Staging Data Pipelines
Reactive microservices, data center scale operating system (DCOS), and staging reactive data pipelines were the highlighted topics at the Reactive Summit 2016 Conference held this week. The InfoQ team attended the conference, and this post is a summary of the first day's events.
-
Neha Narkhede: Large-Scale Stream Processing with Apache Kafka
In her presentation "Large-Scale Stream Processing with Apache Kafka" at QCon New York 2016, Neha Narkhede introduces Kafka Streams, a new feature of Kafka for processing streaming data. According to Narkhede, stream processing has become popular because unbounded datasets can be found in many places. It is no longer a niche problem like, for example, machine learning.
-
Apache Spark 2.0 Technical Preview
Two years after the first release of Apache Spark, Databricks announced the technical preview of Apache Spark 2.0, based on upstream branch 2.0.0-preview. The preview is not ready for production, neither in terms of stability nor API, but is a release intended to gather feedback from the community ahead of the general availability of the release.
-
Databricks Integrates Spark and TensorFlow for Deep Learning
Since announcements late last year about Google open-sourcing TensorFlow, the company’s open-source library for machine learning, and previous coverage at InfoQ, the data-science community has had an opportunity to try out TensorFlow for their own projects.
-
Yahoo! Benchmarks Apache Flink, Spark and Storm
Yahoo! has benchmarked three of the main stream processing frameworks: Apache Flink, Spark and Storm.
-
IBM Commits to Advance Apache Spark
Earlier last month in Las Vegas, at IBM Insight 2015, IBM announced a major commitment to the Apache Spark project. That IBM refers to it as “potentially the most significant open source project of the next decade” says a lot about how important IBM believes Apache Spark is. With IDC reporting that 80% of cloud applications in the future will be data intensive, Apache Spark can unlock previously...