InfoQ Homepage Apache Spark Content on InfoQ
-
Databricks Open Sources Delta Lake to Make Data Lakes More Reliable
Databricks recently announced open sourcing Delta Lake, their proprietary storage layer, to bring ACID transactions to Apache Spark and big data workloads. Databricks is the company behind the creators of Apache Spark, while Delta Lake is already being used in several companies like McAffee, Upwork etc . Delta Lake is addressing the heterogeneous data problem that data lakes often have...
-
Microsoft Releases High-Performance C# and F# Support for Apache Spark
Microsoft announced the release of .NET for Apache Spark, adding new high-performance C# and F# binding to the big-data computation engine.
-
The Evolution of Uber’s 100+ Petabyte Big Data Platform
Uber’s engineering team wrote about how their big data platform evolved from traditional ETL jobs with relational databases to one based on Hadoop and Spark. A scalable ingestion model, standard transfer format and a custom library for incremental updates are the key components of the platform.
-
DevOps Workbench Launched by ZeroStack
Private cloud provider, ZeroStack, has announced a self-service capability from which developers can create their own workbenches. Forty developer tools from a mix of open source and commercial providers are available to users of the DevOps Workbench through Zerostack’s Intelligent Cloud Platform.
-
Modern Big Data Pipelines over Kubernetes
Container management technologies like Kubernetes make it possible to implement modern big data pipelines. Eliran Bivas, senior big data architect at Iguazio, spoke at the recent KubeCon + CloudNativeCon North America 2017 Conference about big data pipelines and how Kubernetes can help develop them.
-
Microsoft Updates AI Services and Tools for Data Scientists and Developers
At the recent Ignite conference, Microsoft released several updates related to its Artificial Intelligence (AI) services and tools. These updates include the release of the Azure ML Experimentation service, Azure ML Model Management service, Azure ML Workbench and the general availability of Microsoft Cognitive Services.
-
Emerging Technologies for the Enterprise Conference 2017: Day Two Recap
Day Two of the 12th annual Emerging Technologies for the Enterprise Conference was held in Philadelphia. This two-day event included keynotes by Blair MacIntyre (augmented reality pioneer) and Scott Hanselman (podcaster), and featured speakers Kyle Daigle (engineering manager at GitHub), Holden Karau (principal software engineer at IBM), and Karen Kinnear (JVM technical lead at Oracle).
-
Emerging Technologies for the Enterprise Conference 2017: Day One Recap
Day One of the 12th annual Emerging Technologies for the Enterprise Conference was held on Tuesday, April 18 in Philadelphia, PA. This two-day event included keynotes by Blair MacIntyre (augmented reality pioneer) and Scott Hanselman (podcaster), and featured speakers Monica Beckwith (JVM consultant at Oracle), Yehuda Katz (co-creator of Ember.js), and Jessica Kerr (lead engineer at Atomist).
-
Lightbend Speaks to InfoQ on Their Acquisition of OpsClarity
Nine months after acquiring BoldRadius, Lightbend announced their acquisition of OpsClarity, a company specializing in monitoring reactive applications. InfoQ interviewed Mark Brewer, president and CEO at Lightbend and Alan Ngai, co-founder of OpsClarity and now VP of cloud services at Lightbend to learn more about this new partnership.
-
Apache Eagle, Originally from eBay, Graduates to top-level project
Apache Eagle, an open-source solution for identifying security and performance issues on big data platforms, graduates to Apache top level project on January 10, 2017. Firstly open-sourced by eBay on October 2015, Eagle was created to instantly detect access to sensitive data or malicious activities and, to take actions in a timely fashion.
-
Facebook's Comparison of Apache Giraph and Spark GraphX for Graph Data Processing
A Facebook team has recently published a comparison of the performance of their existing Giraph-based graph processing system with the newer GraphX which is part of the popular Spark framework. Their conclusion is that GraphX is neither sufficiently scalable or performant to support their graph processing workloads.
-
Julien Le Dem on the Future of Column-Oriented Data Processing with Apache Arrow
Julien Le Dem, the PMC chair of the Apache Arrow project, presented on Data Eng Conf NY on the future of column-oriented data processing. Apache Arrow is an open-source standard for columnar in-memory execution. InfoQ interviewed Le Dem to find out the differences between Arrow and Parquet.
-
Spark Summit EU Highlights: TensorFlow, Structured Streaming and GPU Hardware Acceleration
Apache Spark integration with deep learning library TensorFlow, online learning using Structured Streaming and GPU hardware acceleration were the highlights of Spark Summit EU 2016 held last week in Brussels.
-
Reactive Summit 2016 Conference: Reactive Microservices and Staging Data Pipelines
Reactive microservices, data center scale operating system (DCOS), and staging reactive data pipelines were the highlighted topics at Reactive Summit 2016 Conference held this week. InfoQ team attended the conference and this post is a summary of the first day's events at the conference.
-
Neha Narkhede: Large-Scale Stream Processing with Apache Kafka
In her presentation "Large-Scale Stream Processing with Apache Kafka" at QCon New York 2016, Neha Narkhede introduces Kafka Streams, a new feature of Kafka for processing streaming data. According to Narkhede stream processing has become popular because unbounded datasets can be found in many places. It is no longer a niche problem like, for example, machine learning.