AI, ML & Data Engineering Content on InfoQ
-
Predicting Movie Ratings: NLP Tools is What Film Studios Need
In this article, the author discusses how to use Natural Language Processing (NLP) techniques to predict movie ratings from data shared on social media platforms. Sentiment analysis of movie reviews can also be used to classify movies into different genres and to improve movie recommendation systems.
-
Pascal Desmarets on NoSQL Data Modeling Best Practices
NoSQL databases are specialized to store different types of data, such as key-value, document, column-family, time-series, graph, and IoT data. Pascal Desmarets talks about how data modeling in NoSQL databases compares with modeling in relational databases.
-
Virtual Panel: Data Science, ML, DL, AI and the Enterprise Developer
InfoQ caught up with experts in the field to demystify the different topics surrounding AI, and to explain how enterprise developers can leverage them today to make their solutions more intelligent.
-
From Alibaba to Apache: RocketMQ’s Past, Present, and Future
Feng Jia and Wang Xiaorui share the core distributed systems principles behind RocketMQ, Alibaba's distributed messaging and data streaming platform, now open sourced through the Apache Foundation.
-
Key Takeaway Points and Lessons Learned from QCon London 2017
This year was the 11th for QCon London; it was also our largest London event to date. Including our 140 speakers, we had 1435 team leads, architects, and project managers attending 112 technical sessions across 18 concurrent editorial tracks and 16 in-depth workshops.
-
Want to Know What’s in a GC Pause? Go Look at the GC Log!
Sometimes a superficial analysis of application performance can incorrectly point to the garbage collector as the culprit. A proper GC log analysis can lead us past the “blame the collector” game. When this happens, we can make amazing discoveries that improve the performance and stability of our applications.
-
Building Pipelines for Heterogeneous Execution Environments for Big Data Processing
The Pipeline61 framework supports the building of data pipelines involving heterogeneous execution environments. It reuses the existing code of the deployed jobs in different environments and provides version control and dependency management that deals with typical software engineering issues. A real-world case study shows its effectiveness.
-
Introducing Reladomo - Enterprise Open Source Java ORM, Batteries Included!
Goldman Sachs is widely known as a leader in investment banking, but it is very much a leading technology firm as well. Reladomo is the primary Java ORM used at GS, and it is now open source. In this article, GS Technology Fellow Mohammad Rezaei takes us on a deep dive into Reladomo.
-
There's No AI (Artificial Intelligence) without IA (Information Architecture)
Artificial intelligence (AI) is increasingly hyped by everyone, from well-funded startups to well-known software brands. In this article, the author describes the need for high-quality, structured data before AI technologies can be of use to organizations and their customers.
-
Big Data Processing Using Apache Spark - Part 6: Graph Data Analytics with Spark GraphX
In this article, author Srini Penchikala discusses the Apache Spark GraphX library, which is used for graph data processing and analytics. The article includes sample code for graph algorithms such as PageRank, Connected Components, and Triangle Counting.
-
Three Experts on Big Data Engineering
Clemens Szyperski (Microsoft), Martin Petitclerc (IBM), and Roger Barga (Amazon Web Services) answer three questions: What major challenges do you face when building scalable, big data systems? How do you address these challenges? Where should the research community focus its efforts to create tools and approaches for building highly reliable, scalable, big data systems?
-
Data Preprocessing vs. Data Wrangling in Machine Learning Projects
This article compares alternative techniques for preparing data, including extract-transform-load (ETL) batch processing, streaming ingestion, and data wrangling. The article also discusses how these relate to visual analytics, and best practices for how different user roles, such as the Data Scientist or Business Analyst, should work together to build analytic models.