Apache InLong: Integration Framework for Massive Data

Apache InLong, an integration framework designed for massive data, was originally built at Tencent, where it ran in production for more than eight years to support large-scale data reporting services in big data scenarios. The project officially graduated as an Apache top-level project three years after entering the Apache Incubator.

InLong manages the entire data life cycle, from data collection to landing in target storage, and provides a separate processing module for each stage. It comprises five main modules: Agent, DataProxy, MQ, Sort, and Manager. Its pluggable architecture allows modules to be plugged into the system based on specific protocols.

InLong Agent is responsible for collecting data from heterogeneous data sources and writing it into target systems. It uses a Job/Task architecture in which every task contains a reader, a channel and a sink. A series of connectors based on Apache Flink allow it to read data from and write data to files, SQL databases, Apache Kafka, Apache Pulsar, MongoDB, Elasticsearch, HDFS and other systems.

InLong DataProxy, based on Apache Flume, acts as a bridge between InLong Agent and the message queue (MQ).

InLong MQ was first developed at Tencent as TubeMQ, with a focus on transmitting massive volumes of data. It uses a master/broker architecture and two-tier storage split between memory and files.

InLong Sort is an ETL service based on Apache Flink SQL.

InLong Manager provides complete data service management and control capabilities, including metadata, OpenAPI, task flow and permissions.
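The Job/Task model described for InLong Agent can be pictured as a reader that pulls records from a source, a channel that buffers them, and a sink that writes them to a target. The Python sketch below assumes the channel is a bounded in-memory queue shared by a producer thread and a consumer thread, and treats a job as a group of tasks run in sequence. The class names (`ListReader`, `ListSink`, `Task`, `Job`) are hypothetical; this is not InLong's implementation.

```python
import queue
import threading
from typing import Iterable, List

_END = object()  # sentinel that marks the end of the stream in the channel


class ListReader:
    """Hypothetical reader: yields records from an in-memory source."""

    def __init__(self, records: Iterable[str]):
        self._records = list(records)

    def read(self):
        yield from self._records


class ListSink:
    """Hypothetical sink: appends records to an in-memory target."""

    def __init__(self):
        self.written: List[str] = []

    def write(self, record: str) -> None:
        self.written.append(record)


class Task:
    """One task = reader -> channel -> sink, with the channel as a bounded queue."""

    def __init__(self, reader, sink, capacity: int = 2):
        self.reader = reader
        self.sink = sink
        self.channel = queue.Queue(maxsize=capacity)

    def _produce(self):
        for record in self.reader.read():
            self.channel.put(record)  # blocks while the channel is full
        self.channel.put(_END)

    def _consume(self):
        while True:
            record = self.channel.get()
            if record is _END:
                break
            self.sink.write(record)

    def run(self):
        producer = threading.Thread(target=self._produce)
        consumer = threading.Thread(target=self._consume)
        producer.start()
        consumer.start()
        producer.join()
        consumer.join()


class Job:
    """Hypothetical job: a group of tasks run one after another."""

    def __init__(self, tasks):
        self.tasks = list(tasks)

    def run(self):
        for task in self.tasks:
            task.run()


if __name__ == "__main__":
    sink = ListSink()
    Job([Task(ListReader(["r1", "r2", "r3"]), sink)]).run()
    print(sink.written)  # ['r1', 'r2', 'r3']
```

Decoupling the reader from the sink through a bounded channel lets a slow target apply back-pressure to the source instead of letting unbounded data accumulate in memory.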

InLong also provides Dashboard and Audit modules to make the platform easier to use.

The image below shows the standard architecture of Apache InLong:

Apache InLong can be deployed standalone, which requires MySQL, Flink and Docker; alternatively, a Docker Compose YAML file and a Helm chart for Kubernetes are provided.

InLong is currently used in industries such as advertising, payment, social networking, games and artificial intelligence, where it provides data services across multiple fields.

Apache InLong was donated to the Apache Incubator by the Tencent Big Data team in November 2019, and its GitHub repository now has more than 100 contributors and almost 1,000 stars.

The Apache Software Foundation, founded in 1999, is the world's largest open source foundation, stewarding more than 227M lines of code and providing more than $22B worth of software at no cost.