
AWS Deep Graph Knowledge Embedding for Bond Trading Predictions

AWS developed the Deep Graph Knowledge Embedding Library (DGL-KE), a knowledge graph embedding library built on the Deep Graph Library (DGL). DGL is a scalable, high-performance Python library for deep learning on graphs. DGL-KE is used in the machine learning system that AWS built with Trumid for Trumid's credit trading platform.

Trumid developed an electronic trading platform where traders can buy and sell bonds and interact with the community. As its network of users expands, Trumid needs an ML system that delivers a personalized trading experience by modeling the preferences and interests of its platform users. The most relevant insights and information are then displayed to each user, allowing a faster, curated trading experience.

The AWS Machine Learning Solutions Lab worked with Trumid's AI and Data Strategy team to develop an end-to-end pipeline covering data preparation, model training and inference, based on a deep neural network model built with the Deep Graph Library for Knowledge Embedding (DGL-KE).

Bond trading can be thought of as a network of interactions between buyers and sellers involving various types of bonds. A graph provides a natural way to model this real-world complexity, with information embedded in the relationships between entities.

In this case, graph ML algorithms fit better than traditional ML algorithms because of the nature of the dataset. A traditional ML algorithm works with tabular data, whereas a graph ML algorithm learns from a graph dataset that includes information about its constituent nodes, edges and other features.

The dataset used by Trumid and AWS includes dimensions such as trade size, term, issuer, rate, coupon values, bid/ask offer, type of trading protocol and indications of interest (IOIs). These data are used to build graphs of interactions between traders, bonds and issuers, and a graph ML model is developed to predict future interactions.

The first step of the recommendation pipeline is data preparation. The trading data are represented as a graph with only nodes and typed edges, where the nodes are traders or bonds and the edges are the relations between them. This dataset is saved in TSV format.
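The data preparation step described above can be sketched as follows. This is a minimal illustration, not the pipeline used by Trumid and AWS: the record fields, the `issued_by` relation name and the helper functions `build_triples` and `triples_to_tsv` are assumptions made for the example. Only the `trade-recent` relation name and the TSV output format come from the article.

```python
import csv
import io

# Hypothetical trade records; the field names are illustrative assumptions.
trades = [
    {"trader": "trader_1", "bond": "bond_A", "issuer": "issuer_X"},
    {"trader": "trader_2", "bond": "bond_A", "issuer": "issuer_X"},
    {"trader": "trader_1", "bond": "bond_B", "issuer": "issuer_Y"},
]


def build_triples(records):
    """Turn trade records into de-duplicated (head, relation, tail) triples."""
    triples = []
    seen = set()
    for rec in records:
        candidates = [
            # "trade-recent" is the relation named in the article.
            (rec["trader"], "trade-recent", rec["bond"]),
            # "issued_by" is an assumed relation name for illustration.
            (rec["bond"], "issued_by", rec["issuer"]),
        ]
        for triple in candidates:
            if triple not in seen:
                seen.add(triple)
                triples.append(triple)
    return triples


def triples_to_tsv(triples):
    """Serialize triples as tab-separated lines, one triple per line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerows(triples)
    return buf.getvalue()


triples = build_triples(trades)
tsv_text = triples_to_tsv(triples)
print(tsv_text)
```

Each output line holds one head, relation and tail separated by tabs, which is the kind of edge list a knowledge graph embedding library typically takes as training input.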

Graph of relations between traders, bonds and bond issuers


DGL-KE is well suited to knowledge graphs, which are graphs composed only of nodes and relations. A knowledge graph is a structured collection of entities, relationships and semantic descriptions. The information stored in a knowledge graph is often specified as triplets of head, relation and tail ([h, r, t]), where the head and tail are entities; each triplet is also known as a statement.

Knowledge graph embeddings (KGEs) are low-dimensional representations of the entities and relations of a knowledge graph. Popular KGE models include TransE, TransR, RESCAL, DistMult, ComplEx and RotatE. The main difference between these models is the score function, which measures the distance between two entities given their relation. In other words, entities connected by a relation are close to each other in the vector space, whereas entities that are not connected are far apart.
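To make the idea of a score function concrete, here is a minimal sketch of a TransE-style score, taken as the negative Euclidean distance between head + relation and tail. The function name `transe_score` and the two-dimensional toy vectors are assumptions for illustration; library implementations usually add details such as a margin term and a choice of norm.

```python
import math


def transe_score(head, relation, tail):
    """Negative L2 distance between (head + relation) and tail.

    A higher (less negative) score means the triple is more plausible.
    """
    return -math.sqrt(
        sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail))
    )


# Toy 2-dimensional embeddings, chosen by hand for illustration only.
trader = [0.1, 0.4]
trade_recent = [0.5, -0.2]
bond_near = [0.6, 0.2]   # roughly equals trader + trade_recent
bond_far = [-1.0, 1.5]   # far from trader + trade_recent

score_near = transe_score(trader, trade_recent, bond_near)
score_far = transe_score(trader, trade_recent, bond_far)
print(score_near > score_far)
```

The bond whose embedding lies near trader + relation receives the higher score, which matches the intuition that connected entities sit close together in the vector space.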

For this particular application, the TransE embedding model is used for training, and new trades are predicted using the equality:

Source node embedding + relation embedding = target node embedding

Here the source node embedding is the trader embedding, the relation embedding is the trade-recent embedding, and the target nodes are the bonds closest to the resulting embedding.

This approach is used to compute scores for all possible trade-recent relations and to keep the 100 highest scores for each trader.
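The prediction step can be sketched as a nearest-neighbor lookup: add the trader and relation embeddings, then rank bonds by their distance to the result. The function `recommend_bonds`, the toy embeddings and the choice of Euclidean distance are assumptions for this example. The article specifies only the equality and the top 100 cutoff, so the sketch uses a small `k` on hand-made data.

```python
import heapq
import math


def recommend_bonds(trader_vec, relation_vec, bond_vecs, k):
    """Return the ids of the k bonds closest to trader_vec + relation_vec."""
    target = [h + r for h, r in zip(trader_vec, relation_vec)]

    def distance(bond_id):
        return math.dist(target, bond_vecs[bond_id])

    # nsmallest returns ids ordered from closest to farthest.
    return heapq.nsmallest(k, bond_vecs, key=distance)


# Toy embeddings for illustration; real embeddings come from training.
trader_embedding = [1.0, 0.0]
trade_recent_embedding = [0.0, 1.0]
bond_embeddings = {
    "bond_A": [1.0, 1.1],
    "bond_B": [0.0, 0.0],
    "bond_C": [1.5, 1.0],
    "bond_D": [-2.0, 3.0],
}

top_bonds = recommend_bonds(
    trader_embedding, trade_recent_embedding, bond_embeddings, k=2
)
print(top_bonds)
```

With `k=100` and trained embeddings, the same ranking would yield a per-trader recommendation list of the size described in the article.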

The solution is released in production as a single script running in SageMaker Processing. This is possible because data preparation, model training and prediction do not need to be separated.

With this implementation, mean recall (the percentage of actual trades predicted by the recommender, averaged over all traders) improved by 80% compared with the other methods, across all trade types.
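As an illustration of the metric, mean recall can be computed per trader and then averaged. The sketch below follows the definition given in the article. The function name `mean_recall`, the input layout and the rule of skipping traders with no actual trades are assumptions, and the data are invented for the example.

```python
def mean_recall(recommended, actual):
    """Average, over traders, of the share of actual trades that were recommended.

    recommended: dict mapping trader id -> list of recommended bond ids
    actual: dict mapping trader id -> set of bond ids actually traded
    Traders with no actual trades are skipped (an assumption of this sketch).
    """
    recalls = []
    for trader, traded in actual.items():
        if not traded:
            continue
        hits = len(set(recommended.get(trader, [])) & traded)
        recalls.append(hits / len(traded))
    return sum(recalls) / len(recalls) if recalls else 0.0


# Invented example data: trader_1 has both trades predicted, trader_2 one of two.
recommended = {"trader_1": ["A", "B", "C"], "trader_2": ["D"]}
actual = {"trader_1": {"A", "C"}, "trader_2": {"D", "E"}}

result = mean_recall(recommended, actual)
print(result)
```

Here trader_1 has a recall of 1.0 and trader_2 a recall of 0.5, so the mean recall is 0.75.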
