Transformers.js: ML for the Web, Now with Text-to-Speech

Transformers.js, the JavaScript counterpart to the Python Transformers library, runs Transformers models directly in the web browser, with no need for server-side processing. Version 2.7 brings several enhancements, most notably text-to-speech (TTS) support. Added in response to user demand, TTS extends the library to a new class of use cases.

Text-to-speech converts text into natural-sounding speech and can support multiple spoken languages and speakers. For now, Transformers.js supports TTS only through Xenova/speecht5_tts, a version of Microsoft's SpeechT5 with ONNX weights. Support for Bark and MMS is planned for future releases.

Developers access text-to-speech through the pipeline function exported by @xenova/transformers, passing the 'text-to-speech' task, the model ID 'Xenova/speecht5_tts', and the option { quantized: false }, which loads the full-precision rather than the default quantized weights. The resulting synthesizer is then called with the input text and a speaker_embeddings option pointing to a file of speaker embeddings, which determine the voice of the generated speech.
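A minimal sketch of that call follows, assuming Transformers.js 2.7 or later is installed as @xenova/transformers. The helper names getSynthesizer and synthesize are hypothetical, and the speaker-embeddings URL is the sample file referenced in the library's documentation around the 2.7 release; treat it as an assumption and substitute your own file if it has moved. The pipeline is created once and cached, because the first call downloads the model weights.

```javascript
// Sketch of the Transformers.js text-to-speech call (v2.7+).
// Assumption: SPEAKER_EMBEDDINGS_URL is the sample file from the library docs.
const TTS_MODEL = 'Xenova/speecht5_tts';
const SPEAKER_EMBEDDINGS_URL =
  'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/speaker_embeddings.bin';

let synthesizerPromise = null; // cached so the model is downloaded only once

// Hypothetical helper: lazily create the TTS pipeline. The dynamic import
// keeps this file loadable even where the package is not yet installed.
function getSynthesizer() {
  if (synthesizerPromise === null) {
    synthesizerPromise = import('@xenova/transformers')
      .then(({ pipeline }) =>
        // quantized: false loads the full-precision ONNX weights
        pipeline('text-to-speech', TTS_MODEL, { quantized: false }))
      .catch((err) => {
        synthesizerPromise = null; // allow a retry after a failed load
        throw err;
      });
  }
  return synthesizerPromise;
}

// Hypothetical helper: resolves to an object with audio and sampling_rate.
async function synthesize(text, speakerEmbeddings = SPEAKER_EMBEDDINGS_URL) {
  const synthesizer = await getSynthesizer();
  return synthesizer(text, { speaker_embeddings: speakerEmbeddings });
}
```

Inside a module or async function, const { audio, sampling_rate } = await synthesize('Hello, my dog is cute'); yields the samples and their rate.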

Running the TTS pipeline on a piece of text returns an object holding the synthesized speech as an array of audio samples (audio) together with its sampling rate (sampling_rate). The samples can be further processed or played directly in the browser.
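As one example of further processing, the sketch below packs the audio array and sampling rate into a 16-bit PCM mono WAV file that a browser can play or download. The function name encodeWav is hypothetical, and it assumes the samples are floats nominally in the range [-1, 1], clipping anything outside it.

```javascript
// Hypothetical helper: encode float samples as a 16-bit PCM mono WAV file.
// Assumption: samples are nominally in [-1, 1]; values outside are clipped.
function encodeWav(audio, samplingRate) {
  const bytesPerSample = 2;
  const dataSize = audio.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize); // 44-byte header + samples
  const view = new DataView(buffer);

  const writeAscii = (offset, text) => {
    for (let i = 0; i < text.length; i++) view.setUint8(offset + i, text.charCodeAt(i));
  };

  // RIFF header (all multi-byte fields are little-endian)
  writeAscii(0, 'RIFF');
  view.setUint32(4, 36 + dataSize, true); // file size minus 8 bytes
  writeAscii(8, 'WAVE');

  // fmt chunk describing uncompressed PCM
  writeAscii(12, 'fmt ');
  view.setUint32(16, 16, true);                            // fmt chunk size
  view.setUint16(20, 1, true);                             // format 1 = PCM
  view.setUint16(22, 1, true);                             // one channel
  view.setUint32(24, samplingRate, true);                  // samples per second
  view.setUint32(28, samplingRate * bytesPerSample, true); // bytes per second
  view.setUint16(32, bytesPerSample, true);                // block align
  view.setUint16(34, 16, true);                            // bits per sample

  // data chunk: clip, then scale each float to a signed 16-bit integer
  writeAscii(36, 'data');
  view.setUint32(40, dataSize, true);
  for (let i = 0; i < audio.length; i++) {
    const s = Math.max(-1, Math.min(1, audio[i]));
    view.setInt16(44 + i * bytesPerSample, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return buffer;
}
```

In a browser, new Blob([encodeWav(audio, sampling_rate)], { type: 'audio/wav' }) can be passed to URL.createObjectURL and played through an Audio element. Writing the header by hand avoids an extra dependency; an audio library is a reasonable alternative when other formats are needed.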

Beyond TTS, Transformers.js caters to use cases such as style transfer, image inpainting, image colorization, and super-resolution. Its breadth and regular updates make it a useful option for developers working at the intersection of machine learning and web development.

Transformers.js is designed to be functionally equivalent to Hugging Face's transformers Python library, so developers can run the same pretrained models through a very similar API.
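As a rough illustration of that similarity, the sketch below wraps the JavaScript equivalent of the Python call pipeline('sentiment-analysis'), with the Python version shown in a comment. The function name classifySentiment is hypothetical; the task falls back to the library's default model, which is downloaded on first use.

```javascript
// Python (transformers):
//   from transformers import pipeline
//   pipe = pipeline('sentiment-analysis')
//   out = pipe('I love transformers!')
//
// JavaScript (Transformers.js): same task name and call pattern, but async.
let sentimentPromise = null; // cache the pipeline across calls

// Hypothetical helper used for illustration only.
async function classifySentiment(text) {
  if (sentimentPromise === null) {
    // Dynamic import so this file loads even before the package is installed.
    sentimentPromise = import('@xenova/transformers')
      .then(({ pipeline }) => pipeline('sentiment-analysis')) // task-default model
      .catch((err) => {
        sentimentPromise = null; // allow a retry after a failed load
        throw err;
      });
  }
  const pipe = await sentimentPromise;
  return pipe(text); // resolves to an array of { label, score } objects
}
```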

Transformers.js supports a wide range of tasks across natural language processing, vision, audio, tabular data, multimodal applications, and reinforcement learning, from text classification and summarization to image segmentation and object detection.

The list of supported architectures is similarly broad, including BERT, GPT-2, T5, and Vision Transformer (ViT), among many others, so users can pick a model suited to their specific task.

Community reaction to Transformers.js has been positive. In a Reddit thread started earlier this year, user Intrepid-Air6525 wrote:

I decided to use it to replace openai’s embeddings model. Works pretty fast. I am using webLLM for the actual LLM since I don’t want to use up too much CPU processing.

User 1EvilSexyGenius commented on Hugging Face's market positioning and on the value of discussing practical implementations:

[...]Between transformers.js and their optimum libraries I think it's clear that [Hugging Face] are truly trying to democratize language models and bring them to the people. This community could benefit from posts like these vs all of the daily model releases.

Interested readers can learn more from the Hugging Face Transformers.js website and associated GitHub repo.

About the Author

Agazi Mekonnen
