Sonic: A Lightweight, Schema-Less Search

Sonic is an open source, schema-less search backend promoted as an alternative to full-featured search systems such as Elasticsearch. Sonic can normalize natural-language search queries, provide auto-complete, and return the most relevant results for a search query. Sonic implements an identifier index, as opposed to a document index: queries return a list of IDs that can be resolved in an external database. Designed by engineers at Crisp, Sonic powers search in the Crisp customer support messaging service.

Indexed data in Sonic is stored in collections, which are organized into buckets. Sonic can run with a single bucket or be configured to use multiple buckets, which makes it possible to customize indexes for specific search use cases. Because a search query returns only object identifiers, which are meant to be resolved against an external database, the amount of data stored on disk is kept to a minimum. Sonic provides additional features including typo correction in queries, query term auto-complete, and Unicode support for over eighty languages. Querying and indexing are done via the Sonic Channel protocol, for which client libraries are available in languages including Node.js, PHP, Rust, Go, and Python.
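The identifier-index idea can be illustrated with a small in-memory sketch. This is not Sonic's implementation or the Sonic Channel API: the `IdentifierIndex` class, the `MESSAGES_DB` dictionary standing in for the external database, and the collection and bucket names are all hypothetical, chosen only to show that the index holds IDs while the documents live elsewhere.

```python
import re
from collections import defaultdict


class IdentifierIndex:
    """Toy identifier index: stores object IDs only, never the documents."""

    def __init__(self):
        # (collection, bucket, word) -> set of object IDs
        self._index = defaultdict(set)

    @staticmethod
    def _tokenize(text):
        # Lowercase and split on non-word characters (a simplification).
        return re.findall(r"\w+", text.lower())

    def push(self, collection, bucket, object_id, text):
        for word in self._tokenize(text):
            self._index[(collection, bucket, word)].add(object_id)

    def query(self, collection, bucket, terms):
        words = self._tokenize(terms)
        if not words:
            return []
        id_sets = [self._index.get((collection, bucket, w), set()) for w in words]
        # Keep only the objects that contain every query word.
        return sorted(set.intersection(*id_sets))


# Hypothetical external database that owns the actual content.
MESSAGES_DB = {
    "msg:1": "How do I reset my password?",
    "msg:2": "Password reset email never arrived",
    "msg:3": "Upgrade my billing plan",
}

index = IdentifierIndex()
for object_id, text in MESSAGES_DB.items():
    index.push("messages", "user_42", object_id, text)

ids = index.query("messages", "user_42", "password reset")
results = [MESSAGES_DB[i] for i in ids]  # resolve IDs outside the index
```

Here the bucket acts as a namespace: the same query against a different bucket of the `messages` collection returns nothing, which is how one index per user or per use case could be modeled.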

The Sonic search engine is built around an inverted index. Indexed sentences are broken into words and stored as key-value pairs, with each word as the key and the indexed objects whose sentences contain it as the values returned when a query matches. Data is stored in a key-value database powered by RocksDB. As objects are added to or removed from an index, a background job consolidates the index so that the changes become available for search. Sonic is written in Rust, a modern programming language focused on safety that compiles to a native binary and has no garbage collector, which would otherwise interfere with Sonic's real-time memory management needs.
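A minimal sketch of an inverted index with deferred consolidation, written in Python for brevity, might look like the following. It is an illustration under stated assumptions rather than Sonic's code: the `InvertedIndex` class and its method names are hypothetical, a plain dictionary stands in for RocksDB, and `consolidate()` is called explicitly where Sonic would run a background job.

```python
import re
from collections import defaultdict


class InvertedIndex:
    """Toy inverted index: word -> object IDs, with queued updates."""

    def __init__(self):
        self._postings = defaultdict(set)  # searchable state
        self._pending = []                 # queued (op, object_id, words)

    @staticmethod
    def _tokenize(sentence):
        return re.findall(r"\w+", sentence.lower())

    def add(self, object_id, sentence):
        self._pending.append(("add", object_id, self._tokenize(sentence)))

    def remove(self, object_id, sentence):
        self._pending.append(("remove", object_id, self._tokenize(sentence)))

    def consolidate(self):
        # Apply queued changes; only now do they become visible to search().
        for op, object_id, words in self._pending:
            for word in words:
                if op == "add":
                    self._postings[word].add(object_id)
                else:
                    self._postings[word].discard(object_id)
                    if not self._postings[word]:
                        del self._postings[word]
        self._pending.clear()

    def search(self, word):
        return sorted(self._postings.get(word.lower(), set()))


index = InvertedIndex()
index.add("article:1", "Sonic is a lightweight search backend")
before = index.search("sonic")  # change is queued, not yet searchable
index.consolidate()
after = index.search("sonic")   # now the object ID is returned
```

Separating the write queue from the searchable state keeps writes cheap, at the cost of a short delay before changes show up in results.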

According to Crisp co-founder Valerian Saliou, Sonic was inspired by Redis's lightweight, open source design, and Sonic is promoted as the "Redis of search". To achieve a similarly lightweight implementation, the designers of Sonic applied the following criteria when deciding which features to include:

  • Is this feature really needed?
  • How can we make it simple?
  • Is Sonic still fast and lightweight with it?
  • Is configuring Sonic getting harder with that new shiny thing?

In meeting these criteria, the creators of Sonic made several trade-offs. The natural language processing (NLP) system works at the word level but not the sentence level: it can auto-complete individual words but cannot predict the next word in a sentence. This decision allowed the FST (finite-state transducer) graph, which is used to correct input typos, to remain shallow, thus reducing time and space complexity and minimizing storage requirements. Sonic batches FST rebuild cycles, which means updates to the search index are not fully real-time and new terms may not appear until the next build cycle. Additionally, the only supported protocol for interacting with Sonic is the Sonic Channel protocol; there is no HTTP API available.
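The effect of word-level suggestions combined with batched rebuilds can be sketched as follows. This is a simplified stand-in, not Sonic's FST: the hypothetical `WordSuggester` class keeps a sorted word list instead of a finite-state transducer and uses the similarity matching in Python's standard `difflib` module for typo correction. The names and the `cutoff` value are assumptions made for illustration.

```python
import difflib


class WordSuggester:
    """Toy word-level suggester whose vocabulary is rebuilt in batches."""

    def __init__(self):
        self._snapshot = []    # vocabulary visible to lookups
        self._pending = set()  # new words waiting for the next rebuild

    def learn(self, word):
        # New terms are only queued; lookups do not see them yet.
        self._pending.add(word.lower())

    def rebuild(self):
        # Batch step: merge queued words into the searchable snapshot.
        self._snapshot = sorted(set(self._snapshot) | self._pending)
        self._pending.clear()

    def correct(self, word):
        # Closest known word, or None if nothing is similar enough.
        matches = difflib.get_close_matches(
            word.lower(), self._snapshot, n=1, cutoff=0.75
        )
        return matches[0] if matches else None

    def complete(self, prefix):
        # Word-level completion only: no next-word prediction.
        p = prefix.lower()
        return [w for w in self._snapshot if w.startswith(p)]


suggester = WordSuggester()
for word in ("search", "sonic", "schema"):
    suggester.learn(word)
suggester.rebuild()

corrected = suggester.correct("serach")         # typo resolved against known words
suggester.learn("elasticsearch")
stale = suggester.complete("elas")              # empty until the next rebuild
suggester.rebuild()
fresh = suggester.complete("elas")
```

The gap between `stale` and `fresh` mirrors the trade-off described above: batching rebuilds keeps the structure cheap to maintain, but a newly indexed term is not suggested until the next cycle.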

Sonic can be tested live on the Crisp helpdesk. Learn more about installing and managing Sonic on the Sonic GitHub page.
