A team of scientists from Salesforce Research and the Chinese University of Hong Kong has released Photon, a natural language interface to databases (NLIDB). The team used deep learning to construct a parser that achieves 63% accuracy on a common benchmark, along with an error-detecting module that prompts users to clarify ambiguous questions.
The team demonstrated Photon at the recent ACL 2020 conference, and team member Victoria Lin described the system in a blog post. The core of Photon is a neural-network-based semantic parser that converts natural-language questions from a human user into SQL queries; the parser achieves 63.2% exact-match accuracy on the Spider dataset, the second-highest result reported to date. Photon also incorporates a question corrector that can detect when the human input cannot be translated into SQL; the corrector then initiates a dialog with the user to refine the question, using a "chat-bot" style interface. Expert users can also input queries directly as SQL. According to Lin,
Given the advances of modern NLP, we believe an era of natural language information systems is just around the corner.
The goal of an NLIDB is to "democratize" the ability to extract useful data from relational databases, allowing users to ask questions in natural language instead of constructing a query in a programming language such as SQL. Like many such systems, Photon uses a strategy called semantic parsing, which converts the natural-language question into a logical form, essentially translating human language into programming-language statements. Photon's parser is a neural network whose input is a natural-language question concatenated with the database schema, and whose output is an SQL query. The parser does not have access to the complete content of the database, but for categorical columns it does have access to the set of possible values. The parser consists of a pre-trained BERT model and a series of LSTM sub-networks. Photon then performs beam-search decoding of the network output and applies a static SQL correctness check to the candidate queries. According to the authors, this check improves accuracy by approximately 5% on the Spider dataset.
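The decoding stage can be illustrated with a small sketch: a generic beam search over the parser's token-level scores, followed by a static validity filter on the candidate queries. The `step_fn` interface, the toy grammar, and the use of SQLite's `EXPLAIN` as the correctness check are illustrative assumptions, not Photon's actual implementation:

```python
import heapq
import sqlite3

def is_valid_sql(query, schema_ddl):
    """Static correctness check: ask SQLite to compile (EXPLAIN) the query
    against the schema without executing it."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema_ddl)
        conn.execute("EXPLAIN " + query)
        return True
    except sqlite3.Error:
        return False
    finally:
        conn.close()

def beam_search(step_fn, beam_width=3, max_len=10):
    """Generic beam search over token sequences.
    step_fn(seq) returns a list of (token, log_prob) expansions,
    or an empty list when the sequence is complete."""
    beam = [(0.0, [])]
    for _ in range(max_len):
        candidates = []
        for score, seq in beam:
            expansions = step_fn(seq)
            if not expansions:                  # finished hypothesis
                candidates.append((score, seq))
                continue
            for token, log_prob in expansions:
                candidates.append((score + log_prob, seq + [token]))
        beam = heapq.nlargest(beam_width, candidates, key=lambda c: c[0])
    return [seq for _, seq in beam]             # best-first list of candidates

def best_valid_query(step_fn, schema_ddl, beam_width=3):
    """Return the highest-scoring beam candidate that passes the static check."""
    for seq in beam_search(step_fn, beam_width):
        query = " ".join(seq)
        if is_valid_sql(query, schema_ddl):
            return query
    return None
```

The key idea is that the validity filter can rescue a correct query that the model ranked second: if the top-scoring candidate references a nonexistent column, the static check rejects it and the next candidate is returned instead.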
To improve the robustness of the system, Photon includes a "human-in-the-loop" question corrector. The corrector uses another neural network, a classifier that determines whether a question cannot be accurately translated to SQL. The classifier is trained on a synthetic dataset that the researchers constructed by applying "swap" and "drop" operations to translatable questions. For example, a question such as "how many countries exist?" might be converted to "how many exist?" The classifier also identifies the particular portions of a question (spans) that are confusing. These spans are used to suggest corrections, which are fed back to the user via a chat interface.
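The swap and drop operations can be sketched as simple token-level corruptions that turn a translatable question into an untranslatable one for training. The function names and parameters below are hypothetical, and the paper's actual augmentation is more elaborate; this only illustrates the idea:

```python
import random

def drop_span(tokens, rng, max_span=2):
    """'Drop' operation: delete a short random span, turning e.g.
    'how many countries exist' into 'how many exist'."""
    if len(tokens) < 2:
        return tokens[:]
    span = rng.randint(1, min(max_span, len(tokens) - 1))
    start = rng.randrange(len(tokens) - span + 1)
    return tokens[:start] + tokens[start + span:]

def swap_tokens(tokens, rng):
    """'Swap' operation: exchange two random tokens, scrambling word order."""
    if len(tokens) < 2:
        return tokens[:]
    i, j = rng.sample(range(len(tokens)), 2)
    out = tokens[:]
    out[i], out[j] = out[j], out[i]
    return out

def corrupt_question(question, rng):
    """Produce an untranslatable variant with label 1 ('confusing'),
    to pair with original questions labelled 0 ('translatable')."""
    tokens = question.split()
    op = rng.choice([drop_span, swap_tokens])
    return " ".join(op(tokens, rng)), 1
```

Pairing each corrupted question (label 1) with its original (label 0) yields the kind of binary training signal the confusion classifier needs, without any manual annotation.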
Other tech companies are also building similar NLIDB systems. Microsoft Research developed a neural-network semantic-parsing system called CAMP which uses a series of gated recurrent units (GRU) to convert natural language questions to SQL queries. Google's TAPAS uses a slightly different approach; instead of parsing natural language to SQL, the training process for TAPAS includes the table data directly. Photon's authors point out that training the network on the table data raises data privacy concerns.
In a discussion on Hacker News, users commented on the quality of NLIDB results. One user noted:
[T]he models are bad at saying they don't know. I'm optimistic though. There's significant year-on-year improvement (driven by real progress in NLP), and the training datasets are getting more interesting. There are now conversational datasets (e.g. https://yale-lily.github.io/cosql) where the model is trained to ask follow-up questions, and an explicit goal is "system responses to clarify ambiguous questions, verify returned results, and notify users of unanswerable or unrelated questions". That could be a big win.
A demo version of Photon is available to the public. Lin says that future work includes "voice input, auto-completion, and visualization of the output," but no dates for these features have been announced.