OpenAI recently announced the release of several updates to their models, including two new embedding models and updates to GPT-4 Turbo and GPT-3.5 Turbo. The company also announced improvements to their free text moderation tool and to their developer API management tools.
InfoQ covered the original release of OpenAI's embedding model, text-embedding-ada-002, which merged the capabilities of five previous models for text search, text similarity, and code search. The new release includes two models: text-embedding-3-small and text-embedding-3-large. The small model is a "significant upgrade" over text-embedding-ada-002, with better performance on benchmarks and a lower price per token. The large model has even better benchmark performance and supports embedding dimensions of up to 3072.
Both new embedding models also support reducing output dimensions. While the default dimensions for text-embedding-3-small and text-embedding-3-large are 1536 and 3072 respectively, API calls to these models can specify a smaller number of dimensions. This allows developers to use a smaller, more efficient vector store for embeddings in their applications without sacrificing much accuracy. For example, OpenAI claims that text-embedding-3-large can output embeddings of size 256 that outperform full-size embeddings from text-embedding-ada-002.
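According to OpenAI's documentation, the effect of the `dimensions` request parameter can be reproduced client-side by truncating a full-size embedding and L2-renormalizing it. The sketch below illustrates that idea with numpy; the function name `shorten_embedding` is illustrative, and this is an approximation of the behavior rather than OpenAI's exact server-side implementation:

```python
import numpy as np

def shorten_embedding(vec, dims):
    """Keep only the first `dims` components of an embedding, then
    L2-renormalize so cosine/dot-product comparisons remain valid."""
    short = np.asarray(vec, dtype=float)[:dims]
    return short / np.linalg.norm(short)

# Example: shrink a hypothetical 3072-dimensional embedding to 256 dimensions.
full = np.random.default_rng(0).standard_normal(3072)
small = shorten_embedding(full, 256)
```

Passing `dimensions=256` in the embeddings API call returns a vector like `small` directly, so the truncation never has to be done client-side.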
OpenAI announced the first release of GPT-4 Turbo, which featured a longer context window and lower prices, at their developer conference in late 2023. Since then, "over 70% of requests" to GPT-4 have used the Turbo model. The updated Turbo model fixes bugs and also addresses model "laziness," where the model fails to complete a task, such as code generation.
OpenAI also released two updates to their developer platform. One new feature is access control for API keys: admins can assign each key to have read-only or write permissions for several API endpoints, including fine-tuning. Admins can also enable usage tracking per API key.
Romain Huet, OpenAI's head of developer experience, posted about the new releases on X. When a user asked about GPT-4 Turbo "coming out of preview", Huet replied:
No precise timeline on getting it out of preview, but you can absolutely use it in production. In fact, over 70% of requests from GPT-4 API customers have already transitioned to GPT-4 Turbo!
In a discussion on Hacker News, several users discussed whether the new GPT-4 Turbo model reduced "laziness." One user claimed that it performed worse on a lazy coding benchmark. Another user wrote about the embedding models' ability to "shorten" their dimensions:
I've found evidence that the OpenAI 1536D embeddings are unnecessarily big for 99% of use cases (and now there's a 3072D model?!) so the ability to reduce dimensionality directly from the API is appreciated for the reasons given in this post. Just chopping off dimensions to an arbitrary dimensionality is not a typical dimensionality reduction technique so that likely requires a special training/alignment technique that's novel...The embeddings aren't "chopped off", the first components of the embedding will change as dimensionality reduces, but not much.
Other users wondered how the models were able to perform the shortening, as did users on OpenAI's Developer Forum. In a post on X, AI researcher Delip Rao speculated that OpenAI used a training technique called the Matryoshka method to add this functionality.
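Rao's speculation can be illustrated with a toy sketch of a Matryoshka-style training objective, in which the same pair loss is computed over several nested prefixes of the embedding so that each prefix remains usable on its own. The `mrl_loss` helper and the simple similarity term below are hypothetical illustrations, not OpenAI's actual training code:

```python
import numpy as np

def mrl_loss(emb_a, emb_b, dims=(64, 128, 256)):
    """Matryoshka-style objective sketch: sum one loss term per nested
    prefix length, so short prefixes are trained to be good embeddings too."""
    total = 0.0
    for d in dims:
        a, b = emb_a[:d], emb_b[:d]
        a = a / np.linalg.norm(a)
        b = b / np.linalg.norm(b)
        # Toy term for a positive pair: penalize low cosine similarity.
        total += 1.0 - float(a @ b)
    return total
```

Under this kind of objective, truncating a trained embedding to one of the nested sizes (and renormalizing) yields a vector that was explicitly optimized to work at that size, which would explain why "just chopping off dimensions" performs well here.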