AI, ML & Data Engineering Content on InfoQ
-
Stanford Researchers Present AI Framework to Implement and Validate Complex Algorithms
Parsel, an AI framework created by a group of researchers at Stanford, uses large language model (LLM) reasoning to transform hierarchical function descriptions in natural language into a code implementation. The researchers also maintain that Parsel can be used for robot planning and theorem proving.
-
Google Unveils MusicLM, an AI That Can Generate Music from Text Prompts
Google researchers have introduced MusicLM, an AI model that can generate high-fidelity music from text. MusicLM generates music at 24 kHz that remains consistent over several minutes by modeling conditional music generation as a hierarchical sequence-to-sequence modeling problem.
-
DeepMind Announces Minecraft-Playing AI DreamerV3
Researchers from DeepMind and the University of Toronto announced DreamerV3, a reinforcement-learning (RL) algorithm for training AI models for many different domains. Using a single set of hyperparameters, DreamerV3 outperforms other methods on several benchmarks and can train an AI to collect diamonds in Minecraft without human instruction.
-
Intel oneDAL Available in ML.NET
The first preview release of ML.NET 3.0, available since December, includes integration with the Intel oneAPI Data Analytics Library (oneDAL), which leverages the SIMD extensions of 64-bit architectures available on Intel and AMD processors.
-
Microsoft Unveils VALL-E, a Game-Changing TTS Language Model
Microsoft has introduced VALL-E, a language-model approach to text-to-speech (TTS) synthesis that uses audio codec codes as intermediate representations and can replicate anyone's voice after hearing just three seconds of recorded audio.
-
AI Developers Release Open-Source Implementations of ChatGPT Training Algorithm
AI research groups LAION and CarperAI have released OpenAssistant and trlX, open-source implementations of reinforcement learning from human feedback (RLHF), the algorithm used to train ChatGPT. Independent AI developer Phil Wang has also open-sourced his own implementation of the algorithm.
-
DoorDash Introduces ML to Understand the Marketplace's Status
DoorDash has introduced an ML model that predicts the operational status of a store in order to improve the user experience and prevent thousands of order cancellations. Understanding a merchant's operational status and its ability to receive and fulfill orders is crucial for the DoorDash platform.
-
Amazon Athena Now Supports Apache Spark Engine
Amazon Athena now supports the open-source distributed processing system Apache Spark to run fast analytics workloads. Data analysts and engineers can use Jupyter Notebook in Athena to perform data processing and programmatically interact with Spark applications.
-
Google Address Validation API Is Generally Available to Improve Address Accuracy
Google recently announced the general availability of the Address Validation API. The new feature of the Google Maps Platform validates an address, standardizes it for mailing, and determines the best-known geocode location for it.
-
GitHub Releases Copilot for Business amid Ongoing Legal Controversy
GitHub has announced Copilot for Business, a business plan for its OpenAI-powered coding assistant Copilot. The release follows a recent class-action lawsuit against Microsoft, GitHub, and OpenAI alleging violations of open-source licenses.
-
eBay New Recommendations Model with Three Billion Item Titles
eBay developed a new recommendations model based on Natural Language Processing (NLP) techniques, in particular the BERT model. This new model, called “ranker,” uses the distance score between the embeddings as a feature; in this way, the information in the product titles is analyzed from a semantic point of view.
-
BigCode Project Releases Permissively Licensed Code Generation AI Model and Dataset
The BigCode Project recently released The Stack, a 6.4 TB dataset of de-duplicated source code from permissively licensed GitHub repositories, which can be used to train code generation AI models. BigCode also released SantaCoder, a 1.1B parameter code generation model trained on The Stack. SantaCoder outperforms similar open-source code generation models.
-
AWS Releases SimSpace Weaver for Real-Time Spatial Simulations
AWS recently released SimSpace Weaver, a managed service for running real-time spatial simulations across multiple EC2 instances. By distributing simulation workloads, the service can handle large real-world environments, crowd simulations, and immersive interactive experiences.
-
3D Point Cloud Object from Text Prompts Using Diffusion Models
OpenAI recently released Point-E, an alternative method for generating 3D objects from text prompts. It takes less than two minutes on a single GPU, whereas other methods can take a few GPU hours. The new model is based on diffusion models, the class of generative models that includes GLIDE and Stable Diffusion.
-
Google AI Unveils Muse, a New Text-to-Image Transformer Model
Google AI released a research paper about Muse, a new text-to-image generation model based on masked generative transformers. Muse can produce images of a quality comparable to those produced by rival models such as DALL-E 2 and Imagen, and does so far faster.
-
Our Journey Into High Performance and Reliable Document Databases with RavenDB
At Kobo, our initial database technology choice didn't work out for us in terms of reliability, performance, or flexibility, so we looked for something new. In this talk, you'll hear about our challenges, how we evaluated the options, and our experience since widely adopting RavenDB.