AI, ML & Data Engineering Content on InfoQ
-
OpenAI Releases an Advanced Classifier to Distinguish AI and Human Writing Styles
OpenAI is releasing a trained classifier to distinguish between text written by a human and text written by AI. The classifier responds to a growing need for technologies that can help tell material authored by people apart from material written by machines.
-
Microsoft Open Sources AI Prompt Optimization Toolkit LMOps
Microsoft Research open sourced LMOps, a collection of tools for improving text prompts used as input to generative AI models. The toolkit includes Promptist, which optimizes a user's text input for text-to-image generation, and Structured Prompting, a technique for including more examples in a few-shot learning prompt for text generation.
-
Stanford Researchers Present AI Framework to Implement and Validate Complex Algorithms
Parsel, an AI framework created by a group of researchers at Stanford, uses large language model (LLM) reasoning to transform hierarchical function descriptions written in natural language into a code implementation. The researchers also maintain that Parsel can be used for robot planning and theorem proving.
-
Google Unveils MusicLM, an AI That Can Generate Music from Text Prompts
Google researchers have introduced MusicLM, an AI model that can generate high-fidelity music from text. MusicLM generates music at 24 kHz that remains consistent over several minutes by casting conditional music generation as a hierarchical sequence-to-sequence modeling problem.
-
DeepMind Announces Minecraft-Playing AI DreamerV3
Researchers from DeepMind and the University of Toronto announced DreamerV3, a reinforcement-learning (RL) algorithm for training AI models for many different domains. Using a single set of hyperparameters, DreamerV3 outperforms other methods on several benchmarks and can train an AI to collect diamonds in Minecraft without human instruction.
-
Intel oneDAL Available in ML.NET
The first preview release of ML.NET 3.0, available since December, integrates the Intel oneAPI Data Analytics Library (oneDAL), which leverages the SIMD extensions of 64-bit architectures available on Intel and AMD processors.
-
Microsoft Unveils VALL-E, a Game-Changing TTS Language Model
Microsoft has introduced VALL-E, a novel language modeling approach to text-to-speech synthesis (TTS) that uses audio codec codes as intermediate representations and can replicate anyone's voice after listening to just three seconds of recorded audio.
-
AI Developers Release Open-Source Implementations of ChatGPT Training Algorithm
AI research groups LAION and CarperAI have released OpenAssistant and trlX, open-source implementations of reinforcement learning from human feedback (RLHF), the algorithm used to train ChatGPT. Independent AI developer Phil Wang has also open-sourced his own implementation of the algorithm.
-
DoorDash Introduces ML to Understand the Marketplace's Status
DoorDash has introduced an ML model that predicts the operational status of a store, with the aim of improving the user experience and preventing thousands of order cancellations. Understanding a merchant's operational status and its ability to receive and fulfill orders is crucial for the DoorDash platform.
-
Amazon Athena Now Supports Apache Spark Engine
Amazon Athena now supports the open-source distributed processing system Apache Spark to run fast analytics workloads. Data analysts and engineers can use Jupyter Notebook in Athena to perform data processing and programmatically interact with Spark applications.
-
Google Address Validation API Is Generally Available to Improve Address Accuracy
Google recently announced the general availability of the Address Validation API. The new feature of the Google Maps Platform validates an address, standardizes it for mailing, and determines the best-known geocode location for it.
-
GitHub Releases Copilot for Business amid Ongoing Legal Controversy
GitHub has announced Copilot for Business, a business plan for its OpenAI-powered coding assistant Copilot. The release follows a recent class action lawsuit against Microsoft, GitHub, and OpenAI alleging violations of open-source licenses.
-
eBay New Recommendations Model with Three Billion Item Titles
eBay developed a new recommendations model based on Natural Language Processing (NLP) techniques, in particular the BERT model. The new model, called “ranker,” uses the distance score between embeddings as a feature, so that the information in product titles is analyzed from a semantic point of view.
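As a rough illustration of using an embedding distance as a ranking feature, the sketch below computes a cosine distance between toy title embeddings. The choice of cosine distance, the three-dimensional vectors, and the names `cosine_distance`, `seed_title`, and `candidate_titles` are all assumptions made for this example; the summary above does not say which distance or embedding size eBay uses.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


# Hypothetical toy embeddings standing in for BERT title embeddings.
seed_title = [1.0, 0.0, 0.0]
candidate_titles = {
    "same_direction": [2.0, 0.0, 0.0],
    "orthogonal": [0.0, 3.0, 0.0],
}

# One distance feature per candidate; a downstream ranker could consume these.
distance_feature = {
    name: cosine_distance(seed_title, vec)
    for name, vec in candidate_titles.items()
}

# Candidates sorted from semantically closest to farthest.
ranked = sorted(distance_feature, key=distance_feature.get)
```

In a real system the vectors would come from a trained model and the distance would be one feature among several, so this is only a sketch of the idea.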
-
BigCode Project Releases Permissively Licensed Code Generation AI Model and Dataset
The BigCode Project recently released The Stack, a 6.4TB dataset of de-duplicated source code from permissively licensed GitHub repositories, which can be used to train code generation AI models. BigCode also released SantaCoder, a 1.1B parameter code generation model trained on The Stack. SantaCoder outperforms similar open-source code generation models.
-
AWS Releases SimSpace Weaver for Real-Time Spatial Simulations
AWS recently released SimSpace Weaver, a managed service for running real-time spatial simulations across multiple EC2 instances. By distributing simulation workloads, the service can handle large real-world environments, crowd simulations, and immersive interactive experiences.