AI, ML & Data Engineering Content on InfoQ
-
University of Chinese Academy of Sciences Open-Sources Multimodal LLM LLaMA-Omni
Researchers at the University of Chinese Academy of Sciences (UCAS) recently open-sourced LLaMA-Omni, an LLM that can operate on both speech and text data. LLaMA-Omni is based on Meta's Llama-3.1-8B-Instruct LLM and outperforms similar baseline models while requiring less training data and compute.
-
Meta Unveils Movie Gen, a New AI Model for Video Generation
Meta has announced Movie Gen, a new AI model designed to create high-quality 1080p videos with synchronized audio. The system enables instruction-based video editing and allows for personalized content generation using user-supplied images.
-
Meta Releases Llama 3.2 with Vision, Voice, and Open Customizable Models
Meta recently announced Llama 3.2, the latest version of its open-source language model, which includes vision, voice, and open customizable models. It is the first multimodal version of the model and lets users interact with visual data, for example by identifying objects in photos or editing images with natural-language commands.
-
OpenAI Releases Stable Version of .NET Library with GPT-4o Support and API Enhancements
OpenAI has released the stable version of its official .NET library, following June's beta launch. Available as a NuGet package, it supports the latest models like GPT-4o and GPT-4o mini, and the full OpenAI REST API. The release includes both sync and async APIs, streaming chat completions, and key breaking changes for improved API consistency.
-
Valkey 8.0 Now Generally Available with Improved Memory Efficiency
The Linux Foundation has announced the general availability of Valkey 8.0, the open source in-memory storage solution developed as a successor to Redis. By introducing a dictionary per slot and embedding keys directly into dictionary entries, the release offers up to 20% more capacity, allowing additional keys to be stored per node.
-
Google Develops Voice Transfer AI for Restoring Voices
A team at Google Research developed a zero-shot voice transfer (VT) model that can be used to customize a text-to-speech (TTS) system with a specific person's voice. This allows speakers who have lost their voice, for example from Parkinson's disease or ALS, to use a TTS device to replicate their original voice. The model also works across languages.
-
PyTorch Conference 2024: PyTorch 2.4/Upcoming 2.5, and Llama 3.1
The PyTorch Conference 2024, held by The Linux Foundation, showcased groundbreaking advancements in AI, featuring insights on PyTorch 2.4, Llama 3.1, and open-source projects like OLMo. Key discussions on LLM deployment, ethical AI, and innovative libraries like Torchtune and TorchChat emphasized collaboration and responsible practices in the evolving landscape of generative AI.
-
Anthropic Unveils Contextual Retrieval for Enhanced AI Data Handling
Anthropic has announced Contextual Retrieval, a significant advancement in AI systems' interaction with extensive knowledge bases. This technique addresses the challenge of context loss in Retrieval-Augmented Generation (RAG) systems by enriching text chunks with contextual information before embedding or indexing.
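The idea of enriching chunks before indexing can be illustrated with a minimal sketch. Everything named here is hypothetical: `split_into_chunks`, `describe_chunk`, and `contextualize` are illustrative helpers, not part of any Anthropic API, and `describe_chunk` is a simple template standing in for whatever step actually produces the context in a real system.

```python
from dataclasses import dataclass


@dataclass
class ContextualChunk:
    """A text chunk paired with a short description of where it came from."""
    context: str
    text: str

    @property
    def indexed_text(self) -> str:
        # This combined string is what would be embedded or indexed,
        # instead of the bare chunk text.
        return f"{self.context}\n\n{self.text}"


def split_into_chunks(document: str, max_words: int = 50) -> list[str]:
    """Naive fixed-size splitter (an assumption; real systems vary)."""
    words = document.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


def describe_chunk(doc_title: str, position: int, total: int) -> str:
    """Hypothetical stand-in for the context-generation step."""
    return f"From '{doc_title}', part {position} of {total}."


def contextualize(doc_title: str, document: str,
                  max_words: int = 50) -> list[ContextualChunk]:
    """Attach context to every chunk before it is embedded or indexed."""
    chunks = split_into_chunks(document, max_words)
    return [
        ContextualChunk(describe_chunk(doc_title, i + 1, len(chunks)), chunk)
        for i, chunk in enumerate(chunks)
    ]


if __name__ == "__main__":
    text = "revenue grew 3% over the previous quarter"
    for item in contextualize("ACME Q2 report", text, max_words=4):
        print(repr(item.indexed_text))
```

Without the added context, a chunk such as "the previous quarter" carries no hint of which company or report it refers to, which is the context-loss problem the technique targets.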
-
Microsoft Launches Azure AI Inference SDK for .NET
Microsoft has launched the Azure AI Inference SDK for .NET, streamlining access to generative AI models in the Azure AI Studio model catalog. This catalog includes models from providers like Azure OpenAI Service, Mistral, Meta, Cohere, NVIDIA, and Hugging Face, organized into three collections: Curated by Azure AI, Azure OpenAI Models, and Open Models from Hugging Face Hub.
-
Uber Creates GenAI Gateway Mirroring OpenAI API to Support over 60 LLM Use Cases
Uber created a unified platform for serving large language models (LLMs), both self-hosted and from external vendors, and opted to mirror the OpenAI API to help with internal adoption. GenAI Gateway provides a consistent and efficient interface and serves over 60 distinct LLM use cases across many areas.
-
Study Shows AI Coding Assistant Improves Developer Productivity
Researchers from Microsoft, MIT, Princeton University, and the Wharton School of the University of Pennsylvania recently published a study showing that the use of GitHub Copilot increased developer productivity. The team conducted three separate randomized controlled trials (RCTs) involving over 4,000 developers; those using Copilot achieved a 26% increase in productivity.
-
Google Proposes Adding Pipe Syntax to SQL
In a recent paper titled "SQL Has Problems. We Can Fix Them", a research team at Google proposed the introduction of a new Pipe Syntax in SQL. Google's solution to address perceived limitations in SQL is currently available in the GoogleSQL and ZetaSQL dialects, but it has received mixed feedback from the community.
-
Stability AI Announces Integration of Top Text-to-Image Models with Amazon Bedrock
Stability AI has introduced three new text-to-image models to Amazon Bedrock: Stable Image Ultra, Stable Diffusion 3 Large, and Stable Image Core. These models focus on improving performance in multi-subject prompts, image quality, and typography. They are designed to generate high-quality visuals for various use cases in marketing, advertising, media, entertainment, retail, and more.
-
AWS Announces General Availability of EC2 P5e Instances, Powered by NVIDIA H100 Tensor Core GPUs
Amazon Web Services (AWS) has launched EC2 P5e instances featuring NVIDIA H100 Tensor Core GPUs, substantially boosting AI and HPC performance. With enhanced memory bandwidth, these instances reduce latency for real-time applications. Ideal for tasks like LLM training and simulations, they offer improved scalability and cost-efficiency, making them pivotal for modern cloud computing.
-
Apple Open-Sources Multimodal AI Model 4M-21
Researchers at Apple and the Swiss Federal Institute of Technology Lausanne (EPFL) have open-sourced 4M-21, a single any-to-any AI model that can handle 21 input and output modalities. 4M-21 performs well "out of the box" on several vision benchmarks and is available under the Apache 2.0 license.