InfoQ Homepage AI, ML & Data Engineering Content on InfoQ
-
Vertex AI in Firebase Aims to Simplify the Creation of Gemini-powered Mobile Apps
Currently available in beta, the Vertex AI SDK for Firebase enables the creation of apps that go beyond the simple chat model and text prompting. Google has just made available a colab to help developers through the steps required to integrate it into their apps.
-
Google Publishes LLM Self-Correction Algorithm SCoRe
Researchers at Google DeepMind recently published a paper on Self-Correction via Reinforcement Learning (SCoRe), a technique for improving LLMs' ability to self-correct when solving math or coding problems. Models fine-tuned with SCoRe achieve improved performance on several benchmarks compared to baseline models.
-
OpenAI Launches Public Beta of Realtime API for Low-Latency Speech Interactions
OpenAI launched the public beta of the Realtime API, offering developers the ability to create low-latency, multimodal voice interactions within their applications. Additionally, audio input/output is now available in the Chat Completions API, expanding options for voice-driven applications. Early feedback highlights limited voice options and response cutoffs.
-
NVIDIA Unveils NVLM 1.0: Open-Source Multimodal LLM with Improved Text and Vision Capabilities
NVIDIA unveiled NVLM 1.0, an open-source multimodal large language model (LLM) that performs strongly on both vision-language and text-only tasks. NVLM 1.0 shows improvements in text-based tasks after multimodal training, standing out among current models. The model weights are now available on Hugging Face, with the training code set to be released shortly.
-
OpenAI Developer Day 2024 (SF) Announces Real-Time API, Vision Fine-Tuning, and More
On October 1, 2024, OpenAI SF DevDay unveiled innovative features, including a Real-Time API enabling instant voice interactions and function calling. Enhanced model distillation and vision fine-tuning empower developers to customize AI for diverse applications. Upcoming events in London and Singapore will further expand these capabilities.
-
Setting up a Data Mesh Organization
A data mesh organization: producers, consumers, and the platform. According to Matthias Patzak, the mission of the platform team is to make the lives of the producer and consumers simple, efficient and stress free. Data must be discoverable and understandable, trustworthy, and shared securely and easily across the organization.
-
Hugging Face Upgrades Open LLM Leaderboard v2 for Enhanced AI Model Comparison
Hugging Face has recently released Open LLM Leaderboard v2, an upgraded version of their benchmarking platform for large language models. Hugging Face created the Open LLM Leaderboard to provide a standardized evaluation setup for reference models, ensuring reproducible and comparable results.
-
Data Teams Survey: Lag in DataOps and Value Delivered
We report on Jesse Anderson's 2024 Data Teams Survey which showed a lag in DataOps capabilities, slow LLM adoption, and a concerning decline in perceived value creation by data teams. It called out the importance of teams spread with data science, engineering, and operations capabilities. We also cover Petr Janda's recent podcast on the need for more engineering rigour for parity with other teams.
-
MongoDB 8.0 Now Available with Performance Gains and Enhanced Sharding
MongoDB has announced the general availability of MongoDB 8.0, introducing significant performance enhancements and new features. Highlights include embedded sharding configuration servers, expanded support for queryable encryption, and the capability to move collections across shards without requiring a shard key.
-
PayPal Adds GenAI Support with LLMs to Its Cosmos.AI MLOps Platform
PayPal extended its MLOps platform Cosmos.AI to support the development of generative AI applications using large language models (LLMs). The company incorporated support for vendor, open-source, and self-tuned LLMs and provided capabilities around retrieval-augmented generation (RAG), semantic caching, prompt management, orchestration, and AI application hosting.
-
University of Chinese Academy of Sciences Open-Sources Multimodal LLM LLaMA-Omni
Researchers at the University of Chinese Academy of Sciences (UCAS) recently open-sourced LLaMA-Omni, an LLM that can operate on both speech and text data. LLaMA-Omni is based on Meta's Llama-3.1-8B-Instruct LLM and outperforms similar baseline models while requiring less training data and compute.
-
Meta Unveils Movie Gen, a New AI Model for Video Generation
Meta has announced Movie Gen, a new AI model designed to create high-quality 1080p videos with synchronized audio. The system enables instruction-based video editing and allows for personalized content generation using user-supplied images.
-
Meta Releases Llama 3.2 with Vision, Voice, and Open Customizable Models
Meta recently announced Llama 3.2, the latest version of Meta's open-source language model, which includes vision, voice, and open customizable models. This is the first multimodal version of the model, which will allow users to interact with visual data in ways like identifying objects in photos or editing images with natural language commands among other use cases.
-
OpenAI Releases Stable Version of .NET Library with GPT-4o Support and API Enhancements
OpenAI has released the stable version of its official .NET library, following June's beta launch. Available as a NuGet package, it supports the latest models like GPT-4o and GPT-4o mini, and the full OpenAI REST API. The release includes both sync and async APIs, streaming chat completions, and key-breaking changes for improved API consistency.
-
Valkey 8.0 Now Generally Available with Improved Memory Efficiency
The Linux Foundation has announced the general availability of Valkey 8.0, the open source in-memory storage solution developed as a successor to Redis. By introducing a dictionary per slot and embedding keys directly into dictionary entries, developers can achieve up to 20% more capacity, allowing for the storage of additional keys per node.