Hugging Face Content on InfoQ
-
AnyLanguageModel: Unified API for Local and Cloud LLMs on Apple Platforms
Developers on Apple platforms often face a fragmented ecosystem when using language models. Local models via Core ML or MLX offer privacy and offline capabilities, while cloud services like OpenAI, Anthropic, or Google Gemini provide advanced features. AnyLanguageModel, a new Swift package, simplifies integration by offering a unified API for both local and remote models.
-
Meta and Hugging Face Launch OpenEnv, a Shared Hub for Agentic Environments
Meta's PyTorch team and Hugging Face have launched OpenEnv, an open-source platform for standardizing AI agent environments. The OpenEnv Hub features secure sandboxes that define the necessary tools and APIs for safe, predictable AI operation. Developers can explore, contribute, and refine environments, paving the way for scalable agent development in the open-source RL ecosystem.
-
Hugging Face Introduces RTEB, a New Benchmark for Evaluating Retrieval Models
Hugging Face has unveiled the Retrieval Embedding Benchmark (RTEB), a new framework for assessing embedding models' real-world retrieval accuracy. By combining public and private datasets, RTEB narrows the "generalization gap," helping ensure that models perform reliably across critical sectors. Now live and open to collaboration, RTEB aims to set a community standard for AI retrieval evaluation.
-
Hugging Face Introduces mmBERT, a Multilingual Encoder for 1,800+ Languages
Hugging Face has released mmBERT, a new multilingual encoder trained on more than 3 trillion tokens across 1,833 languages. The model builds on the ModernBERT architecture and is the first to significantly improve upon XLM-R, a long-time baseline for multilingual understanding tasks.
-
Baidu’s PP-OCRv5 Released on Hugging Face, Outperforming VLMs in OCR Benchmarks
Baidu has released PP-OCRv5 on Hugging Face, a new optical character recognition (OCR) model built to outperform large vision-language models (VLMs) in specialized text recognition tasks. Unlike general-purpose architectures such as Gemini 2.5 Pro, Qwen2.5-VL, or GPT-4o, which handle OCR as part of broader multimodal workflows, PP-OCRv5 is purpose-built for accuracy, efficiency, and speed.
-
Hugging Face Releases FinePDFs: a 3-Trillion-Token Dataset Built from PDFs
Hugging Face has unveiled FinePDFs, the largest publicly available corpus built entirely from PDFs. The dataset spans 475 million documents in 1,733 languages, totaling roughly 3 trillion tokens. At 3.65 terabytes in size, FinePDFs introduces a new dimension to open training datasets by tapping into a resource long considered too complex and expensive to process.
-
Hugging Face Introduces AI Sheets, a No-Code Tool for Dataset Transformation
Hugging Face has released AI Sheets, an open-source application designed to let users build, transform, and enrich datasets using AI models through a spreadsheet-like interface. The tool, available both on the Hub and for local deployment, allows users to experiment with thousands of open models, including OpenAI’s gpt-oss, without requiring code.
-
Hugging Face Releases Trackio, a Lightweight Open-Source Experiment Tracking Library
Hugging Face has introduced Trackio, a new open-source Python library for experiment tracking designed to be lightweight, transparent, and easy to integrate. Built as a drop-in replacement for Weights & Biases (wandb), Trackio offers local dashboards by default and seamless syncing with Hugging Face Spaces for sharing and collaboration.
-
Hugging Face Launches Reachy Mini Robots for Human-Robot Interaction
Hugging Face has launched its Reachy Mini robots, now available for order. Aimed at AI developers, researchers, and enthusiasts, the robots provide an accessible platform for experimenting with human-robot interaction and AI applications.
-
MiniMax Releases M1: a 456B Hybrid-Attention Model for Long-Context Reasoning and Software Tasks
MiniMax has introduced MiniMax-M1, a new open-weight reasoning model designed to handle extended contexts and complex problem-solving with high efficiency. Built on top of the earlier MiniMax-Text-01, M1 features a hybrid Mixture-of-Experts (MoE) architecture and a novel “lightning attention” mechanism.
-
Mistral AI Releases Magistral, Its First Reasoning-Focused Language Model
Mistral AI has released Magistral, a new model family built for transparent, multi-step reasoning. Available in open and enterprise versions, it supports structured logic, multilingual output, and traceable decision-making.
-
Hugging Face to Democratize Robotics with Open-Source Reachy 2 Robot
Hugging Face has acquired Pollen Robotics, a French startup that developed the humanoid robot Reachy 2. The acquisition aims to make robotics more accessible by open-sourcing the robot’s design and allowing developers to modify and improve its code.
-
Meta AI Releases Llama 4: Early Impressions and Community Feedback
Meta has officially released the first models in its new Llama 4 family—Scout and Maverick—marking a step forward in its open-weight large language model ecosystem. Designed with a native multimodal architecture and a mixture-of-experts (MoE) framework, these models aim to support a broader range of applications, from image understanding to long-context reasoning.
-
Roblox Releases Cube 3D, an Open-Source AI Model for 3D Generation
Roblox has introduced Cube 3D, a generative AI system designed for creating 3D and 4D objects and environments.
-
Hugging Face Publishes Guide on Efficient LLM Training across GPUs
Hugging Face has published the Ultra-Scale Playbook: Training LLMs on GPU Clusters, an open-source guide offering a detailed exploration of the methodologies and technologies involved in training large language models across GPU clusters.