Large Language Models Content on InfoQ

News

AI, ML & Data Engineering

Google Releases PaliGemma 2 Vision-Language Model Family

Google DeepMind released PaliGemma 2, a family of vision-language models (VLM). PaliGemma 2 is available in three different sizes and three input image resolutions and achieves state-of-the-art performance on several vision-language benchmarks.

Anthony Alford
on Jan 14, 2025
AI, ML & Data Engineering

Nvidia Announces Arm-Powered Project Digits, Its First Personal AI Computer

Capable of running 200B-parameter models, Nvidia Project Digits packs the new Nvidia GB10 Grace Blackwell chip to allow developers to fine-tune and run AI models on their local machines. Starting at $3,000, Project Digits targets AI researchers, data scientists, and students to allow them to create their models using a desktop system and then deploy them on cloud or data center infrastructure.

Sergio De Simone
on Jan 13, 2025
AI, ML & Data Engineering

Nvidia Nemotron Models Aim to Accelerate AI Agent Development

Nvidia has launched Llama Nemotron large language models (LLMs) and Cosmos Nemotron vision language models (VLMs) with a special emphasis on workflows powered by AI agents such as customer support, fraud detection, product supply chain optimization, and more. Models in the Nemotron family come in Nano, Super, and Ultra sizes to better fit the requirements of diverse systems.

Sergio De Simone
on Jan 11, 2025
AI, ML & Data Engineering

Meta Open-Sources Byte Latent Transformer LLM with Improved Scalability

Meta open-sourced Byte Latent Transformer (BLT), an LLM architecture that uses a learned dynamic scheme for processing patches of bytes instead of a tokenizer. This allows BLT models to match the performance of Llama 3 models with 50% fewer inference FLOPs.

Anthony Alford
on Jan 07, 2025
AI, ML & Data Engineering

Hugging Face Smolagents is a Simple Library to Build LLM-Powered Agents

Smolagents is a library created at Hugging Face to build agents based on large language models (LLMs). Hugging Face says its new library aims to be simple and LLM-agnostic. It supports secure "agents that write their actions in code" and is integrated with Hugging Face Hub.

Sergio De Simone
on Jan 04, 2025
AI, ML & Data Engineering

LLaMA-Mesh: NVIDIA’s Breakthrough in Unifying 3D Mesh Generation and Language Models

NVIDIA researchers have introduced LLaMA-Mesh, a groundbreaking approach that extends large language models (LLMs) to generate and interpret 3D mesh data in a unified, text-based framework. LLaMA-Mesh tokenizes 3D meshes as plain text, enabling the seamless integration of spatial and textual information.

Robert Krzaczyński
on Jan 02, 2025
AI, ML & Data Engineering

DeepThought-8B Leverages LLaMA-3.1 8B to Create a Compact Reasoning Model

DeepThought-8B is a small "reasoning" model built on LLaMA-3.1 8B that can work through decision-making processes step by step, much as OpenAI o1 does but in a much smaller package.

Sergio De Simone
on Dec 31, 2024
AI, ML & Data Engineering

Qwen Team Unveils QwQ-32B-Preview: Advancing AI Reasoning and Analytics

Qwen Team introduced QwQ-32B-Preview, an experimental research model designed to improve AI reasoning and analytical capabilities. Featuring a 32,768-token context and cutting-edge transformer architecture, it excels in math, programming, and scientific benchmarks like GPQA and MATH-500. Available on Hugging Face, it invites researchers to explore its features and contribute to its development.

Robert Krzaczyński
on Dec 31, 2024
AI, ML & Data Engineering

InstaDeep Open-Sources Genomics AI Model Nucleotide Transformers

Researchers from InstaDeep and NVIDIA have open-sourced Nucleotide Transformers (NT), a set of foundation models for genomics data. The largest NT model has 2.5 billion parameters and was trained on genetic sequence data from 850 species. It outperforms other state-of-the-art genomics foundation models on several genomics benchmarks.

Anthony Alford
on Dec 31, 2024
Development

AWS Adds New Amazon Q Developer Agent Capabilities: Doc Generation, Code Reviews, and Unit Tests

AWS has enhanced its generative AI-powered Amazon Q Developer, streamlining software development with new agent capabilities. Key features include automated documentation, code reviews, and unit test generation, allowing developers to focus on coding. Available in all AWS Regions, Amazon Q Developer simplifies processes in IDEs like Visual Studio Code and IntelliJ IDEA.

Steef-Jan Wiggers
on Dec 31, 2024
AI, ML & Data Engineering

EuroLLM-9B Aims to Improve State of the Art LLM Support for European Languages

EuroLLM-9B is an open-source large language model built in Europe and tailored to European languages, including all the official EU languages as well as 11 other non-official but commercially important languages. According to the team behind it, its performance makes it one of the best European-made LLMs of this size.

Sergio De Simone
on Dec 27, 2024
AI, ML & Data Engineering

Anthropic Publishes Model Context Protocol Specification for LLM App Integration

Anthropic recently released their Model Context Protocol (MCP), an open standard describing a protocol for integrating external resources and tools with LLM apps. The release includes SDKs implementing the protocol, as well as an open-source repository of reference implementations of MCP.

Anthony Alford
on Dec 24, 2024
AI, ML & Data Engineering

Recap of OpenAI Highlights Key Updates in 12-Day "Shipmas"

OpenAI's "12 Days of Shipmas" event featured daily announcements of new AI features and tools. Below is a summary of the key developments.

Daniel Dominguez
on Dec 21, 2024
AI, ML & Data Engineering

OpenAI Releases Sora and Full Version of O1 Reasoning Model with Fine-Tuning

OpenAI has unveiled its advanced o1 reasoning model and the video generation model Sora, enhancing complex reasoning and video creation capabilities. Sora produces high-quality videos using innovative diffusion techniques, while o1 excels in nuanced reasoning and safety. Together, they signal a transformative leap in AI, bridging creativity and rigorous reasoning.

Andrew Hoblitzell
on Dec 16, 2024
AI, ML & Data Engineering

Meta Releases Llama 3.3: a Multilingual Model with Enhanced Performance and Efficiency

Meta has released Llama 3.3, a multilingual large language model aimed at supporting a range of AI applications in research and industry. Featuring a 128k-token context window and architectural improvements for efficiency, the model demonstrates strong performance in benchmarks for reasoning, coding, and multilingual tasks. It is available under a community license on Hugging Face.

Robert Krzaczyński
on Dec 14, 2024
