Model Inference Content on InfoQ
News
Hugging Face Expands Serverless Inference Options with New Provider Integrations
Hugging Face has integrated four serverless inference providers (Fal, Replicate, SambaNova, and Together AI) directly into its model pages. The providers are also available through Hugging Face's client SDKs for JavaScript and Python, allowing users to run inference on a wide range of models with minimal setup.
Nvidia Announces Arm-Powered Project Digits, Its First Personal AI Computer
Capable of running 200B-parameter models, Nvidia Project Digits packs the new Nvidia GB10 Grace Blackwell chip, allowing developers to fine-tune and run AI models on their local machines. Starting at $3,000, Project Digits targets AI researchers, data scientists, and students, who can build models on a desktop system and then deploy them on cloud or data-center infrastructure.
Meta Optimises AI Inference by Improving Tail Utilisation
Meta (formerly Facebook) has reported substantial improvements in the efficiency and reliability of its machine-learning model-serving infrastructure by focusing on optimising tail utilisation, i.e. the utilisation of its most heavily loaded serving hosts.
JLama: The First Pure Java Model Inference Engine Implemented With Vector API and Project Panama
Karpathy's roughly 700-line llama2.c inference engine demystified how developers can interact with LLMs. Even before that, JLama started its journey toward becoming the first pure-Java inference engine for any Hugging Face model, from Gemma to Mixtral. Leveraging the new Vector API and a PanamaTensorOperations class with native fallback, the library is available on Maven Central.