Data Science Content on InfoQ

News

AI, ML & Data Engineering

Decathlon Switches to Polars to Optimize Data Pipelines and Infrastructure Costs

Decathlon, one of the world's leading sports retailers, recently shared why it adopted the open-source library Polars to optimize its data pipelines. The Decathlon Digital team found that migrating from Apache Spark to Polars for small input datasets delivers significant speed gains and cost savings.

Renato Losio
on Dec 20, 2025

Culture & Methods

Shaping an Impactful Data Product Strategy

Lior Barak and Gaëlle Seret advocate proactive, business-focused strategies for data engineering. Barak proposes a 3-year roadmap using his Data Ecosystem Vision Board to align teams on strategic capabilities and measure ROI, cost, and impact. Seret promotes a "data as a product" approach, co-creating visions with stakeholders and evolving shared taxonomies to ensure long-term alignment.

Rafiq Gemmail
on Jan 15, 2025

DevOps

Netflix Enhances Metaflow with New Configuration Capabilities

Netflix has introduced a significant enhancement to its Metaflow machine learning infrastructure: a new Config object that brings powerful configuration management to ML workflows. This addition addresses a common challenge faced by Netflix's teams, which manage thousands of unique Metaflow flows across diverse ML and AI use cases.

Claudio Masolo
on Jan 10, 2025

AI, ML & Data Engineering

Hugging Face and Entalpic Unveil LeMaterial: Transforming Materials Science through AI

Entalpic, in collaboration with Hugging Face, has launched LeMaterial, an open-source initiative to tackle key challenges in materials science. By unifying data from major resources into LeMat-Bulk, a harmonized dataset with 6.7 million entries, LeMaterial aims to streamline materials discovery and accelerate innovation in areas such as LEDs, batteries, and photovoltaic cells.

Robert Krzaczyński
on Dec 19, 2024

AI, ML & Data Engineering

Meta Releases Llama 3.3: a Multilingual Model with Enhanced Performance and Efficiency

Meta has released Llama 3.3, a multilingual large language model aimed at supporting a range of AI applications in research and industry. Featuring a 128k-token context window and architectural improvements for efficiency, the model demonstrates strong performance in benchmarks for reasoning, coding, and multilingual tasks. It is available under a community license on Hugging Face.

Robert Krzaczyński
on Dec 14, 2024

AI, ML & Data Engineering

PyTorch Conference 2024: PyTorch 2.4/Upcoming 2.5, and Llama 3.1

The PyTorch Conference 2024, held by The Linux Foundation, showcased groundbreaking advancements in AI, featuring insights on PyTorch 2.4, Llama 3.1, and open-source projects like OLMo. Key discussions on LLM deployment, ethical AI, and innovative libraries like Torchtune and TorchChat emphasized collaboration and responsible practices in the evolving landscape of generative AI.

Andrew Hoblitzell
on Sep 26, 2024

Culture & Methods

Challenges and Solutions for Building Machine Learning Systems

According to Camilla Montonen, the main challenges of building machine learning systems lie in creating and maintaining the model. MLOps platforms and solutions contain the components needed to build machine learning systems, but MLOps is not about the tools; it is a culture and a set of practices. Montonen suggests that we should bridge the divide between the practices of data science and machine learning engineering.

Ben Linders
on May 09, 2024

AI, ML & Data Engineering

Custom GPTs from OpenAI May Leak Sensitive Information

After OpenAI was reported to have started rolling out its new GPT Store, it was also discovered that some of the data that custom GPTs are built on is easily exposed. Multiple groups have found that the system has the potential to leak otherwise sensitive information.

Andrew Hoblitzell
on Jan 14, 2024

AI, ML & Data Engineering

Mojo Language SDK Available: Mojo Driver, VS Code extension, and Jupyter Kernel

The Mojo SDK is now available for developers. It contains the Mojo driver, the Visual Studio Code extension, and the Jupyter kernel. For now, the SDK is available for macOS and Linux.

Robert Krzaczyński
on Nov 09, 2023

AI, ML & Data Engineering

Google Announces Ray Support for Vertex AI to Boost Machine Learning Workflows

Google has announced that it is expanding its open-source support for Vertex AI, its machine learning platform, by adding support for Ray, an open-source unified compute framework. This move is aimed at efficiently scaling AI workloads and enhancing the productivity and operational efficiency of data science teams.

Andrew Hoblitzell
on Sep 07, 2023

AI, ML & Data Engineering

Introduction to Mojo Programming Language

Mojo is a newly introduced programming language that combines the simplicity of Python with the speed and memory safety of Rust. It is at an early stage of development and offers users an online playground to explore its features. Mojo targets data science and machine learning, providing a fast alternative to Python. There are plans to open-source it gradually.

Robert Krzaczyński
on Jul 19, 2023

AI, ML & Data Engineering

Zero-Copy In-Memory Sharing of Large Distributed Data: V6d

Vineyard (v6d), a zero-copy, in-memory data manager, is maintained as a CNCF sandbox project and provides distributed operators that can be used to share immutable data within or across cluster nodes. V6d is particularly of interest for deep network training on big (sharded) datasets, such as those used for large language and graph models.

Sabri Bolkar
on Mar 14, 2023

AI, ML & Data Engineering

NVIDIA Kubernetes Device Plug-in Brings Temporal GPU Concurrency

As of the v12 release, the NVIDIA GPU device plug-in framework supports time-sliced sharing between CUDA workloads on Kubernetes. This feature aims to prevent under-utilization of GPU units and make it easier to scale applications by leveraging concurrently executing CUDA contexts.

Sabri Bolkar
on Dec 19, 2022

AI, ML & Data Engineering

Anaconda Publishes 2022 State of Data Science Report

Anaconda, makers of a Python distribution popular among data scientists, recently published a report on the results of their State of Data Science survey. The report summarizes responses from nearly 3,500 students, academics, and professionals from 133 countries, and covers topics about respondent demographics and jobs as well as trends within the community.

Anthony Alford
on Nov 01, 2022

AI, ML & Data Engineering

Alpa: Automating Model Sharding for Distributed Deep Learning

A new open-source library called Alpa aims to automate distributed training and serving of large deep networks. It proposes a compiler where existing model-parallel strategies are combined and the usage of computing resources is optimized according to the deep network architecture.

Sabri Bolkar
on Oct 31, 2022
