Data Science Content on InfoQ
-
Challenges and Solutions for Building Machine Learning Systems
According to Camilla Montonen, the main challenges of building machine learning systems lie in creating and maintaining the model. MLOps platforms and solutions contain the components needed to build machine learning systems. MLOps is not about the tools; it is a culture and a set of practices. Montonen suggests bridging the divide between the practices of data science and machine learning engineering.
-
Custom GPTs from OpenAI May Leak Sensitive Information
After it was reported that OpenAI has started rolling out its new GPT Store, it was also discovered that some of the data that custom GPTs are built on is easily exposed. Multiple groups have found that the system has the potential to leak otherwise sensitive information.
-
Mojo Language SDK Available: Mojo Driver, VS Code extension, and Jupyter Kernel
The Mojo SDK is now available for developers. It contains the Mojo driver, the Visual Studio Code extension, and the Jupyter kernel. For now, the SDK is available for macOS and Linux.
-
Google Announces Ray Support for Vertex AI to Boost Machine Learning Workflows
Google has announced that it is expanding its open-source support for Vertex AI, its machine learning platform, by adding support for Ray, an open-source unified compute framework. This move is aimed at efficiently scaling AI workloads and enhancing the productivity and operational efficiency of data science teams.
-
Jupyter AI Brings Generative AI to Notebooks
The open-source Project Jupyter, used by millions for data science and machine learning, has released Jupyter AI, a free tool bringing powerful generative AI capabilities to Jupyter notebooks.
-
AI, ML, Data Engineering News Roundup: Jupyter AI, AudioCraft, OverflowAI, StableCode and Tabnine
The latest update, which covers developments until August 7, 2023, highlights significant accomplishments and statements made in the fields of artificial intelligence, machine learning, and data science. This week's major news involved Jupyter, Meta AI, Stack Overflow, Stability AI and Tabnine.
-
Introduction to Mojo Programming Language
Mojo is a newly introduced programming language that combines the simplicity of Python with the speed and memory safety of Rust. It is at an early stage of development and offers users an online playground to explore its features. Mojo targets data science and machine learning, providing a fast alternative to Python. There are plans to open-source it gradually.
-
JetBrains Launches the Kotlin Notebook Plugin for IntelliJ IDEA
Using the experimental Kotlin Notebook plugin for IntelliJ IDEA, developers will be able to combine code, visualizations, and text, as well as to run code snippets and view their results, all in a single document.
-
Zero-Copy In-Memory Sharing of Large Distributed Data: V6d
Zero-copy and in-memory data manager Vineyard (v6d) is maintained as a CNCF sandbox project and provides distributed operators that can be utilized to share immutable data within or across cluster nodes. V6d is of interest particularly for deep network training on big (sharded) datasets such as large language and graph models.
-
AWS Makes it Simpler to Share ML Models and Notebooks with Amazon SageMaker JumpStart
AWS announced that it is now easier to share machine learning artifacts like models and notebooks with other users using SageMaker JumpStart. Amazon SageMaker JumpStart is a machine learning hub that helps users accelerate their journey into the world of machine learning.
-
NVIDIA Kubernetes Device Plug-in Brings Temporal GPU Concurrency
Since the v12 release, the NVIDIA GPU device plug-in framework has supported time-sliced sharing between CUDA workloads on Kubernetes. This feature aims to prevent under-utilization of GPUs and make it easier to scale applications by leveraging concurrently executing CUDA contexts.
-
Anaconda Publishes 2022 State of Data Science Report
Anaconda, maker of a Python distribution popular among data scientists, recently published a report on the results of its State of Data Science survey. The report summarizes responses from nearly 3,500 students, academics, and professionals from 133 countries, and covers respondent demographics and jobs as well as trends within the community.
-
Alpa: Automating Model Sharding for Distributed Deep Learning
A new open-source library called Alpa aims to automate distributed training and serving of large deep networks. It proposes a compiler where existing model-parallel strategies are combined and the usage of computing resources is optimized according to the deep network architecture.
-
TensorFlow DTensor: Unified API for Distributed Deep Network Training
The recently released TensorFlow v2.9 introduces a new API for model-, data-, and space-parallel (aka spatially tiled) deep network training. DTensor aims to decouple sharding directives from the model code by providing higher-level utilities to partition the model and batch parameters between devices.
-
Ten Lessons from Three Generations of Tensor Processing Units
A recent report published by Google’s TPU group highlights ten takeaways from developing three generations of tensor processing units. The authors also discuss how their previous experience will affect the development of future tensor processing units.