Scikit-learn 1.0 Supports Spline Transformers, Quantile Regression and Improved Plotting API

Scikit-learn, the popular Python-based machine learning (ML) library, has released version 1.0. Although the library has been stable for some time, and the release contains no breaking changes, the project maintainers opted for a major version revision to signal to users that the software is mature and production-ready.

The project team announced the release on Twitter. Version 1.0 includes 2,100 pull requests merged since the previous release, 0.24, and adds several new features, including spline transformers, quantile regression, an online one-class support vector machine (SVM), and an improved plotting API. Nearly 800 of the merged pull requests were documentation improvements. Apart from changes made through the project's normal two-release deprecation cycle, the release contains no breaking changes; the team incremented the library's major version number from 0 to 1 in recognition of the code's long-term stability and maturity. According to Adrin Jalali, a core developer on the project:

The library has been stable for a while, and we'd like to signal that by the versioning of the release. ... [It] includes some features which we've wanted to have for years, so it felt right to finally do it!
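Of the new features listed above, spline transformers may be the least familiar. A spline transformer replaces a numeric feature with the values of a set of B-spline basis functions, so that a linear model trained on the new columns can fit a smooth, piecewise-polynomial curve. The pure-Python sketch below evaluates those basis functions with the Cox-de Boor recursion. The function names uniform_knots and bspline_features are hypothetical, not scikit-learn API. The knot layout (n_knots evenly spaced points over the feature's range, padded with degree extra knots at the same spacing on each side) is an assumption intended to mirror SplineTransformer's default uniform knots.

```python
def uniform_knots(lo, hi, n_knots, degree):
    """Hypothetical helper: n_knots evenly spaced knots on [lo, hi], padded with
    `degree` extra knots at the same spacing on each side (assumed layout)."""
    step = (hi - lo) / (n_knots - 1)
    return [lo + (i - degree) * step for i in range(n_knots + 2 * degree)]


def bspline_features(x, knots, degree):
    """Hypothetical helper: values of every B-spline basis function of the
    given degree at x, computed with the Cox-de Boor recursion."""
    # Degree 0: indicator of the half-open knot interval that contains x.
    basis = [1.0 if knots[i] <= x < knots[i + 1] else 0.0
             for i in range(len(knots) - 1)]
    # Raise the degree one step at a time; each step yields one fewer function.
    for d in range(1, degree + 1):
        next_basis = []
        for i in range(len(knots) - 1 - d):
            left_span = knots[i + d] - knots[i]
            right_span = knots[i + d + 1] - knots[i + 1]
            left = (x - knots[i]) / left_span * basis[i] if left_span else 0.0
            right = ((knots[i + d + 1] - x) / right_span * basis[i + 1]
                     if right_span else 0.0)
            next_basis.append(left + right)
        basis = next_basis
    return basis


knots = uniform_knots(0.0, 1.0, n_knots=5, degree=3)
row = bspline_features(0.3, knots, degree=3)
print(len(row))            # 7 = n_knots + degree - 1 output columns
print(round(sum(row), 6))  # 1.0: inside [lo, hi] the basis values sum to one
```

In scikit-learn itself, SplineTransformer(n_knots=5, degree=3).fit_transform(X) produces this kind of matrix for every column of X, learning the range of each feature from the training data. The sketch handles a single value and does not model the library's behavior outside the training range.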

Scikit-learn, billed as "easy-to-use and general-purpose machine learning in Python," is used by over 80% of data scientists, according to Kaggle's 2020 survey. The library contains implementations of many common ML algorithms and models, including the widely used linear regression, decision tree, and gradient-boosting algorithms. Begun in 2007 as a Google Summer of Code project, it was originally conceived as an ML "toolkit" for the Python-based scientific computing library SciPy. Scikit-learn's first public beta release was in early 2010, and in 2020 the library was accepted as a Sponsored Project by NumFOCUS, the non-profit foundation that funds SciPy and many other open-source scientific computing packages.

The release includes several other notable changes. Most constructor and function parameters must now be passed as keyword arguments rather than positionally. The histogram-based gradient boosting models have moved from experimental to stable status, and there are also new models. The SGDOneClassSVM model is a linear version of the One-Class SVM that is fit using stochastic gradient descent (SGD); combined with kernel approximation techniques, it can approximate the solution of a kernelized One-Class SVM with a fit time that is "several orders of magnitude faster." The new quantile regression model, QuantileRegressor, estimates the median or another quantile of the target conditional on the input features, rather than the conditional mean; the model is fit by minimizing the pinball loss.
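Why minimizing the pinball loss yields a quantile can be seen in a few lines. For a quantile level tau, the loss charges tau per unit of under-prediction and (1 - tau) per unit of over-prediction. The pure-Python sketch below, in which pinball_loss and best_constant are hypothetical helpers rather than scikit-learn API, searches only over constant predictions. By contrast, QuantileRegressor fits a linear model to this loss with an added L1 penalty, and scikit-learn also exposes the metric as sklearn.metrics.mean_pinball_loss.

```python
def pinball_loss(y_true, y_pred, tau):
    """Mean pinball (quantile) loss at quantile level tau, 0 < tau < 1."""
    total = 0.0
    for y, q in zip(y_true, y_pred):
        residual = y - q
        # Under-predictions (residual > 0) cost tau per unit,
        # over-predictions (residual < 0) cost (1 - tau) per unit.
        total += max(tau * residual, (tau - 1) * residual)
    return total / len(y_true)


def best_constant(y, tau):
    """Hypothetical helper: the constant prediction with the lowest pinball loss.
    The loss is convex and piecewise linear in the constant, so a minimizer
    always lies at one of the observed values; searching those is enough."""
    candidates = sorted(set(y))
    return min(candidates, key=lambda c: pinball_loss(y, [c] * len(y), tau))


print(best_constant([1, 2, 3, 4, 100], tau=0.5))  # 3: the median, unmoved by the outlier
print(best_constant([1, 2, 3, 4, 5], tau=0.25))   # 2: the lower quartile
```

With tau = 0.5 the optimal constant is the median (3), whereas the mean of the same data (22) is pulled far upward by the single outlier; other values of tau select other quantiles in the same way.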

In a discussion about the release on Hacker News, some users noted that scikit-learn is still not a good choice for deep learning models:

- No saving checkpoints (can be crucial for large models who need a lot of compute and time)
- No way to assign different activation functions to different layers
- No complex nodes like LSTM, GRU
- No way to implement complex architectures like transformers, encoders etc

Other users also pointed out that scikit-learn does not support GPU hardware. However, most users praised the library for having good documentation and being easy to use:

scikit-learn (next to NumPy) is the one library I use in every single project at work. Every time I consider switching away from Python I am faced with the fact that I'd lose access to this workhorse of a library. Of course it's not all sunshine and rainbows - I had my fair share of rummaging through its internals - but its API design is a de-facto standard for a reason.

The scikit-learn code is available on GitHub.
