Benchmarking Rust's Top Machine Learning Frameworks

linfa vs smartcore

About

linfa and smartcore have emerged as two leading machine learning frameworks for Rust, both analogous to scikit-learn. Each provides a number of algorithms that form the backbone of machine learning analysis. This repository compares the training time of algorithms in the two frameworks. The algorithms included are:

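| Algorithm | Smartcore v2.0.0 | Linfa v5.0.0 | Benchmarked here? |
| --- | --- | --- | --- |
| Linear Regression | Yes | Yes | Yes |
| Ridge Regression | | | |
| LASSO Regression | | | |
| Decision Tree Regression | | | |
| Random Forest Regression | | | |
| Support Vector Regression | Yes | Yes | Yes |
| KNN Regression | | | |
| Elastic Net Regression | Yes | Yes | Yes |
| Partial Least Squares | | | |
| Logistic Regression | Yes | Yes | Yes |
| Decision Tree Classification | Yes | Yes | Yes |
| Random Forest Classification | | | |
| Support Vector Classification | Yes | Yes | Yes |
| KNN Classification | | | |
| Gaussian Naive Bayes | Yes | Yes | Yes |
| K-Means | Yes | Yes | Yes |
| DBSCAN | Yes | Yes | Yes |
| Hierarchical Clustering | | | |
| Approximated DBSCAN | | | |
| Gaussian Mixture Model | | | |
| PCA | Yes | Yes | Yes |
| ICA | | | |
| SVD | | | |
| t-SNE | | | |
| Diffusion Mapping | | | |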
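
A training-time comparison of this kind comes down to wrapping each fit call in a timer and repeating it to obtain a distribution of durations. The following is a minimal, standard-library-only sketch of that idea, not the harness used in this repository. The names `time_training`, `median`, `fit_line`, and `run_example` are hypothetical and introduced only for illustration; in practice the closure would call the linfa or smartcore training routine instead of the toy workload.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Runs `train` `runs` times and returns the wall-clock duration of each run.
/// `train` is a hypothetical stand-in for a model-fitting call.
fn time_training<T, F: FnMut() -> T>(runs: usize, mut train: F) -> Vec<Duration> {
    (0..runs)
        .map(|_| {
            let start = Instant::now();
            // black_box keeps the optimizer from discarding the unused result.
            black_box(train());
            start.elapsed()
        })
        .collect()
}

/// Median of the samples, or None when there are no samples.
fn median(samples: &[Duration]) -> Option<Duration> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort();
    let mid = sorted.len() / 2;
    Some(if sorted.len() % 2 == 1 {
        sorted[mid]
    } else {
        (sorted[mid - 1] + sorted[mid]) / 2
    })
}

/// Toy workload standing in for "fit a model": least-squares slope and intercept.
fn fit_line(x: &[f64], y: &[f64]) -> (f64, f64) {
    let n = x.len() as f64;
    let mean_x = x.iter().sum::<f64>() / n;
    let mean_y = y.iter().sum::<f64>() / n;
    let cov: f64 = x.iter().zip(y).map(|(a, b)| (a - mean_x) * (b - mean_y)).sum();
    let var: f64 = x.iter().map(|a| (a - mean_x).powi(2)).sum();
    let slope = cov / var;
    (slope, mean_y - slope * mean_x)
}

/// Example driver: times 20 fits on synthetic data and prints the median.
fn run_example() {
    let x: Vec<f64> = (0..10_000).map(|i| i as f64).collect();
    let y: Vec<f64> = x.iter().map(|v| 2.0 * v + 1.0).collect();
    let samples = time_training(20, || fit_line(&x, &y));
    println!("median training time: {:?}", median(&samples).expect("at least one run"));
}
```

Calling `run_example()` from `main` prints one median; keeping the full vector of durations, rather than only the median, is what makes a distribution plot such as a violin plot possible.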

The full report is available here, but summary violin plots are provided below.

Considerations Besides Execution Time

In the course of creating this benchmark study, a few additional differences between the libraries emerged.

Documentation

The documentation for smartcore is somewhat more consistent across algorithms. This may be because it is maintained in a single crate, whereas linfa is split across several.

Dependencies

While linfa requires a BLAS/LAPACK backend (either openblas, netlib, or intel-mkl), smartcore does not. The backend lets linfa take advantage of optimized linear algebra routines, but it limits portability.

Results

Regression

Linear Regression

No customization was needed to equate the algorithms.

Elastic Net

Support Vector Regression

Classification

Logistic Regression

The smartcore implementation has no parameters, but the linfa settings were modified to align it with smartcore defaults:

Decision Tree

Gaussian Naive Bayes

Support Vector Classification

Clustering

K-Means

Since the two implementations use different convergence criteria, the maximum number of iterations was set to the same low value for both, and the linfa algorithm was limited to a single run:
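
To illustrate why a shared iteration cap makes the work comparable, here is a standalone sketch of Lloyd's algorithm that performs exactly `max_iter` iterations in a single run, with no convergence test at all. It is neither library's implementation and does not reproduce the settings used in this benchmark. The function names and the deterministic "first `k` points" initialization are assumptions made only so the example is reproducible.

```rust
/// Squared Euclidean distance between two 2-D points.
fn dist2(a: [f64; 2], b: [f64; 2]) -> f64 {
    (a[0] - b[0]).powi(2) + (a[1] - b[1]).powi(2)
}

/// Index of the centroid nearest to `p` (ties go to the lower index).
fn nearest(p: [f64; 2], centroids: &[[f64; 2]]) -> usize {
    let mut best = 0;
    for (i, c) in centroids.iter().enumerate() {
        if dist2(p, *c) < dist2(p, centroids[best]) {
            best = i;
        }
    }
    best
}

/// One run of Lloyd's algorithm with exactly `max_iter` iterations.
/// There is deliberately no convergence check, so the amount of work
/// depends only on the data size, `k`, and `max_iter`.
fn kmeans_fixed_iters(points: &[[f64; 2]], k: usize, max_iter: usize) -> Vec<[f64; 2]> {
    assert!(k > 0 && k <= points.len(), "need 1 <= k <= number of points");
    // Assumption: initialize from the first k points so a single run is reproducible.
    let mut centroids: Vec<[f64; 2]> = points[..k].to_vec();
    for _ in 0..max_iter {
        let mut sums = vec![[0.0_f64; 2]; k];
        let mut counts = vec![0_usize; k];
        // Assignment step: add each point to its nearest centroid's running sum.
        for &p in points {
            let j = nearest(p, &centroids);
            sums[j][0] += p[0];
            sums[j][1] += p[1];
            counts[j] += 1;
        }
        // Update step: move each centroid to the mean of its assigned points.
        for j in 0..k {
            // An empty cluster keeps its previous centroid.
            if counts[j] > 0 {
                let n = counts[j] as f64;
                centroids[j] = [sums[j][0] / n, sums[j][1] / n];
            }
        }
    }
    centroids
}
```

With the iteration count fixed and only one initialization, two implementations do a similar amount of work regardless of how each would otherwise decide that it has converged.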

DBSCAN

Dimensionality Reduction

PCA