This rule raises an issue when a Scikit-learn estimator is instantiated without explicitly specifying its important hyperparameters.

Why is this an issue?

When a Scikit-learn estimator is instantiated, every hyperparameter that is not specified takes its default value. Relying on these default values can lead to non-reproducible results across different versions of the library.
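As a small illustration of the point above, the values an estimator falls back to can be inspected with `get_params()`. This is only a sketch: the printed value depends on the installed Scikit-learn version, which is exactly why relying on it is fragile.

```python
from sklearn.neighbors import KNeighborsClassifier

# No hyperparameters are passed, so every one of them takes its default value.
clf = KNeighborsClassifier()

# get_params() returns a dict mapping each hyperparameter name to its current value.
defaults = clf.get_params()

# The value shown here is whatever default the installed version ships with.
print("n_neighbors default:", defaults["n_neighbors"])
```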

Furthermore, the default values might not be the best choice for the specific problem at hand and can lead to suboptimal performance.
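One common way to choose a value for the problem at hand, rather than accepting the default, is a cross-validated search. The sketch below uses `GridSearchCV` on the bundled iris dataset; the dataset, the candidate values and the number of folds are illustrative assumptions, not recommendations from this rule.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Small bundled dataset, used here purely for illustration.
X, y = load_iris(return_X_y=True)

# Candidate values are an assumption; pick a range that suits your data.
param_grid = {"n_neighbors": [1, 3, 5, 7, 9]}

# The starting value is given explicitly; the search overrides it per candidate.
search = GridSearchCV(
    estimator=KNeighborsClassifier(n_neighbors=5),
    param_grid=param_grid,
    cv=5,
)
search.fit(X, y)

# best_params_ holds the candidate with the best mean cross-validated score.
print("best n_neighbors:", search.best_params_["n_neighbors"])
print("mean CV accuracy:", search.best_score_)
```

The selected value can then be written explicitly into the estimator's constructor call.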

Here are the estimators and the hyperparameters considered by this rule:

Estimator                        Hyperparameters
-------------------------------  ------------------------------
AdaBoostClassifier               learning_rate
AdaBoostRegressor                learning_rate
GradientBoostingClassifier       learning_rate
GradientBoostingRegressor        learning_rate
HistGradientBoostingClassifier   learning_rate
HistGradientBoostingRegressor    learning_rate
RandomForestClassifier           min_samples_leaf, max_features
RandomForestRegressor            min_samples_leaf, max_features
ElasticNet                       alpha, l1_ratio
NearestNeighbors                 n_neighbors
KNeighborsClassifier             n_neighbors
KNeighborsRegressor              n_neighbors
NuSVC                            nu, kernel, gamma
NuSVR                            C, kernel, gamma
SVC                              C, kernel, gamma
SVR                              C, kernel, gamma
DecisionTreeClassifier           ccp_alpha
DecisionTreeRegressor            ccp_alpha
MLPClassifier                    hidden_layer_sizes
MLPRegressor                     hidden_layer_sizes
PolynomialFeatures               degree, interaction_only

How to fix it

Specify the hyperparameters when instantiating the estimator.

Code examples

Noncompliant code example

from sklearn.neighbors import KNeighborsClassifier

clf = KNeighborsClassifier()  # Noncompliant: n_neighbors is not specified, and different values can change the behaviour of the predictor significantly

Compliant solution

from sklearn.neighbors import KNeighborsClassifier

clf = KNeighborsClassifier( # Compliant
    n_neighbors=5
)
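The same pattern applies to the other estimators covered by this rule. The sketch below sets the listed hyperparameters explicitly for a few of them; the chosen values are placeholders for illustration and should be tuned for the actual problem.

```python
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.svm import SVC

# SVC: C, kernel and gamma are all stated explicitly.
svc = SVC(C=1.0, kernel="rbf", gamma="scale")

# RandomForestClassifier: min_samples_leaf and max_features are stated explicitly.
forest = RandomForestClassifier(min_samples_leaf=1, max_features="sqrt")

# GradientBoostingClassifier: learning_rate is stated explicitly.
boosting = GradientBoostingClassifier(learning_rate=0.1)
```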
