This rule raises an issue when a Scikit-learn model is instantiated without specifying the important hyperparameters.
A Scikit-learn estimator uses default values for any hyperparameters that are not specified at instantiation. Because these defaults can change between releases, relying on them can lead to results that are not reproducible across different versions of the library.
Furthermore, the default values might not be the best choice for the specific problem at hand and can lead to suboptimal performance.
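To see which values an estimator silently falls back to, its parameters can be inspected after instantiation. The following is a minimal sketch using `get_params()`; the printed values depend on the installed Scikit-learn version, which is why no expected output is shown.

```python
import sklearn
from sklearn.neighbors import KNeighborsClassifier

# Instantiate without arguments so every hyperparameter takes its default.
clf = KNeighborsClassifier()

# get_params() returns a dict mapping each hyperparameter name to its value.
defaults = clf.get_params()

print(f"scikit-learn version: {sklearn.__version__}")
for name, value in sorted(defaults.items()):
    print(f"{name} = {value!r}")
```

Recording this output alongside the library version makes it easier to notice when an upgrade changes a default.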
This rule considers the following estimators and hyperparameters:
Estimator | Hyperparameters
AdaBoostClassifier | learning_rate
AdaBoostRegressor | learning_rate
GradientBoostingClassifier | learning_rate
GradientBoostingRegressor | learning_rate
HistGradientBoostingClassifier | learning_rate
HistGradientBoostingRegressor | learning_rate
RandomForestClassifier | min_samples_leaf, max_features
RandomForestRegressor | min_samples_leaf, max_features
ElasticNet | alpha, l1_ratio
NearestNeighbors | n_neighbors
KNeighborsClassifier | n_neighbors
KNeighborsRegressor | n_neighbors
NuSVC | nu, kernel, gamma
NuSVR | C, kernel, gamma
SVC | C, kernel, gamma
SVR | C, kernel, gamma
DecisionTreeClassifier | ccp_alpha
DecisionTreeRegressor | ccp_alpha
MLPClassifier | hidden_layer_sizes
MLPRegressor | hidden_layer_sizes
PolynomialFeatures | degree, interaction_only
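As an illustration of the table above, the sketch below instantiates a few of the listed estimators with each of their listed hyperparameters passed explicitly. The values are placeholder assumptions chosen for the example, not recommendations; suitable values depend on the data.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import ElasticNet
from sklearn.preprocessing import PolynomialFeatures
from sklearn.svm import SVC

# Each call names every hyperparameter the rule lists for that estimator.
# All values below are illustrative assumptions.
svc = SVC(C=1.0, kernel="rbf", gamma="scale")
forest = RandomForestClassifier(min_samples_leaf=1, max_features="sqrt")
enet = ElasticNet(alpha=1.0, l1_ratio=0.5)
poly = PolynomialFeatures(degree=2, interaction_only=False)
```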
Specify the hyperparameters when instantiating the estimator.
from sklearn.neighbors import KNeighborsClassifier

clf = KNeighborsClassifier()  # Noncompliant: n_neighbors is not specified, different values can change the behaviour of the predictor significantly

from sklearn.neighbors import KNeighborsClassifier

clf = KNeighborsClassifier(  # Compliant
    n_neighbors=5
)
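When it is unclear which value to pass, one option is to pick it with a cross-validated search and then set it explicitly. The sketch below assumes the Iris toy dataset and a hypothetical candidate list for `n_neighbors`; both are stand-ins for the real data and search space.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Toy dataset used only for illustration.
X, y = load_iris(return_X_y=True)

# Candidate values are an assumption for this example, not a recommendation.
candidates = [1, 3, 5, 7, 9]
param_grid = {"n_neighbors": candidates}

# 5-fold cross-validation over the candidate values.
search = GridSearchCV(KNeighborsClassifier(n_neighbors=5), param_grid, cv=5)
search.fit(X, y)

# Use the selected value explicitly when building the final estimator.
best_k = search.best_params_["n_neighbors"]
clf = KNeighborsClassifier(n_neighbors=best_k)
clf.fit(X, y)
```

Writing the selected value into the constructor call keeps the model's behaviour independent of the library's default.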