This rule raises an issue when random number generators do not specify a seed parameter.
Data science and machine learning tasks make extensive use of random number generation, for example when splitting a dataset into training and test sets, initializing model weights, or shuffling data.
To ensure that results are reproducible, it is important to use a predictable seed in this context.
The preferred way to do this in numpy is by instantiating a Generator object, typically through
numpy.random.default_rng, which should be provided with a seed parameter.
Note that a global seed for RandomState can be set using numpy.random.seed; this sets the
seed for legacy RandomState-based functions such as numpy.random.randn. This approach is, however, deprecated, and Generator
should be used instead. This is reported by rule {rule:python:S6711}.
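The contrast between the deprecated global seed and a seeded Generator can be sketched as follows (a minimal illustration; the specific values drawn are not significant):

```python
import numpy as np

# Deprecated approach: a single global seed shared by the legacy
# RandomState-backed functions in the numpy.random module.
np.random.seed(42)
legacy_a = np.random.randn(3)
np.random.seed(42)
legacy_b = np.random.randn(3)  # same values as legacy_a

# Preferred approach: a dedicated Generator instance with its own seed,
# so reproducibility does not rely on hidden global state.
gen_a = np.random.default_rng(42).uniform(size=3)
gen_b = np.random.default_rng(42).uniform(size=3)  # same values as gen_a
```

Both approaches are reproducible, but the Generator keeps its state local to the instance instead of mutating a process-wide singleton.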
In contexts that are not related to data science and machine learning, having a predictable seed may not be the desired behavior. Therefore, this rule will only raise issues if machine learning and data science libraries are being used.
To fix this issue, provide a predictable seed to the random number generator.
import numpy as np

def foo():
    generator = np.random.default_rng()  # Noncompliant: no seed parameter is provided
    x = generator.uniform()
import numpy as np

def foo():
    generator = np.random.default_rng(42)  # Compliant
    x = generator.uniform()
To fix this issue, provide a predictable seed to the estimator or the utility function.
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
X_train, _, y_train, _ = train_test_split(X, y)  # Noncompliant: no seed parameter is provided
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
X_train, _, y_train, _ = train_test_split(X, y, random_state=42)  # Compliant
numpy.random.Generator should be preferred to numpy.random.RandomState
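One reason Generator is preferred is that each instance carries its own private state, so independently seeded generators cannot interfere with one another, unlike the legacy module-level functions that all share one global RandomState. A minimal sketch:

```python
import numpy as np

# Two Generator instances with distinct seeds: each has private state.
rng1 = np.random.default_rng(1)
rng2 = np.random.default_rng(2)

first = rng1.random()
rng2.random()  # drawing from rng2 does not advance rng1's state

# Re-seeding a fresh Generator with the same seed reproduces rng1's
# first draw, regardless of any activity on other generators.
same_first = np.random.default_rng(1).random()
```

With the legacy API, any intervening call to a numpy.random module-level function would advance the shared global state and break this kind of isolation.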