This rule raises an issue when the pandas.DataFrame.values is used instead of the pandas.DataFrame.to_numpy() method.
The values attribute and the to_numpy() method in pandas both provide a way to return a NumPy representation of the
DataFrame. However, there are some reasons why the to_numpy() method is recommended over the values
attribute:
values attribute is considered a legacy feature, while the to_numpy() is
the recommended method to extract data and is considered more future-proof. DataFrame has columns with different data types, NumPy will choose a common data
type that can hold all the data. This may lead to loss of information, unexpected type conversions, or increased memory usage. The
to_numpy() allows you to select the common type manually, passing the dtype argument. values attribute can return a view or a copy of the data depending on whether the data needs to
be transposed. This can lead to confusion when modifying the extracted data. On the other hand, to_numpy() has copy
argument allowing to force it always to return a new NumPy array, ensuring that any changes you make won’t affect the original
DataFrame. to_numpy() allows to specify the default value used for missing values in the
DataFrame, while the values will always use numpy.nan for missing values. Use the to_numpy() method instead of the values attribute to get a NumPy representation of the
DataFrame.
import pandas as pd
df = pd.DataFrame({
'X': ['A', 'B', 'A', 'C'],
'Y': [10, 7, 12, 5]
})
arr = df.values # Noncompliant: using the 'values' attribute is not recommended
import pandas as pd
df = pd.DataFrame({
'X': ['A', 'B', 'A', 'C'],
'Y': [10, 7, 12, 5]
})
arr = df.to_numpy() # Compliant