Description
Describe the bug
When using sklearn's GridSearchCV with SequentialFeatureSelector, the configured hyperparameter values are not properly propagated to the classifier that is actually used for fitting and predicting. Below is a minimal working example (MWE) based on Example 8 in the docs; the only major change is a custom classifier.
The output listed in the docs already hints at this: the score does not change with the k parameter (n_neighbors) of the KNN classifier, which is very strange.
While searching for similar issues, I found that this has already been reported in several other issues, e.g. #456 and #511. Below you can see the unexpected behavior with the suggested approach.
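For context, GridSearchCV applies each grid point by calling set_params on a cloned copy of the pipeline with names such as sfs__estimator__max_depth; scikit-learn's BaseEstimator.set_params splits such a name at the first double underscore and delegates the remainder to the named sub-object. The toy below is a simplified, hypothetical sketch of that routing in plain Python (the Toy class is invented for illustration, and a real Pipeline looks steps up by name rather than by attribute). It shows what should happen when routing works: the innermost object's attribute is updated in place.

```python
# Hypothetical toy, NOT scikit-learn's code: mimics how a nested parameter
# name like "sfs__estimator__max_depth" is routed through set_params.


class Toy:
    """Minimal estimator-like object whose parameters are plain attributes."""

    def __init__(self, **params):
        self.__dict__.update(params)

    def set_params(self, **params):
        nested = {}
        for key, value in params.items():
            # "sfs__estimator__max_depth" -> head="sfs", rest="estimator__max_depth"
            head, sep, rest = key.partition("__")
            if sep:
                nested.setdefault(head, {})[rest] = value
            else:
                setattr(self, key, value)
        # Delegate the remaining part of each nested name to the child object.
        for head, sub_params in nested.items():
            getattr(self, head).set_params(**sub_params)
        return self


clf = Toy(max_depth=10)
sfs = Toy(estimator=clf)
pipe = Toy(sfs=sfs)  # simplified stand-in for Pipeline([('sfs', sfs), ...])

pipe.set_params(sfs__estimator__max_depth=1)
print(clf.max_depth)  # 1
```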
Steps/Code to Reproduce
from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from mlxtend.feature_selection import SequentialFeatureSelector as SFS
import mlxtend
import sklearn.base
import numpy as np


class DebugClassifier(sklearn.base.BaseEstimator):
    """Dummy classifier that logs which max_depth it sees."""

    def __init__(self, max_depth=10):
        self.max_depth = max_depth

    def fit(self, X, y, groups=None):
        print("Fitting with max_depth =", self.max_depth)
        return self  # scikit-learn convention: fit returns self

    def predict(self, X, **kwargs):
        print("Predicting with max_depth =", self.max_depth)
        return np.zeros(len(X))

    def set_params(self, **kwargs):
        print("Setting params:", kwargs)
        super().set_params(**kwargs)
        print("max_depth after setparams:", self.max_depth)
        return self  # scikit-learn convention: set_params returns self


iris = load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=123)

clf = DebugClassifier(max_depth=10)

sfs1 = SFS(estimator=clf,
           k_features=3,
           forward=True,
           floating=False,
           scoring='accuracy',
           cv=5)

pipe = Pipeline([('sfs', sfs1),
                 ('clf', clf)])

param_grid = [
    {#'sfs__k_features': [1, 4],
     'sfs__estimator__max_depth': [1, 5]}
]

gs = GridSearchCV(estimator=pipe,
                  param_grid=param_grid,
                  scoring='accuracy',
                  n_jobs=1,
                  cv=5,
                  #iid=True,
                  refit=False)

# run grid search
gs = gs.fit(X_train, y_train)
Expected Results
Setting params: {'max_depth': 1}
max_depth after setparams: 1
Fitting with max_depth = 1
Predicting with max_depth = 1
Fitting with max_depth = 1
Predicting with max_depth = 1
...
Actual Results
Setting params: {'max_depth': 1}
max_depth after setparams: 1
Fitting with max_depth = 10
Predicting with max_depth = 10
Fitting with max_depth = 10
Predicting with max_depth = 10
...
As you can see, the value 1 for the hyperparameter max_depth is correctly set on some classifier instance. However, fitting and predicting appear to use a different instance that still has the original value max_depth=10.
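One mechanism that would produce exactly this kind of output is a meta-estimator that copies its base estimator before GridSearchCV calls set_params (for example in __init__) and then fits that copy. The sketch below is a hypothetical illustration in plain Python, not mlxtend's actual code, and whether SFS 0.18.0 does this is an assumption: StaleMeta snapshots the estimator at construction, so estimator__max_depth=1 updates self.estimator while fit still sees 10; FreshMeta copies at fit time instead, in line with scikit-learn's convention that __init__ should only store parameters.

```python
import copy


class Base:
    """Hypothetical stand-in for DebugClassifier."""

    def __init__(self, max_depth=10):
        self.max_depth = max_depth

    def set_params(self, **params):
        for key, value in params.items():
            setattr(self, key, value)
        return self

    def fit(self):
        # Report the max_depth actually used for fitting.
        return self.max_depth


class StaleMeta:
    """Hypothetical anti-pattern: copies the estimator in __init__."""

    def __init__(self, estimator):
        self.estimator = estimator
        self.est_ = copy.deepcopy(estimator)  # snapshot taken too early

    def set_params(self, **params):
        for key, value in params.items():
            head, _, rest = key.partition("__")
            if head == "estimator" and rest:
                # Updates self.estimator, but NOT the earlier snapshot est_.
                self.estimator.set_params(**{rest: value})
            else:
                setattr(self, key, value)
        return self

    def fit(self):
        return self.est_.fit()  # fits the stale snapshot


class FreshMeta(StaleMeta):
    """Copies the estimator at fit time, so set_params takes effect."""

    def fit(self):
        self.est_ = copy.deepcopy(self.estimator)
        return self.est_.fit()


stale = StaleMeta(Base(max_depth=10)).set_params(estimator__max_depth=1)
fresh = FreshMeta(Base(max_depth=10)).set_params(estimator__max_depth=1)
print(stale.fit(), fresh.fit())  # 10 1
```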
Versions
MLxtend 0.18.0
Linux-5.8.0-48-generic-x86_64-with-glibc2.29
Python 3.8.5 (default, Jan 27 2021, 15:41:15)
[GCC 9.3.0]
Scikit-learn 0.24.1
NumPy 1.20.1
SciPy 1.6.1