
【Feature Selection】1: Recursive Feature Elimination (RFE)

Good feature selection (feature set selection) can benefit a model's representation, interpretability, and succinctness. This article walks through the process using a gradient boosting machine and a support vector machine as examples.

Applied Machine Learning Using mlr3 in R - 6  Feature Selection

mlr-org - Recursive Feature Elimination on the Sonar Data Set

‘Currently, RFE works with support vector machines (SVM), decision tree algorithms and gradient boosting machines (GBM). Supported learners are tagged with the "importance" property.’ (RFE ranks features by their importance scores.)
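
To check which learners can be combined with RFE, one can query the learner dictionary for the "importance" property. A minimal sketch, assuming the standard mlr3 dictionary interface (the name tab is just illustrative):

library(mlr3verse)

# Learners usable with RFE must expose feature importance; they are tagged
# with the "importance" property in the learner dictionary.
tab = as.data.table(mlr_learners)
tab[sapply(properties, function(p) "importance" %in% p), key]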

RFE-CV is a variant of RFE. ‘RFE-CV estimates the optimal number of features with cross-validation first. Then one more RFE is carried out on the complete dataset with the optimal number of features as the final feature set size.’

A typical RFE workflow:

  1. Create the optimizer and set its parameters
  2. Create the machine learning task
  3. Create the learner
  4. Define the feature selection problem
  5. Pass the feature selection problem to the optimizer
  6. Train the final model with the best feature set on the full dataset and evaluate its performance on the test set (see the sketch after the listing below)
library(mlr3verse)

# 1. Retrieve the RFE optimizer with the fs() function.
optimizer = fs("rfe",
  n_features = 1,
  feature_number = 1,
  aggregation = "rank")
# The optimizer stops when the number of features equals n_features.
# The parameters feature_number, feature_fraction and subset_size determine
# the number of features that are removed in each iteration.

# 2. Create the task.
task = tsk("sonar")

# 3. Create the learner (a gradient boosting machine).
learner = lrn("classif.gbm",
  distribution = "bernoulli",
  predict_type = "prob")

# 4. Define the feature selection problem.
instance = fsi(
  task = task,
  learner = learner,
  resampling = rsmp("cv", folds = 6),  # resampling strategy: 6-fold cross-validation
  measures = msr("classif.auc"),       # performance measure: AUC
  terminator = trm("none"))            # terminator: none, because n_features already ends the run

# 5. Pass the feature selection problem to the optimizer.
optimizer$optimize(instance)
instance$result
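
Step 6 of the workflow (refitting on the selected features) is not shown in the listing above; a minimal sketch, mirroring the last lines of the RFE-CV example further down (the name task_rfe is just illustrative):

# Work on a copy so the full 60-feature task stays intact for the RFE-CV example,
# reduce it to the best feature set found by RFE, and refit the learner on all rows.
task_rfe = task$clone()
task_rfe$select(instance$result_feature_set)
learner$train(task_rfe)
# Performance of this final model would then be estimated on a held-out test set.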

Visualization of the feature selection process

library(ggplot2)
library(viridisLite)
library(mlr3misc)

data = as.data.table(instance$archive)
data[, n := map_int(importance, length)]  # number of features evaluated in each step

ggplot(data, aes(x = n, y = classif.auc)) +
  geom_line(
    color = viridis(1, begin = 0.5),
    linewidth = 1) +
  geom_point(
    fill = viridis(1, begin = 0.5),
    shape = 21,
    size = 3,
    stroke = 0.5,
    alpha = 0.8) +
  xlab("Number of Features") +
  scale_x_reverse() +
  theme_minimal()

Optimization path of the feature selection. We observe that the performance increases first as the number of features decreases. As soon as informative features are removed, the performance drops.
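
The numbers behind this curve can also be read directly from the instance; a small sketch using the result fields of the feature selection instance:

# Best mean AUC reached during the RFE run and the corresponding feature subset.
instance$result_y            # best classif.auc
instance$result_feature_set  # character vector with the selected feature names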

RFE-CV:

Principle: RFE-CV first determines the optimal number of features via cross-validation and only then selects the final feature set on the full data.

RFE-CV estimates the optimal number of features before selecting a feature set. For this, an RFE is run in each resampling iteration and the number of features with the best mean performance is selected. Then one more RFE is carried out on the complete dataset with the optimal number of features as the final feature set size.

  1. optimizer = fs("rfecv",
  2. n_features = 1,
  3. feature_number = 1) #no aggregation needed
  4. learner = lrn("",
  5. type = "C-classification",
  6. kernel = "linear",
  7. predict_type = "prob")
  8. instance = fsi(
  9. task = task,
  10. learner = learner,
  11. resampling = rsmp("cv", folds = 6),#6 fold cv to determine feature set size
  12. measures = msr(""),
  13. terminator = trm("none"),
  14. callback = clbk("mlr3fselect.svm_rfe"))
  15. optimizer$optimize(instance)
  16. library(ggplot2)
  17. library(viridisLite)
  18. library(mlr3misc)
  19. data = (instance$archive)[!(iteration), ]
  20. aggr = data[, list("y" = mean(unlist(.SD))), by = "batch_nr", .SDcols = ""]
  21. aggr[, batch_nr := 61 - batch_nr]
  22. data[, n:= map_int(importance, length)]
  23. ggplot(aggr, aes(x = batch_nr, y = y)) +
  24. geom_line(
  25. color = viridis(1, begin = 0.5),
  26. linewidth = 1) +
  27. geom_point(
  28. fill = viridis(1, begin = 0.5),
  29. shape = 21,
  30. size = 3,
  31. stroke = 0.5,
  32. alpha = 0.8) +
  33. geom_vline(
  34. xintercept = aggr[y == max(y)]$batch_nr,
  35. colour = viridis(1, begin = 0.33),
  36. linetype = 3
  37. ) +
  38. xlab("Number of Features") +
  39. ylab("Mean AUC") +
  40. scale_x_reverse() +
  41. theme_minimal()
  42. #We subset the task to the optimal feature set and train the learner.
  43. task$select(instance$result_feature_set)
  44. learner$train(task)
  45. #The trained model can now be used to predict new, external data.

 Estimation of the optimal number of features. The best mean performance is achieved with 19 features (blue line).
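
To illustrate the last comment in the listing, here is a hedged prediction sketch; the first sonar rows merely stand in for genuinely new, external observations:

# Treat a few sonar rows as "new" data, restricted to the selected features,
# and obtain posterior class probabilities from the refitted linear SVM.
new_data = tsk("sonar")$data(rows = 1:5, cols = instance$result_feature_set)
prediction = learner$predict_newdata(new_data)
prediction$prob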