Process of Attribute Subset Selection
A brute-force approach, which evaluates every possible subset of a data set with n attributes (2^n subsets in all), quickly becomes very expensive. A more practical approach is to use statistical significance tests to recognize the best (or worst) attributes. These tests assume that the attributes are independent of one another. The procedure is greedy: a significance level is chosen (5% is the conventional choice), the model is fitted, and any attribute whose p-value (probability value) is higher than the significance level is discarded. The model is then refitted on the remaining attributes, and the procedure is repeated until every attribute in the data set has a p-value less than or equal to the significance level. The result is a reduced data set from which the attributes judged irrelevant have been removed.
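The elimination loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a definitive implementation: the function name backward_eliminate, the callback p_values_for, and the toy attribute names and p-values are all hypothetical and do not come from any library. In practice the callback would refit a real model on the current subset and return its p-values. The sketch also drops only the single least significant attribute per round, which is one common variant, because p-values can change after each refit.

```python
from typing import Callable, Dict, List, Sequence


def backward_eliminate(
    attributes: Sequence[str],
    p_values_for: Callable[[List[str]], Dict[str, float]],
    alpha: float = 0.05,
) -> List[str]:
    """Drop the least significant attribute until every p-value is <= alpha."""
    selected = list(attributes)
    while selected:
        # Refit on the current subset; p-values can change after each removal.
        p_values = p_values_for(selected)
        worst = max(selected, key=lambda name: p_values[name])
        if p_values[worst] <= alpha:
            break  # every remaining attribute meets the significance level
        selected.remove(worst)
    return selected


# Hypothetical stand-in for a real model fit: fixed, made-up p-values.
TOY_P_VALUES = {"age": 0.01, "income": 0.03, "zip_code": 0.40, "shoe_size": 0.72}


def toy_p_values(subset: List[str]) -> Dict[str, float]:
    """Return the made-up p-value of each attribute in the subset."""
    return {name: TOY_P_VALUES[name] for name in subset}


reduced = backward_eliminate(list(TOY_P_VALUES), toy_p_values)
print(reduced)
print(2 ** len(TOY_P_VALUES), "subsets would be examined by brute force")
```

With the toy values, the loop needs only three model fits, whereas brute force would have to examine all 16 subsets of the four attributes. The gap widens quickly, since the number of subsets doubles with every attribute added.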
Attribute Subset Selection in Data Mining
Attribute subset selection is a technique used for data reduction in the data mining process. Data reduction reduces the size of the data so that it can be analyzed more efficiently.