mRMR (minimum Redundancy Maximum Relevance Feature Selection)
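
As a rough illustration of what the name means, the sketch below greedily picks features that share high mutual information with the class labels (relevance) while sharing little mutual information with the features already picked (redundancy). It assumes categorical data and the "relevance minus mean redundancy" form of the criterion; the function names `mutual_information` and `mrmr` and the toy data are hypothetical and are not taken from the downloadable program.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length categorical sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # Sum p(x, y) * log2( p(x, y) / (p(x) * p(y)) ) over observed pairs.
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def mrmr(features, target, n_select):
    """Greedy selection by relevance minus mean redundancy (an assumed criterion).

    features: dict mapping feature name -> list of categorical states
    target:   list of class labels, one per sample
    """
    relevance = {
        name: mutual_information(col, target) for name, col in features.items()
    }
    selected = []
    remaining = list(features)

    def score(name):
        if not selected:
            return relevance[name]
        redundancy = sum(
            mutual_information(features[name], features[s]) for s in selected
        ) / len(selected)
        return relevance[name] - redundancy

    while remaining and len(selected) < n_select:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: the class is determined by features "a" and "b";
# "a_copy" duplicates "a", and "c" is unrelated to the class.
a = [0, 0, 0, 0, 1, 1, 1, 1]
b = [0, 0, 1, 1, 0, 0, 1, 1]
c = [0, 1, 0, 1, 0, 1, 0, 1]
target = [2 * ai + bi for ai, bi in zip(a, b)]
features = {"a": a, "a_copy": list(a), "b": b, "c": c}

print(mrmr(features, target, n_select=2))
```

In this toy example the duplicated feature is as relevant as the original, but its redundancy with the first pick cancels that relevance, so the second pick is the complementary feature instead.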

* Data file: a standard CSV file in which each row is a sample and each column is a variable/attribute/feature. Make sure your data is separated by commas, not by blank spaces or other characters. The first row must contain the feature names, and the first column must contain the classes of the samples. You may download a testing example data set here, which is microarray data of lung cancer (7 classes). The data has been discretized into 3 states. Note that the web-based program can only accept a data file of at most 2M bytes with at most 10000 variables; if you have a larger data set, you should download the program and run it on your own machine (see download links below).

* Which feature selection scheme do you want to use:

* How many features do you want to select:

* What is the type of your data? "Categorical" means each attribute/variable/feature in your data is discretized into a few categorical states. "Continuous" means these attributes take numerical values. If you choose "Categorical", the last option below has no effect. For mutual-information-based feature selection methods such as this web version of mRMR, you may want to discretize your own data into a few categorical states first; empirically this leads to better results than computing mutual information on continuous values. You can also use the option below to discretize your data using the two thresholds mean +/- k*std.

* If your data file contains continuous variables, what threshold k do you want to use to discretize your data, i.e. mean +/- k*std:
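
The input format and the mean +/- k*std option described above can be sketched as follows. This is only an illustration under stated assumptions: the helper names `load_table` and `discretize`, the state codes -1/0/+1, the use of the population standard deviation, and the sample data are all hypothetical, and the actual program may differ in each of these details.

```python
import csv
import io
import statistics


def load_table(text):
    """Parse CSV text: first row = feature names, first column = class labels."""
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    header, body = rows[0], rows[1:]
    feature_names = header[1:]
    classes = [row[0] for row in body]
    # One list of numeric values per feature column.
    columns = [[float(row[j]) for row in body] for j in range(1, len(header))]
    return feature_names, classes, columns


def discretize(values, k):
    """Map each value to -1, 0 or +1 using the thresholds mean - k*std and mean + k*std."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # assumption: population standard deviation
    low, high = mean - k * std, mean + k * std
    return [-1 if v < low else 1 if v > high else 0 for v in values]


# Made-up sample data in the expected layout (comma-separated, header row first).
SAMPLE = """class,g1,g2
A,1.0,5.0
A,2.0,5.0
B,3.0,5.0
B,10.0,5.0
"""

names, classes, columns = load_table(SAMPLE)
states = [discretize(col, k=1.0) for col in columns]
for name, col_states in zip(names, states):
    print(name, col_states)
```

A larger k widens the middle band, so more values fall into state 0; a constant column (zero standard deviation) maps entirely to state 0 under this sketch.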

"Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy,"

Hanchuan Peng, Fuhui Long, and Chris Ding
IEEE Transactions on Pattern Analysis and Machine Intelligence,
Vol. 27, No. 8, pp.1226-1238, 2005. [PDF]

"Minimum redundancy feature selection from microarray gene expression data,"

Chris Ding and Hanchuan Peng,
Journal of Bioinformatics and Computational Biology,
Vol. 3, No. 2, pp.185-205, 2005. [PDF]

(A conference version with the same title but a different set of results also appeared in:
Proc. 2nd IEEE Computational Systems Bioinformatics Conference (CSB 2003),
pp.523-528, Stanford, CA, Aug, 2003. [PDF])

(Names of Genes selected.)
