praznik-package {praznik}
Praznik is a collection of feature selection filters performing greedy optimisation of mutual information-based usefulness criteria.
In a nutshell, an algorithm of this class requires an information system (X,Y) and a predefined number of features to select, k, and works as follows. To start, it estimates the mutual information between each feature and the decision, finds the feature with the maximal such score, and stores it as the first element of the list of selected features, S. Then, it estimates the value of a certain criterion function J(X,Y,S) for each feature X; this function also depends on Y and the set of already selected features S. As in the first step, the previously unselected feature with the greatest value of the criterion function is selected next. This is repeated until the method gathers k features or, in case of some methods, until no more informative features can be found.
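For instance, a selection of this kind can be run as follows (a minimal sketch, assuming praznik is installed and using the built-in iris data; its numeric features are handled by the automatic binning described further below):

library(praznik)

# Greedily pick the 2 features judged most useful for predicting the species
sel <- CMIM(X = iris[, -5], Y = iris$Species, k = 2)
sel$selection  # indices of the selected features, in order of selection
sel$score      # criterion values recorded at each selection step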
The methods implemented in praznik consider the following criteria. The mutual information maximisation filter, MIM, simply selects the top-k features with the highest mutual information, that is
J_{MIM}=I(X;Y).
The minimal conditional mutual information maximisation proposed by F. Fleuret, CMIM, uses
J_{CMIM}(X)=\min(I(X;Y),\min_{W\in S} I(X;Y|W));
this method is also effectively identical to the information fragments method.
The minimum redundancy maximal relevancy proposed by H. Peng et al., MRMR, uses
J_{MRMR}=I(X;Y)-\frac{1}{|S|}\sum_{W\in S} I(X;W).
The joint mutual information filter by H. Yang and J. Moody, JMI, uses
J_{JMI}=\sum_{W\in S} I(X,W;Y).
The double input symmetrical relevance filter by P. Meyer and G. Bontempi, DISR, uses
J_{DISR}(X)=\sum_{W\in S} \frac{I(X,W;Y)}{H(X,W,Y)}.
The minimal joint mutual information maximisation filter by M. Bennasar, Y. Hicks and R. Setchi, JMIM, uses
J_{JMIM}=\min_{W\in S} I(X,W;Y).
The minimal normalised joint mutual information maximisation filter by the same authors, NJMIM, uses
J_{NJMIM}=\min_{W\in S} \frac{I(X,W;Y)}{H(X,W,Y)}.
Because CMIM, JMIM and NJMIM consider a minimal value over the already selected features, they can use a somewhat more sophisticated and faster algorithm.
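Since all these filters share a common interface, their selections are easy to compare; for example (a minimal sketch on the built-in iris data, assuming praznik is attached):

# Different criteria may yield different subsets of the same size
MRMR(iris[, -5], iris$Species, k = 3)$selection
JMI(iris[, -5], iris$Species, k = 3)$selection
JMIM(iris[, -5], iris$Species, k = 3)$selection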
The package also provides functions for scoring features: miScores returns I(X;Y); cmiScores returns, for a given condition vector Z, I(X;Y|Z); jmiScores returns I(X,Z;Y); and njmiScores returns \frac{I(X,Z;Y)}{H(X,Y,Z)}.
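For example (a minimal sketch, assuming praznik is attached; Sepal.Width serves as an arbitrary condition vector Z):

# Mutual information of each feature with the decision
miScores(iris[, -5], iris$Species)

# Conditional and joint variants with respect to a condition vector Z
cmiScores(iris[, -5], iris$Species, Z = iris$Sepal.Width)
jmiScores(iris[, -5], iris$Species, Z = iris$Sepal.Width)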
Estimation of mutual information and its generalisations is a hard task; still, praznik aims at speed and simplicity and hence only offers a basic, maximum likelihood estimator applicable to discrete data. For convenience, praznik automatically and silently coerces non-factor inputs into about ten equally-spaced bins, following a heuristic often used in the literature.
Additionally, praznik has limited, experimental support for replacing entropic statistics with Gini impurity-based ones; in this framework, entropy is replaced by the Gini impurity
g(X):=1-\sum_x p_x^2,
which leads to an impurity gain
G(X;Y):=g(Y)-E(g(Y)|X)=\sum_{xy}\frac{p_{xy}^2}{p_x}-\sum_{y} p_y^2,
a counterpart of mutual information or information gain. It does not possess most of the elegant properties of mutual information, yet the values of both are usually highly correlated; moreover, the Gini gain is cheaper to compute, hence it often replaces MI in performance-sensitive applications, like optimising splits in decision trees.
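To make the formula concrete, the gain can be computed directly from a contingency table (a from-scratch sketch; giniGain is an illustrative helper, not part of the praznik API):

# Gini impurity gain G(X;Y) computed from joint and marginal probabilities
giniGain <- function(x, y) {
  p <- table(x, y) / length(x)  # joint probabilities p_{xy}
  px <- rowSums(p)              # marginal p_x (rows)
  py <- colSums(p)              # marginal p_y (columns)
  sum(p^2 / px) - sum(py^2)     # sum_{xy} p_{xy}^2/p_x - sum_y p_y^2
}

giniGain(rep(c("a", "b"), each = 4), c("u", "u", "u", "v", "v", "v", "v", "u"))
# expected value: 0.125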
In the present version, praznik includes impScores for generating values of G for all features (an analogue of miScores), as well as JIM, a Gini gain-based feature selection method otherwise identical to JMI.
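Both follow the interface of their entropic counterparts; for instance (a minimal sketch on the built-in iris data, assuming praznik is attached):

impScores(iris[, -5], iris$Species)             # Gini gain of each feature
JIM(iris[, -5], iris$Species, k = 2)$selection  # Gini gain-based selection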
Maintainer: Miron B. Kursa <M.Kursa@icm.edu.pl> (ORCID: 0000-0001-7672-648X)
"Conditional Likelihood Maximisation: A Unifying Framework for Information Theoretic Feature Selection" G. Brown et al. JMLR (2012).