Is there a way to do feature selection on a dataframe by computing mutual information or information gain in R?

I am interested in computing the mutual information (MI) for all rows of a dataframe (i.e., the features) and then doing feature selection by looking up each feature's MI value. In my dataset, rows are the raw feature set and columns are different groups. I understand the concept of pointwise mutual information (PMI) in NLP, but I am not sure how to compute MI in R. Essentially, I want to select features based on their mutual information. How can I do that in R? Is there an efficient way to do this, or an R package that supports this kind of feature selection? Any thoughts would be appreciated.

reproducible data:

> dput(HTA20_filt_corr[1:20, 1:5])
structure(c(6.06221469449721, 3.79648446367096, 4.44302662142323, 
5.83652223195279, 2.68934375273141, 2.74561888109989, 3.79468365910661, 
2.84818282222582, 2.14058977019523, 2.6928480064245, 2.35292391447048, 
2.48476830655452, 6.53876010917445, 4.65751152599579, 3.04781583130435, 
5.77123333840058, 3.12373340327186, 2.19534644753427, 2.97565909758917, 
3.32457362519432, 5.8755020052495, 3.45024474095539, 4.3934877055859, 
5.89836406552412, 2.55675627493564, 2.70765553292035, 4.29971184424969, 
2.48325694938049, 2.26880029802564, 3.03461160119094, 2.3853610213164, 
2.28880889278209, 7.38935014141236, 5.99396449205588, 2.81020023855867, 
6.15414625452898, 2.71038534186171, 2.23803889487068, 2.83352503485538, 
3.40195667040699, 6.12613148162098, 3.62841140410044, 4.6237834519809, 
6.01979203584278, 2.61341541015611, 2.80774129091983, 3.81085169542991, 
3.2386968734862, 2.3315210232915, 2.75618624035735, 2.36292219228603, 
2.31409329648109, 6.89661896623484, 4.94260091412701, 3.30560274327296, 
5.4547259473827, 2.41056409104863, 2.26899775961818, 2.6699701841279, 
3.01459760807053, 6.1345548976595, 3.51232455992681, 4.66743523288194, 
5.98400432133011, 2.69430042092269, 2.8653583834812, 3.81895258294878, 
2.72080210986981, 2.33064119419619, 2.77388400895015, 2.46939314182722, 
2.28927162732448, 6.93808821971072, 5.63306489420911, 2.75877942216047, 
5.82872398278859, 2.92710196023309, 2.34137181372226, 2.52271243341233, 
2.96285787017003, 6.28953417729806, 3.56819306931016, 4.97483476597509, 
6.1149144301085, 2.73207812554522, 3.00137677271996, 4.03594900960396, 
2.58058159047299, 2.24052626899434, 3.2286586324064, 2.30413560438815, 
2.38147147362554, 6.58149585137493, 4.16189923349488, 2.36086328728537, 
5.57065453220316, 2.57313948725185, 2.36046878474564, 2.54370710157379, 
2.97488700289993), .Dim = c(20L, 5L), .Dimnames = list(c("1_at", 
"10_at", "100009613_at", "100009676_at", "10003_at", "100033411_at", 
"100033414_at", "100033418_at", "100033422_at", "100033423_at", 
"100033424_at", "100033425_at", "100033426_at", "100033431_at", 
"100033432_at", "100033434_at", "100033436_at", "100033437_at", 
"100033438_at", "100033439_at"), c("Tarca_001_P1A01", "Tarca_004_P1A04", 
"Tarca_005_P1A05", "Tarca_007_P1A07", "Tarca_008_P1A08")))

my trivial attempt:

require(infotheo)
apply(HTA20_filt_corr,1, mutinformation)

But I don't think this is the proper way to compute mutual information and select features based on it. Can anyone point out how to do this? Thanks.

desired output:

Basically, the expected output is the original dataframe shrunk (filtered) to the features selected by looking up their values in a mutual information table. How can I get this done in R? Any thoughts?

Solution

Mutual information is a bit like correlation: you need at least two vectors to compute it. With your data you can calculate, for example, the mutual information between 100009613_at and 10003_at, or between every pair of features. But first you need to transform your data: the values must be discretized before mutual information can be computed.
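To make the discretization step concrete, here is a minimal base-R sketch of what an empirical mutual information estimate boils down to once the values are binned: MI = H(X) + H(Y) - H(X, Y). The vectors `x` and `y` are made-up toy values, and the helpers `to_bins()` and `entropy_emp()` are hypothetical names introduced only for this illustration; the binning via `quantile()` and `cut()` is an assumption and need not match any package's binning rule exactly.

```r
# Two toy numeric vectors (made-up values, not taken from the question's data).
x <- c(2.1, 3.4, 1.9, 5.6, 4.8, 2.5, 5.1, 3.9, 1.2)
y <- c(1.0, 2.2, 0.8, 4.1, 3.9, 1.1, 4.5, 2.0, 0.5)

# Step 1: discretize a vector into equal-frequency bins (hypothetical helper).
to_bins <- function(v, nbins = 3) {
  breaks <- quantile(v, probs = seq(0, 1, length.out = nbins + 1))
  cut(v, breaks = breaks, include.lowest = TRUE, labels = FALSE)
}
xb <- to_bins(x)
yb <- to_bins(y)

# Step 2: empirical entropy in nats from observed (joint) frequencies
# (hypothetical helper; pass one vector for H(X), two for H(X, Y)).
entropy_emp <- function(...) {
  p <- prop.table(table(...))
  p <- p[p > 0]            # drop empty cells, since 0 * log(0) is taken as 0
  -sum(p * log(p))
}

# Mutual information: MI = H(X) + H(Y) - H(X, Y)
mi <- entropy_emp(xb) + entropy_emp(yb) - entropy_emp(xb, yb)
mi
```

In this toy data the bins of `x` and `y` line up perfectly, so `mi` equals the entropy of either binned vector, `log(3)`. The infotheo functions `discretize()` and `mutinformation()` used in this answer perform roughly these two steps.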

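library(infotheo)

mtx <- data.matrix(HTA20_filt_corr)
mtx <- t(mtx) # features in columns
mtxd <- discretize(mtx, nbins=3)
colnames(mtxd) <- colnames(mtx) # restore the feature names in case discretize() altered them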

mutinformation(mtxd[,"100009613_at"], mtxd[,"10003_at"])
# [1] 0.7776613

# or, each against each
eae <- mutinformation(mtxd)

Take a look at eae. It is a square matrix with one row and one column per feature. So, how do you want to use it for filtering the features?
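One common way to turn mutual information into a filter is to score each feature against a class label instead of against the other features, and keep the top-scoring ones. The sketch below assumes such a label exists: `group` is a hypothetical vector with one entry per sample, which is not part of the data shown in the question. The matrix `expr` is simulated toy data standing in for HTA20_filt_corr, and the cutoff `k` is an arbitrary choice.

```r
library(infotheo)

# Toy stand-in for the real matrix: 6 features (rows) x 12 samples (columns).
set.seed(1)
expr <- matrix(rnorm(6 * 12), nrow = 6,
               dimnames = list(paste0("feat", 1:6), paste0("sample", 1:12)))

# Hypothetical group label, one per sample (column); an assumption,
# not something provided in the original data.
group <- factor(rep(c("A", "B"), each = 6))

# Make the first feature informative so the ranking has something to find.
expr["feat1", group == "B"] <- expr["feat1", group == "B"] + 5

# Samples in rows, features in columns, then discretize each feature.
mtx  <- t(expr)
mtxd <- discretize(mtx, nbins = 3)
colnames(mtxd) <- colnames(mtx)   # keep the original feature names

# Mutual information between each discretized feature and the label.
mi <- sapply(mtxd, function(f) mutinformation(f, as.integer(group)))

# Rank the features by MI and keep the top k (k is arbitrary here).
k <- 3
keep <- names(sort(mi, decreasing = TRUE))[seq_len(k)]
expr_filtered <- expr[keep, , drop = FALSE]
```

`expr_filtered` is the original matrix reduced to the `k` features that share the most information with the label. With real data you would replace `expr` and `group` with your own matrix and sample labels. Without a label, the pairwise matrix `eae` only tells you which features are redundant with each other, not which ones are relevant.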