java, search-engine, svd

Creating a term-document matrix in Java for an LSI implementation


I am working on a search engine project that uses the vector space model, for which I need to create a term-document matrix and then apply SVD to it.

Should I have the terms as rows and the documents as columns?

I am doing it in Java, so it would be something like:

double[][] count = new double[keywordList.size()][listOfFilesinCorpus.length];

Or should it be the other way round? I need to pass this 2D array to Apache Commons Math:

RealMatrix A = new Array2DRowRealMatrix(TDM);

where TDM is the term document matrix.

I need the terms to be the dimensions, and then I will compare the documents in the vector space. Please help, thank you.


Solution

  • It doesn't really matter: you can always switch between the two layouts by transposing the matrix.

    By convention, though, rows are terms and columns are documents, so each document is a column vector whose dimensions are the terms.
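
    As a rough illustration of that layout, here is a minimal sketch in plain Java. The class name TermDocumentMatrix, its build and transpose methods, and the assumption that each document is already tokenized into a list of strings are all hypothetical, not taken from the question. It fills a double[][] with raw term counts, terms as rows and documents as columns, and shows that the other orientation is just a transpose.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: builds a term-document matrix of raw counts.
final class TermDocumentMatrix {

    // Rows are terms, columns are documents.
    // Entry [i][j] is the number of times term i occurs in document j.
    // Assumes each document has already been tokenized into a list of strings.
    static double[][] build(List<String> terms, List<List<String>> documents) {
        // Map each distinct term to its row index, in first-seen order.
        Map<String, Integer> rowOf = new LinkedHashMap<>();
        for (String term : terms) {
            rowOf.putIfAbsent(term, rowOf.size());
        }

        double[][] counts = new double[rowOf.size()][documents.size()];
        for (int j = 0; j < documents.size(); j++) {
            for (String token : documents.get(j)) {
                Integer i = rowOf.get(token);
                if (i != null) {          // ignore tokens outside the keyword list
                    counts[i][j] += 1.0;
                }
            }
        }
        return counts;
    }

    // Swaps rows and columns, giving a document-term matrix instead.
    static double[][] transpose(double[][] m) {
        int rows = m.length;
        int cols = rows == 0 ? 0 : m[0].length;
        double[][] t = new double[cols][rows];
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                t[j][i] = m[i][j];
            }
        }
        return t;
    }
}
```

    The array returned by build can be passed to new Array2DRowRealMatrix(tdm) from Apache Commons Math, which treats the first index as the row. If the other orientation turns out to be needed later, RealMatrix.transpose() does the same job as the transpose helper above.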