I am working on a search engine project using the vector space model, for which I need to create a term-document matrix and then apply SVD to it.
Should I have the terms as rows and the documents as columns?
I am doing it in Java, so it would be something like:
double[][] count = new double[keywordList.size()][listOfFilesinCorpus.length];
Or should it be the other way round? I need to pass this 2D array to Apache Commons Math's:
RealMatrix A = new Array2DRowRealMatrix(TDM);
where TDM is the term-document matrix.
I need the terms as the dimensions, and then I will compare the documents in the vector space. Please help. Thank you.
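It doesn't really matter; you can always switch between the two by transposing the matrix.
By convention, though, rows are terms and columns are documents. With that layout each document is a column vector whose components are the terms, which matches what you describe: terms as the dimensions.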
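
To make the layout concrete, here is a minimal sketch in plain Java that builds a terms-by-documents count matrix and transposes it. The class name TermDocumentMatrix, the methods build and transpose, and the whitespace tokenization are my own assumptions for illustration, not anything from your code. The resulting double[][] is the kind of array you could hand to new Array2DRowRealMatrix(...); if you already have a RealMatrix, its transpose() method should do the same job as the helper here.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, for illustration only.
class TermDocumentMatrix {

    // Rows are terms, columns are documents; each cell holds a raw term count.
    // Assumes documents are plain strings split on whitespace (a simplification).
    public static double[][] build(List<String> terms, List<String> documents) {
        double[][] counts = new double[terms.size()][documents.size()];
        for (int d = 0; d < documents.size(); d++) {
            for (String token : documents.get(d).toLowerCase().split("\\s+")) {
                int t = terms.indexOf(token);
                if (t >= 0) {
                    counts[t][d]++;
                }
            }
        }
        return counts;
    }

    // Swaps rows and columns, giving a documents-by-terms matrix.
    public static double[][] transpose(double[][] m) {
        int cols = (m.length == 0) ? 0 : m[0].length;
        double[][] result = new double[cols][m.length];
        for (int i = 0; i < m.length; i++) {
            for (int j = 0; j < cols; j++) {
                result[j][i] = m[i][j];
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("cat", "dog");
        List<String> docs = Arrays.asList("cat cat dog", "dog");
        double[][] tdm = build(terms, docs);
        System.out.println(Arrays.deepToString(tdm));
        System.out.println(Arrays.deepToString(transpose(tdm)));
    }
}
```

Whichever orientation you pick, just keep it consistent when you read the SVD factors back, since transposing the input swaps the roles of the left and right singular vectors.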