Search code examples
pythoninformation-retrievalinverted-index

Inverted Index System using Python


I am working on building an inverted index using Python.

I am having some doubts regarding the performance it can provide me.

Would Python be almost equally as fast in indexing as Java or C?

Also, I would like to know if any modules/implementations exists (and what are they, some link please?) for the same and how well do they perform compared to the something developed in Java/C?

I read about this guy who optimized his Python twice as fast as C by using it with Psyco.

I know for a fact that this is misleading since gcc 3.x compilers are like super fast. Basically, my point is I know Python won't be faster than C. But is it somewhat comparable? And can someone shed some light on its performance compared with Java? I have no clue about that. (In terms of inverted index implementation, if possible because it would essentially require disk write and reads.)

I am not asking this here without googling first. I didn't get a definite answer, hence the question.

Any help is much appreciated!


Solution

  • I don't believe you are expected to see much difference between languages for inverted index, since the bottle neck there is usually IO [disk access!]

    If you want some existing implementations that help you to index information, have a look at Apache Lucene for java and its python version: PyLucene