I am working on building an inverted index using Python.
I am having some doubts regarding the performance it can provide me.
Would Python be almost as fast at indexing as Java or C?
Also, I would like to know whether any modules or implementations already exist for this (which ones, with links please?) and how well they perform compared to something developed in Java or C.
I read about someone who got his Python code running twice as fast as C by using it with Psyco.
I know this is misleading, since gcc 3.x compilers are very fast. Basically, my point is that I know Python won't be faster than C. But is it somewhat comparable? And can someone shed some light on its performance compared with Java? I have no clue about that. (In terms of an inverted index implementation, if possible, because it would essentially require disk writes and reads.)
I am not asking this here without googling first. I didn't get a definite answer, hence the question.
Any help is much appreciated!
I don't think you should expect much difference between languages for an inverted index, since the bottleneck there is usually I/O (disk access).
If you want an existing implementation to help you index information, have a look at Apache Lucene for Java and its Python version, PyLucene.
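If you would rather roll your own first to see where the time goes, here is a minimal sketch of an inverted index in plain Python. It is only an illustration, not how Lucene works internally: the function names (tokenize, build_index, save_index, load_index, search), the regex tokenizer and the JSON on-disk format are all assumptions I picked to keep it short. A real index would use a more compact format and would not hold everything in memory.

```python
import json
import re
from collections import defaultdict


def tokenize(text):
    # Hypothetical tokenizer: lowercase, split on anything that is not a-z or 0-9.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    # docs maps a document id to its text; the index maps each term
    # to the set of document ids that contain it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def save_index(index, path):
    # JSON cannot store sets, so each posting set is written as a sorted list.
    with open(path, "w", encoding="utf-8") as f:
        json.dump({term: sorted(ids) for term, ids in index.items()}, f)


def load_index(path):
    # Reads the file written by save_index and turns the lists back into sets.
    with open(path, encoding="utf-8") as f:
        return {term: set(ids) for term, ids in json.load(f).items()}


def search(index, query):
    # AND query: return the ids of documents containing every query term.
    terms = tokenize(query)
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)
```

To check the I/O claim on your own data, you could time build_index and save_index separately with time.perf_counter() and compare the two numbers once the collection no longer fits in the OS file cache.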