In my database I have citations between objects as a ManyToMany field. Basically, every object can cite any other object.
In Postgres, this has created an intermediate table. The table has about 12 million rows, each looks roughly like:
id | source_id | target_id
----+-----------+-----------
81 | 798429 | 767013
80 | 798429 | 102557
Two questions:
What's the most Django-tastic way to select this table?
Is there a way to iterate over this table without pulling the entire thing into memory? I'm not sure Postgres or my server will be pleased if I do a simple select * from TABLE_FOO
.
The solution I found to the first question was to grab the through table and then to use values_list
to get a flattened result.
So, from my example, this becomes:
through_table = AcademicPaper.papers_cited.through
all_citations = through_table.objects.values('source_id', 'target_id')
Doing that runs the very basic SQL that I'd expect:
print all_citations.query
SELECT 'source_id', 'target_id' FROM my_through_table;
And it returns flattened ValueList objects, which are fairly small and I can work with very easily. Even in my table with 12M objects, I was actually able to do this and put it all in memory without the server freaking out too much.
So this solved both my problems, though I think the advice in the comments about cursors looks very sound.