Search code examples
djangodjango-related-managerdjango-intermediate-table

How to select entire intermediate table in Django


In my database I have citations between objects as a ManyToMany field. Basically, every object can cite any other object.

In Postgres, this has created an intermediate table. The table has about 12 million rows, each looks roughly like:

 id | source_id | target_id 
----+-----------+-----------
 81 |    798429 |    767013
 80 |    798429 |    102557

Two questions:

  1. What's the most Django-tastic way to select this table?

  2. Is there a way to iterate over this table without pulling the entire thing into memory? I'm not sure Postgres or my server will be pleased if I do a simple select * from TABLE_FOO.


Solution

  • The solution I found to the first question was to grab the through table and then to use values_list to get a flattened result.

    So, from my example, this becomes:

    through_table = AcademicPaper.papers_cited.through
    all_citations = through_table.objects.values('source_id', 'target_id')
    

    Doing that runs the very basic SQL that I'd expect:

    print all_citations.query
    SELECT 'source_id', 'target_id' FROM my_through_table;
    

    And it returns flattened ValueList objects, which are fairly small and I can work with very easily. Even in my table with 12M objects, I was actually able to do this and put it all in memory without the server freaking out too much.

    So this solved both my problems, though I think the advice in the comments about cursors looks very sound.