Search code examples
phparrayslaravelwhere-in

using Laravel lists function to retrieve a huge number of items


I have a Group model that belongsToMany Contacts, and Contact belongsToMany Groups, usually a group could contain about 300,000 contact, and a contact could belong to 2 or 3 groups, and I want a function that retrieves all "unique" contacts in a number of given groups, so I tried

$contacts = Contact::With(array('groups'=>function($query) use ($groups)
{
    $query->whereIn('groups.id' , $groups);
}
))->get();

It worked but the problem is that I discovered the sql generated uses where contact_id IN ( ALL CONTACTS' IDs). So, if I'm retrieving 100,000 contacts it will have 100,000 of contact ids in the where in clause.

Then I ended up using this

foreach($groups as $group)
{
   $contacts = array_unique(array_merge($contacts , $group->contacts()->lists('email'))) ;
}  

But I'm still concerned about performance. I don't know how lists works and if it is a good idea to array_unique and array_merge on arrays containing hundreds of thousands of email strings?


Solution

  • You said:

    usually a group could contain about 300,000 contact, and a contact could belong to 2 or 3 groups

    According to this your relationship should be many-to-many and both Group and Contact should use belongsToMany to make relationship between them using a pivot table contact_group:

    // Group model
    public function contacts()
    {
        return $this->belongsToMany('Contact');
    }
    
    // Contact model
    public function groups()
    {
        return $this->belongsToMany('Group');
    }
    

    The contact_group pivot table could be like this:

    id | group_id | contact_id
    

    If you have a setup like this then you'll be able to do this:

    $groups = Group::has('contacts')
                   ->with('contacts')
                   ->whereIn('id', [1,2,3]) // group ids, could be any field
                   ->get();
    
    $uniqueContacts = $groups->map(function($group) {
        return $group->contacts->lists('email');
    })->flatten()->toBase()->unique();
    
    // You got all unique contacts in $uniqueContacts
    dd($uniqueContacts); // all unique contacts
    

    Don't know how fast it'll work on 100,000 contacts but should work better.

    Update: Could be other (maybe better) ways by joining DB tables and querying from DB scope.