I have this code for populating a table:
def add_tags(count):
    print "Add tags"
    insert_list = []
    photo_pk_lower_bound = Photo.objects.all().order_by("id")[0].pk
    photo_pk_upper_bound = Photo.objects.all().order_by("-id")[0].pk
    for i in range(count):
        t = Tag( tag = 'tag' + str(i) )
        insert_list.append(t)
    Tag.objects.bulk_create(insert_list)
    for i in range(count):
        random_photo_pk = randint(photo_pk_lower_bound, photo_pk_upper_bound)
        p = Photo.objects.get( pk = random_photo_pk )
        t = Tag.objects.get( tag = 'tag' + str(i) )
        t.photos.add(p)
And this is the model:
class Tag(models.Model):
    tag = models.CharField(max_length=20,unique=True)
    photos = models.ManyToManyField(Photo)
As I understand from this answer (Django: invalid keyword argument for this function), I have to save the Tag objects first (because of the ManyToMany field) and then attach photos to them through add(). But for a large count this process takes too long. Are there any ways to refactor this code to make it faster?
In general, I want to populate the Tag model with random dummy data.
EDIT 1 (model for photo)
class Photo(models.Model):
    photo = models.ImageField(upload_to="images")
    created_date = models.DateTimeField(auto_now=True)
    user = models.ForeignKey(User)

    def __unicode__(self):
        return self.photo.name
TL;DR Use the Django auto-generated "through" model to bulk insert m2m relationships.
"Tag.photos.through" => Django generated Model with 3 fields [ id, photo, tag ]
photo_tag_1 = Tag.photos.through(photo_id=1, tag_id=1)
photo_tag_2 = Tag.photos.through(photo_id=1, tag_id=2)
Tag.photos.through.objects.bulk_create([photo_tag_1, photo_tag_2, ...])
This is the fastest way that I know of. I use it all the time to create test data, and I can generate millions of records in minutes.
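To keep the behaviour of the code in the question (one random photo per tag), the pairs for the through model can be built in plain Python first. This is only a sketch: `one_photo_per_tag` is a hypothetical helper, and the id lists are stand-ins for what `values_list('id', flat=True)` would return. Picking from the list of existing ids also avoids hitting a gap in the primary keys, which `randint` between the lowest and highest pk can do if photos were deleted.

```python
import random


def one_photo_per_tag(tag_ids, photo_ids, rng=random):
    """Return (tag_id, photo_id) pairs, one randomly chosen existing photo per tag.

    Hypothetical helper: it only produces the id pairs, it does not touch the database.
    """
    photo_ids = list(photo_ids)
    return [(tag_id, rng.choice(photo_ids)) for tag_id in tag_ids]


# Stand-in ids (assumption); in the real project they would come from
# Tag.objects.values_list('id', flat=True) and the Photo equivalent.
pairs = one_photo_per_tag(tag_ids=[1, 2, 3], photo_ids=[10, 11, 15])
print(pairs)
```

In Django, each pair would then become `Tag.photos.through(tag_id=tag_id, photo_id=photo_id)`, and the whole list would go to a single `Tag.photos.through.objects.bulk_create(...)` call instead of one `add()` per tag.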
Edit from Georgy:
from random import randint, shuffle

def add_tags(count):
    Tag.objects.bulk_create([Tag(tag='tag%s' % t) for t in range(count)])

    tag_ids = list(Tag.objects.values_list('id', flat=True))
    photo_ids = Photo.objects.values_list('id', flat=True)
    tag_count = len(tag_ids)

    for photo_id in photo_ids:
        tag_to_photo_links = []
        shuffle(tag_ids)

        # attach a random number of randomly chosen tags to this photo
        rand_num_tags = randint(0, tag_count)
        photo_tags = tag_ids[:rand_num_tags]

        for tag_id in photo_tags:
            # through is the model generated by django to link m2m between tag and photo
            photo_tag = Tag.photos.through(tag_id=tag_id, photo_id=photo_id)
            tag_to_photo_links.append(photo_tag)

        Tag.photos.through.objects.bulk_create(tag_to_photo_links, batch_size=7000)
I didn't create the models to test this, but the structure is there; you might have to tweak a few things to make it work. Let me know if you run into any problems.
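One possible variation, sketched here with plain ids so it runs without a database: `random.sample` picks the random subset of tags without reshuffling the whole list for every photo, and the links are grouped into fixed-size batches across photos, so the number of inserts depends on the batch size rather than on the number of photos. `random_links` and `chunked` are hypothetical helper names, and the ids are stand-ins.

```python
import random
from itertools import islice


def random_links(photo_ids, tag_ids, rng=random):
    """Yield (tag_id, photo_id) pairs; each photo gets a random subset of the tags."""
    tag_ids = list(tag_ids)
    for photo_id in photo_ids:
        how_many = rng.randint(0, len(tag_ids))
        # sample() returns distinct items, so a photo never gets the same tag twice
        for tag_id in rng.sample(tag_ids, how_many):
            yield tag_id, photo_id


def chunked(iterable, size):
    """Yield lists of at most `size` items from any iterable."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


# Stand-in ids (assumption); real ones would come from values_list('id', flat=True).
batches = list(chunked(random_links([10, 11, 12], [1, 2, 3, 4]), size=5))
print(batches)
```

With Django, each batch would be turned into `Tag.photos.through(tag_id=..., photo_id=...)` objects and passed to one `bulk_create` call, so memory use stays bounded by the batch size.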