Search code examples
pythonajaxweb-applicationscgi

how to recycle images, but not show anyone the same image twice?


I'm writing a web app similar to wtfimages.com in that one visitor should never (or rarely) see the same thing twice, but different visitors can see the same thing. Ideally, this would span visits, so that when Bob comes back tomorrow he doesn't see today's things again either.

Three first guesses:

  • have enough unique things that it's unlikely any user will draw enough items to repeat
  • actually track each user somehow and log what he has seen
  • have client-side Javascript request things by id according to a pseudorandom sequence seeded with something unique to the visitor and session (e.g., IP and time)

Edit: So the question is, which of these three is the best solution? Is there a better one?

Note: I suspect this question is the web 2.0 equivalent of "how do I implement strcpy?", where everybody worth his salt knows K&R's idiomatic while(*s++ = *t++) ; solution. If that's the case, please point me to the web 2.0 K&R, because this specific question is immaterial. I just wanted a a "join the 21st century" project to learn CGI scripting with Python and AJAX with jQuery.


Solution

  • The simplest implementation I can think of would be to make a circular linked list, and then start individual users at random offsets in the linked list. You are guaranteed that they will see every image there is to see before they will see any image twice.

    Technically, it only needs to be a linked list in a conceptual sense. For example, you could just use the database identifiers of the various items and wrap around once you've hit the last one.


    There are complexity problems with other solutions. For example, if you want it to be a different order for each person, that requires permuting the elements in some way. But then you have to store that permutation, so as to guarantee that people see things in different orders. That's going to take up a lot of space. It will also require you to update everybody's permutations if you add or remove an image to the list of things to see, which is yet more work.

    A compromise solution that still allows you to guarantee a person sees every image before they see any image twice while still varying things among people might be something like this:

    • Using some hash function H (say, MD5), take the hash of each image, and store the image with a filename equal to the digest (e.g. 194db8c5[...].jpg).

    • Decide on a number N. This will be the number of different paths that a randomly selected person could take to traverse all the images. For example, if you pick N = 10, each person will take one of 10 possible distinct journeys through the images. Don't pick an N larger than the digest size of H (for MD5, this is 16; for SHA-1, it's 64).

    • Make N different permutations of the image list, with the ith such permutation being generated by rotating the characters in each file name i characters to the left, and then sorting all the entries. (For example, a file originally named abcdef with i == 4 will become efabcd. Now sort all the files that have been transformed in this way, and you have a distinct list.)

    • Randomly assign to each user a number r from 0 .. N - 1 inclusive. They now see the images in the ordering specified by r.

    Ultimately, this seems like a lot of work. I'd say just suck it up and make it random, accept that people will occasionally see the same image again, and move on.