
How can I get the list of scheduled jobs from Gearman?


I am currently evaluating Gearman to farm out some expensive data import jobs in our backend. So far this looks very promising. However, there is one piece missing that I just can't seem to find any info about: how can I get a list of scheduled jobs from Gearman?

I realize I can use the admin protocol to get the number of currently queued jobs for each function, but I need information about the actual jobs. There is also the option of using a persistent queue (e.g. MySQL) and querying the database for the jobs, but it feels wrong to me to circumvent Gearman for this kind of information. Other than that, I'm out of ideas.
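For reference, the per-function counts the admin protocol does provide come from its plain-text `status` command: one tab-separated line per function (function name, jobs in the queue, running jobs, available workers), terminated by a line containing a single `.`. Below is a minimal sketch of reading them from PHP. It assumes gearmand listens on its default port 4730 on localhost, and `parseGearmanStatus()` / `fetchGearmanStatus()` are hypothetical helpers, not part of the Gearman extension.

```php
<?php
// Hypothetical helpers: query gearmand's text admin protocol ("status")
// and parse the per-function counters. Host and port are assumptions.

/**
 * Parse the reply to the admin "status" command.
 * Each line is FUNCTION \t TOTAL \t RUNNING \t AVAILABLE_WORKERS,
 * and the reply ends with a line containing a single ".".
 */
function parseGearmanStatus(string $reply): array
{
    $result = [];
    foreach (preg_split('/\r?\n/', $reply) as $line) {
        if ($line === '.') {
            break; // end of the status listing
        }
        $cols = explode("\t", $line);
        if (count($cols) !== 4) {
            continue; // skip blank or malformed lines
        }
        [$function, $total, $running, $workers] = $cols;
        $result[$function] = [
            'queued'  => (int) $total,   // jobs in the queue, including running ones
            'running' => (int) $running, // jobs currently being worked on
            'workers' => (int) $workers, // connected workers able to run this function
        ];
    }
    return $result;
}

/** Send "status" to a gearmand server and return the parsed counters. */
function fetchGearmanStatus(string $host = '127.0.0.1', int $port = 4730): array
{
    $fp = @fsockopen($host, $port, $errno, $errstr, 2.0);
    if ($fp === false) {
        throw new RuntimeException("Cannot connect to gearmand: $errstr ($errno)");
    }
    fwrite($fp, "status\n");
    $reply = '';
    while (($line = fgets($fp)) !== false) {
        $reply .= $line;
        if (rtrim($line, "\r\n") === '.') {
            break; // terminator reached; don't wait for more data
        }
    }
    fclose($fp);
    return parseGearmanStatus($reply);
}
```

For example, `fetchGearmanStatus()['import']['queued']` would give the number of queued jobs for a (hypothetical) `import` function. It is still only a count per function, which is exactly the limitation described above.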

Maybe I don't need this at all :) So here's some more background on what I want to do; I'm open to better suggestions. Both the client and the worker run in PHP. In our admin interface, admins can trigger a new import for a client; since the import takes a while, it is started as a background task. These are the simple questions I want to be able to answer: When did the last import for this client run? Is an import already queued for this client (in which case triggering a new import should have no effect)? Nice to have: at which position in the queue is this job (so I can estimate when it will run)?

Thanks!


Solution

  • The admin protocol is what you'd usually use, but as you've discovered, it won't list the actual tasks in the queue. We've solved this by keeping track of the tasks we've started in our application layer, and having a callback in our worker tell the application when a task has finished. This lets us perform cleanup, notifications, etc. when the task is done, and keeps that logic in the application rather than in the worker itself.

    For progress, the best approach is to use Gearman's built-in status mechanism. In the PHP extension, the worker calls $job->sendStatus($percentDone, 100). A client can then retrieve this value from the server with $client->jobStatus($handle), using the job handle that is returned when you submit the job. That lets you show the current progress to users in your interface.

    As long as you track the running tasks in your application, you can use that to answer whether a similar task is already queued or running. You can also use Gearman's built-in job coalescing (de-duplication); see the $unique parameter when adding the task.

    The position in the queue is not available through Gearman, so you'll have to track that in your application as well. I'd stay away from querying Gearman's persistence layer for this information.
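Here is a minimal sketch of the application-side bookkeeping the answer describes. It uses a hypothetical `ImportTracker` class that answers the three questions from the post: when the last import ran, whether one is already pending, and its position in the queue. State is kept in memory only to keep the sketch short. A real application would persist it, for example in the database the admin interface already uses. The queue position is an estimate based on submission order, and it may differ from the order in which gearmand actually hands out jobs, for instance with job priorities or several workers.

```php
<?php
// Hypothetical application-side tracker for import jobs. In-memory state
// only; a real application would store this in its database.

final class ImportTracker
{
    /** @var int[] client IDs waiting to run, in submission order */
    private array $waiting = [];
    /** @var array<int, true> client IDs currently being processed */
    private array $running = [];
    /** @var array<int, int> client ID => Unix timestamp of last finished import */
    private array $lastRun = [];

    /** Value to pass as Gearman's $unique so the server can also coalesce duplicates. */
    public static function uniqueKey(int $clientId): string
    {
        return "import-client-{$clientId}";
    }

    /** Record a new import; returns false (no effect) if one is already pending. */
    public function enqueue(int $clientId): bool
    {
        if ($this->isPending($clientId)) {
            return false;
        }
        $this->waiting[] = $clientId;
        return true;
    }

    /** Called when the worker reports that it picked up the job. */
    public function markStarted(int $clientId): void
    {
        $this->waiting = array_values(array_diff($this->waiting, [$clientId]));
        $this->running[$clientId] = true;
    }

    /** Called from the worker's completion callback. */
    public function markFinished(int $clientId, int $finishedAt): void
    {
        unset($this->running[$clientId]);
        $this->lastRun[$clientId] = $finishedAt;
    }

    /** True if an import for this client is waiting or running. */
    public function isPending(int $clientId): bool
    {
        return isset($this->running[$clientId]) || in_array($clientId, $this->waiting, true);
    }

    /** 1-based position among waiting jobs, or null if not waiting. */
    public function position(int $clientId): ?int
    {
        $index = array_search($clientId, $this->waiting, true);
        return $index === false ? null : $index + 1;
    }

    /** Timestamp of the last finished import, or null if none yet. */
    public function lastRun(int $clientId): ?int
    {
        return $this->lastRun[$clientId] ?? null;
    }
}
```

Wiring this to Gearman's PHP extension could work as follows. Only call `$client->doBackground('import', (string) $clientId, ImportTracker::uniqueKey($clientId))` when `$tracker->enqueue($clientId)` returns true, and store the returned job handle. The `import` function name is an assumption. The worker reports progress with `$job->sendStatus($done, $total)` and notifies the application when it starts and finishes, so that `markStarted()` and `markFinished()` get called. For the progress display, `$client->jobStatus($handle)` returns an array of the form `[known, running, numerator, denominator]`.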