go, twitter, concurrency, goroutine

Cancelling user-specific goroutines


I have a web app that lets users sign in using Twitter OAuth and provides automatic tweet deletion functionality. After a user logs in to the web app, I spin up a goroutine (via a REST API call) for that user that deletes a list of the user's tweets.

Say there will be 100 users with 500+ tweets each:

  • How do I stop a deletion goroutine in the middle of the deletion process?

    E.g.: user 30 requests to stop the tweet deletion 2 minutes after initiating it (this should be done via an API call to my app).

  • What is the best practice for creating goroutines to maximize the app's performance, given the HTTP requests and the Twitter API rate limit? Should I create a goroutine per user or implement a worker pool?

Info: I am using anaconda as the Twitter client backend.


Edit:

I have found a way to implement this using a map of context cancel functions. Here's the code for reference; credit to https://gist.github.com/montanaflynn/020e75c6605dbe2c726e410020a7a974

package main

import (
    "context"
    "fmt"
    "log"
    "net/http"
    "sync"
    "time"
)

// a concurrent safe map type by embedding sync.Mutex
type cancelMap struct {
    sync.Mutex
    internal map[string]context.CancelFunc
}

func newCancelMap() *cancelMap {
    return &cancelMap{
        internal: make(map[string]context.CancelFunc),
    }
}

func (c *cancelMap) Get(key string) (value context.CancelFunc, ok bool) {
    c.Lock()
    result, ok := c.internal[key]
    c.Unlock()
    return result, ok
}

func (c *cancelMap) Set(key string, value context.CancelFunc) {
    c.Lock()
    c.internal[key] = value
    c.Unlock()
}

func (c *cancelMap) Delete(key string) {
    c.Lock()
    delete(c.internal, key)
    c.Unlock()
}

// create global jobs map with cancel function
var jobs = newCancelMap()

// work is a pretend worker; the real per-user job would be wrapped here
// https://siadat.github.io/post/context
func work(ctx context.Context, id string) {

    for {
        select {
        case <-ctx.Done():
            fmt.Printf("Cancelling job id %s\n", id)
            return
        case <-time.After(time.Second):
            fmt.Printf("Doing job id %s\n", id)
        }
    }
}

func startHandler(w http.ResponseWriter, r *http.Request) {

    // get job id from the query parameters
    id := r.URL.Query().Get("id")

    // check if job already exists in jobs map
    if _, ok := jobs.Get(id); ok {
        fmt.Fprintf(w, "Already started job id: %s\n", id)
        return
    }

    // create new context with cancel for the job
    ctx, cancel := context.WithCancel(context.Background())

    // save it in the global map of jobs
    jobs.Set(id, cancel)

    // actually start running the job
    go work(ctx, id)

    // return 200 with message
    fmt.Fprintf(w, "Job id: %s has been started\n", id)
}

func stopHandler(w http.ResponseWriter, r *http.Request) {

    // get job id from the query parameters
    id := r.URL.Query().Get("id")

    // check for cancel func from jobs map
    cancel, found := jobs.Get(id)
    if !found {
        fmt.Fprintf(w, "Job id: %s is not running\n", id)
        return
    }

    // cancel the job
    cancel()

    // delete job from jobs map
    jobs.Delete(id)

    // return 200 with message
    fmt.Fprintf(w, "Job id: %s has been canceled\n", id)
}

func main() {
    http.HandleFunc("/start", startHandler)
    http.HandleFunc("/stop", stopHandler)
    log.Fatal(http.ListenAndServe(":8080", nil))
}
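In the real app, the pretend work loop above would be replaced by the actual per-user deletion loop. Here is a minimal sketch of that shape, checking the context between deletions so a /stop request takes effect promptly; deleteTweet is a hypothetical callback standing in for the anaconda call, not something the library provides under that name.

// deleteTweets removes the given tweet IDs one by one, stopping early
// when the context is cancelled (e.g. via the /stop endpoint above).
func deleteTweets(ctx context.Context, userID string, tweetIDs []int64, deleteTweet func(int64) error) {
    for _, tweetID := range tweetIDs {
        // check for cancellation before each deletion
        select {
        case <-ctx.Done():
            fmt.Printf("Cancelling deletion for user %s\n", userID)
            return
        default:
        }

        if err := deleteTweet(tweetID); err != nil {
            fmt.Printf("Failed to delete tweet %d for user %s: %v\n", tweetID, userID, err)
            continue
        }
        fmt.Printf("Deleted tweet %d for user %s\n", tweetID, userID)
    }
}

It would be launched the same way as work above, e.g. go deleteTweets(ctx, id, tweetIDs, deleteTweet).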

Solution

  • You can't stop a goroutine from the outside; the goroutine has to support the cancellation operation. For details, see: cancel a blocking operation in Go. Common means to support cancellation are channels and the context package.

    As to which is better for you, that's too broad; it depends on many things. For reference, the standard library's HTTP server serves each incoming HTTP request in its own goroutine and has decent performance.

    If you have a high request rate, it might be worth creating and using a goroutine pool (or using a 3rd-party lib / router that does this), but it really depends on your actual code; you should measure / profile your app to decide whether it's needed or worth it. A minimal worker-pool sketch follows at the end of this answer.

    Generally we can say that if the work that each goroutine does is "big" compared to the overhead of the creation / scheduling a goroutine requires, it is usually cleaner to just use a new goroutine for it. Accessing 3rd party services in a goroutine such as the Twitter API will likely have orders of magnitude more work and delay than launching a goroutine, so you should be fine launching a goroutine for each of those (without a performance penalty).