I'm trying to implement the following functionality in Go.
I have a web page with a form, used to upload a .csv file. Gorilla mux routes the request to a handler which takes the file, parses it, does a bunch of operations on the data, and at the end produces a report with the number of lines parsed, the number of rejected lines, etc.
My problem is that even though it works on my machine, on a server Apache will time out before I can get to the end of it all: the file upload itself isn't the issue, but I have to wait for transformations on the data to complete.
I've tried to use Gorilla websocket to get feedback from the process (incrementing the number of lines parsed and treated, for instance) and keep the connection open, but this is a POST request, and Gorilla websocket won't upgrade from HTTP to websocket unless the request is a GET.
I'm not even sure I'm on the right track with websockets for doing this type of thing.
I can have a goroutine for the processing itself and return the handler before the goroutine completes, but then how do I show the result of the process in the UI?
So at this stage my question boils down to: what would be the best way, in Go, when you need to:
A clue as to the right direction to go in would be much appreciated.
You've stumbled onto a non-trivial problem. There are a lot of possible solutions, with different user experiences, implementation complexities, and side effects. This is a pretty big topic so this answer is intended mostly as a starting point for further research.
First, pretty much regardless of solution, you're going to have to give each long-running task a unique ID that the browser can use to get status updates later. The task runner itself can just flag jobs as complete, or it can periodically issue progress updates if you want to present progress to the user.
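As a rough illustration of the task ID idea, here is a minimal in-memory sketch. The names `Store`, `Status`, `Create`, `Update`, and `Get` are made up for this example, and the map only lives as long as the process does, so treat it as a starting point rather than a finished design.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"sync"
)

// Status is a snapshot of one long-running task (hypothetical fields).
type Status struct {
	Parsed   int  // lines parsed so far
	Rejected int  // lines rejected so far
	Done     bool // true once the task has finished
}

// Store keeps task statuses in memory, keyed by task ID.
type Store struct {
	mu    sync.RWMutex
	tasks map[string]Status
}

// NewStore returns an empty, ready-to-use Store.
func NewStore() *Store {
	return &Store{tasks: make(map[string]Status)}
}

// Create registers a new task and returns its random, hex-encoded ID.
func (s *Store) Create() (string, error) {
	b := make([]byte, 16)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	id := hex.EncodeToString(b)
	s.mu.Lock()
	s.tasks[id] = Status{}
	s.mu.Unlock()
	return id, nil
}

// Update replaces the stored status for id; the task runner calls this
// periodically for progress, or once at the end to flag completion.
func (s *Store) Update(id string, st Status) {
	s.mu.Lock()
	s.tasks[id] = st
	s.mu.Unlock()
}

// Get returns the current status and whether the ID is known.
func (s *Store) Get(id string) (Status, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	st, ok := s.tasks[id]
	return st, ok
}

func main() {
	store := NewStore()
	id, err := store.Create()
	if err != nil {
		panic(err)
	}
	store.Update(id, Status{Parsed: 10, Rejected: 1})
	st, _ := store.Get(id)
	fmt.Printf("task %s: %+v\n", id, st)
}
```

A random ID is used here so that one user cannot easily guess another user's task; a plain counter would also work for an internal tool.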
The easiest to implement is likely to have your form submission immediately respond with a page, with the task ID included in the URL, whose handler checks the task status and either a) returns a page with "still working" or something to that effect and auto-refreshes after a few seconds, or b) returns a page saying "completed" and does not refresh. This isn't terribly difficult to implement, but it's not particularly smooth, either. If this is a simple internal-use project with simple UX and operational requirements, I'd just do this. Otherwise, further down the rabbit hole we go!
You could do live updates without reloading the page by a few different methods:
Either option will require both a handler to serve the status update information and some JavaScript wizardry on the front-end to call the handler, parse the response, and update the page.
Depending on the scale and requirements of this service, there are some side-effects to consider; mainly that a long-running task is effectively a kind of application state, making your application stateful, which has some severe operational downsides when it comes to availability, scaling, and deployment. If you're running multiple load-balanced instances you'll have to use sticky sessions or share task status between instances somehow.
The most common way to handle long-running tasks at scale is to separate the worker from the web application, using some kind of work queue (either in a database or a dedicated message broker like RabbitMQ or Kafka) to manage the tasks. This makes it a little more complicated to get status updates because you're working across processes, but it gives you a lot more flexibility operationally.
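To show the shape of that split without pulling in a broker, the sketch below uses Go channels as a stand-in for the queue. This is a deliberate simplification: everything here runs in one process, whereas in the setup described above the jobs channel would be replaced by a database table or broker and the workers would be a separate program. `Job`, `Result`, `worker`, and `runQueue` are names invented for the example, and the "empty row is rejected" rule is a placeholder.

```go
package main

import (
	"fmt"
	"sync"
)

// Job describes one unit of work placed on the queue.
type Job struct {
	ID   string
	Rows []string
}

// Result is what a worker reports back for a job.
type Result struct {
	ID       string
	Parsed   int
	Rejected int
}

// worker consumes jobs until the queue is closed. It shares nothing with
// the producer except the two channels.
func worker(jobs <-chan Job, results chan<- Result, wg *sync.WaitGroup) {
	defer wg.Done()
	for j := range jobs {
		res := Result{ID: j.ID}
		for _, row := range j.Rows {
			if row == "" { // placeholder validation rule
				res.Rejected++
				continue
			}
			res.Parsed++
		}
		results <- res
	}
}

// runQueue enqueues the pending jobs, runs n workers, and collects the
// results keyed by job ID.
func runQueue(n int, pending []Job) map[string]Result {
	jobs := make(chan Job, len(pending))
	results := make(chan Result, len(pending)) // buffered so workers never block
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go worker(jobs, results, &wg)
	}
	for _, j := range pending {
		jobs <- j
	}
	close(jobs)
	wg.Wait()
	close(results)
	out := make(map[string]Result)
	for r := range results {
		out[r.ID] = r
	}
	return out
}

func main() {
	out := runQueue(2, []Job{
		{ID: "a", Rows: []string{"x,1", "", "y,2"}},
		{ID: "b", Rows: []string{"z,3"}},
	})
	fmt.Println(out["a"].Parsed, out["a"].Rejected, out["b"].Parsed)
}
```

The point of the structure is that the web handler only needs to enqueue a `Job` and later look up a `Result` by ID; where the worker runs becomes a deployment decision.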
I'm guessing this is a more complicated answer than you expected to "requests are timing out", but this is a case of a trivial issue with a non-trivial solution. You're certainly not alone in tackling this issue; researching handling long-running tasks in web applications will yield a ton of information you can leverage.