How to Handle Long Running Jobs on HTTP

There are situations where we need to perform complex or long-running operations on our server while exposing them via an API. Those situations force you to make a choice:

  • Make the client wait for the response, which may take more than a minute. (Bad choice)
  • Handle it using a queue and return a task/job resource from which the client can query the progress and eventually the result. (Good choice)

These situations are very common, and turning a long-running job into a resource makes your implementation more robust and benefits both the client and the server.

How can we do this?

We accomplish this using an event-based architecture. Basically, our Web API will create a task/job and publish it to an Event Broker. An Event Broker is like an administrator who listens to the messages of other services and publishes them to specific "spaces" (channels) where other services are waiting for new messages. Take a look at the image below to better understand it:

Event Broker Diagram.png

Here we have a very simple (and summarized) diagram in which a client will be requesting the server to apply a cool filter to an image.

Let's assume this process is long (which it may be, depending on the filter🤳) and that making the client wait for it could result in HTTP timeouts or freeze the site for several seconds.

In order to tackle this, the Web API will create a Task and send the reference (a unique id) back to the client. Then, the Web API will publish a message to the Event Broker telling the Workers (Image Resizer, Image Processing and Image Upload) that there is a new job for them. These workers can subscribe to channels/topics in the Event Broker, and when a new message arrives on one of those channels, the broker notifies everyone who is subscribed.

The Workers will start doing their jobs and publish messages on the Event Broker when they finish (whether the job succeeded or failed). The Web API must be subscribed to these "result" channels and will update the Task status depending on the progress made by the workers. When all Workers finish their jobs successfully, the Task status will change to completed and the URL of the filtered image will be available for the client.
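To make the flow above a bit more concrete, here is a minimal in-memory sketch in Python. It is not a real broker (in production you would reach for something like RabbitMQ, Kafka or Redis), and every name in it is an assumption made up for illustration: the EventBroker and WebApi classes, the "image.jobs" and "image.results" channels, the worker names and the example.com URL. Messages are queued on publish and only handed to subscribers when deliver_pending() is called, which roughly imitates the asynchronous delivery of a real broker.

```python
import uuid
from collections import defaultdict, deque


class EventBroker:
    """Toy in-memory broker: queues messages, then routes them by channel."""

    def __init__(self):
        self._subscribers = defaultdict(list)
        self._queue = deque()

    def subscribe(self, channel, handler):
        self._subscribers[channel].append(handler)

    def publish(self, channel, message):
        # Delivery is deferred, like it would be with a real broker.
        self._queue.append((channel, message))

    def deliver_pending(self):
        # Notify every subscriber of each queued message, in publish order.
        while self._queue:
            channel, message = self._queue.popleft()
            for handler in list(self._subscribers[channel]):
                handler(message)


class WebApi:
    WORKERS = ("resizer", "processing", "upload")  # hypothetical worker names

    def __init__(self, broker):
        self.broker = broker
        self.tasks = {}
        # The Web API listens to the "result" channel to track progress.
        broker.subscribe("image.results", self.on_result)

    def create_task(self, image_name):
        task_id = str(uuid.uuid4())
        self.tasks[task_id] = {"status": "pending", "done": set(), "url": None}
        self.broker.publish("image.jobs", {"task_id": task_id, "image": image_name})
        return task_id  # the reference that is sent back to the client

    def on_result(self, message):
        task = self.tasks[message["task_id"]]
        if task["status"] == "failed":
            return  # a failed task stays failed
        if not message["ok"]:
            task["status"] = "failed"
            return
        task["done"].add(message["worker"])
        if "url" in message:
            task["url"] = message["url"]
        if task["done"] == set(self.WORKERS):
            task["status"] = "completed"


def make_worker(broker, name):
    """Subscribe a worker that reports success on the result channel."""

    def handle(job):
        # The real image work would happen here.
        result = {"task_id": job["task_id"], "worker": name, "ok": True}
        if name == "upload":
            result["url"] = f"https://example.com/filtered/{job['image']}"
        broker.publish("image.results", result)

    broker.subscribe("image.jobs", handle)
```

Right after create_task() returns, the task is still "pending". It only becomes "completed" once the broker has delivered the job to the three workers and their results back to the Web API.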

But wait, how will the client receive the new URL of the filtered image🤔? Well, there are several ways to do this.

HTTP Polling

HTTP Polling is one way to solve this. It consists of repeating an HTTP request at a fixed interval. Basically, the client requests the Task from the server using its reference and, if the status is not "Completed" (or "Failed"), tries again after X seconds.
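A polling client could look roughly like the Python sketch below. The names are assumptions of mine: poll_task and fetch_over_http are hypothetical helpers, the task is assumed to come back as JSON with a lowercase "status" field, and the interval and attempt limit are arbitrary defaults. The attempt limit is there so the client does not poll forever if the task never finishes.

```python
import json
import time
import urllib.request

TERMINAL_STATUSES = {"completed", "failed"}  # assumed status values


def fetch_over_http(task_url):
    """GET the task resource and decode its JSON body (assumed format)."""
    with urllib.request.urlopen(task_url) as response:
        return json.load(response)


def poll_task(fetch_task, interval_seconds=2.0, max_attempts=30, sleep=time.sleep):
    """Call fetch_task() until the task is completed or failed.

    fetch_task is any callable that returns the task as a dict, for example
    lambda: fetch_over_http(task_url).
    """
    for attempt in range(max_attempts):
        task = fetch_task()
        if task["status"] in TERMINAL_STATUSES:
            return task
        if attempt < max_attempts - 1:
            sleep(interval_seconds)  # wait X seconds before retrying
    raise TimeoutError(f"task still not finished after {max_attempts} attempts")
```

Passing the fetch function in as a parameter keeps the retry loop separate from the HTTP call, so the loop can be tried out with a fake function before pointing it at a real server.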

Web Sockets

Using Web Sockets, the server can send an event to the client when the Task is completed, so the client gets the filtered image as soon as it's ready. This is a better approach in terms of resource consumption, but it requires adding the complexity of web sockets to your application.

Either way, our client can now get the filtered image without having to make an HTTP request that takes over a minute to complete🎉.

Side notes

  • When creating the task/job resource, the correct HTTP status code is 202 Accepted. A reference to the task should also be sent, like an id or a URI if you are using REST.
  • Cool kids allow the jobs to be deleted from the queue using the reference received on creation. Be a cool kid.
  • Pro kids add an expiration date to tasks so they don't overwhelm the server. Be a pro kid, even if you are not a cool kid.
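The three notes above can be sketched together in a small, framework-free Python class. This is only an illustration under my own assumptions: TaskStore, the /tasks/{id} path, the one-hour lifetime and the (status, headers, body) return shape are all made up, and a real application would wire this into its web framework and also cancel the queued job on delete.

```python
import time
import uuid

TASK_TTL_SECONDS = 3600  # assumed lifetime of a task: one hour


class TaskStore:
    """In-memory tasks; each method returns (status_code, headers, body)."""

    def __init__(self, ttl_seconds=TASK_TTL_SECONDS, clock=time.time):
        self._tasks = {}
        self._ttl = ttl_seconds
        self._clock = clock

    def create(self):
        """Roughly what POST /tasks would do."""
        task_id = str(uuid.uuid4())
        self._tasks[task_id] = {
            "status": "pending",
            "expires_at": self._clock() + self._ttl,
        }
        # 202 Accepted plus a reference the client can query later.
        headers = {"Location": f"/tasks/{task_id}"}
        return 202, headers, {"id": task_id, "status": "pending"}

    def get(self, task_id):
        """Roughly what GET /tasks/{id} would do."""
        self._purge_expired()
        task = self._tasks.get(task_id)
        if task is None:
            return 404, {}, {"error": "task not found or expired"}
        return 200, {}, {"id": task_id, "status": task["status"]}

    def delete(self, task_id):
        """Roughly what DELETE /tasks/{id} would do."""
        self._purge_expired()
        if self._tasks.pop(task_id, None) is None:
            return 404, {}, {"error": "task not found or expired"}
        return 204, {}, None

    def _purge_expired(self):
        # Drop tasks whose expiration date has passed.
        now = self._clock()
        expired = [tid for tid, t in self._tasks.items() if t["expires_at"] <= now]
        for tid in expired:
            del self._tasks[tid]
```

Expired tasks are purged lazily on each read or delete, which keeps the sketch short. A background cleanup job would work just as well.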

Conclusion

There are several patterns and architectures that handle these cases. This is just my personal preference, and I would really like to read your approaches and recommendations in the comment section😁.