java spring-boot asynchronous amazon-ec2 distributed-computing

How to track the progress status of async tasks running on multiple servers


I have multiple async tasks running in Spring Boot. These tasks read an Excel file and insert all of its data into the database.

The task is started when a request is made from the front-end. The front-end then periodically polls for the progress status of the task.

I need to track the progress of each of these tasks and know when they are completed.

This is the controller file that takes in requests for tasks and for polling their progress status:

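import java.util.HashMap;
import java.util.UUID;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class TaskController {

    @Autowired
    private TaskAsyncService taskAsyncService;

    @RequestMapping(method = RequestMethod.POST, value = "/uploadExcel")
    public ResponseEntity<?> uploadExcel(String excelFilePath) {
        String taskId = UUID.randomUUID().toString();
        // Returns immediately; the import itself runs on an async executor thread
        taskAsyncService.AsyncManager(taskId, excelFilePath);

        HashMap<String, String> responseMap = new HashMap<>();
        responseMap.put("taskId", taskId);
        return new ResponseEntity<>(responseMap, HttpStatus.ACCEPTED);
    }

    // This will be polled to get progress of tasks being executed
    @RequestMapping(method = RequestMethod.GET, value = "/tasks/progress/{id}")
    public ResponseEntity<?> getTaskProgress(@PathVariable("id") String taskId) {
        HashMap<String, String> map = new HashMap<>();

        // Unknown task: this server has no entry for the given id
        if (!taskAsyncService.containsTaskEntry(taskId)) {
            map.put("Error", "TaskId does not exist");
            return new ResponseEntity<>(map, HttpStatus.BAD_REQUEST);
        }

        boolean taskProgress = taskAsyncService.getTaskProgress(taskId);

        if (taskProgress) {
            map.put("message", "Task complete");
            taskAsyncService.removeTaskProgressEntry(taskId);
            return new ResponseEntity<>(map, HttpStatus.OK);
        }

        // Otherwise task is still running
        map.put("progressStatus", "Task running");
        return new ResponseEntity<>(map, HttpStatus.PARTIAL_CONTENT);
    }
}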

This is the code that executes the async tasks.

import java.util.HashMap;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicReference;

import org.springframework.scheduling.annotation.Async;
import org.springframework.stereotype.Service;

@Service
public class TaskAsyncService {
    // In-memory map: only visible inside the JVM that started the task
    private final AtomicReference<ConcurrentHashMap<String, Boolean>> isTaskCompleteMap =
            new AtomicReference<>(new ConcurrentHashMap<>());

    protected boolean containsTaskEntry(String taskId) {
        return isTaskCompleteMap.get().containsKey(taskId);
    }

    protected boolean getTaskProgress(String taskId) {
        // Boolean.TRUE.equals avoids a NullPointerException if the entry is missing
        return Boolean.TRUE.equals(isTaskCompleteMap.get().get(taskId));
    }

    protected void removeTaskProgressEntry(String taskId) {
        isTaskCompleteMap.get().remove(taskId);
    }

    @Async
    public CompletableFuture<?> AsyncManager(String taskId, String excelFilePath) {
        HashMap<String, String> map = new HashMap<>();

        // Add a new entry into isTaskCompleteMap
        isTaskCompleteMap.get().put(taskId, false);

        // Insert excel rows into database

        // Task completed, set value to true
        isTaskCompleteMap.get().put(taskId, true);
        map.put("Success", "Task completed");

        return CompletableFuture.completedFuture(map);
    }
}

I am using AWS EC2 with a load balancer. As a result, a polling request is sometimes handled by a newly spawned server that cannot access the isTaskCompleteMap held in the memory of the server running the task, so it responds with "TaskId does not exist".

How do I track the status of the tasks in this case? I understand that I need a distributed data structure, but I don't know which kind or how to implement it.


Solution

  • You can use Hazelcast or a similar distributed data store (Redis, etc.).

    maps - https://docs.hazelcast.org/docs/3.0/manual/html/ch02.html#Map

    1. Use a distributed map from Hazelcast instead of the ConcurrentHashMap (CHM).
    2. A get on such a map should return the task's entry even if the task is being processed on another pod (server).
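
The two steps above can be sketched roughly as follows. This is a minimal illustration, not the answerer's code: TaskProgressStore, its method names, and the map name "taskProgress" are hypothetical. The sketch relies on Hazelcast's IMap implementing java.util.concurrent.ConcurrentMap, so the status-tracking logic can be written against the standard interface and handed either a local ConcurrentHashMap or a distributed map.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical helper (not part of the original post): keeps task status in
// whatever ConcurrentMap it is given, local or distributed.
class TaskProgressStore {

    enum Status { UNKNOWN, RUNNING, COMPLETE }

    private final ConcurrentMap<String, Boolean> isTaskCompleteMap;

    // Assumed wiring with Hazelcast on the classpath (map name is an assumption):
    //   HazelcastInstance hz = Hazelcast.newHazelcastInstance();
    //   TaskProgressStore store = new TaskProgressStore(hz.getMap("taskProgress"));
    TaskProgressStore(ConcurrentMap<String, Boolean> backingMap) {
        this.isTaskCompleteMap = backingMap;
    }

    // Register the task as running (false = not yet complete).
    void markStarted(String taskId) {
        isTaskCompleteMap.put(taskId, false);
    }

    // Flip the flag once all rows have been inserted.
    void markComplete(String taskId) {
        isTaskCompleteMap.put(taskId, true);
    }

    // A single read decides between "unknown", "running" and "complete",
    // so there is no gap between an existence check and the value lookup.
    Status status(String taskId) {
        Boolean complete = isTaskCompleteMap.get(taskId);
        if (complete == null) {
            return Status.UNKNOWN;
        }
        return complete ? Status.COMPLETE : Status.RUNNING;
    }

    // Drop the entry after the client has seen the final status.
    void remove(String taskId) {
        isTaskCompleteMap.remove(taskId);
    }
}
```

With a distributed map behind it, a poll that lands on a different server would read the same entries, provided the Hazelcast members on the EC2 instances are configured to discover each other and form one cluster. Calling markStarted in the controller before the async method is invoked would also avoid a poll arriving before the entry exists.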