java multithreading spring-boot concurrency

How can a simple Spring Boot app (local filesystem listing REST API) be made thread-safe?


I am writing a test Spring Boot app and I want to build it to be thread-safe from the ground up. For this question, let's assume the app is a simple REST API which returns a list of file and directory names from the local OS filesystem where the app resides, based on a specified path (provided by the user as a GET parameter when invoking the REST API).

I appreciate horizontal scaling could be achieved using containers/Kubernetes, event/queue based architectures and other such approaches. However, I am not interested in using those methods at present (unless you folks suggest this is the only elegant solution to my question). Therefore, please assume the platform is a JVM running on a single multicore (Linux) OS instance/server.

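import java.io.File;
import java.util.List;

import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class MyController {

    private final FileService fileService;

    // Constructor injection: Spring supplies the singleton FileService bean.
    public MyController(FileService fileService) {
        this.fileService = fileService;
    }

    /**
     * Returns a JSON-formatted list of the file and directory names contained
     * in the given path when
     * http://[server:port]/rest/browse?path=[UserSpecifiedPath]
     * is requested by a client.
     */
    @GetMapping("/rest/browse")
    public List<File> browseFiles(@RequestParam(value = "path", required = false) String pathName) {
        return fileService.list(pathName);
    }
}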

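import java.io.File;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

import org.springframework.stereotype.Service;

@Service
public class FileService {

    // Returns an alphabetically sorted list of non-hidden files and directories.
    public List<File> list(String pathName) {
        File[] entries = new File(pathName).listFiles();
        if (entries == null) {
            // listFiles() returns null if the path is not a directory or cannot be read
            return Collections.emptyList();
        }
        return Arrays.asList(Arrays.stream(entries)
                .parallel()
                .filter(file -> !file.getName().startsWith("."))
                .sorted(Comparator.comparing(File::getName))
                .toArray(File[]::new));
    }
}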

The code to actually return the sorted list of files & dirs is quite dense and leans on Java's Arrays Collection, as well as a lambda function. I am unfamiliar with the underlying code of the Arrays collection (and how to reason about its functionality) as well as the way the lambda function will interact with it. I am keen to limit the use of synchronized/locking to resolve this issue, as I wish FileService() to be as parallelizable as possible.

My concern is related to FileService:
  • I have instantiated FileService as a singleton (thanks to Spring Boot's default behaviour)
  • Spring's Controller/servlet is multithreaded insofar as each request has at least one thread
  • FileService's use of the Arrays Collection code, together with the lambda function, on a new java.io.File object to populate a List does not appear to me to be atomic
  • Therefore, multiple threads representing multiple requests could be executing different portions of fileService at once, creating unpredictable results
  • Even if Spring Boot framework somehow handles this particular issue behind the scenes, if I want to add some hitherto unwritten additional concurrency to the controller or other part of app in future, I will still have a fileService.list that is not thread safe and my app will therefore produce unpredictable results due to multiple threads messing with the instantiated File object in fileService.list()

The above represents my best attempt at reasoning about why my code has problems and is possibly stateful. I appreciate there are gaps in my knowledge (clearly, I could do a deep dive into the Arrays Collection and lambda functions), and I likely do not fully understand the concept of state itself, so I may be getting myself twisted up over nothing. I have always found state to be a bit confusing, given that even supposedly stateless languages must store state somewhere (in memory, an application has to store its variables at some point, as they are passed between operations).

Is my reasoning above correct? How can I write FileService to be stateless?

EDIT: To answer my own question based on the answers provided by others, FileService is stateless and therefore thread-safe. It operates only on local variables, method parameters and return values, none of which are shared between threads. When a thread calls the method, these references live on that thread's own stack, and the objects they point to are reachable only from that thread. Even if some of the logic in FileService is not atomic, it does not matter for the aforementioned reason.


Solution

  • "Therefore, multiple threads representing multiple requests could be executing different portions of fileService at once,"

    Correct.

    "creating unpredictable results"

    No. Each thread has its own method invocation, which has its own local variables, which reference their own objects. Since each thread uses its own objects, the threads don't interact at all and cannot possibly interfere with each other's work, making this code trivially thread safe.

    Put differently, thread safety issues only arise when several threads use the same object. If they use different objects (or the objects are immutable), the code is trivially thread safe.
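To make this concrete, here is a minimal sketch that is not part of the original code. `StatelessDemo`, `visibleSorted` and `runConcurrently` are hypothetical names, and an in-memory list of names stands in for a directory listing so the example does not depend on the filesystem. The method touches only its parameter and its own locals, so calling it from many threads at once should give every thread the same, correct result without any locking. The input list is shared between the threads, but it is only ever read.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;

class StatelessDemo {

    // Stateless: reads its parameter and builds a new list from local objects only,
    // in the same spirit as FileService.list.
    static List<String> visibleSorted(List<String> names) {
        return names.stream()
                .filter(name -> !name.startsWith("."))
                .sorted()
                .collect(Collectors.toList());
    }

    // Calls visibleSorted from several threads at once and collects every result.
    static List<List<String>> runConcurrently(List<String> names, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<List<String>>> tasks = new ArrayList<>();
            for (int i = 0; i < threads; i++) {
                tasks.add(() -> visibleSorted(names));
            }
            List<List<String>> results = new ArrayList<>();
            for (Future<List<String>> future : pool.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Calling `StatelessDemo.runConcurrently(Arrays.asList("b.txt", ".hidden", "a.txt"), 8)` should return eight separate lists, each containing `a.txt` followed by `b.txt`.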

    "As Spring abstracts away the creation of each new controller object and its own multithreading, I struggled to understand correctly what was going on with respect to threads and method invocation in the controller."

    By default, a controller is a singleton bean (one instance per application context), so the same controller object is shared by all threads.

    "To put it another way, for each chain of logic emanating out of a Spring Controller, I can basically not concern myself with how per-request threads are behaving."

    Not quite. If these threads modify shared objects, you are responsible for synchronizing access. It's just that usually you will not share any mutable objects, trivially fulfilling this requirement. In your case, the Controller object is shared, but not mutable, while the Lists, Streams, Arrays, and File objects are not shared, so there aren't any mutable shared objects.

    However, if you were to optimize the performance of your FileService with an in-memory cache such as

    @Service
    class FileService {
        Map<String, File[]> cache = new HashMap<>(); // danger: shared, mutable, not thread-safe

        public List<File> list(String pathName) {
            var result = cache.get(pathName);
            if (result == null) {
                result = ...;
                cache.put(pathName, result);
            }
            return Arrays.asList(result); // the cache holds File[], the method returns List<File>
        }
    }
    

    all threads would share the same controller object, and the same FileService, and therefore the same Map, requiring you to synchronize access to the Map somehow.
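One possible way to do that, sketched here under some assumptions rather than offered as the definitive fix, is to replace the `HashMap` with a `ConcurrentHashMap` and let `computeIfAbsent` perform the check-then-insert atomically per key. `CachingFileService` and `readDirectory` are hypothetical names, and the `@Service` annotation is left out so the sketch compiles without Spring on the classpath. The cached lists are wrapped as unmodifiable so that callers on different threads cannot mutate a shared result.

```java
import java.io.File;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collectors;

class CachingFileService {

    // Shared by all request threads, so it must be a thread-safe map.
    private final ConcurrentMap<String, List<File>> cache = new ConcurrentHashMap<>();

    public List<File> list(String pathName) {
        // computeIfAbsent runs the loader at most once per key, atomically.
        return cache.computeIfAbsent(pathName, CachingFileService::readDirectory);
    }

    // Assumption: the same listing rules as the original FileService
    // (non-hidden entries, sorted by name).
    private static List<File> readDirectory(String pathName) {
        File[] entries = new File(pathName).listFiles();
        if (entries == null) {
            // Not a directory, or not readable.
            return Collections.emptyList();
        }
        return Collections.unmodifiableList(Arrays.stream(entries)
                .filter(file -> !file.getName().startsWith("."))
                .sorted(Comparator.comparing(File::getName))
                .collect(Collectors.toList()));
    }
}
```

Two caveats apply to this sketch. The loader does blocking I/O inside `computeIfAbsent`, which can make other threads wait on the same part of the map, and the cache is never invalidated, so it returns stale listings once a directory changes. A real cache would need an eviction or refresh policy.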