spring-boot jpa caching spring-data-jpa eclipselink

Why is it not a good idea to store JPA entities in a HashMap/ConcurrentHashMap for caching? (ConcurrencyException, deadlocks, signal before wait)


I have a Spring Boot application that uses spring-data-jpa for CRUD operations, with EclipseLink as the JPA provider. I also have a CacheService that caches all the entities (at startup and after every update) and stores them in a ConcurrentHashMap.

I have read various posts and SO questions, such as

  1. JPA multithreading org.eclipse.persistence.exceptions.ConcurrencyException
  2. and most of the Google search results on this topic,

which say that the EntityManager is not thread safe and that entities should not be shared across threads. (Some other posts mention that the EntityManagerFactory, on the other hand, is thread safe.)

Most of my questions are in the comments in the code below. I have also summarized them at the bottom.



// Sample code 
@SpringBootApplication
public class Application {

   public static void main(String[] args){
      SpringApplication.run(Application.class, args);
   }

}

/* Sample entity class */

@Entity
@Table
public class Student {

   @Id
   // other annotations for generation strategy.
   Integer studentId;

   //other fields below with different types of association like courses student is enrolled in

}


/* Sample Repository */
@Repository
public interface StudentRepository extends JpaRepository<Student, Integer>{} 


/** Sample Service */
@Service
public class StudentServiceImpl implements StudentService{

    @Autowired
    private StudentRepository studentRepository;

    @Autowired
    private CacheService cacheService;

    @Override
    public Student createStudent(Student student){
        student = studentRepository.saveAndFlush(student);
        cacheService.cacheEntity(student);
        return student;
    }

    @Override
    @Transactional
    public Student updateStudent(Integer studentId, StudentDTO dto){
        // Getting the object from the cache instead of the repository.
        // 1. Does the cached entity really get a write lock? We never went to the entity manager under this @Transactional.
        // Since the entity manager is not thread safe, in what scenarios could I run into issues like deadlocks or a ConcurrencyException?

        // When we cached the entities, we might have cached them using a different thread.
        // I know that the EntityManagerFactory is thread safe, but I am not sure about its internals.

        // 2. In what scenarios could the ConcurrencyManager issue a signal to release read/write locks
        // before a wait is triggered?
        Student student = cacheService.getStudent(studentId);
        student = updateStudentFromDTO(student, dto); // copies fields from dto to student
        student = studentRepository.saveAndFlush(student);
        cacheService.cacheEntity(student);
        return student;
    }

}


// Cache Service
@Service
public class CacheService {

    private Map<Integer, Student> studentsMap = new ConcurrentHashMap<>();

    @Autowired
    private StudentRepository studentRepository;


    @PostConstruct
    @Scheduled(initialDelay = 3600000, fixedDelay = 3600000)
    public void cacheEntities(){
        cacheStudents();
        // cache other entities below

    }

    public Student getStudent(Integer studentId){
        // What would happen once the object is shared
        // with another class in another thread? Multiple threads can operate on it, since it is not thread safe.
        // Can that result in issues like deadlocks or a ConcurrencyException? Why and how?
        return studentsMap.get(studentId);
    }

    public void cacheEntity(Object o){
        if(o instanceof Student){
            Student s = (Student) o;
            synchronized(this){
                studentsMap.put(s.getStudentId(), s);
                // cache linked associations
                cacheCourses(s.getCourses()); // imagine this function does the same as cacheEntity, caching the courses in its own HashMap
            }
        }
        // other objects caching block
    }

    public void cacheStudents(){

        List<Student> students = studentRepository.findAll();
        Map<Integer, Student> map = students.stream().collect(Collectors.toMap(Student::getStudentId, Function.identity()));

        synchronized(this){
            this.studentsMap = map;
        }

    }

}


//  RefreshService
@Service 
public class RefreshService{

    @Autowired
    EntityManagerFactory emf;

    @Autowired
    CacheService cacheService;

    public void refreshCache(){
        this.emf.getCache().evictAll();

        // 1. What can happen in the interim? Say one entity has already been taken from the cache
        //    and another thread evicts the cache.
        // 2. Now this entity is no longer managed by the entity manager. What would happen if
        //    a lazy fetch is performed on such an entity, or if the user tries to save it?
        
        cacheService.cacheEntities();
    }

}


Summary of questions:

  1. What are the possible issues that could occur due to such caching?
  2. Would the @Transactional annotation enforce any kind of locking on an entity if we get the object from the cache?
  3. What is the right way to have a custom cache of entities? (I know we could write a mapper and store DTOs in a HashMap, but what are the other options?)
  4. What are the implications if we have multiple instances of this application behind a load balancer? Not all entities might be available in the cache of every instance, and the caches might also be out of sync.

PS. I am not trying to implement a CacheService. I am trying to remove it (after realizing why this kind of caching is not a good idea) from an existing application which has only one instance.


Solution

  • Many of your questions should be asked independently, as they are about caching in general and don't change with JPA being involved. As the links and your post suggest, EntityManagers, and everything read from them, are not thread safe. Having a single shared cache and shoving objects meant for a single unit-of-work architecture into it is, in short, bad. Those warnings are there because there is absolutely no thread locking involved, and the container cannot help you. You can get by (I have in past projects) by ensuring that everything needed is completely loaded and detached, and that the cached objects are never modified. Lazy loading things on the fly will get you undefined behavior: an error, if you are lucky. In short:

    1. What are the possible issues that could occur due to such caching?

    It'll depend on the app, but these objects and any JPA context they are tied to are not thread safe. Accessing something unfetched (lazy) from two threads is a problem area. These cached objects also hold onto JPA context objects (the EM they were read from), so they can tie up more resources than you might have bargained for.
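
One consequence of sharing instances can be shown without JPA at all. The sketch below is plain Java; `CachedStudent` and `SharedCacheDemo` are hypothetical stand-ins, not the classes from the question. It assumes a cache that hands out the same mutable object to every caller, and shows that a change made before a failed save stays visible to everyone holding that reference.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a mutable JPA entity; not the Student class from the question.
class CachedStudent {
    final int id;
    String name;

    CachedStudent(int id, String name) {
        this.id = id;
        this.name = name;
    }
}

public class SharedCacheDemo {
    // The map itself is thread safe, but the objects stored in it are not protected.
    static final Map<Integer, CachedStudent> CACHE = new ConcurrentHashMap<>();

    // Mimics the shape of updateStudent(): mutate the cached instance, then "save".
    static void update(int id, String newName, boolean saveFails) {
        CachedStudent s = CACHE.get(id);
        s.name = newName; // every holder of this reference sees the change immediately
        if (saveFails) {
            // Stands in for a rolled-back transaction; the in-memory change is not undone.
            throw new IllegalStateException("simulated save failure");
        }
    }

    public static void main(String[] args) {
        CACHE.put(1, new CachedStudent(1, "Alice"));
        CachedStudent seenByOtherCaller = CACHE.get(1);
        try {
            update(1, "Bob", true);
        } catch (IllegalStateException expected) {
            // A real database would still hold "Alice" at this point.
        }
        System.out.println(seenByOtherCaller.name);              // prints Bob
        System.out.println(CACHE.get(1) == seenByOtherCaller);   // prints true
    }
}
```

This is only the single-threaded half of the problem. With several threads mutating the same instance, the interleaving of field writes is not controlled by anything either.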

    2. Would the @Transactional annotation enforce any kind of locking on an entity if we get the object from the cache?

    I don't see how. It just wraps the method in a transaction; it does not protect any objects the method touches from access by other threads.

    3. What is the right way to have a custom cache of entities? (I know we could write a mapper and store DTOs in a HashMap, but what are the other options?)

    You covered it yourself. Your own cache needs to cache detached objects. This is a bit more difficult with EclipseLink, which requires making a copy of the object to be safe for caching, possibly by using EclipseLink's CopyGroup or your own copy mechanism. EclipseLink uses a shared cache within the EMF though, so reads through EntityManagers already benefit if the object has been read in once before.
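
As a rough illustration of caching detached, never-modified data, here is a minimal sketch of the DTO-style option in plain Java. `StudentSnapshot` and `SnapshotCache` are hypothetical names, and the fields are assumed for the example; this does not use EclipseLink's CopyGroup. The idea is that the cache only ever holds immutable copies that carry no reference to a persistence context.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical immutable snapshot of a student; it holds no reference to any persistence context.
final class StudentSnapshot {
    private final int studentId;
    private final String name;
    private final List<String> courseNames;

    StudentSnapshot(int studentId, String name, List<String> courseNames) {
        this.studentId = studentId;
        this.name = name;
        // Defensive copy, so later changes to the caller's list cannot reach the cache.
        this.courseNames = Collections.unmodifiableList(new ArrayList<>(courseNames));
    }

    int getStudentId() { return studentId; }
    String getName() { return name; }
    List<String> getCourseNames() { return courseNames; }
}

public class SnapshotCache {
    private final Map<Integer, StudentSnapshot> snapshots = new ConcurrentHashMap<>();

    // Intended to be called after a successful commit, with all needed data already loaded.
    public void put(StudentSnapshot snapshot) {
        snapshots.put(snapshot.getStudentId(), snapshot);
    }

    public StudentSnapshot get(int studentId) {
        return snapshots.get(studentId);
    }

    public static void main(String[] args) {
        List<String> courses = new ArrayList<>();
        courses.add("Math");
        SnapshotCache cache = new SnapshotCache();
        cache.put(new StudentSnapshot(1, "Alice", courses));
        courses.add("Physics"); // does not affect the cached snapshot
        System.out.println(cache.get(1).getCourseNames()); // prints [Math]
    }
}
```

An update then goes through the repository as usual and replaces the snapshot in the map, rather than mutating the cached object in place.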

    4. What are the implications if we have multiple instances of this application behind a load balancer? Not all entities might be available in the cache of every instance, and the caches might also be out of sync.

    This is pretty self-explanatory: you have to keep your caches in sync with the underlying data. EclipseLink's shared caches have coordination mechanisms you can configure, but with your own caching you'd need your own mechanisms, unless you have one single cache for all application instances.
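
To show what "your own mechanisms" might amount to, here is a small sketch that simulates two application instances inside one JVM. `InstanceCache` and `InvalidationBus` are hypothetical; in a real deployment the bus would be some messaging system shared between instances, which is assumed here and not shown. The sketch only illustrates that one instance's update leaves the other stale until an eviction is broadcast.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class InvalidationDemo {

    // Hypothetical per-instance cache: one of these would live in each application instance.
    static class InstanceCache {
        private final Map<Integer, String> names = new ConcurrentHashMap<>();

        void put(int id, String name) { names.put(id, name); }
        String get(int id) { return names.get(id); }
        void evict(int id) { names.remove(id); }
    }

    // Hypothetical in-process stand-in for a message bus shared by all instances.
    static class InvalidationBus {
        private final List<InstanceCache> subscribers = new ArrayList<>();

        void subscribe(InstanceCache cache) { subscribers.add(cache); }

        // Tells every instance to drop its copy, so the next read has to reload from the database.
        void publishEviction(int id) {
            for (InstanceCache cache : subscribers) {
                cache.evict(id);
            }
        }
    }

    public static void main(String[] args) {
        InstanceCache a = new InstanceCache();
        InstanceCache b = new InstanceCache();
        a.put(1, "Alice");
        b.put(1, "Alice");

        // Instance A handles an update; without coordination, B keeps serving the old value.
        a.put(1, "Bob");
        System.out.println(b.get(1)); // prints Alice (stale)

        InvalidationBus bus = new InvalidationBus();
        bus.subscribe(a);
        bus.subscribe(b);
        bus.publishEviction(1);
        System.out.println(b.get(1)); // prints null, so the next access must reload
    }
}
```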