java python reinforcement-learning openai-gym stable-baselines

Reinforcement Learning - Custom environment implementation in Java for Python RL framework

I have a bunch of Java code that constitutes an environment and an agent. I want to use one of the Python reinforcement learning libraries (stable-baselines, tf-agents, rllib, etc.) to train a policy for the Java agent/environment, and then deploy the policy on the Java side for production. Is there a standard practice for incorporating other languages into Python RL libraries? I was thinking of one of the following solutions:

Wrap the Java env/agent code in a REST API, and implement a custom environment in Python that calls that API to step through the environment.
Use Py4J to invoke Java from Python and implement a custom environment on top of it.

Which one would be better? Are there any other ways?

Edit: I ended up going with the former: deploying a web server that encapsulates the environments. It works quite well for me. Leaving the question open in case there is a better practice for handling this kind of situation!

Solution

The first approach is fine. RLlib implements it the same way in its PolicyServerInput, which is used for external environments: https://github.com/ray-project/ray/blob/82465f9342cf05d86880e7542ffa37676c2b7c4f/rllib/env/policy_server_input.py

So take a look at their implementation. It uses Python data serialization, so I guess writing your own implementation would be the best way to connect to Java.
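
To make the "own implementation" idea concrete, here is a minimal sketch of the Python side, assuming the Java server exposes two hypothetical endpoints, `/reset` and `/step`, and exchanges JSON instead of Python-specific serialization. The endpoint names, the JSON field names, and the `RemoteEnv` class are all made up for illustration; none of this is RLlib's API. It uses only the standard library, and the small stub server merely stands in for the Java service so the example can run on its own.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class RemoteEnv:
    """Hypothetical client that mirrors the classic Gym reset/step interface
    by POSTing JSON to an environment server (e.g. one written in Java)."""

    def __init__(self, base_url, timeout=5.0):
        self.base_url = base_url.rstrip("/")
        self.timeout = timeout
        # Assumption: the env server is reachable directly, so skip any system proxy.
        self._opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))

    def _post(self, path, payload):
        request = urllib.request.Request(
            self.base_url + path,
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with self._opener.open(request, timeout=self.timeout) as response:
            return json.loads(response.read().decode("utf-8"))

    def reset(self):
        return self._post("/reset", {})["observation"]

    def step(self, action):
        reply = self._post("/step", {"action": action})
        return reply["observation"], reply["reward"], reply["done"], reply.get("info", {})


class _StubHandler(BaseHTTPRequestHandler):
    """Stand-in for the Java server: a counter env that ends once it reaches 3."""

    state = 0  # shared across requests, since a new handler is created per request

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length) or b"{}")
        cls = type(self)
        if self.path == "/reset":
            cls.state = 0
            reply = {"observation": [0]}
        elif self.path == "/step":
            cls.state += int(body["action"])
            reply = {"observation": [cls.state], "reward": 1.0,
                     "done": cls.state >= 3, "info": {}}
        else:
            self.send_error(404)
            return
        payload = json.dumps(reply).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):
        pass  # keep the demo quiet


def run_demo():
    """Run one episode against the stub server and return (observations, return)."""
    server = HTTPServer(("127.0.0.1", 0), _StubHandler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        env = RemoteEnv(f"http://127.0.0.1:{server.server_port}")
        trajectory, total, done = [env.reset()], 0.0, False
        while not done:
            obs, reward, done, _ = env.step(1)
            trajectory.append(obs)
            total += reward
        return trajectory, total
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(run_demo())
```

To plug this into stable-baselines or a similar library, you would wrap the same two calls in a `gym.Env` subclass and declare `observation_space` and `action_space` to match what the Java side returns. Note that `step` here returns the classic 4-tuple `(obs, reward, done, info)`; newer Gymnasium-based versions expect a 5-tuple with separate `terminated` and `truncated` flags.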