Tags: cluster-computing, load-balancing, pentaho, kettle, postgresql-9.4

Pentaho 7.1 Community Edition Cluster Unified Repository


I've successfully set up a Pentaho Community Edition cluster following the official Pentaho docs and "Pentaho in High Availability", deploying with the embedded Tomcat 8 and using apache2 as a reverse proxy.

My setup:

  • First node: pentaho server ce 7.1 embedded tomcat 8
  • Second node: pentaho server ce 7.1 embedded tomcat 8
  • Web Server: apache2 http reverse proxy
  • Database: postgresql 9.4

Each one runs on a different server on the same network, with no firewalls between them, so I can rule out network or firewall issues.

I can start the cluster, ping both servers, and access them via the reverse proxy. I know both work behind the reverse proxy because when I shut one of them down, the other keeps answering (sessions are lost because of the sticky-session feature).

After installing everything, I decided to manually migrate all the users, permissions, files, and scheduled tasks. When I create a user, I can see it in both instances (accessing them via IP rather than via the reverse proxy), so everything is fine up to this point.

But when I upload a file, or create/remove a file or folder from the repository browser, the change does not show up on both nodes, only on the node that served the active session.

The Pentaho log doesn't show any errors. As far as I can see, each node has its own file repository, so I reviewed all the config files again and verified that everything specified in the docs was changed to use PostgreSQL.

After searching, I'm inclined to think that, in cluster mode, the file repository (Jackrabbit) is not shared across nodes; that is, each node keeps its own file repository. That would be a waste of time, because my team uploads reports directly to the BI server through the reverse proxy, not via IP.

I thought that setting up the database would unify the repository, so that all files and folders live in a database-backed repository rather than on each node.

If this is not the right approach, is there a way to use a single filesystem repository in cluster mode?

Thanks for your attention.


Solution

  • As @AlainD pointed out, I finally found the issue.

    The problem was in this config file, which exists on each node:

    ../pentaho-server/pentaho-solutions/system/jackrabbit/repository.xml 
    

    In the Jackrabbit repository config file I hadn't changed the unique id value for each worker node; both worker nodes had the same id.

    I don't know the exact consequences of this, but after setting a distinct value for each worker node, it started working like a charm! I even used the same config files for Pentaho Server 8 and was able to migrate the configuration. Thank you.
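
    For anyone hitting the same symptom, here is a rough sketch of the part of repository.xml that I believe matters. It is not copied from my servers: the host name, database name, user, password and the id values node1/node2 are placeholders, and the journal parameters should be checked against the Cluster section shipped in your own repository.xml. My understanding from the Jackrabbit clustering documentation is that every node writes its changes to a shared journal tagged with its own id and skips entries that carry that id, which would explain why two nodes with the same id never picked up each other's changes.

    ```xml
    <!-- Node 1: pentaho-solutions/system/jackrabbit/repository.xml -->
    <!-- The id must be different on every node; "node1" is a placeholder. -->
    <Cluster id="node1">
      <Journal class="org.apache.jackrabbit.core.journal.DatabaseJournal">
        <!-- Local file where this node remembers the last journal revision it applied -->
        <param name="revision" value="${rep.home}/revision"/>
        <!-- Placeholder connection details; all nodes point at the same journal database -->
        <param name="url" value="jdbc:postgresql://db-host:5432/jackrabbit"/>
        <param name="driver" value="org.postgresql.Driver"/>
        <param name="user" value="jcr_user"/>
        <param name="password" value="change-me"/>
        <param name="schema" value="postgresql"/>
        <param name="schemaObjectPrefix" value="cl_j_"/>
      </Journal>
    </Cluster>

    <!-- Node 2: same block, only the id differs, e.g. <Cluster id="node2"> -->
    ```

    Everything except the id attribute should be identical on both nodes, since they all have to share the same journal tables.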