Tags: java, performance, caching, infinispan, replicate

Performance problem with a replicated Infinispan cache


I am using an Infinispan replicated cache, and when I scale the application to two instances on separate machines, performance decreases significantly. It is worth mentioning that the discovery protocol used is TCPPING. The Infinispan website states that it can scale up to 10 nodes. This is my JGroups config:

<config xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xmlns="urn:org:jgroups"
        xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/jgroups.xsd"
>
    <TCP bind_addr="${jgroups.bind_addr:192.168.206.117}"
         bind_port="${jgroups.bind_port:9092}"
         external_addr="${jgroups.external_addr}"
         external_port="${jgroups.external_port}"
         thread_pool.min_threads="0"
         thread_pool.max_threads="200"
         thread_pool.keep_alive_time="30000"

         use_virtual_threads="true"
         buffered_input_stream_size="8192"
         buffered_output_stream_size="8192"
         thread_pool.enabled="false"
         bundler_type="no-bundler"/>
    <RED/>

    <TCPPING async_discovery="true"
             initial_hosts="${jgroups.tcpping.initial_hosts:192.168.206.117[9092],172.16.93.21[9093]}"
             return_entire_cache="${jgroups.tcpping.return_entire_cache:true}"
             send_cache_on_join="true"
             port_range="${jgroups.tcp.port_range:2}"/>
    <MERGE3 min_interval="10000"
            max_interval="30000"/>
    <FD_SOCK2/>
    <FD_ALL3 timeout="40000" interval="5000"/>
    <VERIFY_SUSPECT2 timeout="1500"/>
    <BARRIER/>
    <pbcast.NAKACK2 use_mcast_xmit="false"/>
    <UNICAST3/>
    <pbcast.STABLE desired_avg_gossip="50000"
                   max_bytes="4M"/>
    <pbcast.GMS print_local_addr="true" join_timeout="2000"/>
    <UFC max_credits="2M"
         min_threshold="0.4"/>
    <MFC max_credits="2M"
         min_threshold="0.4"/>
    <FRAG2 frag_size="60K"/>
    <!--RSVP resend_interval="2000" timeout="10000"/-->
    <pbcast.STATE_TRANSFER/>
</config>

I have searched a lot but cannot find a solution. With 1 node I get about 3000 TPS; with 2 nodes on 2 machines it drops to about 1000 TPS.


Solution

  • A quick explanation of replicated caches and their scalability: every write has to be propagated to all other members of the cluster. With sync replication, a write has to wait for an ack from the remote nodes. With async replication you don't have to wait, at the expense of slight staleness in the replicas. Reads are always local, so they are as fast as they can be. Therefore, choose replicated caches for low-write/high-read scenarios; a minimal configuration sketch follows below.
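
For illustration, here is a minimal sketch of declaring a replicated cache in async mode through the Infinispan embedded Java API. The cluster name, cache name, and JGroups file name are placeholders, not values from the question; the JGroups stack shown above would be supplied via the transport's "configurationFile" property.

import org.infinispan.Cache;
import org.infinispan.configuration.cache.CacheMode;
import org.infinispan.configuration.cache.ConfigurationBuilder;
import org.infinispan.configuration.global.GlobalConfigurationBuilder;
import org.infinispan.manager.DefaultCacheManager;

public class ReplicatedCacheExample {
    public static void main(String[] args) {
        // Clustered cache manager; "perf-cluster" and "jgroups-tcp.xml" are placeholders.
        GlobalConfigurationBuilder global = GlobalConfigurationBuilder.defaultClusteredBuilder();
        global.transport()
              .clusterName("perf-cluster")
              .addProperty("configurationFile", "jgroups-tcp.xml");

        // REPL_ASYNC: a write returns without waiting for acks from the other members.
        // Use CacheMode.REPL_SYNC if every write must be confirmed cluster-wide.
        ConfigurationBuilder replicated = new ConfigurationBuilder();
        replicated.clustering().cacheMode(CacheMode.REPL_ASYNC);

        try (DefaultCacheManager manager = new DefaultCacheManager(global.build())) {
            manager.defineConfiguration("replicatedCache", replicated.build());
            Cache<String, String> cache = manager.getCache("replicatedCache");
            // The put completes locally; the update reaches the other nodes asynchronously.
            cache.put("key", "value");
        }
    }
}

With this setup a put() returns as soon as the local write completes and the other members receive the update in the background, which trades a brief window of staleness on the replicas for write throughput much closer to the single-node numbers.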