
Ignite persistence performance hit and metrics


I am trying out native persistence in Apache Ignite. My setup is currently a local, single-node cluster. I enabled persistence by adding this property to my data region:

<property name="persistenceEnabled" value="true"/>

My full data region configuration is as follows:

<bean class="org.apache.ignite.configuration.DataRegionConfiguration">
  <property name="name" value="dr.local.input.trade"/>
  <property name="persistenceEnabled" value="true"/>
  <property name="metricsEnabled" value="true"/>
  <property name="initialSize" value="#{200 * 1024 * 1024}"/>
  <property name="maxSize" value="#{500 * 1024 * 1024}"/>
  <property name="pageEvictionMode" value="RANDOM_2_LRU"/>
</bean>

Now the entries are being persisted, i.e., if I shut down Ignite and restart it, my data comes back in the cache.

I am seeing a significant performance hit: around 35% higher put latency compared to a non-persisted data region. I have gone through the Ignite persistence tuning page and singled out the properties below, with the values I am using:

Property                 Value
WAL mode                 LOG_ONLY
walCompactionLevel       3
walCompactionEnabled     true
writeThrottlingEnabled   true
checkpointBufferSize     512 MB
checkpointFrequency      5 minutes

Is there anything more that I can tune? Is the performance hit I mentioned above typical, or can it be lowered much further?

I also tried looking at the persistence-related JMX metrics using JConsole. I was checking the metrics under org.apache.368239c8.ignitelocal."Persistent Store". All metrics listed there show 0. The data is definitely persisted; I can see it in the Ignite work dir and WAL dir. Am I looking at the wrong metrics? Please help.

Attaching entire Ignite config below.

<?xml version="1.0" encoding="UTF-8"?>

<!--
Generated by Chef for ignite1.intranet.com
-->

<beans xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:util="http://www.springframework.org/schema/util"
       xmlns="http://www.springframework.org/schema/beans"
       xsi:schemaLocation="
        http://www.springframework.org/schema/beans
        http://www.springframework.org/schema/beans/spring-beans.xsd
        http://www.springframework.org/schema/util
        http://www.springframework.org/schema/util/spring-util.xsd">

  <bean id="propertyConfigurer" class="org.springframework.beans.factory.config.PropertyPlaceholderConfigurer">
    <property name="systemPropertiesModeName" value="SYSTEM_PROPERTIES_MODE_FALLBACK"/>
    <property name="searchSystemEnvironment" value="true"/>
  </bean>

  <bean id="ignite.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">
    <!-- Set to true to enable distributed class loading for examples, default is false. -->
    <property name="sslContextFactory">
      <bean class="org.apache.ignite.ssl.SslContextFactory">
          <property name="keyStoreFilePath" value="/home/sysSvcDevOps/ssl/ignite1.keystore.jks"/>
          <property name="keyStorePassword" value="KeyStore443"/>
          <property name="keyStoreType" value="jks"/>
          <property name="trustStoreFilePath" value="/home/sysSvcDevOps/ssl/cacerts/java.cacerts.jks"/>
          <property name="trustStorePassword" value="changeit"/>
          <property name="trustStoreType" value="jks"/>
      </bean>
    </property>
    <property name="igniteInstanceName" value=".dev"/>
    <property name="consistentId" value="ignite1.dev"/>
    <property name="workDirectory" value="/apps/Svc/dev/Ignite/IgniteData/persistentstore/work"/>

      <property name="rebalanceThreadPoolSize" value="8"/>
      <property name="publicThreadPoolSize" value="32"/>
      <property name="systemThreadPoolSize" value="64"/>
      <property name="queryThreadPoolSize" value="64"/>
      <property name="failureDetectionTimeout" value="30000"/>
      <property name="authenticationEnabled" value="true"/>
      <property name="metricsUpdateFrequency" value="30000"/>
      <property name="peerClassLoadingEnabled" value="false"/>
      <property name="clientMode" value="false"/>

    <!-- Enable task execution events for examples. -->
    <property name="includeEventTypes">
      <list>
        <util:constant static-field="org.apache.ignite.events.EventType.EVT_CACHE_STARTED"/>
        <util:constant static-field="org.apache.ignite.events.EventType.EVT_CACHE_STOPPED"/>
        <util:constant static-field="org.apache.ignite.events.EventType.EVT_CACHE_REBALANCE_PART_DATA_LOST"/>
        <util:constant static-field="org.apache.ignite.events.EventType.EVT_CACHE_NODES_LEFT"/>
      </list>
    </property>

    <property name="dataStorageConfiguration">
      <bean class="org.apache.ignite.configuration.DataStorageConfiguration">

        <property name="walSegmentSize" value="1073741824"/>
          <property name="walSegments" value="20"/>
          <property name="maxWalArchiveSize" value="10737418240"/>
          <property name="walCompactionEnabled" value="true"/>
          <property name="walCompactionLevel" value="4"/>
          <property name="checkpointFrequency" value="300000"/>
          <property name="checkpointThreads" value="16"/>
          <property name="checkpointReadLockTimeout" value="60000"/>
          <property name="lockWaitTime" value="45000"/>
          <property name="checkpointWriteOrder" value="RANDOM"/>
          <property name="pageSize" value="4096"/>
          <property name="writeThrottlingEnabled" value="true"/>

        <!-- wal storage paths -->
        <property name="walPath" value="/apps/Svc/dev/Ignite/IgniteData"/>
        <property name="walArchivePath" value="/apps/Svc/dev/Ignite/IgniteDataArchive"/>
        <property name="storagePath" value="/apps/Svc/dev/Ignite/IgniteData/archive"/>

        <property name="dataRegionConfigurations">
          <list>
                <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
                    <property name="name" value="dr.dev.referencedata"/>
                    <property name="persistenceEnabled" value="true"/>
                    <property name="initialSize" value="1073741824"/>
                    <property name="maxSize" value="4294969673"/>
                    <property name="checkpointPageBufferSize" value="1073741824"/>
                </bean>
                <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
                    <property name="name" value="dr.dev.input"/>
                    <property name="persistenceEnabled" value="true"/>
                    <property name="metricsEnabled" value="true"/>
                    <property name="checkpointPageBufferSize" value="#{4 * 1024 * 1024 * 1024}"/>
                    <property name="initialSize" value="12884901888"/>
                    <property name="maxSize" value="81604378624"/>
                    <property name="pageEvictionMode" value="RANDOM_2_LRU"/>
                </bean>
                <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
                    <property name="name" value="dr.dev.input.exception"/>
                    <property name="persistenceEnabled" value="true"/>
                    <property name="metricsEnabled" value="true"/>
                    <property name="checkpointPageBufferSize" value="#{4 * 1024 * 1024 * 1024}"/>
                    <property name="initialSize" value="4294967296"/>
                    <property name="maxSize" value="21474836480"/>
                    <property name="pageEvictionMode" value="RANDOM_2_LRU"/>
                </bean>
                <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
                    <property name="name" value="dr.dev.output"/>
                    <property name="initialSize" value="1073741824"/>
                    <property name="persistenceEnabled" value="true"/>
                    <property name="metricsEnabled" value="true"/>
                    <property name="checkpointPageBufferSize" value="#{2 * 1024 * 1024 * 1024}"/>
                    <property name="maxSize" value="2147483648"/>
                </bean>
          </list>
        </property>

        <property name="defaultDataRegionConfiguration">
          <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
              <property name="name" value="default_region"/>
              <property name="persistenceEnabled" value="true"/>
              <property name="initialSize" value="268435456"/>
              <property name="maxSize" value="268435456"/>

          </bean>
        </property>
      </bean>
    </property>

    <property name="discoverySpi">
      <bean class="org.apache.ignite.spi.discovery.zk.ZookeeperDiscoverySpi">
        <property name="zkConnectionString" value="zk1.intranet.com:22001,zk2.intranet.com:22001"/>
          <property name="zkRootPath" value="/ignite"/>
          <property name="sessionTimeout" value="120000"/>
          <property name="joinTimeout" value="10000"/>
      </bean>
    </property>

    <property name="communicationSpi">
      <bean class="org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi">
          <property name="socketWriteTimeout" value="60000"/>
      </bean>
    </property>

    <property name="cacheConfiguration">
      <list>
          <bean id="cache-template-bean" abstract="true"
                class="org.apache.ignite.configuration.CacheConfiguration">
              <property name="name" value="referenceDataCacheTemplate*"/>
              <property name="cacheMode" value="REPLICATED"/>
              <property name="backups" value="1"/>
              <property name="atomicityMode" value="ATOMIC"/>
              <property name="dataRegionName" value="dr.dev.referencedata"/>
              <property name="partitionLossPolicy" value="READ_WRITE_SAFE"/>
              <property name="writeSynchronizationMode" value="PRIMARY_SYNC"/>
              <property name="statisticsEnabled" value="true"/>
              <property name="sqlIndexMaxInlineSize" value="203"/>
            <property name="affinity">
              <bean class="org.apache.ignite.cache.affinity.rendezvous.RendezvousAffinityFunction">
                  <property name="partitions" value="256"/>


              </bean>
            </property>
          </bean>
          <bean id="cache-template-bean" abstract="true"
                class="org.apache.ignite.configuration.CacheConfiguration">
              <property name="name" value="inputMetadataCacheTemplate*"/>
              <property name="cacheMode" value="PARTITIONED"/>
              <property name="backups" value="1"/>
              <property name="atomicityMode" value="ATOMIC"/>
              <property name="dataRegionName" value="dr.dev.input"/>
              <property name="partitionLossPolicy" value="READ_WRITE_SAFE"/>
              <property name="writeSynchronizationMode" value="PRIMARY_SYNC"/>
              <property name="statisticsEnabled" value="true"/>
              <property name="readFromBackup" value="false"/>
              <property name="sqlIndexMaxInlineSize" value="211"/>
            <property name="affinity">
              <bean class="org.apache.ignite.cache.affinity.rendezvous.RendezvousAffinityFunction">
                  <property name="partitions" value="256"/>

                <property name="affinityBackupFilter">
                          <bean class="org.apache.ignite.cache.affinity.rendezvous.ClusterNodeAttributeAffinityBackupFilter">
                            <constructor-arg>
                                                  <array value-type="java.lang.String">
                                                  <value>RACK_ID</value>
                                                     </array>
                             </constructor-arg>
                          </bean>
                </property>

              </bean>
            </property>
              <property name="expiryPolicyFactory">
                <bean class="javax.cache.expiry.ModifiedExpiryPolicy" factory-method="factoryOf">
                  <constructor-arg>
                    <bean class="javax.cache.expiry.Duration">
                      <constructor-arg value="DAYS"/>
                      <constructor-arg value="5"/>
                    </bean>
                  </constructor-arg>
                </bean>
              </property>
          </bean>
          <bean id="cache-template-bean" abstract="true"
                class="org.apache.ignite.configuration.CacheConfiguration">
              <property name="name" value="inputReconCacheTemplate*"/>
              <property name="cacheMode" value="PARTITIONED"/>
              <property name="backups" value="1"/>
              <property name="atomicityMode" value="ATOMIC"/>
              <property name="dataRegionName" value="dr.dev.input"/>
              <property name="partitionLossPolicy" value="READ_WRITE_SAFE"/>
              <property name="writeSynchronizationMode" value="PRIMARY_SYNC"/>
              <property name="statisticsEnabled" value="true"/>
              <property name="readFromBackup" value="false"/>
              <property name="sqlIndexMaxInlineSize" value="211"/>
            <property name="affinity">
              <bean class="org.apache.ignite.cache.affinity.rendezvous.RendezvousAffinityFunction">
                  <property name="partitions" value="256"/>

                <property name="affinityBackupFilter">
                          <bean class="org.apache.ignite.cache.affinity.rendezvous.ClusterNodeAttributeAffinityBackupFilter">
                            <constructor-arg>
                                                  <array value-type="java.lang.String">
                                                  <value>RACK_ID</value>
                                                     </array>
                             </constructor-arg>
                          </bean>
                </property>

              </bean>
            </property>
              <property name="expiryPolicyFactory">
                <bean class="javax.cache.expiry.CreatedExpiryPolicy" factory-method="factoryOf">
                  <constructor-arg>
                    <bean class="javax.cache.expiry.Duration">
                      <constructor-arg value="DAYS"/>
                      <constructor-arg value="4"/>
                    </bean>
                  </constructor-arg>
                </bean>
              </property>
          </bean>
          <bean id="cache-template-bean" abstract="true"
                class="org.apache.ignite.configuration.CacheConfiguration">
              <property name="name" value="inputExceptionsCacheTemplate*"/>
              <property name="cacheMode" value="PARTITIONED"/>
              <property name="backups" value="1"/>
              <property name="atomicityMode" value="ATOMIC"/>
              <property name="dataRegionName" value="dr.dev.input.exception"/>
              <property name="partitionLossPolicy" value="READ_WRITE_SAFE"/>
              <property name="writeSynchronizationMode" value="PRIMARY_SYNC"/>
              <property name="statisticsEnabled" value="true"/>
              <property name="readFromBackup" value="false"/>
            <property name="affinity">
              <bean class="org.apache.ignite.cache.affinity.rendezvous.RendezvousAffinityFunction">
                  <property name="partitions" value="256"/>

                <property name="affinityBackupFilter">
                          <bean class="org.apache.ignite.cache.affinity.rendezvous.ClusterNodeAttributeAffinityBackupFilter">
                            <constructor-arg>
                                                  <array value-type="java.lang.String">
                                                  <value>RACK_ID</value>
                                                     </array>
                             </constructor-arg>
                          </bean>
                </property>

              </bean>
            </property>
              <property name="expiryPolicyFactory">
                <bean class="javax.cache.expiry.CreatedExpiryPolicy" factory-method="factoryOf">
                  <constructor-arg>
                    <bean class="javax.cache.expiry.Duration">
                      <constructor-arg value="DAYS"/>
                      <constructor-arg value="15"/>
                    </bean>
                  </constructor-arg>
                </bean>
              </property>
          </bean>
          <bean id="cache-template-bean" abstract="true"
                class="org.apache.ignite.configuration.CacheConfiguration">
              <property name="name" value="outputDataCacheTemplate*"/>
              <property name="cacheMode" value="PARTITIONED"/>
              <property name="backups" value="1"/>
              <property name="atomicityMode" value="ATOMIC"/>
              <property name="dataRegionName" value="dr.dev.output"/>
              <property name="partitionLossPolicy" value="READ_WRITE_SAFE"/>
              <property name="writeSynchronizationMode" value="PRIMARY_SYNC"/>
              <property name="sqlSchema" value=""/>
              <property name="statisticsEnabled" value="true"/>
            <property name="affinity">
              <bean class="org.apache.ignite.cache.affinity.rendezvous.RendezvousAffinityFunction">
                  <property name="partitions" value="256"/>

                <property name="affinityBackupFilter">
                          <bean class="org.apache.ignite.cache.affinity.rendezvous.ClusterNodeAttributeAffinityBackupFilter">
                            <constructor-arg>
                                                  <array value-type="java.lang.String">
                                                  <value>RACK_ID</value>
                                                     </array>
                             </constructor-arg>
                          </bean>
                </property>

              </bean>
            </property>
              <property name="expiryPolicyFactory">
                <bean class="javax.cache.expiry.CreatedExpiryPolicy" factory-method="factoryOf">
                  <constructor-arg>
                    <bean class="javax.cache.expiry.Duration">
                      <constructor-arg value="DAYS"/>
                      <constructor-arg value="450"/>
                    </bean>
                  </constructor-arg>
                </bean>
              </property>
          </bean>
          <bean id="cache-template-bean" abstract="true"
                class="org.apache.ignite.configuration.CacheConfiguration">
              <property name="name" value="reconAuditDataCacheTemplate*"/>
              <property name="cacheMode" value="PARTITIONED"/>
              <property name="backups" value="1"/>
              <property name="atomicityMode" value="ATOMIC"/>
              <property name="dataRegionName" value="dr.dev.referencedata"/>
              <property name="partitionLossPolicy" value="READ_WRITE_SAFE"/>
              <property name="writeSynchronizationMode" value="PRIMARY_SYNC"/>
              <property name="sqlSchema" value=""/>
              <property name="statisticsEnabled" value="true"/>
            <property name="affinity">
              <bean class="org.apache.ignite.cache.affinity.rendezvous.RendezvousAffinityFunction">
                  <property name="partitions" value="256"/>

                <property name="affinityBackupFilter">
                          <bean class="org.apache.ignite.cache.affinity.rendezvous.ClusterNodeAttributeAffinityBackupFilter">
                            <constructor-arg>
                                                  <array value-type="java.lang.String">
                                                  <value>RACK_ID</value>
                                                     </array>
                             </constructor-arg>
                          </bean>
                </property>

              </bean>
            </property>
          </bean>
          <bean id="cache-template-bean" abstract="true"
                class="org.apache.ignite.configuration.CacheConfiguration">
              <property name="name" value="fileDataCacheTemplate*"/>
              <property name="cacheMode" value="PARTITIONED"/>
              <property name="backups" value="1"/>
              <property name="atomicityMode" value="ATOMIC"/>
              <property name="dataRegionName" value="dr.dev.input"/>
              <property name="partitionLossPolicy" value="READ_WRITE_SAFE"/>
              <property name="writeSynchronizationMode" value="PRIMARY_SYNC"/>
              <property name="statisticsEnabled" value="true"/>
              <property name="queryParallelism" value="4"/>
              <property name="eagerTtl" value="true"/>
            <property name="affinity">
              <bean class="org.apache.ignite.cache.affinity.rendezvous.RendezvousAffinityFunction">
                  <property name="partitions" value="256"/>

                <property name="affinityBackupFilter">
                          <bean class="org.apache.ignite.cache.affinity.rendezvous.ClusterNodeAttributeAffinityBackupFilter">
                            <constructor-arg>
                                                  <array value-type="java.lang.String">
                                                  <value>RACK_ID</value>
                                                     </array>
                             </constructor-arg>
                          </bean>
                </property>

              </bean>
            </property>
              <property name="expiryPolicyFactory">
                <bean class="javax.cache.expiry.CreatedExpiryPolicy" factory-method="factoryOf">
                  <constructor-arg>
                    <bean class="javax.cache.expiry.Duration">
                      <constructor-arg value="DAYS"/>
                      <constructor-arg value="5"/>
                    </bean>
                  </constructor-arg>
                </bean>
              </property>
          </bean>
          <bean id="cache-template-bean" abstract="true"
                class="org.apache.ignite.configuration.CacheConfiguration">
              <property name="name" value="shortLivedReferenceDataTemplate*"/>
              <property name="cacheMode" value="PARTITIONED"/>
              <property name="backups" value="1"/>
              <property name="atomicityMode" value="ATOMIC"/>
              <property name="dataRegionName" value="dr.dev.input.exception"/>
              <property name="partitionLossPolicy" value="READ_WRITE_SAFE"/>
              <property name="writeSynchronizationMode" value="PRIMARY_SYNC"/>
              <property name="statisticsEnabled" value="true"/>
              <property name="managementEnabled" value="true"/>
            <property name="affinity">
              <bean class="org.apache.ignite.cache.affinity.rendezvous.RendezvousAffinityFunction">
                  <property name="partitions" value="64"/>

                <property name="affinityBackupFilter">
                          <bean class="org.apache.ignite.cache.affinity.rendezvous.ClusterNodeAttributeAffinityBackupFilter">
                            <constructor-arg>
                                                  <array value-type="java.lang.String">
                                                  <value>RACK_ID</value>
                                                     </array>
                             </constructor-arg>
                          </bean>
                </property>

              </bean>
            </property>
              <property name="expiryPolicyFactory">
                <bean class="javax.cache.expiry.CreatedExpiryPolicy" factory-method="factoryOf">
                  <constructor-arg>
                    <bean class="javax.cache.expiry.Duration">
                      <constructor-arg value="DAYS"/>
                      <constructor-arg value="2"/>
                    </bean>
                  </constructor-arg>
                </bean>
              </property>
          </bean>

      </list>
    </property>

    <property name="sqlSchemas">
      <list>
        <value>dataInput</value>
      </list>
    </property>

  </bean>
</beans>

Solution

  • Possible performance drop on writes.

    Compared to pure in-memory mode, the following disk interactions happen on updates:

    • In addition to the page modification in RAM, Ignite has to provide consistency guarantees that depend on your WAL mode; unless WAL is disabled, every update must be written to a WAL file. No data pages are flushed to disk yet: the modification happens only in memory, and a WAL record is written.

    • Once there are too many dirty pages in RAM, or a timeout occurs, Ignite starts a checkpoint, which flushes the dirty pages to the partition files on disk.

    • If the WAL becomes too big, Ignite may rotate segments by copying them to the WAL archive to free up space for new WAL updates.

    As you can see, there are at least three major disk-related operations, so it's crucial to have really fast disks for the /wal, /walarchive and /db mounted folders. Again, it all depends on your use case, but in general it's strongly recommended to use the fastest available disks for WAL-related activity.
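
    As a rough illustration of the point about disks, the fragment below puts the WAL, the WAL archive and the partition files on separate devices. The walMode, walPath, walArchivePath and storagePath properties are regular DataStorageConfiguration settings; the /mnt/... mount points are hypothetical and only stand in for whatever fast disks you actually have.

```xml
<bean class="org.apache.ignite.configuration.DataStorageConfiguration">
  <!-- WAL mode decides how strictly each update is synced to disk -->
  <property name="walMode" value="LOG_ONLY"/>

  <!-- Hypothetical mount points: one fast device per kind of disk activity -->
  <property name="walPath" value="/mnt/fast-disk-1/ignite/wal"/>
  <property name="walArchivePath" value="/mnt/fast-disk-2/ignite/walarchive"/>
  <property name="storagePath" value="/mnt/fast-disk-3/ignite/db"/>
</bean>
```

    Whether splitting the paths helps depends on the hardware; on a single local disk all three streams still compete for the same device.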

  • Possible performance drop on reads.

    Again, it depends on the scenario, but if all your data fits in memory (as it did before you turned persistence on), you will not see any performance difference.

    Note that after a restart there is no data in RAM, so Ignite must load it from disk first, i.e., do a warm-up.

    But if you have more data than your configured data region size, page replacement takes place, rotating data between memory and disk. Worst-case scenario: say you have a 10 GB RAM data region and an 11 GB data set, and you want to scan your data twice in alphabetical order.

    Imagine that you have just restarted, so there is no data in RAM yet. Ignite starts reading data from disk and populating the data pages in memory. Imagine that around the letter W the data region becomes full, and page replacement is required to load the remaining W-Z data. In that case, the oldest pages need to be evicted, meaning that, say, the A-D chunk has to go back to disk so that the W-Z data can be loaded in its place. So your in-memory data set is now something like W-Z, E-V. If we run the same scan query again, the whole data set needs to be replaced in the same way.
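
    To avoid the replacement cycle in the example above, the region would have to be larger than the data set. A minimal sketch, assuming the machine has the RAM to spare; the region name is hypothetical and the 12 GiB figure is only chosen to sit above the 11 GB data set from the example:

```xml
<bean class="org.apache.ignite.configuration.DataRegionConfiguration">
  <!-- Hypothetical region name, for illustration only -->
  <property name="name" value="dr.example.scan"/>
  <property name="persistenceEnabled" value="true"/>
  <!-- 12 GiB = 12884901888 bytes, above the 11 GB data set from the example -->
  <property name="maxSize" value="12884901888"/>
</bean>
```

    The region also holds indexes and per-page overhead, so in practice the headroom may need to be larger than the raw data size suggests.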

  • Enable persistence metrics.

    Check that you have the following property in your data region configuration:

    <property name="metricsEnabled" value="true"/>
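
    The property above turns on metrics for a single data region. DataStorageConfiguration has its own metricsEnabled flag for the storage-level metrics (checkpoints, WAL), and the configuration in the question does not set it. The sketch below assumes that this flag is what feeds the "Persistent Store" MBean, which is likely but worth confirming against your Ignite version:

```xml
<bean class="org.apache.ignite.configuration.DataStorageConfiguration">
  <!-- Storage-level persistence metrics (checkpoints, WAL); off by default -->
  <property name="metricsEnabled" value="true"/>

  <property name="defaultDataRegionConfiguration">
    <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
      <property name="persistenceEnabled" value="true"/>
      <!-- Region-level metrics are a separate switch on each region -->
      <property name="metricsEnabled" value="true"/>
    </bean>
  </property>
</bean>
```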
    

    Also, there is no need for

    <property name="pageEvictionMode" value="RANDOM_2_LRU"/>
    

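    Page eviction applies only to non-persistent regions. With native persistence enabled, Ignite purges the oldest pages from memory automatically (page replacement), so this setting has no effect.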
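
    Applied to the data region from the question, that would look roughly like this (same name and sizes as in the question, with only the eviction property dropped):

```xml
<bean class="org.apache.ignite.configuration.DataRegionConfiguration">
  <property name="name" value="dr.local.input.trade"/>
  <property name="persistenceEnabled" value="true"/>
  <property name="metricsEnabled" value="true"/>
  <property name="initialSize" value="#{200 * 1024 * 1024}"/>
  <property name="maxSize" value="#{500 * 1024 * 1024}"/>
  <!-- pageEvictionMode left out: a persistent region relies on page replacement -->
</bean>
```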