
Optimizing OpenGrok indexing on a large code base


I have a server with 4 cores and 32 GB RAM running Ubuntu 20.04.3 LTS. On this machine an OpenGrok instance runs as a Docker container.

Inside the Docker container it uses AdoptOpenJDK:

OpenJDK Runtime Environment AdoptOpenJDK-11.0.11+9 (build 11.0.11+9)
Eclipse OpenJ9 VM AdoptOpenJDK-11.0.11+9 (build openj9-0.26.0, JRE 11 Linux amd64-64-Bit Compressed References 20210421_975 (JIT enabled, AOT enabled)
OpenJ9   - b4cc246d9
OMR      - 162e6f729
JCL      - 7796c80419 based on jdk-11.0.11+9)

The code base that the OpenGrok indexer scans is 320 GB in size, and indexing takes 21 hours.

What I have figured out is that indexing takes less time if I disable the history option. Is there a way to reduce the indexing time while the history flag is set?

Here is my index command:

opengrok-indexer -J=-Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -J=-Djava.util.logging.config.file=/usr/share/tomcat10/conf/logging.properties -J=-XX:-UseGCOverheadLimit -J=-Xmx30G -J=-Xms30G -J=-server -a /var/opengrok/dist/lib/opengrok.jar -- -R /var/opengrok/etc/read-only.xml -m 256 -c /usr/bin/ctags -s /var/opengrok/src/ -d /var/opengrok/data --remote on -H -P -S -G -W /var/opengrok/etc/configuration.xml --progress -v -O on -T 3 --assignTags --search --remote on -i *.so -i *.o -i *.a -i *.class -i *.jar -i *.apk -i *.tar -i *.bz2 -i *.gz -i *.obj -i *.zip

Thank you for your help in advance.

Kind Regards

Siegfried


Solution

  • You should try to increase the number of threads using the following options:

      --historyThreads number
        The number of threads to use for history cache generation on repository level. By default the number of threads will be set to the number of available CPUs.
        Assumes -H/--history.
        
      --historyFileThreads number
        The number of threads to use for history cache generation when dealing with individual files.
        By default the number of threads will be set to the number of available CPUs.
        Assumes -H/--history.
    
      -T, --threads number
        The number of threads to use for index generation, repository scan
        and repository invalidation.
        By default the number of threads will be set to the number of available
        CPUs. This influences the number of spawned ctags processes as well.
    

    Also take a look at the "renamedHistory" option. In theory "off" is the default, but because turning it on has a huge impact on indexing time, it is worth checking that it is not enabled in your setup:

      --renamedHistory on|off
        Enable or disable generating history for renamed files.
        If set to on, makes history indexing slower for repositories
        with lots of renamed files. Default is off.
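
    As a rough sketch of how these options could be combined, the snippet below derives a thread count from the number of CPUs and prints the extra options to add to the indexer command. The variable names (THREADS, threads, extra_opts) are made up for illustration, and using one value for all three options is only an assumed starting point. On a 4-core machine this mostly raises -T from the 3 in the question to 4, since the history options already default to the CPU count.

```shell
#!/bin/sh
# Sketch only: THREADS, threads and extra_opts are hypothetical names.
# Default to the CPU count (4 on the server in the question); allow an
# override through the THREADS environment variable for experiments.
threads="${THREADS:-$(nproc 2>/dev/null || echo 4)}"

# Option names are the ones quoted from the indexer help text above.
# --renamedHistory off is stated explicitly so that a setting picked up
# from an existing configuration cannot silently turn it on.
extra_opts="--historyThreads $threads --historyFileThreads $threads -T $threads --renamedHistory off"

# Print the options for review before adding them to the real command.
echo "$extra_opts"
```

    The printed options go after the "--" separator of the existing opengrok-indexer command, in place of the current "-T 3". Whether a value above the CPU count helps depends on how much of the history generation is waiting on disk or the SCM tools, so it is worth timing a run on a single project before applying it to the full 320 GB.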