Search code examples
hiveparquetminiotrinoalluxio

Trino with Spilling and Alluxio Caching makes a large number of requests to the Object Storage


grafana minio requests counters

I have a local setup of Trino, Hive Metastore, and Minio storage. I have enabled and configured Alluxio caching and Disk Spilling on Trino. The number of requests made to the object storage is higher than expected. Given that I am only testing on a few megabytes of Parquet files.

What could be the problem? and the solution?

Here are my configurations in /etc/trino/config.properties .

coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8080
discovery.uri=http://localhost:8080
catalog.management=${ENV:CATALOG_MANAGEMENT}
query.max-memory=2GB
query.max-memory-per-node=700MB
exchange.http-client.max-requests-queued-per-destination=999999
scheduler.http-client.max-requests-queued-per-destination=999999
exchange.http-client.request-timeout=30s
task.info-update-interval=2s
spill-enabled=true
spiller-spill-path=/tmp/spill
spiller-max-used-space-threshold=0.7
spiller-threads= 16
max-spill-per-node=100GB
query-max-spill-per-node=100GB
aggregation-operator-unspill-memory-limit=15MB
spill-compression-codec=LZ4
spill-encryption-enabled=false

Here are my catalog configurations in /etc/trino/catalog/hive.properties

connector.name=hive
hive.metastore=thrift
hive.metastore.uri=thrift://hive-metastore:9083
hive.s3.path-style-access=true
hive.s3.endpoint=http://minio:9000
hive.s3.aws-access-key=XXX
hive.s3.aws-secret-key=XXX
hive.non-managed-table-writes-enabled=true
hive.s3.ssl.enabled=false
hive.s3.max-connections=1000
hive.metastore.thrift.client.read-timeout=3000s
hive.timestamp-precision=MILLISECONDS
hive.collect-column-statistics-on-write=false
hive.storage-format=PARQUET
hive.security=allow-all
fs.cache.enabled=true
fs.cache.directories=/tmp/cache
fs.cache.max-disk-usage-percentages=70
fs.cache.ttl=32d
fs.cache.preferred-hosts-count=5
fs.cache.page-size=15MB

Thanks in advance.


-- Edit: Test with both cach and spilling disabled


Disabling caching and spilling, has affected the latency and throughput, and caused far more GetObject requests:

grafana minio requests from trino without alluxio or spilling


-- Edit: Cache Tracing --


I have enabled cache tracing, and the cache is being hit.

Cache-hit Alluxio Trino Cache-hit Alluxio Trino Cache-hit Alluxio Trino Cache-hit Alluxio Trino

Instruction to enable tracing were found at: File System Cache-Monitoring

Setup in Docker Compose is simply:



  jaeger:
    image: jaegertracing/all-in-one:latest
    hostname: jaeger
    ports:
      - "16686:16686"
      - "4317:4317"
    environment:
      - COLLECTOR_OTLP_ENABLED=true

And /etc/trino/config.properties

################################# Tracing
tracing.enabled=true
tracing.exporter.endpoint=http://jaeger:4317

-- Edit: Provide more context Part II--


Based on Slack chat, the following has been confirmed:

  • Alluxio is used internally in Trino, so no need to follow further tutorials on setting Alluxio standalone/edge etc.
  • Caching is supported only on workers, but still my original problem happens when I am using one master and three workers, and I revised the configurations.
  • I have disabled spilling, and still the Minio traffic is still high.

On Trino startup, I get:

024-05-08T09:14:37.105Z INFO    main    Bootstrap   hive.s3.security-mapping.refresh-period              ----        ----                  How often to refresh the security mapping configuration
2024-05-08T09:14:37.106119355Z 2024-05-08T09:14:37.105Z INFO    main    Bootstrap   hive.s3.security-mapping.iam-role-credential-name    ----        ----                  Name of the extra credential used to provide IAM role
2024-05-08T09:14:37.106120682Z 2024-05-08T09:14:37.105Z INFO    main    Bootstrap   jmx.base-name                                        ----        ----
2024-05-08T09:14:37.237666777Z 2024-05-08T09:14:37.237Z INFO    main    alluxio.client.file.cache.PageStore Opening PageStore with option=alluxio.client.file.cache.store.PageStoreOptions@780e214b
2024-05-08T09:14:37.256328387Z 2024-05-08T09:14:37.256Z INFO    pool-51-thread-1    alluxio.client.file.cache.LocalCacheManager Restoring PageStoreDir (/tmp/alluxio_cache/LOCAL)
2024-05-08T09:14:37.257456265Z 2024-05-08T09:14:37.257Z INFO    pool-51-thread-1    alluxio.client.file.cache.LocalCacheManager PageStore (/tmp/alluxio_cache/LOCAL) restored with 0 pages (0 bytes), discarded 0 pages (0 bytes)
2024-05-08T09:14:37.257498782Z 2024-05-08T09:14:37.257Z INFO    pool-51-thread-1    alluxio.client.file.cache.LocalCacheManager Cache is in READ_WRITE.
2024-05-08T09:14:37.552235866Z 2024-05-08T09:14:37.551Z INFO    main    org.ishugaliy.allgood.consistent.hash.HashRing  Ring [hash_ring_8605] created: hasher [METRO_HASH], partitionRate [1000]
2024-05-08T09:14:37.566095594Z 2024-05-08T09:14:37.565Z INFO    main    org.ishugaliy.allgood.consistent.hash.HashRing  Ring [hash_ring_8605]: node [TrinoNode[nodeIdentifier=5b97f235d043, hostAndPort=172.23.0.12:8080]] added
2024-05-08T09:14:37.721904967Z 2024-05-08T09:14:37.721Z DEBUG   main    io.trino.connector.CoordinatorDynamicCatalogManager -- Added catalog hive using connector hive --
2024-05-08T09:14:37.724651195Z 2024-05-08T09:14:37.724Z INFO    main    io.trino.security.AccessControlManager  Using system access control: default
2024-05-08T09:14:37.735207694Z 2024-05-08T09:14:37.734Z INFO    main    io.trino.server.Server  Server startup completed in 14.36s
2024-05-08T09:14:37.735276670Z 2024-05-08T09:14:37.734Z INFO    main    io.trino.server.Server  ======== SERVER STARTED ========

-- Edit: Provide more context Part I--


The test simply runs a few simple queries in a loop, so regardless of partitioning or queries joins cardinalities, the caching should have been triggered.


By "expected" I meant, I would have expected a 10-megabyte dataset not to cause 60 GB of traffic, as in the screenshot below.


I assume that the Alluxio cache is bypassed altogether, so I doubt that it is not only a matter of configuration, as one might infer from: Trino: File System Cache


By consulting more references, it's clear that more steps should be included:


Besides, I doubt compatibility issues, since I am on Trino 445. The Alluxio docs mentions that the step were only tested against an earlier version of Trino:

Deploy Trino. This guide is tested with Trino-352.

Here is a distribution of the requests trino makes to Minio bypassing any caching, I include it in case it hints Trino team to an optimization idea, especially head requests are triggered very often:

Trino API Distribution


Finally, after following the steps in the above mentioned tutorials, the following error occurs:

trino> CREATE SCHEMA hive.lakehouse
    ->     WITH (location = 'alluxio://alluxio-leader:19998/lakehouse');
Query 20240508_071745_00002_vyeug failed: Invalid location URI: alluxio://alluxio-leader:19998/lakehouse

Logs:

2024-05-08 09:17:45 2024-05-08T07:17:45.210Z    DEBUG   dispatcher-query-4      io.trino.security.AccessControl Invocation of checkCanSetUser(principal=Optional[trino], userName='trino') succeeded in 27.47us
2024-05-08 09:17:45 2024-05-08T07:17:45.210Z    DEBUG   dispatcher-query-4      io.trino.security.AccessControl Invocation of checkCanExecuteQuery(identity=Identity{user='trino', principal=trino}, queryId=20240508_071745_00002_vyeug) succeeded in 13.49us
2024-05-08 09:17:45 2024-05-08T07:17:45.211Z    DEBUG   dispatcher-query-5      io.trino.execution.QueryStateMachine    Query 20240508_071745_00002_vyeug is QUEUED
2024-05-08 09:17:45 2024-05-08T07:17:45.211Z    DEBUG   dispatcher-query-1      io.trino.execution.QueryStateMachine    Query 20240508_071745_00002_vyeug is WAITING_FOR_RESOURCES
2024-05-08 09:17:45 2024-05-08T07:17:45.212Z    DEBUG   dispatcher-query-1      io.trino.execution.QueryStateMachine    Query 20240508_071745_00002_vyeug is DISPATCHING
2024-05-08 09:17:45 2024-05-08T07:17:45.212Z    DEBUG   dispatcher-query-1      io.trino.execution.QueryStateMachine    Query 20240508_071745_00002_vyeug is RUNNING
2024-05-08 09:17:45 2024-05-08T07:17:45.213Z    DEBUG   Query-20240508_071745_00002_vyeug-172   io.trino.security.AccessControl Invocation of checkCanCreateSchema(context=SecurityContext{identity=Identity{user='trino', principal=trino}, queryId=20240508_071745_00002_vyeug}, schemaName=hive.lakehouse, properties={location=alluxio://alluxio-leader:19998/lakehouse}) succeeded in 68.04us
2024-05-08 09:17:45 2024-05-08T07:17:45.223Z    DEBUG   Query-20240508_071745_00002_vyeug-172   io.trino.plugin.hive.metastore.thrift.ThriftHiveMetastoreClient Invocation of getDatabase(name='lakehouse') took 7.32ms and failed with NoSuchObjectException(message:database hive.lakehouse)
2024-05-08 09:17:45 2024-05-08T07:17:45.224Z    DEBUG   dispatcher-query-1      io.trino.execution.QueryStateMachine    Query 20240508_071745_00002_vyeug is FAILED
2024-05-08 09:17:45 2024-05-08T07:17:45.224Z    DEBUG   Query-20240508_071745_00002_vyeug-172   io.trino.execution.QueryStateMachine    Query 20240508_071745_00002_vyeug failed
2024-05-08 09:17:45 io.trino.spi.TrinoException: Invalid location URI: alluxio://alluxio-leader:19998/lakehouse
2024-05-08 09:17:45     at io.trino.plugin.hive.HiveMetadata.lambda$createSchema$22(HiveMetadata.java:954)
2024-05-08 09:17:45     at java.base/java.util.Optional.map(Optional.java:260)
2024-05-08 09:17:45     at io.trino.plugin.hive.HiveMetadata.createSchema(HiveMetadata.java:949)
2024-05-08 09:17:45     at io.trino.plugin.base.classloader.ClassLoaderSafeConnectorMetadata.createSchema(ClassLoaderSafeConnectorMetadata.java:417)
2024-05-08 09:17:45     at io.trino.tracing.TracingConnectorMetadata.createSchema(TracingConnectorMetadata.java:348)
2024-05-08 09:17:45     at io.trino.metadata.MetadataManager.createSchema(MetadataManager.java:769)
2024-05-08 09:17:45     at io.trino.tracing.TracingMetadata.createSchema(TracingMetadata.java:373)
2024-05-08 09:17:45     at io.trino.execution.CreateSchemaTask.internalExecute(CreateSchemaTask.java:128)
2024-05-08 09:17:45     at io.trino.execution.CreateSchemaTask.execute(CreateSchemaTask.java:82)
2024-05-08 09:17:45     at io.trino.execution.CreateSchemaTask.execute(CreateSchemaTask.java:54)
2024-05-08 09:17:45     at io.trino.execution.DataDefinitionExecution.start(DataDefinitionExecution.java:146)
2024-05-08 09:17:45     at io.trino.execution.SqlQueryManager.createQuery(SqlQueryManager.java:272)
2024-05-08 09:17:45     at io.trino.dispatcher.LocalDispatchQuery.startExecution(LocalDispatchQuery.java:145)
2024-05-08 09:17:45     at io.trino.dispatcher.LocalDispatchQuery.lambda$waitForMinimumWorkers$2(LocalDispatchQuery.java:129)
2024-05-08 09:17:45     at io.airlift.concurrent.MoreFutures.lambda$addSuccessCallback$12(MoreFutures.java:570)
2024-05-08 09:17:45     at io.airlift.concurrent.MoreFutures$3.onSuccess(MoreFutures.java:545)
2024-05-08 09:17:45     at com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1137)
2024-05-08 09:17:45     at io.trino.$gen.Trino_445____20240508_071639_2.run(Unknown Source)
2024-05-08 09:17:45     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
2024-05-08 09:17:45     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
2024-05-08 09:17:45     at java.base/java.lang.Thread.run(Thread.java:1570)
2024-05-08 09:17:45 Caused by: org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme "alluxio"
2024-05-08 09:17:45     at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3553)
2024-05-08 09:17:45     at io.trino.hdfs.TrinoFileSystemCache.createFileSystem(TrinoFileSystemCache.java:155)
2024-05-08 09:17:45     at io.trino.hdfs.TrinoFileSystemCache$FileSystemHolder.createFileSystemOnce(TrinoFileSystemCache.java:298)
2024-05-08 09:17:45     at io.trino.hdfs.TrinoFileSystemCache.getInternal(TrinoFileSystemCache.java:140)
2024-05-08 09:17:45     at io.trino.hdfs.TrinoFileSystemCache.get(TrinoFileSystemCache.java:91)
2024-05-08 09:17:45     at org.apache.hadoop.fs.ForwardingFileSystemCache.get(ForwardingFileSystemCache.java:39)
2024-05-08 09:17:45     at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:557)
2024-05-08 09:17:45     at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)
2024-05-08 09:17:45     at io.trino.hdfs.HdfsEnvironment.lambda$getFileSystem$0(HdfsEnvironment.java:110)
2024-05-08 09:17:45     at io.trino.hdfs.authentication.NoHdfsAuthentication.doAs(NoHdfsAuthentication.java:25)
2024-05-08 09:17:45     at io.trino.hdfs.HdfsEnvironment.getFileSystem(HdfsEnvironment.java:109)
2024-05-08 09:17:45     at io.trino.hdfs.HdfsEnvironment.getFileSystem(HdfsEnvironment.java:102)
2024-05-08 09:17:45     at io.trino.filesystem.hdfs.HdfsFileSystem.directoryExists(HdfsFileSystem.java:254)
2024-05-08 09:17:45     at io.trino.filesystem.manager.SwitchingFileSystem.directoryExists(SwitchingFileSystem.java:117)
2024-05-08 09:17:45     at io.trino.filesystem.cache.CacheFileSystem.directoryExists(CacheFileSystem.java:104)
2024-05-08 09:17:45     at io.trino.filesystem.tracing.TracingFileSystem.lambda$directoryExists$5(TracingFileSystem.java:119)
2024-05-08 09:17:45     at io.trino.filesystem.tracing.Tracing.withTracing(Tracing.java:47)
2024-05-08 09:17:45     at io.trino.filesystem.tracing.TracingFileSystem.directoryExists(TracingFileSystem.java:119)
2024-05-08 09:17:45     at io.trino.plugin.hive.HiveMetadata.lambda$createSchema$22(HiveMetadata.java:951)
2024-05-08 09:17:45     ... 20 more
2024-05-08 09:17:45 
2024-05-08 09:17:45 
2024-05-08 09:17:45 2024-05-08T07:17:45.225Z    INFO    dispatcher-query-1      io.trino.event.QueryMonitor     TIMELINE: Query 20240508_071745_00002_vyeug :: FAILED (INVALID_SCHEMA_PROPERTY) :: elapsed 12ms :: planning 12ms :: waiting 0ms :: scheduling 0ms :: running 0ms :: finishing 0ms :: begin 2024-05-08T07:17:45.211Z :: end 2024-05-08T07:17:45.223Z
2024-05-08 09:17:45 2024-05-08T07:17:45.430Z    DEBUG   http-client-node-manager-63     io.trino.connector.CatalogPruneTask     Pruned catalogs on server: http://172.21.0.14:8080/v1/task/pruneCatalogs

The alluxio jars, I have even added more in a trial and error one by one, are included in both of Trino and Hive metastore containers:

[trino@4515602f2e82 hive]$ pwd
/usr/lib/trino/plugin/hive

[trino@4515602f2e82 hive]$ ls -ll | grep allu
-rwxrwxrwx  1 root  root         0 May  8 06:48 alluxio-2.9.3-client.jar
-rwxrwxrwx  1 root  root  90338926 Mar 24  2023 alluxio-2.9.3-hadoop2-client.jar
-rw-r--r--  4 trino trino   519152 Jan  1  1980 alluxio-core-client-fs-312.jar
-rw-r--r--  4 trino trino  1499627 Jan  1  1980 alluxio-core-common-312.jar
-rw-r--r--  4 trino trino  6097283 Jan  1  1980 alluxio-core-transport-312.jar
-rwxrwxrwx  1 root  root  90338925 May  8 07:06 alluxio-shaded-client-2.9.3.jar
-rw-r--r--  4 trino trino    34723 Jan  1  1980 trino-filesystem-cache-alluxio-445.jar

And


Solution

  • I would suggest to NOT mix spilling and file system caching for starters, they are not designed to work together. Beyond that I would say that it completely depends on what your queries are, what data they have to access, how your files and partitions are structured and how you define "expected". So I really cant answer with any more detail at this stage.