We have added Alluxio to Walden, our reference implementation of a small data lake. Alluxio provides a unified view into one or more underlying storage sources, adding caching and translation on top of them. This can greatly improve overall Trino performance across queries, while also enabling support for external storage types like NFS that are not supported natively by Trino.
Since announcing Walden over a year ago, we have been running it in our own on-prem cluster and iteratively improving it along the way. As part of this, we've recently added Alluxio to the Walden deployment. When exploring datasets, we frequently run the same queries against the same data, so it makes sense to have some form of caching to improve query performance. Meanwhile, we also wanted to give Trino access to multi-TB external storage that's served over NFS. Trino supports several storage types natively, but NFS is not one of them.
Alluxio provides both caching and expanded filesystem support, with the mild tradeoff of introducing a translation layer between Trino and the underlying storage. In Walden, each Trino worker is colocated with an Alluxio node that handles access and caching for that worker. Using the Alluxio CLI/API, we can then mount external volumes into Alluxio's internal filesystem tree, including both NFS and block storage. These volumes are then automatically accessible to Trino and the Hive metastore. As Trino workers access data through Alluxio, it is automatically cached, making subsequent queries against the data extremely fast while reducing load on the underlying storage.
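As a rough sketch of what this looks like, assuming the NFS export is already mounted on each host at a hypothetical path like /mnt/nfs, the Alluxio CLI mount step might be:

```shell
# Mount a host-local NFS directory into the Alluxio namespace.
# /mnt/nfs is a hypothetical host mount point; /nfs is the path
# under which it will appear in Alluxio's filesystem tree.
alluxio fs mount /nfs /mnt/nfs

# List the mounted files, which are now visible to Trino and the
# Hive metastore via the alluxio:// scheme.
alluxio fs ls /nfs
```

The exact paths and scheme configuration will vary by deployment; this only illustrates the general shape of the mount workflow.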
This has worked well for us so far and has solved several problems via a single service.
One caveat we have found: if we edit the underlying storage directly after it has been mounted into Alluxio, we need to manually refresh Alluxio's view of the NFS volume using the alluxio fs checkConsistency command, often with two invocations needed to resolve the contents of a new directory. Another option would be to only make changes via Alluxio, but in our case it's simpler overall to write to NFS directly, so as a workaround we just run checkConsistency as part of our ingest process.
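The tail end of an ingest script might therefore look something like this (the path is illustrative; the -r flag asks Alluxio to repair any inconsistencies it finds):

```shell
# Refresh Alluxio's metadata for a directory written directly to NFS.
# Two passes: in our experience the first registers the new directory
# itself, and the second resolves its contents.
alluxio fs checkConsistency -r /nfs/ingest/latest
alluxio fs checkConsistency -r /nfs/ingest/latest
```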
Walden now includes Alluxio in the default configuration, but there are other options as well.
For making NFS volumes visible to Trino, an alternate solution is to run an unreplicated instance of Minio that serves the NFS mount. When Minio is run in unreplicated mode, it directly serves the underlying directory structure, so any changes to the underlying files are immediately reflected in Minio without the manual refresh command that Alluxio requires. Minio's replicated mode, by contrast, uses a custom erasure-coded storage format that would not be suitable for serving existing files as-is. Running Minio as a single instance can become a bottleneck for the Trino workers, since all access to the archive must go through that one instance. A workaround would be to run multiple unreplicated Minio instances colocated with the Trino workers, pairing each worker with a dedicated Minio instance.
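For reference, starting a single unreplicated Minio instance over an existing directory tree is a one-liner; the path, port, and credentials below are illustrative only:

```shell
# Serve an existing directory over the S3 API without rewriting it.
# Credentials are placeholders; use real secrets in a deployment.
export MINIO_ROOT_USER=walden
export MINIO_ROOT_PASSWORD=replace-with-a-long-secret
minio server /mnt/nfs --address :9000
```

Trino could then point an S3-compatible catalog at this endpoint. Note that newer Minio releases have been deprecating this single-disk "filesystem" mode, so it's worth checking the version's documentation before relying on it.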
For caching frequently accessed data, Trino itself supports caching data locally on worker disks. We haven't investigated this solution yet, but with dedicated NVMe drives we would expect performance improvements comparable to Alluxio's caching. Unlike Alluxio's explicit RAM cache pool, this approach would only cache in memory implicitly, via the kernel page cache. That may actually be preferable to manually allocating a large chunk of memory to Alluxio on each worker.
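If we do try this, our understanding from Trino's Hive connector documentation is that worker-local caching is enabled through catalog properties along these lines (untested on our cluster, and the cache location is a hypothetical NVMe mount; availability of these properties depends on the Trino version):

```
hive.cache.enabled=true
hive.cache.location=/mnt/nvme/trino-cache
```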
We are always looking for ways to improve our data analysis stack and may investigate these solutions as a possible alternative to Alluxio in the future.