Best practices in data quality management
8 minute read Published: 2022-09-01
Good data quality management is expensive. Poor data quality management is “expensiver”.
Walden is our reference implementation of a data warehouse. After adding deployment instructions for Amazon's EKS last month, we now also support it on Microsoft's Azure Kubernetes Service (AKS).
We have added Alluxio to Walden, our reference implementation of a small data lake. Alluxio provides a unified view into one or more underlying storage sources, adding caching and translation on top of them. This can greatly improve overall Trino performance across queries, while also enabling support for external storage types like NFS that are not supported natively by Trino.
More than 20 years ago, Watts S. Humphrey, the “Father of Software Quality”, proclaimed that “Every business is a software business”. A cursory look at organizations today, whether big or small, is enough to confirm his prediction. In the 2020s we could even go one step further and say that “Every business is a data business.”
woo.scie.nz! Many data science projects start out with a boring task: downloading your data. We don't like being bored, which is why we built the scie.nz hub.
We have built Walden, a small data lake for (mostly) solitary use, consisting of a set of configurations and images for deployment into a Kubernetes cluster. We are releasing the code as free and open source software, hoping to lower some of the barriers to entry to the world of big data and AI. Check it out on our GitHub, or read below for more info!
The way we do science is evolving. We are building to cope with the change.