Data Science

Best practices in data quality management

8 minute read Published: 2022-09-01

Good data quality management is expensive. Poor data quality management is “expensiver”.

Deploying Walden on AKS

3 minute read Published: 2022-06-14

Walden is our reference implementation of a data warehouse. After adding instructions for its deployment on Amazon's EKS last month, we are now also supporting it on Microsoft's Azure Kubernetes Service (AKS).

Deploying Walden on EKS

3 minute read Published: 2022-05-17

Walden is our reference implementation of a data warehouse. We are now supporting it on Amazon's Elastic Kubernetes Service (EKS). Follow the deployment instructions here, or read more about our experience deploying a data warehouse on AWS below.

Adding Alluxio to Walden

4 minute read Published: 2022-04-18

We have added Alluxio to Walden, our reference implementation of a small data lake. Alluxio provides a unified view into one or more underlying storage sources, adding caching and translation on top of them. This can greatly improve overall Trino performance across queries, while also enabling support for external storage types like NFS that are not supported natively by Trino.

How to get the most out of your data science initiatives?

9 minute read Published: 2022-03-31

“Every business is a software business,” proclaimed Watts S. Humphrey, the “Father of Software Quality”, more than 20 years ago. A cursory look at organizations today, whether big or small, is enough to confirm his prediction. In the 2020s we could even go one step further and say that “Every business is a data business.”

Introducing the hub!

1 minute read Published: 2021-04-10

Many data science projects start out with a boring task: downloading your data. We don't like being bored, which is why we built the hub.

Introducing Walden

5 minute read Published: 2021-02-15

We have built Walden, a small data lake for (mostly) solitary use, consisting of a set of configurations and images for deployment into a Kubernetes cluster. We are releasing the code as free and open source software, hoping to lower some of the barriers to entry to the world of big data and AI. Check it out on our GitHub, or read below for more info!


2 minute read Published: 2020-12-01

The way we do science is evolving. We are building to cope with the change.