How to get the most out of your data science initiatives

9 minute read Published: 2022-03-31

“Every business is a software business,” proclaimed Watts S. Humphrey, the “Father of Software Quality”, more than 20 years ago. A cursory look at organizations today, whether big or small, is enough to confirm his prediction. In the 2020s we could even go one step further and say that “Every business is a data business.”

Due to the proliferation of both physical and virtual sensors and cheap, commoditized storage, companies are sitting on valuable data that they have already collected (or that they could easily collect). While some early adopters have used data science to make better business decisions, most companies are only now starting to realize its potential. This is no coincidence: while being data-driven can bring enormous benefits to a business, obstacles abound.

We have worked with several companies as consultants for their data science initiatives and during these projects we have found that there are some key steps you can take to increase your return on investment.

Have solid data infrastructure in place

While data is much easier to collect nowadays, companies still need to actually collect it. And not all data is created equal: “quality” is an elusive characteristic of good data. It is safe to say that any data science initiative can be at most as good as the data it relies on.

By high-quality data, we mean a few things:

Work on creating a culture focused on data

Implementing data science in your company should be treated like any other change management initiative. It is important to make a compelling case that data can provide a competitive advantage and that decisions can be taken or challenged based on data. We believe that companies need proactive internal advocates focused on the benefits (and not only the risks!) of widespread data sharing. If you want to read more about this, you can check out our article series on this issue.

Another great way to encourage employees to adopt a more data-centric approach is to provide training programs and opportunities to learn data science. As Redman and Davenport recently noted in a Harvard Business Review article, such initiatives can create “citizen data scientists”. There is a continuum that begins with Excel and culminates with sophisticated statistical models, enormous machine learning pipelines or complex A/B testing tools. Not all employees need to be working at the frontier of what we think of as “data work,” but it benefits everyone to be somewhat conversant in basic statistics and the elements of computer programming.

We specifically propose a strategy that avoids drawing a strict line between “data scientists” or “engineers” and “the others” in the organization, and we believe this is good policy. For one, it recognizes the value of the many and diverse talents that need to come together to make even the most deeply technical organization work. It is also more than a means of disseminating data science within the organization: through “citizen data science” some employees will develop their skills to a point where they can constitute a pool of internal specialists on which the organization can then rely.

Integrate the data science team in your company

One of the biggest mistakes one can make is not properly integrating data science teams within the organization. As we already hinted, simply creating a data science department and hiring a team of professionals to fill it will not ensure success.

Your data science team needs to be able to thoroughly understand the business and work together with all the departments that might benefit from its insights. You need to be proactive in your approach. Otherwise, you run the risk of having a data science department that provides reports or analyses only when it receives requests from other departments. Data scientists are ultimately just that… scientists. They are creative types who need to come up with their own hypotheses and test them, and they can only do so if they understand the business and collaborate with other departments. Seen in this light, data science is a new form of R&D, and good R&D needs deep integration with the business to succeed in developing new products, business lines or solutions.

It is desirable to answer important questions about the function of a data science team before creating it. You want to be able to clearly communicate what the role of the team is and how it will go about doing its job. Once the team is established, you should also ensure that there is ample communication between it and the other teams it will be working with.

Last but not least, we want to emphasize the importance of removing obstacles between data practitioners and actual data. Missing infrastructure is sadly not the only limitation: when data is valuable it is not surprising that “turf wars” will erupt with respect to who has access to it. The good news is that a lot of data territoriality is likely irrational. Squirreling away data not only goes against company goals, but is oftentimes a suboptimal means of advancing one’s own career — data science, like any science, is incremental and there is glory to be had in creating great datasets as well as in training great models.

Ultimately, everybody wins from sharing data to the greatest extent possible. This is prosocial behavior that should be encouraged and rewarded, for instance by including it in employee reviews.


Skill development and autonomy

We have already emphasized the need for continuous skill development. Communication skills deserve particular attention, as this area is often overlooked in favor of easier-to-quantify technical skills.

Managing expectations, for instance, is a key data science skill. Given the open-ended nature of data science, mistaken assumptions can often rush in to fill the void created by insufficient direction regarding the role, promise and limitations of data science initiatives. This reality makes expectation setting particularly important in the early stages of data science initiatives.

Communicating clearly is another skill that can be learned and should be developed. Plain language is desirable, as is the ability to adapt complex information to audience and context. It is admittedly hard to prescribe how exactly one ought to improve communication skills. A focus on the agency of data scientists, meaning that they are accountable for things they control and understand, is perhaps the most effective high-level principle to adopt. You, the hypothetical CEO, can tell your data scientists what you need them to do but not how to do it. They must be allowed to refine, prioritize and plan any analysis tasks that come their way. This freedom is matched by accountability for results, which ensures that data scientists can develop a sense of ownership in the success of the company.

Moving forward

Given that the judicious use of data can generate a competitive advantage for many businesses, it is likely that data will be among your top priorities. As we said at the beginning, making a company more data-oriented is no easy task by any measure. We hope these thoughts help advance your thinking on the best way for your organization to benefit from the data revolution.


Looking for an efficient open-source solution to manage data in your project? Aorist is a tool for managing data for your ML project. It produces readable, intuitive code that you can inspect, edit, and run yourself. You can then focus on the hard parts while automating the repetitive parts. To get this, you just need a description of how your data is formatted and organized, and where it needs to go.

Check it out!