Three obstacles before the data scientist can get to work
3 April 2017
Once a company has fully embraced a Big Data culture, the possibilities opened up by data analytics are immense. Having analysts and business specialists work side by side in mixed teams is the start of a journey that will ideally culminate in new lines of business. Results don’t appear overnight, but data science puts milestones that once seemed unattainable within reach.
Before buckling down to work, the data scientist must first overcome three obstacles.
1. Access to data
Many companies amass huge amounts of customer data, but the nature of their services imposes restrictions related to security and privacy. This creates a ‘chicken and egg’ dilemma: as a condition for granting access to data, management will want to know the value it can bring to the company. Yet no matter how much the analyst may protest, the real benefits for the company cannot be demonstrated if the necessary data cannot be accessed.
How can we get out of this quandary? One way is to press on with small-scale models that progressively show the management team the benefits analytics can bring. Access to a sample of data is enough to build a model that solves a specific problem. A small-scale study of specific customers, one that can trigger a decision with immediate impact on the company, is a good starting point. Once the management team has verified the model’s suitability by applying it to immediate decisions, the first step will have been taken.
In this scenario, choosing a suitable problem that has a visible impact on the business is crucial. This is where the analyst needs to show their skills, intuition, and knowledge of the business. It goes without saying that a model built from a limited sample will have limited significance, but it is a necessary first step toward opening the doors to the data.
2. Technological means
Having overcome the first obstacle, the next one appears: having the necessary technological infrastructure to support access to data, analysis, and the exploration of results.
If such means are not available, there is little point in looking for a culprit: there may simply be nobody in the organization aware of the impact that data analysis can have on the business. But there are no shortcuts on this path: if the work hasn’t been done, someone will have to do it.
A further problem that often comes up is decentralized data. With siloed departments and dispersed databases, each with its own access and security protocols, the data scientist, sometimes with the help of an engineer, will first have to bring the data together in one place before the real work can begin.
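As a rough illustration of what bringing dispersed data together can mean in practice, here is a minimal Python sketch. It assumes each department can export its records as a list of rows that share a common customer identifier; the `consolidate` function, the `customer_id` key, and the department names are hypothetical and chosen only for the example.

```python
from collections import defaultdict


def consolidate(sources, key="customer_id"):
    """Merge rows from several departmental sources into one record per customer.

    `sources` maps a (hypothetical) department name to a list of dict rows.
    Each field is prefixed with its department name so that identically
    named columns from different departments do not overwrite each other.
    """
    merged = defaultdict(dict)
    for source_name, rows in sources.items():
        for row in rows:
            customer = row[key]
            for field, value in row.items():
                if field == key:
                    continue  # the key identifies the record; it is not stored as a field
                merged[customer][f"{source_name}.{field}"] = value
    return dict(merged)


# Illustrative data only: two departments holding different facts about the same customers.
sources = {
    "sales": [{"customer_id": 1, "total": 120.0}],
    "support": [
        {"customer_id": 1, "tickets": 3},
        {"customer_id": 2, "tickets": 1},
    ],
}
unified = consolidate(sources)
```

In a real organization the hard part is usually not the merge itself but obtaining access to each source and agreeing on a shared identifier, which is exactly the obstacle described above.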
3. Human resource management
Part of data science, like any other science, is exploration. And exploration calls for a great deal of inspiration and as few rigid, creativity-stifling orders as possible.
Passion, perseverance, and curiosity are qualities this type of work requires, and they are often incompatible with rigid organizational structures. Managers must therefore be patient and understanding and, within the limits set by the pressure of financial results, should grant the data scientist the time and freedom needed to move forward with their investigation. Once a balance has been struck between what motivates employees and the business’s priorities, the results should start to appear.
From data to decision… if nothing goes wrong
Once the data is available, the data scientist generally follows a step-by-step process. They will have to devote much of their time to cleaning the data and then set off on a route that begins with small samples and ends, if all goes well, with useful conclusions drawn from a predictive model.
If all goes well… Because data science is not a foolproof process. As in any research project, there are no absolute certainties. We must therefore be prepared for possible failure, however hard that may be for companies that have high expectations and often fail to consider that a lack of results is a real possibility.
In projects involving vast databases, it’s not always necessary to use all the data. It is therefore important to scale gradually: start with a manageable subset of the data and set up an ongoing dialogue with the person or department most interested in the project. Then, once a first insight into the potential scope has been gained, scaling up can begin.
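One simple way to start from a manageable amount of data is to draw a reproducible random sample before touching the full database. The following Python sketch shows the idea under stated assumptions: the records fit in a list, a plain random sample is representative enough for a first pilot, and the `draw_pilot_sample` function, the 5% fraction, and the fixed seed are illustrative choices rather than recommendations.

```python
import random


def draw_pilot_sample(records, fraction=0.05, seed=42):
    """Return a reproducible random subset of `records` for a pilot study.

    A fixed seed means the same sample can be re-drawn later, so results
    discussed with the business can be checked against the same rows.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in the range (0, 1]")
    rng = random.Random(seed)
    # Keep at least one record when there is any data, never more than exist.
    size = min(len(records), max(1, round(len(records) * fraction)))
    return rng.sample(records, size)


# Illustrative data only: 1,000 placeholder customer records.
customers = [{"customer_id": i} for i in range(1000)]
pilot = draw_pilot_sample(customers, fraction=0.05)
```

Once the pilot has produced a first insight, the same function can be called with a larger `fraction` to scale up step by step.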
The road to this point is sometimes littered with decisions to be made: the focus of the investigation, the data to be used, the analytics to be applied… Technical knowledge does not guarantee the success of a specific project, which is always subject to unforeseen circumstances that no training course covers.
The ratio between available information and decisions is heavily skewed toward the former. The process of transforming data into decisions may lead to swathes of information being lost, and the way that process is communicated matters along the way. An important decision for the company cannot be conveyed unless it is backed up with solid arguments about where the conclusion comes from: which data has been used and which processes have been followed to analyze this information and turn it into the gold nugget that is the decision.
This article is part of the study “Data Scientists: Who are they? What do they do? How do they work?”, available on Rebel Thinking.