
How to evaluate the data scientist’s work
10 April 2017

Once a data science project has matured enough within an organization, the time comes to measure the results of the data scientist’s work. How should this be done?
First, we must take the time horizon into account: benefits are never seen in the short term. The data scientist develops a predictive model, and whether it is ever put into operation depends on management accepting it. Machine learning techniques are then applied to that model to improve its accuracy.
For team leaders, it is important to emphasize the practical application of the work. It is fundamental, especially in large companies, to ensure that algorithms do not end up as nothing more than beautiful theories. Officially, the data scientist’s responsibility can be considered complete once the model has been built, but, at the risk of sounding gloomy, their personal responsibility carries on until the model is actually run.
Then comes the wait for results. Models are not foolproof: a key parameter may have been left out, a wrong variable that distorts the outcome may have been entered, or the subtleties of the business may not have been grasped. Execution may also fail: the insight might be good but not be put into practice in the right way.
The quality of the algorithm is not the only yardstick for measuring the data scientist’s performance. Their responsibilities include some sales-related work: dealing with customers, explaining to them what they have found, and guiding them on what to do with their data, always using the communication skills that the data scientist, or any member of their team, should have. This work offers another basis for evaluation.
Finally, let’s once again remind ourselves of the importance of the human factor. Data science is not a black box enshrouded in mystery. Data scientists are not oracles, nor are their words prophecies: the algorithm may make a specific prediction, but the option to translate that insight to the business or not, with all the consequences that it may incur, ultimately depends on the person who makes the decision. Hence the importance of the human factor in the whole process.
Ethics: the essential complement to science
Data are highly sensitive material, and in an industry where the raw material is so sensitive, trust is essential. That is why the data scientist’s work carries a strong ethical commitment: they must ensure that the information given to them is used responsibly. In an increasingly digitalized society where everyone unwittingly and involuntarily leaves trails, it would be possible to invade anybody’s freedom simply by using the appropriate knowledge and powerful servers. But nobody wants that to happen.
Ethical commitment is not just a sign of sound judgement; it is also imperative in an information society that may face dangers that are not fully known: mass surveillance, lack of privacy, large-scale loss of data, etc. It is therefore the data scientist’s duty to work transparently, explaining in a simple and accessible way what their job is and how they do it, to quash the threat to privacy that people might often associate with big data. Few people are interested in knowing the intricacies of an algorithm, but they do want an outline of the route that the data follows.
One way to ensure that data is used ethically is to work on open data projects, where anyone can access the data, thereby contributing in some way to social awareness and utility. For example, Spanish bank BBVA has launched several such projects, designed to improve citizens’ quality of life or to make cities more efficient through the intelligent use of information.
Open the data, give something back to society, and become an aggregated data platform that others can use to create value in cutting-edge projects where altruism replaces the quest for profit. That is the ethical commitment that many data scientists have made to safeguard the good name of their specialty.
__________________________________________________________________________________
This article is part of the study “Data Scientists: Who are they? What do they do? How do they work?”, available on Rebel Thinking.

