8 Challenges That Data Science Projects Face

Well, the obvious one doesn't make the list here: technical incompetence. There is no saving a technically incompetent project; such projects are bound to fail. The incompetence could take the form of incorrect syntax, indentation errors, or throwing too many algorithms at a problem without minding their prerequisites. So we are working with the assumption that the brains behind the project are technically sound, and this list covers the common adoption problems, beyond technical incompetence, that turn up in real-world applications.

No management buy-in

This alone can put an end to a passionately developed and technically viable project. Getting management invested in a business decision is a fundamental requirement of any project, and data science projects are no exception: management needs to understand the project and its implications for the business. There can be many reasons for not getting buy-in. It could be because management:

  • Feels a data science project is not strategically relevant at the given point in time
  • Doesn’t understand data science and therefore doesn’t want to take a chance
  • Doesn’t believe that data science is the answer to their problems
  • Is taking too much time to decide
  • Is resistant to change

Lack of Scalability

Most products need to be updated or upgraded from version to version; that much is normal during development. It is also common for developers to fall in love with the first version and ignore the need to provision for scalability. In reality, several iterations are required to factor in critical variables like user expectations and feedback, and allocating an appropriate budget for those iterations is just as crucial to scaling.

Not ‘problem’ driven

The best data science institutes around the world treat data science as a problem-solving tool, and for good reason. Artificial intelligence and data science are at the forefront of research and development; the widespread availability of data has made sure of that. However, a data science project initiated without a well-defined problem statement is akin to an organization that starts life without a mission statement: it ends up searching for a needle in a haystack. Conversely, with a well-defined problem statement, all effort can be directed toward specific deliverables and action areas.

Incompatible workflow

This is another major pitfall for data science projects. Some projects never take off because they don't factor in the end user while being built. Empathizing with users is one thing; gathering real end-user feedback is a different discipline altogether. As discussed in the previous section, the problem statement is key: start from it, begin with the end user in mind, and build the project outward from that point.

Overfitting

Overfitting is a condition wherein, instead of capturing the relationships between variables, the statistical model describes the random error in the data. This needlessly increases the complexity of the model and produces misleading regression coefficients and R-squared values. The problem with overfitting is that it makes the model unusable outside the original dataset, turning the whole exercise counterproductive: rather than representing the genuine relationship between the variables, an overfitted model represents the noise.
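
To see what this looks like in practice, here is a minimal sketch (assuming Python with NumPy and scikit-learn; the toy sine dataset and polynomial degrees are invented for illustration). A high-degree polynomial fits 20 noisy training points almost perfectly, yet its R-squared collapses on fresh data drawn from the same process:

```python
# A minimal illustration of overfitting (toy data, not from any real project).
# A degree-15 polynomial has enough flexibility to chase the noise in 20
# training points, so its training R^2 looks great while its test R^2 collapses.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)

# 20 noisy training points sampled from a sine curve
X_train = np.sort(rng.uniform(0, 1, 20)).reshape(-1, 1)
y_train = np.sin(2 * np.pi * X_train).ravel() + rng.normal(0, 0.3, 20)

# Fresh test data from the same underlying process
X_test = np.linspace(0, 1, 100).reshape(-1, 1)
y_test = np.sin(2 * np.pi * X_test).ravel() + rng.normal(0, 0.3, 100)

for degree in (3, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    print(f"degree {degree:2d}: "
          f"train R^2 = {r2_score(y_train, model.predict(X_train)):.3f}, "
          f"test R^2 = {r2_score(y_test, model.predict(X_test)):.3f}")
```

The gap between the training and test scores is the tell: a model that has learned the genuine relationship scores comparably on both, while an overfitted one only shines on the data it memorized.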

Doesn’t solve Business Problems

When a data science project doesn't solve a business problem, it becomes a figurative paperweight, no matter how technically sound it is. Machine learning and deep learning, both subsets of artificial intelligence, put tremendous power in the hands of the project developer or manager; without the right business application, that power is worthless. A project's success comes from its ability to impact the business and contribute to the value chain, which is why so many fancy PoCs never see the light of day.

Too many cooks

A classic problem no matter which industry you look at. This isn't soccer, where the '12th man' gives you an advantage: headcount is inconsequential if synergy and cohesion are missing. Starting a data science project without clearly defined roles creates problems down the line. Too many people on a project can mean clashing philosophies within the team, and if roles are not properly defined, the result is communication gaps and misunderstandings.

Lack of domain knowledge

This is perhaps the biggest challenge facing data scientists in general. While data science is industry-agnostic, projects are not: depending on the project, expertise may be required in one domain or several. Data scientists therefore have to work closely with domain experts and collaborate with them to find optimal solutions. In the real world, however, this turns out to be far harder than it sounds. Most domain experts are only somewhat familiar with data science, if at all, and a data scientist can't possibly be an expert in every domain.

In our next blog, we will examine these challenges one by one and propose possible solutions to each of them.
