There’s More to Data Science than Software Development

Opinion

One of the most off-putting things I find when talking to Data Scientists is a laser-like focus on software development. “Do you code in R or Python? Pandas has a new feature! How many years of experience do you have with library ‘x’?”

Along those lines, I recently saw a tweet that Data Science has extremely low barriers to entry. The tweet suggested that all you have to do is take some software development courses online and you’re good to go! It made my stomach churn. It confused the whole field of Data Science with software development.

Why do Data Scientists pretend to be software developers? Data Science is so much more than that. It is massively disappointing to see Data Scientists pushing only the software development side of the field.

Photo by James Healy on Unsplash

Is software important for Data Scientists? Of course. Is skilled coding a big part of the work? Yes. Is there a lot that Data Scientists can learn from software developers? Absolutely. Are software development skills the most important attribute of a Data Scientist? No.

When we push software development as the single most important piece of Data Science, we risk turning our teams, and our field, into another IT department. That’s not to criticize IT work. I’ve worked with many large businesses, and I couldn’t tell you how many incredibly smart, talented individuals I’ve met within IT departments. But those teams play defined roles in their organizations. In general, they aren’t driving overall business strategy. Yet I see so many Data Scientists who get into the field wanting to influence big, important decisions in a business, and who then focus entirely on maximizing their software development skills.

Software development is an important tool in your belt. But it’s not the only tool. We can’t make lofty promises about supporting business-critical decision-making and then revert to talking exclusively about software development.

I’ve written elsewhere about the more attitudinal tools that Data Scientists need. However, even on the technical side, there is more than just software.

Here are four critical skills, outside of software development, that many Data Scientists lack.

Basic Statistical Background

You won’t truly understand what your code is doing unless you know the basic statistics behind it. I’ve encountered many Data Scientists who could code a complicated deep-learning model in no time, but they barely grasped the meaning of a normal distribution.

Skilled Data Scientists know the foundations of their tools. Mike Tyson said it well: “Everyone has a plan until they get punched in the face.” When your Data Science model misbehaves and punches you in the face, it is often an understanding of the fundamentals that allows you to correct course.

Photo by Bogdan Yukhymchuk on Unsplash

You must have a solid mathematical and statistical foundation. Are you familiar with the core concepts of Frequentist Statistics? What about Bayesian Statistics? If you had to write first-principles pseudocode for a model that you’re leveraging, could you? Where are the gaps in your knowledge, and how will you fill them?
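As a rough illustration of what writing a model from first principles can look like, here is a minimal sketch of simple linear regression fitted by ordinary least squares, using only the Python standard library. The function name and the sample data are hypothetical, chosen for illustration; the point is only that the slope is the ratio of the summed cross-deviations of x and y to the summed squared deviations of x, and the intercept follows from the two means.

```python
from statistics import fmean


def fit_simple_linear_regression(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares.

    Hypothetical helper for illustration, not taken from any library.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")

    x_bar, y_bar = fmean(xs), fmean(ys)

    # Sum of cross-deviations (numerator) and squared x-deviations (denominator).
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("x has no variation; the slope is undefined")

    slope = s_xy / s_xx
    # The fitted line always passes through the point (x_bar, y_bar).
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Made-up data lying exactly on y = 1 + 2x.
slope, intercept = fit_simple_linear_regression([0, 1, 2, 3], [1, 3, 5, 7])
```

If you can explain why the slope takes that form, and what breaks when x has no variation, you are in a much better position to diagnose a library model that misbehaves.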

Further, I would argue that often the best solutions in Data Science are more fundamental than many would like to admit. The elegance of simple, fundamental solutions is a lesson that the most successful Data Scientists have shown me repeatedly.

Critical Thinking

The best Data Scientists are good at understanding arguments, questioning others, and teasing out the truth of what someone is bringing to the conversation. Data Science isn’t about regurgitating information line for line; there’s much more art to it than that, and that art stems from being able to judge the quality of the information you receive.

Some of the biggest misses I’ve seen in Data Science projects started with someone accepting bad information, or a weak argument, without challenging it. If you’re taking on a project, it is your job to ask the right questions and analyze the situation from the beginning. To say you were just taking orders, or were given bad information, is a cop-out that won’t get you very far.

I find critical thinking to be one of the most powerful determinants of whether someone will succeed in Data Science. Critical thinking is needed for any position in business, but it is especially crucial in Data Science, where the questions asked are too ambiguous to be tackled without proper examination. Why? Because in a business context you’re going to be on the receiving end of a lot of information, theories, and opinions. Some of it will be well-founded, and some of it not so much. As a Data Scientist you are attempting to turn that information, and those ideas, into statistical models. If you can’t judge the quality of the information you receive, you will be lost at sea.

Photo by Anastasia Taioglou on Unsplash

To build those muscles for debate, Data Scientists need a basic knowledge of philosophy. There are free courses available online that will help you get there. The best ones are not targeted at Data Science specifically. For many Data Scientists, those courses will feel frustrating: the right answers won’t be black and white. That’s the point.

Communication

It’s unfortunate, but I’ve seen high-quality models fail because of poor communication. The Data Scientist couldn’t explain clearly what their model was doing or what the results meant. Since no one could grasp their work, the project was considered a failure. In actuality, the results could have been hugely insightful. However, if you can’t pull those insights out in a clear, meaningful, relatable way for your clients and team members, then you’ll never have the big impact you aimed for.

I would go so far as to say that no matter how well executed a Data Science project is, it will not be considered a success without proper communication. If you don’t know what that looks like for your project, then you’re putting all of your hard work at risk of being swiftly dismissed.

Photo by Campaign Creators on Unsplash

Some good communication comes from having a good statistical background, and some from strong critical thinking skills. But communication, in its own right, is a key skill. It’s the reason that you often find incredibly smart people banished to some dark, isolated corner of the office. They just don’t know how to communicate, and the result is that much of the impact of their work is lost.

There is something to be said for specifically practicing communication. Again, there are many courses online to get you to the level you need to be. And again, the best of those courses are not Data Science specific.

Domain Expertise

I recently overheard a discussion between Data Scientists about pricing analytics, a field with which I have some experience. I could see the lack of understanding of the business side of the question. The Data Scientists were jumping straight into questions about model selection, but they had almost no grasp of the practical data limitations they would face. Without domain expertise, they were headed down a path to certain failure.

Just because you can grow a houseplant doesn’t mean you can run a vineyard. Nuances matter. It is critical to learn the specifics of the problem you plan to solve.

Photo by Jaime Casap on Unsplash

At times I look to academia and am baffled by the current state of Data Science. In academia, you start with domain expertise and then apply statistical models. Statistical modeling is secondary to theoretical understanding. Yet, in Data Science we start with statistical modeling and often neglect the theoretical understanding, the domain expertise, that underlies the questions being asked.

Do I think the academic model is perfect? Far from it. And I wouldn’t recommend that all Data Scientists be domain experts. But Data Scientists do need to find a way to incorporate domain expertise into their work, whether by working in a specific industry niche or by finding business partners who can provide background knowledge for a project.

Republished from Source https://towardsdatascience.com/theres-more-to-data-science-than-software-development-eb8c2fd5ac0c?source=rss—-7f60cf5620c9—4 via https://towardsdatascience.com/feed
