Python is taking over the Data Science Scene

Programming Apr 20, 2021

Back in 2016, in an older version of SudoSecurity (or TheLastTechie, for those who remember it), I wrote an article about how the R language was going to take over the data science scene. It suggested that the more serious, data-science-heavy companies would lean more on the R programming language. I am going to admit here that I was wrong.

A recent Terence Shin analysis of more than 15,000 data scientist job postings suggests that Python adoption keeps growing while the more specialized R programming language is in decline. This does not mean that data scientists will drop R anytime soon. More likely, we will see both Python and R used for their respective strengths.

If I am correct at all, 2021 is the year that data science will become an enterprise-wide capability that impacts every line of business and every functional department. The language that will most likely dominate the field is the one that is most accessible to the broadest population within the enterprise.

The Data Science Boom

The chart below shows the technologies that are fueling the boom in 2021.

From: https://towardsdatascience.com/the-most-in-demand-tech-skills-for-data-scientists-d716d10c191d

The chart below is from 2019. The results are not much different at all.

From: https://towardsdatascience.com/the-most-in-demand-tech-skills-for-data-scientists-d716d10c191d

Some trends appear if you look closely at the two charts above.

  • There is a huge increase in skills related to the cloud.
  • Similarly, there is also a large increase in skills related to deep learning, like PyTorch and TensorFlow.
  • SQL and Python continue to grow in importance, while R remains stagnant.
  • Apache products, like Hadoop, Hive, and Spark, continue to decline in importance.

Not so fast there!

If you dig a little deeper, the technologies and skills that seem to be growing the fastest are those that are the easiest to learn. Hence, while both TensorFlow and PyTorch saw some growth, PyTorch's growth has outpaced TensorFlow's. The popularity of PyTorch is also starting to play out in the projects themselves. The total number of contributors to PyTorch is set to exceed the number contributing to TensorFlow, and the number of contributors to PyTorch over the last 12 months already surpasses that of TensorFlow.

Simplicity and convenience are going to be the killer features. Some examples of this, in my opinion, are MongoDB and Fastly. They are the go-to defaults that developers reach for, and they enable those developers to build faster and become more productive.

This brings us back to Python and the R Programming Language.

Without a doubt, the R language remains highly relevant in the field of data science. Yet we are seeing more developers and data scientists moving from R to Python than vice versa (twice as many so far). The reasons include Python's much better usability, its performance, its ecosystem, and more. At the same time, R remains broadly used for statistical computing. But as more and more companies, along with their developers and data scientists, embrace data science from a highly technical rather than a scientific standpoint, Python will keep booming.
