Modern Data Scientist: Technical and Soft Skills You Need to Be Successful

on 15 Sep 2016, 21:40
  • Rang Technologies
  • Data Science

What skills are required to be a Data Scientist?

OR

Is a strong mathematics background required to pursue a career as a data scientist?

We at Rang Technologies see a lot of questions like these. When you're trying to break into the field, it's hard to know exactly how much math and stats you need.

First, it depends on how a company defines "data scientist." Some companies say "data scientist" but really mean "data engineer," a role much more focused on the software engineering side of things: coding production systems, data storage and extraction, cluster management, etc. Data engineering is less Math/Stats-focused and more CS-focused.

Second, it depends on how a company divides responsibilities. Some look for people who are strong in either programming or mathematics/statistics and then combine them in a team. Others look for "fully fledged" data scientists who have deep insight into different models, know when to apply which algorithm, and can handle all of the implementation themselves. How the role you're looking at fits these descriptions will affect how much math/stats you need to demonstrate.

Given this variation, the trick is to carefully dissect the job posting and dig into the background of the current team. LinkedIn is a great place to do this: you can generally identify the different roles (job titles) and see the skills and backgrounds of the people in those roles.

That said, there are a few mainstays that, irrespective of role, you should demonstrate on your resume, whether through academic coursework, online courses you've taken, or project work you've completed (including write-ups that demonstrate your understanding). Specifically:


  • Linear algebra (and ideally basic multivariate calculus)

  • Regression: linear regression and the conditions that violate the assumptions of linear models (e.g., autocorrelation in time series data, non-independent observations)

  • Probability theory: especially Bayes' Law and the Central Limit Theorem

  • Numerical analysis and time series analysis (e.g., forecasting)

  • Core machine learning methods (clustering, decision trees, k-NN)
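
To make a few of these mainstays concrete, here is a minimal sketch using only Python's standard library. It assumes a hypothetical diagnostic-test example for Bayes' Law, a one-predictor ordinary least squares fit for linear regression, and a simulation of the Central Limit Theorem; the function names and numbers are illustrative and not taken from any particular course or library.

```python
import math
import random
import statistics

def bayes_posterior(prior, sensitivity, false_positive_rate):
    """P(condition | positive test) via Bayes' Law (hypothetical example)."""
    # Total probability of a positive result: P(+) = P(+|H)P(H) + P(+|not H)P(not H)
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one predictor)."""
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def clt_sample_means(sample_size, trials, seed=0):
    """Means of repeated samples from a skewed Exponential(1) distribution.

    By the Central Limit Theorem these means are approximately normal,
    with mean 1 and standard deviation 1 / sqrt(sample_size).
    """
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(sample_size))
        for _ in range(trials)
    ]

# Hypothetical test: 1% prevalence, 99% sensitivity, 5% false-positive rate.
print(round(bayes_posterior(0.01, 0.99, 0.05), 3))  # 0.167
print(fit_line([1, 2, 3, 4], [3, 5, 7, 9]))  # (1.0, 2.0)
means = clt_sample_means(sample_size=50, trials=2000)
print(statistics.fmean(means), statistics.stdev(means), 1 / math.sqrt(50))
```

The Bayes example shows why the law matters in practice: even with an accurate test, a rare condition yields a posterior of only about 17% after one positive result.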


How can you take action now?

Compare this list of mainstays against your resume. Which do you cover? Which are you missing? Of the missing ones, which have you actually used or become proficient with? Make space to mention them, and if you gained them through project work, consider linking to a more detailed write-up (for example, on GitHub) to show a deeper level of understanding. This is especially important for candidates without a Math/Stats background, as the burden of proof is higher! If you've covered more than the above, great! Make sure the most relevant courses shine through and get you noticed.

Technical Skills: Analytics

Education: Data scientists are highly educated (88% have at least a Master's degree and 46% have PhDs), and while there are notable exceptions, a very strong educational background is usually required to develop the depth of knowledge a data scientist needs. Their most common fields of study are Mathematics and Statistics (32%), followed by Computer Science (19%) and Engineering (16%).

SAS and/or R: In-depth knowledge of at least one of these analytical tools is expected; for data science, R is generally preferred.

Technical Skills: Computer Science

Python Coding: Python is the most common programming language I see required in data science roles, along with Java, Perl, or C/C++.

Hadoop Platform: Although this isn't always a requirement, it is heavily preferred in many cases. Having experience with Hive or Pig is also a strong selling point. Familiarity with cloud tools such as Amazon S3 can also be beneficial.
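
Hadoop's original processing model is MapReduce, and Hive and Pig queries have traditionally been compiled into MapReduce jobs. As a rough illustration only, the single-process Python sketch below mimics the map, shuffle, and reduce phases of the classic word-count job. It does not use any Hadoop API, and the function names are hypothetical.

```python
from collections import defaultdict

def map_phase(line):
    # Mapper: emit a (word, 1) pair for every whitespace-separated token.
    for word in line.lower().split():
        yield word, 1

def shuffle_phase(pairs):
    # Shuffle: group all values by key (Hadoop also sorts keys; this sketch does not).
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    # Reducer: combine all counts emitted for one word.
    return key, sum(values)

def word_count(lines):
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

docs = ["Hadoop stores data", "Hive queries data on Hadoop"]
print(word_count(docs))  # {'hadoop': 2, 'stores': 1, 'data': 2, 'hive': 1, 'queries': 1, 'on': 1}
```

On a real cluster, the mappers and reducers run in parallel on different machines and the framework handles the shuffle over the network; the logic of each phase stays this simple.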

SQL Database/Coding: Even though NoSQL and Hadoop have become a large component of data science, it is still expected that a candidate will be able to write and execute complex queries in SQL.
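
As an example of the kind of multi-step query meant here, the sketch below uses Python's built-in sqlite3 module with an in-memory database. The customers/orders schema and the data are invented for illustration; the query combines a join, a common table expression, aggregation, and a HAVING filter.

```python
import sqlite3

# Hypothetical schema and data, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    amount REAL
);
INSERT INTO customers VALUES (1, 'East'), (2, 'West'), (3, 'East');
INSERT INTO orders VALUES (1, 1, 120.0), (2, 1, 80.0), (3, 2, 50.0), (4, 3, 300.0), (5, 2, 20.0);
""")

# Average spend per customer in each region, keeping only regions above a threshold.
query = """
WITH customer_totals AS (          -- step 1: total spend per customer
    SELECT c.region, o.customer_id, SUM(o.amount) AS total
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id
    GROUP BY c.region, o.customer_id
)
SELECT region,                     -- step 2: aggregate those totals by region
       COUNT(*) AS n_customers,
       AVG(total) AS avg_spend
FROM customer_totals
GROUP BY region
HAVING AVG(total) > ?
ORDER BY avg_spend DESC;
"""
rows = conn.execute(query, (100,)).fetchall()
print(rows)  # [('East', 2, 250.0)]
conn.close()
```

Passing the threshold as a `?` parameter rather than formatting it into the string is the idiomatic way to supply values to sqlite3 queries.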

Unstructured data: It is critical that a data scientist be able to work with unstructured data, whether it is from social media, video feeds or audio.
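
Social-media text is a common first encounter with unstructured data. The minimal sketch below assumes plain-text posts and uses deliberately simple regular expressions (real pipelines usually rely on a proper tokenizer) to turn a raw post into structured fields that can then be stored, queried, or fed to a model. The patterns, field names, and sample post are hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately simple patterns.
HASHTAG = re.compile(r"#(\w+)")
MENTION = re.compile(r"@(\w+)")
WORD = re.compile(r"[a-z']+")

def extract_features(post):
    """Turn one raw social-media post into a structured record."""
    text = post.lower()
    hashtags = HASHTAG.findall(text)
    mentions = MENTION.findall(text)
    # Drop hashtags and mentions before counting ordinary words.
    plain = MENTION.sub(" ", HASHTAG.sub(" ", text))
    return {
        "hashtags": hashtags,
        "mentions": mentions,
        "word_counts": Counter(WORD.findall(plain)),
    }

post = "Loving #DataScience with @DataTeam! Data is everywhere, data is fun."
features = extract_features(post)
print(features["hashtags"], features["mentions"])  # ['datascience'] ['datateam']
print(features["word_counts"])
```

Audio and video need heavier tooling (speech-to-text, frame extraction), but the goal is the same: convert raw input into structured features.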

About Rang Technologies:
Headquartered in New Jersey, Rang Technologies has spent over a decade delivering innovative solutions and top talent to help businesses get the most out of the latest technologies in their digital transformation journey.