Thursday 9 April 2015

Making a career in Data Science- so who is in pole position?

I saw an interesting infographic on the topic (see below), and wanted to share my thoughts around the theme.

My thoughts
  1. Demand will significantly outstrip supply of 'genuine' Data Scientists- most organizations will struggle to attract and retain talent
  2. Many BI professionals, analysts and programmers will try to change their titles/role designations without picking up new skills- and cause some damage
  3. Integrators- those who can combine business awareness with statistics and technology- will be most successful (http://bizmachinelearning.blogspot.in/2015/04/what-makes-for-ideal-data-scientist.html)
  4. The core challenge for organizations adopting data science is really a lack of imagination to understand the possibilities and of discipline to avoid the hype
  5. I see the primary source of talent from college being MBAs with a good math (probably CS) background- I see a pure CS background as shaky in problem formulation and decision making
  6. I see the Data Scientist spend a lot more time with Management to take business calls and much less with graphic designers
  7. I see Data Scientists opening significant new vistas and increasing demand- and I see automation really impacting BI professionals
  8. I would have liked a question on the career path for a Data Scientist- I see them getting into business leadership roles
  9. I would have liked a question on domains of interest- I see IoT and Personal Data as the new playgrounds for data
  10. It will take another 5 years - but we will finally have a broadly acceptable definition of Data Science

What do you think?

Sunday 5 April 2015

So how do I become a Data Scientist?

Let me address another question that I keep getting asked: what should I do to enter the world of Data Science? No wonder, given all the buzz around this space, which is pitched as among the top 5 jobs of the 21st century.

Before I get started, let me mention upfront that I don't operate in Silicon Valley, and my understanding of Data Science comes not from the frothy ideas of the start-ups but from how traditional organizations work- organizations that take a more critical view of returns and don't easily buy into the hype. I see a much more stable role for Data Science in reinventing business processes and bringing a new culture of decision making.

Also, I don't have jobs to offer and am not in the business of providing leads for consulting gigs- the best I can do is to share guidance on how you can better prepare yourself in this space. I would prefer to answer via the comments below so it becomes a shared discussion- but if you wish, I can handle offline mails at raghu2222in at gmail dot com [it might take 1-2 weeks to get back, and no promises]

First, please check out my thoughts on the key attributes for making a good data science career

http://bizmachinelearning.blogspot.in/2015/04/what-makes-for-ideal-data-scientist.html

To me, there are different paths to achieving the chosen career, but some things are a core need irrespective of role-context
a) strong consultative skills- communication, quick understanding of context
b) strong math skills- you might not understand all the details, but should at least appreciate the highlights to contribute in discussions
c) technical context- every role might have a varying technical architecture (statistical packages, Big Data technologies etc)

I would suggest taking college or online programs to pick up the math skills and technical knowledge- one option is to work through the high quality MOOCs from Stanford, Caltech and many others. As in many areas, I would focus more on learning by experimenting than on getting knee-deep in theoretical concepts- the good news is that most of the platforms, analytical packages and rich data sets are available for free.

The critical consultative capability and business context come with experience, and you might even be able to apply some of the new insights and tools you have picked up without formally changing your role. Typically we ease in freshers through blended teams, whereby the smart talent plays along in the initial stages carrying out the data acquisition and model implementation tasks... and over 6-9 months starts joining the more interesting conversations around defining the business problem to be solved, articulating recommendations and setting metrics to track progress.

Let me know what you think

So how does a Data Science engagement get started?

This blog is directed at business leaders who read an HBR article, or hear a top consulting firm pontificate on the virtues of leveraging data science, and want to try that out in their own operation- and at product or services companies that latch onto this as the latest opportunity.

I guess the key question is "How do you get started?"

One view is to look at key business themes like improving sales, reducing delivery cost or improving the quality of hires- literally the key strategic objectives presented to the board. This addresses the point of starting from business objectives first, but carries the risk of being so broad in charter that the initiative gets lost in generalities. There is initial talk around machine learning, cognitive, AI etc., but not enough attention is given to linking the big themes with the available data science capability- a tragedy, as genuine transformational opportunities are not followed through.

Another option is to take a product-centric approach. There is a whole host of products and solutions that aim to solve business issues through data science, literally as a packaged solution. The advantage is a clear articulation of the specific outcomes that could be achieved and a much faster implementation- as the approach is defined in a very targeted fashion. My beef with this approach is that it often turns into a fancy hammer looking for nails. There have been too many problems where we decided to force-fit a Data Science solution.

My suggestion is that the best way is to look at a single theme and then identify specific threads that could be handled- the statistics and algorithms should be kept out until there is clarity on these threads and business alignment is ensured.

For example, if the idea is to reduce manufacturing cost, it would be better to define a priori the specific focus area- e.g. production cost for widget 616, or in-bound logistics costs for a factory. This would then lead to the question of whether there is sufficient stakeholder buy-in and enough available data feeds to create and validate hypotheses.

Saturday 4 April 2015

What makes for an ideal Data Scientist?

There have been a bunch of definitions of data science and the skills required for it... an oft-quoted one is by Drew Conway- the key being the integration of skills from diverse areas- math and science skills, technical wizardry and finally business expertise

http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram

While I agree with this conceptually, I am not sure most businesses would actually land talent with a perfect blend of the three. Then the conversation shifts to how we build a team where we can get many of these skills across multiple roles- but still ensure that conversations don't get distorted. The last thing we want is for the business need to improve campaign management to get bogged down in an esoteric conversation on whether we use Pig or Hive, or a discussion of the various parameters of the statistical model. These are critical and have their place- but clearly the most important role is to fully appreciate the business need and real priorities, and go about exploring how that need could be realized.

This implies two key principles that we have seen work in my organization
a) the business interfacing is the critical skill- the individual would need to get a true appreciation of the business objectives and understand the various data sources that could be leveraged (along with the associated data access/quality constraints). The individual would also need to be able to present the results in the right context- with the right level of confidence and humility
b) the analyst needs to have T-shaped skills- a fair bit around the business domain, the technical aspects of data handling, and the pros and cons of the various statistical models (more on that in subsequent blogs)

Three key attributes the data scientist needs are
a) willingness to learn- inevitably they will need to constantly pick up and engage with expertise in various fields
b) curiosity to ask and challenge traditional beliefs- you would be surprised how many experts concede that their heuristics might be questionable in the present circumstances
c) excellent communication capability- both while understanding the need and while communicating the findings in a meaningful and mature manner

Let me know if this makes sense to you

Launching the Data Science for Business Blog

I have been tasked with building a team of data scientists over the last few months. As I come from a consulting background, I have a very different orientation to business outcomes relative to the team of statisticians and "big data" guys we normally hire. I have been asked at multiple forums why I have been able to get results and communicate without loading up on jargon... here is my attempt to share my insights over a series of blogs

While we will discuss algorithms and statistics, the focus will be more on how we deliver business outcomes- bridging the chasm between academic theory and business needs.

While Data Science is seen as a geek subject with loads of jargon- statistical mumbo jumbo, Hadoop, Big Data analytics and the like- I believe the need has never been greater for the integrators: the specialists who can understand the business context and determine how to achieve the desired outcomes using the new tools. The difference from a typical technology view is that each of the domains is very different, and a basic level of breadth is required to pull the pieces together

Hope my thoughts are useful and I look forward to your comments!