In a previous post, I framed the Microsoft Professional Program in Data Science (MPP) and discussed some of the benefits of the program at a high level. In this post, I hope to discuss in more detail my experience with the MPP and what it provided for me and the hundreds of others who took part in the initial cohort.
The MPP aims to provide the core skills necessary for a program graduate to get started in a career in data science. To the extent possible with online learning and hands-on labs, I believe that it largely fulfills that goal. What will truly measure the impact of the MPP over time, though, for graduates and employers alike, is how well graduates are able to apply their knowledge in the field.
There appears to be a large gap between the number of data science jobs available and the number of people with the required skills, and we need to prepare accordingly.
While the MPP is centered around Microsoft technologies to a significant extent, there is also a substantial focus on open source development with either the Python or R track. The foundation is laid to get you started in open source programming, exploratory analysis, and machine learning. In addition, you have the opportunity to learn or enhance skills by using a variety of technologies such as Azure Machine Learning, Apache Spark via HDInsight, streaming data with Stream Analytics, and so on.
At various junctures, there were choices to make between courses, but here is a brief summary of my own path through the MPP:
- Data Science Orientation (required)
  - This course introduces the program and provides a foundation in descriptive statistics. It uses Excel for data exploration and stats.
- Querying Data with Transact-SQL (required)
  - I’ve worked with SQL for a number of years. I learned some new things about grouping sets in T-SQL, but I don’t think that this course would be difficult for anyone who has worked with data but has never used SQL before (is that an oxymoron?).
- Analyzing and Visualizing Data with Power BI (alternate Excel)
  - As with SQL, I know a thing or two about Power BI. Even though I use Power BI professionally, I picked up a few tips here and there, just as I did in the SQL course. For anyone who wants to get into data science and would not specifically need to work with Power BI, however, the Excel course might be a more appropriate option.
- Statistical Thinking for Data Science and Analytics (required)
  - This course is not from Microsoft but instead is offered by Columbia University. It represents the turning point at which the MPP more or less tips from business intelligence into data science. It covers statistics, probability, Bayes, regression, clustering, and more. It is probably the most intense course (or at least it was for me) in the MPP.
- Introduction to R for Data Science (alternate Python)
  - The first R course is a great one that is actually a series of DataCamp tutorials. I liked the format of short videos followed by a lot of hands-on learning. I’ve been working with R for a while now, but as with every other course, I picked up some new knowledge about basics that I had either forgotten or never learned before.
- Data Science Essentials (required)
  - The Essentials course has some good hands-on content where you can compare Azure ML, Python, and R together in the same lab documents. For instance, if you choose to use a native Azure ML module, the code for R or Python is available for comparison too. As a result, I didn’t feel like I missed out as much by not choosing the Python track.
- Principles of Machine Learning (required)
  - Whereas the Data Science Essentials course felt more like an introduction to Azure ML, the Principles of Machine Learning course goes into more depth on models. It covers a breadth of topics, including classification, regression, clustering, and more. This ML course is probably the best preparation for the final project later on.
- Programming with R for Data Science (alternate Python)
  - The second R course does not share the initial R course’s DataCamp environment. This was disappointing from a structural perspective but not in terms of content. The course delves into functions, reading data, transforming data, linear modeling, base graphics, and more.
- Developing Intelligent Applications (two alternates: Spark or applied machine learning)
  - This course focuses on Azure Stream Analytics, the Text Analytics API from Cognitive Services, and Bot Framework. I liked the labs and content, but it was certainly a different experience compared with the other courses. If you are interested in exploring some new technologies and care for C# or app development, take this course. Otherwise, take the Spark on HDInsight or applied machine learning alternates. One of the best parts of having these courses on edX, though, is that you can audit the alternate courses for free at any point.
- Final Project
  - The capstone project is an Azure ML project where you need to apply much of what you’ve learned about data cleansing and machine learning. I can’t offer much advice about this project other than some fairly general points:
    - Take advantage of as many of the 100 submissions as you can
    - Focus on cleaning your data
    - Do some additional feature engineering
    - Focus on feature selection
    - Try numerous algorithms and different parameter options
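To make those general points a little more concrete, here is a minimal sketch in plain Python. It is not the capstone itself, which runs in Azure ML; the toy data, the function names, and the choice of a k-nearest-neighbors classifier are all assumptions made purely for illustration. It drops rows with missing values, optionally adds one engineered feature, and then scores every combination of feature set and parameter value with leave-one-out cross-validation.

```python
from itertools import product
from math import dist  # requires Python 3.8+

# Hypothetical toy data: (feature_a, feature_b, label); None marks a missing value.
RAW_ROWS = [
    (1.0, 2.0, 0), (1.2, 1.8, 0), (0.9, None, 0), (1.1, 2.2, 0),
    (3.0, 1.0, 1), (3.2, 0.9, 1), (2.8, 1.1, 1), (None, 1.0, 1),
]

def clean(rows):
    """Data cleaning: drop any row that has a missing value."""
    return [r for r in rows if None not in r]

def add_ratio(rows):
    """Feature engineering: append feature_a / feature_b as a third feature."""
    return [(a, b, a / b, y) for a, b, y in rows]

def knn_predict(train, point, k):
    """Majority vote among the k training rows nearest to `point`."""
    nearest = sorted(train, key=lambda r: dist(r[:-1], point))[:k]
    votes = [r[-1] for r in nearest]
    return max(set(votes), key=votes.count)

def loo_accuracy(rows, k):
    """Leave-one-out accuracy: predict each row from all the other rows."""
    hits = 0
    for i, row in enumerate(rows):
        train = rows[:i] + rows[i + 1:]
        hits += knn_predict(train, row[:-1], k) == row[-1]
    return hits / len(rows)

def search(rows, ks=(1, 3), use_ratio=(False, True)):
    """Try every (k, feature set) combination and record its accuracy."""
    results = {}
    for k, ratio in product(ks, use_ratio):
        data = add_ratio(rows) if ratio else rows
        results[(k, ratio)] = loo_accuracy(data, k)
    return results

if __name__ == "__main__":
    scores = search(clean(RAW_ROWS))
    best = max(scores, key=scores.get)
    print("accuracy by (k, use_ratio):", scores)
    print("best setting:", best)
```

The same loop structure carries over to any tool: clean first, vary one feature or parameter choice at a time, and compare every variant with the same validation scheme so the scores are comparable.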
Have questions or interested in more specific feedback? Leave a Comment.
Hello David, thanks for the blog post. How long did it take to finish this program?
Around 130 hours total. This was heavily weighted toward more time spent on the later courses. The estimate that I saw from either Microsoft or edX early on was around 225 hours if you did not have some prior experience and watched all the lectures at normal speed.
Thanks for sharing. Nice article and very useful information.