Monday, April 20, 2020

One Stop Shop for Machine Learning - Azure Notebooks (with Python)


The goal of this post is to explain what Machine Learning is through a detailed case study walk-through, and to show how we can start learning it using Python and Azure Notebooks.

Machine Learning is a means of building models of data: finding, discovering and creating insights from data. It is a suite of statistical methods that are used in conjunction to either 'predict' or 'fill in' a solution based on known parameters. Machine learning takes a lot of the burden off humans, who are prone to error, and works through data at an incredibly fast rate to give impressive results.

Key terms that are often used in Machine Learning:
  • Training & Test Data Split: the data is usually split 80% / 20%. The training data is used so that the machine learns the patterns in the data, and the test data is used to see how well the machine can predict new answers based on its training.
  • Sentiment Analysis: commonly used in marketing and customer service to answer questions such as "Is a product review positive or negative?" and "How are customers responding to a product release?"
  • Confusion Matrix: also known as an error matrix. It counts the number of times each answer was classified correctly or incorrectly.

Typically the ML process consists of:
  • Gathering data from various sources
  • Cleaning data to have homogeneity
  • Selecting the right ML algorithm and building the model
  • Gaining insights from the model’s results
  • Transforming results into visual graphs

Here are some of the top Big Data use cases:

[Figure: top Big Data use cases]

Below is the approach to solving a business problem using Machine Learning, in this case a predictive problem aimed at improving client business value:

[Figure: approach to solving a predictive business problem with Machine Learning]

Here are the detailed steps of the approach:

[Figure: detailed steps of the approach]

Now let’s get technical and get our hands dirty with Machine Learning using Python and Azure Notebooks.

Azure Notebooks is a cloud-based platform for building and running Jupyter notebooks. Jupyter is an environment based on IPython that facilitates interactive programming and data analysis using Python and other programming languages. Azure Notebooks provides Jupyter as a service for free. Jupyter notebooks are composed of cells, each of which holds text, code or data.

Case Study : Using Machine Learning to create a model that predicts which passengers survived the Titanic shipwreck

Before getting into the walk-through of the model, let’s get acquainted with the key Python libraries:

[Figure: key Python libraries]

Let’s start building the project. The hypothesis is that survival on the Titanic can be determined from various parameters in the data set.

Step 1 : Create an Azure Notebook and import the Titanic data set, which is publicly available


Step 2 : Import the Python libraries pandas and numpy, and open the Titanic data set in a pandas DataFrame
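
As a minimal sketch of this step, the snippet below imports pandas and numpy and loads passenger records into a DataFrame. The rows are made up and stand in for the real file, the column names assume the commonly used Kaggle layout, and `load_passengers` is a hypothetical helper, not something from the original notebook.

```python
import io

import numpy as np
import pandas as pd

# Made-up rows standing in for the real file. The column names follow the
# widely used Kaggle layout, which is an assumption about the data set.
SAMPLE_CSV = """PassengerId,Survived,Pclass,Name,Sex,Age,Fare
1,0,3,Passenger A,male,22,7.25
2,1,1,Passenger B,female,38,71.28
3,1,3,Passenger C,female,,7.92
4,0,2,Passenger D,male,35,13.00
"""


def load_passengers(source):
    """Read passenger records from a file path or file-like object."""
    return pd.read_csv(source)


# In a notebook, `source` would be the path of the uploaded CSV file.
df = load_passengers(io.StringIO(SAMPLE_CSV))

print(df.shape)         # number of rows and columns
print(df.head())        # first few records
print(df.isna().sum())  # missing values per column

# numpy works directly on the underlying column values.
mean_age = np.nanmean(df["Age"].to_numpy())
print(round(mean_age, 2))
```

Checking `df.isna().sum()` right after loading shows which columns will need attention during data cleansing.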

Step 3 : Data cleansing: drop the rows containing NaNs (Not a Number, i.e. missing values) and the columns which are not necessary. To avoid complex string manipulations, for the time being let’s ensure the data has only numeric values
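
One way this cleansing could look is sketched below. The list of dropped columns and the 0/1 encoding of `Sex` are assumptions for illustration (the original notebook may have handled them differently), and the records are made up.

```python
import numpy as np
import pandas as pd

# Made-up records in an assumed Kaggle-style layout.
raw = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4],
    "Survived": [0, 1, 1, 0],
    "Pclass": [3, 1, 3, 2],
    "Name": ["Passenger A", "Passenger B", "Passenger C", "Passenger D"],
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, 38.0, np.nan, 35.0],
    "Fare": [7.25, 71.28, 7.92, 13.00],
})

# Columns assumed to be unnecessary or to require string handling.
UNUSED_COLUMNS = ["PassengerId", "Name", "Ticket", "Cabin", "Embarked"]


def clean_passengers(df):
    """Drop unused columns, encode Sex as 0/1, and remove rows with NaNs."""
    cleaned = df.drop(columns=UNUSED_COLUMNS, errors="ignore")
    # Simple numeric encoding instead of string manipulation: male=0, female=1.
    cleaned = cleaned.assign(Sex=cleaned["Sex"].map({"male": 0, "female": 1}))
    return cleaned.dropna().reset_index(drop=True)


clean = clean_passengers(raw)
print(clean)
print(clean.dtypes)  # every remaining column should be numeric
```

Dropping rows with missing values is the simplest option; it does shrink the data set, so filling in missing ages would be an alternative worth trying later.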

Step 4 : Train / test split: let’s start with three quarters of the data for training and one quarter for testing
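
A 3/4 to 1/4 split can be sketched with scikit-learn's `train_test_split`. The data below is randomly generated purely to have something to split, and the column names and `random_state` value are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Randomly generated stand-in for the cleaned, all-numeric passenger table.
rng = np.random.default_rng(0)
n = 40
df = pd.DataFrame({
    "Pclass": rng.integers(1, 4, size=n),
    "Sex": rng.integers(0, 2, size=n),
    "Age": rng.uniform(1, 70, size=n).round(1),
    "Fare": rng.uniform(5, 100, size=n).round(2),
    "Survived": rng.integers(0, 2, size=n),
})

X = df.drop(columns="Survived")  # features
y = df["Survived"]               # label to predict

# test_size=0.25 keeps 3/4 of the rows for training and 1/4 for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

print(len(X_train), len(X_test))
```

Fixing `random_state` makes the shuffle repeatable, which helps when comparing models across notebook runs.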

Step 5 : Set up the model using the class RandomForestClassifier with a Yes/No answer: will a person survive or not
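
The sketch below fits scikit-learn's `RandomForestClassifier` on a binary label. The training data is randomly generated with a made-up rule so there is a pattern to learn, and the hyperparameters (`n_estimators=100`, `random_state=42`) are assumptions rather than the settings used in the original notebook.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Randomly generated stand-in for the cleaned passenger features.
rng = np.random.default_rng(42)
n = 200
X = pd.DataFrame({
    "Pclass": rng.integers(1, 4, size=n),
    "Sex": rng.integers(0, 2, size=n),
    "Age": rng.uniform(1, 70, size=n).round(1),
    "Fare": rng.uniform(5, 100, size=n).round(2),
})

# Made-up rule: the label follows Sex, with roughly 10% of labels flipped.
noise = rng.random(n) < 0.1
y = pd.Series(np.where(noise, 1 - X["Sex"], X["Sex"]), name="Survived")

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# A random forest is an ensemble of decision trees; each prediction is 0 or 1.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

predictions = model.predict(X_test)
print(model.classes_)    # the two possible answers
print(predictions[:10])  # first few predicted answers
```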

Step 6 : Now let’s check the accuracy of the model and analyze its predictions using the confusion (error) matrix
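
To show how the two measures relate, here is a small sketch using scikit-learn's `accuracy_score` and `confusion_matrix`. The true and predicted labels are hand-written for illustration and are not results from the Titanic model.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Hand-written labels for illustration: 1 = survived, 0 = did not survive.
y_test = [0, 0, 1, 1, 1, 0, 1, 0, 0, 1]
predictions = [0, 1, 1, 1, 0, 0, 1, 0, 0, 1]

# Accuracy is the share of predictions that match the true labels.
accuracy = accuracy_score(y_test, predictions)

# Rows are true labels, columns are predicted labels:
# [[true negatives, false positives],
#  [false negatives, true positives]]
matrix = confusion_matrix(y_test, predictions)
tn, fp, fn, tp = matrix.ravel()

print(f"accuracy: {accuracy:.2f}")
print(matrix)
print(f"correct: {tn + tp}, wrong: {fp + fn}")
```

The diagonal of the matrix holds the correct answers, so accuracy is simply the diagonal total divided by the number of test rows; the off-diagonal cells show which kind of mistake the model makes more often.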

Step 7 : Review the important features
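
A fitted random forest exposes a `feature_importances_` array, which can be ranked as sketched below. The data is randomly generated, and the target is wired to depend only on `Age` by construction so that the ranking is predictable; it is not a finding about the real Titanic data.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Randomly generated stand-in for the cleaned passenger features.
rng = np.random.default_rng(7)
n = 300
X = pd.DataFrame({
    "Pclass": rng.integers(1, 4, size=n),
    "Sex": rng.integers(0, 2, size=n),
    "Age": rng.uniform(1, 70, size=n).round(1),
    "Fare": rng.uniform(5, 100, size=n).round(2),
})

# By construction, the made-up target depends only on Age.
y = (X["Age"] < 30).astype(int)

model = RandomForestClassifier(n_estimators=100, random_state=7).fit(X, y)

# Pair each importance score with its column name and rank them.
importances = pd.Series(
    model.feature_importances_, index=X.columns
).sort_values(ascending=False)

print(importances)
```

The scores sum to 1, so each value can be read as that feature's share of the model's decision-making.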

According to the model, age was the biggest determinant of survival in the Titanic accident, followed by sex (male/female), and then fare class.

That’s it! Jupyter notebooks are highly interactive, and since they can include executable code, they provide a convenient platform for manipulating data and building predictive models from it. With Jupyter notebooks on Azure, you can develop and run code from anywhere. Azure Notebooks helps you get started quickly with prototyping, data science and academic research.