Machine Learning Tutorial - Lesson 01

Lesson 01 out of 12

All of these tutorials were written by me as a freelancer working for the tutorial project AlgoDaily. They have been slightly changed here, and more lessons after lesson 12 have been added to the actual website. Thanks to Jacob, the owner of AlgoDaily, for letting me author such a wonderful Machine Learning tutorial series. You can sign up there and get a lot of resources related to technical interview preparation.

Introduction to Machine Learning

Introduction

In this series, I present one of the hottest topics of this era: Machine Learning. You will start from scratch and work your way from basic to advanced Machine Learning algorithms. We will walk you through a lot of popular Machine Learning libraries and frameworks like NumPy, Scikit-learn, TensorFlow, and Keras. So get ready to start your awesome ride into a world where computers (machines) are treated like children and we teach them many things with real-world data.

We will also learn how to understand and manipulate data. I will put effort into building a complete pipeline for your data, from preprocessing to model evaluation. After this series, you will be able to do extensive exploratory data analysis to understand your data and present summaries and visualizations of it, like the chart given below:

plotly

What is Machine Learning?

Whenever people think of Machine Learning, they imagine a robot that looks and acts very much like a human. It has its own heart and mind, can make decisions, and is intelligent. But actually, machine learning does not necessarily involve a robot/agent. It can run on any device, or be pure software with no dedicated hardware (we will mostly build machine learning algorithms/software in this series). The special characteristic of Machine Learning-based software is that it can react to new kinds of inputs and produce output based on those new inputs or instructions.

In typical programming, we create an application with a programming language, using different kinds of logic like conditionals, loops, etc. In traditional programming, we give the program a set of formulas or rules, so it takes a limited set of inputs and produces a fixed set of outputs. Programming a machine learning algorithm is slightly different. In Machine Learning, we give a program a set of inputs (and often their expected outputs) and expect it to work on a new set of inputs that it has never seen. When a program can look at data and then become able to analyze a new set of data, it is Machine Learning.

Traditional vs ML

According to Wikipedia:

Machine learning (ML) is the study of computer algorithms that improve automatically through experience and by the use of data.

So, as in the image above, you do not need to code the rules for adding two numbers. You only need to provide a bunch of examples of addition (e.g. 1+2=3, 4+3=7, etc.), and the program will learn how to do addition by itself!
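To make the idea of learning addition from examples concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: the model form (a weighted sum with weights w1 and w2), the extra training examples, the learning rate, and the number of passes are all choices made for this sketch, not something the series prescribes.

```python
# Training examples: pairs (a, b) with their sums.
# The program is never told the rule "a + b".
examples = [((1, 2), 3), ((4, 3), 7), ((2, 5), 7), ((0, 1), 1), ((3, 3), 6)]

# Assumed model: prediction = w1 * a + w2 * b, starting from weights of zero.
w1, w2 = 0.0, 0.0
learning_rate = 0.01  # assumed value, small enough for these examples

for _ in range(2000):
    grad_w1 = grad_w2 = 0.0
    for (a, b), target in examples:
        error = (w1 * a + w2 * b) - target
        # Gradient of the mean squared error with respect to each weight
        grad_w1 += 2 * error * a / len(examples)
        grad_w2 += 2 * error * b / len(examples)
    # Move the weights a small step against the gradient
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2


def predict(a, b):
    """Apply the learned weights to a pair of numbers."""
    return w1 * a + w2 * b


print(w1, w2)           # both weights end up very close to 1
print(predict(10, 20))  # very close to 30, although 10 + 20 was never shown
```

The weights drift towards 1 and 1, which is exactly the rule for addition, even though that rule was never written in the code.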

Prerequisites

What do you need before getting started with this Machine Learning series? You can do ML with almost any programming language, since most programming languages have frameworks/libraries for machine learning. But the most popular ones right now are Python and R. For this series, we will go with Python because it is the more widely used of the two.

Moreover, you will need to know some math. Specifically, to understand core concepts of Machine Learning like the gradient descent algorithm, forward propagation, and the backpropagation algorithm, you will need to know calculus. But for the sake of simplicity, generality, and most importantly fun, we will skip most of the underlying theory of ML and focus on the application and usage of pre-built ML algorithms. So do not worry about math too much if you don’t want it right now.

Applications of Machine Learning

1. Product Recommendations: One of the most widely used applications of ML today is recommendation systems. Almost every big tech company uses them for product advertisements.

2. Image Recognition: Did you ever think that computers would be able to understand scenery? Now they can identify each object in a scene and make decisions based on it, much like humans.

3. Speech Recognition: Ever wondered how your smart assistant (Siri, Google Assistant, Alexa) can understand your speech and talk to you? It is also an application of ML. More specifically, machine learning algorithms that can handle sequential data are used for speech recognition and synthesis.

4. Self-Driving Cars: Self-driving cars will be the future. Making them happen takes a combination of techniques, from image recognition to pathfinding and decision making. All of these are possible with the help of ML.

5. Fraud Detection: ML can detect outliers in your data. Using online fraud detection, many bank transaction systems can be kept safe. We will learn how to detect outliers in your data and preprocess it so your data becomes clean.

6. Image Synthesis: ML can be used to generate new images from existing images. With the help of Generative Adversarial Networks, all of these cartoon characters were generated.

/images/post/machine-learning/gan-gen.jpg

7. Story Generation: Using advanced Natural Language Processing (NLP) techniques, computers can now create new stories that are very pleasing to hear.

8. Content Writing Assistant: Have you ever heard of the online tool Grammarly? It is a wonderful tool that helps you correct your spelling and grammar mistakes. It is built on NLP and Machine Learning.

9. Translation: People can now understand many languages with the help of their smartphones. This is also possible because of text translation with Machine Learning. Now you can just hold your camera up to a foreign-language sign, and it will capture the sign, convert it into text, and then translate it for you. We will have a look at OCR (optical character recognition) in one of our upcoming lessons.

10. Streaming Video Compression: Last but not least, machine learning helps make high-quality video conferencing possible. An HD web camera can produce up to 1 gigabyte of raw data within seconds. Imagine how much data needs to pass through the internet when you are sitting in an online class with 30 other students. But you still get high-quality video because, with ML, the video is compressed to kilobytes before being sent through the internet. Image, audio, and video compression can all be done with ML nowadays.

Brief overview of Python

If you think you have become rusty in Python, or don’t have the confidence to go deep into ML without sharpening your Python skills, here I will cover the fundamentals of Python. You can skip this section if you are confident enough. Below are the main Python features we will need throughout the series:

Importing

As said previously, we will be using a lot of libraries for ML. So we start with importing all those libraries. In Python, it is good practice to put all the required imports at the top of the .py file.

import cv2 # Import the popular computer vision library OpenCV
import numpy as np # Import the library Numerical Python (NumPy) under the alias np

You can install new libraries through a package manager like pip or conda. For example, to install numpy, you simply type in the console:

pip install numpy

Stating Variables

Variables are names that refer to different kinds of values. Python is a dynamically typed language, which means one variable can hold values of different types over time. Python has the following four primitive types for variables:

  1. Integer: An integer holds a whole number. Python supports big integers by default, so you can keep as big an integer as you want (limited only by memory) in a variable.
  2. Float: A float holds a real number with a fractional part, stored as a floating-point approximation.
  3. Bool: A Boolean value is a two-state value: True or False.
  4. Str: A string holds a sequence of Unicode characters.

Below is an example that uses all of these built-in types:

an_integer = 5
a_float = 5.5
a_bool = True
an_str = 'ML'
another_str = "ML"
yet_another_multiline_str = '''This
Multiline
string'''

# This is a comment
# You can initialize multiple variables in one line with tuple unpacking
a, b = 5, 3.6

# Chained assignment binds several names to the same value
a = b = 5

# To know the type of a variable, use the type() function
print(type(a)) # <class 'int'>
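The claims about dynamic typing, big integers, and floats can be checked with a short sketch like the one below. The variable names are made up for illustration.

```python
value = 5
print(type(value).__name__)   # int

value = "five"                # the same name now refers to a str
print(type(value).__name__)   # str

# Integers grow as needed instead of overflowing
big = 2 ** 100
print(big)                    # 1267650600228229401496703205376

# Floats are approximations, so exact comparisons can surprise you
print(0.1 + 0.2 == 0.3)       # False
```

The last line is the reason comparisons between floats are usually done with a small tolerance rather than with ==.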

Operations

Below is a snippet that shows the common operations available in Python:

a = 5
b = 3

# Arithmetic
print(a + b) # 8
print(a - b) # 2
print(a * b) # 15
print(a / b) # 1.6666666666666667 (floating point division)
print(a // b) # 1 (floor division)
print(a % b) # 2 (modulus)
print(a ** b) # 125 (power)

# Bitwise
print(~a) # -6 (bitwise inversion)
print(a & b) # 1 (bitwise and)
print(a | b) # 7 (bitwise or)
print(a ^ b) # 6 (bitwise xor)
print(a << b) # 40 (bitwise left shift)
print(a >> b) # 0 (bitwise right shift)
# Appending '=' to any binary operator above turns it into an augmented assignment (e.g. a += b)

# Logical operators
print(a == 5 and b == 5) # False (logical and)
print(a == 5 or b == 5) # True (logical or)
print(not (a == 5 or b == 5)) # False (logical not)

# Identity
print(a is b) # False (a and b are not the same object)
print(a is not b) # True (a and b are different objects)

# Membership operators
print(a in [1, 2, 3, 4, 5]) # True
print(a not in [1, 2, 3, 4, 5]) # False
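The identity operators are easy to confuse with equality, so here is a small sketch of the difference using lists. The names first, second, and alias are only illustrative.

```python
first = [1, 2, 3]
second = [1, 2, 3]   # a separate list with the same contents
alias = first        # another name for the very same list

print(first == second)   # True  (equal values)
print(first is second)   # False (two different objects)
print(first is alias)    # True  (one object, two names)

# Changing the list through one name is visible through the other
alias.append(4)
print(first)             # [1, 2, 3, 4]
```

In short, == compares values while is checks whether two names refer to the same object.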

Complex types

Besides the primitive types, Python also has many predefined classes. Two of the most important among them are list and tuple. Like the type str discussed previously, both are sequence types. The key difference between lists and tuples is that lists are mutable and tuples are immutable. Here is a demonstration of both of them:

a_list = [1, 2, 3]
a_tuple = (1, 2, 3)

a_list[1] = 5
print(a_list) # [1, 5, 3]

a_tuple[1] = 5 # TypeError: 'tuple' object does not support item assignment
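The snippet above stops at the first error when run as a script. A hedged variation that catches the error, and also shows that str values are immutable in the same way, might look like this:

```python
a_tuple = (1, 2, 3)
try:
    a_tuple[1] = 5
except TypeError as error:
    print("tuple:", error)

a_str = "ML"
try:
    a_str[0] = "D"
except TypeError as error:
    print("str:", error)

# To "change" an immutable value, build a new one instead
new_tuple = (a_tuple[0], 5, a_tuple[2])
print(new_tuple)   # (1, 5, 3)
print(a_tuple)     # (1, 2, 3), the original is untouched
```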

Conditionals

Like most other programming languages, Python has conditionals like the following:

if 5 == 5:
    print('this will always be true')
else:
    print('We will never get here')

Python also has a conditional expression that lets you pick a value in one line:

a = 5 if 15 % 3 == 0 else 10 # a = 5, because 15 is divisible by 3

Functions

Functions in python can receive several parameters and can return any value. They can also use variables declared in the global scope.

# Global scope variable
is_divisible = False

# Declaring a function
def divide(val, div):
    global is_divisible # Declare that we assign to the global variable
    if val % div == 0:
        is_divisible = True
        # Returning an int
        return val // div
    else:
        # Returning a tuple (int, int)
        return val // div, val % div
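The divide function is never called in the lesson, so here is a short usage sketch. The function is repeated so the example runs on its own; the argument values are arbitrary.

```python
is_divisible = False


def divide(val, div):
    global is_divisible
    if val % div == 0:
        is_divisible = True
        return val // div
    return val // div, val % div


# 7 is not divisible by 2: we get a (quotient, remainder) tuple
result = divide(7, 2)
print(result, is_divisible)    # (3, 1) False

# The tuple can be unpacked straight into two variables
quotient, remainder = divide(7, 2)
print(quotient, remainder)     # 3 1

# 10 is divisible by 5: we get a single int and the global flag is set
result = divide(10, 5)
print(result, is_divisible)    # 2 True
```

Note that the return type depends on the input here, which is legal in Python but means the caller has to know which case to expect.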

Loops

The while loops in Python are similar to those in most other programming languages:

i = 0
while i <= 10:
    # Print the value of i. The end parameter is printed after the value; its default is '\n'
    print(i, end=' ')
    i += 1
# 0 1 2 3 4 5 6 7 8 9 10

Python does not have a for loop with “initialization”, “condition” and “statement” like C, C++, Java, JS, and many other languages. A for loop in Python always iterates over an iterable, such as a list or a generator. Let us look at a list first.

a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
for i in a:
    print(i, end=' ')
# 0 1 2 3 4 5 6 7 8 9 10
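Lists are not the only things a for loop can walk over. As a small sketch (the data values here are made up), the same loop syntax works on strings, dicts, and the built-in enumerate helper:

```python
# A str is iterable: the loop visits one character at a time
letters = []
for letter in "ML":
    letters.append(letter)
print(letters)   # ['M', 'L']

# A dict is iterable too; items() yields (key, value) pairs
versions = {"python": 3, "lesson": 1}   # hypothetical data
pairs = []
for key, value in versions.items():
    pairs.append((key, value))
print(pairs)     # [('python', 3), ('lesson', 1)]

# enumerate() pairs each item with its index
indexed = []
for index, name in enumerate(["numpy", "pandas"]):
    indexed.append((index, name))
print(indexed)   # [(0, 'numpy'), (1, 'pandas')]
```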

Generators

Python has a special kind of object named generator. A generator function looks like a normal function, but it uses yield instead of return, and it keeps its state after yielding a value so it can resume execution on the next call. Let us create a generator that yields ints from a starting value up to (but not including) an ending value.

def range_generator(start, end):
    i = start
    # Using < instead of != so the loop also stops when start is greater than end
    while i < end:
        yield i
        i += 1

for i in range_generator(0, 5):
    print(i, end=' ')

# 0 1 2 3 4
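To see the "keeps its state" part in action, one can step through a generator by hand with the built-in next function. The countdown generator below is a hypothetical example written for this sketch.

```python
def countdown(start):
    """Yield start, start - 1, ..., 1."""
    while start > 0:
        yield start
        start -= 1


gen = countdown(3)
print(next(gen))   # 3  (runs until the first yield, then pauses)
print(next(gen))   # 2  (resumes right after the yield)
print(list(gen))   # [1]  (consumes whatever is left)
print(list(gen))   # []   (a generator can only be consumed once)
```

A for loop does the same thing behind the scenes: it keeps asking for the next value until the generator is exhausted.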

Fortunately, Python provides a very helpful built-in named range. It behaves much like the generator above, producing its numbers lazily, and it can be used to create many kinds of number sequences. Let us see some examples:

# iterate over the even numbers from 0 to 10
for i in range(0, 11, 2):
    print(i)
# prints 0, 2, 4, 6, 8, 10 (one per line)

# iterate over the even numbers from 0 to 10 but reversed
for i in range(10, -1, -2):
    print(i)
# prints 10, 8, 6, 4, 2, 0 (one per line)
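One detail worth knowing: strictly speaking, range returns a lazy sequence object rather than a generator or a list. A quick sketch of what that means in practice:

```python
evens = range(0, 11, 2)

print(evens)         # range(0, 11, 2)  (no numbers are stored yet)
print(list(evens))   # [0, 2, 4, 6, 8, 10]
print(len(evens))    # 6
print(evens[2])      # 4  (supports indexing, unlike a generator)
print(8 in evens)    # True

# Unlike a generator, a range can be iterated more than once
print(list(evens))   # [0, 2, 4, 6, 8, 10]
```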

List comprehensions

Python has a list comprehension syntax that can be used to create many complex kinds of lists in only one line. Because of features like this, Python is sometimes called a one-liner programming language.

list_of_evens = [i for i in range(0, 11, 2)]
list_of_another_evens = [i*2 for i in range(6)]
# Both are: [0, 2, 4, 6, 8, 10]
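Comprehensions can also filter items with an if clause, and a similar expression passed to tuple() builds a tuple. The sketch below shows both, next to the plain loop that the first comprehension replaces; the variable names are illustrative.

```python
# Keep only the odd numbers, then square them
squares_of_odds = [n * n for n in range(10) if n % 2 == 1]
print(squares_of_odds)   # [1, 9, 25, 49, 81]

# The equivalent loop, for comparison
squares_loop = []
for n in range(10):
    if n % 2 == 1:
        squares_loop.append(n * n)
print(squares_loop)      # [1, 9, 25, 49, 81]

# A generator expression passed to tuple() builds a tuple
doubled = tuple(n * 2 for n in range(4))
print(doubled)           # (0, 2, 4, 6)
```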

Pattern Matching feature

Python has an awesome pattern matching feature, usually called unpacking, that can save a lot of time and effort while writing efficient, understandable code. Let us see a demonstration of Python's unpacking:

three_numbers = [1, 2, 3]

def add_three_numbers(a, b, c):
    return a + b + c

print(add_three_numbers(*three_numbers)) # 6

Here, the list three_numbers is unpacked with the * operator, and its items are then matched in order with a, b, and c. In future lessons, you will see a lot of functions like the one below:

def a_function(param1, param2, *args, **kwargs):
    an_option_without_name = args[0] # first extra positional argument
    an_option_with_name = kwargs['an_option'] # extra keyword argument named an_option
    # Rest of the body

Here, args and kwargs are not special keywords. You can use any other names if you want; these are just used by convention. args collects the extra positional arguments into a tuple, and kwargs collects the extra keyword arguments into a dict. If a function may evolve over time (with different release versions of a library), it is a good idea to keep args and kwargs so that newer versions can accept additional parameters without breaking backward compatibility.
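A runnable sketch makes it easier to see what ends up in args and kwargs. The function describe and its arguments are hypothetical, chosen only to expose the collected values.

```python
def describe(first, *args, **kwargs):
    """Return the arguments exactly as Python collected them."""
    return first, args, kwargs


print(describe(1))
# (1, (), {})

print(describe(1, 2, 3, unit="cm"))
# (1, (2, 3), {'unit': 'cm'})

# * and ** also work on the calling side, to unpack a list and a dict
positional = [1, 2]
options = {"unit": "cm", "precision": 2}
print(describe(*positional, **options))
# (1, (2,), {'unit': 'cm', 'precision': 2})
```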

Another example of using args is the summing function below.

def sum_all(*args): # named sum_all so it does not shadow the built-in sum()
    if not args:
        return 0
    total = args[0]
    # We will cover list slicing in a minute. The syntax below skips the first element
    for i in args[1:]:
        total += i
    return total

print(sum_all(1, 2, 3, 4)) # 10

List slicing

We can slice a list/str pretty easily with the slicing syntax. The syntax is list[inclusive_start:exclusive_end:jump]. You will get the gist from the example below:

# Create a range from 0 to 9.
# Then convert it into a list
a = list(range(10)) # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

print(a[5:8]) # [5, 6, 7]
print(a[5:]) # [5, 6, 7, 8, 9]
print(a[:8]) # [0, 1, 2, 3, 4, 5, 6, 7]
print(a[3:8:2]) # [3, 5, 7], every second element from index 3 up to (but not including) index 8
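Slicing also accepts negative indices, which count from the end, and a negative jump, which walks backwards. A few more cases as a sketch:

```python
a = list(range(10))   # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

print(a[-1])          # 9  (last element)
print(a[-3:])         # [7, 8, 9]  (last three elements)
print(a[:-3])         # [0, 1, 2, 3, 4, 5, 6]  (everything except the last three)
print(a[::3])         # [0, 3, 6, 9]  (every third element)
print(a[::-1])        # [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]  (reversed copy)

# The same syntax works on strings
print("machine"[:3])  # mac
```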

Remember, these are the bare-bones basics of the wonderful Python language. They will give you a quick start if you already know another programming language or feel you need to sharpen your Python basics before this series. There is a lot more to Python than this. But with this knowledge, a simple Google search will help you understand any new syntax you encounter.

Setting up the Machine Learning environment

Now that you are more confident working with Python, let us set up the environment for all the upcoming lessons. The best practice for doing data science in Python is to use an environment manager. You can use anything like Anaconda, Virtualenv, or even the built-in venv module if you are comfortable with it. An environment manager is necessary because you may need a specific version of Python or of a package for a specific experiment. If you install everything directly into the Python on your machine, you will not be able to use separate versions for your different experiments.

Go ahead, download, and install Anaconda. If you are low on resources, you can try Miniconda. After installing, add the folder that contains the conda executable (inside the installation directory) to your PATH variable. You can follow the official guide from Microsoft to do so. After adding it to PATH, verify that you can use it in CMD. Open a CMD window and type:

conda -V

If the command works, you have successfully installed conda. Conda is a package and environment manager. It can manage different Python versions and their libraries separately in different environments. Let us now create a Python environment using conda with Python version 3.8.

conda create -n newenv python=3.8

After that, you can activate the environment, so whenever you run a python file, it will be interpreted with the python interpreter from the newenv.

conda activate newenv

Now write a Python file with print('hello world') and save it as main.py. Then run it with python main.py (remember to activate the newenv environment first). You can see that it works, and the Python version used is the one in your environment. Finally, let us install some useful packages that we will use throughout the upcoming lessons. Type the following command in CMD.

conda install -n newenv numpy pandas jupyterlab

Let us dissect the command part by part. The first word, conda, is the executable name. Then we specify the command install, telling conda that we want to install something. Then we provide the argument -n with the value newenv, which indicates that we want to install into the environment named newenv. After that, we list three packages: numpy, pandas, and jupyterlab. Press Y where necessary and you will have installed 3 new packages into your environment named newenv. Let’s test those packages now.

Jupyterlab (or jupyter notebook) is an awesome tool for data science and machine learning enthusiasts. It gives you so much flexibility that you will not want to leave that soon. Activate the environment and open a jupyterlab instance from CMD.

conda activate newenv
jupyter-lab

Your default browser will open with a localhost page like below:

jupyterlab

Press the “Python 3” button under the Notebook section and you will create a new notebook file in your current directory.

Now in the first cell, type the following python code:

import numpy as np
arr = [1, 2, 3, 4]

np.sum(arr)

Then press Shift+Enter to run the cell. You will get the sum of the list, 10.

Now let us explain what you just did. JupyterLab (or Jupyter Notebook) starts a Python kernel for each notebook. Notebooks are files (stored as JSON) whose code cells are executed by that kernel. When you run a cell, the resulting state is kept in the Python kernel. That means all the variables and functions declared there will be available throughout the notebook.
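To get a feel for the "stored as JSON" part, here is a rough sketch that builds a notebook-like structure with Python's json module. The field names follow the notebook format as I understand it (version 4), but this is a simplified illustration and an assumption on my part, not a complete or validated notebook file.

```python
import json

# Simplified, assumed layout of a notebook with a single code cell
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ["import numpy as np\n", "np.sum([1, 2, 3, 4])"],
        }
    ],
}

# What would be written to an .ipynb file is just this JSON text
text = json.dumps(notebook, indent=1)
print(text[:40])

# Reading it back gives the cells and their source code again
loaded = json.loads(text)
for cell in loaded["cells"]:
    print(cell["cell_type"], "->", "".join(cell["source"]))
```

You can open any .ipynb file in a text editor to compare it with this sketch.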

For the rest of the series, it is up to you to decide which method to use. We will only discuss the Python code you need. You can run that code in a notebook or as a file.

Conclusion

In this lesson, we warmed up with Python and set up our environment for Machine Learning. From the next lesson, we will step into the field of ML. Until then, I recommend that you play around with Python, Anaconda, and Jupyter Notebook.

Written by Rahat Zaman, Graduate Research Assistant, School of Computing