Machine Learning 1.0

Some of the very well-known vendors that use and develop a large number of products and solutions are:

To understand how “Machine Learning” works, it helps to see how a machine can identify handwritten digits. The method is based on technology from the ’80s and ’90s. Although what is shown below is not how the most modern machines work, it makes it easy to understand the basis of this “Sci-Fi”-style technology.


We are going to split this topic into two parts:

  • How a learning machine identifies one input and reflects the output
  • How a learning machine learns how to identify the many ways to write digits

How Machine Learning Works

The challenge is to train a machine to be able to identify the different ways to write a number. For example, number 3 could have these “versions”:

We can easily get the idea of the digit 3 from these drawings. But how can we do it using a basic matrix of 28×28 pixels, in which each pixel is lit to a different level? Code can be written to identify the same idea our brain does.

The approach I’m using applies different layers to recognize the digit.


In the input layer, each pixel is a “neuron” (hence the name neural network) holding a value between 0 and 1. Here, 0 is entirely black, 1 is fully white, and 0.5 is gray. This number is the activation value: a high value activates the neuron, and a low value deactivates it. Since we are using a 28×28 matrix, we have 784 inputs, each with its own activation value.
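To make the 784 inputs concrete, here is a minimal sketch of how a 28×28 grid becomes a flat list of activation values. The `flatten` helper and the toy image are my own illustration, and I assume the pixel values are already scaled to the 0 to 1 range.

```python
# Hypothetical 28x28 image: 28 rows of 28 brightness values,
# already scaled to 0..1 (0 = black, 1 = white, as in this article).
ROWS, COLS = 28, 28

def flatten(image):
    """Turn a 28x28 grid into a flat list of 784 activation values."""
    assert len(image) == ROWS and all(len(row) == COLS for row in image)
    return [pixel for row in image for pixel in row]

# Toy example: an all-black image with one gray and one white pixel.
image = [[0.0] * COLS for _ in range(ROWS)]
image[10][14] = 0.5  # gray
image[11][14] = 1.0  # white

inputs = flatten(image)
print(len(inputs))  # 784
```

Each entry of `inputs` plays the role of one input neuron.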

These 784 inputs interact with the second layer of neurons and determine the activation or deactivation of each neuron in that layer.


In this approach, we are going to use two hidden layers, each with 16 neurons. The input layer affects the first hidden layer; the result is the activation or deactivation of each of its neurons, or patterns.

Then, the 2nd layer (the 1st hidden one) affects the 3rd layer (the 2nd hidden one). This process determines the state of each neuron in the 3rd layer.

The goal of the hidden layers is to process the input and determine which neuron of the resulting layer (the 4th layer) is the most likely one.


The 4th layer is the last one. The 3rd layer affects it and, in our case, it determines the digit that the machine “thinks” was written at the input layer.

The interaction between these four layers defines the structure of the machine:

Now let’s try to explain how each hidden layer determines the pattern of the next layer.

We identify each digit by some common patterns or shapes that we associate with an idea.

For example, for number nine our brain pictures a loop followed by a vertical line:

In the 3rd layer, we associate each neuron with one of these shapes. We expect this layer to identify which of these shapes the 2nd layer “wants” to push, and to activate the corresponding neurons:

Based on the combination of these values, the network determines the value of each neuron in the resulting layer. Hopefully, the result will be the digit we want it to recognize.

Before that, these shapes need to be identified, or “activated”; this is where the 2nd layer comes in.

Each shape of the 3rd layer is formed by edges. The combination of the activation values of these edges determines the values of the shapes represented by the neurons of the 3rd layer.

In summary, the input layer provides 784 values taken from the original trace. The 2nd layer identifies the most likely edges, and the 3rd layer determines the shapes. Finally, the resulting layer selects the expected digit.

Network Interaction

How do these layers interact with each other and determine the value of each neuron? Each neuron of a given layer is affected by every neuron of the previous layer. In the picture below we are trying to determine the activation value of the neuron that represents the “-” edge.

To do so, the value of each neuron coming from the input layer is multiplied by a weight. For those neurons (or pixels, in the case of the input layer) that are part of the edge, the weight will be high; for those far away from it, the weight should be low.

The sum of these values is then squashed into a value between 0 and 1 by a mathematical function.
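The exact function is not spelled out in the text, so the sketch below assumes the sigmoid, which is a common choice for squashing a sum into the 0 to 1 range. The helper names and the toy weights are mine, and I leave out any bias term because this article does not discuss one.

```python
import math

def sigmoid(x):
    """Squash any real number into the interval (0, 1). Assumed here;
    the article does not name the function it uses."""
    return 1.0 / (1.0 + math.exp(-x))

def neuron_activation(inputs, weights):
    """Weighted sum of the previous layer's activations, then squashed."""
    total = sum(a * w for a, w in zip(inputs, weights))
    return sigmoid(total)

# Toy numbers: the first two "pixels" lie on the edge (high weight),
# the third is far away from it (negative weight).
edge_weights = [3.0, 3.0, -3.0]
on_edge = neuron_activation([1.0, 1.0, 0.0], edge_weights)
off_edge = neuron_activation([0.0, 0.0, 1.0], edge_weights)
print(on_edge)   # high: the neuron is activated
print(off_edge)  # low: the neuron stays deactivated
```

The point is only that lit pixels with high weights push the neuron toward 1, while lit pixels with low weights push it toward 0.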

This process is repeated for each neuron of the 2nd layer to activate the edges of the original trace.

The same logic is applied in the 3rd and 4th layers. In the 3rd layer, each neuron represents a shape and assigns a higher weight to the edges that make it up. The sum of all these weighted values determines its activation value.
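Putting the layers together, a full pass through the network could be sketched as follows. This is an illustration under assumptions of my own: the output layer has 10 neurons (one per digit), the squashing function is the sigmoid, and the weights are random rather than learned, so the resulting guess is meaningless. Only the mechanics matter here.

```python
import math
import random

# 784 inputs, two hidden layers of 16; 10 outputs is an assumption (digits 0-9).
LAYER_SIZES = [784, 16, 16, 10]

def sigmoid(x):
    # Numerically stable form that avoids overflow for large |x|.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def random_weights(sizes, rng):
    """One weight matrix per pair of adjacent layers; one row per neuron
    of the next layer, one column per neuron of the previous layer."""
    return [
        [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
        for n_in, n_out in zip(sizes, sizes[1:])
    ]

def forward(inputs, weights):
    """Propagate activations layer by layer and return the last layer."""
    activations = inputs
    for matrix in weights:
        activations = [
            sigmoid(sum(a * w for a, w in zip(activations, row)))
            for row in matrix
        ]
    return activations

rng = random.Random(0)
weights = random_weights(LAYER_SIZES, rng)
pixels = [rng.random() for _ in range(784)]  # stand-in for a real image

output = forward(pixels, weights)
guess = max(range(len(output)), key=lambda d: output[d])
print(guess)  # index of the most activated output neuron
```

With trained weights instead of random ones, `guess` would be the digit the machine “thinks” was written.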

If we calculate the number of weights to be used in all interactions:
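The count can be worked out from the layer sizes: one weight for every connection between a neuron and each neuron of the previous layer. The sketch below assumes 10 output neurons (one per digit) and counts weights only, since no other parameters are discussed here.

```python
# Layer sizes: 784 inputs, two hidden layers of 16, 10 outputs (assumed).
layer_sizes = [784, 16, 16, 10]

# One weight per connection between adjacent layers.
weights_per_step = [a * b for a, b in zip(layer_sizes, layer_sizes[1:])]

print(weights_per_step)       # [12544, 256, 160]
print(sum(weights_per_step))  # 12960
```

So under these assumptions the machine has close to 13,000 weights to adjust.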


The Learning Part

How are these weights calculated? How are they “learned” by the machine?

The method is the same one our brain uses: training. We send the machine an input together with the output that should be selected. The machine compares its result with the expected one and adjusts its weights to improve its performance.
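As a rough illustration of “compare and adjust”, here is a deliberately simplified sketch: a single sigmoid neuron trained by gradient descent on made-up data. This is not the procedure for the full four-layer network (which needs backpropagation across all layers); the update rule, the learning rate and the toy examples are my own assumptions.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def predict(weights, inputs):
    return sigmoid(sum(w * a for w, a in zip(weights, inputs)))

def train_step(weights, inputs, target, rate=0.5):
    """Nudge each weight to reduce the squared error on one example."""
    out = predict(weights, inputs)
    # Gradient of 0.5 * (out - target)**2, passed back through the sigmoid.
    delta = (out - target) * out * (1.0 - out)
    return [w - rate * delta * a for w, a in zip(weights, inputs)]

def mean_error(weights, examples):
    return sum((predict(weights, x) - t) ** 2 for x, t in examples) / len(examples)

# Made-up data: the neuron should fire when the first "pixel" is lit.
examples = [
    ([1.0, 1.0, 0.0], 1.0),
    ([1.0, 0.0, 0.0], 1.0),
    ([0.0, 0.0, 1.0], 0.0),
]

weights = [0.0, 0.0, 0.0]
error_before = mean_error(weights, examples)
for _ in range(200):                 # 200 passes over the examples
    for inputs, target in examples:
        weights = train_step(weights, inputs, target)
error_after = mean_error(weights, examples)

print(error_before, error_after)     # the error shrinks with training
```

Every pass makes a small correction, and the weights slowly settle on values that produce the expected outputs.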

After some time training, the machine goes through an evaluation process. Several traces it has not seen before are fed into it and the results are checked. The proportion of correct answers determines the accuracy of the machine.
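One simple way to score the evaluation step is the fraction of unseen traces that were classified correctly. The `accuracy` helper and the predictions and labels below are made up for illustration.

```python
def accuracy(predicted_digits, true_digits):
    """Fraction of test traces whose predicted digit matches the label."""
    correct = sum(p == t for p, t in zip(predicted_digits, true_digits))
    return correct / len(true_digits)

# Made-up predictions and labels for five unseen traces.
predicted = [3, 9, 1, 7, 3]
actual = [3, 9, 1, 1, 8]
print(accuracy(predicted, actual))  # 0.6
```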

The MNIST database provides thousands of handwritten traces (60,000 for training and 10,000 for testing), each labeled with the digit it represents, for use in the training and evaluation process.


This is a complicated and deep topic. Here I’m only trying to illustrate how Machine Learning works and to encourage you to go deeper and discover new business opportunities.

