Hello. Today, we'll talk about Deep learning. It's a field of artificial intelligence that has exploded in recent years. Perhaps you've already heard about it if you've seen my video on Go, or, a few months ago, those psychedelic images published by Google supposedly depicting a Deep learning algorithm dreaming. Today we'll see what Deep learning is and where it stands compared to the other fields of AI.

We've been talking about AI for more than 50 years now. The history of this field is a bit tumultuous. It started with a very enthusiastic period when we thought we'd quickly manage to do incredible things, followed by a period of disillusionment when we understood we had underestimated the difficulty. I won't tell you the whole history, but you should know that in the 90s, after the disillusionment, there was a revival of AI in the form of what's called Machine learning. Machine learning is a field that studies how algorithms can learn by studying examples. We'll see what that means. And Deep learning is just a specific way of doing Machine learning. I'll tell you what makes this way original and why it has exploded lately.

To understand how computers and algorithms can learn to do things, let's take a simple example. Say you're a botanist interested in a tree species. You walk in the forest and take notes: for each tree you meet, you write down its height and diameter, and you gather all your notes in a data table. Here's the table; each line is a different tree whose diameter and height were measured. With this kind of data, we can do a simple thing: draw a graph. The x-axis is the diameter, the y-axis is the height, and each dot is a tree you measured. Here's what we get. With your human eyes and brain, you notice something about these data: the dots are more or less aligned and can be linked by a straight line. Drawing a straight line is a good idea because it lets us generalize: we have a number of notes on specific cases, the trees we measured, and we deduce a general relationship, this straight line.

Thanks to this link we discovered in the data, we can make predictions. It means that if I take a new tree from the same species that hasn't been measured yet, and I give you its diameter, you can estimate its height thanks to this straight line. The word 'prediction' is perhaps ill-chosen, since it's not about predicting the future; it's about guessing a value we don't have. So, to recap: we take data, particular observations, we discover a link in these data, and with this link, we can generalize, make predictions. Those are all the components of Machine learning.

Machine learning is about trying to do the same thing with algorithms. Say we take the same data and give them to a computer. We can imagine an algorithm which will try, like us, to draw a straight line through the dots. A straight line is defined by two things: a slope, a, and an intercept, b, the classic equation y = ax + b. We can imagine an algorithm that varies a and b until it finds a straight line that matches the point cloud. We often picture a and b, the parameters of the straight line, as two knobs the algorithm can turn until it finds the best straight line. Once this is done, we leave the knobs alone, and we have a line that allows the algorithm to generalize, to extrapolate to cases it's never seen. It just needs the equation of the line. So this is Machine learning, basically.
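To make this knob-turning concrete, here's a minimal sketch in Python. The tree measurements are made-up numbers for illustration, and the update rule (nudging a and b to reduce the squared error) is just one simple way an algorithm can turn the knobs.

```python
import numpy as np

# Made-up measurements: diameter and height of a few trees (in meters)
diameter = np.array([0.2, 0.35, 0.5, 0.6, 0.8])
height = np.array([4.1, 7.0, 9.8, 12.2, 16.1])

# The two "knobs": slope a and intercept b of the line y = a*x + b
a, b = 0.0, 0.0
learning_rate = 0.1

for _ in range(5000):
    error = (a * diameter + b) - height
    # Turn each knob slightly in the direction that reduces the squared error
    a -= learning_rate * 2 * np.mean(error * diameter)
    b -= learning_rate * 2 * np.mean(error)

print(f"learned line: height = {a:.1f} * diameter + {b:.1f}")
# Prediction phase: estimate the height of a new, unmeasured tree
print(f"estimated height for a 0.45 m diameter: {a * 0.45 + b:.1f} m")
```

In practice we'd just call a library routine such as np.polyfit(diameter, height, 1), but the loop shows the idea: adjust the knobs until the line matches the point cloud.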
We have data, here an input x and an output y, shown to an algorithm that turns its knobs until it captures the link between x and y. That's the learning phase. After that, we can make predictions. That's the ultimate goal of a Machine learning algorithm: to be able to make predictions, extrapolations. Our example was really simple: only one number x as input, the tree's diameter; only one number y as output, its height; and the link between the two is a simple straight line. But in real Machine learning problems, the relationships can be more complicated, and above all, there can be more than one input.

Let's take a few examples of Machine learning applications. You've certainly noticed that your Facebook news feed doesn't show you all the posts from your friends or the pages you subscribe to. It chooses only some of them and displays them in a certain order. Behind that, there's a Machine learning algorithm trying to figure out which posts might interest you the most. This algorithm takes as input many features of each post: who posted it, when, what it's about, its type, etc., and it tries to predict whether you'll be interested or not. Another example of Machine learning in use is online fraud prevention. Each time you use your bank card on the Internet, somewhere, an algorithm is running and trying to figure out whether it's you or a fraud. To do so, it uses input data: the nature of the transaction, its location, its amount, and many other things like that. A third example, the one we'll talk about today, is image recognition. Image recognition means trying to create an algorithm that takes an image as input and tries to guess, as output, what the image represents.

You may have realized that for these examples, my ax + b line and its two knobs would be largely insufficient. We need something that can take many inputs and build much more complex relationships between the inputs and the output. This is where neural networks step in. A neuron will play the same role as my straight line earlier: it's a mathematical function that links inputs x with an output y. Let's be clear, we're talking about an artificial neuron: a mathematical construct that roughly imitates the way a real neuron functions. Real biological neurons are cells found in our nervous system, connected to each other. Each neuron has an output terminal, the axon, with which it can send a signal to other neurons. Here's how a neuron functions: it receives, or not, electrical signals from other neurons, and depending on those signals, it has two options: either it sends nothing along its axon, or it sends an electrical signal, and in that case, we say it fires.

The idea of the artificial neuron, which dates back several decades, is to imitate this behavior with a mathematical function. Here's how it works. Say we have an artificial neuron with three inputs X1, X2 and X3. We add up the three inputs, giving each a coefficient called a weight. If the total is above a certain threshold, the neuron outputs 1; otherwise, it outputs 0. So an artificial neuron is a mathematical function that takes x's as input and gives a y as output. This function has knobs we can turn, the weights and the threshold, a bit like the a and b coefficients of the straight line.
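Here's what that neuron looks like as code, a minimal sketch with made-up weights and threshold; it's just the rule described above, a weighted sum compared to a threshold.

```python
import numpy as np

def neuron(inputs, weights, threshold):
    # Weighted sum of the inputs; the neuron "fires" (outputs 1)
    # only if the total is above the threshold
    total = np.dot(inputs, weights)
    return 1 if total > threshold else 0

# Three inputs X1, X2, X3 with made-up weights and threshold
x = np.array([0.5, 1.0, 0.2])
w = np.array([0.4, 0.3, 0.9])
print(neuron(x, w, threshold=0.6))  # 0.5*0.4 + 1.0*0.3 + 0.2*0.9 = 0.68 > 0.6, so prints 1
```

The weights and the threshold are the knobs: learning means adjusting them until the neuron gives the right output on the examples.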
A neuron alone is not enough to create complex relationships, but what's interesting is that we can connect many neurons and pile them up to create more complex functions. These are called neural networks, or more precisely, artificial neural networks. By piling up neurons, we can create functions as complex as we want, with many inputs and outputs, and as many knobs to turn as there are weights and thresholds in the network. These neural networks are very versatile. They can be adapted to many types of inputs and outputs, but, basically, we use them the same way as my straight line earlier. We take a neural network. We show it a database of input and output examples. We turn the knobs until it makes the correct connection between the inputs and the outputs; remember, this is called the learning phase. Once we've done that, our network is trained and can predict the output when we show it a new input: that's the prediction phase.

One of the disadvantages of neural networks compared to my straight line is that they're what we sometimes call black boxes: once we've found all the knob positions, we have a function that is mathematically rather complex and difficult to interpret. Generally, as long as the neural network gives the right answer, we're happy. By the way, it seems that the learning phase of artificial neural networks is quite similar to what actually happens in our brain. When we learn things, the strength of the connections between our neurons changes; that's called synaptic plasticity. We can compare that to the way we play with the weights in our artificial neural network. Again, the aim of neural networks is not to create a model of the brain; they're just a mathematical construct inspired by it.

So, that was the theory. Now, practice. When we want to create a neural network, we pile up many copies of this basic unit called the artificial neuron. But how many of them do we need to put into our network? In our brain, there are around 100 billion neurons, and since each neuron can have several thousand synapses, that's about a million billion connections. Obviously, we won't be able to do that with a neural network. And without going that far, as soon as we put too many neurons in our network, there are too many knobs to turn and it becomes really difficult to find the right positions: the learning phase becomes practically impossible. That's why, in practice, we often use a very simple structure with three layers of neurons: the input neurons, x; the output neuron or neurons, y; and an intermediate layer. The more neurons in the intermediate layer, the more powerful and versatile the network is a priori, but the harder it gets to train it and to find the right knob positions. If we try to add more layers, we soon get lost in the complexity of the learning phase.

For problems with a reasonable amount of input data, for instance online fraud prevention, this kind of method can work well. But for image recognition, we soon reach its limits. Take a small photo, say 400 x 400 pixels: that's 160,000 pixels, so 160,000 numbers to give the network as input. We end up with an extremely complex network we won't be able to train. So, giving a raw image to a neural network doesn't work. Fortunately, there is a method that works: building intermediate features of the image.
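Before we get to that, here's a rough sketch of the simple three-layer structure just described, with made-up sizes and random weights, and a smooth variant of the threshold (a sigmoid), which is commonly used in practice so the knobs can be adjusted gradually.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    # A smooth version of the 0/1 threshold, so the output
    # varies gradually as we turn the knobs
    return 1.0 / (1.0 + np.exp(-z))

n_inputs, n_hidden, n_outputs = 4, 8, 1

# The "knobs": one weight per connection plus one threshold (bias) per neuron
W1, b1 = rng.normal(size=(n_inputs, n_hidden)), np.zeros(n_hidden)
W2, b2 = rng.normal(size=(n_hidden, n_outputs)), np.zeros(n_outputs)

def predict(x):
    hidden = sigmoid(x @ W1 + b1)      # intermediate layer
    return sigmoid(hidden @ W2 + b2)   # output layer

x = rng.normal(size=n_inputs)  # a made-up input
print(predict(x))  # meaningless until the weights are learned from examples
```

The learning phase would adjust W1, b1, W2 and b2 until the outputs match the examples; with too many neurons or layers, that search becomes the hard part.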
Say we're interested in the problem of recognizing vehicles. We want to make an algorithm that is given an image and tells us whether the image represents a car, a bus, a train, a bike, etc. For us human beings, there are several features that allow us to tell the difference. We can, for instance, count the number of visible wheels, examine the width-to-height ratio, the main color, the amount of glass surface, the number of windows, their shape, and so on. To do image recognition in this kind of situation, the solution is to create an intermediate algorithm, which won't be a neural network, whose job is to analyze the images and extract the interesting features. We then give those features to a neural network to do the recognition. This approach can work because the amount of input data we give the network is much smaller than if we gave it the raw image: to summarize the image, there will be a few dozen features, a few hundred at the most. Since we've summarized the image by reducing it to a list of essential features, this is sometimes called an abstraction of the image.

That's all well and good, but there's a catch. The quality of the recognition will heavily depend on how we built this intermediate representation of the image's essential features. And to do it properly, we need to know what we're talking about. In my vehicle example, we need to know that what matters for telling one vehicle from another is the number of wheels, windows, etc. So we need a specialist of the field, and it's the creator of the intermediate algorithm who actually does most of the intelligence work. Deep learning is a way to get around that.

We've seen how classic Machine learning works and how we would use it for image recognition, provided we add a preliminary step that constructs the image's essential features, which we can then give to a neural network. With Deep learning, the idea is to skip this step, which sounds a bit crazy. Basically, we create a big network with many layers of neurons and we give it the raw image. That's called a deep network, hence the name Deep learning. I've told you that a priori this idea cannot work: too many neurons, too many knobs to turn, and the learning phase fails. Well, there were still people who wanted to try. One of these pioneers is French; his name is Yann LeCun, and he started using these methods back in the 90s. At the time, the AI community was not at all convinced and had other interests. For two decades, only a few people in the world kept trying to do Deep learning.

And in 2012, there was an explosion. Every year, there's an image recognition contest between the best algorithms in the world. Here you can see the error percentages of the different algorithms in the 2010 and 2011 contests; the lower the percentage, the better the algorithm. In 2012, to everyone's surprise, a Deep learning algorithm won by a wide margin. And from the following year on, everyone started using Deep learning in this contest. What's interesting is to try to understand why, suddenly, a Deep learning algorithm works better than the usual methods when, for 20 years, everyone thought it would never work. As we've said, in the usual methods, an algorithm extracts from the image the essential features that summarize it in the form of concepts, and those features are usually given to a shallow network.
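As a rough illustration of that classic pipeline, here's a sketch in Python. The three features below are hypothetical stand-ins chosen because they're easy to compute; real extractors for vehicles would compute things like the number of wheels or windows, which is exactly where the specialist's knowledge goes.

```python
import numpy as np

def extract_features(image):
    # Hypothetical hand-crafted feature extractor, designed by a human
    # specialist: it summarizes the whole image in a few numbers
    h, w = image.shape
    return np.array([
        w / h,                                   # width-to-height ratio
        image.mean(),                            # overall brightness / main tone
        np.abs(np.diff(image, axis=1)).mean(),   # crude edge density
    ])

# A made-up 400 x 400 grayscale image: 160,000 raw numbers...
image = np.random.default_rng(0).random((400, 400))

features = extract_features(image)
print(features)  # ...reduced to 3 numbers a small, shallow network can handle
```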
What's amazing with Deep learning is that if we manage to train the network properly, we realize that the upper layers contain those essential features, all the important components of the image needed to recognize it. The algorithm created them itself; it discovered them without needing us to do the work for it. With the vehicle recognition example, it means we wouldn't have to explain to the algorithm what a wheel is and that wheels matter for recognizing a vehicle. The algorithm would discover, without any help, the concept of a wheel and its importance for classifying vehicles.

As I've told you, after the remarkable success of a Deep learning algorithm in the 2012 image recognition contest, everyone started to use Deep learning. And Yann LeCun, who had remained in the background, suddenly became an AI superstar. Of course, everyone started hiring Deep learning specialists: Google, Amazon, Baidu, etc. And of course Facebook, which hired LeCun as director of its AI laboratory.

There's something I haven't mentioned yet: why did this method, which nobody believed in a few years ago, suddenly start to work? As often happens, there are several reasons. First, the algorithms were improved, and, contrary to what I may have led you to believe, we don't use just any deep network: it's not simply a pile of neurons; there are particular structures that work well. There's also a hardware reason, computing power, especially the development of graphics processors: GPUs are much faster than traditional processors when it comes to manipulating images. But the real reason for the success of Deep learning algorithms, especially in image recognition, has been the availability of data. When we create a deep network, we can end up with thousands or even millions of neurons, so millions of knobs to turn during the learning phase before finding the right relationship. To do so, you need millions of examples to show the network. In 2009, a Stanford laboratory released a database called ImageNet, which today contains more than 15 million classified images. So we have images, and next to each one, a description of what it is: a car, a dog, a wave, etc. There are more than 10,000 different categories. You can try it yourself and search ImageNet for, say, cat images; I think there are several tens of thousands. And when we give all those images to a Deep learning algorithm, we don't need to tell it which essential features make a cat a cat: the shape of the head, the color of the eyes, the height of the ears, etc. The algorithm figures it out on its own.

Today, we can build deep networks with more than a hundred layers and several million neurons. We're still far from the human brain, but the image recognition performance is quite impressive. Here are some examples. On Facebook, around 800 million images are uploaded every day, so it's easy to understand why they're interested in an algorithm that can quickly recognize what an image depicts. Sometimes, it fails. And there are Deep learning applications other than analyzing the images of our private lives that we share on social networks. For instance, there are algorithms that can analyze what an image depicts and automatically create a one-sentence description; that can be useful for visually impaired people. Another application is autonomous driving, the self-driving cars we've heard so much about recently. To cite an example from Fei-Fei Li, one of the pioneers of ImageNet, it's very important for a self-driving car to be able to tell the difference between a crumpled paper bag on the road, which it can run over, and a rock in the middle of the road, which must be avoided.
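Those "particular structures" are notably convolutional networks, the kind LeCun had been working on since the 90s. Here's a minimal sketch of one; I'm using PyTorch purely as an illustration (no particular library is implied above), with a made-up image size and made-up categories.

```python
import torch
from torch import nn

# A tiny convolutional network: each layer builds on the previous one,
# from local patterns (edges) up to larger motifs (wheels, ears...)
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),   # first layer of feature detectors
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1),  # second layer, combining them
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 56 * 56, 10),                  # scores for 10 made-up categories
)

# One made-up 224 x 224 color image: raw pixels, no hand-crafted features
image = torch.randn(1, 3, 224, 224)
print(model(image).shape)  # torch.Size([1, 10])
```

Trained on millions of labeled images, the early layers of such a network end up detecting edges and textures, while the upper layers capture high-level features, exactly the kind we used to craft by hand.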
The most interesting Deep learning application, for me, is the ability to create images. As I told you earlier, if you give a deep network a raw image as input, it figures out on its own how that image can be summarized by a number of essential features. And then, to simplify, we can try to use the network the other way around: given a series of numbers as input, it creates an image as output. This image will be new, unique, and will have the features corresponding to what we entered, but it will have been entirely made up. Algorithms that can do this are called generative models. For instance, here's a recent and quite remarkable example: a generative model trained with Deep learning was used to create bedrooms. Here's a series of bedrooms invented by an algorithm. It also works with chairs, or manga characters. My favorite example is this one: album covers made up by a Deep learning algorithm that had previously studied real album covers. The most media-covered example, even if not the most useful, is the one I mentioned at the beginning: Google's Deep Dream. It's a network that was trained on a lot of images and then used to create somewhat psychedelic pictures. The principle is the same as when we look for shapes in the clouds: we can convince ourselves we see things in them. Here, we gave images to a deep network and forced it to see something else in them. Here's the result. Nice, isn't it? Again, this example is anecdotal, but I hope you're now convinced that there are many fantastic applications of Deep learning. And I'm sure that, a few years from now, this kind of algorithm will be everywhere.

Thanks for watching this video. If you want to go further, I recommend two other videos. The first one is by Fei-Fei Li, who introduced, in a TED talk, the problem of image recognition and what we can do today; it's quite short. If you want to go even further, you can watch Yann LeCun's inaugural lecture at the Collège de France; it's longer and he goes into detail. As usual, you can find me on social networks, Facebook and Twitter. You can support me on Tipeee; thanks to all the tippers who support me. You can also have a look at my book, published by Flammarion. And if you want to meet me, I'll be at Vulgarizators in Lyon on the 16th of April; don't hesitate to come and attend all the talks. Thanks, see you soon.