Hello. Today, we'll talk about Deep learning. It's a field of artificial intelligence that has exploded in recent years. Perhaps you've already heard about it if you've seen my video on Go, or, a few months ago, those psychedelic images published by Google supposedly depicting a Deep learning algorithm dreaming. Today we'll see what Deep learning is and where it stands compared to the other fields of AI.

We've been talking about AI for more than 50 years now. The history of this field is a bit tumultuous. It started with a very enthusiastic period when we thought we'd quickly manage to do incredible things, followed by a period of disillusionment when we understood we had underestimated the difficulty. I won't tell you the whole history, but you should know that in the 90s, after the disillusionment, there was a revival of AI in the form of what's called Machine learning. Machine learning is a field that studies how algorithms can learn by studying examples. We'll see what that means. And Deep learning is just a specific way of doing Machine learning. I'll tell you what makes this way original and why it has exploded lately.

To understand how computers and algorithms can learn to do things, let's take a simple example. Say you're a botanist interested in a tree species. You walk in the forest and take notes: for each tree you meet, you write down its height and diameter, and you gather all your notes in a data table. Here's the table; each line is a different tree whose diameter and height were measured. With this kind of data, we can do a simple thing: draw a graph. The x-axis is the diameter, the y-axis is the height, and each dot is a tree you measured. Here's what we get. With your human eyes and brain, you notice something about these data: the dots are more or less aligned and can be linked by a straight line. Drawing a straight line is a good idea because it lets us generalize: we have a number of notes on specific cases, the trees we measured, and we deduce a general relationship, this straight line.

Thanks to this link we discovered in the data, we can make predictions. It means that if I take a new tree from the same species that hasn't been measured yet, and I give you its diameter, you can estimate its height thanks to this straight line. The word 'prediction' is perhaps ill-chosen, since it's not about predicting the future; it's about guessing a value we don't have. So, to recap: we take data, particular observations, we discover a link in these data, and with this link, we can generalize, make predictions. Those are all the components of Machine learning.

Machine learning is about trying to do the same thing with algorithms. Say we take the same data and give them to a computer. We can imagine an algorithm which will try, like us, to draw a straight line through the dots. A straight line is defined by two things: a slope, a, and an intercept, b, the classic equation y = ax + b. We can imagine an algorithm that varies a and b until it finds a straight line that matches the point cloud. We often picture a and b, the parameters of the straight line, as two knobs the algorithm can turn until it finds the best straight line. Once this is done, we leave the knobs alone, and we have a line that allows the algorithm to generalize, to extrapolate to cases it's never seen. It just needs the equation of the line. So this is Machine learning, basically.
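To make this knob-turning concrete, here's a minimal sketch in Python. The tree measurements are made-up numbers for illustration, and the update rule (nudging a and b to reduce the squared error) is just one simple way an algorithm can turn the knobs.

```python
import numpy as np

# Made-up measurements: diameter and height of a few trees (in meters)
diameter = np.array([0.2, 0.35, 0.5, 0.6, 0.8])
height = np.array([4.1, 7.0, 9.8, 12.2, 16.1])

# The two "knobs": slope a and intercept b of the line y = a*x + b
a, b = 0.0, 0.0
learning_rate = 0.1

for _ in range(5000):
    error = (a * diameter + b) - height
    # Turn each knob slightly in the direction that reduces the squared error
    a -= learning_rate * 2 * np.mean(error * diameter)
    b -= learning_rate * 2 * np.mean(error)

print(f"learned line: height = {a:.1f} * diameter + {b:.1f}")
# Prediction phase: estimate the height of a new, unmeasured tree
print(f"estimated height for a 0.45 m diameter: {a * 0.45 + b:.1f} m")
```

In practice we'd just call a library routine such as np.polyfit(diameter, height, 1), but the loop shows the idea: adjust the knobs until the line matches the point cloud.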
We have data, here an input x and an output y, shown to an algorithm that turns its knobs until it captures the link between x and y. That's the learning phase. After that, we can make predictions. That's the ultimate goal of a Machine learning algorithm: to be able to make predictions, extrapolations. Our example was really simple: only one number x as input, the tree's diameter; only one number y as output, its height; and the link between the two is a simple straight line. But in real Machine learning problems, the relationships can be more complicated, and above all, there can be more than one input.

Let's take a few examples of Machine learning applications. You've certainly noticed that your Facebook news feed doesn't show you all the posts from your friends or the pages you subscribe to. It chooses only some of them and displays them in a certain order. Behind that, there's a Machine learning algorithm trying to figure out which posts might interest you the most. This algorithm takes as input many features of each post: who posted it, when, what it's about, its type, etc., and it tries to predict whether you'll be interested or not. Another example of Machine learning in use is online fraud prevention. Each time you use your bank card on the Internet, somewhere, an algorithm is running and trying to figure out whether it's you or a fraud. To do so, it uses input data: the nature of the transaction, its location, its amount, and many other things like that. A third example, the one we'll talk about today, is image recognition. Image recognition means trying to create an algorithm that takes an image as input and tries to guess, as output, what the image represents.

You may have realized that for these examples, my ax + b line and its two knobs would be largely insufficient. We need something that can take many inputs and build much more complex relationships between the inputs and the output. This is where neural networks step in. A neuron will play the same role as my straight line earlier: it's a mathematical function that links inputs x with an output y. Let's be clear, we're talking about an artificial neuron: a mathematical construct that roughly imitates the way a real neuron functions. Real biological neurons are cells found in our nervous system, connected to each other. Each neuron has an output terminal, the axon, with which it can send a signal to other neurons. Here's how a neuron functions: it receives, or not, electrical signals from other neurons, and depending on those signals, it has two options: either it sends nothing along its axon, or it sends an electrical signal, and in that case, we say it fires.

The idea of the artificial neuron, which dates back several decades, is to imitate this behavior with a mathematical function. Here's how it works. Say we have an artificial neuron with three inputs X1, X2 and X3. We add up the three inputs, giving each a coefficient called a weight. If the total is above a certain threshold, the neuron outputs 1; otherwise, it outputs 0. So an artificial neuron is a mathematical function that takes x's as input and gives a y as output. This function has knobs we can turn, the weights and the threshold, a bit like the a and b coefficients of the straight line.
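Here's what that neuron looks like as code, a minimal sketch with made-up weights and threshold; it's just the rule described above, a weighted sum compared to a threshold.

```python
import numpy as np

def neuron(inputs, weights, threshold):
    # Weighted sum of the inputs; the neuron "fires" (outputs 1)
    # only if the total is above the threshold
    total = np.dot(inputs, weights)
    return 1 if total > threshold else 0

# Three inputs X1, X2, X3 with made-up weights and threshold
x = np.array([0.5, 1.0, 0.2])
w = np.array([0.4, 0.3, 0.9])
print(neuron(x, w, threshold=0.6))  # 0.5*0.4 + 1.0*0.3 + 0.2*0.9 = 0.68 > 0.6, so prints 1
```

The weights and the threshold are the knobs: learning means adjusting them until the neuron gives the right output on the examples.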
A neuron alone is not enough to create complex relationships, but what's interesting is that we can connect many neurons and pile them up to create more complex functions. These are called neural networks, or more precisely, artificial neural networks. By piling up neurons, we can create functions as complex as we want, with many inputs and outputs, and as many knobs to turn as there are weights and thresholds in the network. These neural networks are very versatile. They can be adapted to many types of inputs and outputs, but, basically, we use them the same way as my straight line earlier. We take a neural network. We show it a database of input and output examples. We turn the knobs until it makes the correct connection between the inputs and the outputs; remember, this is called the learning phase. Once we've done that, our network is trained and can predict the output when we show it a new input: that's the prediction phase.

One of the disadvantages of neural networks compared to my straight line is that they're what we sometimes call black boxes: once we've found all the knob positions, we have a function that is mathematically rather complex and difficult to interpret. Generally, as long as the neural network gives the right answer, we're happy. By the way, it seems that the learning phase of artificial neural networks is quite similar to what actually happens in our brain. When we learn things, the strength of the connections between our neurons changes; that's called synaptic plasticity. We can compare that to the way we play with the weights in our artificial neural network. Again, the aim of neural networks is not to create a model of the brain; they're just a mathematical construct inspired by it.

So, that was the theory. Now, practice. When we want to create a neural network, we pile up many copies of this basic unit called the artificial neuron. But how many of them do we need to put into our network? In our brain, there are around 100 billion neurons, and since each neuron can have several thousand synapses, that's about a million billion connections. Obviously, we won't be able to do that with a neural network. And without going that far, as soon as we put too many neurons in our network, there are too many knobs to turn and it becomes really difficult to find the right positions: the learning phase becomes practically impossible. That's why, in practice, we often use a very simple structure with three layers of neurons: the input neurons, x; the output neuron or neurons, y; and an intermediate layer. The more neurons in the intermediate layer, the more powerful and versatile the network is a priori, but the harder it gets to train it and to find the right knob positions. If we try to add more layers, we soon get lost in the complexity of the learning phase.

For problems with a reasonable amount of input data, for instance online fraud prevention, this kind of method can work well. But for image recognition, we soon reach its limits. Take a small photo, say 400 x 400 pixels: that's 160,000 pixels, so 160,000 numbers to give the network as input. We end up with an extremely complex network we won't be able to train. So, giving a raw image to a neural network doesn't work. Fortunately, there is a method that works: building intermediate features of the image.
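Before we get to that, here's a rough sketch of the simple three-layer structure just described, with made-up sizes and random weights, and a smooth variant of the threshold (a sigmoid), which is commonly used in practice so the knobs can be adjusted gradually.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    # A smooth version of the 0/1 threshold, so the output
    # varies gradually as we turn the knobs
    return 1.0 / (1.0 + np.exp(-z))

n_inputs, n_hidden, n_outputs = 4, 8, 1

# The "knobs": one weight per connection plus one threshold (bias) per neuron
W1, b1 = rng.normal(size=(n_inputs, n_hidden)), np.zeros(n_hidden)
W2, b2 = rng.normal(size=(n_hidden, n_outputs)), np.zeros(n_outputs)

def predict(x):
    hidden = sigmoid(x @ W1 + b1)      # intermediate layer
    return sigmoid(hidden @ W2 + b2)   # output layer

x = rng.normal(size=n_inputs)  # a made-up input
print(predict(x))  # meaningless until the weights are learned from examples
```

The learning phase would adjust W1, b1, W2 and b2 until the outputs match the examples; with too many neurons or layers, that search becomes the hard part.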
Say we're interested in the problem of recognizing vehicles. We want to make an algorithm that is given an image and tells us whether the image represents a car, a bus, a train, a bike, etc. For us human beings, there are several features that allow us to tell the difference. We can, for instance, count the number of visible wheels, examine the width-to-height ratio, the main color, the amount of glass surface, the number of windows, their shape, and so on. To do image recognition in this kind of situation, the solution is to create an intermediate algorithm, which won't be a neural network, whose job is to analyze the images and extract the interesting features. We then give those features to a neural network to do the recognition. This approach can work because the amount of input data we give the network is much smaller than if we gave it the raw image: to summarize the image, there will be a few dozen features, a few hundred at the most. Since we've summarized the image by reducing it to a list of essential features, this is sometimes called an abstraction of the image.

That's all well and good, but there's a catch. The quality of the recognition will heavily depend on how we built this intermediate representation of the image's essential features. And to do it properly, we need to know what we're talking about. In my vehicle example, we need to know that what matters for telling one vehicle from another is the number of wheels, windows, etc. So we need a specialist of the field, and it's the creator of the intermediate algorithm who actually does most of the intelligence work. Deep learning is a way to get around that.

We've seen how classic Machine learning works and how we would use it for image recognition, provided we add a preliminary step that constructs the image's essential features, which we can then give to a neural network. With Deep learning, the idea is to skip this step, which sounds a bit crazy. Basically, we create a big network with many layers of neurons and we give it the raw image. That's called a deep network, hence the name Deep learning. I've told you that a priori this idea cannot work: too many neurons, too many knobs to turn, and the learning phase fails. Well, there were still people who wanted to try. One of these pioneers is French; his name is Yann LeCun, and he started using these methods back in the 90s. At the time, the AI community was not at all convinced and had other interests. For two decades, only a few people in the world kept trying to do Deep learning.

And in 2012, there was an explosion. Every year, there's an image recognition contest between the best algorithms in the world. Here you can see the error percentages of the different algorithms in the 2010 and 2011 contests; the lower the percentage, the better the algorithm. In 2012, to everyone's surprise, a Deep learning algorithm won by a wide margin. And from the following year on, everyone started using Deep learning in this contest. What's interesting is to try to understand why, suddenly, a Deep learning algorithm works better than the usual methods when, for 20 years, everyone thought it would never work. As we've said, in the usual methods, an algorithm extracts from the image the essential features that summarize it in the form of concepts, and those features are usually given to a shallow network.
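As a rough illustration of that classic pipeline, here's a sketch in Python. The three features below are hypothetical stand-ins chosen because they're easy to compute; real extractors for vehicles would compute things like the number of wheels or windows, which is exactly where the specialist's knowledge goes.

```python
import numpy as np

def extract_features(image):
    # Hypothetical hand-crafted feature extractor, designed by a human
    # specialist: it summarizes the whole image in a few numbers
    h, w = image.shape
    return np.array([
        w / h,                                   # width-to-height ratio
        image.mean(),                            # overall brightness / main tone
        np.abs(np.diff(image, axis=1)).mean(),   # crude edge density
    ])

# A made-up 400 x 400 grayscale image: 160,000 raw numbers...
image = np.random.default_rng(0).random((400, 400))

features = extract_features(image)
print(features)  # ...reduced to 3 numbers a small, shallow network can handle
```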
What's amazing with Deep learning is that if we manage to train the network properly, we realize that the upper layers contain those essential features, all the important components of the image needed to recognize it. The algorithm created them itself; it discovered them without needing us to do the work for it. With the vehicle recognition example, it means we wouldn't have to explain to the algorithm what a wheel is and that wheels matter for recognizing a vehicle. The algorithm would discover, without any help, the concept of a wheel and its importance for classifying vehicles.

As I've told you, after the remarkable success of a Deep learning algorithm in the 2012 image recognition contest, everyone started to use Deep learning. And Yann LeCun, who had remained in the background, suddenly became an AI superstar. Of course, everyone started hiring Deep learning specialists: Google, Amazon, Baidu, etc. And of course Facebook, which hired LeCun as director of its AI laboratory.

There's something I haven't mentioned yet: why did this method, which nobody believed in a few years ago, suddenly start to work? As often happens, there are several reasons. First, the algorithms were improved, and, contrary to what I may have led you to believe, we don't use just any deep network: it's not simply a pile of neurons; there are particular structures that work well. There's also a hardware reason, computing power, especially the development of graphics processors: GPUs are much faster than traditional processors when it comes to manipulating images. But the real reason for the success of Deep learning algorithms, especially in image recognition, has been the availability of data. When we create a deep network, we can end up with thousands or even millions of neurons, so millions of knobs to turn during the learning phase before finding the right relationship. To do so, you need millions of examples to show the network. In 2009, a Stanford laboratory released a database called ImageNet, which today contains more than 15 million classified images. So we have images, and next to each one, a description of what it is: a car, a dog, a wave, etc. There are more than 10,000 different categories. You can try it yourself and search ImageNet for, say, cat images; I think there are several tens of thousands. And when we give all those images to a Deep learning algorithm, we don't need to tell it which essential features make a cat a cat: the shape of the head, the color of the eyes, the height of the ears, etc. The algorithm figures it out on its own.

Today, we can build deep networks with more than a hundred layers and several million neurons. We're still far from the human brain, but the image recognition performance is quite impressive. Here are some examples. On Facebook, around 800 million images are uploaded every day, so it's easy to understand why they're interested in an algorithm that can quickly recognize what an image depicts. Sometimes, it fails. And there are Deep learning applications other than analyzing the images of our private lives that we share on social networks. For instance, there are algorithms that can analyze what an image depicts and automatically create a one-sentence description; that can be useful for visually impaired people. Another application is autonomous driving, the self-driving cars we've heard so much about recently. To cite an example from Fei-Fei Li, one of the pioneers of ImageNet, it's very important for a self-driving car to be able to tell the difference between a crumpled paper bag on the road, which it can run over, and a rock in the middle of the road, which must be avoided.
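Those "particular structures" are notably convolutional networks, the kind LeCun had been working on since the 90s. Here's a minimal sketch of one; I'm using PyTorch purely as an illustration (no particular library is implied above), with a made-up image size and made-up categories.

```python
import torch
from torch import nn

# A tiny convolutional network: each layer builds on the previous one,
# from local patterns (edges) up to larger motifs (wheels, ears...)
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),   # first layer of feature detectors
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1),  # second layer, combining them
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 56 * 56, 10),                  # scores for 10 made-up categories
)

# One made-up 224 x 224 color image: raw pixels, no hand-crafted features
image = torch.randn(1, 3, 224, 224)
print(model(image).shape)  # torch.Size([1, 10])
```

Trained on millions of labeled images, the early layers of such a network end up detecting edges and textures, while the upper layers capture high-level features, exactly the kind we used to craft by hand.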
The most interesting Deep learning application, for me, is the ability to create images. As I told you earlier, if you give a deep network a raw image as input, it figures out on its own how that image can be summarized by a number of essential features. And then, to simplify, we can try to use the network the other way around: given a series of numbers as input, it creates an image as output. This image will be new, unique, and will have the features corresponding to what we entered, but it will have been entirely made up. Algorithms that can do this are called generative models. For instance, here's a recent and quite remarkable example: a generative model trained with Deep learning was used to create bedrooms. Here's a series of bedrooms invented by an algorithm. It also works with chairs, or manga characters. My favorite example is this one: album covers made up by a Deep learning algorithm that had previously studied real album covers. The most media-covered example, even if not the most useful, is the one I mentioned at the beginning: Google's Deep Dream. It's a network that was trained on a lot of images and then used to create somewhat psychedelic pictures. The principle is the same as when we look for shapes in the clouds: we can convince ourselves we see things in them. Here, we gave images to a deep network and forced it to see something else in them. Here's the result. Nice, isn't it? Again, this example is anecdotal, but I hope you're now convinced that there are many fantastic applications of Deep learning. And I'm sure that, a few years from now, this kind of algorithm will be everywhere.

Thanks for watching this video. If you want to go further, I recommend two other videos. The first one is by Fei-Fei Li, who introduced, in a TED talk, the problem of image recognition and what we can do today; it's quite short. If you want to go even further, you can watch Yann LeCun's inaugural lecture at the Collège de France; it's longer and he goes into detail. As usual, you can find me on social networks, Facebook and Twitter. You can support me on Tipeee; thanks to all the tippers who support me. You can also have a look at my book, published by Flammarion. And if you want to meet me, I'll be at Vulgarizators in Lyon on the 16th of April; don't hesitate to come and attend all the talks. Thanks, see you soon.