Hello everybody, and welcome to this third sequence on binary coding. We saw earlier how to represent information with bits, whether it is text, numbers, pictures or sounds. Now, let's talk about data compression.

Why compress data? Let's say we have a Full HD movie on a DVD, for example; for now, let's just take the video into consideration, not the sound. A 2-hour video lasts 7,200 seconds. There are generally 24 images per second, because that is roughly the rate the eye needs to perceive smooth motion. So there will be roughly 170,000 images. If the video is Full HD, each image is 1920 pixels wide and 1080 pixels high, so approximately 2 million pixels per image. Finally, every pixel, as we saw in the previous sequence, will generally take 24 bits if the pixels are coloured: 8 bits for each colour component. So, all in all, we will need about 8,000 billion bits to store the entire data for this video. A quick calculation shows that this is about one terabyte.

One terabyte is huge. Over an ADSL broadband connection, it will generally take you about 100 hours to download one terabyte, because ADSL only offers a speed of 10 to 20 megabits per second. So if you want to do video on demand, for example, it gets complicated: if you need 100 hours to get the film home, you will have to wait a long time, so it won't really be "on demand". So the question is: technically, how do we store all this information, the movie's terabyte, so that it fits on a DVD, which can store no more than a few gigabytes, and so that it can be downloaded as video on demand?

There are several steps. First, we need to compress the images. Images sometimes have redundant areas. I put up a technical drawing here, on the right of this presentation; on this picture, we can see that some areas are identical.
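The back-of-the-envelope calculation above can be checked with a few lines of Python, using the rounded figures from the text:

```python
# Rough size of an uncompressed 2-hour Full HD video.
seconds = 2 * 3600          # 7,200 seconds
fps = 24                    # images per second
pixels = 1920 * 1080        # pixels per image (~2 million)
bits_per_pixel = 24         # 8 bits per colour component

total_bits = seconds * fps * pixels * bits_per_pixel
print(total_bits)                  # ~8.6e12 bits ("8,000 billion" in round figures)
print(total_bits / 8 / 1e12)       # ~1.07 terabytes

# Download time over a 20 megabit/s ADSL line, in hours.
print(total_bits / 20e6 / 3600)    # ~119 hours, i.e. on the order of 100 hours
```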
For example, these two areas here are hatched in exactly the same way, some letters "A" are drawn the same way, some lines are identical too, with the same thickness, etc. There is a lot of redundant information, so it would be a shame to store the same information several times. So when the image is stored, instead of storing every pixel, a repeated area will be stored only once and then referenced. It will be like saying, "let's store the data about the black and white hatching at the bottom, and let's reuse it later, without duplicating it". That will reduce the image's size significantly, because redundant information will only be stored once. This is done in very common image formats, like GIF or PNG for example, which are often used on the Web. Those compression techniques can very well be applied to other things, like text, for example. If the same word is repeated several times, it can be stored once and the other occurrences will point back to that first copy. It can also be used for sound: if the same chorus is repeated in exactly the same way within a tune, we may be able to store it only once.

Now let's talk about photo compression. Previously, we talked about technical drawings containing redundant elements. It's more complicated for photos, because photos generally contain pixels that do not repeat. So unless you photograph a uniform wall under unchanging lighting, small details will generally vary. If you look at this photo, there are meadows in the centre of the picture that are quite uniform, so it would be possible to store them in the way we just described. But in practice, those meadows are not totally uniform, because there are small bushes, and colours may vary because of the sun or the tinge of the grass. So the question is: do those details really matter? The human eye will not necessarily catch all the tones, so we may assume that some details can be dropped. Compression will therefore be done by approximating blocks of pixels.
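The "store it once, reference it later" idea described above, for hatched areas, repeated words or a repeated chorus, can be sketched with a toy coder. Real formats such as PNG rely on a much more elaborate dictionary coder (DEFLATE), but the principle is the same:

```python
def compress(words):
    """Replace each repeated word by a back-reference to its first occurrence."""
    seen = {}        # word -> position of its first occurrence
    out = []
    for i, w in enumerate(words):
        if w in seen:
            out.append(("ref", seen[w]))   # reuse the copy stored earlier
        else:
            seen[w] = i
            out.append(("lit", w))         # store the word itself, once
    return out

def decompress(tokens):
    out = []
    for kind, value in tokens:
        out.append(out[value] if kind == "ref" else value)
    return out

text = "the cat and the dog and the bird".split()
packed = compress(text)
assert decompress(packed) == text    # lossless: we get back exactly the input
```

Here 8 words are stored as only 5 literals plus 3 small references; on a real image or text, the savings come from referencing whole repeated areas rather than single words.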
We will take parts of the picture and say, "in this rectangular block of pixels, rather than keeping every detail, let's consider that all the pixels are roughly identical", and so we treat them as identical. We save space by saying, "I drop some details to factorise the data". These techniques are commonly used, for example when you store pictures using JPEG. This is lossy compression, which means the data returned will be different from the original: it will be slightly less detailed. But it will weigh much less, since a lot of details no longer have to be stored, so the pictures will be much smaller. The result can be more or less satisfying depending on the compression rate. As an example, I took the same picture and compressed it heavily. In terms of file size, the picture was divided by 10, but on the other hand, we can see small rectangles appearing on screen: those are the areas where the compression algorithm assumed the pixels were identical, even though they are not. In this case, the file was compressed too heavily, and the picture doesn't look good. So we have to find a compromise: reducing the file size while not deleting too many details.

Now let's talk about videos, since the original purpose was to obtain a video that doesn't weigh one terabyte, but rather a few gigabytes. In a video, there can be redundant areas within a picture, just as before, but there will also be repetitions between successive images. If you look at the 4 images here, you can notice that the background hasn't changed much. It moves a bit towards the left, since the rider is moving towards the right, but it is the same background. Then the horse: maybe its legs, its tail or its head have moved, but the body hasn't necessarily moved a lot. So it would be a shame to store the horse's data several times when it is almost identical across the various images. The rider hasn't moved a lot either.
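The block approximation described above can be sketched with a toy example. Real JPEG works on 8x8 blocks in the frequency domain; here we simply replace each tile of a greyscale image by its average value, which is enough to see why over-compression produces visible rectangles:

```python
def compress_blocks(image, block=2):
    """Lossy compression sketch: replace each block-by-block tile of a
    greyscale image (a list of rows) by the average of its pixels."""
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(0, h, block):
        for x in range(0, w, block):
            tile = [out[j][i] for j in range(y, min(y + block, h))
                              for i in range(x, min(x + block, w))]
            avg = sum(tile) // len(tile)
            for j in range(y, min(y + block, h)):
                for i in range(x, min(x + block, w)):
                    out[j][i] = avg    # every pixel in the tile becomes the average
    return out

img = [[10, 12, 200, 202],
       [11, 13, 201, 203]]
print(compress_blocks(img))   # → [[11, 11, 201, 201], [11, 11, 201, 201]]
```

The small variations inside each tile (10, 11, 12, 13) are gone for good, which is exactly the loss the transcript describes; each tile now needs only one stored value instead of four.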
So there are lots of similarities across the successive images. To store a video, we are going to store some of the images in full, and for the others, we will look for similarities between the images and say: "the next image is the previous one with this or that modification". This is common practice in formats such as MPEG, which store a complete image from time to time, particularly at the beginning of each shot, and then only store the differences for the following images. That saves a lot of storage space, since a lot of information is no longer duplicated.

Finally, we'll talk about sound compression. To compress sound, we use facts such as the human ear not picking up all frequencies. Sounds span a wide range of frequencies, but the human ear can only hear between 20 hertz and 20 kilohertz. A sound measured with an instrument will contain frequencies both higher and lower than that range, so they can be suppressed: when coding the sound, those useless frequencies are deleted. We can also treat very similar frequencies as identical. These techniques are used in quite common file formats, such as MP3, for example. So the sound is coded that way, as a range of frequencies corresponding to what the human ear can actually pick up. Keeping the signal coded in the form of frequencies also makes it possible to apply effects such as reverberation directly on that representation.

So what can we learn from this chapter? First, there are various ways to code and decode, compress and decompress images and sounds, and multimedia data in general. These techniques are more or less suited to the various situations. For example, technical drawings and photos will not use the same compression techniques, as we saw earlier. Compression can be applied with or without loss, which has both advantages and drawbacks, so the choice is made depending on the end usage.
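The idea of deleting frequencies the ear cannot pick up can be sketched with a toy filter. Real formats such as MP3 use psychoacoustic models and far more efficient transforms; this sketch just computes a naive Fourier transform and zeroes every bin outside the 20 Hz to 20 kHz range:

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform (O(n^2), fine for a demo)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]

def band_pass(x, sample_rate, lo=20.0, hi=20000.0):
    """Zero out every frequency bin outside the audible 20 Hz - 20 kHz range."""
    n = len(x)
    X = dft(x)
    for k in range(n):
        freq = min(k, n - k) * sample_rate / n   # frequency represented by bin k
        if not (lo <= freq <= hi):
            X[k] = 0                             # inaudible: delete it
    return idft(X)

# A 2 kHz tone (audible) mixed with a 23 kHz tone (inaudible): 48 samples at 48 kHz.
n, sr = 48, 48000
mixed = [math.sin(2 * math.pi * 2 * t / n) + 0.5 * math.sin(2 * math.pi * 23 * t / n)
         for t in range(n)]
audible = [math.sin(2 * math.pi * 2 * t / n) for t in range(n)]
filtered = band_pass(mixed, sr)
# Only the audible tone is left after filtering.
assert max(abs(a - b) for a, b in zip(filtered, audible)) < 1e-6
```

After filtering, only the coefficients of the audible frequencies need to be stored, which is where the saving comes from.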
And for the computer, it's always the same: it sees a set of bits, it doesn't know what they are for, it doesn't know it is an image. It's a meaningless row of bits, but when the computer is given that information in order to display it, the data will come with a header that describes: "this is an image which was coded and compressed in this particular way". So the computer will be able to decode, decompress and display the data by using the right method, the right pixels, the right colours for the pixels, etc. And when the computer is given an image, with a scanner for example, it will be told how to handle this image: how to code it and compress it in order to reduce its size, etc. In every case, there will be a header describing how the data was coded and compressed, detailing the size, the duration, the compression type, the colour type, etc. This is done for both images and sounds.
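The role of such a header can be illustrated with a deliberately made-up, minimal format: the `IMG1` magic value and the field layout below are invented for this example (real formats such as PNG or BMP define their own, much richer headers):

```python
import struct

# A made-up 12-byte header: a 4-byte magic value identifying the format,
# then the width and the height as big-endian unsigned 32-bit integers.
MAGIC = b"IMG1"

def write_header(width, height):
    return struct.pack(">4sII", MAGIC, width, height)

def read_header(data):
    magic, width, height = struct.unpack(">4sII", data[:12])
    if magic != MAGIC:
        raise ValueError("not an IMG1 file")   # the bits are meaningless otherwise
    return width, height

header = write_header(1920, 1080)
assert read_header(header) == (1920, 1080)
```

Without this header, the pixel data that follows would just be the "meaningless row of bits" the transcript mentions; with it, the computer knows how to decode what comes next.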