Hello. Welcome to this fourth
sequence on binary coding.
Today we will talk about how data is organized.
We saw how to represent data as bits,
whether numbers, images or sounds.
Now, let's see how to store them,
concretely, how to organize them.
Data is stored in RAM or on the hard drive,
or transferred over the network.
Whatever they represent,
there has to be a clear way to tell how
they are encoded, how they are organized,
in which order, in which format, etc.
The computer will either
read or generate them, or do both.
In every case, it must
understand how they are coded.
In practice, the data comes with a header
that describes it:
its size, its length if relevant,
the colors used, if any,
the compression format, etc.
The header specifies how the data is coded.
The computer first reads the header
that accompanies the data
to see how they are stored, then
reads the data according
to what the header describes.
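To make the "header first, then data" idea concrete, here is a made-up format (hypothetical, not a real standard): a 4-byte header holding a width and a height as 16-bit unsigned numbers, followed by one gray-level byte per pixel. It is only a sketch; the names `encode` and `decode` are illustrative. The reader decodes the header first and uses it to know how many data bytes to interpret.

```python
import struct

# Hypothetical format, for illustration only:
#   header: width (2 bytes), height (2 bytes), little-endian, unsigned
#   data:   width * height bytes, one gray level per pixel
HEADER = struct.Struct("<HH")

def encode(width, height, pixels):
    """Write the header first, then the raw pixel bytes."""
    if len(pixels) != width * height:
        raise ValueError("pixel count does not match width * height")
    return HEADER.pack(width, height) + bytes(pixels)

def decode(blob):
    """Read the header, then interpret the rest according to it."""
    width, height = HEADER.unpack_from(blob, 0)
    start = HEADER.size  # data begins right after the 4-byte header
    pixels = list(blob[start:start + width * height])
    return width, height, pixels

blob = encode(2, 2, [0, 85, 170, 255])
print(decode(blob))  # (2, 2, [0, 85, 170, 255])
```

Without the header, the same 8 bytes could just as well be read as a 4x1 image or as two 32-bit numbers; the header is what removes the ambiguity.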
So the computer manipulates
a certain number of basic data types,
including whole numbers (integers), large and small.
Whole numbers can be stored on 8 bits,
that is, 1 byte, which gives 2 to the power of 8, so 256 values,
or on more bits to hold larger whole numbers:
today's computers can also manipulate
whole numbers on 64 bits,
which gives 2 to the power of 64, about 18 billion billion values.
There are several sizes of whole numbers
that the computer can manipulate efficiently.
These whole numbers can also be negative.
Depending on the need, we use either
unsigned whole numbers (positive only) or signed ones (positive or negative).
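The arithmetic behind these ranges is easy to check. The sketch below assumes signed numbers use two's complement, which is the encoding on essentially all current hardware; the function names are illustrative.

```python
def unsigned_range(bits):
    """Smallest and largest value of an unsigned (positive-only) whole number."""
    return 0, 2**bits - 1

def signed_range(bits):
    """Range of a signed whole number, assuming two's complement encoding."""
    return -(2**(bits - 1)), 2**(bits - 1) - 1

for bits in (8, 16, 32, 64):
    print(bits, "bits:", 2**bits, "values;",
          "unsigned", unsigned_range(bits), "signed", signed_range(bits))
```

For 8 bits this prints 256 values, unsigned (0, 255) and signed (-128, 127): the same number of values, just shifted so that half of them are negative.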
Whole numbers are useful because they are exact
and offer many values, but that is not always enough,
because sometimes
we need to manipulate decimal numbers.
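For these, computing uses floating-point numbers,
made of a number multiplied by a power
given by an exponent, as in scientific notation
(in practice, hardware uses powers of 2 rather than powers of 10).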
These numbers can reach very large magnitudes,
much bigger than whole numbers can; on the other hand,
the number of significant digits is limited
to roughly 7 or 16,
depending on the kind
of floating-point number used (single or double precision).
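A quick way to see both properties, huge range but limited digits, is the sketch below. Python's own `float` is double precision; the helper `to_float32` (an illustrative name) round-trips a value through the standard `struct` module to simulate single precision.

```python
import struct

def to_float32(x):
    """Round a Python float (64-bit, double precision) to 32-bit single precision."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# Floats reach magnitudes far beyond 64-bit whole numbers...
print(1e300 > 2**64)               # True
# ...but only with a limited number of significant digits.
print(to_float32(0.1))             # 0.10000000149011612 (about 7 correct digits)
print(0.1 + 0.2)                   # 0.30000000000000004 (about 16 correct digits)
print(float(2**53 + 1) == 2**53)   # True: the final 1 does not fit and is lost
```

The last line shows the trade-off: a 64-bit whole number holds 2**53 + 1 exactly, while a double-precision float cannot.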
With these 2 big categories of numbers,
many things can be done.
The computer can manipulate them efficiently.
The programmer looks at what data must be stored
and chooses the data type accordingly.
So, to store an age, for example,
which is between 0 and 100,
they will use a positive whole number
stored on a small number of bits.
On the other hand, to store
temperatures, they will choose a floating-point number
without much precision, since temperatures
don't need to be given
with many significant digits.
For a gray level,
as seen previously,
8 bits are enough:
rather small whole numbers.
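In Python's standard `struct` module, each format code stands for a fixed-size type, which makes these choices visible. The mapping below (unsigned byte for an age and a gray level, 32-bit float for a temperature) is one plausible reading of the examples above, offered as an assumption rather than the only correct choice.

```python
import struct

# One possible type choice per quantity (an assumption for illustration):
#   "B" = unsigned 8-bit whole number (0..255)
#   "f" = 32-bit floating-point number (single precision)
CHOICES = {
    "age": "B",          # 0 to about 100 fits easily in 8 bits
    "gray_level": "B",   # 256 gray levels, as in the image sequence
    "temperature": "f",  # a few significant digits are enough
}

for name, code in CHOICES.items():
    print(name, code, struct.calcsize("<" + code), "byte(s)")

# A value outside the chosen type's range is rejected when packing.
try:
    struct.pack("<B", 300)
except struct.error as exc:
    print("300 does not fit in an unsigned byte:", exc)
```

Choosing the smallest type that fits saves memory, at the price of having to reject (or otherwise handle) values that don't fit.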
Beyond simple numbers, we will also
manipulate more complex data.
Sometimes a date has to be stored.
A date is made of 3 numbers: the day, the month and the year.
A color has 3 components:
red, green and blue,
meaning a number for each component.
These are combinations of numbers to store.
In computing, they are called tuples, structures or records.
Let's simply say that a tuple here
is made up of 3 whole numbers:
the day, the month
and the year, or the amounts of red, green and blue.
The 3 numbers are assembled
into a more complex piece of data
made of these 3 values.
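In Python, a tuple with named fields can be sketched with `typing.NamedTuple`. The type names `Date` and `Color` are illustrative; the point is simply three whole numbers assembled into one value.

```python
from typing import NamedTuple

class Date(NamedTuple):
    """A date: three whole numbers assembled into one value."""
    day: int
    month: int
    year: int

class Color(NamedTuple):
    """An RGB color: one whole number (0-255 here) per component."""
    red: int
    green: int
    blue: int

d = Date(14, 7, 1789)
orange = Color(255, 165, 0)
print(d.year, orange.green)   # 1789 165
day, month, year = d          # a tuple can be unpacked into its parts
```

Each value stays one unit that can be passed around, while its parts remain accessible by name or position.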
We can even make tuples of tuples,
for example, if we say
that a period is made of 2 dates,
each made of 3 numbers,
or that an image is made of pixels,
each made of 3 color components.
So we can gather these numbers
to create more complex data.
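Nesting works the same way. This sketch defines a hypothetical `Period` as a tuple of two `Date` tuples, and a tiny 2x2 image as rows of pixels, each pixel an (R, G, B) tuple; all names and values are illustrative.

```python
from typing import NamedTuple

class Date(NamedTuple):
    day: int
    month: int
    year: int

class Period(NamedTuple):
    """A tuple of tuples: two dates, each made of three numbers."""
    start: Date
    end: Date

holiday = Period(Date(1, 8, 2024), Date(15, 8, 2024))
print(holiday.end.day)  # 15

# Likewise, a tiny 2x2 image: rows of pixels, each pixel an (R, G, B) tuple.
image = (
    ((255, 0, 0), (0, 255, 0)),
    ((0, 0, 255), (255, 255, 255)),
)
print(image[1][0])  # (0, 0, 255): row 1, column 0 is blue
```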
Tuples are practical
because they let us assemble numbers,
but sometimes the number of items
we want to assemble is not known in advance.
For example, to manage party guests
or the stages of a trip,
we don't necessarily know
for sure how many there will be.
So other data types are used, such as arrays.
In the array we put a set of tuples
that all have the same structure.
Here, for instance, each one holds a name,
a date of birth and a height,
and nothing else:
every line has exactly the same structure.
These identical elements
are laid out one under
the other and numbered from 1
to the number of elements in the array.
This makes the elements easy to manage.
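The table of same-structure lines described above (an array) can be sketched with a Python list of records. The `Person` type and the guest data are hypothetical; note that Python's list is an array that resizes itself as elements are added.

```python
from datetime import date
from typing import NamedTuple

class Person(NamedTuple):
    """One line of the array: every element has exactly these three fields."""
    name: str
    birth_date: date
    height_cm: int

guests = [
    Person("Alice", date(1990, 3, 2), 165),
    Person("Bob", date(1985, 11, 20), 180),
]
guests.append(Person("Chloé", date(2001, 6, 9), 172))  # a Python list can grow

# Python numbers elements from 0, whereas the explanation above counts from 1.
for index, person in enumerate(guests):
    print(index, person.name, person.height_cm)

print(guests[1].name)  # Bob: direct access by position
```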
Another way of storing this kind
of information is a list.
A list starts with a pointer
to its first element.
As before, each element is a tuple.
For example, here, the tuple contains
a town, a start date and an end date,
and it also points to the next tuple.
So there's a list
with a cursor at the beginning,
and each element points to
the next one in the list.
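A linked list of trip stages could be sketched as below. The `Stage` class, its field names and the towns are illustrative assumptions; the essential parts are the data tuple, a link to the next element (`None` at the end), and a head variable acting as the cursor.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Stage:
    """One element of the list: a tuple of data plus a link to the next one."""
    town: str
    start: str
    end: str
    next_stage: Optional["Stage"] = None  # None marks the end of the list

# Build the list from its last element back to its first.
third = Stage("Lyon", "2024-07-05", "2024-07-07")
second = Stage("Dijon", "2024-07-03", "2024-07-05", third)
head = Stage("Paris", "2024-07-01", "2024-07-03", second)  # cursor at the start

def towns(head):
    """Follow the links from the head until there is no next element."""
    result = []
    node = head
    while node is not None:
        result.append(node.town)
        node = node.next_stage
    return result

print(towns(head))  # ['Paris', 'Dijon', 'Lyon']
```

Unlike an array, there is no direct access to "element number 3": the only way to reach it is to follow the links from the start.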
As a structure, it looks like the array,
since it is a series of elements of the same type.
But in practice, the difference matters
from an algorithmic, efficiency point of view.
Sometimes the array is more practical,
sometimes the list is more practical.
For example, if I want to insert
an element at the start of an array,
it will be more complicated than in a list,
because in the array,
every line has to be moved down one position,
which means moving all the data already stored.
Elements don't have to be moved
when inserting into a list.
A new element is simply created
in the computer's memory,
and we say, "now the start
of the list is this element,
and it points to
the previous start of the list."
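The difference in cost can be made visible with the sketch below. It models a fixed-capacity array explicitly so the shifting is spelled out (Python's built-in `list.insert(0, x)` does the same shifting internally), and counts the moves; `Node` and both function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Optional

def array_insert_front(array, count, value):
    """Insert at the front of an array holding `count` elements (with spare capacity).
    Every existing element moves one slot down, so the work grows with count."""
    moves = 0
    for i in range(count, 0, -1):
        array[i] = array[i - 1]
        moves += 1
    array[0] = value
    return moves

@dataclass
class Node:
    value: Any
    next_node: Optional["Node"] = None

def list_insert_front(head, value):
    """Create one new node pointing at the old head: no element is moved."""
    return Node(value, head)

arr = ["b", "c", "d", None]                  # capacity 4, 3 elements used
print(array_insert_front(arr, 3, "a"), arr)  # 3 ['a', 'b', 'c', 'd']

head = Node("b", Node("c", Node("d")))
head = list_insert_front(head, "a")
print(head.value, head.next_node.value)      # a b
```

With a million elements, the array version performs a million moves, while the list version still creates just one node.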
Some operations will be more efficient in an array,
and others will be more efficient in a list.
It depends on what we want to do
with our data structures.
Concretely, all these data structures
will be stored in memory.
So how will they be laid out in memory,
in which order, with what size, etc.?
The most important thing is to clearly define
how the information is stored and where.
There has to be a place where we specify
how the data is organized in memory.
The programmer is the one
who does this when writing the program.
At the start, they might choose,
for example, to store each line of the array
beginning with the person's name,
then their age and then their height,
or in a different order, but it has to be
clearly defined at the beginning of the program
so that every part of the program interprets the information correctly when it encounters it.
The specification can't be changed in the middle of the program.
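Such a layout decision can be written down once and shared by the whole program. The sketch below uses a `struct` format string as that single specification; the exact layout (20-byte name, 1-byte age, 32-bit float height in metres) is a hypothetical choice for illustration.

```python
import struct

# Hypothetical layout, fixed once for the whole program:
#   name:   20 bytes, UTF-8, padded with zero bytes
#   age:    1 byte, unsigned
#   height: 4 bytes, 32-bit float (metres)
RECORD = struct.Struct("<20sBf")

def write_record(name, age, height):
    """Lay out one record exactly as the specification says."""
    return RECORD.pack(name.encode("utf-8"), age, height)

def read_record(data):
    """Decode a record using the same specification, in the same order."""
    raw_name, age, height = RECORD.unpack(data)
    return raw_name.rstrip(b"\0").decode("utf-8"), age, height

data = write_record("Alice", 34, 1.65)
print(RECORD.size)  # 25 bytes: 20 + 1 + 4, no padding with "<"
print(read_record(data))
```

If one part of the program wrote age before name while another read name first, the bytes would be silently misinterpreted; keeping `RECORD` as the one shared definition prevents that.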
The same problem arises when saving things in a file:
one program has to generate
the file while respecting the specification
so that other programs can understand it.
When that specification is public,
we call it an open file format.
Take a GIF image, which is an open format:
we know in advance that to decode it,
there will necessarily be a sequence of characters
at the beginning of the file, "GIF89a",
then 16 bits storing the width,
then 16 bits storing the height of the image.
So anyone who wants to read a GIF file
to display the image correctly knows
that the information is organized this way.
Anyone who wants to save an image
in this format knows it must be
done this way so that others can read it.
We simply define specifications
that everyone has to respect.
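Reading and writing the start of a GIF file following the specification described above might look like this sketch. It relies on facts from the GIF specification: the 6-byte signature (the older "GIF87a" version also exists) and the width and height stored as 16-bit unsigned numbers, least significant byte first. The function names are illustrative, and only the first 10 bytes are handled, not a complete file.

```python
import struct

def read_gif_size(data):
    """Return (width, height) from the first 10 bytes of a GIF file."""
    signature = data[:6]
    if signature not in (b"GIF87a", b"GIF89a"):
        raise ValueError("not a GIF file")
    # Width then height: 16-bit unsigned, little-endian, right after the signature.
    width, height = struct.unpack_from("<HH", data, 6)
    return width, height

def write_gif_start(width, height):
    """Produce the first 10 bytes a GIF89a writer must emit (not a full file)."""
    return b"GIF89a" + struct.pack("<HH", width, height)

header = write_gif_start(640, 480)
print(header)                 # b'GIF89a\x80\x02\xe0\x01'
print(read_gif_size(header))  # (640, 480)
```

On a real file, something like `read_gif_size(open("image.gif", "rb").read(10))` would work, where `image.gif` is a placeholder name. Because reader and writer follow the same published layout, neither needs to know anything about the other program.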
When a file format isn't open,
people can generate files in it,
but others may not be able to read them.
So we can keep in mind
that data is made up of basic elements:
whole numbers, floating-point numbers,
and perhaps letters or other things.
These elements are combined
to represent more complex data,
using tuples, which assemble a fixed set of elements, and arrays or lists, etc.,
which can hold many elements and grow or shrink
as the programmer and the user need.
The most important thing is to define
clearly how things are assembled
so that everyone can use them.
Specifications must be defined that say,
"data is organized in memory in this way,
in this order and with this size,"
so that whoever writes the data and whoever reads it can understand each other.