We all use YouTube, and the platform needs no introduction. It is the world’s largest video sharing website, with over a billion users. But have you ever wondered how YouTube actually works? What happens to our videos when we upload them to its system?
Usually, we visit the website, search for something, pick a video, and click the play button; that is it. Behind the scenes, however, every video has to pass through many processes before we can play it. From upload to playback, all content must undergo various procedures. The following paragraphs offer some insight into what happens to the videos we upload.
How Big Is YouTube?
Before we discuss how the website operates, check out a few facts about YouTube.
- 400 hours of video are uploaded every minute
- YouTube has around 1.3 billion users – almost a third of all people on the Internet
- If you started watching every video that has been uploaded to YouTube, by the time you finished, it would be the year 2081.
- 80% of YouTube users are outside the USA
- YouTube has launched local versions in more than 88 countries.
- You can navigate YouTube in a total of 76 different languages, which covers 95% of the global internet population.
How Does YouTube Work?
Though YouTube has its imperfections, uploading and watching videos on the prolific platform is generally a seamless experience. This is pretty amazing when you consider all of the file types, resolutions, and formats that YouTube has to account for as it disseminates videos to over a billion people worldwide.
Although every camera works differently and produces files in various formats, YouTube makes sure the content plays smoothly across different browsers and apps, with as little buffering or technical trouble as possible. Here is how it does it.
What Happens When You Upload a Video?
When a user uploads a video to YouTube, the first thing YouTube does is convert it into an easily playable format. In other words, YouTube does not play the originally uploaded file; instead, it converts it into formats that are efficient to store and stream and can be played smoothly. If the original file were served as-is, it could cause playback problems (not every device can decode every camera format) or constant buffering on a slower connection, because original camera files are often very large. This is called “YouTube Processing,” during which the system tries to make the file as small as possible without a visible drop in quality.
Based on the resolution and frame rate, it creates what is called a “mezzanine”, a high-quality intermediate copy of the original video. The mezzanine is then cut into pieces (5 seconds each), and each chunk is sent to a different machine, so many chunks can be processed in parallel. Each machine then encodes its chunk into various formats and resolutions (240p, 360p, 720p, 1080p).
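To make the split-and-distribute step concrete, here is a minimal Python sketch. It assumes a fixed 5-second chunk length and the four output resolutions named above; threads stand in for separate machines, and `encode_chunk` is a hypothetical placeholder that only labels its output instead of actually encoding video.

```python
from concurrent.futures import ThreadPoolExecutor

CHUNK_SECONDS = 5.0                  # chunk length described in the text
RESOLUTIONS = (240, 360, 720, 1080)  # output heights mentioned in the text


def split_into_chunks(duration_s, chunk_s=CHUNK_SECONDS):
    """Return (start, end) time ranges that cover the whole video."""
    chunks = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        chunks.append((start, end))
        start = end
    return chunks


def encode_chunk(chunk):
    """Hypothetical stand-in for a real encoder: it only records what
    would be produced for each resolution of this chunk."""
    start, end = chunk
    return {height: f"{height}p[{start:.0f}-{end:.0f}s]" for height in RESOLUTIONS}


def transcode(duration_s, workers=4):
    chunks = split_into_chunks(duration_s)
    # Each chunk goes to a separate worker, mimicking "a different machine".
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in input order, so the pieces can be
        # stitched back together in their original sequence.
        encoded = list(pool.map(encode_chunk, chunks))
    return {height: [piece[height] for piece in encoded] for height in RESOLUTIONS}


renditions = transcode(12.0)
print(renditions[360])  # ['360p[0-5s]', '360p[5-10s]', '360p[10-12s]']
```

Because the chunks are independent, adding more workers shortens the wall-clock time for long videos; the last chunk is simply shorter when the duration is not a multiple of 5 seconds.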
When the chunks are ready, they are stitched back together in their original order. Depending on the input resolution, around 25 different formats/files are generated by this process. This ensures a good experience for every viewer: if you upload a 4K video, someone watching on a 4K LED TV can see the 4K version, while someone in a remote village with a sluggish internet connection can watch the same video on a phone in a lower, playable quality.
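Why the number of output files depends on the input resolution can be sketched as a simple filter: only renditions at or below the uploaded resolution are produced, each in several codecs. The ladder and codec names below are assumptions for illustration, not YouTube's actual configuration, so the counts will not match the ‘around 25’ figure exactly.

```python
# Illustrative rendition ladder: (label, height in pixels).
# YouTube's real ladder is not given in the text; these steps are assumptions.
LADDER = [
    ("144p", 144), ("240p", 240), ("360p", 360), ("480p", 480),
    ("720p", 720), ("1080p", 1080), ("1440p", 1440), ("2160p", 2160),
]
CODECS = ("h264", "vp9", "av1")  # assumed codec set, for illustration only


def renditions_for(source_height):
    """Ladder steps that do not exceed the uploaded video's height."""
    return [label for label, height in LADDER if height <= source_height]


def output_files(source_height):
    """One (resolution, codec) output per allowed step and codec."""
    return [(label, codec)
            for label in renditions_for(source_height)
            for codec in CODECS]


print(renditions_for(720))      # ['144p', '240p', '360p', '480p', '720p']
print(len(output_files(2160)))  # 24 for a 4K upload
print(len(output_files(720)))   # 15 for a 720p upload
```

Upscaling a 720p upload to 4K would add file size without adding detail, which is why the filter stops at the source height.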
How Are These Versions Made?
The versions are made with a codec (short for compressor-decompressor, or coder-decoder): mathematics that can shrink a file to a much smaller size than the original with little or no visible loss in quality. A captured video is essentially a sequence of frames, each a grid of dots (pixels). The codec stores those dots far more compactly, for example by recording only what changes from one frame to the next, so the decoded picture still looks nearly the same as the original.
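A back-of-the-envelope calculation shows how much work the codec does. The sketch assumes 8-bit video with 4:2:0 chroma subsampling (1.5 bytes per pixel) and an illustrative 5 Mbit/s streaming bitrate for 1080p; neither number comes from YouTube.

```python
def raw_bitrate_mbps(width, height, fps, bytes_per_pixel=1.5):
    """Uncompressed bitrate in megabits per second.

    bytes_per_pixel=1.5 assumes 8-bit video with 4:2:0 chroma
    subsampling, a common layout for camera output.
    """
    bytes_per_frame = width * height * bytes_per_pixel
    return bytes_per_frame * fps * 8 / 1_000_000


raw = raw_bitrate_mbps(1920, 1080, 30)
ASSUMED_STREAM_MBPS = 5.0  # illustrative 1080p streaming bitrate, not a YouTube figure

print(f"raw: {raw:.1f} Mbit/s")                                # raw: 746.5 Mbit/s
print(f"compression ratio: {raw / ASSUMED_STREAM_MBPS:.0f}x")  # compression ratio: 149x
```

In other words, under these assumptions the codec has to discard or predict well over 99% of the raw bits while keeping the picture looking the same.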
Compression is not perfect, however, so YouTube keeps running experiments and surveys to find the best-quality output. The format that performs best in these tests is preferred, and videos are shown in that format by default.
Once encoding is complete, the next steps are to generate thumbnails and to recognize speech for captions.
YouTube lets uploaders pick the best of the generated thumbnails. It also tries to recognize speech in order to offer automatic captions, which the uploader can edit.
Once this processing is complete, what we call the ‘search process’ begins: an algorithmic process that makes the video searchable by certain terms and keywords.
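The keyword-matching part of search is often built on an inverted index, which maps each word to the videos that contain it. The sketch below shows only that textbook idea with hypothetical video IDs; YouTube's real search also ranks results using many other signals, which are not modeled here.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(videos):
    """Map each word to the set of video IDs whose metadata contains it."""
    index = defaultdict(set)
    for video_id, text in videos.items():
        for word in tokenize(text):
            index[word].add(video_id)
    return index


def search(index, query):
    """Return the IDs of videos containing every query word, sorted."""
    words = tokenize(query)
    if not words:
        return []
    matches = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(matches)


videos = {  # hypothetical IDs and titles
    "v1": "How YouTube processes uploaded videos",
    "v2": "Adaptive bitrate streaming explained",
    "v3": "How video codecs compress YouTube videos",
}
index = build_index(videos)
print(search(index, "YouTube videos"))  # ['v1', 'v3']
```

Building the index once at upload time is what makes later lookups fast: a query only touches the entries for its own words instead of scanning every video.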
What Happens When You Play a Video?
We generally assume that when we click the “play” button, YouTube simply fetches that video and sends it to us. Roughly a decade ago, that was largely correct: a user’s “play” click started a download of the video, which was prone to interruption. A slow connection or server would stall playback and cause constant buffering, which ultimately ruined the user experience.
YouTube engineers came up with a solution to this and named it “SlicedBread”. The metaphor describes breaking the video into small pieces and, like laying down track just ahead of a moving ‘train’ (read: the user), delivering each piece just before playback reaches it. Once a slice has been served, the system decides which slice is best to lay down next.
This approach is called ‘Adaptive Bitrate’ streaming, which simply means that the quality of your next video slice can change as your internet speed and bandwidth change. When you click “Play”, YouTube analyzes factors such as your screen size, your connection speed, and how much of the video you have already downloaded. If conditions are good, it sends a high resolution (1080p). If your bandwidth drops, for example because another download is running on your device, YouTube automatically switches the next slice to a lower-resolution copy. This keeps the video playing instead of stopping to buffer.
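The decision described above can be sketched as a small function that looks at measured throughput, screen size, and buffered seconds. The ladder bitrates, the 10-second low-buffer threshold, and the 0.8/0.5 safety factors are all assumptions chosen for illustration; real players, including YouTube's, use more elaborate and unpublished logic.

```python
LADDER = [  # (height in pixels, bitrate in kbit/s); illustrative values only
    (240, 400),
    (360, 750),
    (480, 1_500),
    (720, 3_000),
    (1080, 6_000),
]


def choose_next_slice(throughput_kbps, screen_height, buffered_s,
                      low_buffer_s=10.0):
    """Pick the height of the next slice.

    Only part of the measured throughput is spent, and even less when the
    buffer is nearly empty, so a dip in speed does not stall playback.
    Renditions taller than the screen are skipped because they add no detail.
    """
    safety = 0.5 if buffered_s < low_buffer_s else 0.8
    budget = throughput_kbps * safety
    candidates = [h for h, kbps in LADDER if kbps <= budget and h <= screen_height]
    # Fall back to the lowest rung when nothing fits the budget.
    return max(candidates) if candidates else LADDER[0][0]


print(choose_next_slice(10_000, 1080, buffered_s=30))  # 1080
print(choose_next_slice(10_000, 720, buffered_s=30))   # 720 (small screen)
print(choose_next_slice(2_500, 1080, buffered_s=30))   # 480 (bandwidth dropped)
print(choose_next_slice(2_500, 1080, buffered_s=3))    # 360 (buffer nearly empty)
```

Because the function runs again for every slice, quality can climb back up as soon as the other download finishes.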
Choosing where the video is served from is also complicated. When a person plays a video, YouTube tries to deliver the content from a location as close to the user as possible, because this saves bandwidth and makes playback smoother.
This “Content Distribution Network” (CDN) can be understood as a tree: the root is the base, and branches spread out from it toward users. Data centers that host video content are arranged the same way, so when a person plays a particular video, the request is sent to the nearest center, which can pull the video from further up the tree if it does not already hold a copy.
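The tree idea can be sketched with nodes that each point to a parent closer to the root. A viewer is routed to the edge node with the lowest latency; on a miss, the node asks its parent and keeps a copy for the next nearby viewer. This cache-on-miss rule is a common CDN design assumed here for illustration, and the node names and latencies are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CacheNode:
    """One data center in a hypothetical CDN tree."""
    name: str
    parent: Optional["CacheNode"] = None
    videos: set = field(default_factory=set)

    def fetch(self, video_id, path=None):
        """Return the list of nodes visited to find the video.

        Serve locally if possible; otherwise ask the parent (one step
        closer to the root) and keep a copy on the way back down.
        """
        path = (path or []) + [self.name]
        if video_id in self.videos:
            return path
        if self.parent is None:
            raise KeyError(video_id)
        result = self.parent.fetch(video_id, path)
        self.videos.add(video_id)  # cache locally for the next viewer nearby
        return result


root = CacheNode("origin", videos={"abc123"})
region = CacheNode("regional", parent=root)
edges = [CacheNode("edge-A", parent=region), CacheNode("edge-B", parent=region)]
latency_ms = {"edge-A": 12, "edge-B": 48}  # hypothetical measurements for one user

nearest = min(edges, key=lambda node: latency_ms[node.name])
print(nearest.fetch("abc123"))  # ['edge-A', 'regional', 'origin'] (first view: miss)
print(nearest.fetch("abc123"))  # ['edge-A'] (served from the nearby copy)
```

After the first view, a viewer routed to ‘edge-B’ is served from the regional node, because the copy cached there on the way down is shared by every edge below it.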