How Video Streaming and Content Distribution Networks (CDN) Work

According to data from research firms, streaming video through Netflix alone accounts for 15% of total global internet traffic. If you include services like YouTube, Amazon Prime, and Apple TV, the figure is estimated to be even higher. In this post, we will examine how video streaming services are actually implemented and how they are optimized using application-level protocols and servers that function similarly to caches.
Characteristics of Internet Video

To understand streaming video applications, we first need to understand the characteristics of video as a medium. Video is fundamentally a sequence of images, typically displayed at a constant rate of 24 or 30 images per second. An uncompressed digital image consists of pixels, and each pixel is encoded with multiple bits representing luminance and color.
The bit rate of a video is the number of bits required per second to display the video. For example, if a video has 30fps, 24-bit pixel depth, 480x240 resolution, and no compression, it requires 82,944,000 bits per second, or 82.944 Mbps (30x480x240x24).
This shows that video is essentially unusable without compression – an uncompressed one-hour video at 720p resolution and 30fps would require approximately 278GB. For more details on video compression, you can refer to the leandromoreira/digital_video_introduction README (a Korean version is also available).
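To make these figures concrete, here is a minimal sketch that reproduces the arithmetic above (the class and method names are my own, chosen for illustration):

```java
/** Back-of-the-envelope arithmetic for uncompressed video (illustrative names). */
public class UncompressedVideo {

    /** Bits per second for uncompressed video at the given resolution, pixel depth, and frame rate. */
    static long bitsPerSecond(int width, int height, int bitsPerPixel, int fps) {
        return (long) width * height * bitsPerPixel * fps;
    }

    /** Bytes required to store the given duration of uncompressed video. */
    static long bytesForDuration(long bitsPerSecond, long seconds) {
        return bitsPerSecond * seconds / 8;
    }

    public static void main(String[] args) {
        // 480x240 resolution, 24-bit pixels, 30 fps -> 82,944,000 bps = 82.944 Mbps
        System.out.println(bitsPerSecond(480, 240, 24, 30));

        // One uncompressed hour of 720p at 30 fps, in GiB -> about 278.1
        long hdBps = bitsPerSecond(1280, 720, 24, 30);
        System.out.printf("%.1f GiB%n", bytesForDuration(hdBps, 3600) / (1024.0 * 1024 * 1024));
    }
}
```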
The most important characteristic of video is that it can be compressed. Compression trades off video quality against bit rate, and today’s commercial compression algorithms can compress video to essentially any desired bit rate. Naturally, higher bit rates result in better image quality and an overall improved viewing experience for users.
From a networking perspective, the most prominent characteristic of video is its high bit rate. Compressed internet video typically ranges from 100kbps to over 4Mbps for streaming high-definition content, and 4K streaming requires bit rates of 10Mbps or more. This means that high-end video requires enormous amounts of traffic and storage capacity. For example, a single 2Mbps video with a duration of 67 minutes consumes 1GB of storage and traffic.
The most important performance metric for streaming video so far has been average end-to-end throughput. To provide continuous playback, the network must deliver an average throughput to the streaming application that equals or exceeds the transmission rate of the compressed video. Additionally, compression can be used to create multiple quality versions of the same video. For instance, you can create three versions of the same video at 300 kbps, 1 Mbps, and 3 Mbps, and users can decide which version to watch based on their currently available bandwidth.
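The version-selection idea can be sketched in a few lines – given the three hypothetical versions above, pick the highest bit rate that fits the available bandwidth:

```java
/** Picks the highest encoded version that fits within the measured bandwidth (illustrative). */
public class VersionSelector {
    // The three example versions from the text, in bits per second.
    static final long[] VERSIONS_BPS = {300_000, 1_000_000, 3_000_000};

    /** Returns the highest version bit rate not exceeding the available bandwidth,
     *  falling back to the lowest version when even that does not fit. */
    static long pickVersion(long availableBps) {
        long best = VERSIONS_BPS[0];
        for (long v : VERSIONS_BPS) {
            if (v <= availableBps) {
                best = v;
            }
        }
        return best;
    }
}
```

With 2.5 Mbps available, `pickVersion` returns the 1 Mbps version; below 300 kbps it still returns the lowest version, since degraded playback beats no playback.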
HTTP Streaming and Progressive Download

Early HTTP streaming was implemented using the Progressive Download (PD) approach. In this method, video is stored as a regular file with a specific URL on the HTTP server. When a user wants to watch a video, the client establishes a TCP connection with the server and issues an HTTP GET request for that URL. The server then transmits the video file within an HTTP response message as fast as the underlying network protocols and traffic conditions allow.
It is called Progressive Download because the file is downloaded sequentially (progressively) while playback can occur simultaneously. Rather than waiting for the entire file to download, playback can begin once a certain amount of data accumulates in the buffer, which improved the user experience. However, strictly speaking, this is a download rather than streaming in the modern sense.
On the client side, transmitted bytes are stored in an application buffer. When the number of bytes in this buffer exceeds a predetermined threshold, video frames are periodically extracted from the client application buffer, decompressed, and displayed on the user’s screen. Thus, the streaming video application displays earlier parts of the video while receiving and buffering frames for later portions.
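The buffering behavior just described can be modeled roughly as follows – a toy model of my own, not any real player’s implementation:

```java
/** Toy model of progressive-download playback: start once the buffer passes a threshold. */
public class PlaybackBuffer {
    private final long startThresholdBytes;
    private long bufferedBytes = 0;
    private boolean playing = false;

    PlaybackBuffer(long startThresholdBytes) {
        this.startThresholdBytes = startThresholdBytes;
    }

    /** Called as bytes of the HTTP response arrive from the server. */
    void onBytesReceived(long n) {
        bufferedBytes += n;
        // Begin playback as soon as enough data has accumulated,
        // without waiting for the whole file to finish downloading.
        if (!playing && bufferedBytes >= startThresholdBytes) {
            playing = true;
        }
    }

    /** Called as the decoder extracts frames from the application buffer. */
    void onBytesConsumed(long n) {
        bufferedBytes = Math.max(0, bufferedBytes - n);
    }

    boolean isPlaying() { return playing; }
    long bufferedBytes() { return bufferedBytes; }
}
```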
This Progressive Download approach to HTTP streaming was actually deployed in many systems including YouTube in the early-to-mid 2000s, but it has an important limitation: all clients receive the same encoded video regardless of differences in available bandwidth between them. Differences in available bandwidth exist not only between different clients but also over time for the same client. Furthermore, if the network speed is faster than the playback speed, data is downloaded and buffered far ahead of what the user is actually watching. If the user stops watching midway, the already downloaded data is wasted.
The Emergence and Operation of DASH

Due to these limitations, new forms of HTTP-based adaptive streaming technologies were developed. Notable examples include HLS (HTTP Live Streaming) developed by Apple and DASH (Dynamic Adaptive Streaming over HTTP), an MPEG standard. While HLS, which forms the basis of LL-HLS (Low-Latency HLS) that I explained in a previous post, is also widely used, here I will focus on DASH as it is an international standard and more universally adopted. In practice, when developing media players, you often need to support both protocols, but the underlying principles are very similar.
In DASH, video is encoded in multiple versions, each with different bit rates and quality levels. The client dynamically requests different versions of the video in units of chunks that are several seconds long.
When available bandwidth is sufficient, the client requests high bit rate video versions, and when available bandwidth is low, it requests low bit rate video versions. The client uses HTTP GET requests to select different video chunk versions each time. DASH allows clients with different internet connections to choose videos with different encoding rates. A client with a slow 3G connection can receive low-quality video, while a client with a high-speed fiber connection can receive high-quality video.
Additionally, DASH allows clients to adapt to time-varying end-to-end available bandwidth during a session. This characteristic is particularly important for mobile users, as available bandwidth frequently changes depending on the base station conditions during movement.

When using DASH, each video version is stored with a different URL on the HTTP server. The HTTP server has a manifest file that provides the URL for each version by bit rate. The client first requests the manifest file to learn about the various versions available from the server. Then the client selects the desired version of video chunk data each time and requests it by specifying the URL and byte-range in an HTTP GET request message.
While downloading video chunk data, the client uses the measured received bandwidth and a bit rate determination algorithm to decide which version of video chunk data to select next. If the client has a sufficient amount of buffered video and the measured received bandwidth is high, it will naturally select high-quality video chunk data. Conversely, if the client has a small amount of buffered video and the received bandwidth is low, it will select low-quality video chunk data. Therefore, DASH allows clients to freely switch between different quality levels.
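A toy version of such a bit rate determination algorithm might look like this. The 20% headroom and the 10-second buffer threshold are arbitrary values chosen for illustration, not anything mandated by DASH:

```java
/** A toy bit rate determination rule combining measured throughput and buffer level. */
public class RateDecider {
    // An example bitrate ladder, matching the three versions mentioned earlier.
    static final long[] LADDER_BPS = {300_000, 1_000_000, 3_000_000};

    /**
     * Picks the next chunk's bit rate: spend only a safety fraction of the measured
     * throughput, and allow stepping up only when enough video is already buffered.
     */
    static long nextBitrate(long measuredBps, double bufferedSeconds, long currentBps) {
        long budget = (long) (measuredBps * 0.8); // keep 20% headroom for throughput variation
        long candidate = LADDER_BPS[0];
        for (long b : LADDER_BPS) {
            if (b <= budget) {
                candidate = b;
            }
        }
        // Switching up with a shallow buffer risks a stall; hold the current rate instead.
        if (candidate > currentBps && bufferedSeconds < 10) {
            return currentBps;
        }
        return candidate;
    }
}
```

Note the asymmetry: downswitches apply immediately (a stall hurts more than a quality drop), while upswitches are gated on buffer depth.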
The Need for Content Distribution Networks (CDN)
Today, many internet video companies distribute video streams of several Mbps to countless users every day. Companies like YouTube have tens of millions of videos in their library and provide streaming services to hundreds of millions of users daily. Delivering this enormous streaming traffic seamlessly and reliably across the globe is an immense challenge.
The simplest approach for an internet video company to provide streaming services would be to build a single massive data center, store all video content there, and transmit video streams directly to users worldwide from that data center. However, this approach has three critical problems.
First, if a client is geographically far from the data center, the packet path from server to client traverses many different communication links and ISPs, which may be located on different continents. If any of these links has a transmission capacity lower than the video consumption rate, end-to-end throughput decreases and the user experiences frustrating screen freezes. The longer the end-to-end path, the more likely such issues become.
The second problem is that popular videos will be transmitted repeatedly over the same communication links. This not only wastes network bandwidth but also results in the internet video company paying duplicate costs for transmitting the same bytes to the ISPs that provide the connections.
The third problem is that building a single data center creates the risk of the entire service going down due to a single failure. If the data center’s servers or their connection links to the internet fail, no video streaming service is possible.
Two CDN Philosophies
To solve the problem of distributing enormous amounts of video data to users worldwide, virtually all video streaming companies use Content Distribution Networks (CDN). A CDN operates servers distributed across multiple locations and stores copies of video and other web content data on these distributed servers. Users are connected to the CDN server at the location that can provide the best service and user experience.
A CDN can be a private CDN owned by the content provider – for example, YouTube videos are distributed through Google’s CDN. Alternatively, third-party CDNs can serve multiple content providers, with Akamai, Limelight, and Level-3 being examples of third-party CDNs. CDNs generally adopt one of two philosophies regarding server placement.
The Enter Deep philosophy, pioneered by Akamai, deploys server clusters deep inside ISP access networks around the world; Akamai built clusters at thousands of such locations. The goal is to place servers as close to users as possible, reducing the number of links and routers between users and CDN servers and thereby improving the latency and throughput users experience. The downside is that this highly distributed design increases the cost of maintaining and managing the server clusters.
The Bring Home philosophy, adopted by Limelight and other companies, involves building larger server clusters at a smaller number of key locations, effectively bringing ISPs to the “Home.” Instead of connecting to access ISPs, these CDNs typically place their clusters at Internet Exchange Points (IXP). Compared to the Enter Deep approach, this second method reduces cluster maintenance and management costs but results in relatively worse latency and throughput for users.
How CDNs Work
Once server cluster locations are determined, the CDN stores copies of content across these clusters. The CDN does not need to maintain a copy of every video at each cluster, as some videos are rarely watched or are popular only in certain countries. In practice, CDNs use a pull approach rather than a push approach for their clusters. When a user requests a video that is not in the local cluster, the video is fetched from the central server or another cluster, served to the user, and a copy is stored locally at the same time. Like web caches, when a cluster’s storage is full, infrequently used video data is deleted.
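The pull-based caching described above behaves much like an LRU cache. Here is a minimal sketch – the class names and the least-recently-used eviction policy are illustrative assumptions; real CDNs use far more sophisticated policies:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Sketch of an edge cluster's pull cache: fetch on miss, keep a copy,
 *  evict the least-recently-used video when storage fills up. */
public class EdgeCache {
    private final LinkedHashMap<String, byte[]> cache;

    EdgeCache(int capacity) {
        // accessOrder=true makes the map track recency, giving simple LRU eviction.
        this.cache = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                return size() > capacity; // drop the least-recently-used video
            }
        };
    }

    /** Serves a video: from local storage on a hit, otherwise pulled from origin and stored. */
    byte[] serve(String videoId, Function<String, byte[]> fetchFromOrigin) {
        return cache.computeIfAbsent(videoId, fetchFromOrigin);
    }

    boolean hasLocalCopy(String videoId) {
        return cache.containsKey(videoId);
    }
}
```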
When a web browser on a user’s host requests playback of a specific video by specifying a URL, the CDN intercepts the request, selects the most appropriate CDN cluster for that client at that point in time, and redirects the client’s request to a server in that cluster. Most CDNs leverage DNS to intercept user requests and redirect them.

For example, suppose a content provider like Netflix uses a CDN provider like Akamai to distribute videos to customers. On the content provider’s web pages, each video is assigned a URL containing the string ‘video’ and a unique ID. If a video is assigned the URL http://video.example.com/6Y7B23V, the following six-step process occurs.
When the user visits the content provider’s web page and clicks a video link, the user’s host sends a DNS query for video.example.com. The user’s Local DNS Server (LDNS) detects the ‘video’ string in the hostname and forwards the query to the content provider’s authoritative DNS server. The content provider’s authoritative DNS server, in order to redirect the DNS query to the CDN, provides the LDNS with the CDN’s hostname instead of an IP address.
From this point, the DNS query enters the CDN’s private DNS infrastructure. The user’s LDNS sends a second query for the CDN host, which is resolved by the CDN’s DNS to the IP address of a CDN content server and returned to the LDNS. At this point, the server that will deliver content to the client is determined. The LDNS provides the user’s host with the IP address of the CDN server that will serve the content, and once the client obtains the CDN server’s IP address, it establishes a direct TCP connection to that IP address and sends an HTTP GET request for the video. If DASH is being used, the server first sends the client a manifest file containing a list of URLs for different versions of the video, and the client can dynamically select different versions of video chunk data.
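The two-stage resolution can be modeled as two table lookups. The CDN hostname and the IP address below are made up for illustration:

```java
import java.util.Map;

/** Toy model of two-stage CDN DNS redirection: the provider's authoritative DNS answers
 *  with a CDN hostname, and the CDN's own DNS then maps that name to a content server.
 *  All hostnames and the IP address here are made up for illustration. */
public class DnsRedirection {
    // The content provider's authoritative DNS hands the query off to the CDN.
    static final Map<String, String> PROVIDER_AUTHORITATIVE =
            Map.of("video.example.com", "a1105.cdn-example.net");

    // The CDN's private DNS picks a content server for this client.
    static final Map<String, String> CDN_DNS =
            Map.of("a1105.cdn-example.net", "203.0.113.10");

    /** Follows the chain the way the client's LDNS would. */
    static String resolve(String hostname) {
        String cdnHostname = PROVIDER_AUTHORITATIVE.get(hostname);
        return CDN_DNS.get(cdnHostname);
    }
}
```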
Cluster Selection Policy
One of the keys to CDN deployment is the cluster selection policy – the mechanism for dynamically directing clients to a specific server cluster or CDN data center. During the DNS resolution process, the CDN learns the IP address of the client’s LDNS server and needs to select the best cluster based on that IP address.
One simple cluster policy assigns the geographically closest cluster to the client. Using commercial geolocation databases like Quova or MaxMind, the LDNS IP address can be mapped geographically. When a DNS query arrives from a specific LDNS, the CDN selects the cluster closest to that LDNS. This simple method works quite well for most clients.
However, it does not work well for some clients because the geographically closest cluster may not be the closest in terms of network path length (number of hops). Additionally, an inherent problem with DNS-based methods is that some users may be configured to use a DNS server that is quite far away as their LDNS. In this case, the cluster selected based on the LDNS IP address would be far from the user’s host. Moreover, this method ignores internet path delays and changes in available bandwidth, always assigning the same cluster to a client.
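The simple geographic policy can be sketched as a nearest-neighbor lookup – the coordinates and cluster names below are made up, and a real CDN would use a geolocation database and a proper great-circle distance rather than squared lat/lon deltas:

```java
import java.util.List;

/** Sketch of geographic cluster selection: map the LDNS position to the nearest cluster. */
public class ClusterSelector {
    record Cluster(String name, double lat, double lon) {}

    static Cluster nearest(double ldnsLat, double ldnsLon, List<Cluster> clusters) {
        Cluster best = null;
        double bestDist = Double.MAX_VALUE;
        for (Cluster c : clusters) {
            // Squared lat/lon distance stands in for great-circle distance in this toy model.
            double dLat = c.lat() - ldnsLat;
            double dLon = c.lon() - ldnsLon;
            double dist = dLat * dLat + dLon * dLon;
            if (dist < bestDist) {
                bestDist = dist;
                best = c;
            }
        }
        return best;
    }
}
```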
To select the best cluster based on current network traffic conditions, CDNs can also periodically perform real-time measurements of delay and loss between their clusters and clients. For example, a CDN can have each cluster periodically send probe messages, such as pings or DNS queries, to LDNS servers. The drawback is that many LDNS servers are configured not to respond to such probes.
Netflix and YouTube

Let us conclude our discussion of streaming stored video by examining two successful case studies. Netflix and YouTube take quite different approaches but are both built on many of the principles discussed earlier.
Netflix Architecture
Netflix is one of the leading providers of online movies and TV series worldwide. Netflix video distribution involves two major components: Amazon Cloud and its own CDN infrastructure. (Note: Netflix’s architecture has continued to evolve significantly since the description below, but the fundamental principles remain instructive.)
Netflix has a website that handles various functions including user registration and login, payments, movie genre browsing, and movie recommendation services. The website and backend databases all run on Amazon Cloud within Amazon’s servers. Amazon Cloud also handles important functions such as content ingestion, content processing, and uploading versions to the CDN.
Netflix receives studio master versions of movies and uploads them to hosts in the Amazon Cloud system. Machines in the Amazon Cloud system then generate multiple format versions of each movie suitable for the various player device specifications of customers, including desktop computers, smartphones, and game consoles connected to TVs. Multiple versions at different bit rates are also generated for each format to support HTTP adaptive streaming using DASH. Once the various versions of a movie are generated, hosts in the Amazon Cloud system can upload these versions to the CDN.
When Netflix first started its video streaming service in 2007, three CDN companies distributed the video content. Netflix later built its own private CDN capable of streaming all of its videos. To build this CDN, Netflix installed server racks at IXPs and within residential ISPs themselves. As of 2023, Netflix Open Connect operates over 17,000 servers across 6,000+ locations worldwide.
Netflix provides server racks free of charge to potential ISP partners for installation in their networks. Each rack server has 10 Gbps Ethernet ports and over 100 terabytes of storage. The number of servers in a rack varies: IXP installations contain dozens of servers and hold the complete streaming video library, including multiple DASH-compatible versions.
An interesting point is that Netflix does not use pull caching but instead distributes videos by pushing them to CDN servers during off-peak hours. For locations that cannot hold the entire library, only the most-watched videos are pushed daily.
With knowledge of the basic components of Netflix’s architecture, let us look more closely at the interactions between the various servers and clients involved in movie delivery. The web pages for browsing Netflix’s video library are served from Amazon Cloud servers. When a user selects a movie to play, Netflix software running on Amazon Cloud first determines which CDN servers have copies of that movie.
Among the servers that have the movie, the software determines the “best” server for the client request. If the client is using a local ISP that has a CDN server rack installed at that ISP, and this rack has a copy of the requested movie, this rack server is typically selected. Otherwise, a nearby IXP server is selected.
Once Netflix determines the CDN server to deliver content, it sends the client a manifest file with URLs for different versions of the requested movie along with the specific server’s IP address. The client and that CDN server then interact directly using a proprietary version of DASH. The client can request different versions of video chunk data using the byte-range header in HTTP GET request messages. Netflix uses video chunk data of approximately 4 seconds. While downloading video chunk data, the client measures received throughput and uses a rate determination algorithm to decide the quality of the next video chunk data to request.
Netflix uses multiple technologies including adaptive streaming and CDN. However, because Netflix uses its own CDN dedicated exclusively to video distribution, it can simplify and fine-tune the CDN design. In particular, Netflix does not need to use DNS redirection to connect specific clients to CDN servers. Instead, Netflix software running on Amazon Cloud tells the client which specific CDN server to use. Additionally, Netflix CDN uses push caching rather than pull caching. Content is pushed to servers at scheduled times during off-peak hours rather than being dynamically fetched during cache misses.
YouTube Architecture
Hundreds of hours of video are uploaded to YouTube every minute, and billions of videos are viewed daily. YouTube launched its service in April 2005 and was acquired by Google in November 2006. Although Google/YouTube’s designs and protocols are proprietary, several independent measurement efforts have provided a basic understanding of how YouTube operates.
Like Netflix, YouTube actively uses CDNs for video distribution. Similar to Netflix, Google uses its own private CDN to distribute YouTube videos and has installed server clusters at hundreds of IXP and ISP locations. From these locations, as well as from massive data centers, videos are distributed directly.
Google also uses DNS to connect users to specific server clusters. In most cases, Google’s cluster selection policy connects the client to the cluster with the lowest RTT between the client and the cluster. However, sometimes for balanced workload distribution across clusters, clients may be directed to a more distant cluster via DNS.
YouTube employs HTTP streaming. YouTube creates and serves multiple versions with different bit rates and quality levels for its videos. In 2011, YouTube had users manually select their version rather than using adaptive streaming like DASH. YouTube has since adopted DASH-based adaptive streaming, automatically adjusting video quality based on network conditions. To reduce wasted bandwidth and server resources due to seeking and early termination, YouTube uses HTTP byte-range headers to limit additional data flow beyond a targeted amount of prefetched data.
Millions of videos are uploaded to YouTube every day. YouTube uses HTTP streaming not only for video transmission from server to client but also for video upload from user to server. YouTube converts each uploaded video to its own format and creates multiple versions. This entire process takes place within Google data centers.
Conclusion
If you are an Android developer (or iOS developer), you have likely had experience displaying error screens and adding reconnection logic in response to network conditions, especially in weak signal situations. Since mobile devices are inherently portable client devices, they are heavily dependent on network conditions. Large-scale streaming services like YouTube and Netflix are also expected to implement custom ABR (Adaptive Bitrate) algorithms that account for these mobile characteristics. In fact, examining ExoPlayer’s AdaptiveTrackSelection.java implementation, you can see it employs an ABR algorithm that comprehensively considers buffer state, network stability, whether the content is live, multiple tracks, TTFB, and more. In particular, for live streaming, there is logic that limits quality improvements when too close to the live edge (the current broadcast point).
```java
private long minDurationForQualityIncreaseUs(long availableDurationUs, long chunkDurationUs) {
  if (availableDurationUs == C.TIME_UNSET) {
    // We are not in a live stream. Use the configured value.
    return minDurationForQualityIncreaseUs;
  }
  if (chunkDurationUs != C.TIME_UNSET) {
    // We are currently selecting a new live chunk. Even under perfect conditions, the buffered
    // duration can't include the last chunk duration yet because we are still selecting a track
    // for this or a previous chunk. Hence, we subtract one chunk duration from the total
    // available live duration to ensure we only compare the buffered duration against what is
    // actually achievable.
    availableDurationUs -= chunkDurationUs;
  }
  long adjustedMinDurationForQualityIncreaseUs =
      (long) (availableDurationUs * bufferedFractionToLiveEdgeForQualityIncrease);
  return min(adjustedMinDurationForQualityIncreaseUs, minDurationForQualityIncreaseUs);
}
```
For example, suppose you are watching a soccer match in real time. The screen the viewer is currently watching is at 100 seconds after kickoff, while the actual current point being broadcast (live edge) is at 108 seconds. The viewer is watching approximately 8 seconds behind the live broadcast. This 8 seconds is the maximum buffer headroom available, because footage beyond 108 seconds has not yet been transmitted from the broadcaster and is physically impossible to receive.
The key point here is that the player is about to select a new chunk of 2 seconds in length. Since this 2-second chunk has not yet been downloaded, the actual buffer headroom is 8 seconds minus 2 seconds, or 6 seconds. ExoPlayer requires that at least 75% of this 6 seconds – that is, 4.5 seconds – of buffer is secured before increasing quality. (In other words, the closer you are to the live edge, the less buffer headroom there is, making quality increases risky. This appears to be a default margin policy aimed at preventing buffer underrun by securing 75% of the distance to the live edge as buffer.)
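The worked example above can be reproduced with the same arithmetic as the ExoPlayer method, using plain sentinel-free arguments in place of ExoPlayer’s C.TIME_UNSET. The 10-second configured minimum and the 0.75 fraction are my reading of ExoPlayer’s defaults; treat them as assumptions:

```java
/** Reworks the soccer example with the same logic as minDurationForQualityIncreaseUs.
 *  All durations are in microseconds, as in ExoPlayer. */
public class LiveEdgeExample {
    static final long CONFIGURED_MIN_US = 10_000_000; // assumed default quality-increase minimum

    static long minDurationForQualityIncreaseUs(
            long availableDurationUs, long chunkDurationUs, double bufferedFractionToLiveEdge) {
        if (chunkDurationUs > 0) {
            // The chunk being selected is not buffered yet, so it cannot count as headroom.
            availableDurationUs -= chunkDurationUs;
        }
        long adjusted = (long) (availableDurationUs * bufferedFractionToLiveEdge);
        return Math.min(adjusted, CONFIGURED_MIN_US);
    }

    public static void main(String[] args) {
        // 8 s behind the live edge, about to select a 2 s chunk, fraction 0.75:
        // (8 - 2) * 0.75 = 4.5 s of buffer required before a quality increase.
        System.out.println(minDurationForQualityIncreaseUs(8_000_000, 2_000_000, 0.75)); // 4500000
    }
}
```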
Of course, well-known streaming services do not use ExoPlayer’s default AdaptiveTrackSelection.java implementation but instead implement their own ABR algorithms tailored to their service needs. However, the fundamental goal of all ABR algorithms is the same: when network quality suddenly drops, immediately switch to lower quality to guarantee uninterrupted playback, and when conditions improve, recover to higher quality. This is precisely the core problem that protocols like DASH and HLS were designed to solve.
I hope this post has been helpful for developers who are developing or customizing media players. Understanding how CDNs work and why adaptive streaming with DASH and HLS is necessary will enable you to develop streaming applications that provide a better user experience.
References
- Computer Networking: A Top-Down Approach, 8th Edition, Section 2.6: Video Streaming and Content Distribution Networks
- Adaptive Video Streaming over Information-Centric Networking (ICN)
- Models for HTTP-Adaptive-Streaming-Aware Content Distribution Network Interconnection (CDNI)
- Netflix Open Connect Documentation
#network #cdn #streaming #video #dash