A Comprehensive Look at LL-HLS

06 Jul, 2025

Why Do We Need LL-HLS?

Have you ever had experiences like these while watching a live stream?

Watching an LCK match live on YouTube and seeing “GG” flood the chat before the Nexus explosion even appears on your screen
Or watching the World Cup while your friend on cable TV texts you “GOAL!!!” before you even see it

As Roger Pantos (Apple) presented at WWDC2019, “when watching sports, you often hear the goal through your apartment wall before seeing it on your Apple TV” – this is exactly the problem.

This 20-30 second delay is the main culprit that significantly diminishes the immersion of live streaming.

That is why LL-HLS was born.

Low-Latency HLS (LL-HLS) is a technology that reduces the latency of traditional HLS to 1-2 seconds, enabling streaming that is truly close to “real-time.” It retains all the advantages of the stable HLS protocol that has been in use since 2009.

As announced at WWDC2020, LL-HLS has exited beta and is officially supported on iOS 14, tvOS 14, and macOS. It supports all HLS features including bitrate switching, FairPlay Streaming, and fMP4 CMAF, and can be used in native applications without additional entitlements.

But how did they manage to reduce 20-30 seconds of latency down to 1-2 seconds?

When the Apple development team designed LL-HLS, they established these principles:

HTTP is still the best: The optimal way to deliver media simultaneously to hundreds of thousands of viewers
Cooperation with CDNs: Leverage CDNs’ inherent characteristics (HTTP proxy caching) rather than working against them
Efficient bitrate switching: Since the buffer near the live edge is very small, the efficiency of the switching mechanism is critical

After careful consideration, Apple announced the 5 pillars of LL-HLS:

Reducing publishing latency: Splitting video into smaller pieces and sending them immediately
Optimizing segment discovery: Finding new video segments faster
Eliminating round trips: Removing unnecessary request-response cycles
Optimizing playlist delivery: Delivering playlist files more efficiently
Accelerating bitrate switching: Making quality changes faster

Let us examine each of these in detail.

Traditional HLS Architecture
+---------+    +---------+    +---------+    +---------+
| Encode  |--->|Complete |--->| Publish |--->| Player  |
| 6s GOP  |    | Segment |    | to CDN  |    | Buffer  |
+---------+    +---------+    +---------+    +---------+
    6s             0s            0.5s          18-30s
                        Total Latency: 24-36 seconds

LL-HLS Architecture
+---------+    +---------+    +---------+    +---------+
| Encode  |--->| Partial |--->|Immediate|--->|Optimized|
| 1s GOP  |    |Segments |    | Publish |    | Buffer  |
+---------+    +---------+    +---------+    +---------+
    1s           0.2s          0.1s           1-2s
                        Total Latency: 2-5 seconds

The 5 Pillars of LL-HLS

1. Partial Segments (Reducing Publishing Latency)

Problem: When encoding a 6-second segment in real time, it takes 6 seconds before there is content available to upload to the CDN.

Solution: Allow small portions to be published before the main segment is ready.

Partial Segments are subsets of regular segments, containing a portion of the media within their parent segment.

HLS: Wait 6 seconds -> File complete -> Start transmission
LL-HLS: Every 0.2 seconds -> Immediately transmit small chunk

GOP (Group of Pictures): Video encoding level
- Keyframe (I-frame) interval shortened to 1-2 seconds
- Minimizes encoding latency
CMAF (Common Media Application Format): Container level
- fMP4-based segment format
- Divisible into chunks
Partial Segments: Delivery level
- 200-500ms delivery units
- Created using CMAF chunks

Note: Partial segments are primarily useful near the live edge. Once away from the live edge, they are removed from the playlist to keep it concise.
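
To make the part arithmetic concrete, here is a minimal Python sketch of how a packager might cut one segment into parts. The function name split_into_parts and the choice of integer microseconds are assumptions made for illustration; they do not come from any Apple tool.

```python
def split_into_parts(segment_us: int, part_target_us: int) -> list[int]:
    """Return part durations (in microseconds) that cover one segment."""
    if segment_us <= 0 or part_target_us <= 0:
        raise ValueError("durations must be positive")
    full_parts, remainder_us = divmod(segment_us, part_target_us)
    parts = [part_target_us] * full_parts
    if remainder_us:
        parts.append(remainder_us)  # only the final part may be shorter
    return parts


# A 4.00008 s segment with a 0.33334 s part target yields 12 equal parts.
parts = split_into_parts(4_000_080, 333_340)
print(len(parts), "parts; first media is publishable after", parts[0] / 1e6, "s")
```

With parts, the first media becomes publishable after one part duration rather than one full segment duration, which is where the publishing-latency saving comes from.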

2. Blocking Playlist Reload (Optimizing Segment Discovery)

Problem: Due to the polling mechanism in traditional HLS, it could take up to 6 seconds for clients to discover new segments.

Solution: Allow clients to request the next playlist update before it is actually ready.

HLS

Player: "Any new video?"
Server: "Not yet" (304 response)
Player: (waits a few seconds...) "How about now?"
Server: "Still nothing" (304 response)
Player: (waits again...)

LL-HLS

Player: "Send me segment 273 as soon as it's ready!" (pre-order)
Server: (waits until segment 273 is ready...)
Server: "Here it is!" (immediate delivery with 200 response)

This way, new video segments can be received as soon as they are produced.
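
The "pre-order" behavior can be sketched on the server side with a condition variable. This is only an in-memory illustration under assumed names (LivePlaylist, publish, wait_for); a real origin would hold the HTTP response open instead of returning a tuple.

```python
import threading


class LivePlaylist:
    """Tracks the newest published (media sequence number, part index)."""

    def __init__(self, msn: int, part: int) -> None:
        self._cond = threading.Condition()
        self._edge = (msn, part)

    def publish(self, msn: int, part: int) -> None:
        """Called by the (hypothetical) packager when a new part is written."""
        with self._cond:
            self._edge = (msn, part)
            self._cond.notify_all()  # wake every held request

    def wait_for(self, msn: int, part: int, timeout_s: float):
        """Hold the request until (msn, part) exists; return None on timeout."""
        with self._cond:
            ready = self._cond.wait_for(
                lambda: self._edge >= (msn, part), timeout=timeout_s
            )
            return self._edge if ready else None


playlist = LivePlaylist(msn=272, part=2)
# Simulate the packager finishing part 0 of segment 273 shortly afterwards.
threading.Timer(0.05, playlist.publish, args=(273, 0)).start()
print(playlist.wait_for(273, 0, timeout_s=2.0))
```

The request is answered the moment the part is published, so discovery delay no longer depends on a polling interval.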

Request Flow Comparison

3. Eliminating Round Trips via Blocking Preload Hints

Problem: An additional round trip was needed to request segments after receiving the playlist.

Here is an example:

Player: “Give me the video list!”
Server: “Here’s the list!” (delivers playlist)
Player: “Now give me segment 273!”
Server: “Here’s the video!” (delivers video file)

This required a total of two request-response cycles, which contributed to latency.

Initial solution (2019): Use HTTP/2 Push to simultaneously deliver segments along with the playlist.

Improved solution (2020): Blocking Preload Hints

As announced at WWDC2020, HTTP/2 Push was replaced with Blocking Preload Hints. This was because the Push approach was incompatible with many content delivery methods, particularly ad-supported content delivery.

How Blocking Preload Hints Work:

Client: “Send me the next part as soon as it’s ready!” (pre-order)
Server: (holds the request until the part is ready…)
Server: “Here it is!” (immediate delivery with 200 response)

This is similar to Blocking Playlist Reload, but requests segment parts instead of playlists.

Advantages at the CDN Level:

Blocking Preload Hints actually perform better than HTTP/2 Push at the CDN level. When clients initiate requests, CDN cache filling is triggered automatically without additional round trips. It also does not require CDN support for Push, making implementation simpler (HTTP/2 support is still needed).

Connection Flow with Blocking Preload Hints
+--------+                           +--------+
| Client |                           | Server |
+---+----+                           +----+---+
    | GET playlist                        |
    |------------------------------------>|
    |<-----------playlist with------------|
    |         preload hint                |
    |                                     |
    | GET hinted resource (blocks)        |
    |------------------------------------>|
    |         (server produces media)     |
    |<-----------200 OK-------------------|
    |         (immediate playback)        |
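
On the client side, the first step of this flow is finding the hint in the playlist and turning it into a URL to request. The sketch below assumes a hypothetical helper named preload_hints and a simplified attribute parser; it is not how any particular player implements it.

```python
import re
from urllib.parse import urljoin

# Simplified attribute-list matcher: KEY=value or KEY="quoted value".
ATTR = re.compile(r'([A-Z0-9-]+)=("[^"]*"|[^,]*)')


def preload_hints(playlist_text: str, playlist_url: str) -> list[dict]:
    """Return each EXT-X-PRELOAD-HINT as a dict with an absolute URL."""
    hints = []
    for line in playlist_text.splitlines():
        if not line.startswith("#EXT-X-PRELOAD-HINT:"):
            continue
        attrs = {k: v.strip('"') for k, v in ATTR.findall(line.split(":", 1)[1])}
        hints.append(
            {"type": attrs["TYPE"], "url": urljoin(playlist_url, attrs["URI"])}
        )
    return hints


sample = (
    "#EXTM3U\n"
    '#EXT-X-PART:DURATION=0.33334,URI="filePart273.2.mp4"\n'
    '#EXT-X-PRELOAD-HINT:TYPE=PART,URI="filePart273.3.mp4"\n'
)
# example.com is a placeholder host, not a real endpoint.
print(preload_hints(sample, "https://example.com/live/2M/playlist.m3u8"))
```

The client would then issue a GET for each returned URL right away and let the server hold it until the media exists.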

4. Delta Updates (Optimizing Playlist Delivery)

Problem: Transmitting a playlist containing 3-5 hours' worth of segments 3-4 times per second generates significant overhead, even with gzip compression.

Solution: Leverage the playlist information the client already has and transmit only the changed portions.

The first playlist request returns the full playlist, but subsequent requests can receive delta updates containing only the changes near the live edge.

Playlist delta updates using EXT-X-SKIP save 60-80% of bandwidth and can often fit within a single network packet.

For providers using many date-range tags in long DVR windows, a method was added to include date-range tags in Playlist Delta Updates. This ensures that only the most recent tags are included in updates.

Full Playlist (100KB)               Delta Update (5KB)
#EXTM3U                           #EXTM3U
#EXT-X-VERSION:3                  #EXT-X-VERSION:9
#EXT-X-MEDIA-SEQUENCE:100         #EXT-X-MEDIA-SEQUENCE:264
#EXTINF:6.0,                      #EXT-X-SKIP:SKIPPED-SEGMENTS=3
segment100.ts                     #EXTINF:6.0,
#EXTINF:6.0,                      segment267.mp4
segment101.ts                     #EXTINF:6.0,
... (97 more segments)            segment268.mp4
#EXTINF:6.0,                      #EXTINF:6.0,
segment199.ts                     segment269.mp4
                                  #EXT-X-PART:DURATION=0.33334...
                                  #EXT-X-DATERANGE:ID="recent-ad",START-DATE="2020-01-01T00:00:00Z"
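
How a client might rebuild its view of the playlist from a delta update can be sketched as follows. The function apply_delta and its dictionary representation (media sequence number mapped to URI) are assumptions for illustration; real players also carry over durations, parts, and other tags.

```python
def apply_delta(previous: dict[int, str], media_sequence: int,
                skipped: int, listed: list[str]) -> dict[int, str]:
    """Merge a delta update into the segments the client already knows.

    previous       -- segments from the last playlist, keyed by sequence number
    media_sequence -- EXT-X-MEDIA-SEQUENCE of the delta update
    skipped        -- SKIPPED-SEGMENTS value from EXT-X-SKIP
    listed         -- segment URIs that the delta update spells out
    """
    merged = {}
    for msn in range(media_sequence, media_sequence + skipped):
        if msn not in previous:
            # The client cannot fill the hole; it should ask for a full playlist.
            raise LookupError(f"segment {msn} unknown; request a full playlist")
        merged[msn] = previous[msn]
    for offset, uri in enumerate(listed):
        merged[media_sequence + skipped + offset] = uri
    return merged


known = {msn: f"segment{msn}.mp4" for msn in range(262, 269)}
update = apply_delta(known, 264, 3,
                     ["segment267.mp4", "segment268.mp4", "segment269.mp4"])
print(sorted(update))
```

Segments 264 through 266 come from the client's memory, while 267 through 269 come from the update itself, matching the delta column in the example above.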

5. Rendition Reports (Accelerating Bitrate Switching)

Problem: Additional requests are needed to determine the latest state of other renditions during bitrate switching.

Solution: Include the latest information about other bitrate tiers within the current playlist update.

When a client loads the latest version of a specific bitrate playlist, the update can include information about other renditions that the client may switch to within the next 1-2 seconds.

Core LL-HLS Protocol Tags

EXT-X-SERVER-CONTROL

Indicates the server’s support for blocking reload and delta updates.

#EXT-X-SERVER-CONTROL:CAN-BLOCK-RELOAD=YES,PART-HOLD-BACK=1.0,CAN-SKIP-UNTIL=24.0

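CAN-BLOCK-RELOAD: Whether the server supports blocking playlist reload requests
PART-HOLD-BACK: Minimum distance (in seconds) from the end of the playlist at which a client should play in low-latency mode
CAN-SKIP-UNTIL: The skip boundary in seconds; a delta update may omit segments older than this distance from the end of the playlist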
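
As a sanity check on these attributes, the sketch below parses the tag value and applies two constraints as I read them in the HLS specification: PART-HOLD-BACK must be at least twice the part target, and CAN-SKIP-UNTIL must be at least six times the target duration. The helper name check_server_control is hypothetical, and the attribute parser is deliberately simplified.

```python
import re

# Simplified attribute-list matcher: KEY=value or KEY="quoted value".
ATTR = re.compile(r'([A-Z0-9-]+)=("[^"]*"|[^,]*)')


def check_server_control(value: str, target_duration: float,
                         part_target: float) -> list[str]:
    """Return a list of problems found in an EXT-X-SERVER-CONTROL value."""
    attrs = {k: v.strip('"') for k, v in ATTR.findall(value)}
    problems = []
    if attrs.get("CAN-BLOCK-RELOAD") != "YES":
        problems.append("blocking playlist reload is not advertised")
    hold_back = attrs.get("PART-HOLD-BACK")
    if hold_back is None:
        problems.append("PART-HOLD-BACK is missing")
    elif float(hold_back) < 2 * part_target:
        problems.append("PART-HOLD-BACK is below twice the part target")
    skip_until = attrs.get("CAN-SKIP-UNTIL")
    if skip_until is not None and float(skip_until) < 6 * target_duration:
        problems.append("CAN-SKIP-UNTIL is below six target durations")
    return problems


tag_value = "CAN-BLOCK-RELOAD=YES,PART-HOLD-BACK=1.0,CAN-SKIP-UNTIL=24.0"
print(check_server_control(tag_value, target_duration=4, part_target=0.33334))
```

The example tag passes both checks for a 4-second target duration and a 0.33334-second part target.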

EXT-X-PART-INF & EXT-X-PART

Provides information about partial segments.

#EXT-X-PART-INF:PART-TARGET=0.33334

#EXT-X-PART:DURATION=0.33334,URI="filePart271.0.mp4"
#EXT-X-PART:DURATION=0.33334,URI="filePart271.1.mp4",INDEPENDENT=YES
#EXT-X-PART:DURATION=0.33334,URI="filePart271.2.mp4"

The INDEPENDENT attribute indicates that the part contains a keyframe and can be used to start decoding.

EXT-X-PRELOAD-HINT

Tells the client which resource will be published next, so the client can request it before it exists.

#EXT-X-PRELOAD-HINT:TYPE=PART,URI="filePart273.3.mp4"
#EXT-X-PRELOAD-HINT:TYPE=MAP,URI="init.mp4"

Clients can issue GET requests in advance, eliminating unnecessary round trips. This is the key mechanism that replaced HTTP/2 Push in the 2020 update.

EXT-X-RENDITION-REPORT

Provides rendition switching information.

#EXT-X-RENDITION-REPORT:URI="../1M/waitForMSN.php",LAST-MSN=273,LAST-PART=2
#EXT-X-RENDITION-REPORT:URI="../4M/waitForMSN.php",LAST-MSN=273,LAST-PART=1

Originally, clients could request specific rendition reports, but this caused a combinatorial explosion of different request URLs referencing the same playlist update, which reduced caching efficiency. Therefore, the report delivery directive was removed, and instead all Rendition Reports are included in every playlist update.

This enables efficient ABR with minimal round trips and reports the current state of other renditions.
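
One way a client could use these reports when switching renditions is to turn each report directly into a blocking playlist request for the other rendition. The sketch below assumes a hypothetical helper named switch_requests and the strategy of asking for the part right after the newest reported one; other strategies are possible.

```python
import re
from urllib.parse import urljoin

# Simplified attribute-list matcher: KEY=value or KEY="quoted value".
ATTR = re.compile(r'([A-Z0-9-]+)=("[^"]*"|[^,]*)')


def switch_requests(playlist_text: str, playlist_url: str) -> list[str]:
    """Build one blocking playlist URL per EXT-X-RENDITION-REPORT tag."""
    urls = []
    for line in playlist_text.splitlines():
        if not line.startswith("#EXT-X-RENDITION-REPORT:"):
            continue
        attrs = {k: v.strip('"') for k, v in ATTR.findall(line.split(":", 1)[1])}
        target = urljoin(playlist_url, attrs["URI"])
        last_msn = int(attrs["LAST-MSN"])
        if "LAST-PART" in attrs:
            # Ask for the part after the newest one the report mentions.
            query = f"_HLS_msn={last_msn}&_HLS_part={int(attrs['LAST-PART']) + 1}"
        else:
            # No part information: ask for the next whole segment instead.
            query = f"_HLS_msn={last_msn + 1}"
        urls.append(f"{target}?{query}")
    return urls


report = '#EXT-X-RENDITION-REPORT:URI="../1M/waitForMSN.php",LAST-MSN=273,LAST-PART=2'
# example.com is a placeholder host, not a real endpoint.
print(switch_requests(report, "https://example.com/live/2M/playlist.m3u8"))
```

Because the report already tells the client where the other rendition's live edge is, no extra non-blocking playlist fetch is needed before switching.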

Gap Signaling (Added in 2020)

Gap Signaling was added in 2020 to better handle encoding interruptions in LL-HLS streams.

#EXT-X-PART:DURATION=0.33334,URI="filePart271.0.mp4"
#EXT-X-PART:DURATION=0.33334,URI="filePart271.1.mp4",GAP=YES
#EXT-X-PART:DURATION=0.33334,URI="filePart271.2.mp4"

#EXT-X-RENDITION-REPORT:URI="../1M/waitForMSN.php",LAST-MSN=273,LAST-PART=2,GAP=YES

Through the GAP attribute, clients can know that there is no media data for a specific part or rendition, allowing them to respond more appropriately to live stream interruptions.

HLS Origin API and Delivery Directives

To enable all LL-HLS features, clients must inform the server of their intent to use new capabilities (delta updates, blocking playlist reload, etc.). The HLS Origin API is used for this purpose.

Important: For the first time in HLS, query parameters have been included in the specification. All query parameters starting with _HLS are reserved for protocol use.

_HLS_msn=<M>: Request a playlist that contains media sequence number M or later
_HLS_part=<N>: Request a playlist that contains part N of segment M (requires _HLS_msn)
_HLS_skip=YES|v2: Request a delta update with an EXT-X-SKIP tag (v2 also skips older EXT-X-DATERANGE tags)

Request Example

GET playlist.m3u8?_HLS_msn=1803&_HLS_part=1
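
A request like this can be assembled with the standard library. The helper name with_directives and the example.com host are assumptions for illustration; the sketch keeps any existing non-HLS query parameters and appends the directives in alphabetical order so that identical requests produce identical cache keys.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_directives(playlist_url: str, msn=None, part=None, skip=None) -> str:
    """Append _HLS_msn, _HLS_part, and _HLS_skip to a playlist URL."""
    if part is not None and msn is None:
        raise ValueError("_HLS_part requires _HLS_msn")
    pieces = urlsplit(playlist_url)
    # Drop stale directives, keep everything else (for example auth tokens).
    query = [(k, v) for k, v in parse_qsl(pieces.query, keep_blank_values=True)
             if not k.startswith("_HLS_")]
    if msn is not None:
        query.append(("_HLS_msn", str(msn)))
    if part is not None:
        query.append(("_HLS_part", str(part)))
    if skip is not None:
        query.append(("_HLS_skip", skip))
    return urlunsplit(pieces._replace(query=urlencode(query)))


print(with_directives("https://example.com/live/playlist.m3u8", msn=1803, part=1))
```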

Latency Analysis and Measurement

Glass-to-Glass Latency Components

Component	HLS	LL-HLS
Encoding	5s	1-2s
Segmentation	6-10s	0.2-0.5s
CDN Propagation	Variable	Variable
Player Buffer	18-30s	1-3s
Decoding/Rendering	<1s	<1s
Total	25-40s	2-5s

Measurement Methodology

Clapperboard Method - The most accurate glass-to-glass measurement

+-------------+     +-------------+
|   Source    |     |   Player    |
| Timestamp:  |     | Timestamp:  |
| 13:50:29    |---->| 13:50:34    |
+-------------+     +-------------+
        Glass-to-Glass Latency: 5 seconds

PDT Tag Method - Synchronization using EXT-X-PROGRAM-DATE-TIME

#EXT-X-PROGRAM-DATE-TIME:2025-07-04T12:00:00.000Z
#EXTINF:6.0,
segment1000.ts
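
The PDT-based calculation can be sketched in a few lines. The function name live_latency_s is hypothetical, and the result is only meaningful if the player's clock and the encoder's clock are synchronized, which this sketch simply assumes.

```python
from datetime import datetime, timedelta, timezone


def live_latency_s(program_date_time: str, position_in_segment_s: float,
                   now: datetime) -> float:
    """Estimate latency from a segment's EXT-X-PROGRAM-DATE-TIME value.

    program_date_time     -- the tag value, e.g. "2025-07-04T12:00:00.000Z"
    position_in_segment_s -- how far into that segment the playhead is
    now                   -- current wall-clock time (timezone-aware)
    """
    # Older Python versions do not accept a trailing "Z", so normalize it.
    segment_start = datetime.fromisoformat(program_date_time.replace("Z", "+00:00"))
    playhead_wall_time = segment_start + timedelta(seconds=position_in_segment_s)
    return (now - playhead_wall_time).total_seconds()


now = datetime(2025, 7, 4, 12, 0, 6, tzinfo=timezone.utc)
print(live_latency_s("2025-07-04T12:00:00.000Z", 2.5, now))
```

Unlike the clapperboard method, this measures latency from the moment the encoder stamped the segment, so capture and display delays are not included.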

Network Efficiency and CDN Behavior

HTTP Request Pattern Analysis

LL-HLS generates a very different network traffic pattern due to partial segments and blocking requests.

Metric	HLS	LL-HLS
Playlist requests per minute	10	120-180
Segment size	2-10 MB	200-500 KB
Request pattern	Sequential	Concurrent
Connection type	Short-lived	Persistent
HTTP version	HTTP/1.1+	HTTP/2
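
The playlist request rates in the table follow from simple arithmetic: a client reloads roughly once per playlist update. A minimal sketch, with a hypothetical function name:

```python
def playlist_requests_per_minute(update_interval_s: float) -> int:
    """One playlist request per update interval, rounded to a whole number."""
    return round(60 / update_interval_s)


# Traditional HLS updates once per 6 s segment; LL-HLS updates once per part.
for label, interval in [("HLS, 6 s segments", 6.0),
                        ("LL-HLS, 0.5 s parts", 0.5),
                        ("LL-HLS, 0.33334 s parts", 0.33334)]:
    print(label, "->", playlist_requests_per_minute(interval))
```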

6. Complete LL-HLS Playlist Example

An example based on Apple's official documentation, abridged for length:

#EXTM3U
#EXT-X-TARGETDURATION:4
#EXT-X-VERSION:6
#EXT-X-SERVER-CONTROL:CAN-BLOCK-RELOAD=YES,PART-HOLD-BACK=1.0,CAN-SKIP-UNTIL=24.0
#EXT-X-PART-INF:PART-TARGET=0.33334
#EXT-X-MEDIA-SEQUENCE:264
#EXT-X-PROGRAM-DATE-TIME:2019-02-14T02:13:28.106Z
#EXT-X-MAP:URI="init.mp4"

# (fileSequence264.mp4 through fileSequence269.mp4 omitted for brevity)
#EXTINF:4.00008,
fileSequence270.mp4

#EXT-X-PART:DURATION=0.33334,URI="filePart271.0.mp4"
#EXT-X-PART:DURATION=0.33334,URI="filePart271.1.mp4"
#EXT-X-PART:DURATION=0.33334,URI="filePart271.2.mp4"
#EXT-X-PART:DURATION=0.33334,URI="filePart271.3.mp4"
#EXT-X-PART:DURATION=0.33334,URI="filePart271.4.mp4",INDEPENDENT=YES
#EXT-X-PART:DURATION=0.33334,URI="filePart271.5.mp4"
#EXT-X-PART:DURATION=0.33334,URI="filePart271.6.mp4"
#EXT-X-PART:DURATION=0.33334,URI="filePart271.7.mp4"
#EXT-X-PART:DURATION=0.33334,URI="filePart271.8.mp4",INDEPENDENT=YES
#EXT-X-PART:DURATION=0.33334,URI="filePart271.9.mp4"
#EXT-X-PART:DURATION=0.33334,URI="filePart271.10.mp4"
#EXT-X-PART:DURATION=0.33334,URI="filePart271.11.mp4"

#EXTINF:4.00008,
fileSequence271.mp4
#EXT-X-PROGRAM-DATE-TIME:2019-02-14T02:14:00.106Z

#EXT-X-PART:DURATION=0.33334,URI="filePart272.a.mp4"
#EXT-X-PART:DURATION=0.33334,URI="filePart272.b.mp4"
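#EXT-X-PART:DURATION=0.33334,URI="filePart272.c.mp4"
# (remaining parts of segment 272, fileSequence272.mp4, and filePart273.0 through filePart273.2 omitted for brevity)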

#EXT-X-PRELOAD-HINT:TYPE=PART,URI="filePart273.3.mp4"
#EXT-X-RENDITION-REPORT:URI="../1M/waitForMSN.php",LAST-MSN=273,LAST-PART=2
#EXT-X-RENDITION-REPORT:URI="../4M/waitForMSN.php",LAST-MSN=273,LAST-PART=1
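
To see how a client might locate the live edge in a playlist like this, here is a minimal sketch that walks the lines and reports the newest part. The function name live_edge is hypothetical, and the sample inside the code is a shortened playlist written for this sketch rather than the one above.

```python
def live_edge(playlist_text: str) -> tuple[int, int]:
    """Return (media sequence number, part index) of the newest part."""
    media_sequence = 0
    segments_done = 0  # segment URI lines seen so far
    parts_open = 0     # EXT-X-PART lines since the last segment URI
    for raw in playlist_text.splitlines():
        line = raw.strip()
        if line.startswith("#EXT-X-MEDIA-SEQUENCE:"):
            media_sequence = int(line.split(":", 1)[1])
        elif line.startswith("#EXT-X-PART:"):
            parts_open += 1
        elif line and not line.startswith("#"):
            segments_done += 1
            parts_open = 0  # parts above this URI belong to the finished segment
    if parts_open == 0:
        raise ValueError("no in-progress parts at the live edge")
    return media_sequence + segments_done, parts_open - 1


sample = """#EXTM3U
#EXT-X-PART-INF:PART-TARGET=0.33334
#EXT-X-MEDIA-SEQUENCE:272
#EXTINF:4.00008,
fileSequence272.mp4
#EXT-X-PART:DURATION=0.33334,URI="filePart273.0.mp4",INDEPENDENT=YES
#EXT-X-PART:DURATION=0.33334,URI="filePart273.1.mp4"
#EXT-X-PART:DURATION=0.33334,URI="filePart273.2.mp4"
#EXT-X-PRELOAD-HINT:TYPE=PART,URI="filePart273.3.mp4"
"""
print(live_edge(sample))
```

The next blocking playlist request would then ask for the same media sequence number with the part index increased by one.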

7. Real-World Performance Benchmarks

Real-world implementations show consistent latency improvements:

StreamShark: 5 seconds (LL-HLS) vs 20-30 seconds (HLS)
Apple Tests: 3 seconds glass-to-glass
AWS Implementation: 5-10 seconds end-to-end
Twitch Community HLS: ~5 seconds

Commercial Player Test Results

Actual test results conducted by AWS Media Services:

The first scenario prioritizes latency, while the second prioritizes encoding efficiency.

8. CDN Caching Strategy

As Roger Pantos explained at WWDC2019, CDN caching was one of the primary causes of latency in traditional HLS.

Caching Problems in Traditional HLS

In traditional HLS, the following problem occurred. Assume the origin server has a playlist containing 3 segments (1, 2, 3). When the first client makes a request, the CDN edge server has nothing cached, so it fetches the latest playlist from the origin and delivers it.

Then, 1-2 seconds later, a new segment 4 is added at the origin and the playlist is updated.

When a second client makes a request at this point, the CDN edge returns the cached old playlist (containing only segments 1, 2, 3). The client has no way of knowing that segment 4 exists.

Why does the CDN not check the origin every time? Because if it checked the origin for every random incoming client request, the “origin would melt.” Therefore, the CDN must cache for the duration of the TTL (Time To Live), and the longer this TTL, the greater the latency.

LL-HLS’s Innovative Solution

LL-HLS solves this problem through “cache busting.” It uses a different URL for each playlist update.

When the first client requests a specific update, the CDN determines “I haven’t seen this URL before” and forwards it to the origin. The origin responds with “it’s not ready yet,” and once ready, delivers it to the client through the CDN.

When the next client requests the same update, the CDN recognizes the URL and serves it immediately from cache. However, a client wanting the next update requests it with a completely different URL, so the CDN immediately knows this is a new request. Rather than serving an old version, it forwards the request directly to the origin.

This way, new playlist update requests inherently have cache-busting capability, and caching works more efficiently overall at the CDN.
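
The effect of a unique URL per update can be illustrated with a toy edge cache. The EdgeCache class and the example URLs are assumptions for illustration; the sketch ignores TTLs, request coalescing, and the blocking behavior at the origin.

```python
class EdgeCache:
    """Toy CDN edge: caches responses by full URL, including the query string."""

    def __init__(self, origin):
        self._origin = origin  # callable that produces a response for a URL
        self._store = {}
        self.origin_hits = 0

    def get(self, url: str) -> str:
        if url not in self._store:
            # Unknown URL: forward to the origin and remember the answer.
            self.origin_hits += 1
            self._store[url] = self._origin(url)
        return self._store[url]


edge = EdgeCache(origin=lambda url: f"playlist for {url}")
base = "https://example.com/live/playlist.m3u8"  # placeholder host

edge.get(f"{base}?_HLS_msn=273&_HLS_part=0")  # first viewer: goes to origin
edge.get(f"{base}?_HLS_msn=273&_HLS_part=0")  # second viewer: served from cache
edge.get(f"{base}?_HLS_msn=273&_HLS_part=1")  # next update: new URL, origin again
print(edge.origin_hits)
```

Each update reaches the origin once no matter how many viewers ask for it, and no viewer can be handed a stale playlist for a newer update.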

9. Player Support Status

Apple Ecosystem

AVPlayer: Stable playback on macOS, iOS 17/iPadOS 17, and tvOS 17
Safari Mobile: As of iOS 14, direct playback is not recommended (improvements needed for bitrate upswitching, playhead positioning predictability, and drift compensation); support has improved significantly in iOS 15+

Android ExoPlayer

As mentioned in the Google ExoPlayer team’s blog post, low-latency streams work without additional configuration (although the example is for LL-DASH).

Internal Implementation

The AndroidX Media3 ExoPlayer HLS implementation includes the following key files:

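HlsPlaylistParser.java: Parses playlists, including LL-HLS tags such as EXT-X-SERVER-CONTROL, EXT-X-PART, EXT-X-PRELOAD-HINT, and EXT-X-SKIP
HlsChunkSource.java: Chooses the next chunk to load, including partial segments near the live edge
HlsMediaPeriod.java: Coordinates loading and buffering across the selected renditions
HlsPlaylistTracker.java: Tracks playlist refreshes; its default implementation issues the blocking playlist requests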

Note: The code snippets below are simplified pseudocode illustrating the concept of how ExoPlayer handles LL-HLS internally. They are not verbatim source code from the Media3 repository.

Automatic Detection Mechanism (Simplified Pseudocode)

// Conceptual: EXT-X-SERVER-CONTROL tag detection in HlsPlaylistParser
if (serverControlTag != null && serverControlTag.canBlockReload) {
    // Enable blocking playlist reload
    playlistTracker.enableBlockingPlaylistReload()
}

// Conceptual: Partial segment handling in HlsChunkSource
if (playlistSnapshot.partList.isNotEmpty()) {
    // Create chunks based on partial segments
    return createPartialSegmentChunk(playlistSnapshot.partList)
}

Blocking Request Implementation (Simplified Pseudocode)

// Conceptual: Blocking request handling in HlsPlaylistTracker
fun maybeThrowPrimaryPlaylistRefreshError() {
    if (playlistBundles[primaryPlaylistIndex]?.playlistError != null) {
        // Handle errors during blocking requests
        throw playlistBundles[primaryPlaylistIndex]?.playlistError!!
    }
}

To further optimize latency, you can add the following configuration:

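import androidx.media3.common.MediaItem
import androidx.media3.exoplayer.ExoPlayer
import androidx.media3.exoplayer.source.DefaultMediaSourceFactory

// Assumes `context` (an Android Context) and `llHlsUrl` (the stream URL) are in scope.

// Global live streaming settings: the default for items that set no offset of their own
val player = ExoPlayer.Builder(context)
    .setMediaSourceFactory(
        DefaultMediaSourceFactory(context)
            .setLiveTargetOffsetMs(2000) // 2-second target offset
    )
    .build()

// Per-MediaItem settings: these take precedence over the factory default
val mediaItem = MediaItem.Builder()
    .setUri(llHlsUrl)
    .setLiveConfiguration(
        MediaItem.LiveConfiguration.Builder()
            .setTargetOffsetMs(2000)     // Target distance behind the live edge
            .setMinOffsetMs(1000)        // Never play closer to the live edge than this
            .setMaxOffsetMs(5000)        // Never fall further behind than this
            .setMinPlaybackSpeed(0.97f)  // Slowest speed used to move away from the edge
            .setMaxPlaybackSpeed(1.03f)  // Fastest speed used to catch up to the edge
            .build()
    )
    .build()

player.setMediaItem(mediaItem)
player.prepare()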

Key Points

Automatic Recognition: Automatically detects LL-HLS manifests and applies optimizations
Automatic Adjustment: Automatically adjusts live offset and playback speed based on network conditions
HTTP/2 Support: Blocking requests benefit from HTTP/2 multiplexing, which depends on the configured network stack (for example, an OkHttp- or Cronet-based data source)

Reference: Android Official Live Streaming Documentation

Web-Based Players

hls.js: JavaScript HLS client (LL-HLS supported since v1.0)
dash.js: JavaScript DASH client (Low-Latency DASH support)
Shaka Player: Google’s open-source player (experimental LL-HLS support)

Commercial Players

THEOplayer: LL-HLS successfully validated
JW Player: LL-HLS support and validation complete

10. Byte-Range Addressing for Parts

For more efficient delivery, LL-HLS also supports byte-range addressing.

#EXTINF:4.08,
fs270.mp4
#EXT-X-PART:DURATION=1.02,URI="fs271.mp4",BYTERANGE="20000@0"
#EXT-X-PART:DURATION=1.02,URI="fs271.mp4",BYTERANGE="23000@20000"
#EXT-X-PART:DURATION=1.02,URI="fs271.mp4",BYTERANGE="18000@43000"
#EXT-X-PRELOAD-HINT:TYPE=PART,URI="fs271.mp4",BYTERANGE-START=61000

This approach allows multiple parts to be efficiently referenced from a single file.
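
Translating these attributes into HTTP requests is mostly arithmetic. The sketch below assumes two hypothetical helpers and assumes every BYTERANGE value carries an explicit offset, as in the example above.

```python
def range_header(byterange: str) -> str:
    """Convert an HLS BYTERANGE value ("length@offset") to an HTTP Range value."""
    length, offset = (int(field) for field in byterange.split("@"))
    # HTTP ranges are inclusive, so the last byte is offset + length - 1.
    return f"bytes={offset}-{offset + length - 1}"


def open_ended_range(byterange_start: int) -> str:
    """Range value for a preload hint that only gives BYTERANGE-START."""
    return f"bytes={byterange_start}-"


for value in ["20000@0", "23000@20000", "18000@43000"]:
    print(value, "->", range_header(value))
print("hint ->", open_ended_range(61000))
```

The open-ended request for the hint lets the server keep sending bytes of the same file as the next part is produced.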

11. Server Reference Implementation Update (2020)

Apple’s Low-Latency server reference implementation was significantly improved.

CMAF Support

fMP4 CMAF Packaging: Starting in late 2019, an option was added to package media as fragmented MPEG-4, providing compatibility with CMAF (Common Media Application Format).

# Generate CMAF-compatible LL-HLS stream
./mediafilesegmenter -f /path/to/output -B playlist -t 4 -S 3 -D \
  --generate-variant-plist \
  --iframe-only-playlist \
  --low-latency \
  --fmp4-fragment-duration 0.33334 \
  input.mov

Simplified Web Server

Go Script Integration: Previously, you needed to configure a separate web server and connect PHP scripts. Now, running a single Go script implements both delivery directives and an HTTP/2 web server in one package.

# Previous approach (complex)
# 1. Configure Apache/Nginx
# 2. Set up PHP scripts
# 3. Connect and test

# 2020 approach (simple)
go run hls-ll-server.go --content-dir /path/to/segments --port 8080

Unified Tool Package

Single Download: Low-Latency tools have been integrated into the regular HLS tool package, and everything is now available in a single download.

12. HTTP Transport Optimization

LL-HLS requires HTTP/2 on the CDN side to leverage multiplexing benefits.

Future Direction: As Roger Pantos announced on the public hls-interest mailing list, HTTP/3 support was added in iOS 17. For LL-HLS delivery over HTTP/3, server-defined priorities as described in RFC 9218 (Extensible Prioritization Scheme for HTTP) should be used to prioritize playlist delivery.

Standardization Complete

Since 2020, the Low-Latency extensions have been included as a core part of the IETF HLS specification. Two new appendices have also been added describing the Low-Latency Server Configuration Profile and CDN tune-in algorithm.

Closing Thoughts

LL-HLS cuts latency by roughly 80-90% compared to traditional HLS (from 20-30 seconds down to 2-5 seconds) while maintaining the scalability of HTTP-based streaming. At 2-5 seconds end to end, it approaches broadcast-quality latency and opens new possibilities for applications where live sports, game streaming, and real-time interaction are critical.

The improvements introduced with the 2020 official release, including Blocking Preload Hints, Gap Signaling, and CMAF support, provide a more stable and efficient low-latency streaming environment. With upcoming enhancements such as HTTP/3 support and server-defined priorities (RFC 9218), an even more optimized low-latency streaming environment is expected. In particular, it holds great promise for delivering innovative user experiences in fields where real-time interaction matters most, such as live commerce, sports broadcasting, and game streaming.
