Lock-Free Algorithms and CAS Operations
In multithreaded environments, one approach to resolving the race conditions that arise when multiple threads access shared resources is to use Lock-Free algorithms. In this post, we will explore the concept of Lock-Free algorithms and examine CAS operations.
Lock-Based vs Lock-Free
A Lock-Free algorithm is an algorithm that allows multiple threads to concurrently access and modify shared resources without using traditional locks, while guaranteeing that the overall system continues to make progress without falling into deadlock or livelock.
The key characteristic of Lock-Free algorithms is that the system as a whole is always guaranteed to make progress. Even if one thread is slow or stalled, other threads can continue their work, ensuring overall system progress.
Let us first look at the Lock-based structure, which uses traditional locking.
As shown in the figure above, in a Lock-based structure, when Thread 2 acquires the lock, Thread 1 must wait until the lock is released. If Thread 2 is suspended, Thread 1 must wait indefinitely.
In contrast, in a Lock-Free structure, Thread 1 detects concurrent access and immediately receives a failure response, allowing it to retry.
Here, Thread 2 accesses the data without acquiring a lock. When Thread 1 simultaneously attempts to access the same data structure, it detects the concurrent access and returns immediately, notifying the thread that the operation could not be completed (red). Thread 1 continues retrying until the operation completes successfully (green).
The advantage of this approach is that no lock is needed. However, a potential issue is that if Thread 2 (or other threads) accesses the data at a high frequency, Thread 1 may need many attempts before finally succeeding. Therefore, Thread 1 can fall into a starvation state.
Now I will first explain non-blocking algorithms, and then examine how CAS (compare-and-swap) operations enable the non-blocking approach.
Non-Blocking Algorithms

Non-Blocking algorithms can be categorized into three levels.
Obstruction-Free: The weakest form, where a single thread is guaranteed to make progress if all other threads are suspended. In the figure, Thread 1 (the top thread) can make progress (green arrow) as soon as the other threads (Thread 2, … n) are suspended (shown in yellow at the bottom).
Lock-Free: At least one thread is always guaranteed to make progress. Other threads may suffer from starvation, but the system as a whole always progresses. The difference from Obstruction-Free is that at least one thread avoids starvation even when the other threads are not suspended. In the figure, at minimum Thread 1 can make progress while the other threads (Thread 2, … n) may be in a starvation state (red arrows).
Wait-Free: The most ideal form, guaranteeing that all threads will complete their work within a finite amount of time. All threads make progress without starvation. In the figure, the other threads (Thread 2, … n) are guaranteed to continue making progress (green arrows) after a period of starvation (red arrows).
CAS (Compare-And-Set/Swap) Operation
One of the fundamental operations used to avoid locks is the CAS (Compare-And-Swap) operation. (Note: the term Compare-And-Set is also used – the distinction is that Compare-And-Swap returns the old value, while Compare-And-Set returns a boolean indicating success or failure. In practice, the terms are often used interchangeably.) The idea is to update a variable only if it still holds the expected value that the thread previously read. CAS relies on hardware-level atomic instructions and volatile/atomic semantics to ensure visibility across threads. CAS is an atomic operation, where the read and update are processed together as a single operation.
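Both flavors can be observed directly on the JDK's AtomicInteger: compareAndSet returns a Boolean, while compareAndExchange (JDK 9+) returns the value it actually witnessed. A minimal Java sketch (the class name CasVariants is my own):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CasVariants {
    public static void main(String[] args) {
        AtomicInteger value = new AtomicInteger(3);

        // Compare-And-Set style: returns whether the update succeeded.
        boolean ok = value.compareAndSet(3, 8);
        System.out.println(ok);          // true; value is now 8

        // Compare-And-Swap style (JDK 9+): returns the witnessed old value.
        int old = value.compareAndExchange(8, 3);
        System.out.println(old);         // 8; value is now 3

        // When the expected value is wrong, nothing changes and the
        // current value is returned, telling us what we raced against.
        int unchanged = value.compareAndExchange(99, 7);
        System.out.println(unchanged);   // 3; value stays 3
    }
}
```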

Here, both threads read value 3 from the shared variable. Thread 2 succeeds (green) and updates the variable to 8. Thread 1’s first CAS expects the value to still be 3, so the CAS fails (red). Therefore, Thread 1 re-reads the value, and its second CAS succeeds.
The important point here is that CAS does not acquire a lock on the data structure – it returns ‘true’ if the update succeeds, and ‘false’ otherwise.
fun compareAndSet(expectedValue: T, newValue: T): Boolean {
    // In real hardware CAS, the comparison and the assignment below
    // execute together as one indivisible (atomic) instruction.
    if (currentValue == expectedValue) {
        currentValue = newValue
        return true // Success: the value was still the expected one
    }
    return false // Failure: another thread changed the value first
}
The value is updated to the new value only when it still matches the expected value; otherwise, it returns ‘false’. The following code snippet shows an example of how CAS can be invoked.
fun testCas() {
    var v = value // read the current value
    var x = v + 1 // compute the desired new value
    // Retry while another thread has changed the value between our read
    // and our CAS attempt; each failure means some other thread succeeded.
    while (!cas(v, x)) {
        v = value
        x = v + 1
    }
}
The code above repeatedly attempts to update the value until the CAS operation succeeds. However, a thread can still fall into starvation: if other threads continuously perform CAS on the same variable, a specific thread’s operation may never succeed (or may take an unreasonably long time). Nevertheless, every failed CAS implies that another thread’s CAS succeeded, which guarantees the system-wide progress required for the lock-free property. Importantly, the hardware must support CAS as a truly atomic instruction that does not use locks – Java provides CAS implementations through its Atomic classes.
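As a concrete illustration, the same retry loop can be written against Java's real CAS primitive, AtomicInteger.compareAndSet. This is a sketch rather than code from the post – the class and method names (CasIncrement, casIncrement) are my own – but it shows that the loop loses no increments even under contention:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CasIncrement {
    // Retry-loop increment built on compareAndSet, mirroring testCas() above.
    static int casIncrement(AtomicInteger value) {
        while (true) {
            int v = value.get(); // read the current value
            int x = v + 1;       // compute the desired new value
            // Succeeds only if no other thread changed value since get().
            if (value.compareAndSet(v, x)) {
                return x;
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger counter = new AtomicInteger(0);
        // Hammer the counter from several threads; no lock is ever taken.
        Thread[] threads = new Thread[4];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < 10_000; j++) casIncrement(counter);
            });
            threads[i].start();
        }
        for (Thread t : threads) t.join();
        System.out.println(counter.get()); // 40000: no increments are lost
    }
}
```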
The A-B-A Problem
CAS does not prevent the A-B-A problem. As we saw earlier, CAS updates a value only when the value in main memory is still what we expect. However, CAS will also succeed when the value has been changed and then changed back to the original value. The image below illustrates this situation.

Thread 1 and Thread 2 both read the variable’s value of 3. Then Thread 2 performs a CAS and successfully sets the variable to 8. Thread 2 also performs another CAS and successfully sets the variable back to 3. Finally, Thread 1 performs a CAS expecting the value to be 3, and it succeeds despite the variable’s value having been modified twice in the meantime.
This is called the A-B-A problem. Depending on the use case, this behavior may be harmless. In other cases – the classic example is a lock-free stack whose nodes are recycled – it can lead to incorrect results.
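The scenario in the figure is easy to reproduce with a plain AtomicInteger. In this hypothetical sketch (the class name AbaDemo is mine), a single thread plays both roles; the A→B→A change is invisible to the final CAS:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class AbaDemo {
    public static void main(String[] args) {
        AtomicInteger value = new AtomicInteger(3);
        int seen = value.get();      // "Thread 1" reads 3

        // Meanwhile, "Thread 2" changes 3 -> 8 and then 8 -> 3.
        value.compareAndSet(3, 8);
        value.compareAndSet(8, 3);

        // Thread 1's CAS still succeeds: it cannot tell that the value
        // was modified twice in the meantime.
        boolean success = value.compareAndSet(seen, 5);
        System.out.println(success); // true
    }
}
```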
Conclusion
Lock-Free algorithms are a powerful technique that can provide high performance and stability in multithreaded environments. Among them, CAS (Compare-And-Swap) is the most widely known operation, but other atomic operations such as LL/SC (Load-Link/Store-Conditional) and Fetch-and-Add also exist. These various approaches each have their own unique strengths and weaknesses.
Implementing Lock-Free algorithms is complex and requires careful consideration of special situations such as the ABA problem, so a cautious approach is necessary. Since the CAS operation alone cannot solve the ABA problem, approaches using version counters or stamps must be adopted (such as Java’s AtomicStampedReference class).
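As a sketch of that stamp-based approach (the class name StampedAba is mine), AtomicStampedReference pairs the value with an integer stamp that is bumped on every update, so the A→B→A sequence from the previous section now fails the final CAS:

```java
import java.util.concurrent.atomic.AtomicStampedReference;

public class StampedAba {
    public static void main(String[] args) {
        // Value 3 with initial stamp 0. Note: AtomicStampedReference
        // compares references with ==; small Integer values work here
        // only because of Java's Integer cache.
        AtomicStampedReference<Integer> ref = new AtomicStampedReference<>(3, 0);

        int[] stampHolder = new int[1];
        Integer seen = ref.get(stampHolder); // reads value 3, stamp 0

        // A -> B -> A, bumping the stamp on every update.
        ref.compareAndSet(3, 8, 0, 1);
        ref.compareAndSet(8, 3, 1, 2);

        // The value is 3 again, but the stamp (now 2) no longer matches
        // the stamp we read (0), so the CAS fails and the ABA is detected.
        boolean success = ref.compareAndSet(seen, 5, stampHolder[0], stampHolder[0] + 1);
        System.out.println(success); // false
    }
}
```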
In real projects, it is advisable to use proven libraries such as Java’s java.util.concurrent.atomic package (e.g., AtomicInteger, AtomicReference), which provides lock-free CAS operations. Note that Kotlin’s kotlinx.coroutines.sync package provides Mutex and Semaphore, which are lock-based synchronization primitives – not lock-free. For lock-free operations in Kotlin, you should use the java.util.concurrent.atomic classes or Kotlin’s kotlinx.atomicfu library. These libraries have already solved complex concurrency issues and provide optimized implementations.
Additionally, Lock-Free algorithms do not guarantee that all threads make progress. In certain situations, some threads may fall into a starvation state. If you need stronger guarantees, you can consider Wait-Free algorithms, but they are much more complex to implement and have higher performance costs.
Although I am an Android developer, I believe that Lock-Free algorithms can make a significant difference in real-time systems where latency matters or in server applications requiring high throughput.
Consider, for example, a client-side image processor operating over UDP, where image packets received over the network need to be processed and rendered in real time. A Lock-Free queue could be well-utilized for synchronization between packet reception and image processing.
Especially in UDP environments where packet loss is possible and fast processing is crucial, implementing a pattern where the receiving thread places packets into a Lock-Free queue and the processing thread consumes them could significantly improve throughput and responsiveness compared to traditional lock-based implementations. In such scenarios, the ABA problem rarely occurs, so a simple CAS-based implementation may be sufficient.
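As a hedged sketch of that pattern (all names are mine, and real UDP reception is replaced by a loop producing dummy packets), the JDK's ConcurrentLinkedQueue – a CAS-based, non-blocking queue based on the Michael–Scott algorithm – can hand packets from a receiver thread to a processor thread without any lock:

```java
import java.util.concurrent.ConcurrentLinkedQueue;

public class PacketPipeline {
    public static void main(String[] args) throws InterruptedException {
        // Lock-free handoff between the two threads.
        ConcurrentLinkedQueue<byte[]> packets = new ConcurrentLinkedQueue<>();

        // Stand-in for the UDP receiving thread: enqueues via CAS, no lock.
        Thread receiver = new Thread(() -> {
            for (int i = 0; i < 100; i++) {
                packets.offer(new byte[] { (byte) i });
            }
        });

        // Processing thread: dequeues via CAS and "renders" each packet.
        int[] processed = { 0 }; // written by processor only, read after join
        Thread processor = new Thread(() -> {
            while (processed[0] < 100) {
                byte[] packet = packets.poll();
                if (packet != null) processed[0]++;
            }
        });

        receiver.start();
        processor.start();
        receiver.join();
        processor.join();
        System.out.println(processed[0]); // 100
    }
}
```

In a real application the processor would block or back off instead of spinning when the queue is empty, but the core handoff stays lock-free.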
I hope this post has been helpful for the many backend and client developers looking to understand and apply Lock-Free algorithms.
#multi-thread #lock-free #concurrent-programming #compare-and-swap