Spin Locks and Contention
Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit
Focus so far: Correctness and Progress
Art of Multiprocessor Programming 2 Focus so far: Correctness and Progress Models Accurate (we never lied to you) But idealized (so we forgot to mention a few things) Protocols Elegant Important But naïve
New Focus: Performance
Art of Multiprocessor Programming 3 New Focus: Performance Models More complicated (not the same as complex!) Still focus on principles (not soon obsolete) Protocols Elegant (in their fashion) Important (why else would we pay attention) And realistic (your mileage may vary)
Kinds of Architectures
Art of Multiprocessor Programming 4 Kinds of Architectures SISD (Uniprocessor) Single instruction stream Single data stream SIMD (Vector) Single instruction Multiple data MIMD (Multiprocessors) Multiple instruction Multiple data.
Kinds of Architectures
Art of Multiprocessor Programming 5 Kinds of Architectures SISD (Uniprocessor) Single instruction stream Single data stream SIMD (Vector) Single instruction Multiple data MIMD (Multiprocessors) Multiple instruction Multiple data. Our space (1)
MIMD Architectures
Art of Multiprocessor Programming 6 MIMD Architectures Memory Contention Communication Contention Communication Latency Shared Bus memory Distributed
Today: Revisit Mutual Exclusion
Art of Multiprocessor Programming 7 Today: Revisit Mutual Exclusion Performance, not just correctness Proper use of multiprocessor architectures A collection of locking algorithms… (1)
What Should you do if you can’t get a lock?
Art of Multiprocessor Programming 8 What Should you do if you can’t get a lock? Keep trying “spin” or “busy-wait” Good if delays are short Give up the processor Good if delays are long Always good on uniprocessor (1)
What Should you do if you can’t get a lock?
Art of Multiprocessor Programming 9 What Should you do if you can’t get a lock? Keep trying “spin” or “busy-wait” Good if delays are short Give up the processor Good if delays are long Always good on uniprocessor our focus
Basic Spin-Lock
Art of Multiprocessor Programming 10 Basic Spin-Lock CS Resets lock upon exit spin lock critical section . . .
Basic Spin-Lock
Art of Multiprocessor Programming 11 Basic Spin-Lock CS Resets lock upon exit spin lock critical section . . . …lock introduces sequential bottleneck
Basic Spin-Lock
Art of Multiprocessor Programming 12 Basic Spin-Lock CS Resets lock upon exit spin lock critical section . . . …lock suffers from contention
Basic Spin-Lock
Art of Multiprocessor Programming 13 Basic Spin-Lock CS Resets lock upon exit spin lock critical section . . . Notice: these are distinct phenomena …lock suffers from contention
Basic Spin-Lock
Art of Multiprocessor Programming 14 Basic Spin-Lock CS Resets lock upon exit spin lock critical section . . . Seq Bottleneck no parallelism …lock suffers from contention
Basic Spin-Lock
Art of Multiprocessor Programming 15 Basic Spin-Lock CS Resets lock upon exit spin lock critical section . . . Contention ??? …lock suffers from contention
Review: Test-and-Set
Art of Multiprocessor Programming 16 Review: Test-and-Set Boolean value Test-and-set (TAS) Swap true with current value Return value tells if prior value was true or false Can reset just by writing false TAS aka “getAndSet”
Review: Test-and-Set
Art of Multiprocessor Programming 17 Review: Test-and-Set public class AtomicBoolean { boolean value; public synchronized boolean getAndSet( boolean newValue) { boolean prior = value; value = newValue; return prior; } } (5)
Review: Test-and-Set
Art of Multiprocessor Programming 18 Review: Test-and-Set public class AtomicBoolean { boolean value; public synchronized boolean getAndSet(boolean newValue) { boolean prior = value; value = newValue; return prior; } } Package java.util.concurrent.atomic
Review: Test-and-Set
Art of Multiprocessor Programming 19 Review: Test-and-Set public class AtomicBoolean { boolean value; public synchronized boolean getAndSet( boolean newValue) { boolean prior = value; value = newValue; return prior; } } Swap old and new values
Review: Test-and-Set
Art of Multiprocessor Programming 20 Review: Test-and-Set AtomicBoolean lock = new AtomicBoolean( false ) boolean prior = lock.getAndSet( true )
Review: Test-and-Set
Art of Multiprocessor Programming 21 Review: Test-and-Set AtomicBoolean lock = new AtomicBoolean(false) boolean prior = lock.getAndSet( true ) (5) Swapping in true is called “test-and-set” or TAS
Test-and-Set Locks
Art of Multiprocessor Programming 22 Test-and-Set Locks Locking Lock is free: value is false Lock is taken: value is true Acquire lock by calling TAS If result is false , you win If result is true , you lose Release lock by writing false
Test-and-set Lock
Art of Multiprocessor Programming 23 Test-and-set Lock class TASlock { AtomicBoolean state = new AtomicBoolean( false ); void lock() { while (state.getAndSet( true )) {} } void unlock() { state.set( false ); }}
Test-and-set Lock
Art of Multiprocessor Programming 24 Test-and-set Lock class TASlock { AtomicBoolean state = new AtomicBoolean( false ); void lock() { while (state.getAndSet(true)) {} } void unlock() { state.set(false); }} Lock state is AtomicBoolean
Test-and-set Lock
Art of Multiprocessor Programming 25 Test-and-set Lock class TASlock { AtomicBoolean state = new AtomicBoolean(false); void lock() { while (state.getAndSet( true )) {} } void unlock() { state.set(false); }} Keep trying until lock acquired
Test-and-set Lock
Art of Multiprocessor Programming 26 Test-and-set Lock class TASlock { AtomicBoolean state = new AtomicBoolean(false); void lock() { while (state.getAndSet(true)) {} } void unlock() { state.set( false ); }} Release lock by resetting state to false
Space Complexity
Art of Multiprocessor Programming 27 Space Complexity TAS spin-lock has small “footprint” N thread spin-lock uses O(1) space As opposed to O(n) Peterson/Bakery How did we overcome the (n) lower bound? We used a RMW operation…
Performance
Art of Multiprocessor Programming 28 Performance Experiment n threads Increment shared counter 1 million times How long should it take? How long does it take?
Graph
Art of Multiprocessor Programming 29 Graph ideal time threads no speedup because of sequential bottleneck
Mystery #1
Art of Multiprocessor Programming 30 Mystery #1 time threads TAS lock Ideal What is going on?
Test-and-Test-and-Set Locks
Art of Multiprocessor Programming 31 Test-and-Test-and-Set Locks Lurking stage Wait until lock “looks” free Spin while read returns true (lock taken) Pouncing state As soon as lock “looks” available Read returns false (lock free) Call TAS to acquire lock If TAS loses, back to lurking
Test-and-test-and-set Lock
Art of Multiprocessor Programming 32 Test-and-test-and-set Lock class TTASlock { AtomicBoolean state = new AtomicBoolean( false ); void lock() { while ( true ) { while (state.get()) {} if (!state.getAndSet( true )) return ; } }
Test-and-test-and-set Lock
Art of Multiprocessor Programming 33 Test-and-test-and-set Lock class TTASlock { AtomicBoolean state = new AtomicBoolean(false); void lock() { while (true) { while (state.get()) {} if (!state.getAndSet(true)) return; } } Wait until lock looks free
Test-and-test-and-set Lock
Art of Multiprocessor Programming 34 Test-and-test-and-set Lock class TTASlock { AtomicBoolean state = new AtomicBoolean(false); void lock() { while (true) { while (state.get()) {} if (!state.getAndSet( true )) return ; } } Then try to acquire it
Mystery #2
Art of Multiprocessor Programming 35 Mystery #2 TAS lock TTAS lock Ideal time threads
Mystery
Art of Multiprocessor Programming 36 Mystery Both TAS and TTAS Do the same thing (in our model) Except that TTAS performs much better than TAS Neither approaches ideal
Opinion
Art of Multiprocessor Programming 37 Opinion Our memory abstraction is broken TAS & TTAS methods Are provably the same (in our model) Except they aren’t (in field tests) Need a more detailed model …
Bus-Based Architectures
Art of Multiprocessor Programming 38 Bus-Based Architectures Bus cache memory cache cache
Bus-Based Architectures
Art of Multiprocessor Programming 39 Bus-Based Architectures Bus cache memory cache cache Random access memory (10s of cycles)
Bus-Based Architectures
Art of Multiprocessor Programming 40 Bus-Based Architectures cache memory cache cache Shared Bus Broadcast medium One broadcaster at a time Processors and memory all “snoop” Bus
Bus-Based Architectures
Art of Multiprocessor Programming 41 Bus-Based Architectures Bus cache memory cache cache Per-Processor Caches Small Fast: 1 or 2 cycles Address & state information
Jargon Watch
Art of Multiprocessor Programming 42 Jargon Watch Cache hit “I found what I wanted in my cache” Good Thing™
Jargon Watch
Art of Multiprocessor Programming 43 Jargon Watch Cache hit “I found what I wanted in my cache” Good Thing™ Cache miss “I had to shlep all the way to memory for that data” Bad Thing™
Cave Canem
Art of Multiprocessor Programming 44 Cave Canem This model is still a simplification But not in any essential way Illustrates basic principles Will discuss complexities later
Processor Issues Load Request
Art of Multiprocessor Programming 45 Bus Processor Issues Load Request cache memory cache cache data
Processor Issues Load Request
Art of Multiprocessor Programming 46 Bus Processor Issues Load Request Bus cache memory cache cache data Gimme data
Memory Responds
Art of Multiprocessor Programming 47 cache Bus Memory Responds Bus memory cache cache data Got your data right here data
Processor Issues Load Request
Art of Multiprocessor Programming 48 Bus Processor Issues Load Request memory cache cache data data Gimme data
Processor Issues Load Request
Art of Multiprocessor Programming 49 Bus Processor Issues Load Request Bus memory cache cache data data Gimme data
Processor Issues Load Request
Art of Multiprocessor Programming 50 Bus Processor Issues Load Request Bus memory cache cache data data I got data
Other Processor Responds
Art of Multiprocessor Programming 51 Bus Other Processor Responds memory cache cache data I got data data data Bus
Other Processor Responds
Art of Multiprocessor Programming 52 Bus Other Processor Responds memory cache cache data data data Bus
Modify Cached Data
Art of Multiprocessor Programming 53 Modify Cached Data Bus data memory cache data data (1)
Modify Cached Data
Art of Multiprocessor Programming 54 Modify Cached Data Bus data memory cache data data data (1)
Modify Cached Data
Art of Multiprocessor Programming 55 memory Bus data Modify Cached Data cache data data
Modify Cached Data
Art of Multiprocessor Programming 56 memory Bus data Modify Cached Data cache What’s up with the other copies? data data
Cache Coherence
Art of Multiprocessor Programming 57 Cache Coherence We have lots of copies of data Original copy in memory Cached copies at processors Some processor modifies its own copy What do we do with the others? How to avoid confusion?
Write-Back Caches
Art of Multiprocessor Programming 58 Write-Back Caches Accumulate changes in cache Write back when needed Need the cache for something else Another processor wants it On first modification Invalidate other entries Requires non-trivial protocol …
Write-Back Caches
Art of Multiprocessor Programming 59 Write-Back Caches Cache entry has three states Invalid: contains raw seething bits Valid : I can read but I can’t write Dirty : Data has been modified Intercept other load requests Write back to memory before using cache
Invalidate
Art of Multiprocessor Programming 60 Bus Invalidate memory cache data data data
Invalidate
Art of Multiprocessor Programming 61 Bus Invalidate Bus memory cache data data data Mine, all mine!
Invalidate
Art of Multiprocessor Programming 62 Bus Invalidate Bus memory cache data data data cache Uh,oh
Invalidate
Art of Multiprocessor Programming 63 cache Bus Invalidate memory cache data data Other caches lose read permission
Invalidate
Art of Multiprocessor Programming 64 cache Bus Invalidate memory cache data data Other caches lose read permission This cache acquires write permission
Invalidate
Art of Multiprocessor Programming 65 cache Bus Invalidate memory cache data data Memory provides data only if not present in any cache, so no need to change it now (expensive)
Another Processor Asks for Data
Art of Multiprocessor Programming 66 cache Bus Another Processor Asks for Data memory cache data data Bus
Owner Responds
Art of Multiprocessor Programming 67 cache data Bus Owner Responds memory cache data data Bus Here it is!
End of the Day …
Art of Multiprocessor Programming 68 Bus End of the Day … memory cache data data Reading OK, no writing data data
Mutual Exclusion
Art of Multiprocessor Programming 69 Mutual Exclusion What do we want to optimize? Bus bandwidth used by spinning threads Release/Acquire latency Acquire latency for idle lock
Simple TASLock
Art of Multiprocessor Programming 70 Simple TASLock TAS invalidates cache lines Spinners Miss in cache Go to bus Thread wants to release lock delayed behind spinners
Test-and-test-and-set
Art of Multiprocessor Programming 71 Test-and-test-and-set Wait until lock “looks” free Spin on local cache No bus use while lock busy Problem: when lock is released Invalidation storm …
Local Spinning while Lock is Busy
Art of Multiprocessor Programming 72 Local Spinning while Lock is Busy Bus memory busy busy busy busy
On Release
Art of Multiprocessor Programming 73 Bus On Release memory free invalid invalid free
On Release
Art of Multiprocessor Programming 74 On Release Bus memory free invalid invalid free miss miss Everyone misses, rereads (1)
On Release
Art of Multiprocessor Programming 75 On Release Bus memory free invalid invalid free TAS(…) TAS(…) Everyone tries TAS (1)
Problems
Art of Multiprocessor Programming 76 Problems Everyone misses Reads satisfied sequentially Everyone does TAS Invalidates others’ caches Eventually quiesces after lock acquired How long does this take?
Measuring Quiescence Time
Measuring Quiescence Time Acquire lock Pause without using bus Use bus heavily Art of Multiprocessor Programming 77 1 P 2 P n P If pause > quiescence time, critical section duration independent of number of threads If pause < quiescence time, critical section duration slower with more threads
Quiescence Time
Art of Multiprocessor Programming 78 Quiescence Time Increses linearly with the number of processors for bus architecture time threads
Mystery Explained
Art of Multiprocessor Programming 79 Mystery Explained TAS lock TTAS lock Ideal time threads Better than TAS but still not as good as ideal
Solution: Introduce Delay
Art of Multiprocessor Programming 80 Solution: Introduce Delay spin lock time d d 1 r d 2 r If the lock looks free But I fail to get it There must be contention Better to back off than to collide again
Dynamic Example: Exponential Backoff
Art of Multiprocessor Programming 81 Dynamic Example: Exponential Backoff time d 2d 4d spin lock If I fail to get lock wait random duration before retry Each subsequent failure doubles expected wait
Exponential Backoff Lock
Art of Multiprocessor Programming 82 Exponential Backoff Lock public class Backoff implements lock { public void lock() { int delay = MIN_DELAY; while ( true ) { while (state.get()) {} if (!lock.getAndSet(true)) return ; sleep(random() % delay); if (delay < MAX_DELAY) delay = 2 * delay; }}}
Exponential Backoff Lock
Art of Multiprocessor Programming 83 Exponential Backoff Lock public class Backoff implements lock { public void lock() { int delay = MIN_DELAY; while (true) { while (state.get()) {} if (!lock.getAndSet(true)) return; sleep(random() % delay); if (delay < MAX_DELAY) delay = 2 * delay; }}} Fix minimum delay
Exponential Backoff Lock
Art of Multiprocessor Programming 84 Exponential Backoff Lock public class Backoff implements lock { public void lock() { int delay = MIN_DELAY; while (true) { while (state.get()) {} if (!lock.getAndSet(true)) return; sleep(random() % delay); if (delay < MAX_DELAY) delay = 2 * delay; }}} Wait until lock looks free
Exponential Backoff Lock
Art of Multiprocessor Programming 85 Exponential Backoff Lock public class Backoff implements lock { public void lock() { int delay = MIN_DELAY; while (true) { while (state.get()) {} if (!lock.getAndSet(true)) return ; sleep(random() % delay); if (delay < MAX_DELAY) delay = 2 * delay; }}} If we win, return
Exponential Backoff Lock
Art of Multiprocessor Programming 86 Exponential Backoff Lock public class Backoff implements lock { public void lock() { int delay = MIN_DELAY; while (true) { while (state.get()) {} if (!lock.getAndSet(true)) return; sleep(random() % delay); if (delay < MAX_DELAY) delay = 2 * delay; }}} Back off for random duration
Exponential Backoff Lock
Art of Multiprocessor Programming 87 Exponential Backoff Lock public class Backoff implements lock { public void lock() { int delay = MIN_DELAY; while (true) { while (state.get()) {} if (!lock.getAndSet(true)) return; sleep(random() % delay); if (delay < MAX_DELAY) delay = 2 * delay; }}} Double max delay, within reason
Spin-Waiting Overhead
Art of Multiprocessor Programming 88 Spin-Waiting Overhead TTAS Lock Backoff lock time threads
Backoff: Other Issues
Art of Multiprocessor Programming 89 Backoff: Other Issues Good Easy to implement Beats TTAS lock Bad Must choose parameters carefully Not portable across platforms
Idea
Art of Multiprocessor Programming 90 Idea Avoid useless invalidations By keeping a queue of threads Each thread Notifies next in line Without bothering the others
Anderson Queue Lock
Art of Multiprocessor Programming 91 Anderson Queue Lock flags next T F F F F F F F idle
Anderson Queue Lock
Art of Multiprocessor Programming 92 Anderson Queue Lock flags next T F F F F F F F acquiring getAndIncrement
Anderson Queue Lock
Art of Multiprocessor Programming 93 Anderson Queue Lock flags next T F F F F F F F acquiring getAndIncrement
Anderson Queue Lock
Art of Multiprocessor Programming 94 Anderson Queue Lock flags next T F F F F F F F acquired Mine!
Anderson Queue Lock
Art of Multiprocessor Programming 95 Anderson Queue Lock flags next T F F F F F F F acquired acquiring
Anderson Queue Lock
Art of Multiprocessor Programming 96 Anderson Queue Lock flags next T F F F F F F F acquired acquiring getAndIncrement
Anderson Queue Lock
Art of Multiprocessor Programming 97 Anderson Queue Lock flags next T F F F F F F F acquired acquiring getAndIncrement
Anderson Queue Lock
Art of Multiprocessor Programming 98 acquired Anderson Queue Lock flags next T F F F F F F F acquiring
Anderson Queue Lock
Art of Multiprocessor Programming 99 released Anderson Queue Lock flags next T T F F F F F F acquired
Anderson Queue Lock
Art of Multiprocessor Programming 100 released Anderson Queue Lock flags next T T F F F F F F acquired Yow!
Anderson Queue Lock
Art of Multiprocessor Programming 101 Anderson Queue Lock class ALock implements Lock { boolean[] flags={ true , false ,…, false }; AtomicInteger next = new AtomicInteger(0); ThreadLocal<Integer> mySlot;
Anderson Queue Lock
Art of Multiprocessor Programming 102 Anderson Queue Lock class ALock implements Lock { boolean[] flags={ true , false ,…, false }; AtomicInteger next = new AtomicInteger(0); ThreadLocal<Integer> mySlot; One flag per thread
Anderson Queue Lock
Art of Multiprocessor Programming 103 Anderson Queue Lock class ALock implements Lock { boolean[] flags={true,false,…,false}; AtomicInteger next = new AtomicInteger(0); ThreadLocal<Integer> mySlot; Next flag to use
Anderson Queue Lock
Art of Multiprocessor Programming 104 Anderson Queue Lock class ALock implements Lock { boolean[] flags={true,false,…,false}; AtomicInteger next = new AtomicInteger(0); ThreadLocal<Integer> mySlot; Thread-local variable
Anderson Queue Lock
Art of Multiprocessor Programming 105 Anderson Queue Lock public lock() { mySlot = next.getAndIncrement(); while (!flags[mySlot % n]) {}; flags[mySlot % n] = false ; } public unlock() { flags[(mySlot+1) % n] = true ; }
Anderson Queue Lock
Art of Multiprocessor Programming 106 Anderson Queue Lock public lock() { mySlot = next.getAndIncrement(); while (!flags[mySlot % n]) {}; flags[mySlot % n] = false; } public unlock() { flags[(mySlot+1) % n] = true; } Take next slot
Anderson Queue Lock
Art of Multiprocessor Programming 107 Anderson Queue Lock public lock() { mySlot = next.getAndIncrement(); while (!flags[mySlot % n]) {}; flags[mySlot % n] = false; } public unlock() { flags[(mySlot+1) % n] = true; } Spin until told to go
Anderson Queue Lock
Art of Multiprocessor Programming 108 Anderson Queue Lock public lock() { myslot = next.getAndIncrement(); while (!flags[myslot % n]) {}; flags[myslot % n] = false ; } public unlock() { flags[(myslot+1) % n] = true; } Prepare slot for re-use
Anderson Queue Lock
Art of Multiprocessor Programming 109 Anderson Queue Lock public lock() { mySlot = next.getAndIncrement(); while (!flags[mySlot % n]) {}; flags[mySlot % n] = false; } public unlock() { flags[(mySlot+1) % n] = true ; } Tell next thread to go
Local Spinning
Art of Multiprocessor Programming 110 released Local Spinning flags next T F F F F F F F acquired Spin on my bit Unfortunately many bits share cache line
False Sharing
111 released False Sharing flags next T F F F F F F F acquired Spin on my bit Line 1 Line 2 Spinning thread gets cache invalidation on account of store by threads it is not waiting for Result: contention Art of Multiprocessor Programming
The Solution: Padding
112 released The Solution: Padding flags next T / / / F / / / acquired Line 1 Line 2 Art of Multiprocessor Programming Spin on my line
Performance
Art of Multiprocessor Programming 113 Performance Shorter handover than backoff Curve is practically flat Scalable performance queue TTAS
Anderson Queue Lock
Art of Multiprocessor Programming 114 Anderson Queue Lock Good First truly scalable lock Simple, easy to implement Back to FIFO order (like Bakery)
Anderson Queue Lock
Art of Multiprocessor Programming 115 Anderson Queue Lock Bad Space hog… One bit per thread one cache line per thread What if unknown number of threads? What if small number of actual contenders?
CLH Lock
Art of Multiprocessor Programming 116 CLH Lock FIFO order Small, constant-size overhead per thread
Initially
Art of Multiprocessor Programming 117 Initially false tail idle
Initially
Art of Multiprocessor Programming 118 Initially false tail idle Queue tail
Initially
Art of Multiprocessor Programming 119 Initially false tail idle Lock is free
Initially
Art of Multiprocessor Programming 120 Initially false tail idle
Purple Wants the Lock
Art of Multiprocessor Programming 121 Purple Wants the Lock false tail acquiring
Purple Wants the Lock
Art of Multiprocessor Programming 122 Purple Wants the Lock false tail acquiring true
Purple Wants the Lock
Art of Multiprocessor Programming 123 Purple Wants the Lock false tail acquiring true Swap
Purple Has the Lock
Art of Multiprocessor Programming 124 Purple Has the Lock false tail acquired true
Red Wants the Lock
Art of Multiprocessor Programming 125 Red Wants the Lock false tail acquired acquiring true true
Red Wants the Lock
Art of Multiprocessor Programming 126 Red Wants the Lock false tail acquired acquiring true Swap true
Red Wants the Lock
Art of Multiprocessor Programming 127 Red Wants the Lock false tail acquired acquiring true true
Red Wants the Lock
Art of Multiprocessor Programming 128 Red Wants the Lock false tail acquired acquiring true true
Red Wants the Lock
Art of Multiprocessor Programming 129 Red Wants the Lock false tail acquired acquiring true true Implicit Linked list
Red Wants the Lock
Art of Multiprocessor Programming 130 Red Wants the Lock false tail acquired acquiring true true
Red Wants the Lock
Art of Multiprocessor Programming 131 Red Wants the Lock false tail acquired acquiring true true true Actually, it spins on cached copy
Purple Releases
Art of Multiprocessor Programming 132 Purple Releases false tail release acquiring false true false Bingo!
Purple Releases
Art of Multiprocessor Programming 133 Purple Releases tail released acquired true
Space Usage
Art of Multiprocessor Programming 134 Space Usage Let L = number of locks N = number of threads ALock O(LN) CLH lock O(L+N)
CLH Queue Lock
Art of Multiprocessor Programming 135 CLH Queue Lock class Qnode { AtomicBoolean locked = new AtomicBoolean( true ); }
CLH Queue Lock
Art of Multiprocessor Programming 136 CLH Queue Lock class Qnode { AtomicBoolean locked = new AtomicBoolean( true ); } Not released yet
CLH Queue Lock
Art of Multiprocessor Programming 137 CLH Queue Lock class CLHLock implements Lock { AtomicReference<Qnode> tail; ThreadLocal<Qnode> myNode = new Qnode(); public void lock() { Qnode pred = tail.getAndSet(myNode); while (pred.locked) {} }}
CLH Queue Lock
Art of Multiprocessor Programming 138 CLH Queue Lock class CLHLock implements Lock { AtomicReference<Qnode> tail; ThreadLocal<Qnode> myNode = new Qnode(); public void lock() { Qnode pred = tail.getAndSet(myNode); while (pred.locked) {} }} Queue tail
CLH Queue Lock
Art of Multiprocessor Programming 139 CLH Queue Lock class CLHLock implements Lock { AtomicReference<Qnode> tail; ThreadLocal<Qnode> myNode = new Qnode(); public void lock() { Qnode pred = tail.getAndSet(myNode); while (pred.locked) {} }} Thread-local Qnode
CLH Queue Lock
Art of Multiprocessor Programming 140 CLH Queue Lock class CLHLock implements Lock { AtomicReference<Qnode> tail; ThreadLocal<Qnode> myNode = new Qnode(); public void lock() { Qnode pred = tail.getAndSet(myNode); while (pred.locked) {} }} Swap in my node
CLH Queue Lock
Art of Multiprocessor Programming 141 CLH Queue Lock class CLHLock implements Lock { AtomicReference<Qnode> tail; ThreadLocal<Qnode> myNode = new Qnode(); public void lock() { Qnode pred = tail.getAndSet(myNode); while (pred.locked) {} }} Spin until predecessor releases lock
CLH Queue Lock
Art of Multiprocessor Programming 142 CLH Queue Lock Class CLHLock implements Lock { public void unlock() { myNode.locked.set( false ); myNode = pred; } }
CLH Queue Lock
Art of Multiprocessor Programming 143 CLH Queue Lock Class CLHLock implements Lock { public void unlock() { myNode.locked.set( false ); myNode = pred; } } Notify successor
CLH Queue Lock
Art of Multiprocessor Programming 144 CLH Queue Lock Class CLHLock implements Lock { public void unlock() { myNode.locked.set(false); myNode = pred; } } Recycle predecessor’s node
CLH Queue Lock
Art of Multiprocessor Programming 145 CLH Queue Lock Class CLHLock implements Lock { public void unlock() { myNode.locked.set(false); myNode = pred; } } (we don’t actually reuse myNode. Code in book shows how it’s done.)
CLH Lock
Art of Multiprocessor Programming 146 CLH Lock Good Lock release affects predecessor only Small, constant-sized space Bad Doesn’t work for uncached NUMA architectures
NUMA Architecturs
Art of Multiprocessor Programming 147 NUMA Architecturs Acronym: N on- U niform M emory A rchitecture Illusion: Flat shared memory Truth: No caches (sometimes) Some memory regions faster than others
NUMA Machines
Art of Multiprocessor Programming 148 NUMA Machines Spinning on local memory is fast
NUMA Machines
Art of Multiprocessor Programming 149 NUMA Machines Spinning on remote memory is slow
CLH Lock
Art of Multiprocessor Programming 150 CLH Lock Each thread spins on predecessor’s memory Could be far away …
MCS Lock
Art of Multiprocessor Programming 151 MCS Lock FIFO order Spin on local memory only Small, Constant-size overhead
Initially
Art of Multiprocessor Programming 152 Initially false false idle tail
Acquiring
Art of Multiprocessor Programming 153 Acquiring false false true acquiring (allocate Qnode) tail
Acquiring
Art of Multiprocessor Programming 154 Acquiring false tail false true acquired swap
Acquiring
Art of Multiprocessor Programming 155 Acquiring false tail false true acquired
Acquired
Art of Multiprocessor Programming 156 Acquired false tail false true acquired
Acquiring
Art of Multiprocessor Programming 157 Acquiring tail false acquired acquiring true swap
Acquiring
Art of Multiprocessor Programming 158 Acquiring tail acquired acquiring true false
Acquiring
Art of Multiprocessor Programming 159 Acquiring tail acquired acquiring true false
Acquiring
Art of Multiprocessor Programming 160 Acquiring tail acquired acquiring true false
Acquiring
Art of Multiprocessor Programming 161 Acquiring tail acquired acquiring true true false
Acquiring
Art of Multiprocessor Programming 162 Acquiring tail acquired acquiring true true Yes! false
MCS Queue Lock
Art of Multiprocessor Programming 163 MCS Queue Lock class Qnode { boolean locked = false ; qnode next = null ; }
MCS Queue Lock
Art of Multiprocessor Programming 164 MCS Queue Lock class MCSLock implements Lock { AtomicReference tail; public void lock() { Qnode qnode = new Qnode(); Qnode pred = tail.getAndSet(qnode); if (pred != null ) { qnode.locked = true ; pred.next = qnode; while (qnode.locked) {} }}}
MCS Queue Lock
Art of Multiprocessor Programming 165 MCS Queue Lock class MCSLock implements Lock { AtomicReference tail; public void lock() { Qnode qnode = new Qnode(); Qnode pred = tail.getAndSet(qnode); if (pred != null) { qnode.locked = true; pred.next = qnode; while (qnode.locked) {} }}} Make a QNode
MCS Queue Lock
Art of Multiprocessor Programming 166 MCS Queue Lock class MCSLock implements Lock { AtomicReference tail; public void lock() { Qnode qnode = new Qnode(); Qnode pred = tail.getAndSet(qnode); if (pred != null) { qnode.locked = true; pred.next = qnode; while (qnode.locked) {} }}} add my Node to the tail of queue
MCS Queue Lock
Art of Multiprocessor Programming 167 MCS Queue Lock class MCSLock implements Lock { AtomicReference tail; public void lock() { Qnode qnode = new Qnode(); Qnode pred = tail.getAndSet(qnode); if (pred != null ) { qnode.locked = true ; pred.next = qnode; while (qnode.locked) {} }}} Fix if queue was non-empty
MCS Queue Lock
Art of Multiprocessor Programming 168 MCS Queue Lock class MCSLock implements Lock { AtomicReference tail; public void lock() { Qnode qnode = new Qnode(); Qnode pred = tail.getAndSet(qnode); if (pred != null) { qnode.locked = true; pred.next = qnode; while (qnode.locked) {} }}} Wait until unlocked
MCS Queue Unlock
Art of Multiprocessor Programming 169 MCS Queue Unlock class MCSLock implements Lock { AtomicReference tail; public void unlock() { if (qnode.next == null ) { if (tail.CAS(qnode, null ) return ; while (qnode.next == null ) {} } qnode.next.locked = false ; }}
MCS Queue Lock
Art of Multiprocessor Programming 170 MCS Queue Lock class MCSLock implements Lock { AtomicReference tail; public void unlock() { if (qnode.next == null ) { if (tail.CAS(qnode, null) return; while (qnode.next == null) {} } qnode.next.locked = false; }} Missing successor?
MCS Queue Lock
Art of Multiprocessor Programming 171 MCS Queue Lock class MCSLock implements Lock { AtomicReference tail; public void unlock() { if (qnode.next == null) { if (tail.CAS(qnode, null ) return ; while (qnode.next == null) {} } qnode.next.locked = false; }} If really no successor, return
MCS Queue Lock
Art of Multiprocessor Programming 172 MCS Queue Lock class MCSLock implements Lock { AtomicReference tail; public void unlock() { if (qnode.next == null) { if (tail.CAS(qnode, null) return; while (qnode.next == null ) {} } qnode.next.locked = false; }} Otherwise wait for successor to catch up
MCS Queue Lock
Art of Multiprocessor Programming 173 MCS Queue Lock class MCSLock implements Lock { AtomicReference queue; public void unlock() { if (qnode.next == null) { if (tail.CAS(qnode, null) return; while (qnode.next == null) {} } qnode.next.locked = false ; }} Pass lock to successor
Purple Release
Art of Multiprocessor Programming 174 Purple Release false releasing swap false
Purple Release
Art of Multiprocessor Programming 175 Purple Release false releasing swap false By looking at the queue, I see another thread is active
Purple Release
Art of Multiprocessor Programming 176 Purple Release false releasing swap false I have to wait for that thread to finish By looking at the queue, I see another thread is active
Purple Release
Art of Multiprocessor Programming 177 Purple Release false releasing prepare to spin true
Purple Release
Art of Multiprocessor Programming 178 Purple Release false releasing spinning true
Purple Release
Art of Multiprocessor Programming 179 Purple Release false releasing spinning true false
Purple Release
Art of Multiprocessor Programming 180 Purple Release false releasing true Acquired lock false
Abortable Locks
Art of Multiprocessor Programming 181 Abortable Locks What if you want to give up waiting for a lock? For example Timeout Database transaction aborted by user
Back-off Lock
Art of Multiprocessor Programming 182 Back-off Lock Aborting is trivial Just return from lock() call Extra benefit: No cleaning up Wait-free Immediate return
Queue Locks
Art of Multiprocessor Programming 183 Queue Locks Can’t just quit Thread in line behind will starve Need a graceful way out
Queue Locks
Art of Multiprocessor Programming 184 Queue Locks spinning true spinning true true spinning
Queue Locks
Art of Multiprocessor Programming 185 Queue Locks spinning true spinning true false locked
Queue Locks
Art of Multiprocessor Programming 186 Queue Locks spinning true locked false
Queue Locks
Art of Multiprocessor Programming 187 Queue Locks locked false
Queue Locks
Art of Multiprocessor Programming 188 Queue Locks spinning true spinning true true spinning
Queue Locks
Art of Multiprocessor Programming 189 Queue Locks spinning true true true spinning
Queue Locks
Art of Multiprocessor Programming 190 Queue Locks spinning true true false locked
Queue Locks
Art of Multiprocessor Programming 191 Queue Locks spinning true false
Queue Locks
Art of Multiprocessor Programming 192 Queue Locks pwned true false
Abortable CLH Lock
Art of Multiprocessor Programming 193 Abortable CLH Lock When a thread gives up Removing node in a wait-free way is hard Idea: let successor deal with it.
Initially
Art of Multiprocessor Programming 194 Initially tail idle Pointer to predecessor (or null) A
Initially
Art of Multiprocessor Programming 195 Initially tail idle Distinguished available node means lock is free A
Acquiring
Art of Multiprocessor Programming 196 Acquiring tail acquiring A
Acquiring
Art of Multiprocessor Programming 197 Acquiring acquiring A Null predecessor means lock not released or aborted
Acquiring
Art of Multiprocessor Programming 198 Acquiring acquiring A Swap
Acquiring
Art of Multiprocessor Programming 199 Acquiring acquiring A
Acquired
Art of Multiprocessor Programming 200 Acquired locked A Reference to AVAILABLE means lock is free .
Normal Case
spinning spinning locked Art of Multiprocessor Programming 201 Normal Case Null means lock is not free & request not aborted
One Thread Aborts
Art of Multiprocessor Programming 202 One Thread Aborts spinning Timed out locked
Successor Notices
Art of Multiprocessor Programming 203 Successor Notices spinning Timed out locked Non-Null means predecessor aborted
Recycle Predecessor’s Node
Art of Multiprocessor Programming 204 Recycle Predecessor’s Node spinning locked
Spin on Earlier Node
Art of Multiprocessor Programming 205 Spin on Earlier Node spinning locked
Spin on Earlier Node
Art of Multiprocessor Programming 206 Spin on Earlier Node spinning released A The lock is now mine
Time-out Lock
Art of Multiprocessor Programming 207 Time-out Lock public class TOLock implements Lock { static Qnode AVAILABLE = new Qnode(); AtomicReference<Qnode> tail; ThreadLocal<Qnode> myNode;
Time-out Lock
Art of Multiprocessor Programming 208 Time-out Lock public class TOLock implements Lock { static Qnode AVAILABLE = new Qnode(); AtomicReference<Qnode> tail; ThreadLocal<Qnode> myNode; AVAILABLE node signifies free lock
Time-out Lock
Art of Multiprocessor Programming 209 Time-out Lock public class TOLock implements Lock { static Qnode AVAILABLE = new Qnode(); AtomicReference<Qnode> tail; ThreadLocal<Qnode> myNode; Tail of the queue
Time-out Lock
Art of Multiprocessor Programming 210 Time-out Lock public class TOLock implements Lock { static Qnode AVAILABLE = new Qnode(); AtomicReference<Qnode> tail; ThreadLocal<Qnode> myNode; Remember my node …
Time-out Lock
Art of Multiprocessor Programming 211 Time-out Lock public boolean lock(long timeout) { Qnode qnode = new Qnode(); myNode.set(qnode); qnode.prev = null; Qnode myPred = tail.getAndSet(qnode); if (myPred== null || myPred.prev == AVAILABLE) { return true; }
Time-out Lock
Art of Multiprocessor Programming 212 Time-out Lock public boolean lock(long timeout) { Qnode qnode = new Qnode(); myNode.set(qnode); qnode.prev = null; Qnode myPred = tail.getAndSet(qnode); if (myPred == null || myPred.prev == AVAILABLE) { return true; } Create & initialize node
Time-out Lock
Art of Multiprocessor Programming 213 Time-out Lock public boolean lock(long timeout) { Qnode qnode = new Qnode(); myNode.set(qnode); qnode.prev = null; Qnode myPred = tail.getAndSet(qnode); if (myPred == null || myPred.prev == AVAILABLE) { return true; } Swap with tail
Time-out Lock
Art of Multiprocessor Programming 214 Time-out Lock public boolean lock(long timeout) { Qnode qnode = new Qnode(); myNode.set(qnode); qnode.prev = null; Qnode myPred = tail.getAndSet(qnode); if (myPred == null || myPred.prev == AVAILABLE) { return true; } ... If predecessor absent or released, we are done
Time-out Lock
Art of Multiprocessor Programming 215 Time-out Lock long start = now(); while (now()- start < timeout) { Qnode predPred = myPred.prev; if (predPred == AVAILABLE) { return true; } else if (predPred != null ) { myPred = predPred; } } spinning spinning locked
Time-out Lock
Art of Multiprocessor Programming 216 Time-out Lock long start = now(); while (now()- start < timeout) { Qnode predPred = myPred.prev; if (predPred == AVAILABLE) { return true; } else if (predPred != null) { myPred = predPred; } } Keep trying for a while …
Time-out Lock
Art of Multiprocessor Programming 217 Time-out Lock long start = now(); while (now()- start < timeout) { Qnode predPred = myPred.prev; if (predPred == AVAILABLE) { return true; } else if (predPred != null) { myPred = predPred; } } Spin on predecessor’s prev field
Time-out Lock
Art of Multiprocessor Programming 218 Time-out Lock long start = now(); while (now()- start < timeout) { Qnode predPred = myPred.prev; if (predPred == AVAILABLE) { return true; } else if (predPred != null) { myPred = predPred; } } Predecessor released lock
Time-out Lock
Art of Multiprocessor Programming 219 Time-out Lock long start = now(); while (now()- start < timeout) { Qnode predPred = myPred.prev; if (predPred == AVAILABLE) { return true; } else if (predPred != null ) { myPred = predPred; } } Predecessor aborted, advance one
Time-out Lock
Art of Multiprocessor Programming 220 Time-out Lock if (!tail.compareAndSet(qnode, myPred)) qnode.prev = myPred; return false; } } What do I do when I time out?
Time-out Lock
Art of Multiprocessor Programming 221 Time-out Lock if (!tail.compareAndSet(qnode, myPred)) qnode.prev = myPred; return false; } } Do I have a successor ? If CAS fails, I do. Tell it about myPred
Time-out Lock
Art of Multiprocessor Programming 222 Time-out Lock if (!tail.compareAndSet(qnode, myPred)) qnode.prev = myPred; return false; } } If CAS succeeds: no successor, simply return false
Time-Out Unlock
Art of Multiprocessor Programming 223 Time-Out Unlock public void unlock() { Qnode qnode = myNode.get(); if (!tail.compareAndSet(qnode, null )) qnode.prev = AVAILABLE; }
Time-out Unlock
Art of Multiprocessor Programming 224 public void unlock() { Qnode qnode = myNode.get(); if (!tail.compareAndSet(qnode, null )) qnode.prev = AVAILABLE; } Time-out Unlock If CAS failed : successor exists, notify it can enter
Timing-out Lock
Art of Multiprocessor Programming 225 public void unlock() { Qnode qnode = myNode.get(); if (!tail.compareAndSet(qnode, null )) qnode.prev = AVAILABLE; } Timing-out Lock CAS successful: set tail to null, no clean up since no successor waiting
One Lock To Rule Them All?
Art of Multiprocessor Programming 226 One Lock To Rule Them All? TTAS+Backoff, CLH, MCS, ToLock… Each better than others in some way There is no one solution Lock we pick really depends on: the application the hardware which properties are important
Art of Multiprocessor Programming 227   creativecommons.org/licenses/by-sa/2.5/ Creative Commons Attribution-ShareAlike 2.5 License . You are free: to Share — to copy, distribute and transmit the work to Remix — to adapt the work Under the following conditions : Attribution . You must attribute the work to “The Art of Multiprocessor Programming” (but not in any way that suggests that the authors endorse you or your use of the work). Share Alike . If you alter, transform, or build upon this work, you may distribute the resulting work only under the same, similar or a compatible license. For any reuse or distribution, you must make clear to others the license terms of this work. The best way to do this is with a link to http://creativecommons.org/licenses/by-sa/3.0/. Any of the above conditions can be waived if you get permission from the copyright holder. Nothing in this license impairs or restricts the author's moral rights.