Transactional Memory
Transactional Memory Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: A A A A A
Our Vision for the Future
2 Our Vision for the Future In this course, we covered …. Best practices … New and clever ideas … And common-sense observations. Art of Multiprocessor Programming
Our Vision for the Future
3 Our Vision for the Future In this course, we covered …. Best practices … New and clever ideas … And common-sense observations. Nevertheless … Concurrent programming is still too hard … Here we explore why this is …. And what we can do about it. Art of Multiprocessor Programming
Locking
4 Locking Art of Multiprocessor Programming
Coarse-Grained Locking
5 Coarse-Grained Locking Easily made correct … But not scalable. Art of Multiprocessor Programming
Fine-Grained Locking
6 Fine-Grained Locking Can be tricky … Art of Multiprocessor Programming
Locks are not Robust
7 Locks are not Robust If a thread holding a lock is delayed … No one else can make progress Art of Multiprocessor Programming
Locking Relies on Conventions
Locking Relies on Conventions Relation between Lock bit and object bits Exists only in programmer’s mind /* * When a locked buffer is visible to the I/O layer * BH_Launder is set. This means before unlocking * we must clear BH_Launder,mb() on alpha and then * clear BH_Lock, so no reader can see BH_Launder set * on an unlocked buffer and then risk to deadlock. */ Actual comment from Linux Kernel (hat tip: Bradley Kuszmaul) Art of Multiprocessor Programming
Simple Problems are hard
9 Simple Problems are hard enq(x) enq(y) double-ended queue No interference if ends “far apart” Interference OK if queue is small Clean solution is publishable result: [Michael & Scott PODC 97] Art of Multiprocessor Programming
Locks Not Composable
Art of Multiprocessor Programming 10 Locks Not Composable Transfer item from one queue to another Must be atomic : No duplicate or missing items
Locks Not Composable
Art of Multiprocessor Programming 11 Locks Not Composable Lock source Lock target Unlock source & target
Locks Not Composable
Art of Multiprocessor Programming 12 Locks Not Composable Lock source Lock target Unlock source & target Methods cannot provide internal synchronization Objects must expose locking protocols to clients Clients must devise and follow protocols Abstraction broken!
Monitor Wait and Signal
13 Monitor Wait and Signal zzz Empty buffer Yes! Art of Multiprocessor Programming If buffer is empty, wait for item to show up
Wait and Signal do not Compose
14 Wait and Signal do not Compose empty empty zzz… Art of Multiprocessor Programming Wait for either?
The Transactional Manifesto
Art of Multiprocessor Programming 15 15 The Transactional Manifesto Current practice inadequate to meet the multicore challenge Research Agenda Replace locking with a transactional API Design languages or libraries Implement efficient run-times
Transactions
Art of Multiprocessor Programming 16 16 Transactions Block of code …. Atomic : appears to happen instantaneously Serializable : all appear to happen in one-at-a-time order Commit : takes effect (atomically) Abort : has no effect (typically restarted)
Atomic Blocks
Art of Multiprocessor Programming 17 17 atomic { x.remove(3); y.add(3); } atomic { y = null ; } Atomic Blocks
Atomic Blocks
Art of Multiprocessor Programming 18 18 atomic { x.remove(3); y.add(3); } atomic { y = null ; } Atomic Blocks No data race
A Double-Ended Queue
Art of Multiprocessor Programming 19 19 public void LeftEnq(item x) { Qnode q = new Qnode(x); q.left = this .left; this .left.right = q; this .left = q; } A Double-Ended Queue Write sequential Code
A Double-Ended Queue
Art of Multiprocessor Programming 20 20 public void LeftEnq(item x) atomic { Qnode q = new Qnode(x); q.left = this .left; this .left.right = q; this .left = q; } } A Double-Ended Queue
A Double-Ended Queue
Art of Multiprocessor Programming 21 21 public void LeftEnq(item x) { atomic { Qnode q = new Qnode(x); q.left = this.left; this.left.right = q; this.left = q; } } A Double-Ended Queue Enclose in atomic block
Warning
Art of Multiprocessor Programming 22 22 Warning Not always this simple Conditional waits Enhanced concurrency Complex patterns But often it is…
Composition?
Art of Multiprocessor Programming 23 Composition?
Composition?
Art of Multiprocessor Programming 24 Composition? public void Transfer(Queue<T> q1, q2) { atomic { T x = q1.deq(); q2.enq(x); } } Trivial or what?
Conditional Waiting
Art of Multiprocessor Programming 25 25 public T LeftDeq() { atomic { if ( this .left == null ) retry ; } } Conditional Waiting Roll back transaction and restart when something changes
Composable Conditional Waiting
Art of Multiprocessor Programming 26 26 Composable Conditional Waiting atomic { x = q1.deq(); } orElse { x = q2.deq(); } Run 1 st method. If it retries … Run 2 nd method. If it retries … Entire statement retries
Hardware Transactional Memory
Art of Multiprocessor Programming 27 27 Hardware Transactional Memory Exploit Cache coherence Already almost does it Invalidation Consistency checking Speculative execution Branch prediction = optimistic synch!
HW Transactional Memory
Art of Multiprocessor Programming 28 28 HW Transactional Memory Interconnect caches memory read active T
Transactional Memory
Art of Multiprocessor Programming 29 29 Transactional Memory read active T T active caches memory
Transactional Memory
Art of Multiprocessor Programming 30 30 Transactional Memory active T T active committed caches memory
Transactional Memory
Art of Multiprocessor Programming 31 31 Transactional Memory write active committed T D caches memory
Rewind
Art of Multiprocessor Programming 32 32 Rewind active T T active write aborted D caches memory
Transaction Commit
Art of Multiprocessor Programming 33 33 Transaction Commit At commit point If no cache conflicts, we win. Mark transactional entries Read-only: valid Modified: dirty (eventually written back) That’s all, folks! Except for a few details …
Not all Skittles and Beer
Art of Multiprocessor Programming 34 34 Not all Skittles and Beer Limits to Transactional cache size Scheduling quantum Transaction cannot commit if it is Too big Too slow Actual limits platform-dependent
HTM Strengths & Weaknesses
HTM Strengths & Weaknesses Ideal for lock-free data structures
HTM Strengths & Weaknesses
HTM Strengths & Weaknesses Ideal for lock-free data structures Practical proposals have limits on Transaction size and length Bounded HW resources Guarantees vs best-effort
HTM Strengths & Weaknesses
HTM Strengths & Weaknesses Ideal for lock-free data structures Practical proposals have limits on Transaction size and length Bounded HW resources Guarantees vs best-effort On fail Diagnostics essential Retry in software?
Composition
Composition Locks don’t compose, transactions do. Composition necessary for Software Engineering. But practical HTM doesn’t really support composition! Why we need STM
Simple Lock-Based STM
Art of Multiprocessor Programming 39 Simple Lock-Based STM STMs come in different forms Lock-based Lock-free Here : a simple lock-based STM
Synchronization
Art of Multiprocessor Programming 40 Synchronization Transaction keeps Read set : locations & values read Write set : locations & values to be written Deferred update Changes installed at commit Lazy conflict detection Conflicts detected at commit
STM: Transactional Locking
Art of Multiprocessor Programming 41 41 STM: Transactional Locking Map Application Memory V# V# V# Array of version #s & locks
Reading an Object
Art of Multiprocessor Programming 42 42 Reading an Object Mem Locks V# V# V# V# V# Add version numbers & values to read set
To Write an Object
Art of Multiprocessor Programming 43 43 To Write an Object Mem Locks V# V# V# V# V# Add version numbers & new values to write set
To Commit
Art of Multiprocessor Programming 44 44 To Commit Mem Locks V# V# V# V# V# X Y V#+1 V#+1 Acquire write locks Check version numbers unchanged Install new values Increment version numbers Unlock.
Problem: Internal Inconsistency
Problem: Internal Inconsistency A Zombie is an active transaction destined to abort. If Zombies see inconsistent states bad things can happen
Internal Consistency
Art of Multiprocessor Programming 46 Internal Consistency x y 4 2 8 4 Invariant: x = 2y Transaction A: reads x = 4 Transaction B : writes 8 to x, 16 to y, aborts A ) Transaction A: (zombie) reads y = 4 computes 1/(x-y) Divide by zero FAIL!
Solution: The Global Clock
Art of Multiprocessor Programming 47 Solution: The Global Clock Have one shared global clock Incremented by (small subset of) writing transactions Read by all transactions Used to validate that state worked on is always consistent
Read-Only Transactions
100 Art of Multiprocessor Programming 48 48 Read-Only Transactions Mem Locks 12 32 56 19 17 100 Shared Version Clock Private Read Version (RV) Copy version clock to local read version clock
Read-Only Transactions
100 Art of Multiprocessor Programming 49 49 Read-Only Transactions Mem Locks 12 32 56 19 17 100 Shared Version Clock Private Read Version (RV) Copy version clock to local read version clock Read lock, version #, and memory
Read-Only Transactions
100 Art of Multiprocessor Programming 50 50 Read-Only Transactions Mem Locks 12 32 56 19 17 100 Shared Version Clock Private Read Version (RV) Copy version clock to local read version clock Read lock, version #, and memory On Commit: check unlocked & version # unchanged
Read-Only Transactions
Art of Multiprocessor Programming 51 51 Read-Only Transactions Mem Locks 12 32 56 19 17 100 Shared Version Clock Private Read Version (RV) Copy version clock to local read version clock Read lock, version #, and memory On Commit: check unlocked & version # unchanged Check that version #s less than local read clock 100
Read-Only Transactions
Art of Multiprocessor Programming 52 52 Read-Only Transactions Mem Locks 12 32 56 19 17 100 Shared Version Clock Private Read Version (RV) Copy version clock to local read version clock Read lock, version #, and memory On Commit: check unlocked & version # unchanged Check that version #s less than local read clock 100 We have taken a snapshot without keeping an explicit read set!
Regular Transactions
100 Art of Multiprocessor Programming 53 53 Regular Transactions Mem Locks 12 32 56 19 17 100 Shared Version Clock Private Read Version (RV) Copy version clock to local read version clock
Regular Transactions
100 Art of Multiprocessor Programming 54 54 Regular Transactions Mem Locks 12 32 56 19 17 100 Shared Version Clock Private Read Version (RV) Copy version clock to local read version clock On read/write, check: Unlocked & version # < RV Add to R/W set
On Commit
Art of Multiprocessor Programming 55 55 On Commit Mem Locks 100 Shared Version Clock 100 12 32 56 19 17 Private Read Version (RV) Acquire write locks
On Commit
Art of Multiprocessor Programming 56 56 On Commit Mem Locks 100 Shared Version Clock 100 101 12 32 56 19 17 Private Read Version (RV) Acquire write locks Increment Version Clock
On Commit
Art of Multiprocessor Programming 57 57 On Commit Mem Locks 100 Shared Version Clock 100 101 12 32 56 19 17 Private Read Version (RV) Acquire write locks Increment Version Clock Check version numbers ≤ RV
On Commit
Art of Multiprocessor Programming 58 58 On Commit Mem Locks 100 Shared Version Clock 100 101 12 32 56 19 17 Private Read Version (RV) Acquire write locks Increment Version Clock Check version numbers ≤ RV Update memory x y
On Commit
Art of Multiprocessor Programming 59 59 On Commit Mem Locks 100 Shared Version Clock 100 101 12 32 56 19 17 Private Read Version (RV) Acquire write locks Increment Version Clock Check version numbers ≤ RV Update memory Update write version #s x y 100 100
TM Design Issues
Art of Multiprocessor Programming 60 TM Design Issues Implementation choices Language design issues Semantic issues
Granularity
Art of Multiprocessor Programming 61 Granularity Object managed languages, Java, C#, … Easy to control interactions between transactional & non-trans threads Word C, C++, … Hard to control interactions between transactional & non-trans threads
Direct/Deferred Update
Art of Multiprocessor Programming 62 Direct/Deferred Update Deferred modify private copies & install on commit Commit requires work Consistency easier Direct Modify in place, roll back on abort Makes commit efficient Consistency harder
Conflict Detection
Art of Multiprocessor Programming 63 Conflict Detection Eager Detect before conflict arises “Contention manager” module resolves Lazy Detect on commit/abort Mixed Eager write/write, lazy read/write …
Conflict Detection
Art of Multiprocessor Programming 64 Conflict Detection Eager detection may abort transactions that could have committed. Lazy detection discards more computation.
Contention Management & Scheduling
Art of Multiprocessor Programming 65 Contention Management & Scheduling How to resolve conflicts ? Who moves forward and who rolls back ? Lots of empirical work but formal work in infancy
Contention Manager Strategies
Art of Multiprocessor Programming 66 Contention Manager Strategies Exponential backoff Priority to Oldest? Most work? Non-waiting? None Dominates But needed anyway Judgment of Solomon
I/O & System Calls?
Art of Multiprocessor Programming 67 I/O & System Calls? Some I/O revocable Provide transaction- safe libraries Undoable file system/DB calls Some not Opening cash drawer Firing missile
I/O & System Calls
Art of Multiprocessor Programming 68 I/O & System Calls One solution: make transaction irrevocable If transaction tries I/O, switch to irrevocable mode. There can be only one … Requires serial execution No explicit aborts In irrevocable transactions
Exceptions
Art of Multiprocessor Programming 69 Exceptions int i = 0; try { atomic { i++; node = new Node(); } } catch (Exception e) { print(i); }
Exceptions
Art of Multiprocessor Programming 70 Exceptions int i = 0; try { atomic { i++; node = new Node(); } } catch (Exception e) { print(i); } Throws OutOfMemoryException!
Exceptions
Art of Multiprocessor Programming 71 Exceptions int i = 0; try { atomic { i++; node = new Node(); } } catch (Exception e) { print(i); } Throws OutOfMemoryException! What is printed?
Unhandled Exceptions
Art of Multiprocessor Programming 72 Unhandled Exceptions Aborts transaction Preserves invariants Safer Commits transaction Like locking semantics What if exception object refers to values modified in transaction?
Nested Transactions
Art of Multiprocessor Programming 73 Nested Transactions atomic void foo() { bar(); } atomic void bar() { }
Nested Transactions
Art of Multiprocessor Programming 74 Nested Transactions Needed for modularity Who knew that cosine() contained a transaction? Flat nesting If child aborts, so does parent First-class nesting If child aborts, partial rollback of child only
Remember 1993?
Remember 1993?
Citation Count
Citation Count
TM Today
TM Today 93,300
Second Opinion
2,210,000 Second Opinion
Hatin’ on TM
Hatin’ on TM STM is too inefficient
Hatin’ on TM
Hatin’ on TM Requires radical change in programming style
Hatin’ on TM
Hatin’ on TM Erlang-style shared nothing only true path to salvation
Hatin’ on TM
Hatin’ on TM There is nothing wrong with what we do today.
Gartner Hype Cycle
Gartner Hype Cycle Hat tip: Jeremy Kemp
I, for one, Welcome our new Multicore Overlords …
Art of Multiprocessor Programming 84 Multicore forces us to rethink almost everything I, for one, Welcome our new Multicore Overlords …
I, for one, Welcome our new Multicore Overlords …
Art of Multiprocessor Programming 85 Multicore forces us to rethink almost everything Standard approaches too complex I, for one, Welcome our new Multicore Overlords …
I, for one, Welcome our new Multicore Overlords …
Art of Multiprocessor Programming 86 Multicore forces us to rethink almost everything Standard approaches won’t scale Transactions might make life simpler… I, for one, Welcome our new Multicore Overlords …
I, for one, Welcome our new Multicore Overlords …
Art of Multiprocessor Programming 87 Multicore forces us to rethink almost everything Standard approaches won’t scale Transactions might … Multicore programming Plenty more to do… Maybe it will be you… I, for one, Welcome our new Multicore Overlords …
Thanks ! תודה
Thanks ! ‭הדות‬ Art of Multiprocessor Programming 88