Documentation/RCU/whatisRCU.rst

   1 .. _whatisrcu_doc:
   2
   3 What is RCU?  --  "Read, Copy, Update"
   4 ======================================
   5
   6 Please note that the "What is RCU?" LWN series is an excellent place
   7 to start learning about RCU:
   8
   9 | 1.    What is RCU, Fundamentally?  http://lwn.net/Articles/262464/
  10 | 2.    What is RCU? Part 2: Usage   http://lwn.net/Articles/263130/
  11 | 3.    RCU part 3: the RCU API      http://lwn.net/Articles/264090/
  12 | 4.    The RCU API, 2010 Edition    http://lwn.net/Articles/418853/
  13 |       2010 Big API Table           http://lwn.net/Articles/419086/
  14 | 5.    The RCU API, 2014 Edition    http://lwn.net/Articles/609904/
  15 |       2014 Big API Table           http://lwn.net/Articles/609973/
  16
  17
  18 What is RCU?
  19
  20 RCU is a synchronization mechanism that was added to the Linux kernel
  21 during the 2.5 development effort that is optimized for read-mostly
  22 situations.  Although RCU is actually quite simple once you understand it,
  23 getting there can sometimes be a challenge.  Part of the problem is that
  24 most of the past descriptions of RCU have been written with the mistaken
  25 assumption that there is "one true way" to describe RCU.  Instead,
  26 the experience has been that different people must take different paths
  27 to arrive at an understanding of RCU.  This document provides several
  28 different paths, as follows:
  29
  30 :ref:`1.        RCU OVERVIEW <1_whatisRCU>`
  31
  32 :ref:`2.        WHAT IS RCU'S CORE API? <2_whatisRCU>`
  33
  34 :ref:`3.        WHAT ARE SOME EXAMPLE USES OF CORE RCU API? <3_whatisRCU>`
  35
  36 :ref:`4.        WHAT IF MY UPDATING THREAD CANNOT BLOCK? <4_whatisRCU>`
  37
  38 :ref:`5.        WHAT ARE SOME SIMPLE IMPLEMENTATIONS OF RCU? <5_whatisRCU>`
  39
  40 :ref:`6.        ANALOGY WITH READER-WRITER LOCKING <6_whatisRCU>`
  41
  42 :ref:`7.        ANALOGY WITH REFERENCE COUNTING <7_whatisRCU>`
  43
  44 :ref:`8.        FULL LIST OF RCU APIs <8_whatisRCU>`
  45
  46 :ref:`9.        ANSWERS TO QUICK QUIZZES <9_whatisRCU>`
  47
  48 People who prefer starting with a conceptual overview should focus on
  49 Section 1, though most readers will profit by reading this section at
  50 some point.  People who prefer to start with an API that they can then
  51 experiment with should focus on Section 2.  People who prefer to start
  52 with example uses should focus on Sections 3 and 4.  People who need to
  53 understand the RCU implementation should focus on Section 5, then dive
  54 into the kernel source code.  People who reason best by analogy should
  55 focus on Section 6.  Section 7 serves as an index to the docbook API
  56 documentation, and Section 8 is the traditional answer key.
  57
  58 So, start with the section that makes the most sense to you and your
  59 preferred method of learning.  If you need to know everything about
  60 everything, feel free to read the whole thing -- but if you are really
  61 that type of person, you have perused the source code and will therefore
  62 never need this document anyway.  ;-)
  63
  64 .. _1_whatisRCU:
  65
  66 1.  RCU OVERVIEW
  67 ----------------
  68
  69 The basic idea behind RCU is to split updates into "removal" and
  70 "reclamation" phases.  The removal phase removes references to data items
  71 within a data structure (possibly by replacing them with references to
  72 new versions of these data items), and can run concurrently with readers.
  73 The reason that it is safe to run the removal phase concurrently with
  74 readers is the semantics of modern CPUs guarantee that readers will see
  75 either the old or the new version of the data structure rather than a
  76 partially updated reference.  The reclamation phase does the work of reclaiming
  77 (e.g., freeing) the data items removed from the data structure during the
  78 removal phase.  Because reclaiming data items can disrupt any readers
  79 concurrently referencing those data items, the reclamation phase must
  80 not start until readers no longer hold references to those data items.
  81
  82 Splitting the update into removal and reclamation phases permits the
  83 updater to perform the removal phase immediately, and to defer the
  84 reclamation phase until all readers active during the removal phase have
  85 completed, either by blocking until they finish or by registering a
  86 callback that is invoked after they finish.  Only readers that are active
  87 during the removal phase need be considered, because any reader starting
  88 after the removal phase will be unable to gain a reference to the removed
  89 data items, and therefore cannot be disrupted by the reclamation phase.
  90
  91 So the typical RCU update sequence goes something like the following:
  92
  93 a.      Remove pointers to a data structure, so that subsequent
  94         readers cannot gain a reference to it.
  95
  96 b.      Wait for all previous readers to complete their RCU read-side
  97         critical sections.
  98
  99 c.      At this point, there cannot be any readers who hold references
 100         to the data structure, so it now may safely be reclaimed
 101         (e.g., kfree()d).
 102
 103 Step (b) above is the key idea underlying RCU's deferred destruction.
 104 The ability to wait until all readers are done allows RCU readers to
 105 use much lighter-weight synchronization, in some cases, absolutely no
 106 synchronization at all.  In contrast, in more conventional lock-based
 107 schemes, readers must use heavy-weight synchronization in order to
 108 prevent an updater from deleting the data structure out from under them.
 109 This is because lock-based updaters typically update data items in place,
 110 and must therefore exclude readers.  In contrast, RCU-based updaters
 111 typically take advantage of the fact that writes to single aligned
 112 pointers are atomic on modern CPUs, allowing atomic insertion, removal,
 113 and replacement of data items in a linked structure without disrupting
 114 readers.  Concurrent RCU readers can then continue accessing the old
 115 versions, and can dispense with the atomic operations, memory barriers,
 116 and communications cache misses that are so expensive on present-day
 117 SMP computer systems, even in absence of lock contention.
 118
 119 In the three-step procedure shown above, the updater is performing both
 120 the removal and the reclamation step, but it is often helpful for an
 121 entirely different thread to do the reclamation, as is in fact the case
 122 in the Linux kernel's directory-entry cache (dcache).  Even if the same
 123 thread performs both the update step (step (a) above) and the reclamation
 124 step (step (c) above), it is often helpful to think of them separately.
 125 For example, RCU readers and updaters need not communicate at all,
 126 but RCU provides implicit low-overhead communication between readers
 127 and reclaimers, namely, in step (b) above.
 128
 129 So how the heck can a reclaimer tell when a reader is done, given
 130 that readers are not doing any sort of synchronization operations???
 131 Read on to learn about how RCU's API makes this easy.
 132
 133 .. _2_whatisRCU:
 134
 135 2.  WHAT IS RCU'S CORE API?
 136 ---------------------------
 137
 138 The core RCU API is quite small:
 139
 140 a.      rcu_read_lock()
 141 b.      rcu_read_unlock()
 142 c.      synchronize_rcu() / call_rcu()
 143 d.      rcu_assign_pointer()
 144 e.      rcu_dereference()
 145
 146 There are many other members of the RCU API, but the rest can be
 147 expressed in terms of these five, though most implementations instead
 148 express synchronize_rcu() in terms of the call_rcu() callback API.
 149
 150 The five core RCU APIs are described below, the other 18 will be enumerated
 151 later.  See the kernel docbook documentation for more info, or look directly
 152 at the function header comments.
 153
 154 rcu_read_lock()
 155 ^^^^^^^^^^^^^^^
 156         void rcu_read_lock(void);
 157
 158         Used by a reader to inform the reclaimer that the reader is
 159         entering an RCU read-side critical section.  It is illegal
 160         to block while in an RCU read-side critical section, though
 161         kernels built with CONFIG_PREEMPT_RCU can preempt RCU
 162         read-side critical sections.  Any RCU-protected data structure
 163         accessed during an RCU read-side critical section is guaranteed to
 164         remain unreclaimed for the full duration of that critical section.
 165         Reference counts may be used in conjunction with RCU to maintain
 166         longer-term references to data structures.
 167
 168 rcu_read_unlock()
 169 ^^^^^^^^^^^^^^^^^
 170         void rcu_read_unlock(void);
 171
 172         Used by a reader to inform the reclaimer that the reader is
 173         exiting an RCU read-side critical section.  Note that RCU
 174         read-side critical sections may be nested and/or overlapping.
 175
 176 synchronize_rcu()
 177 ^^^^^^^^^^^^^^^^^
 178         void synchronize_rcu(void);
 179
 180         Marks the end of updater code and the beginning of reclaimer
 181         code.  It does this by blocking until all pre-existing RCU
 182         read-side critical sections on all CPUs have completed.
 183         Note that synchronize_rcu() will **not** necessarily wait for
 184         any subsequent RCU read-side critical sections to complete.
 185         For example, consider the following sequence of events::
 186
 187                  CPU 0                  CPU 1                 CPU 2
 188              ----------------- ------------------------- ---------------
 189          1.  rcu_read_lock()
 190          2.                    enters synchronize_rcu()
 191          3.                                               rcu_read_lock()
 192          4.  rcu_read_unlock()
 193          5.                     exits synchronize_rcu()
 194          6.                                              rcu_read_unlock()
 195
 196         To reiterate, synchronize_rcu() waits only for ongoing RCU
 197         read-side critical sections to complete, not necessarily for
 198         any that begin after synchronize_rcu() is invoked.
 199
 200         Of course, synchronize_rcu() does not necessarily return
 201         **immediately** after the last pre-existing RCU read-side critical
 202         section completes.  For one thing, there might well be scheduling
 203         delays.  For another thing, many RCU implementations process
 204         requests in batches in order to improve efficiencies, which can
 205         further delay synchronize_rcu().
 206
 207         Since synchronize_rcu() is the API that must figure out when
 208         readers are done, its implementation is key to RCU.  For RCU
 209         to be useful in all but the most read-intensive situations,
 210         synchronize_rcu()'s overhead must also be quite small.
 211
 212         The call_rcu() API is a callback form of synchronize_rcu(),
 213         and is described in more detail in a later section.  Instead of
 214         blocking, it registers a function and argument which are invoked
 215         after all ongoing RCU read-side critical sections have completed.
 216         This callback variant is particularly useful in situations where
 217         it is illegal to block or where update-side performance is
 218         critically important.
 219
 220         However, the call_rcu() API should not be used lightly, as use
 221         of the synchronize_rcu() API generally results in simpler code.
 222         In addition, the synchronize_rcu() API has the nice property
 223         of automatically limiting update rate should grace periods
 224         be delayed.  This property results in system resilience in face
 225         of denial-of-service attacks.  Code using call_rcu() should limit
 226         update rate in order to gain this same sort of resilience.  See
 227         checklist.rst for some approaches to limiting the update rate.
 228
 229 rcu_assign_pointer()
 230 ^^^^^^^^^^^^^^^^^^^^
 231         void rcu_assign_pointer(p, typeof(p) v);
 232
 233         Yes, rcu_assign_pointer() **is** implemented as a macro, though it
 234         would be cool to be able to declare a function in this manner.
 235         (Compiler experts will no doubt disagree.)
 236
 237         The updater uses this function to assign a new value to an
 238         RCU-protected pointer, in order to safely communicate the change
 239         in value from the updater to the reader.  This macro does not
 240         evaluate to an rvalue, but it does execute any memory-barrier
 241         instructions required for a given CPU architecture.
 242
 243         Perhaps just as important, it serves to document (1) which
 244         pointers are protected by RCU and (2) the point at which a
 245         given structure becomes accessible to other CPUs.  That said,
 246         rcu_assign_pointer() is most frequently used indirectly, via
 247         the _rcu list-manipulation primitives such as list_add_rcu().
 248
 249 rcu_dereference()
 250 ^^^^^^^^^^^^^^^^^
 251         typeof(p) rcu_dereference(p);
 252
 253         Like rcu_assign_pointer(), rcu_dereference() must be implemented
 254         as a macro.
 255
 256         The reader uses rcu_dereference() to fetch an RCU-protected
 257         pointer, which returns a value that may then be safely
 258         dereferenced.  Note that rcu_dereference() does not actually
 259         dereference the pointer, instead, it protects the pointer for
 260         later dereferencing.  It also executes any needed memory-barrier
 261         instructions for a given CPU architecture.  Currently, only Alpha
 262         needs memory barriers within rcu_dereference() -- on other CPUs,
 263         it compiles to nothing, not even a compiler directive.
 264
 265         Common coding practice uses rcu_dereference() to copy an
 266         RCU-protected pointer to a local variable, then dereferences
 267         this local variable, for example as follows::
 268
 269                 p = rcu_dereference(head.next);
 270                 return p->data;
 271
 272         However, in this case, one could just as easily combine these
 273         into one statement::
 274
 275                 return rcu_dereference(head.next)->data;
 276
 277         If you are going to be fetching multiple fields from the
 278         RCU-protected structure, using the local variable is of
 279         course preferred.  Repeated rcu_dereference() calls look
 280         ugly, do not guarantee that the same pointer will be returned
 281         if an update happened while in the critical section, and incur
 282         unnecessary overhead on Alpha CPUs.
 283
 284         Note that the value returned by rcu_dereference() is valid
 285         only within the enclosing RCU read-side critical section [1]_.
 286         For example, the following is **not** legal::
 287
 288                 rcu_read_lock();
 289                 p = rcu_dereference(head.next);
 290                 rcu_read_unlock();
 291                 x = p->address; /* BUG!!! */
 292                 rcu_read_lock();
 293                 y = p->data;    /* BUG!!! */
 294                 rcu_read_unlock();
 295
 296         Holding a reference from one RCU read-side critical section
 297         to another is just as illegal as holding a reference from
 298         one lock-based critical section to another!  Similarly,
 299         using a reference outside of the critical section in which
 300         it was acquired is just as illegal as doing so with normal
 301         locking.
 302
 303         As with rcu_assign_pointer(), an important function of
 304         rcu_dereference() is to document which pointers are protected by
 305         RCU, in particular, flagging a pointer that is subject to changing
 306         at any time, including immediately after the rcu_dereference().
 307         And, again like rcu_assign_pointer(), rcu_dereference() is
 308         typically used indirectly, via the _rcu list-manipulation
 309         primitives, such as list_for_each_entry_rcu() [2]_.
 310
 311 ..      [1] The variant rcu_dereference_protected() can be used outside
 312         of an RCU read-side critical section as long as the usage is
 313         protected by locks acquired by the update-side code.  This variant
 314         avoids the lockdep warning that would happen when using (for
 315         example) rcu_dereference() without rcu_read_lock() protection.
 316         Using rcu_dereference_protected() also has the advantage
 317         of permitting compiler optimizations that rcu_dereference()
 318         must prohibit.  The rcu_dereference_protected() variant takes
 319         a lockdep expression to indicate which locks must be acquired
 320         by the caller. If the indicated protection is not provided,
 321         a lockdep splat is emitted.  See Design/Requirements/Requirements.rst
 322         and the API's code comments for more details and example usage.
 323
 324 ..      [2] If the list_for_each_entry_rcu() instance might be used by
 325         update-side code as well as by RCU readers, then an additional
 326         lockdep expression can be added to its list of arguments.
 327         For example, given an additional "lock_is_held(&mylock)" argument,
 328         the RCU lockdep code would complain only if this instance was
 329         invoked outside of an RCU read-side critical section and without
 330         the protection of mylock.
 331
 332 The following diagram shows how each API communicates among the
 333 reader, updater, and reclaimer.
 334 ::
 335
 336
 337             rcu_assign_pointer()
 338                                     +--------+
 339             +---------------------->| reader |---------+
 340             |                       +--------+         |
 341             |                           |              |
 342             |                           |              | Protect:
 343             |                           |              | rcu_read_lock()
 344             |                           |              | rcu_read_unlock()
 345             |        rcu_dereference()  |              |
 346             +---------+                 |              |
 347             | updater |<----------------+              |
 348             +---------+                                V
 349             |                                    +-----------+
 350             +----------------------------------->| reclaimer |
 351                                                  +-----------+
 352               Defer:
 353               synchronize_rcu() & call_rcu()
 354
 355
 356 The RCU infrastructure observes the time sequence of rcu_read_lock(),
 357 rcu_read_unlock(), synchronize_rcu(), and call_rcu() invocations in
 358 order to determine when (1) synchronize_rcu() invocations may return
 359 to their callers and (2) call_rcu() callbacks may be invoked.  Efficient
 360 implementations of the RCU infrastructure make heavy use of batching in
 361 order to amortize their overhead over many uses of the corresponding APIs.
 362
 363 There are at least three flavors of RCU usage in the Linux kernel. The diagram
 364 above shows the most common one. On the updater side, the rcu_assign_pointer(),
 365 synchronize_rcu() and call_rcu() primitives used are the same for all three
 366 flavors. However for protection (on the reader side), the primitives used vary
 367 depending on the flavor:
 368
 369 a.      rcu_read_lock() / rcu_read_unlock()
 370         rcu_dereference()
 371
 372 b.      rcu_read_lock_bh() / rcu_read_unlock_bh()
 373         local_bh_disable() / local_bh_enable()
 374         rcu_dereference_bh()
 375
 376 c.      rcu_read_lock_sched() / rcu_read_unlock_sched()
 377         preempt_disable() / preempt_enable()
 378         local_irq_save() / local_irq_restore()
 379         hardirq enter / hardirq exit
 380         NMI enter / NMI exit
 381         rcu_dereference_sched()
 382
 383 These three flavors are used as follows:
 384
 385 a.      RCU applied to normal data structures.
 386
 387 b.      RCU applied to networking data structures that may be subjected
 388         to remote denial-of-service attacks.
 389
 390 c.      RCU applied to scheduler and interrupt/NMI-handler tasks.
 391
 392 Again, most uses will be of (a).  The (b) and (c) cases are important
 393 for specialized uses, but are relatively uncommon.
 394
 395 .. _3_whatisRCU:
 396
 397 3.  WHAT ARE SOME EXAMPLE USES OF CORE RCU API?
 398 -----------------------------------------------
 399
 400 This section shows a simple use of the core RCU API to protect a
 401 global pointer to a dynamically allocated structure.  More-typical
 402 uses of RCU may be found in listRCU.rst, arrayRCU.rst, and NMI-RCU.rst.
 403 ::
 404
 405         struct foo {
 406                 int a;
 407                 char b;
 408                 long c;
 409         };
 410         DEFINE_SPINLOCK(foo_mutex);
 411
 412         struct foo __rcu *gbl_foo;
 413
 414         /*
 415          * Create a new struct foo that is the same as the one currently
 416          * pointed to by gbl_foo, except that field "a" is replaced
 417          * with "new_a".  Points gbl_foo to the new structure, and
 418          * frees up the old structure after a grace period.
 419          *
 420          * Uses rcu_assign_pointer() to ensure that concurrent readers
 421          * see the initialized version of the new structure.
 422          *
 423          * Uses synchronize_rcu() to ensure that any readers that might
 424          * have references to the old structure complete before freeing
 425          * the old structure.
 426          */
 427         void foo_update_a(int new_a)
 428         {
 429                 struct foo *new_fp;
 430                 struct foo *old_fp;
 431
 432                 new_fp = kmalloc(sizeof(*new_fp), GFP_KERNEL);
 433                 spin_lock(&foo_mutex);
 434                 old_fp = rcu_dereference_protected(gbl_foo, lockdep_is_held(&foo_mutex));
 435                 *new_fp = *old_fp;
 436                 new_fp->a = new_a;
 437                 rcu_assign_pointer(gbl_foo, new_fp);
 438                 spin_unlock(&foo_mutex);
 439                 synchronize_rcu();
 440                 kfree(old_fp);
 441         }
 442
 443         /*
 444          * Return the value of field "a" of the current gbl_foo
 445          * structure.  Use rcu_read_lock() and rcu_read_unlock()
 446          * to ensure that the structure does not get deleted out
 447          * from under us, and use rcu_dereference() to ensure that
 448          * we see the initialized version of the structure (important
 449          * for DEC Alpha and for people reading the code).
 450          */
 451         int foo_get_a(void)
 452         {
 453                 int retval;
 454
 455                 rcu_read_lock();
 456                 retval = rcu_dereference(gbl_foo)->a;
 457                 rcu_read_unlock();
 458                 return retval;
 459         }
 460
 461 So, to sum up:
 462
 463 -       Use rcu_read_lock() and rcu_read_unlock() to guard RCU
 464         read-side critical sections.
 465
 466 -       Within an RCU read-side critical section, use rcu_dereference()
 467         to dereference RCU-protected pointers.
 468
 469 -       Use some solid scheme (such as locks or semaphores) to
 470         keep concurrent updates from interfering with each other.
 471
 472 -       Use rcu_assign_pointer() to update an RCU-protected pointer.
 473         This primitive protects concurrent readers from the updater,
 474         **not** concurrent updates from each other!  You therefore still
 475         need to use locking (or something similar) to keep concurrent
 476         rcu_assign_pointer() primitives from interfering with each other.
 477
 478 -       Use synchronize_rcu() **after** removing a data element from an
 479         RCU-protected data structure, but **before** reclaiming/freeing
 480         the data element, in order to wait for the completion of all
 481         RCU read-side critical sections that might be referencing that
 482         data item.
 483
 484 See checklist.rst for additional rules to follow when using RCU.
 485 And again, more-typical uses of RCU may be found in listRCU.rst,
 486 arrayRCU.rst, and NMI-RCU.rst.
 487
 488 .. _4_whatisRCU:
 489
 490 4.  WHAT IF MY UPDATING THREAD CANNOT BLOCK?
 491 --------------------------------------------
 492
 493 In the example above, foo_update_a() blocks until a grace period elapses.
 494 This is quite simple, but in some cases one cannot afford to wait so
 495 long -- there might be other high-priority work to be done.
 496
 497 In such cases, one uses call_rcu() rather than synchronize_rcu().
 498 The call_rcu() API is as follows::
 499
 500         void call_rcu(struct rcu_head *head, rcu_callback_t func);
 501
 502 This function invokes func(head) after a grace period has elapsed.
 503 This invocation might happen from either softirq or process context,
 504 so the function is not permitted to block.  The foo struct needs to
 505 have an rcu_head structure added, perhaps as follows::
 506
 507         struct foo {
 508                 int a;
 509                 char b;
 510                 long c;
 511                 struct rcu_head rcu;
 512         };
 513
 514 The foo_update_a() function might then be written as follows::
 515
 516         /*
 517          * Create a new struct foo that is the same as the one currently
 518          * pointed to by gbl_foo, except that field "a" is replaced
 519          * with "new_a".  Points gbl_foo to the new structure, and
 520          * frees up the old structure after a grace period.
 521          *
 522          * Uses rcu_assign_pointer() to ensure that concurrent readers
 523          * see the initialized version of the new structure.
 524          *
 525          * Uses call_rcu() to ensure that any readers that might have
 526          * references to the old structure complete before freeing the
 527          * old structure.
 528          */
 529         void foo_update_a(int new_a)
 530         {
 531                 struct foo *new_fp;
 532                 struct foo *old_fp;
 533
 534                 new_fp = kmalloc(sizeof(*new_fp), GFP_KERNEL);
 535                 spin_lock(&foo_mutex);
 536                 old_fp = rcu_dereference_protected(gbl_foo, lockdep_is_held(&foo_mutex));
 537                 *new_fp = *old_fp;
 538                 new_fp->a = new_a;
 539                 rcu_assign_pointer(gbl_foo, new_fp);
 540                 spin_unlock(&foo_mutex);
 541                 call_rcu(&old_fp->rcu, foo_reclaim);
 542         }
 543
 544 The foo_reclaim() function might appear as follows::
 545
 546         void foo_reclaim(struct rcu_head *rp)
 547         {
 548                 struct foo *fp = container_of(rp, struct foo, rcu);
 549
 550                 foo_cleanup(fp->a);
 551
 552                 kfree(fp);
 553         }
 554
 555 The container_of() primitive is a macro that, given a pointer into a
 556 struct, the type of the struct, and the pointed-to field within the
 557 struct, returns a pointer to the beginning of the struct.
 558
 559 The use of call_rcu() permits the caller of foo_update_a() to
 560 immediately regain control, without needing to worry further about the
 561 old version of the newly updated element.  It also clearly shows the
 562 RCU distinction between updater, namely foo_update_a(), and reclaimer,
 563 namely foo_reclaim().
 564
 565 The summary of advice is the same as for the previous section, except
 566 that we are now using call_rcu() rather than synchronize_rcu():
 567
 568 -       Use call_rcu() **after** removing a data element from an
 569         RCU-protected data structure in order to register a callback
 570         function that will be invoked after the completion of all RCU
 571         read-side critical sections that might be referencing that
 572         data item.
 573
 574 If the callback for call_rcu() is not doing anything more than calling
 575 kfree() on the structure, you can use kfree_rcu() instead of call_rcu()
 576 to avoid having to write your own callback::
 577
 578         kfree_rcu(old_fp, rcu);
 579
 580 Again, see checklist.rst for additional rules governing the use of RCU.
 581
 582 .. _5_whatisRCU:
 583
 584 5.  WHAT ARE SOME SIMPLE IMPLEMENTATIONS OF RCU?
 585 ------------------------------------------------
 586
 587 One of the nice things about RCU is that it has extremely simple "toy"
 588 implementations that are a good first step towards understanding the
 589 production-quality implementations in the Linux kernel.  This section
 590 presents two such "toy" implementations of RCU, one that is implemented
 591 in terms of familiar locking primitives, and another that more closely
 592 resembles "classic" RCU.  Both are way too simple for real-world use,
 593 lacking both functionality and performance.  However, they are useful
 594 in getting a feel for how RCU works.  See kernel/rcu/update.c for a
 595 production-quality implementation, and see:
 596
 597         http://www.rdrop.com/users/paulmck/RCU
 598
 599 for papers describing the Linux kernel RCU implementation.  The OLS'01
 600 and OLS'02 papers are a good introduction, and the dissertation provides
 601 more details on the current implementation as of early 2004.
 602
 603
 604 5A.  "TOY" IMPLEMENTATION #1: LOCKING
 605 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 606 This section presents a "toy" RCU implementation that is based on
 607 familiar locking primitives.  Its overhead makes it a non-starter for
 608 real-life use, as does its lack of scalability.  It is also unsuitable
 609 for realtime use, since it allows scheduling latency to "bleed" from
 610 one read-side critical section to another.  It also assumes recursive
 611 reader-writer locks:  If you try this with non-recursive locks, and
 612 you allow nested rcu_read_lock() calls, you can deadlock.
 613
 614 However, it is probably the easiest implementation to relate to, so is
 615 a good starting point.
 616
 617 It is extremely simple::
 618
 619         static DEFINE_RWLOCK(rcu_gp_mutex);
 620
 621         void rcu_read_lock(void)
 622         {
 623                 read_lock(&rcu_gp_mutex);
 624         }
 625
 626         void rcu_read_unlock(void)
 627         {
 628                 read_unlock(&rcu_gp_mutex);
 629         }
 630
 631         void synchronize_rcu(void)
 632         {
 633                 write_lock(&rcu_gp_mutex);
 634                 smp_mb__after_spinlock();
 635                 write_unlock(&rcu_gp_mutex);
 636         }
 637
 638 [You can ignore rcu_assign_pointer() and rcu_dereference() without missing
 639 much.  But here are simplified versions anyway.  And whatever you do,
 640 don't forget about them when submitting patches making use of RCU!]::
 641
 642         #define rcu_assign_pointer(p, v) \
 643         ({ \
 644                 smp_store_release(&(p), (v)); \
 645         })
 646
 647         #define rcu_dereference(p) \
 648         ({ \
 649                 typeof(p) _________p1 = READ_ONCE(p); \
 650                 (_________p1); \
 651         })
 652
 653
 654 The rcu_read_lock() and rcu_read_unlock() primitive read-acquire
 655 and release a global reader-writer lock.  The synchronize_rcu()
 656 primitive write-acquires this same lock, then releases it.  This means
 657 that once synchronize_rcu() exits, all RCU read-side critical sections
 658 that were in progress before synchronize_rcu() was called are guaranteed
 659 to have completed -- there is no way that synchronize_rcu() would have
 660 been able to write-acquire the lock otherwise.  The smp_mb__after_spinlock()
 661 promotes synchronize_rcu() to a full memory barrier in compliance with
 662 the "Memory-Barrier Guarantees" listed in:
 663
 664         Design/Requirements/Requirements.rst
 665
 666 It is possible to nest rcu_read_lock(), since reader-writer locks may
 667 be recursively acquired.  Note also that rcu_read_lock() is immune
 668 from deadlock (an important property of RCU).  The reason for this is
 669 that the only thing that can block rcu_read_lock() is a synchronize_rcu().
 670 But synchronize_rcu() does not acquire any locks while holding rcu_gp_mutex,
 671 so there can be no deadlock cycle.
 672
 673 .. _quiz_1:
 674
 675 Quick Quiz #1:
 676                 Why is this argument naive?  How could a deadlock
 677                 occur when using this algorithm in a real-world Linux
 678                 kernel?  How could this deadlock be avoided?
 679
 680 :ref:`Answers to Quick Quiz <9_whatisRCU>`
 681
 682 5B.  "TOY" EXAMPLE #2: CLASSIC RCU
 683 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 684 This section presents a "toy" RCU implementation that is based on
 685 "classic RCU".  It is also short on performance (but only for updates) and
 686 on features such as hotplug CPU and the ability to run in CONFIG_PREEMPTION
 687 kernels.  The definitions of rcu_dereference() and rcu_assign_pointer()
 688 are the same as those shown in the preceding section, so they are omitted.
 689 ::
 690
 691         void rcu_read_lock(void) { }
 692
 693         void rcu_read_unlock(void) { }
 694
 695         void synchronize_rcu(void)
 696         {
 697                 int cpu;
 698
 699                 for_each_possible_cpu(cpu)
 700                         run_on(cpu);
 701         }
 702
 703 Note that rcu_read_lock() and rcu_read_unlock() do absolutely nothing.
 704 This is the great strength of classic RCU in a non-preemptive kernel:
 705 read-side overhead is precisely zero, at least on non-Alpha CPUs.
 706 And there is absolutely no way that rcu_read_lock() can possibly
 707 participate in a deadlock cycle!
 708
 709 The implementation of synchronize_rcu() simply schedules itself on each
 710 CPU in turn.  The run_on() primitive can be implemented straightforwardly
 711 in terms of the sched_setaffinity() primitive.  Of course, a somewhat less
 712 "toy" implementation would restore the affinity upon completion rather
 713 than just leaving all tasks running on the last CPU, but when I said
 714 "toy", I meant **toy**!
 715
 716 So how the heck is this supposed to work???
 717
 718 Remember that it is illegal to block while in an RCU read-side critical
 719 section.  Therefore, if a given CPU executes a context switch, we know
 720 that it must have completed all preceding RCU read-side critical sections.
 721 Once **all** CPUs have executed a context switch, then **all** preceding
 722 RCU read-side critical sections will have completed.
 723
 724 So, suppose that we remove a data item from its structure and then invoke
 725 synchronize_rcu().  Once synchronize_rcu() returns, we are guaranteed
 726 that there are no RCU read-side critical sections holding a reference
 727 to that data item, so we can safely reclaim it.
 728
 729 .. _quiz_2:
 730
 731 Quick Quiz #2:
 732                 Give an example where Classic RCU's read-side
 733                 overhead is **negative**.
 734
 735 :ref:`Answers to Quick Quiz <9_whatisRCU>`
 736
 737 .. _quiz_3:
 738
 739 Quick Quiz #3:
 740                 If it is illegal to block in an RCU read-side
 741                 critical section, what the heck do you do in
 742                 CONFIG_PREEMPT_RT, where normal spinlocks can block???
 743
 744 :ref:`Answers to Quick Quiz <9_whatisRCU>`
 745
 746 .. _6_whatisRCU:
 747
 748 6.  ANALOGY WITH READER-WRITER LOCKING
 749 --------------------------------------
 750
 751 Although RCU can be used in many different ways, a very common use of
 752 RCU is analogous to reader-writer locking.  The following unified
 753 diff shows how closely related RCU and reader-writer locking can be.
 754 ::
 755
 756         @@ -5,5 +5,5 @@ struct el {
 757                 int data;
 758                 /* Other data fields */
 759          };
 760         -rwlock_t listmutex;
 761         +spinlock_t listmutex;
 762          struct el head;
 763
 764         @@ -13,15 +14,15 @@
 765                 struct list_head *lp;
 766                 struct el *p;
 767
 768         -       read_lock(&listmutex);
 769         -       list_for_each_entry(p, head, lp) {
 770         +       rcu_read_lock();
 771         +       list_for_each_entry_rcu(p, head, lp) {
 772                         if (p->key == key) {
 773                                 *result = p->data;
 774         -                       read_unlock(&listmutex);
 775         +                       rcu_read_unlock();
 776                                 return 1;
 777                         }
 778                 }
 779         -       read_unlock(&listmutex);
 780         +       rcu_read_unlock();
 781                 return 0;
 782          }
 783
 784         @@ -29,15 +30,16 @@
 785          {
 786                 struct el *p;
 787
 788         -       write_lock(&listmutex);
 789         +       spin_lock(&listmutex);
 790                 list_for_each_entry(p, head, lp) {
 791                         if (p->key == key) {
 792         -                       list_del(&p->list);
 793         -                       write_unlock(&listmutex);
 794         +                       list_del_rcu(&p->list);
 795         +                       spin_unlock(&listmutex);
 796         +                       synchronize_rcu();
 797                                 kfree(p);
 798                                 return 1;
 799                         }
 800                 }
 801         -       write_unlock(&listmutex);
 802         +       spin_unlock(&listmutex);
 803                 return 0;
 804          }
 805
 806 Or, for those who prefer a side-by-side listing::
 807
 808  1 struct el {                          1 struct el {
 809  2   struct list_head list;             2   struct list_head list;
 810  3   long key;                          3   long key;
 811  4   spinlock_t mutex;                  4   spinlock_t mutex;
 812  5   int data;                          5   int data;
 813  6   /* Other data fields */            6   /* Other data fields */
 814  7 };                                   7 };
 815  8 rwlock_t listmutex;                  8 spinlock_t listmutex;
 816  9 struct el head;                      9 struct el head;
 817
 818 ::
 819
 820   1 int search(long key, int *result)    1 int search(long key, int *result)
 821   2 {                                    2 {
 822   3   struct list_head *lp;              3   struct list_head *lp;
 823   4   struct el *p;                      4   struct el *p;
 824   5                                      5
 825   6   read_lock(&listmutex);             6   rcu_read_lock();
 826   7   list_for_each_entry(p, head, lp) { 7   list_for_each_entry_rcu(p, head, lp) {
 827   8     if (p->key == key) {             8     if (p->key == key) {
 828   9       *result = p->data;             9       *result = p->data;
 829  10       read_unlock(&listmutex);      10       rcu_read_unlock();
 830  11       return 1;                     11       return 1;
 831  12     }                               12     }
 832  13   }                                 13   }
 833  14   read_unlock(&listmutex);          14   rcu_read_unlock();
 834  15   return 0;                         15   return 0;
 835  16 }                                   16 }
 836
 837 ::
 838
 839   1 int delete(long key)                 1 int delete(long key)
 840   2 {                                    2 {
 841   3   struct el *p;                      3   struct el *p;
 842   4                                      4
 843   5   write_lock(&listmutex);            5   spin_lock(&listmutex);
 844   6   list_for_each_entry(p, head, lp) { 6   list_for_each_entry(p, head, lp) {
 845   7     if (p->key == key) {             7     if (p->key == key) {
 846   8       list_del(&p->list);            8       list_del_rcu(&p->list);
 847   9       write_unlock(&listmutex);      9       spin_unlock(&listmutex);
 848                                         10       synchronize_rcu();
 849  10       kfree(p);                     11       kfree(p);
 850  11       return 1;                     12       return 1;
 851  12     }                               13     }
 852  13   }                                 14   }
 853  14   write_unlock(&listmutex);         15   spin_unlock(&listmutex);
 854  15   return 0;                         16   return 0;
 855  16 }                                   17 }
 856
 857 Either way, the differences are quite small.  Read-side locking moves
 858 to rcu_read_lock() and rcu_read_unlock, update-side locking moves from
 859 a reader-writer lock to a simple spinlock, and a synchronize_rcu()
 860 precedes the kfree().
 861
 862 However, there is one potential catch: the read-side and update-side
 863 critical sections can now run concurrently.  In many cases, this will
 864 not be a problem, but it is necessary to check carefully regardless.
 865 For example, if multiple independent list updates must be seen as
 866 a single atomic update, converting to RCU will require special care.
 867
 868 Also, the presence of synchronize_rcu() means that the RCU version of
 869 delete() can now block.  If this is a problem, there is a callback-based
 870 mechanism that never blocks, namely call_rcu() or kfree_rcu(), that can
 871 be used in place of synchronize_rcu().
 872
 873 .. _7_whatisRCU:
 874
 875 7.  ANALOGY WITH REFERENCE COUNTING
 876 -----------------------------------
 877
 878 The reader-writer analogy (illustrated by the previous section) is not
 879 always the best way to think about using RCU.  Another helpful analogy
 880 considers RCU an effective reference count on everything which is
 881 protected by RCU.
 882
 883 A reference count typically does not prevent the referenced object's
 884 values from changing, but does prevent changes to type -- particularly the
 885 gross change of type that happens when that object's memory is freed and
 886 re-allocated for some other purpose.  Once a type-safe reference to the
 887 object is obtained, some other mechanism is needed to ensure consistent
 888 access to the data in the object.  This could involve taking a spinlock,
 889 but with RCU the typical approach is to perform reads with SMP-aware
 890 operations such as smp_load_acquire(), to perform updates with atomic
 891 read-modify-write operations, and to provide the necessary ordering.
 892 RCU provides a number of support functions that embed the required
 893 operations and ordering, such as the list_for_each_entry_rcu() macro
 894 used in the previous section.
 895
 896 A more focused view of the reference counting behavior is that,
 897 between rcu_read_lock() and rcu_read_unlock(), any reference taken with
 898 rcu_dereference() on a pointer marked as ``__rcu`` can be treated as
 899 though a reference-count on that object has been temporarily increased.
 900 This prevents the object from changing type.  Exactly what this means
 901 will depend on normal expectations of objects of that type, but it
 902 typically includes that spinlocks can still be safely locked, normal
 903 reference counters can be safely manipulated, and ``__rcu`` pointers
 904 can be safely dereferenced.
 905
 906 Some operations that one might expect to see on an object for
 907 which an RCU reference is held include:
 908
 909  - Copying out data that is guaranteed to be stable by the object's type.
 910  - Using kref_get_unless_zero() or similar to get a longer-term
 911    reference.  This may fail of course.
 912  - Acquiring a spinlock in the object, and checking if the object still
 913    is the expected object and if so, manipulating it freely.
 914
 915 The understanding that RCU provides a reference that only prevents a
 916 change of type is particularly visible with objects allocated from a
 917 slab cache marked ``SLAB_TYPESAFE_BY_RCU``.  RCU operations may yield a
 918 reference to an object from such a cache that has been concurrently
 919 freed and the memory reallocated to a completely different object,
 920 though of the same type.  In this case RCU doesn't even protect the
 921 identity of the object from changing, only its type.  So the object
 922 found may not be the one expected, but it will be one where it is safe
 923 to take a reference or spinlock and then confirm that the identity
 924 matches the expectations.
 925
 926 With traditional reference counting -- such as that implemented by the
 927 kref library in Linux -- there is typically code that runs when the last
 928 reference to an object is dropped.  With kref, this is the function
 929 passed to kref_put().  When RCU is being used, such finalization code
 930 must not be run until all ``__rcu`` pointers referencing the object have
 931 been updated, and then a grace period has passed.  Every remaining
 932 globally visible pointer to the object must be considered to be a
 933 potential counted reference, and the finalization code is typically run
 934 using call_rcu() only after all those pointers have been changed.
 935
 936 To see how to choose between these two analogies -- of RCU as a
 937 reader-writer lock and RCU as a reference counting system -- it is useful
 938 to reflect on the scale of the thing being protected.  The reader-writer
 939 lock analogy looks at larger multi-part objects such as a linked list
 940 and shows how RCU can facilitate concurrency while elements are added
 941 to, and removed from, the list.  The reference-count analogy looks at
 942 the individual objects and looks at how they can be accessed safely
 943 within whatever whole they are a part of.
 944
 945 .. _8_whatisRCU:
 946
 947 8.  FULL LIST OF RCU APIs
 948 -------------------------
 949
 950 The RCU APIs are documented in docbook-format header comments in the
 951 Linux-kernel source code, but it helps to have a full list of the
 952 APIs, since there does not appear to be a way to categorize them
 953 in docbook.  Here is the list, by category.
 954
 955 RCU list traversal::
 956
 957         list_entry_rcu
 958         list_entry_lockless
 959         list_first_entry_rcu
 960         list_next_rcu
 961         list_for_each_entry_rcu
 962         list_for_each_entry_continue_rcu
 963         list_for_each_entry_from_rcu
 964         list_first_or_null_rcu
 965         list_next_or_null_rcu
 966         hlist_first_rcu
 967         hlist_next_rcu
 968         hlist_pprev_rcu
 969         hlist_for_each_entry_rcu
 970         hlist_for_each_entry_rcu_bh
 971         hlist_for_each_entry_from_rcu
 972         hlist_for_each_entry_continue_rcu
 973         hlist_for_each_entry_continue_rcu_bh
 974         hlist_nulls_first_rcu
 975         hlist_nulls_for_each_entry_rcu
 976         hlist_bl_first_rcu
 977         hlist_bl_for_each_entry_rcu
 978
 979 RCU pointer/list update::
 980
 981         rcu_assign_pointer
 982         list_add_rcu
 983         list_add_tail_rcu
 984         list_del_rcu
 985         list_replace_rcu
 986         hlist_add_behind_rcu
 987         hlist_add_before_rcu
 988         hlist_add_head_rcu
 989         hlist_add_tail_rcu
 990         hlist_del_rcu
 991         hlist_del_init_rcu
 992         hlist_replace_rcu
 993         list_splice_init_rcu
 994         list_splice_tail_init_rcu
 995         hlist_nulls_del_init_rcu
 996         hlist_nulls_del_rcu
 997         hlist_nulls_add_head_rcu
 998         hlist_bl_add_head_rcu
 999         hlist_bl_del_init_rcu
1000         hlist_bl_del_rcu
1001         hlist_bl_set_first_rcu
1002
1003 RCU::
1004
1005         Critical sections       Grace period            Barrier
1006
1007         rcu_read_lock           synchronize_net         rcu_barrier
1008         rcu_read_unlock         synchronize_rcu
1009         rcu_dereference         synchronize_rcu_expedited
1010         rcu_read_lock_held      call_rcu
1011         rcu_dereference_check   kfree_rcu
1012         rcu_dereference_protected
1013
1014 bh::
1015
1016         Critical sections       Grace period            Barrier
1017
1018         rcu_read_lock_bh        call_rcu                rcu_barrier
1019         rcu_read_unlock_bh      synchronize_rcu
1020         [local_bh_disable]      synchronize_rcu_expedited
1021         [and friends]
1022         rcu_dereference_bh
1023         rcu_dereference_bh_check
1024         rcu_dereference_bh_protected
1025         rcu_read_lock_bh_held
1026
1027 sched::
1028
1029         Critical sections       Grace period            Barrier
1030
1031         rcu_read_lock_sched     call_rcu                rcu_barrier
1032         rcu_read_unlock_sched   synchronize_rcu
1033         [preempt_disable]       synchronize_rcu_expedited
1034         [and friends]
1035         rcu_read_lock_sched_notrace
1036         rcu_read_unlock_sched_notrace
1037         rcu_dereference_sched
1038         rcu_dereference_sched_check
1039         rcu_dereference_sched_protected
1040         rcu_read_lock_sched_held
1041
1042
1043 SRCU::
1044
1045         Critical sections       Grace period            Barrier
1046
1047         srcu_read_lock          call_srcu               srcu_barrier
1048         srcu_read_unlock        synchronize_srcu
1049         srcu_dereference        synchronize_srcu_expedited
1050         srcu_dereference_check
1051         srcu_read_lock_held
1052
1053 SRCU: Initialization/cleanup::
1054
1055         DEFINE_SRCU
1056         DEFINE_STATIC_SRCU
1057         init_srcu_struct
1058         cleanup_srcu_struct
1059
1060 All: lockdep-checked RCU-protected pointer access::
1061
1062         rcu_access_pointer
1063         rcu_dereference_raw
1064         RCU_LOCKDEP_WARN
1065         rcu_sleep_check
1066         RCU_NONIDLE
1067
1068 See the comment headers in the source code (or the docbook generated
1069 from them) for more information.
1070
1071 However, given that there are no fewer than four families of RCU APIs
1072 in the Linux kernel, how do you choose which one to use?  The following
1073 list can be helpful:
1074
1075 a.      Will readers need to block?  If so, you need SRCU.
1076
1077 b.      What about the -rt patchset?  If readers would need to block
1078         in an non-rt kernel, you need SRCU.  If readers would block
1079         in a -rt kernel, but not in a non-rt kernel, SRCU is not
1080         necessary.  (The -rt patchset turns spinlocks into sleeplocks,
1081         hence this distinction.)
1082
1083 c.      Do you need to treat NMI handlers, hardirq handlers,
1084         and code segments with preemption disabled (whether
1085         via preempt_disable(), local_irq_save(), local_bh_disable(),
1086         or some other mechanism) as if they were explicit RCU readers?
1087         If so, RCU-sched is the only choice that will work for you.
1088
1089 d.      Do you need RCU grace periods to complete even in the face
1090         of softirq monopolization of one or more of the CPUs?  For
1091         example, is your code subject to network-based denial-of-service
1092         attacks?  If so, you should disable softirq across your readers,
1093         for example, by using rcu_read_lock_bh().
1094
1095 e.      Is your workload too update-intensive for normal use of
1096         RCU, but inappropriate for other synchronization mechanisms?
1097         If so, consider SLAB_TYPESAFE_BY_RCU (which was originally
1098         named SLAB_DESTROY_BY_RCU).  But please be careful!
1099
1100 f.      Do you need read-side critical sections that are respected
1101         even though they are in the middle of the idle loop, during
1102         user-mode execution, or on an offlined CPU?  If so, SRCU is the
1103         only choice that will work for you.
1104
1105 g.      Otherwise, use RCU.
1106
1107 Of course, this all assumes that you have determined that RCU is in fact
1108 the right tool for your job.
1109
1110 .. _9_whatisRCU:
1111
1112 9.  ANSWERS TO QUICK QUIZZES
1113 ----------------------------
1114
1115 Quick Quiz #1:
1116                 Why is this argument naive?  How could a deadlock
1117                 occur when using this algorithm in a real-world Linux
1118                 kernel?  [Referring to the lock-based "toy" RCU
1119                 algorithm.]
1120
1121 Answer:
1122                 Consider the following sequence of events:
1123
1124                 1.      CPU 0 acquires some unrelated lock, call it
1125                         "problematic_lock", disabling irq via
1126                         spin_lock_irqsave().
1127
1128                 2.      CPU 1 enters synchronize_rcu(), write-acquiring
1129                         rcu_gp_mutex.
1130
1131                 3.      CPU 0 enters rcu_read_lock(), but must wait
1132                         because CPU 1 holds rcu_gp_mutex.
1133
1134                 4.      CPU 1 is interrupted, and the irq handler
1135                         attempts to acquire problematic_lock.
1136
1137                 The system is now deadlocked.
1138
1139                 One way to avoid this deadlock is to use an approach like
1140                 that of CONFIG_PREEMPT_RT, where all normal spinlocks
1141                 become blocking locks, and all irq handlers execute in
1142                 the context of special tasks.  In this case, in step 4
1143                 above, the irq handler would block, allowing CPU 1 to
1144                 release rcu_gp_mutex, avoiding the deadlock.
1145
1146                 Even in the absence of deadlock, this RCU implementation
1147                 allows latency to "bleed" from readers to other
1148                 readers through synchronize_rcu().  To see this,
1149                 consider task A in an RCU read-side critical section
1150                 (thus read-holding rcu_gp_mutex), task B blocked
1151                 attempting to write-acquire rcu_gp_mutex, and
1152                 task C blocked in rcu_read_lock() attempting to
1153                 read_acquire rcu_gp_mutex.  Task A's RCU read-side
1154                 latency is holding up task C, albeit indirectly via
1155                 task B.
1156
1157                 Realtime RCU implementations therefore use a counter-based
1158                 approach where tasks in RCU read-side critical sections
1159                 cannot be blocked by tasks executing synchronize_rcu().
1160
1161 :ref:`Back to Quick Quiz #1 <quiz_1>`
1162
1163 Quick Quiz #2:
1164                 Give an example where Classic RCU's read-side
1165                 overhead is **negative**.
1166
1167 Answer:
1168                 Imagine a single-CPU system with a non-CONFIG_PREEMPTION
1169                 kernel where a routing table is used by process-context
1170                 code, but can be updated by irq-context code (for example,
1171                 by an "ICMP REDIRECT" packet).  The usual way of handling
1172                 this would be to have the process-context code disable
1173                 interrupts while searching the routing table.  Use of
1174                 RCU allows such interrupt-disabling to be dispensed with.
1175                 Thus, without RCU, you pay the cost of disabling interrupts,
1176                 and with RCU you don't.
1177
1178                 One can argue that the overhead of RCU in this
1179                 case is negative with respect to the single-CPU
1180                 interrupt-disabling approach.  Others might argue that
1181                 the overhead of RCU is merely zero, and that replacing
1182                 the positive overhead of the interrupt-disabling scheme
1183                 with the zero-overhead RCU scheme does not constitute
1184                 negative overhead.
1185
1186                 In real life, of course, things are more complex.  But
1187                 even the theoretical possibility of negative overhead for
1188                 a synchronization primitive is a bit unexpected.  ;-)
1189
1190 :ref:`Back to Quick Quiz #2 <quiz_2>`
1191
1192 Quick Quiz #3:
1193                 If it is illegal to block in an RCU read-side
1194                 critical section, what the heck do you do in
1195                 CONFIG_PREEMPT_RT, where normal spinlocks can block???
1196
1197 Answer:
1198                 Just as CONFIG_PREEMPT_RT permits preemption of spinlock
1199                 critical sections, it permits preemption of RCU
1200                 read-side critical sections.  It also permits
1201                 spinlocks blocking while in RCU read-side critical
1202                 sections.
1203
1204                 Why the apparent inconsistency?  Because it is
1205                 possible to use priority boosting to keep the RCU
1206                 grace periods short if need be (for example, if running
1207                 short of memory).  In contrast, if blocking waiting
1208                 for (say) network reception, there is no way to know
1209                 what should be boosted.  Especially given that the
1210                 process we need to boost might well be a human being
1211                 who just went out for a pizza or something.  And although
1212                 a computer-operated cattle prod might arouse serious
1213                 interest, it might also provoke serious objections.
1214                 Besides, how does the computer know what pizza parlor
1215                 the human being went to???
1216
1217 :ref:`Back to Quick Quiz #3 <quiz_3>`
1218
1219 ACKNOWLEDGEMENTS
1220
1221 My thanks to the people who helped make this human-readable, including
1222 Jon Walpole, Josh Triplett, Serge Hallyn, Suzanne Wood, and Alan Stern.
1223
1224
1225 For more information, see http://www.rdrop.com/users/paulmck/RCU.