.. SPDX-License-Identifier: GPL-2.0

Speculative Return Stack Overflow (SRSO)
========================================

This is a mitigation for the speculative return stack overflow (SRSO)
vulnerability found on AMD processors. The mechanism is by now the
well-known scenario of poisoning CPU functional units - the Branch Target
Buffer (BTB) and Return Address Predictor (RAP) in this case - and then
tricking the elevated privilege domain (the kernel) into leaking
sensitive data.

AMD CPUs predict RET instructions using a Return Address Predictor (aka
Return Address Stack/Return Stack Buffer). In some cases, a non-architectural
CALL instruction (i.e., an instruction predicted to be a CALL which is
not actually a CALL) can create an entry in the RAP which may be used
to predict the target of a subsequent RET instruction.

The specific circumstances that lead to this vary by microarchitecture
but the concern is that an attacker can mis-train the CPU BTB to predict
non-architectural CALL instructions in kernel space and use this to
control the speculative target of a subsequent kernel RET, potentially
leading to information disclosure via a speculative side-channel.

The issue is tracked under CVE-2023-20569.

Affected processors
-------------------

AMD Zen, generations 1-4. That is, all families 0x17 and 0x19. Older
processors have not been investigated.
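
The family check described above can be sketched in user space as follows.
This is only an illustration: the helper names are hypothetical, and the
code assumes the usual x86 /proc/cpuinfo layout with ``vendor_id`` and a
decimal ``cpu family`` field. The sysfs file described in the next section
remains the authoritative source.

```python
# Hypothetical helpers, not part of the kernel: decide from
# /proc/cpuinfo-style text whether the CPU is in a family listed above.
AFFECTED_FAMILIES = {0x17, 0x19}  # AMD Zen generations 1-4, per the text above


def cpu_family(cpuinfo_text):
    """Return (vendor_id, family) of the first CPU entry, or (None, None)."""
    vendor, family = None, None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "vendor_id" and vendor is None:
            vendor = value
        elif key == "cpu family" and family is None:
            family = int(value)  # assumed to be printed in decimal
    return vendor, family


def in_affected_family(cpuinfo_text):
    """True if the text describes an AMD CPU of family 0x17 or 0x19."""
    vendor, family = cpu_family(cpuinfo_text)
    return vendor == "AuthenticAMD" and family in AFFECTED_FAMILIES


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print("family in affected list:", in_affected_family(f.read()))
    except (OSError, ValueError) as err:
        print("cannot evaluate /proc/cpuinfo:", err)
```

A family match only indicates that the processor belongs to the affected
list; it says nothing about microcode or about which mitigation is active.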

System information and options
------------------------------

First of all, it is required that the latest microcode be loaded for
mitigations to be effective.

The sysfs file showing SRSO mitigation status is:

  /sys/devices/system/cpu/vulnerabilities/spec_rstack_overflow

The possible values in this file are:

 * 'Not affected':

   The processor is not vulnerable.

 * 'Vulnerable':

   The processor is vulnerable and no mitigations have been applied.

 * 'Vulnerable: No microcode':

   The processor is vulnerable and no microcode extending IBPB
   functionality to address the vulnerability has been applied.

 * 'Vulnerable: Safe RET, no microcode':

   The "Safe RET" mitigation (see below) has been applied to protect the
   kernel, but the IBPB-extending microcode has not been applied.  User
   space tasks may still be vulnerable.

 * 'Vulnerable: Microcode, no safe RET':

   The extended IBPB functionality microcode patch has been applied. It
   does not protect User->Kernel and Guest->Host transitions but it
   does address the User->User and VM->VM attack vectors.

   Note that User->User mitigation is controlled by how the IBPB aspect in
   the Spectre v2 mitigation is selected:

    * conditional IBPB:

      each process can select whether it needs an IBPB issued around
      it, via PR_SPEC_DISABLE/_ENABLE etc.; see :doc:`spectre`

    * strict:

      i.e., always on - by supplying spectre_v2_user=on on the kernel
      command line

   (spec_rstack_overflow=microcode)

 * 'Mitigation: Safe RET':

   Combined microcode/software mitigation. It complements the
   extended IBPB microcode patch functionality by addressing
   User->Kernel and Guest->Host transitions protection.

   Selected by default or by spec_rstack_overflow=safe-ret

 * 'Mitigation: IBPB':

   Similar protection as "Safe RET" above but employs an IBPB barrier on
   privilege domain crossings (User->Kernel, Guest->Host).

   (spec_rstack_overflow=ibpb)

 * 'Mitigation: IBPB on VMEXIT':

   Mitigation addressing the cloud provider scenario - the Guest->Host
   transitions only.

   (spec_rstack_overflow=ibpb-vmexit)
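
As an illustration of how the status values above might be consumed by
a monitoring script, here is a minimal sketch. The function names and the
coarse categories are hypothetical. Matching is done on the prefix only, on
the assumption that the exact wording and capitalization of the strings may
differ between kernel versions.

```python
# Minimal sketch: read the SRSO status file and reduce it to a coarse
# category. Helper names and categories are illustrative only.
SYSFS_PATH = "/sys/devices/system/cpu/vulnerabilities/spec_rstack_overflow"


def classify(status):
    """Map a status string from the list above to a coarse category."""
    status = status.strip()
    if status == "Not affected":
        return "not-affected"
    if status.startswith("Mitigation:"):
        return "mitigated"
    if status.startswith("Vulnerable"):
        return "vulnerable"
    return "unknown"


def read_status(path=SYSFS_PATH):
    """Return the raw status string, or None if the file cannot be read."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        # Assumption: kernels without SRSO reporting lack this file.
        return None


if __name__ == "__main__":
    raw = read_status()
    if raw is None:
        print("SRSO status not reported by this kernel")
    else:
        print(raw, "->", classify(raw))
```

Note that a status such as 'Vulnerable: Safe RET, no microcode' falls into
the "vulnerable" category here even though the kernel itself is protected;
a script that needs that distinction has to inspect the full string.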

In order to exploit the vulnerability, an attacker needs to:

 - gain local access to the machine

 - break kASLR

 - find gadgets in the running kernel in order to use them in the exploit

 - potentially create and pin an additional workload on the sibling
   thread, depending on the microarchitecture (not necessary on fam 0x19)

 - run the exploit

Considering the performance implications of each mitigation type, the
default one is 'Mitigation: Safe RET' which should take care of most
attack vectors, including the local User->Kernel one.

As always, users are advised to keep their systems up-to-date by
applying software updates regularly.

The default setting will be reevaluated when needed and especially when
new attack vectors appear.

As one can surmise, 'Mitigation: Safe RET' does come at the cost of some
performance depending on the workload. If one trusts one's userspace
and does not want to suffer the performance impact, one can always
disable the mitigation with spec_rstack_overflow=off.

Similarly, 'Mitigation: IBPB' is another full mitigation type employing
an indirect branch prediction barrier after having applied the required
microcode patch for one's system. This mitigation also comes at
a performance cost.
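
To check which spec_rstack_overflow= value, if any, was passed at boot, the
kernel command line can be inspected. The sketch below is a hypothetical
user-space helper, not a kernel interface. It assumes that the last
occurrence of the option takes effect, and it only knows the values named in
this document. Other parameters may also influence the final choice, so the
sysfs file remains the authoritative status.

```python
# Minimal sketch: extract the spec_rstack_overflow= option from a kernel
# command line string such as the contents of /proc/cmdline.
KNOWN_VALUES = {"off", "microcode", "safe-ret", "ibpb", "ibpb-vmexit"}


def srso_option(cmdline):
    """Return the last spec_rstack_overflow= value, or None if absent."""
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if sep and key == "spec_rstack_overflow":
            value = val  # assumption: a later occurrence overrides earlier ones
    return value


if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as f:
            opt = srso_option(f.read())
    except OSError as err:
        print("cannot read /proc/cmdline:", err)
    else:
        if opt is None:
            print("spec_rstack_overflow not set, kernel default applies")
        elif opt in KNOWN_VALUES:
            print("spec_rstack_overflow =", opt)
        else:
            print("unrecognized spec_rstack_overflow value:", opt)
```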

Mitigation: Safe RET
--------------------

The mitigation works by ensuring all RET instructions speculate to
a controlled location, similar to how speculation is controlled in the
retpoline sequence.  To accomplish this, the __x86_return_thunk forces
the CPU to mispredict every function return using a 'safe return'
sequence.

To ensure the safety of this mitigation, the kernel must ensure that the
safe return sequence is itself free from attacker interference.  In Zen3
and Zen4, this is accomplished by creating a BTB alias between the
untraining function srso_alias_untrain_ret() and the safe return
function srso_alias_safe_ret() which results in evicting a potentially
poisoned BTB entry and using that safe one for all function returns.

In older Zen1 and Zen2, this is accomplished using a reinterpretation
technique similar to the Retbleed one: srso_untrain_ret() and
srso_safe_ret().