Documentation/gpu/drm-vm-bind-async.rst

   1 .. SPDX-License-Identifier: (GPL-2.0+ OR MIT)
   2
   3 ====================
   4 Asynchronous VM_BIND
   5 ====================
   6
   7 Nomenclature:
   8 =============
   9
  10 * ``VRAM``: On-device memory. Sometimes referred to as device local memory.
  11
  12 * ``gpu_vm``: A virtual GPU address space. Typically per process, but
  13   can be shared by multiple processes.
  14
  15 * ``VM_BIND``: An operation or a list of operations to modify a gpu_vm using
  16   an IOCTL. The operations include mapping and unmapping system- or
  17   VRAM memory.
  18
  19 * ``syncobj``: A container that abstracts synchronization objects. The
  20   synchronization objects can be either generic, like dma-fences or
  21   driver specific. A syncobj typically indicates the type of the
  22   underlying synchronization object.
  23
  24 * ``in-syncobj``: Argument to a VM_BIND IOCTL, the VM_BIND operation waits
  25   for these before starting.
  26
  27 * ``out-syncobj``: Argument to a VM_BIND IOCTL, the VM_BIND operation
  28   signals these when the bind operation is complete.
  29
  30 * ``dma-fence``: A cross-driver synchronization object. A basic
  31   understanding of dma-fences is required to digest this
  32   document. Please refer to the ``DMA Fences`` section of the
  33   :doc:`dma-buf doc </driver-api/dma-buf>`.
  34
  35 * ``memory fence``: A synchronization object, different from a dma-fence.
  36   A memory fence uses the value of a specified memory location to determine
  37   signaled status. A memory fence can be awaited and signaled by both
  38   the GPU and CPU. Memory fences are sometimes referred to as
  39   user-fences, userspace-fences or gpu futexes and do not necessarily obey
  40   the dma-fence rule of signaling within a "reasonable amount of time".
  41   The kernel should thus avoid waiting for memory fences with locks held.
  42
  43 * ``long-running workload``: A workload that may take more than the
  44   current stipulated dma-fence maximum signal delay to complete and
  45   which therefore needs to set the gpu_vm or the GPU execution context in
  46   a certain mode that disallows completion dma-fences.
  47
  48 * ``exec function``: An exec function is a function that revalidates all
  49   affected gpu_vmas, submits a GPU command batch and registers the
  50   dma_fence representing the GPU command's activity with all affected
  51   dma_resvs. For completeness, although not covered by this document,
  52   it's worth mentioning that an exec function may also be the
  53   revalidation worker that is used by some drivers in compute /
  54   long-running mode.
  55
  56 * ``bind context``: A context identifier used for the VM_BIND
  57   operation. VM_BIND operations that use the same bind context can be
  58   assumed, where it matters, to complete in order of submission. No such
  59   assumptions can be made for VM_BIND operations using separate bind contexts.
  60
  61 * ``UMD``: User-mode driver.
  62
  63 * ``KMD``: Kernel-mode driver.
  64
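As a rough illustration of the ``memory fence`` entry above, the following
userspace sketch models a fence as a 64-bit memory location that counts as
signaled once it reaches a target value. The names ``mem_fence``,
``mem_fence_signal`` and ``mem_fence_is_signaled`` are hypothetical, and the
greater-or-equal comparison is an assumption; real implementations define
their own comparison and memory-ordering rules.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical memory fence: a memory location plus the value to wait for. */
struct mem_fence {
	_Atomic uint64_t *addr;	/* location written by the signaler */
	uint64_t target;	/* assumed rule: signaled once *addr >= target */
};

/* Signal by publishing a new value; here the CPU does it, a GPU could too. */
static inline void mem_fence_signal(struct mem_fence *f, uint64_t value)
{
	atomic_store_explicit(f->addr, value, memory_order_release);
}

/* Non-blocking status check; nothing bounds how long this may stay false. */
static inline bool mem_fence_is_signaled(const struct mem_fence *f)
{
	return atomic_load_explicit(f->addr, memory_order_acquire) >= f->target;
}
```

Because nothing forces the value to ever be written, a waiter built on
``mem_fence_is_signaled()`` has no upper bound on its wait time, which is the
reason the kernel should not wait for such fences with locks held.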
  65
  66 Synchronous / Asynchronous VM_BIND operation
  67 ============================================
  68
  69 Synchronous VM_BIND
  70 ___________________
  71 With Synchronous VM_BIND, the VM_BIND operations all complete before the
  72 IOCTL returns. A synchronous VM_BIND takes neither in-fences nor
  73 out-fences. Synchronous VM_BIND may block and wait for GPU operations;
  74 for example swap-in or clearing, or even previous binds.
  75
  76 Asynchronous VM_BIND
  77 ____________________
  78 Asynchronous VM_BIND accepts both in-syncobjs and out-syncobjs. While the
  79 IOCTL may return immediately, the VM_BIND operations wait for the in-syncobjs
  80 before modifying the GPU page-tables, and signal the out-syncobjs when
  81 the modification is done in the sense that the next exec function that
  82 awaits the out-syncobjs will see the change. Errors are reported
  83 synchronously.
  84 In low-memory situations the implementation may block, performing the
  85 VM_BIND synchronously, because there might not be enough memory
  86 immediately available for preparing the asynchronous operation.
  87
  88 If the VM_BIND IOCTL takes a list or an array of operations as an argument,
  89 the in-syncobjs need to signal before the first operation starts to
  90 execute, and the out-syncobjs signal after the last operation
  91 completes. Operations in the operation list can be assumed, where it
  92 matters, to complete in order.
  93
  94 Since asynchronous VM_BIND operations may use dma-fences embedded in
  95 out-syncobjs and internally in KMD to signal bind completion, any
  96 memory fences given as VM_BIND in-fences need to be awaited
  97 synchronously before the VM_BIND ioctl returns, since dma-fences,
  98 required to signal in a reasonable amount of time, can never be made
  99 to depend on memory fences that don't have such a restriction.
 100
 101 The purpose of an Asynchronous VM_BIND operation is for user-mode
 102 drivers to be able to pipeline interleaved gpu_vm modifications and
 103 exec functions. For long-running workloads, such pipelining of a bind
 104 operation is not allowed and any in-fences need to be awaited
 105 synchronously. The reason for this is twofold. First, any memory
 106 fences gated by a long-running workload and used as in-syncobjs for the
 107 VM_BIND operation will need to be awaited synchronously anyway (see
 108 above). Second, any dma-fences used as in-syncobjs for VM_BIND
 109 operations for long-running workloads will not allow for pipelining
 110 anyway since long-running workloads don't allow for dma-fences as
 111 out-syncobjs, so while theoretically possible the use of them is
 112 questionable and should be rejected until there is a valuable use-case.
 113 Note that this is not a limitation imposed by dma-fence rules, but
 114 rather a limitation imposed to keep KMD implementation simple. It does
 115 not affect using dma-fences as dependencies for the long-running
 116 workload itself, which is allowed by dma-fence rules, but rather for
 117 the VM_BIND operation only.
 118
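The rules above for when in-fences have to be awaited before the IOCTL
returns can be condensed into a small decision helper. This is only a sketch:
``enum fence_kind``, ``struct bind_request`` and ``must_wait_in_ioctl()`` are
hypothetical names and are not part of any driver or uAPI.

```c
#include <stdbool.h>

/* Hypothetical classification of an in-fence handed to VM_BIND. */
enum fence_kind {
	FENCE_DMA,	/* dma-fence: must signal in a reasonable amount of time */
	FENCE_MEMORY,	/* memory fence: no such guarantee */
};

/* Hypothetical, simplified view of one asynchronous VM_BIND request. */
struct bind_request {
	bool vm_long_running;		/* gpu_vm is in long-running mode */
	enum fence_kind in_fence;	/* kind of in-fence supplied */
};

/*
 * Return true if the in-fence has to be awaited synchronously, that is
 * before the VM_BIND IOCTL returns, instead of being pipelined.
 */
static inline bool must_wait_in_ioctl(const struct bind_request *req)
{
	/* A dma-fence may never be made to depend on a memory fence. */
	if (req->in_fence == FENCE_MEMORY)
		return true;

	/* Long-running gpu_vms do not pipeline bind operations at all. */
	return req->vm_long_running;
}
```

Only the combination of a dma-fence in-fence and a gpu_vm that is not in
long-running mode is left for pipelining in this model.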
 119 An asynchronous VM_BIND operation may take substantial time to
 120 complete and signal the out-fence, in particular if the operation is
 121 deeply pipelined behind other VM_BIND operations and workloads
 122 submitted using exec functions. In that case, UMD might want to avoid
 123 queuing a subsequent VM_BIND operation behind the first one if
 124 there are no explicit dependencies. To circumvent such a queue-up, a
 125 VM_BIND implementation may allow for VM_BIND contexts to be
 126 created. For each context, VM_BIND operations will be guaranteed to
 127 complete in the order they were submitted, but that is not the case
 128 for VM_BIND operations executing on separate VM_BIND contexts. Instead
 129 KMD will attempt to execute such VM_BIND operations in parallel, but
 130 with no guarantee that they will actually be executed in
 131 parallel. There may be internal implicit dependencies that only KMD knows
 132 about, for example page-table structure changes. A way to attempt
 133 to avoid such internal dependencies is to have different VM_BIND
 134 contexts use separate regions of a VM.
 135
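One way a UMD could follow the suggestion above is to split the VM into
fixed-size regions and dedicate a bind context to each. The sketch below is
hypothetical throughout: the region size, the number of contexts and
``bind_context_for_range()`` are illustrative assumptions, and KMD may still
serialize operations because of dependencies only it knows about.

```c
#include <stdint.h>

#define REGION_SHIFT	36u	/* assumed region size: 64 GiB */
#define NUM_CONTEXTS	4u	/* assumed number of bind contexts UMD created */

/*
 * Pick the index of the bind context for a bind covering
 * [addr, addr + range). Returns -1 if the range is empty, wraps around,
 * or straddles a region boundary; the caller would then have to split
 * the operation or fall back to a single context.
 */
static inline int bind_context_for_range(uint64_t addr, uint64_t range)
{
	uint64_t first, last;

	if (range == 0 || addr + range - 1 < addr)
		return -1;

	first = addr >> REGION_SHIFT;
	last = (addr + range - 1) >> REGION_SHIFT;
	if (first != last)
		return -1;

	/* Regions that share a context are simply ordered with each other. */
	return (int)(first % NUM_CONTEXTS);
}
```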
 136 Also, for VM_BINDs on long-running gpu_vms, the user-mode driver should typically
 137 select memory fences as out-fences since that gives greater flexibility for
 138 the kernel-mode driver to inject other operations into the bind /
 139 unbind operations, for example inserting breakpoints into batch
 140 buffers. The workload execution can then easily be pipelined behind
 141 the bind completion using the memory out-fence as the signal condition
 142 for a GPU semaphore embedded by UMD in the workload.
 143
 144 There is no difference in the operations supported or in
 145 multi-operation support between asynchronous VM_BIND and synchronous VM_BIND.
 146
 147 Multi-operation VM_BIND IOCTL error handling and interrupts
 148 ===========================================================
 149
 150 The VM_BIND operations of the IOCTL may error for various reasons, for
 151 example due to lack of resources to complete or due to interrupted
 152 waits.
 153 In these situations UMD should preferably restart the IOCTL after
 154 taking suitable action.
 155 If UMD has over-committed a memory resource, an -ENOSPC error will be
 156 returned, and UMD may then unbind resources that are not used at the
 157 moment and rerun the IOCTL. On -EINTR, UMD should simply rerun the
 158 IOCTL and on -ENOMEM user-space may either attempt to free known
 159 system memory resources or fail. In case of UMD deciding to fail a
 160 bind operation, due to an error return, no additional action is needed
 161 to clean up the failed operation, and the VM is left in the same state
 162 as it was before the failing IOCTL.
 163 Unbind operations are guaranteed not to return any errors due to
 164 resource constraints, but may return errors due to, for example,
 165 invalid arguments or the gpu_vm being banned.
 166 In the case an unexpected error happens during the asynchronous bind
 167 process, the gpu_vm will be banned, and attempts to use it after banning
 168 will return -ENOENT.
 169
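The restart policy described above can be summarized as a mapping from the
error return to the next step UMD takes. The sketch below assumes ``err`` is
zero or a negative errno value; ``enum bind_action`` and
``vm_bind_next_action()`` are hypothetical names used for illustration only.

```c
#include <errno.h>

/* Hypothetical next steps for UMD after a VM_BIND IOCTL returns. */
enum bind_action {
	BIND_DONE,		/* IOCTL succeeded */
	BIND_RERUN,		/* rerun the IOCTL unchanged */
	BIND_UNBIND_AND_RERUN,	/* unbind currently unused resources, then rerun */
	BIND_FREE_OR_FAIL,	/* free known system memory and rerun, or fail */
	BIND_FAIL,		/* restarting will not help, e.g. banned gpu_vm */
};

/* Map an error return (0 or negative errno) to the next step. */
static inline enum bind_action vm_bind_next_action(int err)
{
	switch (err) {
	case 0:
		return BIND_DONE;
	case -EINTR:
		return BIND_RERUN;
	case -ENOSPC:
		return BIND_UNBIND_AND_RERUN;
	case -ENOMEM:
		return BIND_FREE_OR_FAIL;
	default:
		return BIND_FAIL;
	}
}
```

Since a failed IOCTL leaves the VM in its previous state, none of these
paths needs to undo a partially applied operation list.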
 170 Example: The Xe VM_BIND uAPI
 171 ============================
 172
 173 Starting with the VM_BIND operation struct, the IOCTL call can take
 174 zero, one or many such operations. A zero number means only the
 175 synchronization part of the IOCTL is carried out: an asynchronous
 176 VM_BIND updates the syncobjects, whereas a sync VM_BIND waits for the
 177 implicit dependencies to be fulfilled.
 178
 179 .. code-block:: c
 180
 181    struct drm_xe_vm_bind_op {
 182         /**
 183          * @obj: GEM object to operate on, MBZ for MAP_USERPTR, MBZ for UNMAP
 184          */
 185         __u32 obj;
 186
 187         /** @pad: MBZ */
 188         __u32 pad;
 189
 190         union {
 191                 /**
 192                  * @obj_offset: Offset into the object for MAP.
 193                  */
 194                 __u64 obj_offset;
 195
 196                 /** @userptr: user virtual address for MAP_USERPTR */
 197                 __u64 userptr;
 198         };
 199
 200         /**
 201          * @range: Number of bytes from the object to bind to addr, MBZ for UNMAP_ALL
 202          */
 203         __u64 range;
 204
 205         /** @addr: Address to operate on, MBZ for UNMAP_ALL */
 206         __u64 addr;
 207
 208         /**
 209          * @tile_mask: Mask for which tiles to create binds for, 0 == All tiles,
 210          * only applies to creating new VMAs
 211          */
 212         __u64 tile_mask;
 213
 214         /* Map (parts of) an object into the GPU virtual address range. */
 215     #define XE_VM_BIND_OP_MAP           0x0
 216         /* Unmap a GPU virtual address range */
 217     #define XE_VM_BIND_OP_UNMAP         0x1
 218         /*
 219          * Map a CPU virtual address range into a GPU virtual
 220          * address range.
 221          */
 222     #define XE_VM_BIND_OP_MAP_USERPTR   0x2
 223         /* Unmap a gem object from the VM. */
 224     #define XE_VM_BIND_OP_UNMAP_ALL     0x3
 225         /*
 226          * Make the backing memory of an address range resident if
 227          * possible. Note that this doesn't pin backing memory.
 228          */
 229     #define XE_VM_BIND_OP_PREFETCH      0x4
 230
 231         /* Make the GPU map readonly. */
 232     #define XE_VM_BIND_FLAG_READONLY    (0x1 << 16)
 233         /*
 234          * Valid on a faulting VM only, do the MAP operation immediately rather
 235          * than deferring the MAP to the page fault handler.
 236          */
 237     #define XE_VM_BIND_FLAG_IMMEDIATE   (0x1 << 17)
 238         /*
 239          * When the NULL flag is set, the page tables are setup with a special
 240          * bit which indicates writes are dropped and all reads return zero.  In
 241          * the future, the NULL flags will only be valid for XE_VM_BIND_OP_MAP
 242          * operations, the BO handle MBZ, and the BO offset MBZ. This flag is
 243          * intended to implement VK sparse bindings.
 244          */
 245     #define XE_VM_BIND_FLAG_NULL        (0x1 << 18)
 246         /** @op: Operation to perform (lower 16 bits) and flags (upper 16 bits) */
 247         __u32 op;
 248
 249         /** @region: Memory region to prefetch VMA to, instance not a mask */
 250         __u32 region;
 251
 252         /** @reserved: Reserved */
 253         __u64 reserved[2];
 254    };
 255
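Since the ``op`` field carries the operation in its lower 16 bits and the
flags in its upper 16 bits, packing and unpacking it is plain bit masking.
The sketch below repeats the relevant defines from the listing so that it
compiles on its own; the ``bind_op_*()`` helpers and ``BIND_OP_MASK`` are
hypothetical and not part of the uAPI.

```c
#include <stdint.h>

/* Values repeated from the listing above. */
#define XE_VM_BIND_OP_MAP		0x0
#define XE_VM_BIND_OP_UNMAP		0x1
#define XE_VM_BIND_OP_MAP_USERPTR	0x2
#define XE_VM_BIND_OP_UNMAP_ALL		0x3
#define XE_VM_BIND_OP_PREFETCH		0x4
#define XE_VM_BIND_FLAG_READONLY	(0x1 << 16)
#define XE_VM_BIND_FLAG_IMMEDIATE	(0x1 << 17)
#define XE_VM_BIND_FLAG_NULL		(0x1 << 18)

/* Hypothetical mask selecting the operation bits of the op field. */
#define BIND_OP_MASK	0xffffu

/* Combine an operation and a set of flags into one op value. */
static inline uint32_t bind_op_pack(uint32_t opcode, uint32_t flags)
{
	return (opcode & BIND_OP_MASK) | (flags & ~BIND_OP_MASK);
}

/* Extract the operation (lower 16 bits). */
static inline uint32_t bind_op_opcode(uint32_t op)
{
	return op & BIND_OP_MASK;
}

/* Extract the flags (upper 16 bits, left in place). */
static inline uint32_t bind_op_flags(uint32_t op)
{
	return op & ~BIND_OP_MASK;
}
```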
 256
 257 The VM_BIND IOCTL argument itself looks as follows. Note that for
 258 synchronous VM_BIND, the num_syncs and syncs fields must be zero. Here
 259 the ``exec_queue_id`` field is the VM_BIND context discussed previously
 260 that is used to facilitate out-of-order VM_BINDs.
 261
 262 .. code-block:: c
 263
 264     struct drm_xe_vm_bind {
 265         /** @extensions: Pointer to the first extension struct, if any */
 266         __u64 extensions;
 267
 268         /** @vm_id: The ID of the VM to bind to */
 269         __u32 vm_id;
 270
 271         /**
 272          * @exec_queue_id: exec_queue_id, must be of class DRM_XE_ENGINE_CLASS_VM_BIND
 273          * and exec queue must have same vm_id. If zero, the default VM bind engine
 274          * is used.
 275          */
 276         __u32 exec_queue_id;
 277
 278         /** @num_binds: number of binds in this IOCTL */
 279         __u32 num_binds;
 280
 281         /* If set, perform an async VM_BIND, if clear a sync VM_BIND */
 282     #define XE_VM_BIND_IOCTL_FLAG_ASYNC (0x1 << 0)
 283
 284         /** @flags: Flags controlling all operations in this ioctl. */
 285         __u32 flags;
 286
 287         union {
 288                 /** @bind: used if num_binds == 1 */
 289                 struct drm_xe_vm_bind_op bind;
 290
 291                 /**
 292                  * @vector_of_binds: userptr to array of struct
 293                  * drm_xe_vm_bind_op if num_binds > 1
 294                  */
 295                 __u64 vector_of_binds;
 296         };
 297
 298         /** @num_syncs: number of syncs to wait for or to signal on completion. */
 299         __u32 num_syncs;
 300
 301         /** @pad2: MBZ */
 302         __u32 pad2;
 303
 304         /** @syncs: pointer to struct drm_xe_sync array */
 305         __u64 syncs;
 306
 307         /** @reserved: Reserved */
 308         __u64 reserved[2];
 309     };
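
To illustrate how the union and the sync fields are meant to be used, the
sketch below fills in the argument for a single operation, for several
operations, and for the synchronous and asynchronous cases. It uses local
mirror structs (``vm_bind_op`` and ``vm_bind_args``, with ``uint32_t`` and
``uint64_t`` in place of ``__u32`` and ``__u64``) so that it compiles without
the uAPI header; these mirrors and ``vm_bind_args_fill()`` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define XE_VM_BIND_IOCTL_FLAG_ASYNC (0x1 << 0)

/* Local mirror of struct drm_xe_vm_bind_op from the listing above. */
struct vm_bind_op {
	uint32_t obj;
	uint32_t pad;
	union {
		uint64_t obj_offset;
		uint64_t userptr;
	};
	uint64_t range;
	uint64_t addr;
	uint64_t tile_mask;
	uint32_t op;
	uint32_t region;
	uint64_t reserved[2];
};

/* Local mirror of struct drm_xe_vm_bind from the listing above. */
struct vm_bind_args {
	uint64_t extensions;
	uint32_t vm_id;
	uint32_t exec_queue_id;
	uint32_t num_binds;
	uint32_t flags;
	union {
		struct vm_bind_op bind;
		uint64_t vector_of_binds;
	};
	uint32_t num_syncs;
	uint32_t pad2;
	uint64_t syncs;
	uint64_t reserved[2];
};

/*
 * Fill the IOCTL argument. For a synchronous bind (async == false) the
 * syncs and num_syncs parameters are ignored, because both fields must
 * be zero in that case.
 */
static inline void vm_bind_args_fill(struct vm_bind_args *args, uint32_t vm_id,
				     const struct vm_bind_op *ops,
				     uint32_t num_binds, bool async,
				     uint64_t syncs, uint32_t num_syncs)
{
	memset(args, 0, sizeof(*args));	/* pad and reserved fields are MBZ */
	args->vm_id = vm_id;
	args->num_binds = num_binds;

	if (num_binds == 1)
		args->bind = ops[0];	/* a single op is embedded */
	else if (num_binds > 1)
		args->vector_of_binds = (uint64_t)(uintptr_t)ops;

	if (async) {
		args->flags |= XE_VM_BIND_IOCTL_FLAG_ASYNC;
		args->syncs = syncs;
		args->num_syncs = num_syncs;
	}
}
```

Leaving ``exec_queue_id`` at zero selects the default VM bind engine, as
described in the field comment above.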