HugeTLB controller can be created by first mounting the cgroup filesystem::

  # mount -t cgroup -o hugetlb none /sys/fs/cgroup
With the above step, the initial or the parent HugeTLB group becomes
visible at /sys/fs/cgroup. At bootup, this group includes all the tasks in
the system. /sys/fs/cgroup/tasks lists the tasks in this cgroup.
New groups can be created under the parent group /sys/fs/cgroup::
  # cd /sys/fs/cgroup
  # mkdir g1
  # echo $$ > g1/tasks
The above steps create a new group g1 and move the current shell
process (bash) into it.
Brief summary of control files::
  hugetlb.<hugepagesize>.rsvd.limit_in_bytes     # set/show limit of "hugepagesize" hugetlb reservations
  hugetlb.<hugepagesize>.rsvd.max_usage_in_bytes # show max "hugepagesize" hugetlb reservations and no-reserve faults
  hugetlb.<hugepagesize>.rsvd.usage_in_bytes     # show current reservations and no-reserve faults for "hugepagesize" hugetlb
  hugetlb.<hugepagesize>.rsvd.failcnt            # show the number of allocation failures due to the HugeTLB reservation limit
  hugetlb.<hugepagesize>.limit_in_bytes          # set/show limit of "hugepagesize" hugetlb faults
  hugetlb.<hugepagesize>.max_usage_in_bytes      # show max "hugepagesize" hugetlb usage recorded
  hugetlb.<hugepagesize>.usage_in_bytes          # show current usage for "hugepagesize" hugetlb
  hugetlb.<hugepagesize>.failcnt                 # show the number of allocation failures due to the HugeTLB usage limit
  hugetlb.<hugepagesize>.numa_stat               # show the numa information of the hugetlb memory charged to this cgroup
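For example, to cap a group at a fixed number of huge pages, multiply the
page count by the page size and write the result to limit_in_bytes. This is a
sketch: the g1 group and the 2MB page size are assumptions carried over from
the setup above, and the cgroup write itself needs root, so it is shown
commented.

```shell
# Allow at most 10 huge pages of 2 MB each (the x86 default huge page size).
npages=10
pagesize=$((2 * 1024 * 1024))   # 2 MB in bytes
limit=$((npages * pagesize))
echo "$limit"                   # prints 20971520 (20 MB)
# Apply it (requires root and the cgroup mount shown earlier):
# echo "$limit" > /sys/fs/cgroup/g1/hugetlb.2MB.limit_in_bytes
# cat /sys/fs/cgroup/g1/hugetlb.2MB.limit_in_bytes
```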
For a system supporting three hugepage sizes (64k, 32M and 1G), the control
files look like::
  hugetlb.1GB.limit_in_bytes
  hugetlb.1GB.max_usage_in_bytes
  hugetlb.1GB.numa_stat
  hugetlb.1GB.usage_in_bytes
  hugetlb.1GB.failcnt
  hugetlb.1GB.rsvd.limit_in_bytes
  hugetlb.1GB.rsvd.max_usage_in_bytes
  hugetlb.1GB.rsvd.usage_in_bytes
  hugetlb.1GB.rsvd.failcnt
  hugetlb.64KB.limit_in_bytes
  hugetlb.64KB.max_usage_in_bytes
  hugetlb.64KB.numa_stat
  hugetlb.64KB.usage_in_bytes
  hugetlb.64KB.failcnt
  hugetlb.64KB.rsvd.limit_in_bytes
  hugetlb.64KB.rsvd.max_usage_in_bytes
  hugetlb.64KB.rsvd.usage_in_bytes
  hugetlb.64KB.rsvd.failcnt
  hugetlb.32MB.limit_in_bytes
  hugetlb.32MB.max_usage_in_bytes
  hugetlb.32MB.numa_stat
  hugetlb.32MB.usage_in_bytes
  hugetlb.32MB.failcnt
  hugetlb.32MB.rsvd.limit_in_bytes
  hugetlb.32MB.rsvd.max_usage_in_bytes
  hugetlb.32MB.rsvd.usage_in_bytes
  hugetlb.32MB.rsvd.failcnt
1. Page fault accounting

::

  hugetlb.<hugepagesize>.limit_in_bytes
  hugetlb.<hugepagesize>.max_usage_in_bytes
  hugetlb.<hugepagesize>.usage_in_bytes
  hugetlb.<hugepagesize>.failcnt
The HugeTLB controller allows users to limit HugeTLB usage (page faults) per
control group and enforces the limit during page fault. Since HugeTLB
doesn't support page reclaim, enforcing the limit at page fault time implies
that the application will get a SIGBUS signal if it tries to fault in HugeTLB
pages beyond its limit. The application therefore needs to know beforehand
exactly how many HugeTLB pages it uses, and the sysadmin needs to make sure
that there are enough available on the machine for all users, to avoid
processes getting SIGBUS.
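As a sketch of how a fault limit behaves (the g1 group and the 2MB page size
are assumptions carried over from the earlier examples; the cgroup operations
need root, so they are shown commented):

```shell
# Limit g1 to a single 2 MB huge page; faulting in a second page would
# fail accounting, and the faulting task would receive SIGBUS.
onepage=$((2 * 1024 * 1024))
echo "$onepage"                 # prints 2097152
# echo "$onepage" > /sys/fs/cgroup/g1/hugetlb.2MB.limit_in_bytes
# ... run a workload in g1 that touches more than one huge page ...
# failcnt then records how many allocations the limit denied:
# cat /sys/fs/cgroup/g1/hugetlb.2MB.failcnt
```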
2. Reservation accounting

::

  hugetlb.<hugepagesize>.rsvd.limit_in_bytes
  hugetlb.<hugepagesize>.rsvd.max_usage_in_bytes
  hugetlb.<hugepagesize>.rsvd.usage_in_bytes
  hugetlb.<hugepagesize>.rsvd.failcnt
The HugeTLB controller also allows users to limit HugeTLB reservations per
control group, and enforces that limit at reservation time as well as at
fault time for HugeTLB memory for which no reservation exists. Since
reservation limits are enforced at reservation time (on mmap or shmget), a
reservation limit never causes the application to get a SIGBUS signal if the
memory was reserved beforehand. For MAP_NORESERVE allocations, the
reservation limit behaves the same as the fault limit, enforcing memory
usage at fault time and causing the application to receive a SIGBUS if it
crosses its limit.
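A sketch of the difference in failure mode, using hypothetical numbers (the
g1 group and 2MB page size are assumptions from the setup above): when the
reservation limit would be exceeded, the mmap() itself fails, so the task is
never exposed to SIGBUS.

```shell
# Hypothetical setup: g1 has a reservation limit of one 2 MB huge page.
rsvd_limit=$((2 * 1024 * 1024))
# A task in g1 mmap()s a hugetlbfs file spanning two huge pages:
request=$((2 * 2 * 1024 * 1024))
if [ "$request" -gt "$rsvd_limit" ]; then
    # With reservation accounting the mmap() fails with ENOMEM up front,
    # so the task can fall back (e.g. to normal pages) instead of being
    # delivered SIGBUS at fault time as under fault accounting.
    echo "mmap denied at reservation time"
fi
# To set such a limit for real (root; path assumes the setup above):
# echo "$rsvd_limit" > /sys/fs/cgroup/g1/hugetlb.2MB.rsvd.limit_in_bytes
```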
Reservation limits are superior to the page fault limits described above,
since they are enforced at reservation time (on mmap or shmget) and never
cause the application to get a SIGBUS signal if the memory was reserved
beforehand. This allows for easier fallback to alternatives such as
non-HugeTLB memory. With page fault accounting, it is very hard to avoid
processes getting SIGBUS, since the sysadmin would need to know precisely
the HugeTLB usage of all the tasks in the system and make sure there are
enough pages to satisfy all requests. Avoiding tasks getting SIGBUS on
overcommitted systems is practically impossible with page fault accounting.
3. Caveats with shared memory
For shared HugeTLB memory, both HugeTLB reservations and page faults are
charged to the first task that causes the memory to be reserved or faulted;
all subsequent uses of this reserved or faulted memory are done without
further charging.
Shared HugeTLB memory is only uncharged when it is unreserved or deallocated.
This is usually when the HugeTLB file is deleted, and not when the task that
caused the reservation or fault has exited.
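The charging rule above can be sketched as a toy model (the cgroup names gA
and gB are hypothetical; this only models the bookkeeping, not the kernel
implementation):

```shell
# Model: the first task to touch shared HugeTLB memory is charged;
# later mappers are not, and the charge outlives the first task.
charged_to=""
touch_shared() {            # $1 = cgroup of the touching task
    if [ -z "$charged_to" ]; then
        charged_to="$1"     # first toucher pays
    fi                      # otherwise: no new charge
}
touch_shared gA             # task in gA faults the memory first
touch_shared gB             # task in gB maps the same file: no-op
echo "$charged_to"          # prints gA
# gA stays charged even after its task exits; only deleting the
# hugetlbfs file (unreserving/deallocating the memory) uncharges it.
```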
4. Caveats with HugeTLB cgroup offline
When a HugeTLB cgroup goes offline with some reservations or faults still
charged to it, the behavior is as follows:

- The fault charges are charged to the parent HugeTLB cgroup (reparented).
- The reservation charges remain on the offline HugeTLB cgroup.
This means that if a HugeTLB cgroup gets offlined while there are still
HugeTLB reservations charged to it, that cgroup persists as a zombie until
all HugeTLB reservations are uncharged. HugeTLB reservations behave this way
to match the memory controller, whose cgroups also persist as zombies until
all charged memory is uncharged. Also, the tracking of HugeTLB reservations
is more complex than the tracking of HugeTLB faults, so it is significantly
harder to reparent reservations at offline time.