1 .. Copyright © 2018 ANSSI.
2 CLIP OS is a trademark of the French Republic.
3 Content licensed under the Open License version 2.0 as published by Etalab
4 (French task force for Open Data).
11 The CLIP OS kernel is based on Linux. It also integrates:
13 * existing hardening patches that are not upstream yet and that we consider
14 relevant to our security model;
15 * developments made for previous CLIP OS versions that we have not upstreamed
16 yet (or that cannot be);
17 * entirely new functionalities that have not been upstreamed yet (or that
23 As the core of a hardened operating system, the CLIP OS kernel is particularly
26 * providing **robust security mechanisms** to higher levels of the operating
27 system, such as reliable isolation primitives;
28 * maintaining maximal **trust in hardware resources**;
29 * guaranteeing its **own protection** against various threats.
34 In this section we discuss our security-relevant configuration choices for
35 the CLIP OS kernel. Before starting, it is worth mentioning that:
37 * We do our best to **limit the number of kernel modules**.
39 In other words, as many modules as possible should be built-in. Modules are
40 only used when needed either for the initramfs or to ease the automation of
41 the deployment of CLIP OS on multiple different machines (for the moment, we
42 only target a QEMU-KVM guest). This is particularly important as module
43 loading is disabled after CLIP OS startup.
45 * We **focus on a secure configuration**. The remainder of the configuration
46 is minimal and it is your job to tune it for your machines and use cases.
48 * CLIP OS only supports the x86-64 architecture for now.
50 * Running 32-bit programs is deliberately unsupported. Should you change that
51 in your custom kernel, keep in mind that it requires further attention when
52 configuring it (e.g., ensure that ``CONFIG_COMPAT_VDSO=n``).
54 * Many options that are not useful to us are disabled in order to cut attack
55 surface. As they are not all detailed below, please see
56 ``src/portage/clip/sys-kernel/clipos-kernel/files/config.d/blacklist`` for an
57 exhaustive list of the ones we **explicitly** disable.
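
The options discussed on this page can be checked mechanically against the
``.config`` of a built kernel. The following is only a minimal sketch and is
not part of CLIP OS: the function names ``parse_kconfig`` and
``check_kconfig`` are hypothetical, and the expected values are a small sample
taken from this page.

.. code-block:: python

   import re


   def parse_kconfig(text):
       """Parse the text of a kernel .config into a {option: value} dict."""
       options = {}
       for line in text.splitlines():
           line = line.strip()
           # Kconfig records disabled options as "# CONFIG_FOO is not set".
           unset = re.fullmatch(r"# (CONFIG_\w+) is not set", line)
           if unset:
               options[unset.group(1)] = "n"
           elif line.startswith("CONFIG_") and "=" in line:
               name, _, value = line.partition("=")
               options[name] = value.strip('"')
       return options


   def check_kconfig(text, expected):
       """Return {option: (wanted, found)} for every option that differs."""
       actual = parse_kconfig(text)
       # An option that is absent from .config is treated as disabled.
       return {
           name: (wanted, actual.get(name, "n"))
           for name, wanted in expected.items()
           if actual.get(name, "n") != wanted
       }


   # Illustrative sample of the choices documented on this page.
   EXPECTED = {
       "CONFIG_AUDIT": "y",
       "CONFIG_KALLSYMS": "n",
       "CONFIG_USER_NS": "n",
       "CONFIG_MODULE_SIG_HASH": "sha512",
   }

   # Made-up .config excerpt in which CONFIG_USER_NS deviates.
   SAMPLE = """\
   CONFIG_AUDIT=y
   # CONFIG_KALLSYMS is not set
   CONFIG_USER_NS=y
   CONFIG_MODULE_SIG_HASH="sha512"
   """

   for name, (wanted, found) in check_kconfig(SAMPLE, EXPECTED).items():
       print(f"{name}: expected {wanted}, found {found}")

On a real system, the text passed to ``check_kconfig`` would come from the
``.config`` file in the kernel build directory, since ``CONFIG_IKCONFIG`` is
disabled and the configuration is therefore not available at runtime.
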
62 .. describe:: CONFIG_AUDIT=y
64 CLIP OS will need the auditing infrastructure.
66 .. describe:: CONFIG_IKCONFIG=n
69 We do not need ``.config`` to be available at runtime, nor do we need
70 access to kernel headers through *sysfs*.
72 .. describe:: CONFIG_KALLSYMS=n
74 Symbols are only useful for debug and attack purposes.
76 .. describe:: CONFIG_USERFAULTFD=n
78 The ``userfaultfd()`` system call adds attack surface and can `make heap
79 sprays easier <https://duasynt.com/blog/linux-kernel-heap-spray>`_. Note
80 that the ``vm.unprivileged_userfaultfd`` sysctl can also be used to restrict
81 the use of this system call to privileged users.
83 .. describe:: CONFIG_EXPERT=y
85 This unlocks additional configuration options we need.
89 .. describe:: CONFIG_USER_NS=n
91 User namespaces can be useful for some use cases but even more to an
92 attacker. We choose to disable them for the moment, but we could also enable
93 them and use the ``kernel.unprivileged_userns_clone`` sysctl provided by
94 linux-hardened to disable their unprivileged use.
98 .. describe:: CONFIG_SLUB_DEBUG=y
100 Allow allocator validation checking to be enabled.
102 .. describe:: CONFIG_SLAB_MERGE_DEFAULT=n
104 Merging SLAB caches can make heap exploitation easier.
106 .. describe:: CONFIG_SLAB_FREELIST_RANDOM=y
108 Randomize allocator freelists.
110 .. describe:: CONFIG_SLAB_FREELIST_HARDENED=y
114 .. describe:: CONFIG_SLAB_CANARY=y
116 Place canaries at the end of slab allocations. [linux-hardened]_
120 .. describe:: CONFIG_SHUFFLE_PAGE_ALLOCATOR=y
122 Page allocator randomization is primarily a performance improvement for
123 direct-mapped memory-side-cache utilization, but it does reduce the
124 predictability of page allocations and thus complements
125 ``SLAB_FREELIST_RANDOM``. The ``page_alloc.shuffle=1`` parameter needs to be
126 added to the kernel command line.
130 .. describe:: CONFIG_COMPAT_BRK=n
132 Enabling this would disable brk ASLR.
136 .. describe:: CONFIG_GCC_PLUGINS=y
138 Enable GCC plugins, some of which are security-relevant; GCC 4.7 at least is
141 .. describe:: CONFIG_GCC_PLUGIN_LATENT_ENTROPY=y
143 Instrument some kernel code to gather additional (but not
144 cryptographically secure) entropy at boot time.
146 .. describe:: CONFIG_GCC_PLUGIN_STRUCTLEAK=y
147 CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF_ALL=y
149 Prevent potential information leakage by forcing zero-initialization of:
151 - structures on the stack containing userspace addresses;
152 - any stack variable (thus including structures) that may be passed by
153 reference and has not already been explicitly initialized.
155 This is particularly important to prevent trivial bypassing of KASLR.
157 .. describe:: CONFIG_GCC_PLUGIN_RANDSTRUCT=y
159 Randomize layout of sensitive kernel structures. Exploits targeting such
160 structures then require an additional information leak vulnerability.
162 .. describe:: CONFIG_GCC_PLUGIN_RANDSTRUCT_PERFORMANCE=n
164 Do not weaken structure randomization.
168 .. describe:: CONFIG_ARCH_MMAP_RND_BITS=32
170 Use maximum number of randomized bits for the mmap base address on x86_64.
171 Note that thanks to a linux-hardened patch, this also impacts the number of
172 randomized bits for the stack base address.
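
To give an order of magnitude, the sketch below computes the size of the
range over which the mmap base can be shifted for a given number of
randomized bits. It assumes 4 KiB pages and that the random offset is counted
in pages; the value 28 is assumed here to be the upstream x86_64 default and
is shown only for comparison. The function name is hypothetical.

.. code-block:: python

   PAGE_SHIFT = 12  # assumption: 4 KiB pages on x86_64


   def mmap_random_span(rnd_bits, page_shift=PAGE_SHIFT):
       """Size in bytes of the range over which the mmap base may move."""
       # The random offset is a page count made of `rnd_bits` random bits,
       # so the base can land on 2**rnd_bits different page positions.
       return (1 << rnd_bits) << page_shift


   for bits in (28, 32):
       span_tib = mmap_random_span(bits) // 2**40
       print(f"{bits} bits -> {span_tib} TiB")
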
176 .. describe:: CONFIG_STACKPROTECTOR=y
177 CONFIG_STACKPROTECTOR_STRONG=y
179 Use ``-fstack-protector-strong`` for best stack canary coverage; GCC 4.9 at
182 .. describe:: CONFIG_VMAP_STACK=y
184 Virtually-mapped stacks benefit from guard pages, thus making kernel stack
185 overflows harder to exploit.
187 .. describe:: CONFIG_REFCOUNT_FULL=y
189 Do extensive checks on reference counting to prevent use-after-free
190 conditions. Without this option, on x86, there already is a fast
191 assembly-based protection based on the PaX implementation but it does not
196 .. describe:: CONFIG_STRICT_MODULE_RWX=y
198 Enforce strict memory mappings permissions for loadable kernel modules.
202 Although CLIP OS stores kernel modules in a read-only rootfs whose integrity is
203 guaranteed by dm-verity, we still enable and enforce module signing as an
204 additional layer of security:
206 .. describe:: CONFIG_MODULE_SIG=y
207 CONFIG_MODULE_SIG_FORCE=y
208 CONFIG_MODULE_SIG_ALL=y
209 CONFIG_MODULE_SIG_SHA512=y
210 CONFIG_MODULE_SIG_HASH="sha512"
214 .. describe:: CONFIG_INIT_STACK_ALL=n
216 This option requires compiler support that is currently only available in
219 Processor type and features
220 ~~~~~~~~~~~~~~~~~~~~~~~~~~~
222 .. describe:: CONFIG_RETPOLINE=y
224 Retpolines are needed to protect against Spectre v2. GCC 7.3.0 or higher is
227 .. describe:: CONFIG_LEGACY_VSYSCALL_NONE=y
229 The vsyscall table is not required anymore by libc and is a fixed-position
230 potential source of ROP gadgets.
232 .. describe:: CONFIG_X86_VSYSCALL_EMULATE=n
233 CONFIG_LEGACY_VSYSCALL_XONLY=n
237 .. describe:: CONFIG_MICROCODE=y
239 Needed to benefit from microcode updates and thus security fixes (e.g.,
240 additional Intel pseudo-MSRs to be used by the kernel as a mitigation for
241 various speculative execution vulnerabilities).
243 .. describe:: CONFIG_X86_MSR=n
246 Enabling those features would only present userspace with more attack
249 .. describe:: CONFIG_KSM=n
251 Enabling this feature can make cache side-channel attacks such as
252 FLUSH+RELOAD much easier to carry out.
256 .. describe:: CONFIG_DEFAULT_MMAP_MIN_ADDR=65536
258 This should in particular be non-zero to prevent the exploitation of kernel
261 .. describe:: CONFIG_MTRR=y
263 Memory Type Range Registers can make speculative execution bugs a bit harder
266 .. describe:: CONFIG_X86_PAT=y
268 Page Attribute Tables are the modern equivalents of MTRRs, which we
271 .. describe:: CONFIG_ARCH_RANDOM=y
273 Enable the RDRAND instruction to benefit from a secure hardware RNG if
274 supported. See also ``CONFIG_RANDOM_TRUST_CPU``.
276 .. describe:: CONFIG_X86_SMAP=y
278 Enable Supervisor Mode Access Prevention to prevent ret2usr exploitation
281 .. describe:: CONFIG_X86_INTEL_UMIP=y
283 Enable User Mode Instruction Prevention. Note that hardware supporting this
284 feature is not common yet.
286 .. describe:: CONFIG_X86_INTEL_MPX=n
288 Intel Memory Protection Extensions add hardware assistance to memory
289 protection. Compiler support is required but is deprecated in GCC 8 and will
290 probably be dropped in GCC 9.
292 .. describe:: CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS=n
294 Memory Protection Keys are a promising feature but they are still not
295 supported on current hardware.
297 .. describe:: CONFIG_X86_INTEL_TSX_MODE_OFF=y
299 Set the default value of the ``tsx`` kernel parameter to ``off``.
303 Enable the **seccomp** BPF userspace API for syscall attack surface reduction:
305 .. describe:: CONFIG_SECCOMP=y
306 CONFIG_SECCOMP_FILTER=y
310 .. describe:: CONFIG_RANDOMIZE_BASE=y
312 While this may be seen as a `controversial
313 <https://grsecurity.net/kaslr_an_exercise_in_cargo_cult_security.php>`_
314 feature, it makes sense for CLIP OS. Indeed, KASLR may be defeated thanks to
315 the kernel interfaces that are available to an attacker, or through attacks
316 leveraging hardware vulnerabilities such as speculative and out-of-order
317 execution ones. However, CLIP OS follows the *defense in depth* principle
318 and an attack surface reduction approach. Thus, the following points make
319 KASLR relevant in the CLIP OS kernel:
321 * KASLR was initially designed to counter remote attacks but the strong
322 security model of CLIP OS (e.g., no sysfs mounts in most containers,
323 minimal procfs, no arbitrary code execution) makes a local attack
324 more complex to carry out.
325 * STRUCTLEAK, STACKLEAK, kptr_restrict and
326 ``CONFIG_SECURITY_DMESG_RESTRICT`` are enabled in CLIP OS.
327 * The CLIP OS kernel is custom-compiled (at least for a given deployment),
328 its image is unreadable to all users including privileged ones and updates
329 are end-to-end encrypted. This makes both the content and addresses of the
330 kernel image secret. Note that, however, the production kernel image is
331 currently part of an EFI binary and is not encrypted, causing it to be
332 accessible to a physical attacker. This will change in the future as we
333 will only use the kernel included in the EFI binary to boot and then
334 *kexec* to the real production kernel whose image will be located on an
335 encrypted disk partition.
336 * We enable ``CONFIG_PANIC_ON_OOPS`` by default so that the kernel
337 cannot recover from failed exploit attempts, thus preventing any brute
339 * We enable Kernel Page Table Isolation, mitigating Meltdown and potential
340 other hardware information leakage. Variant 3a (Rogue System Register
341 Read) however remains an important threat to KASLR.
345 .. describe:: CONFIG_RANDOMIZE_MEMORY=y
347 Most of the above explanations stand for that feature.
349 .. describe:: CONFIG_KEXEC=n
352 Disable the ``kexec()`` system call to prevent an already-root attacker from
353 rebooting on an untrusted kernel.
355 .. describe:: CONFIG_CRASH_DUMP=n
357 A crash dump can potentially provide an attacker with useful information.
358 However, we disabled the ``kexec()`` system call above, so this configuration
359 option should have no impact anyway.
363 .. describe:: CONFIG_MODIFY_LDT_SYSCALL=n
365 This is not supposed to be needed by userspace applications and only
366 increases the kernel attack surface.
368 Power management and ACPI options
369 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
371 .. describe:: CONFIG_HIBERNATION=n
373 The CLIP OS swap partition is encrypted with an ephemeral key and thus
374 cannot support suspend to disk.
379 .. describe:: CONFIG_RESET_ATTACK_MITIGATION=n
381 In order to work properly, this mitigation requires userspace support that
382 is currently not available in CLIP OS. Moreover, due to our use of Secure
383 Boot, Trusted Boot and the fact that machines running CLIP OS are expected
384 to lock their BIOS with a password, the type of *cold boot attacks* this
385 mitigation is supposed to thwart should not be an issue.
387 Executable file formats / Emulations
388 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
390 .. describe:: CONFIG_BINFMT_MISC=n
392 We do not want our kernel to support miscellaneous binary classes. ELF
393 binaries and interpreted scripts starting with a shebang are enough.
395 .. describe:: CONFIG_COREDUMP=n
397 Core dumps can provide an attacker with useful information.
402 .. describe:: CONFIG_SYN_COOKIES=y
404 Enable TCP syncookies.
409 .. describe:: CONFIG_HW_RANDOM_TPM=y
411 Expose the TPM's Random Number Generator (RNG) as a Hardware RNG (HWRNG)
412 device, allowing the kernel to collect randomness from it. See documentation
413 of ``CONFIG_RANDOM_TRUST_CPU`` and the ``rng_core.default_quality`` command
414 line parameter for supplementary information.
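
As a rough illustration of what a ``rng_core.default_quality`` value means,
the sketch below computes how much entropy would be credited for a given
number of bytes read from a HWRNG. It assumes that the quality is expressed
in bits of entropy per 1024 bits of input, which is our understanding of the
hw_random core in kernels of this generation; the function name is
hypothetical.

.. code-block:: python

   def credited_entropy_bits(nbytes, quality):
       """Entropy (in bits) credited for `nbytes` read from a HWRNG.

       `quality` is assumed to be the number of bits of entropy per
       1024 bits of input, so it must lie between 0 and 1024.
       """
       if not 0 <= quality <= 1024:
           raise ValueError("quality must be between 0 and 1024")
       return (nbytes * 8 * quality) >> 10


   # With the value 512 used on the CLIP OS command line, half of the bits
   # read from the TPM are credited as entropy.
   print(credited_entropy_bits(64, 512))
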
416 .. describe:: CONFIG_TCG_TPM=y
418 CLIP OS leverages the TPM to ensure :ref:`boot integrity <trusted_boot>`.
420 .. describe:: CONFIG_DEVMEM=n
422 The ``/dev/mem`` device should not be required by any user application
427 If you must enable it, at least enable ``CONFIG_STRICT_DEVMEM`` and
428 ``CONFIG_IO_STRICT_DEVMEM`` to restrict access to this device as much as possible.
430 .. describe:: CONFIG_DEVKMEM=n
432 This virtual device is only useful for debug purposes and is very dangerous
433 as it allows direct kernel memory writing (particularly useful for
436 .. describe:: CONFIG_LEGACY_PTYS=n
438 Use the modern PTY interface only.
440 .. describe:: CONFIG_LDISC_AUTOLOAD=n
442 Do not automatically load any line discipline that is in a kernel module
443 when an unprivileged user asks for it.
445 .. describe:: CONFIG_DEVPORT=n
447 The ``/dev/port`` device should not be used anymore by userspace, and it
448 could increase the kernel attack surface.
450 .. describe:: CONFIG_RANDOM_TRUST_CPU=n
452 Do not **credit** entropy generated by the CPU manufacturer's HWRNG and
453 included in Linux's entropy pool. Fast and robust initialization of Linux's
454 CSPRNG is instead achieved thanks to the TPM's HWRNG (see documentation of
455 ``CONFIG_HW_RANDOM_TPM`` and the ``rng_core.default_quality`` command line
458 The IOMMU allows for protecting the system's main memory from arbitrary
459 accesses from devices (e.g., DMA attacks). Note that this is related to
460 hardware features. On a recent Intel machine, we enable the following:
462 .. describe:: CONFIG_IOMMU_SUPPORT=y
464 CONFIG_INTEL_IOMMU_SVM=y
465 CONFIG_INTEL_IOMMU_DEFAULT_ON=y
470 .. describe:: CONFIG_PROC_KCORE=n
472 Enabling this would provide an attacker with precious information on the
478 .. describe:: CONFIG_MAGIC_SYSRQ=n
480 This should only be needed for debugging.
482 .. describe:: CONFIG_DEBUG_KERNEL=y
484 This is useful even in a production kernel to enable further configuration
485 options that have security benefits.
487 .. describe:: CONFIG_DEBUG_VIRTUAL=y
489 Enable sanity checks in virtual to page code.
491 .. describe:: CONFIG_STRICT_KERNEL_RWX=y
493 Ensure kernel page tables have strict permissions.
495 .. describe:: CONFIG_DEBUG_WX=y
497 Check and report any dangerous memory mapping permissions, i.e., both
498 writable and executable kernel pages.
500 .. describe:: CONFIG_DEBUG_FS=n
502 The debugfs virtual file system is only useful for debugging and protecting
503 it would require additional work.
505 .. describe:: CONFIG_SLUB_DEBUG_ON=n
507 Using the ``slub_debug`` command line parameter provides more fine grained
510 .. describe:: CONFIG_PANIC_ON_OOPS=y
511 CONFIG_PANIC_TIMEOUT=-1
513 Prevent potential further exploitation of a bug by immediately panicking the
516 The following options add additional checks and validation for various
517 commonly targeted kernel structures:
519 .. describe:: CONFIG_DEBUG_CREDENTIALS=y
520 CONFIG_DEBUG_NOTIFIERS=y
523 .. describe:: CONFIG_BUG_ON_DATA_CORRUPTION=y
525 Note that linux-hardened patches add more places where this configuration
526 option has an impact.
528 .. describe:: CONFIG_SCHED_STACK_END_CHECK=y
529 .. describe:: CONFIG_PAGE_POISONING=n
531 We choose to poison pages with zeroes and thus prefer using
532 ``init_on_free`` in combination with linux-hardened's
533 ``PAGE_SANITIZE_VERIFY``.
538 .. describe:: CONFIG_SECURITY_DMESG_RESTRICT=y
540 Prevent unprivileged users from gathering information from the kernel log
541 buffer via ``dmesg(8)``. Note that this still can be overridden through the
542 ``kernel.dmesg_restrict`` sysctl.
544 .. describe:: CONFIG_PAGE_TABLE_ISOLATION=y
546 Enable KPTI to prevent Meltdown attacks and, more generally, reduce the
547 number of hardware side channels.
551 .. describe:: CONFIG_INTEL_TXT=n
553 CLIP OS does not use Intel Trusted Execution Technology.
557 .. describe:: CONFIG_HARDENED_USERCOPY=y
559 Harden data copies between kernel and user spaces, preventing classes of
560 heap overflow exploits and information leaks.
562 .. describe:: CONFIG_HARDENED_USERCOPY_FALLBACK=n
564 Use strict whitelisting mode, i.e., do not ``WARN()``.
566 .. describe:: CONFIG_FORTIFY_SOURCE=y
568 Leverage compiler to detect buffer overflows.
570 .. describe:: CONFIG_FORTIFY_SOURCE_STRICT_STRING=n
572 This extends ``FORTIFY_SOURCE`` to intra-object overflow checking. It is
573 useful to find bugs but not recommended for a production kernel yet.
576 .. describe:: CONFIG_STATIC_USERMODEHELPER=y
578 This makes the kernel route all usermode helper calls to a single binary
579 that cannot have its name changed. Without this, the kernel can be tricked
580 into calling an attacker-controlled binary (e.g. to bypass SMAP, cf.
581 `exploitation <https://seclists.org/oss-sec/2016/q4/621>`_ of
584 .. describe:: CONFIG_STATIC_USERMODEHELPER_PATH=""
586 Currently, we have no need for usermode helpers, so we simply
587 disable them. If we ever need some, this path will need to be set to a
588 custom trusted binary in charge of filtering and choosing what real
589 helpers should then be called.
593 .. describe:: CONFIG_SECURITY=y
595 Enable us to choose different security modules.
597 .. describe:: CONFIG_SECURITY_SELINUX=y
599 CLIP OS intends to leverage SELinux in its security model.
601 .. describe:: CONFIG_SECURITY_SELINUX_BOOTPARAM=n
603 We do not need SELinux to be disableable.
605 .. describe:: CONFIG_SECURITY_SELINUX_DISABLE=n
607 We do not want SELinux to be disabled. In addition, this would prevent LSM
608 structures such as security hooks from being marked as read-only.
610 .. describe:: CONFIG_SECURITY_SELINUX_DEVELOP=y
612 For now, but will eventually be ``n``.
616 .. describe:: CONFIG_LSM="yama"
618 SELinux shall be stacked too once CLIP OS uses it.
622 .. describe:: CONFIG_SECURITY_YAMA=y
624 The Yama LSM currently provides ptrace scope restriction (which might be
625 redundant with CLIP-LSM in the future).
629 .. describe:: CONFIG_INTEGRITY=n
631 The integrity subsystem provides several components, the security benefits
632 of which are already enforced by CLIP OS (e.g., read-only mounts for all
633 parts of the system containing executable programs).
637 .. describe:: CONFIG_SECURITY_PERF_EVENTS_RESTRICT=y
639 See documentation about the ``kernel.perf_event_paranoid`` sysctl below.
644 .. describe:: CONFIG_SECURITY_TIOCSTI_RESTRICT=y
646 This prevents unprivileged users from using the TIOCSTI ioctl to inject
647 commands into other processes that share a tty session. [linux-hardened]_
651 .. describe:: CONFIG_GCC_PLUGIN_STACKLEAK=y
652 CONFIG_STACKLEAK_TRACK_MIN_SIZE=100
653 CONFIG_STACKLEAK_METRICS=n
654 CONFIG_STACKLEAK_RUNTIME_DISABLE=n
656 ``STACKLEAK`` erases the kernel stack before returning from system calls,
657 leaving it initialized to a poison value. This both reduces the information
658 that kernel stack leak bugs can reveal and the exploitability of uninitialized
659 stack variables. However, it does not cover functions reaching the same stack
660 depth as prior functions during the same system call.
662 It used to also block kernel stack depth overflows caused by ``alloca()``, such
663 as Stack Clash attacks. We maintained this functionality for our kernel for a
664 while but eventually `dropped it
665 <https://github.com/clipos/src_external_linux/commit/3e5f9114fc2f70f6d2ae5d10db10869e0564eb03>`_.
667 .. describe:: CONFIG_INIT_ON_FREE_DEFAULT_ON=y
668 CONFIG_INIT_ON_ALLOC_DEFAULT_ON=y
670 These set ``init_on_free=1`` and ``init_on_alloc=1`` on the kernel command
671 line. See the documentation of these kernel parameters for details.
673 .. describe:: CONFIG_PAGE_SANITIZE_VERIFY=y
674 CONFIG_SLAB_SANITIZE_VERIFY=y
676 Verify that newly allocated pages and slab allocations are zeroed to detect
677 write-after-free bugs. This works in concert with ``init_on_free`` and is
678 adjusted to not be redundant with ``init_on_alloc``.
683 We incorporated most of the *Lockdown* patch series into the CLIP OS kernel,
684 though it may be merged into the mainline kernel in the near future.
685 Basically, *Lockdown* tries to disable many mechanisms that could allow the
686 superuser to eventually run untrusted code in kernel mode (note that a
687 significant portion of them are already disabled in the CLIP OS kernel due to
688 our custom configuration). This work is of interest to CLIP OS as we want
689 to avoid persistence on a compromised machine even in the case of an
690 already-root attacker. Among the several configuration options brought by
691 *Lockdown*, we enable the following ones:
693 .. describe:: CONFIG_LOCK_DOWN_KERNEL=y
694 CONFIG_LOCK_DOWN_MANDATORY=y
700 GCC version 7.3.0 or higher is required to fully benefit from retpolines
701 (``-mindirect-branch=thunk-extern``).
704 Sysctl Security Tuning
705 ----------------------
707 Many sysctls are not security-relevant or only play a role if some kernel
708 configuration options are enabled/disabled. In other words, the following is
709 tightly related to the CLIP OS kernel configuration detailed above.
711 .. describe:: dev.tty.ldisc_autoload = 0
713 See ``CONFIG_LDISC_AUTOLOAD`` above, which serves as a default value for
716 .. describe:: kernel.kptr_restrict = 2
718 Hide kernel addresses in ``/proc`` and other interfaces, even to privileged
721 .. describe:: kernel.yama.ptrace_scope = 3
723 Enable the strictest ptrace scope restriction provided by the Yama LSM.
725 .. describe:: kernel.perf_event_paranoid = 3
727 This completely disallows unprivileged access to the ``perf_event_open()``
728 system call. This is actually not needed as we already enable
729 ``CONFIG_SECURITY_PERF_EVENTS_RESTRICT``. [linux-hardened]_
731 Note that this requires a patch included in linux-hardened (see `here
732 <https://lwn.net/Articles/696216/>`_ for the reason why it is not upstream).
733 Indeed, on a mainline kernel without such a patch, the above is equivalent
734 to setting this sysctl to ``2``, which would still allow the profiling of
737 .. describe:: kernel.tiocsti_restrict = 1
739 This is already forced by the ``CONFIG_SECURITY_TIOCSTI_RESTRICT`` kernel
740 configuration option that we enable. [linux-hardened]_
742 The following two sysctls help mitigate TOCTOU vulnerabilities by preventing
743 users from creating symbolic or hard links to files they do not own or have
744 read/write access to:
746 .. describe:: fs.protected_symlinks = 1
747 fs.protected_hardlinks = 1
749 In addition, the following other two sysctls impose restrictions on the
750 opening of FIFOs and regular files in order to make similar spoofing attacks
753 .. describe:: fs.protected_fifos = 2
754 fs.protected_regular = 2
756 We do not simply disable the BPF Just in Time compiler as CLIP OS plans on
759 .. describe:: kernel.unprivileged_bpf_disabled = 1
761 Prevent unprivileged users from using BPF.
763 .. describe:: net.core.bpf_jit_harden = 2
765 Trades off performance but helps mitigate JIT spraying.
767 .. describe:: kernel.deny_new_usb = 0
769 The management of USB devices is handled at a higher level by CLIP OS.
772 .. describe:: kernel.device_sidechannel_restrict = 1
774 Restrict device timing side channels. [linux-hardened]_
776 .. describe:: fs.suid_dumpable = 0
778 Do not create core dumps of setuid executables. Note that we already
779 disable all core dumps by setting ``CONFIG_COREDUMP=n``.
781 .. describe:: kernel.pid_max = 65536
783 Increase the space for PID values.
785 .. describe:: kernel.modules_disabled = 1
787 Disable module loading once systemd has loaded the ones required for the
788 running machine according to a profile (i.e., a predefined and
789 hardware-specific list of modules).
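
The sysctl values listed in this section can be verified on a running system
by reading the corresponding files under ``/proc/sys``. The following is a
minimal sketch under that assumption; the helper names are hypothetical and
the expected values are a sample of the ones documented above.

.. code-block:: python

   from pathlib import Path


   def sysctl_path(name, root="/proc/sys"):
       """Map a dotted sysctl name to its file under /proc/sys."""
       return Path(root).joinpath(*name.split("."))


   def check_sysctls(expected, root="/proc/sys"):
       """Return {name: (wanted, found)} for each sysctl that differs.

       `found` is None when the sysctl does not exist or cannot be read.
       """
       mismatches = {}
       for name, wanted in expected.items():
           try:
               found = sysctl_path(name, root).read_text().strip()
           except OSError:
               found = None
           if found != str(wanted):
               mismatches[name] = (str(wanted), found)
       return mismatches


   # Sample of the values documented in this section.
   EXPECTED = {
       "kernel.kptr_restrict": 2,
       "kernel.yama.ptrace_scope": 3,
       "fs.protected_symlinks": 1,
       "fs.protected_hardlinks": 1,
       "kernel.modules_disabled": 1,
   }

   if __name__ == "__main__":
       for name, (wanted, found) in check_sysctls(EXPECTED).items():
           print(f"{name}: expected {wanted}, found {found}")

Sysctls that only exist with linux-hardened patches (for instance
``kernel.tiocsti_restrict``) would be reported as missing on a mainline
kernel.
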
791 Pure network sysctls (``net.ipv4.*`` and ``net.ipv6.*``) will be detailed in a
795 Command line parameters
796 -----------------------
798 We pass the following command line parameters to the kernel:
800 .. describe:: extra_latent_entropy
802 This parameter provided by a linux-hardened patch (based on the PaX
803 implementation) enables a very simple form of latent entropy extracted
804 during system start-up and added to the entropy obtained with
805 ``GCC_PLUGIN_LATENT_ENTROPY``. [linux-hardened]_
809 This force-enables KPTI even on CPUs claiming to be safe from Meltdown.
811 .. describe:: spectre_v2=on
813 Same reasoning as above but for the Spectre v2 vulnerability. Note that this
814 implies ``spectre_v2_user=on``, which enables the mitigation against user
815 space to user space task attacks (namely IBPB and STIBP when available and
818 .. describe:: spec_store_bypass_disable=seccomp
820 Same reasoning as above but for the Spectre v4 vulnerability. Note that this
821 mitigation requires updated microcode for Intel processors.
824 .. describe:: mds=full,nosmt
826 This parameter controls optional mitigations for the Microarchitectural Data
827 Sampling (MDS) class of Intel CPU vulnerabilities. Not specifying this
828 parameter is equivalent to setting ``mds=full``, which leaves SMT enabled
829 and therefore is not a complete mitigation. Note that this mitigation
830 requires an Intel microcode update and also addresses the TSX Asynchronous
831 Abort (TAA) Intel CPU vulnerability on systems that are affected by MDS.
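
Whether these mitigations are effective on a given machine can be observed
through the files under ``/sys/devices/system/cpu/vulnerabilities``. The
sketch below classifies the status strings found there; it assumes the usual
``Not affected``, ``Mitigation: ...`` and ``Vulnerable...`` prefixes, and the
helper names are hypothetical.

.. code-block:: python

   from pathlib import Path

   VULN_DIR = Path("/sys/devices/system/cpu/vulnerabilities")


   def classify(status):
       """Reduce a vulnerability status string to a coarse state."""
       status = status.strip()
       if status.startswith("Not affected"):
           return "not affected"
       if status.startswith("Mitigation"):
           return "mitigated"
       if status.startswith("Vulnerable"):
           return "vulnerable"
       return "unknown"


   def smt_disabled(status):
       """Tell whether a status string reports SMT as disabled."""
       return "SMT disabled" in status


   def report(vuln_dir=VULN_DIR):
       """Return {vulnerability: state}, or {} if the directory is absent."""
       if not vuln_dir.is_dir():
           return {}
       return {
           entry.name: classify(entry.read_text())
           for entry in sorted(vuln_dir.iterdir())
           if entry.is_file()
       }


   if __name__ == "__main__":
       for name, state in report().items():
           print(f"{name}: {state}")

With ``mds=full,nosmt``, the ``mds`` entry would be expected to be classified
as ``mitigated`` and to report SMT as disabled on affected hardware.
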
833 .. describe:: iommu=force
835 Even if we correctly enable the IOMMU in the kernel configuration, the
836 kernel can still decide for various reasons to not initialize it at boot.
837 Therefore, we force it with this parameter. Note that with some Intel
838 chipsets, you may need to add ``intel_iommu=igfx_off`` to allow your GPU to
839 access the physical memory directly without going through the DMA Remapping.
841 .. describe:: slub_debug=F
843 The ``F`` option adds many sanity checks to various slab operations. Other
844 interesting options that we considered but eventually chose to not use are:
846 * The ``P`` option, which enables poisoning on slab cache allocations,
847 disables the ``init_on_free`` and ``SLAB_SANITIZE_VERIFY`` features. As
848 they respectively poison with zeroes on object freeing and check the
849 zeroing on object allocations, we prefer enabling them instead of using
851 * The ``Z`` option enables red zoning, i.e., it adds extra areas around
852 slab objects that detect when one is overwritten past its real size.
853 This can help detect overflows but we already rely on ``SLAB_CANARY``
854 provided by linux-hardened. A canary is much better than a simple red
855 zone as it is supposed to be random.
857 .. describe:: page_alloc.shuffle=1
859 See ``CONFIG_SHUFFLE_PAGE_ALLOCATOR``.
861 .. describe:: rng_core.default_quality=512
863 Increase trust in the TPM's HWRNG to quickly and robustly initialize Linux's
864 CSPRNG by **crediting** half of the entropy it provides.
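
The presence of these parameters can be checked by comparing them with the
contents of ``/proc/cmdline``. The sketch below is a simplified illustration:
it splits the command line on whitespace (so quoted values containing spaces
are not handled), the helper names are hypothetical, and the expected list
only contains parameters spelled out in this section.

.. code-block:: python

   def parse_cmdline(cmdline):
       """Parse a kernel command line into a {name: value-or-None} dict."""
       params = {}
       for token in cmdline.split():
           name, sep, value = token.partition("=")
           # Flags without "=" are stored with a value of None.
           params[name] = value if sep else None
       return params


   def missing_params(cmdline, expected):
       """Return the expected parameters that are absent or set differently."""
       params = parse_cmdline(cmdline)
       missing = []
       for token in expected:
           name, sep, value = token.partition("=")
           if name not in params or (sep and params[name] != value):
               missing.append(token)
       return missing


   EXPECTED = [
       "extra_latent_entropy",
       "spectre_v2=on",
       "spec_store_bypass_disable=seccomp",
       "mds=full,nosmt",
       "iommu=force",
       "slub_debug=F",
       "page_alloc.shuffle=1",
       "rng_core.default_quality=512",
   ]

   if __name__ == "__main__":
       try:
           with open("/proc/cmdline") as cmdline_file:
               current = cmdline_file.read()
       except OSError:
           current = ""
       for token in missing_params(current, EXPECTED):
           print(f"missing or different: {token}")
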
868 * ``slub_nomerge`` is not used as we already set
869 ``CONFIG_SLAB_MERGE_DEFAULT=n`` in the kernel configuration.
870 * ``l1tf``: The built-in PTE Inversion mitigation is sufficient to mitigate
871 the L1TF vulnerability as long as CLIP OS is not used as a hypervisor with
872 untrusted guest VMs. If it were to be someday, ``l1tf=full,force`` should be
873 used to force-enable VMX unconditional cache flushes and force-disable SMT
874 (note that an Intel microcode update is not required for this mitigation to
875 work but improves performance by providing a way to invalidate caches with a
877 * ``tsx=off``: This parameter is already set by default thanks to
878 ``CONFIG_X86_INTEL_TSX_MODE_OFF``. It deactivates the Intel TSX feature on
879 CPUs that support TSX control (i.e. are recent enough or received a microcode
880 update) and that are not already vulnerable to MDS, therefore mitigating the
881 TSX Asynchronous Abort (TAA) Intel CPU vulnerability.
882 * ``tsx_async_abort``: This parameter controls optional mitigations for the TSX
883 Asynchronous Abort (TAA) Intel CPU vulnerability. Due to our use of
884 ``mds=full,nosmt`` in addition to ``CONFIG_X86_INTEL_TSX_MODE_OFF``, CLIP OS
885 is already protected against this vulnerability as long as the CPU microcode
886 has been updated, whether or not the CPU is affected by MDS. For the record,
887 if we wanted to keep TSX activated, we could specify
888 ``tsx_async_abort=full,nosmt``. Not specifying this parameter is equivalent
889 to setting ``tsx_async_abort=full``, which leaves SMT enabled and therefore
890 is not a complete mitigation. Note that this mitigation requires an Intel
891 microcode update and has no effect on systems that are already affected by
892 MDS and enable mitigations against it, nor on systems that disable TSX.
893 * ``kvm.nx_huge_pages``: This parameter controls the KVM hypervisor
894 iTLB multihit mitigations. Such mitigations are not needed as long as CLIP OS
895 is not used as a hypervisor with untrusted guest VMs. If it were to be
896 someday, ``kvm.nx_huge_pages=force`` should be used to ensure that guests
897 cannot exploit the iTLB multihit erratum to crash the host.
898 * ``mitigations``: This parameter controls optional mitigations for CPU
899 vulnerabilities in an arch-independent and more coarse-grained way. For now,
900 we keep using arch-specific options for the sake of explicitness. Not setting
901 this parameter equals setting it to ``auto``, which itself does not update
903 * ``init_on_free=1`` is automatically set due to ``INIT_ON_FREE_DEFAULT_ON``. It
904 zero-fills page and slab allocations on free to reduce risks of information
905 leaks and help mitigate a subset of use-after-free vulnerabilities.
906 * ``init_on_alloc=1`` is automatically set due to ``INIT_ON_ALLOC_DEFAULT_ON``.
907 The purpose of this functionality is to eliminate several kinds of
908 *uninitialized heap memory* flaws by zero-filling:
910 * all page allocator and slab allocator memory when allocated: this is
911 already guaranteed by our use of ``init_on_free`` in combination with
912 ``PAGE_SANITIZE_VERIFY`` and ``SLAB_SANITIZE_VERIFY`` from linux-hardened,
913 and thus has no effect;
914 * a few more *special* objects when allocated: these are the ones for which
915 we enable ``init_on_alloc`` as they are not covered by the aforementioned
916 combination of ``init_on_free`` and ``SANITIZE_VERIFY`` features.
918 .. rubric:: Citations and origin of some items
921 This item is provided by the ``linux-hardened`` patches.
923 .. vim: set tw=79 ts=2 sts=2 sw=2 et: