.. Copyright © 2018 ANSSI.
   CLIP OS is a trademark of the French Republic.
   Content licensed under the Open License version 2.0 as published by Etalab
   (French task force for Open Data).

The CLIP OS kernel is based on Linux. It also integrates:

* existing hardening patches that are not upstream yet and that we consider
  relevant to our security model;
* developments made for previous CLIP OS versions that we have not upstreamed
  yet (or that cannot be);
* entirely new functionalities that have not been upstreamed yet (or that
  cannot be).

As the core of a hardened operating system, the CLIP OS kernel is particularly
responsible for:

* providing **robust security mechanisms** to higher levels of the operating
  system, such as reliable isolation primitives;
* maintaining maximal **trust in hardware resources**;
* guaranteeing its **own protection** against various threats.

In this section, we discuss our security-relevant configuration choices for
the CLIP OS kernel. Before starting, it is worth mentioning that:

* We do our best to **limit the number of kernel modules**.

  In other words, as many modules as possible should be built-in. Modules are
  only used when needed, either for the initramfs or to ease the automation of
  the deployment of CLIP OS on multiple different machines (for the moment, we
  only target a QEMU-KVM guest). This is particularly important as module
  loading is disabled after CLIP OS startup.

* We **focus on a secure configuration**. The rest of the configuration is
  minimal and it is your job to tune it for your machines and use cases.

* CLIP OS only supports the x86-64 architecture for now.

* Running 32-bit programs is deliberately unsupported. Should you change that
  in your custom kernel, keep in mind that it requires further attention when
  configuring it (e.g., ensure that ``CONFIG_COMPAT_VDSO=n``).

* Many options that are not useful to us are disabled in order to cut attack
  surface. As they are not all detailed below, please see
  ``src/portage/clip/sys-kernel/clipos-kernel/files/config.d/blacklist`` for an
  exhaustive list of the ones we **explicitly** disable.
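
To check that a built kernel actually honors choices such as the ones on this
page, a minimal Python sketch can compare a ``.config`` file against expected
values. It assumes the usual Kconfig output format, where enabled options
appear as ``CONFIG_FOO=y`` (or ``=value``) and disabled ones as
``# CONFIG_FOO is not set``. The ``EXPECTED`` mapping and the helper names are
hypothetical and only cover a few options from this page.

```python
import re

# Hypothetical subset of the expectations described on this page.
EXPECTED = {
    "CONFIG_KALLSYMS": "n",
    "CONFIG_USERFAULTFD": "n",
    "CONFIG_SLAB_FREELIST_RANDOM": "y",
    "CONFIG_ARCH_MMAP_RND_BITS": "32",
}

_SET = re.compile(r"^(CONFIG_[A-Za-z0-9_]+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_[A-Za-z0-9_]+) is not set$")


def parse_config(text):
    """Map each CONFIG_* symbol found in a .config file to its value."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        m = _SET.match(line)
        if m:
            # String options such as CONFIG_LSM="yama" lose their quotes.
            values[m.group(1)] = m.group(2).strip('"')
            continue
        m = _UNSET.match(line)
        if m:
            values[m.group(1)] = "n"
    return values


def check_config(text, expected=EXPECTED):
    """Return (symbol, expected, actual) tuples for every mismatch."""
    values = parse_config(text)
    mismatches = []
    for symbol, want in expected.items():
        # A symbol absent from .config (e.g., unmet dependency) counts as "n".
        got = values.get(symbol, "n")
        if got != want:
            mismatches.append((symbol, want, got))
    return mismatches
```

Treating absent symbols as disabled keeps the check conservative for options
that must be off, while still flagging any required option that is missing.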

.. describe:: CONFIG_AUDIT=y

   CLIP OS will need the auditing infrastructure.

.. describe:: CONFIG_IKCONFIG=n

   We do not need ``.config`` to be available at runtime, nor do we need
   access to kernel headers through *sysfs*.

.. describe:: CONFIG_KALLSYMS=n

   Symbols are only useful for debugging and attack purposes.

.. describe:: CONFIG_USERFAULTFD=n

   The ``userfaultfd()`` system call adds attack surface and can `make heap
   sprays easier <https://duasynt.com/blog/linux-kernel-heap-spray>`_. Note
   that the ``vm.unprivileged_userfaultfd`` sysctl can also be used to restrict
   the use of this system call to privileged users.

.. describe:: CONFIG_EXPERT=y

   This unlocks additional configuration options we need.

.. describe:: CONFIG_USER_NS=n

   User namespaces can be useful for some use cases but even more so to an
   attacker. We choose to disable them for the moment, but we could also enable
   them and use the ``kernel.unprivileged_userns_clone`` sysctl provided by
   linux-hardened to disable their unprivileged use.

.. describe:: CONFIG_SLUB_DEBUG=y

   Allow allocator validation checking to be enabled.

.. describe:: CONFIG_SLAB_MERGE_DEFAULT=n

   Merging SLAB caches can make heap exploitation easier.

.. describe:: CONFIG_SLAB_FREELIST_RANDOM=y

   Randomize allocator freelists.

.. describe:: CONFIG_SLAB_FREELIST_HARDENED=y

.. describe:: CONFIG_SLAB_CANARY=y

   Place canaries at the end of slab allocations. [linux-hardened]_

.. describe:: CONFIG_SHUFFLE_PAGE_ALLOCATOR=y

   Page allocator randomization is primarily a performance improvement for
   direct-mapped memory-side-cache utilization, but it does reduce the
   predictability of page allocations and thus complements
   ``SLAB_FREELIST_RANDOM``. The ``page_alloc.shuffle=1`` parameter needs to be
   added to the kernel command line.
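
Since several protections on this page, such as page allocator shuffling, only
take effect when the right parameters are present on the kernel command line,
a minimal Python sketch can verify them by parsing the contents of
``/proc/cmdline``. The ``REQUIRED`` mapping and the helper names are
hypothetical, and the parsing is deliberately simplified (whitespace-separated
tokens, no support for quoted values containing spaces).

```python
# Hypothetical subset of the parameters discussed on this page.
REQUIRED = {
    "page_alloc.shuffle": "1",
    "slub_debug": "F",
    "iommu": "force",
    "extra_latent_entropy": None,  # flag without a value
}


def parse_cmdline(text):
    """Split a kernel command line into a {name: value} mapping.

    Flags without '=' map to None. Simplified: quoted values containing
    spaces are not supported.
    """
    params = {}
    for token in text.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def missing_params(text, required=REQUIRED):
    """Return the sorted names of required parameters absent or different."""
    params = parse_cmdline(text)
    return sorted(
        name
        for name, value in required.items()
        if name not in params or params[name] != value
    )
```

On a running system, ``missing_params()`` would simply be fed the contents of
``/proc/cmdline``; an empty list means every listed parameter is in place.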

.. describe:: CONFIG_COMPAT_BRK=n

   Enabling this would disable brk ASLR.

.. describe:: CONFIG_GCC_PLUGINS=y

   Enable GCC plugins, some of which are security-relevant; at least GCC 4.7 is
   required.

.. describe:: CONFIG_GCC_PLUGIN_LATENT_ENTROPY=y

   Instrument some kernel code to gather additional (but not
   cryptographically secure) entropy at boot time.

.. describe:: CONFIG_GCC_PLUGIN_STRUCTLEAK=y
              CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF_ALL=y

   Prevent potential information leakage by forcing zero-initialization of:

   - structures on the stack containing userspace addresses;
   - any stack variable (thus including structures) that may be passed by
     reference and has not already been explicitly initialized.

   This is particularly important to prevent trivial bypassing of KASLR.

.. describe:: CONFIG_GCC_PLUGIN_RANDSTRUCT=y

   Randomize the layout of sensitive kernel structures. Exploits targeting such
   structures then require an additional information leak vulnerability.

.. describe:: CONFIG_GCC_PLUGIN_RANDSTRUCT_PERFORMANCE=n

   Do not weaken structure randomization.

.. describe:: CONFIG_ARCH_MMAP_RND_BITS=32

   Use the maximum number of randomized bits for the mmap base address on
   x86_64. Note that thanks to a linux-hardened patch, this also impacts the
   number of randomized bits for the stack base address.
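
To give an idea of what these bits buy: as far as we understand the x86_64
``arch_mmap_rnd()`` logic, the mmap base offset is a random value of
``ARCH_MMAP_RND_BITS`` bits shifted left by the page size. The sketch below
assumes 4 KiB pages and computes the resulting window and the odds of a single
blind guess; the function names are illustrative only.

```python
PAGE_SHIFT = 12  # assumption: 4 KiB pages, as on x86_64


def mmap_offset_range(rnd_bits, page_shift=PAGE_SHIFT):
    """Size in bytes of the window the mmap base can be shifted within."""
    return (1 << rnd_bits) << page_shift


def guess_probability(rnd_bits):
    """Chance that one blind guess hits the randomized base page."""
    return 1 / (1 << rnd_bits)


for bits in (28, 32):  # 28 is the x86_64 default/minimum, 32 the maximum
    size_tib = mmap_offset_range(bits) / 2**40
    print(f"{bits} bits: {size_tib:g} TiB window, "
          f"1 in {1 << bits} chance per guess")
```

Going from the default 28 bits to 32 bits thus multiplies both the window size
and the number of candidate base addresses by 16.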

.. describe:: CONFIG_STACKPROTECTOR=y
              CONFIG_STACKPROTECTOR_STRONG=y

   Use ``-fstack-protector-strong`` for best stack canary coverage; at least
   GCC 4.9 is required.

.. describe:: CONFIG_VMAP_STACK=y

   Virtually-mapped stacks benefit from guard pages, thus making kernel stack
   overflows harder to exploit.

.. describe:: CONFIG_REFCOUNT_FULL=y

   Do extensive checks on reference counting to prevent use-after-free
   conditions. Without this option, on x86, there already is a fast
   assembly-based protection based on the PaX implementation but it does not

.. describe:: CONFIG_STRICT_MODULE_RWX=y

   Enforce strict memory mapping permissions for loadable kernel modules.

Although CLIP OS stores kernel modules in a read-only rootfs whose integrity is
guaranteed by dm-verity, we still enable and enforce module signing as an
additional layer of security:

.. describe:: CONFIG_MODULE_SIG=y
              CONFIG_MODULE_SIG_FORCE=y
              CONFIG_MODULE_SIG_ALL=y
              CONFIG_MODULE_SIG_SHA512=y
              CONFIG_MODULE_SIG_HASH="sha512"
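
With ``CONFIG_MODULE_SIG_ALL=y``, each module is signed at build time and the
signature is appended to the ``.ko`` file, followed by a small trailer and the
``~Module signature appended~`` marker. The following minimal sketch assumes
the trailer layout of the kernel's ``struct module_signature`` (12 bytes ending
with a big-endian signature length). It only detects that an appended
signature is present and extracts its length; it does **not** verify the
signature cryptographically. The function name is hypothetical.

```python
import struct

MODULE_SIG_MAGIC = b"~Module signature appended~\n"
# Assumed layout of struct module_signature: algo, hash, id_type,
# signer_len, key_id_len (u8 each), 3 bytes of padding, sig_len (__be32).
_SIG_INFO = struct.Struct(">BBBBB3xI")


def appended_signature_length(module_bytes):
    """Return the length of the appended signature, or None if unsigned."""
    if not module_bytes.endswith(MODULE_SIG_MAGIC):
        return None
    body = module_bytes[: -len(MODULE_SIG_MAGIC)]
    if len(body) < _SIG_INFO.size:
        return None
    *_, sig_len = _SIG_INFO.unpack(body[-_SIG_INFO.size:])
    # The signature itself must fit right before the trailer.
    if sig_len > len(body) - _SIG_INFO.size:
        return None
    return sig_len
```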

.. describe:: CONFIG_INIT_STACK_ALL=n

   This option requires compiler support that is currently only available in

Processor type and features
~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. describe:: CONFIG_RETPOLINE=y

   Retpolines are needed to protect against Spectre v2. GCC 7.3.0 or higher is
   required.

.. describe:: CONFIG_LEGACY_VSYSCALL_NONE=y
              CONFIG_LEGACY_VSYSCALL_EMULATE=n
              CONFIG_LEGACY_VSYSCALL_XONLY=n
              CONFIG_X86_VSYSCALL_EMULATION=n

   The vsyscall table is no longer required by libc and is a fixed-position
   potential source of ROP gadgets.

.. describe:: CONFIG_MICROCODE=y

   Needed to benefit from microcode updates and thus security fixes (e.g.,
   additional Intel pseudo-MSRs to be used by the kernel as a mitigation for
   various speculative execution vulnerabilities).

.. describe:: CONFIG_X86_MSR=n

   Enabling those features would only present userspace with more attack
   surface.

.. describe:: CONFIG_KSM=n

   Enabling this feature can make cache side-channel attacks such as
   FLUSH+RELOAD much easier to carry out.

.. describe:: CONFIG_DEFAULT_MMAP_MIN_ADDR=65536

   This should in particular be non-zero to prevent the exploitation of kernel
   NULL pointer dereference bugs.

.. describe:: CONFIG_MTRR=y

   Memory Type Range Registers can make speculative execution bugs a bit harder
   to exploit.

.. describe:: CONFIG_X86_PAT=y

   Page Attribute Tables are the modern equivalents of MTRRs, which we

.. describe:: CONFIG_ARCH_RANDOM=y

   Enable the RDRAND instruction to benefit from a secure hardware RNG if
   supported. See also ``CONFIG_RANDOM_TRUST_CPU``.

.. describe:: CONFIG_X86_SMAP=y

   Enable Supervisor Mode Access Prevention to prevent ret2usr exploitation
   techniques.

.. describe:: CONFIG_X86_INTEL_UMIP=y

   Enable User Mode Instruction Prevention. Note that hardware supporting this
   feature is not common yet.

.. describe:: CONFIG_X86_INTEL_MPX=n

   Intel Memory Protection Extensions (MPX) add hardware assistance to memory
   protection. Compiler support is required but was deprecated in GCC 8 and
   removed in GCC 9. Moreover, MPX kernel support is `being dropped
   <MPX_dropped_>`_.

.. _MPX_dropped: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f240652b6032b48ad7fa35c5e701cc4c8d697c0b

.. describe:: CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS=n

   Memory Protection Keys are a promising feature but they are still not
   supported on current hardware.

.. describe:: CONFIG_X86_INTEL_TSX_MODE_OFF=y

   Set the default value of the ``tsx`` kernel parameter to ``off``.

Enable the **seccomp** BPF userspace API for syscall attack surface reduction:

.. describe:: CONFIG_SECCOMP=y
              CONFIG_SECCOMP_FILTER=y

.. describe:: CONFIG_RANDOMIZE_BASE=y

   While this may be seen as a `controversial
   <https://grsecurity.net/kaslr_an_exercise_in_cargo_cult_security.php>`_
   feature, it makes sense for CLIP OS. Indeed, KASLR may be defeated through
   the kernel interfaces that are available to an attacker, or through attacks
   leveraging hardware vulnerabilities such as speculative and out-of-order
   execution ones. However, CLIP OS follows the *defense in depth* principle
   and an attack surface reduction approach. Thus, the following points make
   KASLR relevant in the CLIP OS kernel:

   * KASLR was initially designed to counter remote attacks but the strong
     security model of CLIP OS (e.g., no sysfs mounts in most containers,
     minimal procfs, no arbitrary code execution) makes a local attack
     more complex to carry out.
   * STRUCTLEAK, STACKLEAK, kptr_restrict and
     ``CONFIG_SECURITY_DMESG_RESTRICT`` are enabled in CLIP OS.
   * The CLIP OS kernel is custom-compiled (at least for a given deployment),
     its image is unreadable to all users including privileged ones and updates
     are end-to-end encrypted. This makes both the content and addresses of the
     kernel image secret. Note, however, that the production kernel image is
     currently part of an EFI binary and is not encrypted, making it
     accessible to a physical attacker. This will change in the future as we
     will only use the kernel included in the EFI binary to boot and then
     *kexec* to the real production kernel whose image will be located on an
     encrypted disk partition.
   * We enable ``CONFIG_PANIC_ON_OOPS`` by default so that the kernel
     cannot recover from failed exploit attempts, thus preventing any brute
     forcing.
   * We enable Kernel Page Table Isolation, mitigating Meltdown and other
     potential hardware information leaks. Variant 3a (Rogue System Register
     Read) however remains an important threat to KASLR.

.. describe:: CONFIG_RANDOMIZE_MEMORY=y

   Most of the above explanations apply to this feature as well.

.. describe:: CONFIG_KEXEC=n

   Disable the ``kexec()`` system call to prevent an already-root attacker from
   rebooting on an untrusted kernel.

.. describe:: CONFIG_CRASH_DUMP=n

   A crash dump can potentially provide an attacker with useful information.
   However, since we disabled the ``kexec()`` system call above, this
   configuration option should have no impact anyway.

.. describe:: CONFIG_MODIFY_LDT_SYSCALL=n

   This is not supposed to be needed by userspace applications and only
   increases the kernel attack surface.

Power management and ACPI options
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. describe:: CONFIG_HIBERNATION=n

   The CLIP OS swap partition is encrypted with an ephemeral key and thus
   cannot support suspend to disk.

.. describe:: CONFIG_RESET_ATTACK_MITIGATION=n

   In order to work properly, this mitigation requires userspace support that
   is currently not available in CLIP OS. Moreover, due to our use of Secure
   Boot, Trusted Boot and the fact that machines running CLIP OS are expected
   to lock their BIOS with a password, the type of *cold boot attacks* this
   mitigation is supposed to thwart should not be an issue.

Executable file formats / Emulations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. describe:: CONFIG_BINFMT_MISC=n

   We do not want our kernel to support miscellaneous binary classes. ELF
   binaries and interpreted scripts starting with a shebang are enough.

.. describe:: CONFIG_COREDUMP=n

   Core dumps can provide an attacker with useful information.

.. describe:: CONFIG_SYN_COOKIES=y

   Enable TCP syncookies.

.. describe:: CONFIG_HW_RANDOM_TPM=y

   Expose the TPM's Random Number Generator (RNG) as a Hardware RNG (HWRNG)
   device, allowing the kernel to collect randomness from it. See the
   documentation of ``CONFIG_RANDOM_TRUST_CPU`` and of the
   ``rng_core.default_quality`` command line parameter for additional
   information.

.. describe:: CONFIG_TCG_TPM=y

   CLIP OS leverages the TPM to ensure :ref:`boot integrity <trusted_boot>`.

.. describe:: CONFIG_DEVMEM=n

   The ``/dev/mem`` device should not be required by any user application.

   If you must enable it, at least enable ``CONFIG_STRICT_DEVMEM`` and
   ``CONFIG_IO_STRICT_DEVMEM`` to restrict access to this device as much as
   possible.

.. describe:: CONFIG_DEVKMEM=n

   This virtual device is only useful for debugging purposes and is very
   dangerous as it allows direct kernel memory writing (particularly useful for
   rootkits).

.. describe:: CONFIG_LEGACY_PTYS=n

   Use the modern PTY interface only.

.. describe:: CONFIG_LDISC_AUTOLOAD=n

   Do not automatically load any line discipline that is in a kernel module
   when an unprivileged user asks for it.

.. describe:: CONFIG_DEVPORT=n

   The ``/dev/port`` device should not be used anymore by userspace, and it
   could increase the kernel attack surface.

.. describe:: CONFIG_RANDOM_TRUST_CPU=n
              CONFIG_RANDOM_TRUST_BOOTLOADER=n

   Do not **credit** the entropy provided by the CPU manufacturer's HWRNG or by
   the bootloader when it is mixed into Linux's entropy pool. Fast and robust
   initialization of Linux's CSPRNG is instead achieved thanks to the TPM's
   HWRNG (see the documentation of ``CONFIG_HW_RANDOM_TPM`` and of the
   ``rng_core.default_quality`` command line parameter).

.. describe:: CONFIG_STAGING=n

   *Staging* drivers are typically of lower quality and under heavy
   development. They are thus more likely to contain bugs, including security
   vulnerabilities, and should be avoided.

The IOMMU protects the system's main memory from arbitrary accesses by devices
(e.g., DMA attacks). Note that this depends on hardware features. On a recent
Intel machine, we enable the following:

.. describe:: CONFIG_IOMMU_SUPPORT=y
              CONFIG_INTEL_IOMMU_SVM=y
              CONFIG_INTEL_IOMMU_DEFAULT_ON=y

.. describe:: CONFIG_PROC_KCORE=n

   Enabling this would provide an attacker with precious information on the
   running kernel.

.. describe:: CONFIG_MAGIC_SYSRQ=n

   This should only be needed for debugging.

.. describe:: CONFIG_DEBUG_KERNEL=y

   This is useful even in a production kernel to enable further configuration
   options that have security benefits.

.. describe:: CONFIG_DEBUG_VIRTUAL=y

   Enable sanity checks in virtual-to-page code.

.. describe:: CONFIG_STRICT_KERNEL_RWX=y

   Ensure kernel page tables have strict permissions.

.. describe:: CONFIG_DEBUG_WX=y

   Check and report any dangerous memory mapping permissions, i.e., both
   writable and executable kernel pages.

.. describe:: CONFIG_DEBUG_FS=n

   The debugfs virtual file system is only useful for debugging and protecting
   it would require additional work.

.. describe:: CONFIG_SLUB_DEBUG_ON=n

   Using the ``slub_debug`` command line parameter provides more fine-grained
   control.

.. describe:: CONFIG_PANIC_ON_OOPS=y
              CONFIG_PANIC_TIMEOUT=-1

   Prevent potential further exploitation of a bug by immediately panicking the
   kernel.

The following options add extra checks and validation for various commonly
targeted kernel structures:

.. describe:: CONFIG_DEBUG_CREDENTIALS=y
              CONFIG_DEBUG_NOTIFIERS=y

.. describe:: CONFIG_BUG_ON_DATA_CORRUPTION=y

   Note that linux-hardened patches add more places where this configuration
   option has an impact.

.. describe:: CONFIG_SCHED_STACK_END_CHECK=y

.. describe:: CONFIG_PAGE_POISONING=n

   We choose to poison pages with zeroes and thus prefer using
   ``init_on_free`` in combination with linux-hardened's
   ``PAGE_SANITIZE_VERIFY``.

.. describe:: CONFIG_SECURITY_DMESG_RESTRICT=y

   Prevent unprivileged users from gathering information from the kernel log
   buffer via ``dmesg(8)``. Note that this can still be overridden through the
   ``kernel.dmesg_restrict`` sysctl.

.. describe:: CONFIG_PAGE_TABLE_ISOLATION=y

   Enable KPTI to prevent Meltdown attacks and, more generally, reduce the
   number of hardware side channels.

.. describe:: CONFIG_INTEL_TXT=n

   CLIP OS does not use Intel Trusted Execution Technology.

.. describe:: CONFIG_HARDENED_USERCOPY=y

   Harden data copies between kernel and user spaces, preventing classes of
   heap overflow exploits and information leaks.

.. describe:: CONFIG_HARDENED_USERCOPY_FALLBACK=n

   Use strict whitelisting mode, i.e., reject copies outside of the whitelists
   instead of only emitting a ``WARN()``.

.. describe:: CONFIG_FORTIFY_SOURCE=y

   Leverage the compiler to detect buffer overflows.

.. describe:: CONFIG_FORTIFY_SOURCE_STRICT_STRING=n

   This extends ``FORTIFY_SOURCE`` to intra-object overflow checking. It is
   useful to find bugs but not recommended for a production kernel yet.

.. describe:: CONFIG_STATIC_USERMODEHELPER=y

   This makes the kernel route all usermode helper calls to a single binary
   that cannot have its name changed. Without this, the kernel can be tricked
   into calling an attacker-controlled binary (e.g., to bypass SMAP, cf.
   `exploitation <https://seclists.org/oss-sec/2016/q4/621>`_ of

.. describe:: CONFIG_STATIC_USERMODEHELPER_PATH=""

   Currently, we have no need for usermode helpers, so we simply disable
   them. If we ever need some, this path will need to be set to a custom
   trusted binary in charge of filtering and choosing what real helpers
   should then be called.

.. describe:: CONFIG_SECURITY=y

   Allow us to choose between different security modules.

.. describe:: CONFIG_SECURITY_SELINUX=y

   CLIP OS intends to leverage SELinux in its security model.

.. describe:: CONFIG_SECURITY_SELINUX_BOOTPARAM=n

   We do not need to be able to disable SELinux from the kernel command line.

.. describe:: CONFIG_SECURITY_SELINUX_DISABLE=n

   We do not want SELinux to be disabled. In addition, enabling this option
   would prevent LSM structures such as security hooks from being marked as
   read-only.

.. describe:: CONFIG_SECURITY_SELINUX_DEVELOP=y

   For now, but will eventually be ``n``.

.. describe:: CONFIG_SECURITY_LOCKDOWN_LSM=y
              CONFIG_SECURITY_LOCKDOWN_LSM_EARLY=y
              CONFIG_LOCK_DOWN_KERNEL_FORCE_CONFIDENTIALITY=y

   The *lockdown* LSM tries to strengthen the boundary between the superuser
   and the kernel. The *integrity* mode thus restricts access to features that
   would allow userland to modify the running kernel, and the
   *confidentiality* mode extends these restrictions to features that would
   allow userland to extract confidential information held inside the kernel.
   Note that a significant portion of such features is already disabled in the
   CLIP OS kernel due to our custom configuration. The *lockdown* functionality
   is important for CLIP OS as we want to prevent an attacker, even a highly
   privileged one, from persisting on a compromised machine.
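
On a running system, the active *lockdown* mode can be read from
``/sys/kernel/security/lockdown`` (assuming *securityfs* is mounted at
``/sys/kernel/security``), which lists the available modes with the current
one between square brackets, e.g. ``none integrity [confidentiality]``. A
minimal Python sketch of parsing that content follows; the helper names are
hypothetical.

```python
def active_lockdown_mode(text):
    """Return the bracketed (active) mode from the lockdown file contents."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None  # unexpected format or empty file


def is_confidentiality_locked(text):
    """True when the strictest lockdown mode is in force."""
    return active_lockdown_mode(text) == "confidentiality"
```

With ``CONFIG_LOCK_DOWN_KERNEL_FORCE_CONFIDENTIALITY=y``, the file is expected
to report ``[confidentiality]``; since the mode can only be made stricter at
runtime, any other value would point at a configuration problem.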

.. describe:: CONFIG_LSM="yama"

   SELinux shall be stacked too once CLIP OS uses it.

.. describe:: CONFIG_SECURITY_YAMA=y

   The Yama LSM currently provides ptrace scope restriction (which might be
   redundant with CLIP-LSM in the future).

.. describe:: CONFIG_INTEGRITY=n

   The integrity subsystem provides several components, the security benefits
   of which are already enforced by CLIP OS (e.g., read-only mounts for all
   parts of the system containing executable programs).

.. describe:: CONFIG_SECURITY_PERF_EVENTS_RESTRICT=y

   See the documentation about the ``kernel.perf_event_paranoid`` sysctl
   below.

.. describe:: CONFIG_SECURITY_TIOCSTI_RESTRICT=y

   This prevents unprivileged users from using the TIOCSTI ioctl to inject
   commands into other processes that share a tty session. [linux-hardened]_

.. describe:: CONFIG_GCC_PLUGIN_STACKLEAK=y
              CONFIG_STACKLEAK_TRACK_MIN_SIZE=100
              CONFIG_STACKLEAK_METRICS=n
              CONFIG_STACKLEAK_RUNTIME_DISABLE=n

   ``STACKLEAK`` erases the kernel stack before returning from system calls,
   leaving it initialized to a poison value. This reduces both the information
   that kernel stack leak bugs can reveal and the exploitability of
   uninitialized stack variables. However, it does not cover functions reaching
   the same stack depth as prior functions during the same system call.

   It used to also block kernel stack depth overflows caused by ``alloca()``,
   such as Stack Clash attacks. We maintained this functionality for our
   kernel for a while but eventually `dropped it
   <https://github.com/clipos/src_external_linux/commit/3e5f9114fc2f70f6d2ae5d10db10869e0564eb03>`_.

.. describe:: CONFIG_INIT_ON_FREE_DEFAULT_ON=y
              CONFIG_INIT_ON_ALLOC_DEFAULT_ON=y

   These make ``init_on_free=1`` and ``init_on_alloc=1`` the default, as if
   they were passed on the kernel command line. See the documentation of these
   kernel parameters for details.

.. describe:: CONFIG_PAGE_SANITIZE_VERIFY=y
              CONFIG_SLAB_SANITIZE_VERIFY=y

   Verify that newly allocated pages and slab allocations are zeroed to detect
   write-after-free bugs. This works in concert with ``init_on_free`` and is
   adjusted to not be redundant with ``init_on_alloc``.
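
The interplay between ``init_on_free`` and the ``*_SANITIZE_VERIFY`` options
can be illustrated with a toy allocator: objects are zeroed when freed and
checked for zeroes when handed out again, so a write performed through a
dangling pointer in between is detected at the next allocation. This is only a
Python illustration of the principle under simplifying assumptions
(fixed-size objects, a LIFO free list, a hypothetical ``ToySlab`` class), not
the kernel implementation.

```python
class WriteAfterFree(Exception):
    """Raised when a freed object was modified before being reallocated."""


class ToySlab:
    """Fixed-size object cache that zeroes on free and verifies on alloc."""

    def __init__(self, obj_size, nr_objs):
        self.objs = [bytearray(obj_size) for _ in range(nr_objs)]
        self.free_list = list(range(nr_objs))

    def alloc(self):
        idx = self.free_list.pop()
        # SANITIZE_VERIFY-like step: a freed object must still be all zeroes.
        if any(self.objs[idx]):
            raise WriteAfterFree(f"object {idx} modified after free")
        return idx

    def free(self, idx):
        # init_on_free-like step: wipe the object contents on release.
        self.objs[idx][:] = bytes(len(self.objs[idx]))
        self.free_list.append(idx)
```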

GCC version 7.3.0 or higher is required to fully benefit from retpolines
(``-mindirect-branch=thunk-extern``).

Sysctl Security Tuning
----------------------

Many sysctls are not security-relevant or only play a role if some kernel
configuration options are enabled/disabled. In other words, the following is
tightly related to the CLIP OS kernel configuration detailed above.

.. describe:: dev.tty.ldisc_autoload = 0

   See ``CONFIG_LDISC_AUTOLOAD`` above, which serves as a default value for
   this sysctl.

.. describe:: kernel.kptr_restrict = 2

   Hide kernel addresses in ``/proc`` and other interfaces, even to privileged
   users.

.. describe:: kernel.yama.ptrace_scope = 3

   Enable the strictest ptrace scope restriction provided by the Yama LSM.

.. describe:: kernel.perf_event_paranoid = 3

   This completely disallows unprivileged access to the ``perf_event_open()``
   system call. This is actually not needed as we already enable
   ``CONFIG_SECURITY_PERF_EVENTS_RESTRICT``. [linux-hardened]_

   Note that this requires a patch included in linux-hardened (see `here
   <https://lwn.net/Articles/696216/>`_ for the reason why it is not upstream).
   Indeed, on a mainline kernel without such a patch, the above is equivalent
   to setting this sysctl to ``2``, which would still allow the profiling of

.. describe:: kernel.tiocsti_restrict = 1

   This is already forced by the ``CONFIG_SECURITY_TIOCSTI_RESTRICT`` kernel
   configuration option that we enable. [linux-hardened]_

The following two sysctls help mitigate TOCTOU vulnerabilities: the first one
restricts the following of symbolic links located in world-writable sticky
directories (such as ``/tmp``), and the second one prevents users from
creating hard links to files they do not own or have read/write access to:

.. describe:: fs.protected_symlinks = 1
              fs.protected_hardlinks = 1
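
The rule enforced by ``fs.protected_symlinks = 1``, as documented for the
upstream sysctl, fits in a small predicate: following a symbolic link is
allowed if the follower owns the link, if the parent directory is not both
sticky and world-writable, or if the directory owner also owns the link. The
Python sketch below models that decision with plain integers for UIDs and
modes; the function name is hypothetical and the model ignores capabilities
and other LSM checks.

```python
import stat


def may_follow_symlink(follower_uid, link_uid, dir_uid, dir_mode):
    """Model of the fs.protected_symlinks = 1 check for one symlink."""
    # The owner of the link may always follow it.
    if follower_uid == link_uid:
        return True
    # Only directories that are both sticky and world-writable are restricted.
    sticky_world_writable = stat.S_ISVTX | stat.S_IWOTH
    if (dir_mode & sticky_world_writable) != sticky_world_writable:
        return True
    # Links owned by the directory owner (e.g., root in /tmp) are trusted.
    return dir_uid == link_uid
```

The first test case below is the classic attack this sysctl defeats: root
following a link planted by another user in ``/tmp``.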

In addition, the following two sysctls restrict the opening of FIFOs and
regular files in order to make similar spoofing attacks harder (note however
that `these restrictions currently do not apply to networked filesystems,
among others <sysctl_protected_limitations_>`_):

.. describe:: fs.protected_fifos = 2
              fs.protected_regular = 2

.. _sysctl_protected_limitations: https://www.openwall.com/lists/oss-security/2020/01/28/2
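
According to the upstream documentation of these sysctls, the check applies
when an already existing FIFO or regular file is opened with ``O_CREAT``
inside a sticky directory: the open is refused if the file belongs neither to
the caller nor to the directory owner, and the directory is world-writable
(level ``1``) or, at level ``2``, merely group-writable. A minimal Python model
of that decision, with a hypothetical function name and plain integers for
UIDs and modes:

```python
import stat


def may_open_creat(level, caller_uid, file_uid, dir_uid, dir_mode):
    """Model of fs.protected_fifos / fs.protected_regular for one open().

    Applies to an O_CREAT open of an already existing FIFO or regular
    file; `level` is the sysctl value (0, 1 or 2) for that file type.
    """
    if level == 0 or not dir_mode & stat.S_ISVTX:
        return True
    # Files owned by the directory owner or by the caller are fine.
    if file_uid in (dir_uid, caller_uid):
        return True
    if dir_mode & stat.S_IWOTH:
        return False
    # Level 2 extends the restriction to group-writable sticky directories.
    if level >= 2 and dir_mode & stat.S_IWGRP:
        return False
    return True
```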

We do not simply disable the BPF Just in Time compiler as CLIP OS plans on

.. describe:: kernel.unprivileged_bpf_disabled = 1

   Prevent unprivileged users from using BPF.

.. describe:: net.core.bpf_jit_harden = 2

   Trades off performance but helps mitigate JIT spraying.

.. describe:: kernel.deny_new_usb = 0

   The management of USB devices is handled at a higher level by CLIP OS.

.. describe:: kernel.device_sidechannel_restrict = 1

   Restrict device timing side channels. [linux-hardened]_

.. describe:: fs.suid_dumpable = 0

   Do not create core dumps of setuid executables. Note that we already
   disable all core dumps by setting ``CONFIG_COREDUMP=n``.

.. describe:: kernel.pid_max = 65536

   Increase the space for PID values.

.. describe:: kernel.modules_disabled = 1

   Disable module loading once systemd has loaded the ones required for the
   running machine according to a profile (i.e., a predefined and
   hardware-specific list of modules).
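
Each sysctl above is exposed as a file under ``/proc/sys``, with the dots of
its name replaced by slashes (e.g., ``kernel.kptr_restrict`` maps to
``/proc/sys/kernel/kptr_restrict``). The following minimal Python sketch
compares a running system against a subset of the values listed in this
section. The ``EXPECTED_SYSCTLS`` mapping and the helper names are
hypothetical; some entries (such as ``net.core.bpf_jit_harden``) are only
readable by root, and sysctls that only exist with linux-hardened (such as
``kernel.tiocsti_restrict``) are reported as missing on other kernels.

```python
import os

# Hypothetical subset of the values recommended in this section.
EXPECTED_SYSCTLS = {
    "kernel.kptr_restrict": "2",
    "kernel.yama.ptrace_scope": "3",
    "kernel.unprivileged_bpf_disabled": "1",
    "net.core.bpf_jit_harden": "2",
    "fs.protected_symlinks": "1",
    "fs.protected_regular": "2",
    "kernel.tiocsti_restrict": "1",
}


def sysctl_path(name, root="/proc/sys"):
    """Translate a dotted sysctl name into its /proc/sys file path."""
    return os.path.join(root, *name.split("."))


def audit_sysctls(expected=EXPECTED_SYSCTLS, root="/proc/sys"):
    """Return {name: actual} for every sysctl that differs or is missing.

    A missing file is reported as None; permission errors propagate.
    """
    problems = {}
    for name, want in expected.items():
        try:
            with open(sysctl_path(name, root)) as f:
                got = f.read().strip()
        except FileNotFoundError:
            got = None  # e.g., a linux-hardened-only sysctl
        if got != want:
            problems[name] = got
    return problems
```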

Pure network sysctls (``net.ipv4.*`` and ``net.ipv6.*``) will be detailed in a

Command line parameters
-----------------------

We pass the following command line parameters to the kernel:

.. describe:: extra_latent_entropy

   This parameter, provided by a linux-hardened patch (based on the PaX
   implementation), enables a very simple form of latent entropy extracted
   during system start-up and added to the entropy obtained with
   ``GCC_PLUGIN_LATENT_ENTROPY``. [linux-hardened]_

This force-enables KPTI even on CPUs claiming to be safe from Meltdown.

.. describe:: spectre_v2=on

   Same reasoning as above but for the Spectre v2 vulnerability. Note that this
   implies ``spectre_v2_user=on``, which enables the mitigation against user
   space to user space task attacks (namely IBPB and STIBP when available and

.. describe:: spec_store_bypass_disable=seccomp

   Same reasoning as above but for the Spectre v4 vulnerability. Note that this
   mitigation requires updated microcode for Intel processors.

.. describe:: mds=full,nosmt

   This parameter controls optional mitigations for the Microarchitectural Data
   Sampling (MDS) class of Intel CPU vulnerabilities. Not specifying this
   parameter is equivalent to setting ``mds=full``, which leaves SMT enabled
   and therefore is not a complete mitigation. Note that this mitigation
   requires an Intel microcode update and also addresses the TSX Asynchronous
   Abort (TAA) Intel CPU vulnerability on systems that are affected by MDS.

.. describe:: iommu=force

   Even if we correctly enable the IOMMU in the kernel configuration, the
   kernel can still decide for various reasons not to initialize it at boot.
   Therefore, we force it with this parameter. Note that with some Intel
   chipsets, you may need to add ``intel_iommu=igfx_off`` to allow your GPU to
   access the physical memory directly without going through the DMA Remapping.

.. describe:: slub_debug=F

   The ``F`` option adds many sanity checks to various slab operations. Other
   interesting options that we considered but eventually chose not to use are:

   * The ``P`` option, which enables poisoning on slab cache allocations,
     disables the ``init_on_free`` and ``SLAB_SANITIZE_VERIFY`` features. As
     they respectively poison with zeroes on object freeing and check the
     zeroing on object allocations, we prefer enabling them instead of using
     this option.
   * The ``Z`` option enables red zoning, i.e., it adds extra areas around
     slab objects that detect when one is overwritten past its real size.
     This can help detect overflows but we already rely on ``SLAB_CANARY``
     provided by linux-hardened. A canary is much better than a simple red
     zone as it is supposed to be random.

.. describe:: page_alloc.shuffle=1

   See ``CONFIG_SHUFFLE_PAGE_ALLOCATOR``.

.. describe:: rng_core.default_quality=512

   Increase trust in the TPM's HWRNG to robustly and quickly initialize
   Linux's CSPRNG by **crediting** half of the entropy it provides.
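
The factor of one half comes from the fact that ``rng_core.default_quality``
expresses an entropy estimate per 1024 bits of input. Assuming the crediting
rule of the kernel's hwrng core (bits read multiplied by the quality, divided
by 1024, with the quality capped at 1024), a short Python sketch shows how many
bits end up credited; the function name is illustrative only.

```python
def credited_entropy_bits(bytes_read, quality):
    """Entropy bits credited for one hwrng read, assuming the rng_core rule
    credit = bits_read * quality / 1024 (integer arithmetic)."""
    if not 0 <= quality <= 1024:
        raise ValueError("quality must be between 0 and 1024")
    return (bytes_read * 8 * quality) >> 10


# 32 bytes (256 bits) read from the TPM with default_quality=512:
print(credited_entropy_bits(32, 512))  # 128, i.e., half of the 256 bits
```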

* ``slub_nomerge`` is not used as we already set
  ``CONFIG_SLAB_MERGE_DEFAULT=n`` in the kernel configuration.
* ``l1tf``: The built-in PTE Inversion mitigation is sufficient to mitigate
  the L1TF vulnerability as long as CLIP OS is not used as a hypervisor with
  untrusted guest VMs. If it were to be someday, ``l1tf=full,force`` should be
  used to force-enable VMX unconditional cache flushes and force-disable SMT
  (note that an Intel microcode update is not required for this mitigation to
  work but improves performance by providing a way to invalidate caches with a
* ``tsx=off``: This parameter is already set by default thanks to
  ``CONFIG_X86_INTEL_TSX_MODE_OFF``. It deactivates the Intel TSX feature on
  CPUs that support TSX control (i.e., are recent enough or received a
  microcode update) and that are not already vulnerable to MDS, therefore
  mitigating the TSX Asynchronous Abort (TAA) Intel CPU vulnerability.
* ``tsx_async_abort``: This parameter controls optional mitigations for the TSX
  Asynchronous Abort (TAA) Intel CPU vulnerability. Due to our use of
  ``mds=full,nosmt`` in addition to ``CONFIG_X86_INTEL_TSX_MODE_OFF``, CLIP OS
  is already protected against this vulnerability as long as the CPU microcode
  has been updated, whether or not the CPU is affected by MDS. For the record,
  if we wanted to keep TSX activated, we could specify
  ``tsx_async_abort=full,nosmt``. Not specifying this parameter is equivalent
  to setting ``tsx_async_abort=full``, which leaves SMT enabled and therefore
  is not a complete mitigation. Note that this mitigation requires an Intel
  microcode update and has no effect on systems that are already affected by
  MDS and enable mitigations against it, nor on systems that disable TSX.
* ``kvm.nx_huge_pages``: This parameter controls the KVM hypervisor iTLB
  multihit mitigations. Such mitigations are not needed as long as CLIP OS is
  not used as a hypervisor with untrusted guest VMs. If it were to be someday,
  ``kvm.nx_huge_pages=force`` should be used to ensure that guests cannot
  exploit the iTLB multihit erratum to crash the host.
* ``mitigations``: This parameter controls optional mitigations for CPU
  vulnerabilities in an arch-independent and more coarse-grained way. For now,
  we keep using arch-specific options for the sake of explicitness. Not setting
  this parameter equals setting it to ``auto``, which itself does not update
* ``init_on_free=1`` is automatically set due to ``INIT_ON_FREE_DEFAULT_ON``. It
  zero-fills page and slab allocations on free to reduce risks of information
  leaks and help mitigate a subset of use-after-free vulnerabilities.
* ``init_on_alloc=1`` is automatically set due to ``INIT_ON_ALLOC_DEFAULT_ON``.
  The purpose of this functionality is to eliminate several kinds of
  *uninitialized heap memory* flaws by zero-filling:

  * all page allocator and slab allocator memory when allocated: this is
    already guaranteed by our use of ``init_on_free`` in combination with
    ``PAGE_SANITIZE_VERIFY`` and ``SLAB_SANITIZE_VERIFY`` from linux-hardened,
    and thus has no effect;
  * a few more *special* objects when allocated: these are the ones for which
    we enable ``init_on_alloc`` as they are not covered by the aforementioned
    combination of ``init_on_free`` and ``SANITIZE_VERIFY`` features.

.. rubric:: Citations and origin of some items

.. [linux-hardened]
   This item is provided by the ``linux-hardened`` patches.

.. vim: set tw=79 ts=2 sts=2 sw=2 et: