1. Design

The goal of SEGMEXEC is to implement the non-executable page feature using
the segmentation logic of IA-32 based CPUs.

On IA-32, Linux runs in protected mode with paging enabled. This means that
for every memory access (be that an instruction fetch or a normal data
access) the CPU performs a two-step address translation. In the first step,
the logical address decoded from the instruction is translated into a linear
(or, in another terminology, virtual) address. This translation is done by
the segmentation logic, whose details are explained in a separate document.
In the second step, the paging logic translates the linear address into a
physical address. Linux effectively does not use segmentation, since it
creates 0-based, 4 GB limited segments for both code and data accesses (so
logical addresses are the same as linear addresses). Nevertheless, segments
can be set up in a way that allows non-executable pages to be implemented.

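The first translation step can be illustrated with a small model. This is
only a sketch: the Segment class, the LimitViolation exception and the FLAT
constant are names made up for this example, and the model ignores
privilege checks and descriptor attributes.

```python
from dataclasses import dataclass

GB = 1 << 30


class LimitViolation(Exception):
    """Raised when a logical address lies outside the segment limit."""


@dataclass(frozen=True)
class Segment:
    base: int  # linear address at which the segment starts
    size: int  # number of addressable bytes in the segment

    def to_linear(self, logical: int) -> int:
        # Segmentation step: check the limit, then add the base.
        if not 0 <= logical < self.size:
            raise LimitViolation(hex(logical))
        # Linear addresses are 32 bits wide on IA-32.
        return (self.base + logical) & 0xFFFFFFFF


# A flat segment in the style described above: base 0, 4 GB limit.
FLAT = Segment(base=0, size=4 * GB)
```

With FLAT, to_linear() returns its argument unchanged for every 32-bit
address, which is why segmentation is effectively invisible in the stock
setup.
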
The basic idea is to divide the 3 GB userland linear address space into two
equal halves. One half stores mappings meant for data access (that is, we
define a data segment descriptor that covers the 0-1.5 GB linear address
range) and the other stores mappings meant for execution (that is, we define
a code segment descriptor that covers the 1.5-3 GB linear address range).
Since an executable mapping can be used for data accesses as well, we have
to ensure that such mappings are visible in both segments and mirror each
other. This setup separates data accesses from instruction fetches in the
sense that they hit different linear addresses, which allows for
control/intervention based on the access type. In particular, if a data-only
(and therefore non-executable) mapping is present only in the 0-1.5 GB
linear address range, then instruction fetches from the same logical
addresses end up in the 1.5-3 GB linear address range and raise a page
fault, which allows such execution attempts to be detected.

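The address arithmetic behind this split can be sketched as follows. The
function names and the representation of the page tables as a set of mapped
linear page numbers are assumptions made for illustration. Only the 1.5 GB
split comes from the text above.

```python
SPLIT = 0x60000000  # 1.5 GB: size of each half and base of the code segment
PAGE_SIZE = 4096


def linear_address(logical, is_fetch):
    """Linear address hit by an access under the split described above."""
    # Both segments are limited to 1.5 GB of logical addresses.
    if not 0 <= logical < SPLIT:
        raise ValueError("logical address outside the 1.5 GB segment limit")
    # Data accesses use the base 0 segment, fetches the base 1.5 GB segment.
    return logical + SPLIT if is_fetch else logical


def access_faults(mapped_pages, logical, is_fetch):
    """True if the access would hit an unmapped linear page."""
    return linear_address(logical, is_fetch) // PAGE_SIZE not in mapped_pages
```

For a page that is mapped only in the lower half, access_faults() is False
for a data access and True for an instruction fetch at the same logical
address. Once the mirror page in the upper half is added to the set, the
fetch no longer faults.
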
2. Implementation

The core of SEGMEXEC is vma mirroring, which is discussed in a separate
document. The mirrors for executable file mappings are set up in do_mmap()
(an inline function defined in include/linux/mm.h), except for a special
case with RANDEXEC (see the separate document). do_mmap() is the one common
function called by both userland and kernel originated mapping requests.

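As a rough illustration of the mirroring decision, the following toy model
returns the regions that a mapping request would produce. It is not the
kernel code: Vma, map_with_mirror() and MIRROR_OFFSET are hypothetical
names, and the real logic (described in the vma mirroring document) has to
handle many more cases.

```python
from typing import List, NamedTuple

MIRROR_OFFSET = 0x60000000  # assumed distance between a mapping and its mirror
PROT_READ, PROT_WRITE, PROT_EXEC = 1, 2, 4


class Vma(NamedTuple):
    start: int       # first linear address of the region
    end: int         # one past the last linear address
    is_mirror: bool  # True for the copy in the upper (code) half


def map_with_mirror(start: int, length: int, prot: int) -> List[Vma]:
    """Return the regions created for one mapping request in this model."""
    if start < 0 or length <= 0 or start + length > MIRROR_OFFSET:
        raise ValueError("mapping must fit into the lower 1.5 GB")
    vmas = [Vma(start, start + length, False)]
    if prot & PROT_EXEC:
        # Executable mappings must also be visible through the code segment.
        vmas.append(Vma(start + MIRROR_OFFSET,
                        start + length + MIRROR_OFFSET, True))
    return vmas
```

A request without PROT_EXEC yields a single region in the lower half, while
an executable request yields two regions of equal size that are exactly
MIRROR_OFFSET apart.
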
The special code and data segment descriptors are placed into a new GDT
called gdt_table2 in arch/i386/kernel/head.S. The separate GDT is needed
for two reasons. First, it simplifies the implementation in that the CS/SS
selectors used for userland do not have to change. Second, this setup
prevents a simple attack that a single GDT setup would be subject to (retf
and other instructions could be abused to break out of the restricted
code segment used for SEGMEXEC tasks). Since the GDT stores the userland
code/data descriptors, which are different for SEGMEXEC tasks, we have
to modify the low-level context switching code called __switch_to() in
arch/i386/kernel/process.c and the last steps of load_elf_binary() in
fs/binfmt_elf.c (where the task is first prepared to execute in userland).

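To make the descriptors more concrete, the sketch below packs a base, limit
and attribute bits into the 64-bit IA-32 descriptor format. The helper and
constant names are made up for this example, and the two SEGMEXEC values
are derived only from the 0-1.5 GB and 1.5-3 GB ranges given in the design
section. The actual entries in gdt_table2 are not shown in this document
and may differ (for example in the accessed bit).

```python
def encode_descriptor(base: int, limit: int, access: int, flags: int) -> int:
    """Pack an IA-32 segment descriptor into its 64-bit in-memory form."""
    if not (0 <= base <= 0xFFFFFFFF and 0 <= limit <= 0xFFFFF):
        raise ValueError("base must fit in 32 bits and limit in 20 bits")
    if not (0 <= access <= 0xFF and 0 <= flags <= 0xF):
        raise ValueError("access must fit in 8 bits and flags in 4 bits")
    low = (limit & 0xFFFF) | ((base & 0xFFFF) << 16)
    high = (((base >> 16) & 0xFF)             # base bits 16..23
            | (access << 8)                   # P, DPL, S and type
            | (((limit >> 16) & 0xF) << 16)   # limit bits 16..19
            | (flags << 20)                   # G, D/B, L, AVL
            | (((base >> 24) & 0xFF) << 24))  # base bits 24..31
    return (high << 32) | low


USER_CODE = 0xFA   # present, DPL 3, code segment, readable
USER_DATA = 0xF2   # present, DPL 3, data segment, writable
PAGE_GRAN_32 = 0xC  # 4 KB granularity, 32-bit default operand size

SPLIT = 0x60000000               # 1.5 GB
LIMIT_PAGES = SPLIT // 4096 - 1  # last valid 4 KB page index: 0x5FFFF

flat_code = encode_descriptor(0, 0xFFFFF, USER_CODE, PAGE_GRAN_32)
split_data = encode_descriptor(0, LIMIT_PAGES, USER_DATA, PAGE_GRAN_32)
split_code = encode_descriptor(SPLIT, LIMIT_PAGES, USER_CODE, PAGE_GRAN_32)
```

Here flat_code is the familiar 0-based, 4 GB limited user code descriptor,
while split_code differs from it only in its base and limit fields.
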
The GDT also has APM specific descriptors, which are set up at runtime and
must be propagated to the second GDT as well (in arch/i386/kernel/apm.c).
Finally, the GDT also stores the per-CPU TSS and LDT descriptors, whose
content must be synchronized between the two GDTs (in set_tss_desc() and
set_ldt_desc() in arch/i386/kernel/traps.c).

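The bookkeeping for two GDTs can be pictured with a minimal model. The
DualGdt class, its method names and the slot indices used with it are
hypothetical. The model only shows that shared descriptors are written to
both tables, that the userland descriptors differ, and that the table is
chosen per task.

```python
class DualGdt:
    """Toy model of a primary GDT plus a second one for SEGMEXEC tasks."""

    def __init__(self, entries: int):
        self.primary = [0] * entries
        self.secondary = [0] * entries

    def set_shared(self, index: int, descriptor: int) -> None:
        # TSS, LDT and APM style descriptors must be identical in both
        # tables, so every update is written twice.
        self.primary[index] = descriptor
        self.secondary[index] = descriptor

    def set_user_segment(self, index: int, normal: int, restricted: int) -> None:
        # The userland code/data descriptors are the entries that differ.
        self.primary[index] = normal
        self.secondary[index] = restricted

    def table_for(self, uses_segmexec: bool) -> list:
        # A context switch would pick the table that matches the next task.
        return self.secondary if uses_segmexec else self.primary
```

Because the same index holds a different descriptor in each table, the
selector values loaded by userland stay the same regardless of which table
is active.
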
The kernel normally allows userland to define its own code segment
descriptors in the LDT. We have to disallow this, since such a descriptor
could be used to break out of the SEGMEXEC specific restricted code segment
(the extra checks are in write_ldt() in arch/i386/kernel/ldt.c).

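The kind of check involved might look like the sketch below. The
MODIFY_LDT_CONTENTS_* values mirror the constants used by the Linux
modify_ldt() interface. The function name, the uses_segmexec parameter and
the choice of EINVAL as the error are assumptions made for this example and
are not taken from the actual write_ldt() change.

```python
import errno

# Values of the 'contents' field in an LDT entry request.
MODIFY_LDT_CONTENTS_DATA = 0
MODIFY_LDT_CONTENTS_STACK = 1
MODIFY_LDT_CONTENTS_CODE = 2


def check_ldt_entry(contents: int, uses_segmexec: bool) -> int:
    """Return 0 if the LDT entry may be installed, else a negative errno."""
    # Bit 1 of 'contents' marks a code segment (values 2 and 3).
    if uses_segmexec and (contents & MODIFY_LDT_CONTENTS_CODE):
        # A user-defined code segment could cover the whole linear address
        # space and so bypass the restricted code segment.
        return -errno.EINVAL
    return 0
```

Data and stack segment requests still pass this check, so only the ability
to create new code segments is taken away in this model.
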