Hello everyone,

i guess those of you tracking the test patches have already noticed that
we recently added support for UDEREF on amd64 as well. now that hopefully
the silly problems have been worked out, it's time to talk about it a bit.

before anything else, let's get one thing out of the way that i'll probably
repeat every now and then: UDEREF on amd64 isn't and will never be the same
as on i386. it's just the way it is, it cannot be 'fixed'. now let's see
what it can still do on amd64.

as you probably know (does anyone read config help? ;), UDEREF wants to
ensure that userland and kernel address spaces are properly separated. in
particular, gratuitous dereference of userland addresses by kernel code
should result in an oops instead of userland taking over kernel data flow,
or worse, control flow as well (think of the past year's worth of NULL
dereference based exploits). this separation can be implemented with pretty
much no overhead on i386, but unfortunately amd64 lacks the necessary
segmentation logic and the alternative ain't pretty ;).

so what does UDEREF do on amd64? on userland->kernel transitions it basically
unmaps the original userland address range and remaps it at a different address
using non-exec/supervisor rights (so at least direct code execution, as used by
most exploits, is not possible). this remapping is the main cause of its
performance impact as well, and i think it cannot really be reduced any further.
in any case, most kernel code will run without access to the actual userland
address range, so in this sense it's similar to what UDEREF on i386 offers.
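
to make the transition behaviour described above a bit more concrete, here is a toy model in Python (not kernel code, just a sketch). it only captures what the text says: in kernel mode the original userland range is gone, and the same memory shows up in a shadow range that can be read but not executed. the names `kernel_view`, `Oops` and `SHADOW_BASE` are made up for the example, and the shadow placement in particular is an assumption, the text doesn't say where it lives.

```python
USER_BITS = 42                 # userland width under UDEREF/amd64, per the text
USER_SIZE = 1 << USER_BITS
# ASSUMPTION: the text does not say where the shadow area lives; this base is
# picked only so the model has somewhere to put it.
SHADOW_BASE = USER_SIZE


class Oops(Exception):
    """Stands in for a kernel oops in this model."""


def kernel_view(addr: int, execute: bool = False) -> int:
    """Resolve `addr` as seen by kernel code after a userland->kernel transition.

    Returns the offset into the userland range that the access reaches,
    or raises Oops if the model says the access must fault.
    """
    if 0 <= addr < USER_SIZE:
        # The original userland range is unmapped while in the kernel.
        raise Oops(f"userland address {addr:#x} is not mapped in kernel mode")
    if SHADOW_BASE <= addr < SHADOW_BASE + USER_SIZE:
        if execute:
            # The shadow mapping is non-exec, so jumping into it faults.
            raise Oops(f"shadow address {addr:#x} is not executable")
        return addr - SHADOW_BASE
    raise ValueError("address outside the ranges covered by this model")
```

in this model `kernel_view(0)` raises `Oops` (the NULL dereference case), `kernel_view(SHADOW_BASE + 0x1000)` reaches userland offset 0x1000, and the same shadow address with `execute=True` raises `Oops` again.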

this is also where the similarities end :), so let's look at the bad stuff
now. UDEREF/amd64 doesn't ensure that the (legitimate) userland accessor
functions cannot actually access kernel memory when only userland is allowed
(some in-kernel users of certain syscalls can temporarily access kernel memory
as userland, and that is enforced on UDEREF/i386 but not on amd64). so if
there's a bug where userland can trick the kernel into accessing a userland
pointer that actually points to kernel space, it'll succeed, unlike on i386.

the other bad thing is the presence of the userland shadow area. this has
two consequences: 1. the userland address space size is smaller under UDEREF
(42 vs. 47 bits, with corresponding reduction of ASLR of course), 2. this
shadow area is always mapped so kernel code accidentally accessing its range
may not oops on it and can be exploited (such accesses can usually happen only
if an exploit can make the kernel dereference arbitrary addresses in which
case the presence of this area is the least of your concerns though).
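
for a feel of what "42 vs. 47 bits" means in practice, the arithmetic can be spelled out as below. the bit widths come from the text; the helper name `span_tib` is just for this sketch. the 5 bits are the shrinkage of the range itself, how much randomization entropy actually goes away depends on how ASLR is configured.

```python
PLAIN_BITS = 47    # usual amd64 userland width
UDEREF_BITS = 42   # userland width under UDEREF, per the text


def span_tib(bits: int) -> int:
    """Size of a 2**bits byte address range, in TiB."""
    return (1 << bits) >> 40


lost_bits = PLAIN_BITS - UDEREF_BITS
shrink_factor = 1 << lost_bits

print(f"{span_tib(PLAIN_BITS)} TiB -> {span_tib(UDEREF_BITS)} TiB")
print(f"{lost_bits} bits fewer, i.e. a range {shrink_factor}x smaller")
```

so userland goes from 128 TiB to 4 TiB of address space, a factor of 32.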

what about performance? well, 'it depends', in particular it depends on the
amount of user/kernel transitions of your workload as that's where the extra
code really hits (it's basically a TLB flush and two CR0 writes if you have
KERNEXEC as well, say 600 cycles + TLB repopulation time). on a simple
compilation test i get these times:

#time emerge portage -j2 on 2.6.33.1-pax no UDEREF
25.55user 7.44system 0:36.16elapsed 91%CPU (0avgtext+0avgdata 555648maxresident)k
56inputs+56816outputs (0major+1715421minor)pagefaults 0swaps

#time emerge portage -j2 on 2.6.32.10-pax UDEREF KERNEXEC
28.01user 11.03system 0:38.54elapsed 101%CPU (0avgtext+0avgdata 555600maxresident)k
56inputs+56832outputs (0major+1718704minor)pagefaults 0swaps

feel free to submit benchmarks (preferably on real life apps, not synthetic) so
that people know better what to expect. as usual, virtualization doesn't like the
tricks and suffers more, although less than on i386, so the pax_nouderef kernel
command line option will work for amd64 as well.
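
to put the two runs above side by side, a few lines of Python are enough. the timings are copied from the `time` output; the helper names are made up for the sketch. keep in mind the two runs are on different kernel versions (2.6.33.1 vs. 2.6.32.10), so this is a rough indication and not a controlled comparison. `fixed_cost_seconds` turns the "say 600 cycles" figure into time for a given number of transitions; the transition count and clock speed you feed it are your own assumptions, and it ignores TLB repopulation, so treat it as a lower bound.

```python
# Timings copied from the two `time` runs above, in seconds.
baseline = {"user": 25.55, "system": 7.44, "elapsed": 36.16}   # 2.6.33.1-pax, no UDEREF
hardened = {"user": 28.01, "system": 11.03, "elapsed": 38.54}  # 2.6.32.10-pax, UDEREF + KERNEXEC


def overhead_pct(before: float, after: float) -> float:
    """Relative increase from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


def fixed_cost_seconds(transitions: int, clock_hz: float, cycles: int = 600) -> float:
    """Lower bound on added time: `cycles` per transition, ignoring TLB refills."""
    return transitions * cycles / clock_hz


for key in baseline:
    print(f"{key:8s} +{overhead_pct(baseline[key], hardened[key]):.1f}%")
```

that works out to roughly +9.6% user, +48% system and +6.6% elapsed time for this particular workload. as an illustration of the second helper, a million transitions on a hypothetical 3 GHz machine would cost at least `fixed_cost_seconds(1_000_000, 3e9)`, i.e. about 0.2 seconds.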

last but not least a note on implementation. besides the already mentioned
special userland shadow area there's another important bit: per-CPU PGDs.
what this does is simple: each CPU gets its own top-level page directory
for its exclusive use (instead of the usual per-process PGD). this among
other things means that we can begin the proper lockdown of the entire page
table hierarchy (a todo item for KERNEXEC as well, that's why this feature
is also now enabled there, even on i386/PAE).
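
the per-CPU PGD idea can be pictured with a small Python model. this is only a sketch of the data structure: the slot counts, the class names and the way a process gets 'installed' on a CPU (copying its userland entries into that CPU's table) are all assumptions made for the example, the text only says that the top-level directory belongs to the CPU and not to the process.

```python
USER_SLOTS = 4      # hypothetical number of top-level entries covering userland
KERNEL_SLOTS = 4    # hypothetical number of top-level entries covering the kernel


class Process:
    def __init__(self, name: str):
        self.name = name
        # Top-level entries describing this process's userland mappings.
        self.user_entries = [f"{name}:user{i}" for i in range(USER_SLOTS)]


class Cpu:
    def __init__(self, cpu_id: int):
        self.cpu_id = cpu_id
        # One PGD per CPU, allocated once and never swapped for another table.
        self.pgd = [None] * USER_SLOTS + [f"kernel{i}" for i in range(KERNEL_SLOTS)]

    def switch_to(self, process: Process) -> None:
        # ASSUMPTION: the incoming process's userland entries are copied in;
        # the text only says the PGD is per-CPU, not how it gets populated.
        self.pgd[:USER_SLOTS] = process.user_entries
```

the point of the model is that `cpu.pgd` stays the same object across `switch_to()` calls and only its userland slots change, so the top-level table has one fixed owner.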

so this is it in a nutshell, if you have questions, comments, complaints,
etc, you know where to reach us ;).