Discussion:
Situations about PC values in kernel data segments
Yue Chen
2015-04-11 09:18:28 UTC
Dear all,

We are working on a project about OS security.
We wonder in which situations a program counter (PC) value (e.g., the
value in %RIP on x86_64, i.e., an instruction address) could be stored in
kernel (module) data segments (including stack, heap, etc.).

Here we mainly care about addresses/values that are NOT function entry
points, since a number of function pointers already exist. Also, we only
consider the normal cases, because one can write arbitrary values into a
variable/pointer. We mainly consider i386, AMD64 and ARM.

Here are some situations I can think of:
function/interrupt/exception/syscall return address on stack; switch/case
jump table target; page fault handler (pcb_onfault on *BSD); restartable
atomic sequences (RAS) registry; thread/process context structure like Task
state segment (TSS), process control block (PCB) and thread control block
(TCB); situations for debugging purposes (e.g., like those in ``segment not
present'' exception handler).

Additionally, do any of these addresses use offset formats or special
encodings? For example, on x86_64, a 32-bit RIP-relative (addressing)
offset may be used to represent a full 64-bit address. glibc's
setjmp/longjmp jmp_buf uses a special encoding (PTR_MANGLE) for saved
register values.
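
To make the two encodings mentioned above concrete, here is a minimal C sketch. All function names are hypothetical. The mangling only follows the general idea of PTR_MANGLE (XOR with a secret guard, then rotate); the rotate count of 17 and the guard value are assumptions for illustration, not taken from glibc sources.

```c
#include <stdint.h>

/*
 * Hypothetical helper: store a code address as a signed 32-bit offset
 * from a base address. Only valid if the target lies within +/- 2 GiB
 * of the base (the conversion to int32_t assumes two's complement).
 */
static inline int32_t encode_rel32(uint64_t base, uint64_t target)
{
    return (int32_t)(uint32_t)(target - base);
}

/* Recover the full 64-bit address from base + sign-extended offset. */
static inline uint64_t decode_rel32(uint64_t base, int32_t off)
{
    return base + (uint64_t)(int64_t)off;
}

/* Rotate helpers; n must be in the range 1..63. */
static inline uint64_t rotl64(uint64_t v, unsigned n)
{
    return (v << n) | (v >> (64 - n));
}

static inline uint64_t rotr64(uint64_t v, unsigned n)
{
    return (v >> n) | (v << (64 - n));
}

/*
 * Hypothetical mangling in the spirit of PTR_MANGLE: XOR with a guard
 * value, then rotate left. The rotate count 17 is an assumption.
 */
static inline uint64_t mangle(uint64_t ptr, uint64_t guard)
{
    return rotl64(ptr ^ guard, 17);
}

/* Inverse of mangle(): rotate right, then XOR with the same guard. */
static inline uint64_t demangle(uint64_t mangled, uint64_t guard)
{
    return rotr64(mangled, 17) ^ guard;
}
```

In both cases a scanner looking for raw 64-bit .text addresses would miss the stored value, which is why the encoding question matters.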

Best thanks and regards,
Yue
John Baldwin
2015-04-17 13:22:43 UTC
For i386 and amd64, I think all of the code that is executed does live in a
.text segment. When pcb_onfault is used it is set to point to code in a .text
segment, not anywhere else. Similarly, fault and exception handlers as well
as the stub for new threads/processes after fork/thread_create is in .text
as well. There are multiple text segments present when modules are loaded
of course, but you should be able to enumerate all of those in the linker.
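
As a rough illustration of the enumeration idea, the following C sketch checks a PC value against a list of text ranges. The struct and function names are hypothetical, and how the ranges are actually obtained from the kernel linker is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record for one text segment: half-open range [start, end). */
struct text_range {
    uintptr_t start;
    uintptr_t end;
};

/* Return true if pc falls inside any of the n known text ranges. */
static bool pc_in_text(const struct text_range *ranges, size_t n, uintptr_t pc)
{
    for (size_t i = 0; i < n; i++) {
        assert(ranges[i].start <= ranges[i].end);
        if (pc >= ranges[i].start && pc < ranges[i].end)
            return true;
    }
    return false;
}
```

The kernel image and each loaded module would each contribute one entry to the array.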
--
John Baldwin
Konstantin Belousov
2015-04-17 13:43:48 UTC
Post by John Baldwin
For i386 and amd64, I think all of the code that is executed does live in a
.text segment. When pcb_onfault is used it is set to point to code in a .text
segment, not anywhere else. Similarly, fault and exception handlers as well
as the stub for new threads/processes after fork/thread_create is in .text
as well. There are multiple text segments present when modules are loaded
of course, but you should be able to enumerate all of those in the linker.
Wasn't bpf enhanced to compile filters to the native code, on x86 ?
Also, what about BIOS code ? Esp. since the spread of UEFI and hope that
our kernel starts using UEFI runtime services one day. My point is that
_relying_ on enumeration of the text segments for kernel and modules to
determine all executable memory is not correct.
Warner Losh
2015-04-17 13:46:29 UTC
Post by Konstantin Belousov
Post by John Baldwin
For i386 and amd64, I think all of the code that is executed does live in a
.text segment. When pcb_onfault is used it is set to point to code in a .text
segment, not anywhere else. Similarly, fault and exception handlers as well
as the stub for new threads/processes after fork/thread_create is in .text
as well. There are multiple text segments present when modules are loaded
of course, but you should be able to enumerate all of those in the linker.
Wasn't bpf enhanced to compile filters to the native code, on x86 ?
Also, what about BIOS code ? Esp. since the spread of UEFI and hope that
our kernel starts using UEFI runtime services one day. My point is that
_relying_ on enumeration of the text segments for kernel and modules to
determine all executable memory is not correct.
Yes. That 'one day' will be quite soon. I have patches in my patch queue
which should do the right thing.

Warner
John Baldwin
2015-04-20 15:00:27 UTC
Post by Konstantin Belousov
Post by John Baldwin
For i386 and amd64, I think all of the code that is executed does live in a
.text segment. When pcb_onfault is used it is set to point to code in a .text
segment, not anywhere else. Similarly, fault and exception handlers as well
as the stub for new threads/processes after fork/thread_create is in .text
as well. There are multiple text segments present when modules are loaded
of course, but you should be able to enumerate all of those in the linker.
Wasn't bpf enhanced to compile filters to the native code, on x86 ?
Also, what about BIOS code ? Esp. since the spread of UEFI and hope that
our kernel starts using UEFI runtime services one day. My point is that
_relying_ on enumeration of the text segments for kernel and modules to
determine all executable memory is not correct.
It depends on the scope. If this is for a graduate research project to build
a prototype to see if this is feasible, then some caveats are acceptable if
they are known. One could be to disallow the bpf JIT option (I believe it is
not in GENERIC). EFI is actually fairly easily handled, since the EFI memory
map gives you the bounds of the executable code and you can just treat that as
an additional .text segment.
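
A minimal C sketch of that EFI idea follows, under stated assumptions: the descriptor struct is a simplified, hypothetical stand-in for the real EFI memory descriptor (which has more fields and must be walked using the descriptor size reported by the firmware), and the function name is made up. The type value 5 corresponds to EfiRuntimeServicesCode in the UEFI specification.

```c
#include <stddef.h>
#include <stdint.h>

#define EFI_RUNTIME_SERVICES_CODE 5u    /* EfiRuntimeServicesCode */
#define EFI_PAGE_SIZE             4096u /* EFI pages are 4 KiB */

/* Simplified, hypothetical view of one EFI memory map entry. */
struct efi_md_sketch {
    uint32_t type;       /* EFI memory type */
    uint64_t virt_start; /* virtual start address of the region */
    uint64_t num_pages;  /* region length in 4 KiB pages */
};

/* One executable range, half-open: [start, end). */
struct code_range {
    uint64_t start;
    uint64_t end;
};

/*
 * Copy every runtime-services code region into out[] so it can be
 * treated like an additional .text segment. Returns the number of
 * ranges written, at most out_cap.
 */
static size_t collect_efi_code(const struct efi_md_sketch *map, size_t n,
                               struct code_range *out, size_t out_cap)
{
    size_t found = 0;

    for (size_t i = 0; i < n && found < out_cap; i++) {
        if (map[i].type != EFI_RUNTIME_SERVICES_CODE)
            continue;
        out[found].start = map[i].virt_start;
        out[found].end = map[i].virt_start + map[i].num_pages * EFI_PAGE_SIZE;
        found++;
    }
    return found;
}
```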
--
John Baldwin
Yue Chen
2015-04-20 17:54:28 UTC
Post by John Baldwin
Are you asking if you can figure out if a given PC value used as the value
of $rip for an arbitrary instruction is valid, or are you trying to enumerate
all the words in memory that hold a pointer to a .text value (like
pcb_onfault)?
I assumed the former.
Sorry for the confusion. I mean the *latter*: any other situations,
*excluding* function pointers, where memory holds a pointer into .text.
It does not have to be a full-word pointer. It can be a half-word
displacement/offset to an address in .text, or a special encoding of
the address as well.
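
For the simplest variant of this enumeration (full-word, unencoded pointers), a C sketch could look like the one below. The function name and parameters are hypothetical. It does not detect half-word offsets or encoded addresses, which need format-specific decoding, and it can report false positives, since an ordinary integer may happen to fall inside the text range.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Scan a data region in 8-byte steps and record the byte offsets of
 * words whose value lies in [text_start, text_end). Returns the total
 * number of matches; at most hits_cap offsets are stored in hits[].
 */
static size_t scan_for_text_pointers(const unsigned char *data, size_t len,
                                     uint64_t text_start, uint64_t text_end,
                                     size_t *hits, size_t hits_cap)
{
    size_t found = 0;

    for (size_t off = 0; off + sizeof(uint64_t) <= len; off += sizeof(uint64_t)) {
        uint64_t word;

        /* memcpy avoids alignment and aliasing problems. */
        memcpy(&word, data + off, sizeof(word));
        if (word >= text_start && word < text_end) {
            if (found < hits_cap)
                hits[found] = off;
            found++;
        }
    }
    return found;
}
```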
Yue Chen
2015-04-17 20:19:54 UTC
I mean, the PC values in non-.text segments like .data, .rodata, stack,
heap, etc. Usually this is for comparison purposes. E.g., compare the
faulting PC against some range already stored in a table/handler.
When pcb_onfault is used it is set to point to code in a .text segment,
not anywhere else.

The pointer value stored in non-.text segments is a PC value (instruction
address in .text), like 0xffffffff12345678, and may not be a function entry
point address, right?
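
The kind of comparison meant here, checking a faulting PC against ranges stored in a table, can be sketched in C as follows. This is a generic table-driven scheme with hypothetical names, not the pcb_onfault mechanism itself; note that every field of such a table is a mid-function code address stored in data.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical fault table entry: if a fault happens at a PC inside
 * [start, end), execution should resume at handler.
 */
struct fault_entry {
    uintptr_t start;
    uintptr_t end;
    uintptr_t handler;
};

/*
 * Look up the recovery address for a faulting PC.
 * Returns 0 if no entry covers the PC.
 */
static uintptr_t fault_lookup(const struct fault_entry *tab, size_t n,
                              uintptr_t pc)
{
    for (size_t i = 0; i < n; i++) {
        if (pc >= tab[i].start && pc < tab[i].end)
            return tab[i].handler;
    }
    return 0;
}
```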
John Baldwin
2015-04-20 15:07:05 UTC
Post by Yue Chen
I mean, the PC values in non-.text segments like .data, .rodata, stack,
heap, etc. Usually this is for comparison purposes. E.g., compare the
faulting PC against some range already stored in a table/handler.
When pcb_onfault is used it is set to point to code in a .text segment,
not anywhere else.
The pointer value stored in non-.text segments is a PC value (instruction
address in .text), like 0xffffffff12345678, and may not be a function entry
point address, right?
I think I do not follow your question.

Are you asking if you can figure out if a given PC value used as the value
of $rip for an arbitrary instruction is valid, or are you trying to enumerate
all the words in memory that hold a pointer to a .text value (like
pcb_onfault)?

I assumed the former. AFAIK, the kernel is not going to execute any code
from .data, .rodata, or the stack. For things like pcb_onfault, the value
stored is the address of code in .text, like this:

ENTRY(copyout)
	PUSH_FRAME_POINTER
	movq	PCPU(CURPCB),%rax
	movq	$copyout_fault,PCB_ONFAULT(%rax)	/* store handler address in the PCB */
	testq	%rdx,%rdx			/* anything to do? */
	jz	done_copyout
	...
done_copyout:
	xorl	%eax,%eax			/* return 0 */
	movq	PCPU(CURPCB),%rdx
	movq	%rax,PCB_ONFAULT(%rdx)		/* clear pcb_onfault */
	POP_FRAME_POINTER
	ret

	ALIGN_TEXT
copyout_fault:
	movq	PCPU(CURPCB),%rdx
	movq	$0,PCB_ONFAULT(%rdx)		/* clear pcb_onfault */
	movq	$EFAULT,%rax			/* return EFAULT */
	POP_FRAME_POINTER
	ret
END(copyout)

Here 'copyout_fault' is in .text, not in a different section.
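
For completeness, here is a small C sketch of what the consumer side of pcb_onfault amounts to: on a fault, if a handler address is registered, the saved PC is replaced with it. The struct and function names are simplified, hypothetical stand-ins and not the actual kernel definitions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced process control block: only the onfault slot. */
struct pcb_sketch {
    uintptr_t pcb_onfault; /* 0 means no handler registered */
};

/* Hypothetical, reduced trap frame: only the saved instruction pointer. */
struct trapframe_sketch {
    uintptr_t tf_rip;
};

/*
 * If a fault handler is registered, redirect the saved PC to it so that
 * execution resumes at the handler (e.g. copyout_fault) on return from
 * the trap. Returns false if the fault must be handled another way.
 */
static bool redirect_on_fault(const struct pcb_sketch *pcb,
                              struct trapframe_sketch *frame)
{
    if (pcb->pcb_onfault == 0)
        return false;
    frame->tf_rip = pcb->pcb_onfault;
    return true;
}
```

So the PCB field is a data word that holds a .text address which is not a function entry point, which is the kind of value asked about in this thread.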