A more general possible meltdown/spectre countermeasure

Discussion:

Eric McCorkle

2018-01-05 22:02:11 UTC

Re-posting to -hackers and -arch. I'm going to start working on
something like this over the weekend.

-------- Forwarded Message --------
Subject: A more general possible meltdown/spectre countermeasure
Date: Thu, 4 Jan 2018 23:05:40 -0500
From: Eric McCorkle <***@metricspace.net>
To: freebsd-***@freebsd.org <freebsd-***@freebsd.org>

I've thought more about how to deal with meltdown/spectre, and I have an
idea I'd like to put forward. However, I'm still in something of a
panic mode, so I'm not certain as to its effectiveness. Needless to
say, I welcome any feedback on this, and I may be completely off-base.

I'm calling this a "countermeasure" as opposed to a "mitigation", as
it's something that requires modification of code as opposed to a
drop-in patch.

== Summary ==

Provide a kernel and userland API by which memory allocation can be done
with extended attributes. In userland, this could be accomplished by
extending MMAP flags, and I could imagine a malloc-with-attributes flag.
In kernel space, this must already exist, as drivers need to allocate
memory with various MTRR-type attributes set.

The immediate aim here is to store sensitive information that must
remain memory-resident in non-cacheable memory locations (or, if more
effective attribute combinations exist, using those instead). See the
rationale for the argument why this should work.

Assuming the rationale holds, then the attack surface should be greatly
reduced. Attackers would need to grab sensitive data out of stack
frames or similar locations if/when it gets copied there for faster use.
Moreover, if this is done right, it could dovetail nicely into a
framework for storing and processing sensitive assets in more secure
hardware[0] (like smart cards, the FPGAs I posted earlier, or other
options).

The obvious downside is that you take a performance hit storing things
in non-cacheable locations, especially if you plan on doing heavy
computation in that memory (say, encryption/decryption). However, this
is almost certainly going to be less than the projected 30-50%
performance hit from other mitigations. Also, this technique should
work against spectre as well as meltdown (assuming the rationale holds).

The second downside is that you have to modify code for this to work,
and you have to be careful not to keep copies of sensitive information
around too long (this gets tricky in userland, where you might get
interrupted and switched out).

[0]: Full disclosure, enabling open hardware implementations of this
kind of thing is something of an agenda of mine.

== Rationale ==

(Again, I'm tired, rushed, and somewhat panicked so my logic could be
faulty at any point, so please point it out if it is)

The rationale for why this should work relies on assumptions about
out-of-order pipelines that cannot be guaranteed to hold, but are
extremely likely to be true.

As background, these attacks depend on out-of-order execution performing
operations that end up affecting cache and branch-prediction state,
ultimately storing information about sensitive data in these
side-channels before the fault conditions are detected and acted upon.
I'll borrow terminology from the paper, using "transient instructions"
to refer to speculatively executed instructions that will eventually be
cancelled by a fault.

These attacks depend entirely on transient instructions being able to
get sensitive information into the processor core and then perform some
kind of instruction on them before the fault condition cancels them.
Therefore, anything that prevents them from doing this *should* counter
the attack. If the actual sensitive data never makes it to the core
before the fault is detected, the dependent memory accesses/branches
never get executed and the data never makes it to the side-channels.

Another assumption here is that CPU architects are going to want to
squash faulted instructions ASAP and stop issuing along those
speculative branches, so as to reclaim execution units. So I'm assuming
once a fault comes back from address translation, then transient
execution stops dead.

Now, break down the cases for whether the address containing sensitive
data is in cache and TLB or not. (I'm assuming here that caches are
virtually-indexed, which enables cache lookups to bypass address
translation.)

* In cache, in TLB: You end up basically racing between the cache and
TLB, which will very likely end up detecting the fault before the data
arrives, but at the very worst, you get one or two cycles of transient
instruction execution before the fault.

* In cache, not in TLB: Virtually-indexed tagged means you get a cache
lookup racing a page-table walk. The cache lookup beats the page table
walk by potentially hundreds (maybe thousands) of cycles, giving you a
bunch of transient instructions before a fault gets triggered. This is
the main attack case.

* Not in cache, in TLB: Memory access requires address translation,
which comes back almost immediately as a fault.

* Not in cache, not in TLB: You have to do a page table walk before you
can fetch the location, as you have to go out to physical memory (and
therefore need a physical address). The page table walk will come back
with a fault, stopping the attack.

So, unless I'm missing something here, both non-cached cases defeat the
meltdown attack, as you *cannot* get the data unless you do address
translation first (and therefore detect faults).

As for why this defeats the spectre attack, the logic is similar: you've
jumped into someone else's executable code, hoping to scoop up enough
information into your branch predictor before the fault kicks you out.
However, to capture anything about sensitive information in your
side-channels, the transient instructions need to actually get it into
the core before a fault gets detected. The same case analysis as above
applies, so you never actually get the sensitive info into the core
before a fault comes back and you get squashed.

[1]: A physically-indexed cache would be largely immune to this attack,
as you'd have to do address translation before doing a cache lookup.

I have some ideas that can build on this, but I'd like to get some
feedback first.
_______________________________________________
freebsd-***@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-security
To unsubscribe, send any mail to "freebsd-security-***@freebsd.org"

Warner Losh

2018-01-05 22:10:27 UTC

Permalink

I think this is fatally flawed.

The side channel is the cache. Not the data at risk.

Any mapped memory, cached or not, can be used to influence the cache.
Storing stuff in uncached memory won't affect the side channel one bit.

Basically, all attacks boil down to tricking the processor, at elevated
privs, to doing something like

a = foo[offset];

where foo + offset are designed to communicate information by populating a
cache line. offset need not be cached itself and can be the result of
simple computations that depend on anything accessible at all in the kernel.

Warner

Post by Eric McCorkle
Re-posting to -hackers and -arch. I'm going to start working on
something like this over the weekend.
-------- Forwarded Message --------
Subject: A more general possible meltdown/spectre countermeasure
Date: Thu, 4 Jan 2018 23:05:40 -0500
I've thought more about how to deal with meltdown/spectre, and I have an
idea I'd like to put forward. However, I'm still in something of a
panic mode, so I'm not certain as to its effectiveness. Needless to
say, I welcome any feedback on this, and I may be completely off-base.
I'm calling this a "countermeasure" as opposed to a "mitigation", as
it's something that requires modification of code as opposed to a
drop-in patch.
== Summary ==
Provide a kernel and userland API by which memory allocation can be done
with extended attributes. In userland, this could be accomplished by
extending MMAP flags, and I could imagine a malloc-with-attributes flag.
In kernel space, this must already exist, as drivers need to allocate
memory with various MTRR-type attributes set.
The immediate aim here is to store sensitive information that must
remain memory-resident in non-cacheable memory locations (or, if more
effective attribute combinations exist, using those instead). See the
rationale for the argument why this should work.
Assuming the rationale holds, then the attack surface should be greatly
reduced. Attackers would need to grab sensitive data out of stack
frames or similar locations if/when it gets copied there for faster use.
Moreover, if this is done right, it could dovetail nicely into a
framework for storing and processing sensitive assets in more secure
hardware[0] (like smart cards, the FPGAs I posted earlier, or other
options).
The obvious downside is that you take a performance hit storing things
in non-cacheable locations, especially if you plan on doing heavy
computation in that memory (say, encryption/decryption). However, this
is almost certainly going to be less than the projected 30-50%
performance hit from other mitigations. Also, this technique should
work against spectre as well as meltdown (assuming the rationale holds).
The second downside is that you have to modify code for this to work,
and you have to be careful not to keep copies of sensitive information
around too long (this gets tricky in userland, where you might get
interrupted and switched out).
[0]: Full disclosure, enabling open hardware implementations of this
kind of thing is something of an agenda of mine.
== Rationale ==
(Again, I'm tired, rushed, and somewhat panicked so my logic could be
faulty at any point, so please point it out if it is)
The rationale for why this should work relies on assumptions about
out-of-order pipelines that cannot be guaranteed to hold, but are
extremely likely to be true.
As background, these attacks depend on out-of-order execution performing
operations that end up affecting cache and branch-prediction state,
ultimately storing information about sensitive data in these
side-channels before the fault conditions are detected and acted upon.
I'll borrow terminology from the paper, using "transient instructions"
to refer to speculatively executed instructions that will eventually be
cancelled by a fault.
These attacks depend entirely on transient instructions being able to
get sensitive information into the processor core and then perform some
kind of instruction on them before the fault condition cancels them.
Therefore, anything that prevents them from doing this *should* counter
the attack. If the actual sensitive data never makes it to the core
before the fault is detected, the dependent memory accesses/branches
never get executed and the data never makes it to the side-channels.
Another assumption here is that CPU architects are going to want to
squash faulted instructions ASAP and stop issuing along those
speculative branches, so as to reclaim execution units. So I'm assuming
once a fault comes back from address translation, then transient
execution stops dead.
Now, break down the cases for whether the address containing sensitive
data is in cache and TLB or not. (I'm assuming here that caches are
virtually-indexed, which enables cache lookups to bypass address
translation.)
* In cache, in TLB: You end up basically racing between the cache and
TLB, which will very likely end up detecting the fault before the data
arrives, but at the very worst, you get one or two cycles of transient
instruction execution before the fault.
* In cache, not in TLB: Virtually-indexed tagged means you get a cache
lookup racing a page-table walk. The cache lookup beats the page table
walk by potentially hundreds (maybe thousands) of cycles, giving you a
bunch of transient instructions before a fault gets triggered. This is
the main attack case.
* Not in cache, in TLB: Memory access requires address translation,
which comes back almost immediately as a fault.
* Not in cache, not in TLB: You have to do a page table walk before you
can fetch the location, as you have to go out to physical memory (and
therefore need a physical address). The page table walk will come back
with a fault, stopping the attack.
So, unless I'm missing something here, both non-cached cases defeat the
meltdown attack, as you *cannot* get the data unless you do address
translation first (and therefore detect faults).
As for why this defeats the spectre attack, the logic is similar: you've
jumped into someone else's executable code, hoping to scoop up enough
information into your branch predictor before the fault kicks you out.
However, to capture anything about sensitive information in your
side-channels, the transient instructions need to actually get it into
the core before a fault gets detected. The same case analysis as above
applies, so you never actually get the sensitive info into the core
before a fault comes back and you get squashed.
[1]: A physically-indexed cache would be largely immune to this attack,
as you'd have to do address translation before doing a cache lookup.
I have some ideas that can build on this, but I'd like to get some
feedback first.
_______________________________________________
https://lists.freebsd.org/mailman/listinfo/freebsd-security
"
_______________________________________________
https://lists.freebsd.org/mailman/listinfo/freebsd-arch

Eric McCorkle

2018-01-05 22:15:26 UTC

Permalink

Right, but you have to get the value "foo" into the pipeline in order
for it to affect the side-channels. This technique attempts to stop
that from happening.

Unless I made a mistake, non-cached memory reads force address
translation to happen first, which detects faults and blocks the
meltdown attack.

It also stops spectre with very high probability, as it's very unlikely
that an uncached load will arrive before the speculative thread gets
squashed.

Post by Warner Losh
I think this is fatally flawed.
The side channel is the cache. Not the data at risk.
Any mapped memory, cached or not, can be used to influence the cache.
Storing stuff in uncached memory won't affect the side channel one bit.
Basically, all attacks boil down to tricking the processor, at elevated
privs, to doing something like
a = foo[offset];
where foo + offset are designed to communicate information by populating
a cache line. offset need not be cached itself and can be the result of
simple computations that depend on anything accessible at all in the kernel.
Warner
Re-posting to -hackers and -arch. I'm going to start working on
something like this over the weekend.
-------- Forwarded Message --------
Subject: A more general possible meltdown/spectre countermeasure
Date: Thu, 4 Jan 2018 23:05:40 -0500
I've thought more about how to deal with meltdown/spectre, and I have an
idea I'd like to put forward. However, I'm still in something of a
panic mode, so I'm not certain as to its effectiveness. Needless to
say, I welcome any feedback on this, and I may be completely off-base.
I'm calling this a "countermeasure" as opposed to a "mitigation", as
it's something that requires modification of code as opposed to a
drop-in patch.
== Summary ==
Provide a kernel and userland API by which memory allocation can be done
with extended attributes. In userland, this could be accomplished by
extending MMAP flags, and I could imagine a malloc-with-attributes flag.
In kernel space, this must already exist, as drivers need to allocate
memory with various MTRR-type attributes set.
The immediate aim here is to store sensitive information that must
remain memory-resident in non-cacheable memory locations (or, if more
effective attribute combinations exist, using those instead). See the
rationale for the argument why this should work.
Assuming the rationale holds, then the attack surface should be greatly
reduced. Attackers would need to grab sensitive data out of stack
frames or similar locations if/when it gets copied there for faster use.
Moreover, if this is done right, it could dovetail nicely into a
framework for storing and processing sensitive assets in more secure
hardware[0] (like smart cards, the FPGAs I posted earlier, or other
options).
The obvious downside is that you take a performance hit storing things
in non-cacheable locations, especially if you plan on doing heavy
computation in that memory (say, encryption/decryption). However, this
is almost certainly going to be less than the projected 30-50%
performance hit from other mitigations. Also, this technique should
work against spectre as well as meltdown (assuming the rationale holds).
The second downside is that you have to modify code for this to work,
and you have to be careful not to keep copies of sensitive information
around too long (this gets tricky in userland, where you might get
interrupted and switched out).
[0]: Full disclosure, enabling open hardware implementations of this
kind of thing is something of an agenda of mine.
== Rationale ==
(Again, I'm tired, rushed, and somewhat panicked so my logic could be
faulty at any point, so please point it out if it is)
The rationale for why this should work relies on assumptions about
out-of-order pipelines that cannot be guaranteed to hold, but are
extremely likely to be true.
As background, these attacks depend on out-of-order execution performing
operations that end up affecting cache and branch-prediction state,
ultimately storing information about sensitive data in these
side-channels before the fault conditions are detected and acted upon.
I'll borrow terminology from the paper, using "transient instructions"
to refer to speculatively executed instructions that will eventually be
cancelled by a fault.
These attacks depend entirely on transient instructions being able to
get sensitive information into the processor core and then perform some
kind of instruction on them before the fault condition cancels them.
Therefore, anything that prevents them from doing this *should* counter
the attack. If the actual sensitive data never makes it to the core
before the fault is detected, the dependent memory accesses/branches
never get executed and the data never makes it to the side-channels.
Another assumption here is that CPU architects are going to want to
squash faulted instructions ASAP and stop issuing along those
speculative branches, so as to reclaim execution units. So I'm assuming
once a fault comes back from address translation, then transient
execution stops dead.
Now, break down the cases for whether the address containing sensitive
data is in cache and TLB or not. (I'm assuming here that caches are
virtually-indexed, which enables cache lookups to bypass address
translation.)
* In cache, in TLB: You end up basically racing between the cache and
TLB, which will very likely end up detecting the fault before the data
arrives, but at the very worst, you get one or two cycles of transient
instruction execution before the fault.
* In cache, not in TLB: Virtually-indexed tagged means you get a cache
lookup racing a page-table walk. The cache lookup beats the page table
walk by potentially hundreds (maybe thousands) of cycles, giving you a
bunch of transient instructions before a fault gets triggered. This is
the main attack case.
* Not in cache, in TLB: Memory access requires address translation,
which comes back almost immediately as a fault.
* Not in cache, not in TLB: You have to do a page table walk before you
can fetch the location, as you have to go out to physical memory (and
therefore need a physical address). The page table walk will come back
with a fault, stopping the attack.
So, unless I'm missing something here, both non-cached cases defeat the
meltdown attack, as you *cannot* get the data unless you do address
translation first (and therefore detect faults).
As for why this defeats the spectre attack, the logic is similar: you've
jumped into someone else's executable code, hoping to scoop up enough
information into your branch predictor before the fault kicks you out.
However, to capture anything about sensitive information in your
side-channels, the transient instructions need to actually get it into
the core before a fault gets detected. The same case analysis as above
applies, so you never actually get the sensitive info into the core
before a fault comes back and you get squashed.
[1]: A physically-indexed cache would be largely immune to this attack,
as you'd have to do address translation before doing a cache lookup.
I have some ideas that can build on this, but I'd like to get some
feedback first.
_______________________________________________
mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-security
<https://lists.freebsd.org/mailman/listinfo/freebsd-security>
To unsubscribe, send any mail to
_______________________________________________
https://lists.freebsd.org/mailman/listinfo/freebsd-arch
<https://lists.freebsd.org/mailman/listinfo/freebsd-arch>
To unsubscribe, send any mail to

Warner Losh

2018-01-05 22:22:21 UTC

Permalink

While you might be right, I've seen no indication that a cache miss would
defeat these attacks in the public and non-public data I've looked at, even
though a large number of alternatives to the published workarounds have
been discussed. I'm therefore somewhat skeptical this would be effective.
I'm open, however, to data that changes that skepticism...

Warner

Post by Eric McCorkle
Right, but you have to get the value "foo" into the pipeline in order
for it to affect the side-channels. This technique attempts to stop
that from happening.
Unless I made a mistake, non-cached memory reads force address
translation to happen first, which detects faults and blocks the
meltdown attack.
It also stops spectre with very high probability, as it's very unlikely
that an uncached load will arrive before the speculative thread gets
squashed.

kernel.

Post by Warner Losh
Warner
Re-posting to -hackers and -arch. I'm going to start working on
something like this over the weekend.
-------- Forwarded Message --------
Subject: A more general possible meltdown/spectre countermeasure
Date: Thu, 4 Jan 2018 23:05:40 -0500
I've thought more about how to deal with meltdown/spectre, and I

have an

Post by Warner Losh
idea I'd like to put forward. However, I'm still in something of a
panic mode, so I'm not certain as to its effectiveness. Needless to
say, I welcome any feedback on this, and I may be completely

off-base.

Post by Warner Losh
I'm calling this a "countermeasure" as opposed to a "mitigation", as
it's something that requires modification of code as opposed to a
drop-in patch.
== Summary ==
Provide a kernel and userland API by which memory allocation can be

done

Post by Warner Losh
with extended attributes. In userland, this could be accomplished by
extending MMAP flags, and I could imagine a malloc-with-attributes

flag.

Post by Warner Losh
In kernel space, this must already exist, as drivers need to

allocate

Post by Warner Losh
memory with various MTRR-type attributes set.
The immediate aim here is to store sensitive information that must
remain memory-resident in non-cacheable memory locations (or, if more
effective attribute combinations exist, using those instead). See

the

Post by Warner Losh
rationale for the argument why this should work.
Assuming the rationale holds, then the attack surface should be

greatly

Post by Warner Losh
reduced. Attackers would need to grab sensitive data out of stack
frames or similar locations if/when it gets copied there for faster

use.

Post by Warner Losh
Moreover, if this is done right, it could dovetail nicely into a
framework for storing and processing sensitive assets in more secure
hardware[0] (like smart cards, the FPGAs I posted earlier, or other
options).
The obvious downside is that you take a performance hit storing

things

Post by Warner Losh
in non-cacheable locations, especially if you plan on doing heavy
computation in that memory (say, encryption/decryption). However,

this

Post by Warner Losh
is almost certainly going to be less than the projected 30-50%
performance hit from other mitigations. Also, this technique should
work against spectre as well as meltdown (assuming the rationale

holds).

Post by Warner Losh
The second downside is that you have to modify code for this to work,
and you have to be careful not to keep copies of sensitive

information

Post by Warner Losh
around too long (this gets tricky in userland, where you might get
interrupted and switched out).
[0]: Full disclosure, enabling open hardware implementations of this
kind of thing is something of an agenda of mine.
== Rationale ==
(Again, I'm tired, rushed, and somewhat panicked so my logic could be
faulty at any point, so please point it out if it is)
The rationale for why this should work relies on assumptions about
out-of-order pipelines that cannot be guaranteed to hold, but are
extremely likely to be true.
As background, these attacks depend on out-of-order execution

performing

Post by Warner Losh
operations that end up affecting cache and branch-prediction state,
ultimately storing information about sensitive data in these
side-channels before the fault conditions are detected and acted

upon.

Post by Warner Losh
I'll borrow terminology from the paper, using "transient

instructions"

Post by Warner Losh
to refer to speculatively executed instructions that will eventually

Post by Warner Losh
cancelled by a fault.
These attacks depend entirely on transient instructions being able to
get sensitive information into the processor core and then perform

some

Post by Warner Losh
kind of instruction on them before the fault condition cancels them.
Therefore, anything that prevents them from doing this *should*

counter

Post by Warner Losh
the attack. If the actual sensitive data never makes it to the core
before the fault is detected, the dependent memory accesses/branches
never get executed and the data never makes it to the side-channels.
Another assumption here is that CPU architects are going to want to
squash faulted instructions ASAP and stop issuing along those
speculative branches, so as to reclaim execution units. So I'm

assuming

Post by Warner Losh
once a fault comes back from address translation, then transient
execution stops dead.
Now, break down the cases for whether the address containing

sensitive

Post by Warner Losh
data is in cache and TLB or not. (I'm assuming here that caches are
virtually-indexed, which enables cache lookups to bypass address
translation.)
* In cache, in TLB: You end up basically racing between the cache and
TLB, which will very likely end up detecting the fault before the

data

Post by Warner Losh
arrives, but at the very worst, you get one or two cycles of

transient

Post by Warner Losh
instruction execution before the fault.
* In cache, not in TLB: Virtually-indexed tagged means you get a

cache

Post by Warner Losh
lookup racing a page-table walk. The cache lookup beats the page

table

Post by Warner Losh
walk by potentially hundreds (maybe thousands) of cycles, giving you

Post by Warner Losh
bunch of transient instructions before a fault gets triggered. This

Post by Warner Losh
the main attack case.
* Not in cache, in TLB: Memory access requires address translation,
which comes back almost immediately as a fault.
* Not in cache, not in TLB: You have to do a page table walk before

you

Post by Warner Losh
can fetch the location, as you have to go out to physical memory (and
therefore need a physical address). The page table walk will come

back

Post by Warner Losh
with a fault, stopping the attack.
So, unless I'm missing something here, both non-cached cases defeat

the

Post by Warner Losh
meltdown attack, as you *cannot* get the data unless you do address
translation first (and therefore detect faults).

you've

Post by Warner Losh
jumped into someone else's executable code, hoping to scoop up enough
information into your branch predictor before the fault kicks you

out.

Post by Warner Losh
However, to capture anything about sensitive information in your
side-channels, the transient instructions need to actually get it

into

Post by Warner Losh
the core before a fault gets detected. The same case analysis as

above

Post by Warner Losh
applies, so you never actually get the sensitive info into the core
before a fault comes back and you get squashed.
[1]: A physically-indexed cache would be largely immune to this

attack,

Post by Warner Losh
as you'd have to do address translation before doing a cache lookup.
I have some ideas that can build on this, but I'd like to get some
feedback first.
_______________________________________________
mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-security
<https://lists.freebsd.org/mailman/listinfo/freebsd-security>
To unsubscribe, send any mail to
_______________________________________________

list

Post by Warner Losh
https://lists.freebsd.org/mailman/listinfo/freebsd-arch
<https://lists.freebsd.org/mailman/listinfo/freebsd-arch>
To unsubscribe, send any mail to

Eric McCorkle

2018-01-05 22:31:54 UTC

Permalink

Well, the only way to find out would be to try it out.

However, unless I'm missing something, if you're trying to pull a
meltdown attack, you try and fetch from the kernel. If that location
isn't cached (or if your cache is physically indexed), you need the
physical address (otherwise you don't know where to look), and thus have
to go through address translation, at which point you detect that the
page isn't accessible and fault. In the mean time, you can't
speculatively execute any of the operations that load up the
side-channels, because you don't have the sensitive data.

The reason you can pull off a meltdown attack at all is that a
virtually-indexed cache lets you get the data in parallel with address
translation (breaking the dependency between address translation and
fetching data), which takes 1000s of cycles for a TLB miss, during which
you have the data and can launch a whole bunch of transient ops.

Again, these are uncharted waters we're in; so it's entirely possible
I'm missing something here.

I have an

Post by Warner Losh
idea I'd like to put forward. However, I'm still in something

of a

Post by Warner Losh
panic mode, so I'm not certain as to its effectiveness.

Needless to

Post by Warner Losh
say, I welcome any feedback on this, and I may be completely

off-base.

Post by Warner Losh
I'm calling this a "countermeasure" as opposed to a

"mitigation", as

Post by Warner Losh
it's something that requires modification of code as opposed to a
drop-in patch.
== Summary ==
Provide a kernel and userland API by which memory allocation

can be done

Post by Warner Losh
with extended attributes. In userland, this could be

accomplished by

Post by Warner Losh
extending MMAP flags, and I could imagine a

malloc-with-attributes flag.

Post by Warner Losh
In kernel space, this must already exist, as drivers need to

allocate

Post by Warner Losh
memory with various MTRR-type attributes set.
The immediate aim here is to store sensitive information that must
remain memory-resident in non-cacheable memory locations (or,

if more

Post by Warner Losh
effective attribute combinations exist, using those instead).

See the

Post by Warner Losh
rationale for the argument why this should work.
Assuming the rationale holds, then the attack surface should

be greatly

Post by Warner Losh
reduced. Attackers would need to grab sensitive data out of stack
frames or similar locations if/when it gets copied there for

faster use.

Post by Warner Losh
Moreover, if this is done right, it could dovetail nicely into a
framework for storing and processing sensitive assets in more

secure

Post by Warner Losh
hardware[0] (like smart cards, the FPGAs I posted earlier, or

other

Post by Warner Losh
options).
The obvious downside is that you take a performance hit

storing things

Post by Warner Losh
in non-cacheable locations, especially if you plan on doing heavy
computation in that memory (say, encryption/decryption).

However, this

Post by Warner Losh
is almost certainly going to be less than the projected 30-50%
performance hit from other mitigations. Also, this technique

should

Post by Warner Losh
work against spectre as well as meltdown (assuming the

rationale holds).

Post by Warner Losh
The second downside is that you have to modify code for this

to work,

Post by Warner Losh
and you have to be careful not to keep copies of sensitive

information

Post by Warner Losh
around too long (this gets tricky in userland, where you might get
interrupted and switched out).
[0]: Full disclosure, enabling open hardware implementations

of this

Post by Warner Losh
kind of thing is something of an agenda of mine.
== Rationale ==
(Again, I'm tired, rushed, and somewhat panicked so my logic

could be

Post by Warner Losh
faulty at any point, so please point it out if it is)
The rationale for why this should work relies on assumptions about
out-of-order pipelines that cannot be guaranteed to hold, but are
extremely likely to be true.
As background, these attacks depend on out-of-order execution

performing

Post by Warner Losh
operations that end up affecting cache and branch-prediction

state,

Post by Warner Losh
ultimately storing information about sensitive data in these
side-channels before the fault conditions are detected and

acted upon.

Post by Warner Losh
I'll borrow terminology from the paper, using "transient

instructions"

Post by Warner Losh
to refer to speculatively executed instructions that will

eventually be

Post by Warner Losh
cancelled by a fault.
These attacks depend entirely on transient instructions being

able to

Post by Warner Losh
get sensitive information into the processor core and then

perform some

Post by Warner Losh
kind of instruction on them before the fault condition cancels

them.

Post by Warner Losh
Therefore, anything that prevents them from doing this

*should* counter

Post by Warner Losh
the attack. If the actual sensitive data never makes it to

the core

Post by Warner Losh
before the fault is detected, the dependent memory

accesses/branches

Post by Warner Losh
never get executed and the data never makes it to the

side-channels.

Post by Warner Losh
Another assumption here is that CPU architects are going to

want to

Post by Warner Losh
squash faulted instructions ASAP and stop issuing along those
speculative branches, so as to reclaim execution units. So

I'm assuming

Post by Warner Losh
once a fault comes back from address translation, then transient
execution stops dead.
Now, break down the cases for whether the address containing

sensitive

Post by Warner Losh
data is in cache and TLB or not. (I'm assuming here that

caches are

Post by Warner Losh
virtually-indexed, which enables cache lookups to bypass address
translation.)
* In cache, in TLB: You end up basically racing between the

cache and

Post by Warner Losh
TLB, which will very likely end up detecting the fault before

the data

Post by Warner Losh
arrives, but at the very worst, you get one or two cycles of

transient

Post by Warner Losh
instruction execution before the fault.
* In cache, not in TLB: Virtually-indexed tagged means you get

a cache

Post by Warner Losh
lookup racing a page-table walk. The cache lookup beats the

page table

Post by Warner Losh
walk by potentially hundreds (maybe thousands) of cycles,

giving you a

Post by Warner Losh
bunch of transient instructions before a fault gets

triggered. This is

Post by Warner Losh
the main attack case.
* Not in cache, in TLB: Memory access requires address

translation,

Post by Warner Losh
which comes back almost immediately as a fault.
* Not in cache, not in TLB: You have to do a page table walk

before you

Post by Warner Losh
can fetch the location, as you have to go out to physical

memory (and

Post by Warner Losh
therefore need a physical address). The page table walk will

come back

Post by Warner Losh
with a fault, stopping the attack.
So, unless I'm missing something here, both non-cached cases

defeat the

Post by Warner Losh
meltdown attack, as you *cannot* get the data unless you do

address

Post by Warner Losh
translation first (and therefore detect faults).
As for why this defeats the spectre attack, the logic is

similar: you've

Post by Warner Losh
jumped into someone else's executable code, hoping to scoop up

enough

Post by Warner Losh
information into your branch predictor before the fault kicks

you out.

Post by Warner Losh
However, to capture anything about sensitive information in your
side-channels, the transient instructions need to actually get

it into

Post by Warner Losh
the core before a fault gets detected. The same case analysis

as above

Post by Warner Losh
applies, so you never actually get the sensitive info into the

core

Post by Warner Losh
before a fault comes back and you get squashed.
[1]: A physically-indexed cache would be largely immune to

this attack,

Post by Warner Losh
as you'd have to do address translation before doing a cache

lookup.

Post by Warner Losh
I have some ideas that can build on this, but I'd like to get some
feedback first.
_______________________________________________
mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-security

<https://lists.freebsd.org/mailman/listinfo/freebsd-security>

Post by Warner Losh
<https://lists.freebsd.org/mailman/listinfo/freebsd-security

<https://lists.freebsd.org/mailman/listinfo/freebsd-security>>

Post by Warner Losh
To unsubscribe, send any mail to
_______________________________________________

mailing list

Post by Warner Losh
https://lists.freebsd.org/mailman/listinfo/freebsd-arch

<https://lists.freebsd.org/mailman/listinfo/freebsd-arch>

Post by Warner Losh
<https://lists.freebsd.org/mailman/listinfo/freebsd-arch

<https://lists.freebsd.org/mailman/listinfo/freebsd-arch>>

Post by Warner Losh
To unsubscribe, send any mail to

Warner Losh

2018-01-05 23:08:19 UTC

Permalink

Wouldn't you have to also unmap it from the direct map for this to be
effective?

Warner

Post by Eric McCorkle
Well, the only way to find out would be to try it out.
However, unless I'm missing something, if you're trying to pull a
meltdown attack, you try and fetch from the kernel. If that location
isn't cached (or if your cache is physically indexed), you need the
physical address (otherwise you don't know where to look), and thus have
to go through address translation, at which point you detect that the
page isn't accessible and fault. In the mean time, you can't
speculatively execute any of the operations that load up the
side-channels, because you don't have the sensitive data.
The reason you can pull off a meltdown attack at all is that a
virtually-indexed cache lets you get the data in parallel with address
translation (breaking the dependency between address translation and
fetching data), which takes 1000s of cycles for a TLB miss, during which
you have the data and can launch a whole bunch of transient ops.
Again, these are uncharted waters we're in; so it's entirely possible
I'm missing something here.

unlikely

Post by Warner Losh
that an uncached load will arrive before the speculative thread gets
squashed.

Post by Warner Losh
I think this is fatally flawed.
The side channel is the cache. Not the data at risk.
Any mapped memory, cached or not, can be used to influence the

cache.