Discussion:
Allow faster eventhandler dispatching by keeping pointers to handler lists.
Ian Lepore
2017-10-29 02:19:27 UTC
Permalink
There has been some talk lately of the kernel eventhandler mechanism
being inefficient due to holding a lock while walking a global list
doing strcmp() to find the right list of handlers.  I've posted a
phabricator review to alleviate that by allowing high-frequency events
to pre-define the event list and keep a pointer to it, to avoid the
name lookups.

https://reviews.freebsd.org/D12821

-- Ian
Matt Joras
2017-10-29 02:37:35 UTC
Permalink
Post by Ian Lepore
There has been some talk lately of the kernel eventhandler mechanism
being inefficient due to holding a lock while walking a global list
doing strcmp() to find the right list of handlers.  I've posted a
phabricator review to alleviate that by allowing high-frequency events
to pre-define the event list and keep a pointer to it, to avoid the
name lookups.
https://reviews.freebsd.org/D12821
-- Ian
_______________________________________________
https://lists.freebsd.org/mailman/listinfo/freebsd-arch
Hah, as it happens I just posted a revision with largely the same
intention, though I approached the problem a bit differently:
https://reviews.freebsd.org/D12814.

Matt
Ian Lepore
2017-10-29 16:12:32 UTC
Permalink
Post by Matt Joras
Post by Ian Lepore
There has been some talk lately of the kernel eventhandler
mechanism
being inefficient due to holding a lock while walking a global list
doing strcmp() to find the right list of handlers.  I've posted a
phabricator review to alleviate that by allowing high-frequency events
to pre-define the event list and keep a pointer to it, to avoid the
name lookups.
https://reviews.freebsd.org/D12821
-- Ian
 
Hah, as it happens I just posted a revision with largely the same
https://reviews.freebsd.org/D12814.
Matt
Actually, it looks like we've done nearly identical work, modulo some
minor naming differences (that I'm completely agnostic about), and you
have the EHL_NONEMPTY flag that didn't occur to me until after I
created the phab review last night.

The other big difference is that mine still links the fast/static lists
into the global list-of-lists, to preserve what I call "late loose
binding" where an event handler can register for events before the
event provider is present (picture a module that monitors events which
gets loaded before a module that produces those events).

It ocurred to me this morning that an optimization for mine would be to
place any list created by eventhandler_create_list() at the end of the
global list, on the theory that it will mostly be accessed directly via
pointer and the items at the head of the global list should be the ones
more likely to be searched by name.

-- Ian
Ian Lepore
2017-10-29 16:24:07 UTC
Permalink
Post by Ian Lepore
Post by Matt Joras
Post by Ian Lepore
There has been some talk lately of the kernel eventhandler
mechanism
being inefficient due to holding a lock while walking a global list
doing strcmp() to find the right list of handlers.  I've posted a
phabricator review to alleviate that by allowing high-frequency events
to pre-define the event list and keep a pointer to it, to avoid the
name lookups.
https://reviews.freebsd.org/D12821
-- Ian
 
Hah, as it happens I just posted a revision with largely the same
https://reviews.freebsd.org/D12814.
Matt
Actually, it looks like we've done nearly identical work, modulo some
minor naming differences (that I'm completely agnostic about), and you
have the EHL_NONEMPTY flag that didn't occur to me until after I
created the phab review last night.
The other big difference is that mine still links the fast/static lists
into the global list-of-lists, to preserve what I call "late loose
binding" where an event handler can register for events before the
event provider is present (picture a module that monitors events which
gets loaded before a module that produces those events).
It ocurred to me this morning that an optimization for mine would be to
place any list created by eventhandler_create_list() at the end of the
global list, on the theory that it will mostly be accessed directly via
pointer and the items at the head of the global list should be the ones
more likely to be searched by name.
-- Ian
Oops, I apparently overlooked _eventhandler_insert_static() in your
changes.  I think if that were changed to first check whether the new
handler list already exists on the global list, our changes really
would be essentially identical.

-- Ian
Matt Joras
2017-10-29 16:38:48 UTC
Permalink
Post by Ian Lepore
Post by Ian Lepore
Actually, it looks like we've done nearly identical work, modulo some
minor naming differences (that I'm completely agnostic about), and you
have the EHL_NONEMPTY flag that didn't occur to me until after I
created the phab review last night.
The other big difference is that mine still links the fast/static lists
into the global list-of-lists, to preserve what I call "late loose
binding" where an event handler can register for events before the
event provider is present (picture a module that monitors events which
gets loaded before a module that produces those events).
It ocurred to me this morning that an optimization for mine would be to
place any list created by eventhandler_create_list() at the end of the
global list, on the theory that it will mostly be accessed directly via
pointer and the items at the head of the global list should be the ones
more likely to be searched by name.
-- Ian
Oops, I apparently overlooked _eventhandler_insert_static() in your
changes.  I think if that were changed to first check whether the new
handler list already exists on the global list, our changes really
would be essentially identical.
-- Ian
Indeed. The other difference I noted is that my version
statically-allocates the actual list structs, and relies on static
initialization, whereas yours uses malloc and initializes them explicitly.

How would you like to proceed?

Matt
Ian Lepore
2017-10-29 17:03:44 UTC
Permalink
Post by Matt Joras
Post by Ian Lepore
Post by Ian Lepore
Actually, it looks like we've done nearly identical work, modulo some
minor naming differences (that I'm completely agnostic about), and you
have the EHL_NONEMPTY flag that didn't occur to me until after I
created the phab review last night.
The other big difference is that mine still links the fast/static lists
into the global list-of-lists, to preserve what I call "late loose
binding" where an event handler can register for events before the
event provider is present (picture a module that monitors events which
gets loaded before a module that produces those events).
It ocurred to me this morning that an optimization for mine would be to
place any list created by eventhandler_create_list() at the end of the
global list, on the theory that it will mostly be accessed directly via
pointer and the items at the head of the global list should be the ones
more likely to be searched by name.
-- Ian
Oops, I apparently overlooked _eventhandler_insert_static() in your
changes.  I think if that were changed to first check whether the new
handler list already exists on the global list, our changes really
would be essentially identical.
-- Ian
Indeed. The other difference I noted is that my version
statically-allocates the actual list structs, and relies on static
initialization, whereas yours uses malloc and initializes them explicitly.
How would you like to proceed?
Matt
Oh.  Right.  Hmmm, I think malloc() is required to support the case
where a handler registers before the static list init is invoked, and I
do think that's a useful feature to preserve.  That means the lists
aren't really static, though, which then makes STATIC a bit out of
place in the new function/macro names.

From my version of the changes, I really liked then names
EVENTHANDLER_DEFINE_LIST() and EVENTHANDLER_DECLARE_LIST() because they
so exactly say what they do.  On the other hand, I hate
EVENTHANDLER_INVOKE_LIST(), I just couldn't think of anything
better.  I considered FAST where LIST appears, playing off the existing
'slow' comments, but it seems to somehow imply the handlers themselves
are different or faster, which is all wrong.

-- Ian
Hans Petter Selasky
2017-10-30 08:17:30 UTC
Permalink
Post by Ian Lepore
Oh.  Right.  Hmmm, I think malloc() is required to support the case
where a handler registers before the static list init is invoked, and I
do think that's a useful feature to preserve.  That means the lists
aren't really static, though, which then makes STATIC a bit out of
place in the new function/macro names.
Why not use RCU here, and then update sys/queue.h to be RCU safe?

--HPS
Ian Lepore
2017-10-30 16:05:03 UTC
Permalink
Post by Hans Petter Selasky
Post by Ian Lepore
Oh.  Right.  Hmmm, I think malloc() is required to support the case
where a handler registers before the static list init is invoked, and I
do think that's a useful feature to preserve.  That means the lists
aren't really static, though, which then makes STATIC a bit out of
place in the new function/macro names.
Why not use RCU here, and then update sys/queue.h to be RCU safe?
--HPS
I'm not sure how that suggestion relates to that paragraph, but the
code already does use a form of RCU to handle deletion of registered
handlers from individual event lists.  It could probably benefit from a
rewrite to more fully embrace RCU and use it for insertions as well,
but that's beyond the scope of these changes, which are trying to
eliminate the worst uses of a global lock at a higher level than
maintaining the list for a single event.

-- Ian
Matt Joras
2017-10-30 16:38:49 UTC
Permalink
Post by Ian Lepore
Post by Hans Petter Selasky
Post by Ian Lepore
Oh.  Right.  Hmmm, I think malloc() is required to support the case
where a handler registers before the static list init is invoked, and I
do think that's a useful feature to preserve.  That means the lists
aren't really static, though, which then makes STATIC a bit out of
place in the new function/macro names.
Why not use RCU here, and then update sys/queue.h to be RCU safe?
--HPS
I'm not sure how that suggestion relates to that paragraph, but the
code already does use a form of RCU to handle deletion of registered
handlers from individual event lists.  It could probably benefit from a
rewrite to more fully embrace RCU and use it for insertions as well,
but that's beyond the scope of these changes, which are trying to
eliminate the worst uses of a global lock at a higher level than
maintaining the list for a single event.
-- Ian
Agreed. There are all sorts of clever things we can do with a more
comprehensive rewrite. It would be a nice place to try out RCU-ish
methods of handling object removal without locks.

Matt
Mateusz Guzik
2017-10-31 10:05:13 UTC
Permalink
Post by Matt Joras
Post by Ian Lepore
Post by Hans Petter Selasky
Oh. Right. Hmmm, I think malloc() is required to support the case
where a handler registers before the static list init is invoked,
and I
Post by Matt Joras
Post by Ian Lepore
Post by Hans Petter Selasky
do think that's a useful feature to preserve. That means the lists
aren't really static, though, which then makes STATIC a bit out of
place in the new function/macro names.
Why not use RCU here, and then update sys/queue.h to be RCU safe?
--HPS
I'm not sure how that suggestion relates to that paragraph, but the
code already does use a form of RCU to handle deletion of registered
handlers from individual event lists. It could probably benefit from a
rewrite to more fully embrace RCU and use it for insertions as well,
but that's beyond the scope of these changes, which are trying to
eliminate the worst uses of a global lock at a higher level than
maintaining the list for a single event.
-- Ian
Agreed. There are all sorts of clever things we can do with a more
comprehensive rewrite. It would be a nice place to try out RCU-ish
methods of handling object removal without locks.
The current EH code delightet us long enough. It has to be dropped.
delighted even
I don't think RCU (which we don't have anyway, apart from a rather poor
substitute) is suitable for use here.
First off, vast majority of all lists are never going away and most
callbacks are not going away either.
The gist is that in the worst case we can use rmlocks, which boil down
to per-cpu locks. Employing rcu would require refing and unrefing which
slows things down due to cacheline bounces.
1. always-existing list + never-removed callbacks
Note that here callbacks can be *added*, but never removed. If
implemented as a list it is really trivial with what hps referred to as
"rcu safe" macros - you set everything up and update the ->next pointer
last. Then the list is always safe to traverse with the worst case being
you finished before the addition of the next element completed. But you
never see half-constructed anything.
2. always-existing list + removable callbacks
They can be stored *separately* to never-removed callbacks. Here is
where rmlocks come to play. An important detail is that the current eh
mechanism allows deregeistration to be performed while *within* a
callback. I think that's a rather weird-ass feature which needs to be
dropped from the replacement. I don't see any added value and I see
*loss* in that things have to get refed/unrefed and stuff relocked.
3. lists which can disappear
Once more rmlocks.
I can't stress enough that writing to shared areas is a major performance
problem on multicore systems. Code which does it for no good reason is
an excellent candidate for a revamp.

I don't see how said writing can be avoided while providing the current EH
guarantee that things can be deregistered from the callback.

One more reason to create a *new* mechanism from scratch and let the
old code die off.
--
Mateusz Guzik <mjguzik gmail.com>
Loading...