Friday, August 25, 2006

Jim Keniston posted the following idea on how to implement userland probes in Systemtap, Linux's answer to DTrace, to the Systemtap mailing list. I have reformatted it and commented on it for those who don't follow the list.

Here's where we stand on user-space probes (uprobes). The intent of uprobes is to enable application developers to create low-overhead, dynamic instrumentation for their apps, with uprobes-based instrumentation interoperating usefully, as needed, with kprobes-based instrumentation. Comments are welcome.

Recent History
--------------
Last spring, Prasanna Panchamukhi offered up a kernel-only approach, where instrumentation would be coded as a kernel module, a la kprobes. This performed well (e.g. 1 usec per probepoint hit on my Pentium M), but we got bad reviews on such things as the kernel-only approach and the per-executable tracing (e.g., hooking read_page(s)).

I tried an approach based on ptrace, with no kernel enhancements, but it lacked certain necessary features (e.g., #2-5 below), probe overhead was 12-15x worse than Prasanna's approach, and I couldn't get it to work when probing multiple processes. (Frank Eigler independently suggested this approach and termed it "Plan B from outer space.")

Is that 12-15x worse than the current solution used in strace?


While I was stumped trying to make Plan B work, Roland McGrath made utrace available to us. We looked this over as we found the time, and it looked promising.

There has been much debate within the kprobes teams about the proper programming model to support. Discussions at OLS didn't yield many new ideas, let alone consensus.

The Current Approach: Overview
------------------------------
The approach we are now coding can be summarized as follows. (Okay, it's not much like Plan B, but B+ sounds better than C.)

a. A system-call API that is an alternative to ptrace, provides better support for probepoints and return probes, and exploits all the process-lifetime events made accessible by utrace.

b. The "tracer" process detects events (e.g., probe hits) by polling rather than catching SIGCHLD signals.

c. Hooks to allow kernel-mode instrumentation to cooperate with user-mode "tracer" processes.
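To make point (b) concrete, here is a minimal sketch of a tracer that polls for events instead of installing a SIGCHLD handler. The email does not describe the actual uprobes system-call API, so everything here is an assumption made for illustration: an ordinary pipe stands in for whatever event source uprobes would expose, a forked child stands in for the tracee, and the `probe_hit <address>` record format and the names `wait_for_events` and `demo` are made up.

```python
import os
import select


def wait_for_events(event_fd, timeout_ms=1000):
    """Collect newline-terminated event records by polling event_fd.

    No signal handler is installed; the tracer simply blocks in poll()
    until the descriptor is readable, the writer closes it, or we time out.
    """
    poller = select.poll()
    poller.register(event_fd, select.POLLIN)
    events = []
    pending = b""
    while True:
        ready = poller.poll(timeout_ms)
        if not ready:
            break  # timed out with nothing to read
        chunk = os.read(event_fd, 4096)
        if not chunk:
            break  # writer closed its end: no more events
        pending += chunk
        while b"\n" in pending:
            line, pending = pending.split(b"\n", 1)
            events.append(line.decode())
    return events


def demo():
    """Fork a stand-in tracee that reports three hypothetical probe hits."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: plays the tracee and writes one record per probe hit.
        os.close(read_fd)
        for address in (0x400100, 0x400200, 0x400300):
            os.write(write_fd, f"probe_hit {address:#x}\n".encode())
        os.close(write_fd)
        os._exit(0)
    # Parent: plays the tracer and polls for the records.
    os.close(write_fd)
    events = wait_for_events(read_fd)
    os.close(read_fd)
    os.waitpid(pid, 0)  # reap the child; this is not how events are detected
    return events
```

Calling `demo()` returns the three records in the order the child wrote them. The point of the design is that the tracer decides when to look for events, rather than being interrupted by a signal for each one.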

Here are the requirements we will satisfy with this approach.

0. Per-process (not per-executable) tracing.

1. Instrumentation can be coded entirely as a user-space app...

Sounds like a nightmare waiting to happen. If I want to trace something from userland into the kernel and back, I start out writing userland code, then switch to kernel code, quite possibly with the kernel code accessing variables and statistics stored in userland. Doesn't that mean lots of checking that the user remembered to call the routines that safely move data back and forth between the two?

How is this better than just enhancing a debugger such as gdb? And how are stacks dealt with? Since you quite possibly have one process investigating another, if you don't get everything perfect, can't the program being watched corrupt the data of the other process?


2. ... but in situations where performance is critical, uprobes can run a named kernel handler without waking up the tracer process.

Now, if we start out coding our script to work only in userland and then decide we need better performance, we have to go back and recode parts of it to work in kernel land. Doesn't that quite possibly break our algorithms that were talking to kernel land, or the kernel probes that accessed userland data that has just moved into the kernel?


3. A user-mode tracer can invoke a previously registered kernel-mode handler, so we have simple and efficient communication between user- and kernel-mode instrumentation.

How do you keep a userland program from exploiting Systemtap's architecture and executing kernel probes that belong to other active Systemtap scripts? Isn't this a huge back door for rootkits, especially once people start using Systemtap's methods to monitor systems continuously?


4. Multiple tracer processes can trace the same tracee.

5. As needed, we can "pre-define" a set of useful kernel handlers.

6. Uprobes can be easily extended (exploiting utrace) to support notifying the tracer of non-probepoint events in the probee, such as signals and system calls.

7. The user API should be easier to use than the ptrace API.

8. Handlers run in process context -- the tracee's context (see requirement 2) or the tracer's context while the tracee is stopped (see requirement 3).

Stack corruption, or even slight differences in stack placement, would severely limit the usefulness of the solution. It would have the same effect as debugging an app in gdb: the app only breaks when the userland debugger is not running.


4 Comments:

Anonymous said...

James, if you really care to argue about technical issues that you only partly seem to understand, take it up on the mailing list. Sniping here is cowardly.

10:08 AM  
Anonymous said...

I see now that, at least this time, you also posted to the mailing list. Sorry.

10:22 AM  
jamesd_wi said...

I have a long history of posting to the mailing list.

I also have a rather long list of bug reports that I have generated, some of them rather nasty to fix, but all quite deadly.

I may not be an expert, but I have coded scripts in both systemtap and dtrace.

11:59 AM  
Derek Crudgington said...

It's not cowardly.. it lets the real world know what a POS systemtap is. Thanks jamesd.

3:41 PM  
