DTrace equivalent for Linux only requires a PhD.
Well, if you read "Linux on POWER versus Solaris 10, Part 1: A technical comparison", you get the idea that doing what DTrace does is easy to accomplish in Linux. If you dig a little deeper into how the Linux version works, though, you are in for some surprises.
There are a number of powerful technologies available for Linux on POWER that provide some, if not all, of the features provided by DTrace. In the following, we provide a brief introduction to each of those tools.
dtrace -l | grep mmap
195 syscall mmap entry
196 syscall mmap return
375 syscall mmap64 entry
376 syscall mmap64 return
5225 fbt genunix smmap_common entry
5226 fbt genunix smmap_common return
9395 fbt genunix smmaplf32 entry
9396 fbt genunix smmaplf32 return
12543 fbt genunix smmap32 entry
12544 fbt genunix smmap32 return
12545 fbt genunix smmap64 entry
12546 fbt genunix smmap64 return
13902 fbt genunix cdev_mmap entry
13903 fbt genunix cdev_mmap return
14146 fbt genunix ddi_mmap_get_model entry
14147 fbt genunix ddi_mmap_get_model return
24837 fbt cgsix cg6_mmap entry
24838 fbt cgsix cg6_mmap return
28247 fbt mm mmmmap entry
28248 fbt mm mmmmap return
Okay, now that we have the name of the probe, let's write a simple script that says “here I am” every time mmap gets called.
syscall::mmap:entry
{
printf("here i am");
}
DTrace's language is just a C dialect mixed with some awk. Pretty easy, no complex kernel coding, so let's run it.
# dtrace -s test.d
dtrace: script 'test.d' matched 1 probe
CPU ID FUNCTION:NAME
0 195 mmap:entry here i am
^C
That is it. We're done. Of course, we could do a lot more and get more information with just another line of code, but I’ll save that for another day. DTrace comes with 30,000 probes on a basic install; if you want to watch the kernel, most likely the probe is ready and waiting.
Now let's look at the KProbe solution.
First we have to install a patch on our kernel. Let's hope that we did this before the box went into production, and that our 3rd-party software vendor doesn’t have a problem with it. Will IBM’s DB2 customer support people be okay with having this patch in the kernel? We can only hope and pray.
$tar -xvzf kprobes-2.6.8-rc1.tar.gz
$cd /usr/src/linux-2.6.8-rc1
$patch -p1 < ../kprobes-2.6.8-rc1-base.patch
Okay, next step:
For each probe, you will need to allocate the structure struct kprobe kp; (see include/linux/kprobes.h for more information on this).
Hmm, so we need to write 3 functions, using what looks like some pretty deep C voodoo. Do your production servers have compile tools installed?
/* pre_handler: this is called just before the probed instruction is
 * executed. */
int handler_pre(struct kprobe *p, struct pt_regs *regs)
{
    printk("pre_handler: p->addr=0x%p, eflags=0x%lx\n", p->addr, regs->eflags);
    return 0;
}

/* post_handler: this is called after the probed instruction is executed
 * (provided no exception is generated). */
void handler_post(struct kprobe *p, struct pt_regs *regs, unsigned long flags)
{
    printk("post_handler: p->addr=0x%p, eflags=0x%lx \n", p->addr, regs->eflags);
}

/* fault_handler: this is called if an exception is generated for any
 * instruction within the fault-handler, or when Kprobes
 * single-steps the probed instruction. */
int handler_fault(struct kprobe *p, struct pt_regs *regs, int trapnr)
{
    printk("fault_handler:p->addr=0x%p, eflags=0x%lx\n", p->addr, regs->eflags);
    return 0;
}
Excited yet?
Next step: we have to specify the kernel routine address. Okay, we have 4 choices; let's use the first, since it seems the easiest.
sys_mmap
0000000000425880 t __pci_mmap_make_offset_bus
0000000000425a60 t __pci_mmap_make_offset
0000000000425ba0 t __pci_mmap_set_flags
0000000000425bc0 t __pci_mmap_set_pgprot
0000000000425be0 T pci_mmap_page_range
000000000042d960 T sys32_mmap
000000000042da00 T sys32_mmap2
00000000004402c0 T sunos_mmap
0000000000459ca0 T do_mmap_pgoff
000000000045ada0 T build_mmap_rb
000000000045ae00 T exit_mmap
000000000045e1e0 T generic_file_mmap
000000000046a0a0 t shmem_mmap
0000000000476740 t exec_mmap
00000000004c65a0 t nfs_file_mmap
000000000051d580 t shm_mmap
00000000005259a0 t mmap_mem
0000000000525fe0 t mmap_zero
0000000000526260 t mmap_kmem
00000000005b9720 t proc_bus_pci_mmap
00000000005cc000 t fb_mmap
00000000005d61e0 t sbusfb_mmapsize
00000000005d6220 t sbusfb_mmap
00000000005da4a0 t atyfb_mmap
00000000005fd080 t sock_mmap
00000000005ffc40 T sock_no_mmap
0000000000661a20 t packet_mmap
0000000000718db8 d ffb_mmap_map
0000000000719168 d cg6_mmap_map
000000000071f7a0 d packet_mmap_ops
000000000072a4f0 R __ksymtab_do_mmap_pgoff
000000000072adb0 R __ksymtab_generic_file_mmap
000000000072f3f0 R __ksymtab_sock_no_mmap
0000000000731f60 R __kstrtab_do_mmap_pgoff
0000000000732e10 R __kstrtab_generic_file_mmap
000000000073b2e8 R __kstrtab_sock_no_mmap
phoenix:/boot#
Okay, we now have the address. It looks like it's:
000000000042d960 T sys32_mmap
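The lookup done by eye above can also be scripted. What follows is only a minimal user-space sketch, assuming the symbol listing uses the three-column "address type name" layout shown above. The helper names parse_map_line and lookup_symbol are made up for this illustration and are not part of Kprobes or the kernel.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not a kernel or Kprobes API): check one line of a
 * System.map-style listing such as "000000000042d960 T sys32_mmap".
 * If the line names exactly `symbol`, store its address in *addr and
 * return 0; otherwise return -1.
 */
int parse_map_line(const char *line, const char *symbol,
                   unsigned long long *addr)
{
    unsigned long long value;
    char type;
    char name[256];

    /* Expected fields: hex address, one-letter symbol type, symbol name. */
    if (sscanf(line, "%llx %c %255s", &value, &type, name) != 3)
        return -1;
    if (strcmp(name, symbol) != 0)
        return -1;

    *addr = value;
    return 0;
}

/*
 * Hypothetical helper: scan a whole map file for `symbol`.
 * Returns 0 and fills *addr on success, or -1 if the file cannot be
 * opened or the symbol is not listed.
 */
int lookup_symbol(const char *map_path, const char *symbol,
                  unsigned long long *addr)
{
    char line[512];
    int result = -1;
    FILE *fp = fopen(map_path, "r");

    if (fp == NULL)
        return -1;

    while (fgets(line, sizeof line, fp) != NULL) {
        if (parse_map_line(line, symbol, addr) == 0) {
            result = 0;
            break;
        }
    }
    fclose(fp);
    return result;
}
```

The match is on the exact name, so asking for sys32_mmap does not also pick up sys32_mmap2, which a plain grep for "mmap" would. Which map file to pass in (System.map under /boot, for example) is an assumption that depends on the box.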
Well, now it's time to write some more code.
/* specify pre_handler address */
kp.pre_handler = handler_pre;
/* specify post_handler address */
kp.post_handler = handler_post;
/* specify fault_handler address */
kp.fault_handler = handler_fault;

/* specify the address/offset where you want to insert probe.
 * You can get the address using one of the methods described above. */
kp.addr = (kprobe_opcode_t *) kallsyms_lookup_name("do_fork");

/* check if the kallsyms_lookup_name() returned the correct value. */
if (kp.addr == NULL) {
    printk("kallsyms_lookup_name could not find address for the specified symbol name\n");
    return 1;
}

/* or specify address directly.
 * $grep "do_fork" /usr/src/linux/System.map
 * or
 * $cat /proc/kallsyms |grep do_fork
 * or
 * $nm vmlinuz |grep do_fork */
kp.addr = (kprobe_opcode_t *) 0xc01441d0;

/* All set to register with Kprobes */
register_kprobe(&kp);
No, we're not done yet. The next step is to add printf code into the kernel function.
You can insert printk's at the beginning of a routine or at any offset in the function (the offset must be at the instruction boundary). The following code samples show how to calculate the offset. First, disassemble the machine instructions from the object file and save them as a file:
$objdump -D /usr/src/linux/kernel/fork.o > fork.dis
Well, that produces a nice object dump. You know assembly language, right?
Now we have yet more details to look into:
To insert the probe at offset 0x22c4, get the relative offset from the beginning of the routine 0x22c4 - 0x22b0 = 0x14 and then add the offset to the address of do_fork 0xc01441d0 + 0x14. (To ascertain the address of do_fork, run $cat /proc/kallsyms | grep do_fork.)
You can also add the relative offset of do_fork 0x22c4 - 0x22b0 = 0x14 to the output of kallsyms_lookup_name("do_fork"); Thus: 0x14 + kallsyms_lookup_name("do_fork");
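The arithmetic in those two paragraphs is small enough to write down as a function. This is just a sketch of the calculation, using the example numbers quoted above. The name probe_address is invented here for illustration and is not part of the Kprobes API.

```c
#include <stdint.h>

/*
 * Hypothetical helper: convert an instruction offset read from the
 * objdump listing into the absolute address to probe.
 *
 *   routine_start_in_dump - offset of the routine's first instruction
 *                           in the disassembly (0x22b0 in the example)
 *   insn_offset_in_dump   - offset of the instruction to probe
 *                           (0x22c4 in the example)
 *   routine_runtime_addr  - address of the routine in the running kernel,
 *                           e.g. from /proc/kallsyms (0xc01441d0 for
 *                           do_fork in the example)
 */
uint64_t probe_address(uint64_t routine_start_in_dump,
                       uint64_t insn_offset_in_dump,
                       uint64_t routine_runtime_addr)
{
    /* Distance of the instruction from the start of the routine. */
    uint64_t relative = insn_offset_in_dump - routine_start_in_dump;

    /* The same distance applies from the routine's runtime address. */
    return routine_runtime_addr + relative;
}
```

With the numbers from the text, probe_address(0x22b0, 0x22c4, 0xc01441d0) works out to 0xc01441e4, that is 0xc01441d0 + 0x14. The function does no checking that the result lands on an instruction boundary; that still has to be read off the disassembly by hand.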
Okay, now that we have done all that, we are able to start our new probe.
We already compiled in support for the SysRq key. Enable it with:
$echo 1 > /proc/sys/kernel/sysrq
Now you can use Alt+SysRq+W to view all inserted kernel probes on the console, or in /var/log/messages.
Well, the DTrace example and the introduction text to this document ended on page 2. In the word processor I’m using, I just crossed page 6. KProbes are simple, right? In case you are wondering, I was pasting the examples from the document because I’m not a kernel hacker, and I don’t have 4 hours to learn enough of their code. For those of you who say I picked an overly simple example for DTrace, let's do their example in DTrace so you can compare.
dtrace -l | grep fork | grep syscall
9 syscall forkall entry
10 syscall forkall return
203 syscall vfork entry
204 syscall vfork return
245 syscall fork1 entry
246 syscall fork1 return
#
We have the list of probes. It should be fork1, so let's write some code.
# cat test2.d
syscall::fork1:entry
{
printf("tid: %d, pid: %d, execname: %s\n", tid, pid, execname);
}
Okay, here is the output of the above code. It gives you even more info than the KProbes version.
dtrace -s test2.d
dtrace: script 'test2.d' matched 1 probe
CPU ID FUNCTION:NAME
0 245 fork1:entry tid: 1, pid: 1998, execname: sh
1 245 fork1:entry tid: 1, pid: 1998, execname: sh
Well, to wrap things up: apparently KProbes is made for the kernel programmer, and DTrace is made for the sys-admin. Most sys-admins know a little C, sed, and awk. So we have two simple DTrace scripts on one side, and on the other 5 pages of fairly complex C kernel code that does less. Which do you want to use in production?
8 Comments:
It looks to me as if enabling KProbe usage requires more effort than adding a statically defined D probe to your driver or kernel module.
Why would a sane person bother?
Oh, I know, to "prove" that because you can do it on Linux, therefore Linux is just as good as Solaris.
Bzzzzt.
You should look at Systemtap if you are looking for a true dtrace equivalent - http://sourceware.org/systemtap
performanceguru: I may have misunderstood what I read at http://sourceware.org/systemtap/runtime/start_page.html but to say that that's a dtrace equivalent implies to me that you haven't seen dtrace.
well i have taken a look at systemtap, see my comments at
http://uadmin.blogspot.com/2005/08/systemtap-alpha.html
also it would be nice to see the output of even a simple script running on a live system done using systemtap along with its script that generated the code.
Just how hard would it be to port dTrace to Linux? Is this a licensing issue or a technical one?
its not that difficult to port dtrace to linux; i have a near working port (see www.crisp.demon.co.uk/tools.html for the source code).
its a part time thing for me, and after about 6 weeks - its coming together. see my blog on the web site for odd progress tips.