tools/perf/Documentation/perf-bench.txt

   1 perf-bench(1)
   2 =============
   3
   4 NAME
   5 ----
   6 perf-bench - General framework for benchmark suites
   7
   8 SYNOPSIS
   9 --------
  10 [verse]
  11 'perf bench' [<common options>] <subsystem> <suite> [<options>]
  12
  13 DESCRIPTION
  14 -----------
  15 This 'perf bench' command is a general framework for benchmark suites.
  16
  17 COMMON OPTIONS
  18 --------------
  19 -r::
  20 --repeat=::
  21 Specify number of times to repeat the run (default 10).
  22
  23 -f::
  24 --format=::
  25 Specify format style.
  26 Currently available format styles are:
  27
  28 'default'::
  29 Default style. This is mainly for human reading.
  30 ---------------------
  31 % perf bench sched pipe                      # with no style specified
  32 (executing 1000000 pipe operations between two tasks)
  33         Total time:5.855 sec
  34                 5.855061 usecs/op
  35                 170792 ops/sec
  36 ---------------------
  37
  38 'simple'::
  39 This simple style is friendly for automated
  40 processing by scripts.
  41 ---------------------
  42 % perf bench --format=simple sched pipe      # specified simple
  43 5.988
  44 ---------------------
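
As an illustration of script-friendly processing, the following is a minimal Python sketch that reads the 'simple' output shown above. The helper names `parse_simple` and `run_sched_pipe_simple` are hypothetical, and the sketch assumes that the total time is the last non-empty line printed on standard output.

```python
import shutil
import subprocess


def parse_simple(output: str) -> float:
    """Return the total time in seconds from '--format=simple' output."""
    # Assumption: the value of interest is the last non-empty line, e.g. "5.988".
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    if not lines:
        raise ValueError("no output to parse")
    return float(lines[-1])


def run_sched_pipe_simple() -> float:
    """Run 'perf bench --format=simple sched pipe' (requires perf in PATH)."""
    if shutil.which("perf") is None:
        raise RuntimeError("perf not found in PATH")
    result = subprocess.run(
        ["perf", "bench", "--format=simple", "sched", "pipe"],
        check=True, capture_output=True, text=True,
    )
    return parse_simple(result.stdout)


# Parse the sample output from the example above without running perf.
print(parse_simple("5.988\n"))
```

Calling `run_sched_pipe_simple()` runs the real benchmark, so it is left out of the demonstration line.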
  45
  46 SUBSYSTEM
  47 ---------
  48
  49 'sched'::
  50         Scheduler and IPC mechanisms.
  51
  52 'syscall'::
  53         System call performance (throughput).
  54
  55 'mem'::
  56         Memory access performance.
  57
  58 'numa'::
  59         NUMA scheduling and MM benchmarks.
  60
  61 'futex'::
  62         Futex stressing benchmarks.
  63
  64 'epoll'::
  65         Eventpoll (epoll) stressing benchmarks.
  66
  67 'internals'::
  68         Benchmark internal perf functionality.
  69
  70 'uprobe'::
  71         Benchmark overhead of uprobe + BPF.
  72
  73 'all'::
  74         All benchmark subsystems.
  75
  76 SUITES FOR 'sched'
  77 ~~~~~~~~~~~~~~~~~~
  78 *messaging*::
  79 Suite for evaluating performance of scheduler and IPC mechanisms.
  80 Based on hackbench by Rusty Russell.
  81
  82 Options of *messaging*
  83 ^^^^^^^^^^^^^^^^^^^^^^
  84 -p::
  85 --pipe::
  86 Use pipe() instead of socketpair().
  87
  88 -t::
  89 --thread::
  90 Use multiple threads instead of multiple processes.
  91
  92 -g::
  93 --group=::
  94 Specify the number of groups.
  95
  96 -l::
  97 --nr_loops=::
  98 Specify the number of loops.
  99
 100 Example of *messaging*
 101 ^^^^^^^^^^^^^^^^^^^^^^
 102
 103 ---------------------
 104 % perf bench sched messaging                 # run with default options
 105 (20 sender and receiver processes per group)
 106 (10 groups == 400 processes run)
 107
 108       Total time:0.308 sec
 109
 110 % perf bench sched messaging -t -g 20        # be multi-thread, with 20 groups
 111 (20 sender and receiver threads per group)
 112 (20 groups == 800 threads run)
 113
 114       Total time:0.582 sec
 115 ---------------------
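
The task counts in the example output follow from the group count: each group has 20 senders and 20 receivers. A small Python sketch of that arithmetic is given below. The function name `messaging_task_count` is hypothetical, and the default of 10 groups is taken from the sample run above.

```python
def messaging_task_count(groups: int = 10, pairs_per_group: int = 20) -> int:
    """Return the number of tasks that a messaging run creates."""
    # Each group consists of pairs_per_group senders plus the same
    # number of receivers, as reported in the example output.
    return groups * pairs_per_group * 2


print(messaging_task_count())    # default run from the example
print(messaging_task_count(20))  # run with -g 20
```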
 116
 117 *pipe*::
 118 Suite for pipe() system call.
 119 Based on pipe-test-1m.c by Ingo Molnar.
 120
 121 Options of *pipe*
 122 ^^^^^^^^^^^^^^^^^
 123 -l::
 124 --loop=::
 125 Specify number of loops.
 126
 127 -G::
 128 --cgroups=::
 129 Names of cgroups for sender and receiver, separated by a comma.
 130 This is useful to check cgroup context switching overhead.
 131 Note that perf neither creates nor deletes the cgroups, so users should
 132 make sure that the cgroups exist and are accessible before use.
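
Because the cgroups must exist beforehand, a script may want to check them before starting the benchmark. The following Python sketch shows one possible check. The function name `missing_cgroups` is hypothetical, and the default root `/sys/fs/cgroup` is an assumption (a unified cgroup hierarchy mounted there); on other setups the root has to be adjusted.

```python
import os


def missing_cgroups(names: str, root: str = "/sys/fs/cgroup") -> list:
    """Return the cgroup names that are absent or not accessible.

    'names' uses the same comma-separated form as the -G option,
    for example "AAA,BBB".
    """
    missing = []
    for name in names.split(","):
        name = name.strip()
        # Assumption: each cgroup is a directory directly below 'root'.
        path = os.path.join(root, name)
        if not (os.path.isdir(path) and os.access(path, os.R_OK | os.X_OK)):
            missing.append(name)
    return missing


# Report which of the cgroups from the example would need to be created.
print(missing_cgroups("AAA,BBB"))
```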
 133
 134
 135 Example of *pipe*
 136 ^^^^^^^^^^^^^^^^^
 137
 138 ---------------------
 139 % perf bench sched pipe
 140 (executing 1000000 pipe operations between two tasks)
 141
 142         Total time:8.091 sec
 143                 8.091833 usecs/op
 144                 123581 ops/sec
 145
 146 % perf bench sched pipe -l 1000              # loop 1000
 147 (executing 1000 pipe operations between two tasks)
 148
 149         Total time:0.016 sec
 150                 16.948000 usecs/op
 151                 59004 ops/sec
 152
 153 % perf bench sched pipe -G AAA,BBB
 154 (executing 1000000 pipe operations between cgroups)
 155 # Running 'sched/pipe' benchmark:
 156 # Executed 1000000 pipe operations between two processes
 157
 158      Total time: 6.886 [sec]
 159
 160        6.886208 usecs/op
 161          145217 ops/sec
 162
 163 ---------------------
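
The two derived figures in the output can be recomputed from the total time and the number of operations. The Python sketch below does this; the function name `pipe_metrics` is hypothetical, and truncating ops/sec to an integer is an assumption chosen because it matches the sample values above.

```python
def pipe_metrics(total_sec: float, ops: int):
    """Return (usecs/op, ops/sec) for a run of 'ops' pipe operations."""
    usecs_per_op = total_sec * 1_000_000 / ops
    # Truncate rather than round: 1000000 / 6.886208 is about 145217.8,
    # and the sample output reports 145217.
    ops_per_sec = int(ops / total_sec)
    return usecs_per_op, ops_per_sec


usecs, rate = pipe_metrics(6.886208, 1_000_000)
print(f"{usecs:.6f} usecs/op")
print(f"{rate} ops/sec")
```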
 164
 165 SUITES FOR 'syscall'
 166 ~~~~~~~~~~~~~~~~~~~~
 167 *basic*::
 168 Suite for evaluating performance of core system call throughput (both usecs/op and ops/sec metrics).
 169 This uses a single thread simply doing getppid(2), which is a simple syscall where the result is not
 170 cached by glibc.
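
The idea behind this suite can be sketched in a few lines of Python: call getppid in a tight loop and derive both metrics from the elapsed time. This is only an illustration, not the perf implementation. The function name `bench_getppid` and the loop count are assumptions, and the interpreter overhead means the numbers are not comparable with those printed by 'perf bench syscall basic'.

```python
import os
import time


def bench_getppid(loops: int = 100_000):
    """Time 'loops' getppid calls and return (usecs/op, ops/sec)."""
    start = time.perf_counter()
    for _ in range(loops):
        os.getppid()  # a cheap system call, as used by the suite
    elapsed = time.perf_counter() - start
    return elapsed * 1_000_000 / loops, loops / elapsed


usecs_per_op, ops_per_sec = bench_getppid()
print(f"{usecs_per_op:.6f} usecs/op")
print(f"{ops_per_sec:.0f} ops/sec")
```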
 171
 172
 173 SUITES FOR 'mem'
 174 ~~~~~~~~~~~~~~~~
 175 *memcpy*::
 176 Suite for evaluating performance of simple memory copy in various ways.
 177
 178 Options of *memcpy*
 179 ^^^^^^^^^^^^^^^^^^^
 180 -s::
 181 --size::
 182 Specify size of memory to copy (default: 1MB).
 183 Available units are B, KB, MB, GB and TB (case insensitive).
 184
 185 -f::
 186 --function::
 187 Specify function to copy (default: default).
 188 Available functions depend on the architecture.
 189 On x86-64, x86-64-unrolled, x86-64-movsq and x86-64-movsb are supported.
 190
 191 -l::
 192 --nr_loops::
 193 Repeat memcpy invocation this number of times.
 194
 195 -c::
 196 --cycles::
 197 Use perf's cpu-cycles event instead of the gettimeofday() syscall.
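
To illustrate the size units accepted by --size, the Python sketch below converts a string such as "1MB" into a byte count. The function name `parse_size` is hypothetical. The sketch assumes binary multiples (1KB = 1024 bytes) and requires an explicit unit; perf's own parser may differ on both points.

```python
import re

# Assumption: units are binary multiples of a byte.
_UNITS = {"B": 1, "KB": 1 << 10, "MB": 1 << 20, "GB": 1 << 30, "TB": 1 << 40}


def parse_size(text: str) -> int:
    """Convert a size such as '1MB' or '4kb' into a number of bytes."""
    match = re.fullmatch(r"\s*(\d+)\s*([A-Za-z]+)\s*", text)
    if match is None:
        raise ValueError(f"invalid size: {text!r}")
    number, unit = match.groups()
    try:
        factor = _UNITS[unit.upper()]  # units are case insensitive
    except KeyError:
        raise ValueError(f"unknown unit: {unit!r}") from None
    return int(number) * factor


print(parse_size("1MB"))  # the documented default size
```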
 198
 199 *memset*::
 200 Suite for evaluating performance of simple memory set in various ways.
 201
 202 Options of *memset*
 203 ^^^^^^^^^^^^^^^^^^^
 204 -s::
 205 --size::
 206 Specify size of memory to set (default: 1MB).
 207 Available units are B, KB, MB, GB and TB (case insensitive).
 208
 209 -f::
 210 --function::
 211 Specify function to set (default: default).
 212 Available functions depend on the architecture.
 213 On x86-64, x86-64-unrolled, x86-64-stosq and x86-64-stosb are supported.
 214
 215 -l::
 216 --nr_loops::
 217 Repeat memset invocation this number of times.
 218
 219 -c::
 220 --cycles::
 221 Use perf's cpu-cycles event instead of the gettimeofday() syscall.
 222
 223 SUITES FOR 'numa'
 224 ~~~~~~~~~~~~~~~~~
 225 *mem*::
 226 Suite for evaluating NUMA workloads.
 227
 228 SUITES FOR 'futex'
 229 ~~~~~~~~~~~~~~~~~~
 230 *hash*::
 231 Suite for evaluating futex hash tables.
 232
 233 *wake*::
 234 Suite for evaluating wake calls.
 235
 236 *wake-parallel*::
 237 Suite for evaluating parallel wake calls.
 238
 239 *requeue*::
 240 Suite for evaluating requeue calls.
 241
 242 *lock-pi*::
 243 Suite for evaluating futex lock_pi calls.
 244
 245 SUITES FOR 'epoll'
 246 ~~~~~~~~~~~~~~~~~~
 247 *wait*::
 248 Suite for evaluating concurrent epoll_wait calls.
 249
 250 *ctl*::
 251 Suite for evaluating multiple epoll_ctl calls.
 252
 253 SUITES FOR 'internals'
 254 ~~~~~~~~~~~~~~~~~~~~~~
 255 *synthesize*::
 256 Suite for evaluating perf's event synthesis performance.
 257
 258 SEE ALSO
 259 --------
 260 linkperf:perf[1]