Coresight - HW Assisted Tracing on ARM
======================================

Author:   Mathieu Poirier <mathieu.poirier@linaro.org>
Date:     September 11th, 2014

Coresight is an umbrella of technologies allowing for the debugging of ARM
based SoCs. It includes solutions for JTAG and HW assisted tracing. This
document is concerned with the latter.

HW assisted tracing is becoming increasingly useful when dealing with systems
that have many SoCs and other components like GPU and DMA engines. ARM has
developed a HW assisted tracing solution by means of different components, each
being added to a design at synthesis time to cater to specific tracing needs.
Components are generally categorised as sources, links and sinks and are
(usually) discovered using the AMBA bus.

"Sources" generate a compressed stream representing the processor instruction
path based on tracing scenarios as configured by users. From there the stream
flows through the coresight system (via the ATB bus) using links that connect
the emanating source to one or more sinks. Sinks serve as endpoints to the
coresight implementation, either storing the compressed stream in a memory
buffer or creating an interface to the outside world where data can be
transferred to a host without fear of filling up the onboard coresight memory
buffer.

A typical coresight system would look like this:

[ASCII block diagram not reproduced: its layout was lost. It shows CPUs with
their ETM/PTM tracers and CTIs, an STM, system memory and a DAP (SWD/JTAG)
attached to the AMBA AXI, AMBA Debug APB, Cross Trigger Matrix (CTM) and AMBA
Advanced Trace Bus (ATB). The ATB feeds a funnel and a replicator (REP) that
lead to the ETB and the TPIU, the latter driving the trace port.]

*Source: ARM ltd.

    DAP  = Debug Access Port
    ETM  = Embedded Trace Macrocell
    PTM  = Program Trace Macrocell
    CTI  = Cross Trigger Interface
    ETB  = Embedded Trace Buffer
    TPIU = Trace Port Interface Unit
    SWD  = Serial Wire Debug

While on-target configuration of the components is done via the APB bus,
all trace data are carried out-of-band on the ATB bus. The CTM provides
a way to aggregate and distribute signals between CoreSight components.

The coresight framework provides a central point to represent, configure and
manage coresight devices on a platform. This first implementation centers on
the basic tracing functionality, enabling components such as ETM/PTM, funnel,
replicator, TMC, TPIU and ETB. Future work will enable more
intricate IP blocks such as STM and CTI.


Acronyms and Classification
---------------------------

Acronyms:

    PTM:     Program Trace Macrocell
    ETM:     Embedded Trace Macrocell
    STM:     System Trace Macrocell
    ETB:     Embedded Trace Buffer
    ITM:     Instrumentation Trace Macrocell
    TPIU:    Trace Port Interface Unit
    TMC-ETR: Trace Memory Controller, configured as Embedded Trace Router
    TMC-ETF: Trace Memory Controller, configured as Embedded Trace FIFO
    CTI:     Cross Trigger Interface

Classification:

Source:
    ETMv3.x, ETMv4, PTMv1.0, PTMv1.1, STM, STM500, ITM
Link:
    Funnel, replicator (intelligent or not), TMC-ETR
Sink:
    ETBv1.0, ETBv1.1, TPIU, TMC-ETF


Device Tree Bindings
--------------------

See Documentation/devicetree/bindings/arm/coresight.txt for details.

As of this writing drivers for ITM, STMs and CTIs are not provided but are
expected to be added as the solution matures.


Framework and implementation
----------------------------

The coresight framework provides a central point to represent, configure and
manage coresight devices on a platform. Any coresight compliant device can
register with the framework as long as it uses the right APIs:

    struct coresight_device *coresight_register(struct coresight_desc *desc);
    void coresight_unregister(struct coresight_device *csdev);

The registering function takes a "struct coresight_desc *desc" and
registers the device with the core framework. The unregister function takes
a reference to a "struct coresight_device", obtained at registration time.

If everything goes well during the registration process the new devices will
show up under /sys/bus/coresight/devices, as shown here for a TC2 platform:

    root:~# ls /sys/bus/coresight/devices/
    replicator  20030000.tpiu    2201c000.ptm  2203c000.etm  2203e000.etm
    20010000.etb  20040000.funnel  2201d000.ptm  2203d000.etm

The registering function takes a "struct coresight_desc", which looks like this:

    struct coresight_desc {
            enum coresight_dev_type type;
            struct coresight_dev_subtype subtype;
            const struct coresight_ops *ops;
            struct coresight_platform_data *pdata;
            struct device *dev;
            const struct attribute_group **groups;
    };

The "coresight_dev_type" identifies what the device is, i.e. source, link or
sink, while the "coresight_dev_subtype" will characterise that type further.

The "struct coresight_ops" is mandatory and will tell the framework how to
perform base operations related to the components, each component having
a different set of requirements. For that "struct coresight_ops_sink",
"struct coresight_ops_link" and "struct coresight_ops_source" have been
provided.

The next field, "struct coresight_platform_data *pdata", is acquired by calling
"of_get_coresight_platform_data()" as part of the driver's _probe routine, and
"struct device *dev" gets the device reference embedded in the "amba_device":

    static int etm_probe(struct amba_device *adev, const struct amba_id *id)
    {
            ...
            drvdata->dev = &adev->dev;
            ...
    }

Each class of device (source, link, or sink) has generic operations
that can be performed on it (see "struct coresight_ops"). The
"**groups" is a list of sysfs entries pertaining to operations
specific to that component only. "Implementation defined" customisations are
expected to be accessed and controlled using those entries.


How to use the tracer modules
-----------------------------

There are two ways to use the Coresight framework: 1) using the perf command
line tools and 2) interacting directly with the Coresight devices using the
sysFS interface. Preference is given to the former as using the sysFS interface
requires a deep understanding of the Coresight HW. The following sections
provide details on using both methods.

1) Using the sysFS interface:

Before trace collection can start, a coresight sink needs to be identified.
There is no limit on the number of sinks (nor sources) that can be enabled at
any given moment. As a generic operation, all devices pertaining to the sink
class will have an "enable_sink" entry in sysfs:

    root:/sys/bus/coresight/devices# ls
    replicator  20030000.tpiu    2201c000.ptm  2203c000.etm  2203e000.etm
    20010000.etb  20040000.funnel  2201d000.ptm  2203d000.etm
    root:/sys/bus/coresight/devices# ls 20010000.etb
    enable_sink  status  trigger_cntr
    root:/sys/bus/coresight/devices# echo 1 > 20010000.etb/enable_sink
    root:/sys/bus/coresight/devices# cat 20010000.etb/enable_sink
    1
    root:/sys/bus/coresight/devices#

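Since sink-class devices are the ones carrying an "enable_sink" entry, the
sinks on a platform can be found by looking for that entry. The following is
a minimal sketch of that idea, not part of the framework: the function names
and the CS_DEVICES override variable are hypothetical and exist only for this
example.

```shell
#!/bin/sh
# CS_DEVICES is an assumption of this sketch: it defaults to the sysfs
# directory used throughout this document and can be overridden.
: "${CS_DEVICES:=/sys/bus/coresight/devices}"

# Print the name of every device that exposes an "enable_sink" entry.
list_sinks() {
    for dev in "$CS_DEVICES"/*; do
        if [ -e "$dev/enable_sink" ]; then
            basename "$dev"
        fi
    done
}

# Print only the sinks whose "enable_sink" entry currently reads 1.
list_enabled_sinks() {
    for name in $(list_sinks); do
        if [ "$(cat "$CS_DEVICES/$name/enable_sink")" = "1" ]; then
            echo "$name"
        fi
    done
}
```

On the TC2 listing above, list_sinks would be expected to print the ETB and
TPIU entries, assuming both expose "enable_sink".
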
At boot time the current etm3x driver will configure the first address
comparator with "_stext" and "_etext", essentially tracing any instruction
that falls within that range. As such "enabling" a source will immediately
trigger a trace capture:

    root:/sys/bus/coresight/devices# echo 1 > 2201c000.ptm/enable_source
    root:/sys/bus/coresight/devices# cat 2201c000.ptm/enable_source
    1
    root:/sys/bus/coresight/devices# cat 20010000.etb/status
    RAM wrt ptr:    0x19d3   <----- The write pointer is moving
    root:/sys/bus/coresight/devices#

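Checking that the write pointer moves can be scripted by reading "status"
twice and comparing the "RAM wrt ptr" values. The sketch below assumes the
status line has the form shown above; the function names are hypothetical.
Because the buffer wraps, two identical readings do not strictly prove that
capture has stopped, so treat the result as a hint.

```shell
#!/bin/sh
# Extract the "RAM wrt ptr" value from an ETB status dump read on stdin.
etb_write_ptr() {
    sed -n 's/^[[:space:]]*RAM wrt ptr:[[:space:]]*\(0x[0-9a-fA-F]*\).*$/\1/p'
}

# Succeed if the write pointer differs between two reads of the status file.
# $1: path to the "status" entry, $2: delay in seconds (defaults to 1).
etb_is_capturing() {
    first=$(etb_write_ptr < "$1")
    sleep "${2:-1}"
    second=$(etb_write_ptr < "$1")
    [ -n "$first" ] && [ "$first" != "$second" ]
}
```

For example:

    etb_is_capturing /sys/bus/coresight/devices/20010000.etb/status && echo moving
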
Trace collection is stopped the same way:

    root:/sys/bus/coresight/devices# echo 0 > 2201c000.ptm/enable_source
    root:/sys/bus/coresight/devices#

The content of the ETB buffer can be harvested directly from /dev:

    root:/sys/bus/coresight/devices# dd if=/dev/20010000.etb \
    32768 bytes (33 kB) copied, 0.00125258 s, 26.2 MB/s
    root:/sys/bus/coresight/devices#

The file cstrace.bin can be decompressed using "ptm2human", DS-5 or Trace32.

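The sysFS steps above (enable a sink, enable a source, stop the source, read
the buffer back) can be chained in a small helper. This is only a sketch
under stated assumptions: cs_capture and CS_DEVICES are hypothetical names,
the sink is left enabled afterwards, and the command run between enable and
disable is simply whatever should execute while the trace is captured.

```shell
#!/bin/sh
# CS_DEVICES is an assumption of this sketch and can be overridden.
: "${CS_DEVICES:=/sys/bus/coresight/devices}"

# cs_capture SINK SOURCE SINK_NODE OUTFILE [COMMAND...]
#   SINK, SOURCE: device names as listed under $CS_DEVICES
#   SINK_NODE:    device node to read the buffer from (e.g. under /dev)
#   OUTFILE:      file receiving the raw, still compressed trace
cs_capture() {
    [ "$#" -ge 4 ] || return 2
    sink=$1
    src=$2
    node=$3
    out=$4
    shift 4

    echo 1 > "$CS_DEVICES/$sink/enable_sink" || return 1
    echo 1 > "$CS_DEVICES/$src/enable_source" || return 1

    "$@"            # run the optional command while tracing is enabled
    rc=$?

    echo 0 > "$CS_DEVICES/$src/enable_source" || return 1
    dd if="$node" of="$out" 2>/dev/null || return 1
    return "$rc"
}
```

For example, on the TC2 platform used above:

    cs_capture 20010000.etb 2201c000.ptm /dev/20010000.etb cstrace.bin sleep 1
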
The following is DS-5 output of an experimental loop that increments a variable
up to a certain value. The example is simple and yet provides a glimpse of the
wealth of possibilities that coresight provides.

    Instruction  106378866  0x8026B53C  E52DE004  false  PUSH {lr}
    Instruction  0          0x8026B540  E24DD00C  false  SUB sp,sp,#0xc
    Instruction  0          0x8026B544  E3A03000  false  MOV r3,#0
    Instruction  0          0x8026B548  E58D3004  false  STR r3,[sp,#4]
    Instruction  0          0x8026B54C  E59D3004  false  LDR r3,[sp,#4]
    Instruction  0          0x8026B550  E3530004  false  CMP r3,#4
    Instruction  0          0x8026B554  E2833001  false  ADD r3,r3,#1
    Instruction  0          0x8026B558  E58D3004  false  STR r3,[sp,#4]
    Instruction  0          0x8026B55C  DAFFFFFA  true   BLE {pc}-0x10 ; 0x8026b54c
    Timestamp    Timestamp: 17106715833
    Instruction  319        0x8026B54C  E59D3004  false  LDR r3,[sp,#4]
    Instruction  0          0x8026B550  E3530004  false  CMP r3,#4
    Instruction  0          0x8026B554  E2833001  false  ADD r3,r3,#1
    Instruction  0          0x8026B558  E58D3004  false  STR r3,[sp,#4]
    Instruction  0          0x8026B55C  DAFFFFFA  true   BLE {pc}-0x10 ; 0x8026b54c
    Instruction  9          0x8026B54C  E59D3004  false  LDR r3,[sp,#4]
    Instruction  0          0x8026B550  E3530004  false  CMP r3,#4
    Instruction  0          0x8026B554  E2833001  false  ADD r3,r3,#1
    Instruction  0          0x8026B558  E58D3004  false  STR r3,[sp,#4]
    Instruction  0          0x8026B55C  DAFFFFFA  true   BLE {pc}-0x10 ; 0x8026b54c
    Instruction  7          0x8026B54C  E59D3004  false  LDR r3,[sp,#4]
    Instruction  0          0x8026B550  E3530004  false  CMP r3,#4
    Instruction  0          0x8026B554  E2833001  false  ADD r3,r3,#1
    Instruction  0          0x8026B558  E58D3004  false  STR r3,[sp,#4]
    Instruction  0          0x8026B55C  DAFFFFFA  true   BLE {pc}-0x10 ; 0x8026b54c
    Instruction  7          0x8026B54C  E59D3004  false  LDR r3,[sp,#4]
    Instruction  0          0x8026B550  E3530004  false  CMP r3,#4
    Instruction  0          0x8026B554  E2833001  false  ADD r3,r3,#1
    Instruction  0          0x8026B558  E58D3004  false  STR r3,[sp,#4]
    Instruction  0          0x8026B55C  DAFFFFFA  true   BLE {pc}-0x10 ; 0x8026b54c
    Instruction  10         0x8026B54C  E59D3004  false  LDR r3,[sp,#4]
    Instruction  0          0x8026B550  E3530004  false  CMP r3,#4
    Instruction  0          0x8026B554  E2833001  false  ADD r3,r3,#1
    Instruction  0          0x8026B558  E58D3004  false  STR r3,[sp,#4]
    Instruction  0          0x8026B55C  DAFFFFFA  true   BLE {pc}-0x10 ; 0x8026b54c
    Instruction  6          0x8026B560  EE1D3F30  false  MRC p15,#0x0,r3,c13,c0,#1
    Instruction  0          0x8026B564  E1A0100D  false  MOV r1,sp
    Instruction  0          0x8026B568  E3C12D7F  false  BIC r2,r1,#0x1fc0
    Instruction  0          0x8026B56C  E3C2203F  false  BIC r2,r2,#0x3f
    Instruction  0          0x8026B570  E59D1004  false  LDR r1,[sp,#4]
    Instruction  0          0x8026B574  E59F0010  false  LDR r0,[pc,#16] ; [0x8026B58C] = 0x80550368
    Instruction  0          0x8026B578  E592200C  false  LDR r2,[r2,#0xc]
    Instruction  0          0x8026B57C  E59221D0  false  LDR r2,[r2,#0x1d0]
    Instruction  0          0x8026B580  EB07A4CF  true   BL {pc}+0x1e9344 ; 0x804548c4
    Instruction  13570831   0x8026B584  E28DD00C  false  ADD sp,sp,#0xc
    Instruction  0          0x8026B588  E8BD8000  true   LDM sp!,{pc}
    Timestamp    Timestamp: 17107041535

2) Using the perf framework:

Coresight tracers are represented using the Perf framework's Performance
Monitoring Unit (PMU) abstraction. As such the perf framework takes charge of
controlling when tracing gets enabled based on when the process of interest is
scheduled. When configured in a system, Coresight PMUs will be listed when
queried by the perf command line tool:

    linaro@linaro-nano:~$ ./perf list pmu

        List of pre-defined events (to be used in -e):

        cs_etm//                                    [Kernel PMU event]

    linaro@linaro-nano:~$

Regardless of the number of tracers available in a system (usually equal to the
number of processor cores), the "cs_etm" PMU will be listed only once.

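Whether the "cs_etm" PMU is present can also be checked without the perf
tool. The sketch below relies on an assumption that is not stated in this
document: that, like other kernel PMUs, "cs_etm" appears as a directory under
/sys/bus/event_source/devices. The has_pmu name and the EVENT_SOURCE_DEVICES
override are hypothetical.

```shell
#!/bin/sh
# Assumed location of perf PMU devices in sysfs; can be overridden.
: "${EVENT_SOURCE_DEVICES:=/sys/bus/event_source/devices}"

# Succeed when a PMU with the given name (default: cs_etm) is registered.
has_pmu() {
    [ -d "$EVENT_SOURCE_DEVICES/${1:-cs_etm}" ]
}
```

For example:

    has_pmu cs_etm || echo "cs_etm PMU not available"
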
A Coresight PMU works the same way as any other PMU, i.e. the name of the PMU
is listed along with configuration options within forward slashes '/'. Since a
Coresight system will typically have more than one sink, the name of the sink to
work with needs to be specified as an event option. The names of the sinks to
choose from are listed in sysFS under ($SYSFS)/bus/coresight/devices:

    root@linaro-nano:~# ls /sys/bus/coresight/devices/
    20010000.etf  20040000.funnel  20100000.stm  22040000.etm
    22140000.etm  230c0000.funnel  23240000.etm  20030000.tpiu
    20070000.etr  20120000.replicator  220c0000.funnel
    23040000.etm  23140000.etm  23340000.etm

    root@linaro-nano:~# perf record -e cs_etm/@20070000.etr/u --per-thread program

The syntax within the forward slashes '/' is important. The '@' character
tells the parser that a sink is about to be specified and that this is the sink
to use for the trace session.

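Because a mistyped sink name is easy to miss, the event string can be built
by a helper that first checks the name against sysFS. This is a minimal
sketch: cs_etm_event and CS_DEVICES are hypothetical names introduced for the
example, and only the 'cs_etm/@<sink>/<modifier>' form shown above is
produced.

```shell
#!/bin/sh
# CS_DEVICES is an assumption of this sketch and can be overridden.
: "${CS_DEVICES:=/sys/bus/coresight/devices}"

# cs_etm_event SINK [MODIFIER]
# Print the perf event string for SINK, or fail if SINK is not listed.
cs_etm_event() {
    if [ ! -d "$CS_DEVICES/$1" ]; then
        echo "unknown CoreSight device: $1" >&2
        return 1
    fi
    printf 'cs_etm/@%s/%s\n' "$1" "${2:-}"
}
```

For example:

    perf record -e "$(cs_etm_event 20070000.etr u)" --per-thread program
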
More information on the above and other examples of how to use Coresight with
the perf tools can be found in the "HOWTO.md" file of the openCSD gitHub
repository [3].

2.1) AutoFDO analysis using the perf tools:

perf can be used to record and analyze trace of programs.

Execution can be recorded using 'perf record' with the cs_etm event,
specifying the name of the sink to record to, e.g.:

    perf record -e cs_etm/@20070000.etr/u --per-thread

The 'perf report' and 'perf script' commands can be used to analyze execution,
synthesizing instruction and branch events from the instruction trace.
'perf inject' can be used to replace the trace data with the synthesized events.
The --itrace option controls the type and frequency of synthesized events
(see perf documentation).

Note that only 64-bit programs are currently supported - further work is
required to support instruction decode of 32-bit Arm programs.


Generating coverage files for Feedback Directed Optimization: AutoFDO
---------------------------------------------------------------------

'perf inject' accepts the --itrace option in which case tracing data is
removed and replaced with the synthesized events, e.g.:

    perf inject --itrace --strip -i perf.data -o perf.data.new

Below is an example of using ARM ETM for autoFDO. It requires autofdo
(https://github.com/google/autofdo) and gcc version 5. The bubble
sort example is from the AutoFDO tutorial (https://gcc.gnu.org/wiki/AutoFDO/Tutorial).

    $ gcc-5 -O3 sort.c -o sort
    $ taskset -c 2 ./sort
    Bubble sorting array of 30000 elements

    $ perf record -e cs_etm/@20070000.etr/u --per-thread taskset -c 2 ./sort
    Bubble sorting array of 30000 elements
    [ perf record: Woken up 35 times to write data ]
    [ perf record: Captured and wrote 69.640 MB perf.data ]

    $ perf inject -i perf.data -o inj.data --itrace=il64 --strip
    $ create_gcov --binary=./sort --profile=inj.data --gcov=sort.gcov -gcov_version=1
    $ gcc-5 -O3 -fauto-profile=sort.gcov sort.c -o sort_autofdo
    $ taskset -c 2 ./sort_autofdo
    Bubble sorting array of 30000 elements


How to use the STM module
-------------------------

Using the System Trace Macrocell module is the same as the tracers - the only
difference is that clients are driving the trace capture rather
than the program flow through the code.

As with any other CoreSight component, specifics about the STM tracer can be
found in sysfs with more information on each entry being found in [1]:

    root@genericarmv8:~# ls /sys/bus/coresight/devices/20100000.stm
    enable_source   hwevent_select  port_enable  subsystem  uevent
    hwevent_enable  mgmt            port_select  traceid

Like any other source, a sink needs to be identified and the STM enabled before
being used:

    root@genericarmv8:~# echo 1 > /sys/bus/coresight/devices/20010000.etf/enable_sink
    root@genericarmv8:~# echo 1 > /sys/bus/coresight/devices/20100000.stm/enable_source

From there user space applications can request and use channels using the devfs
interface provided for that purpose by the generic STM API:

    root@genericarmv8:~# ls -l /dev/20100000.stm
    crw-------    1 root     root       10,  61 Jan  3 18:11 /dev/20100000.stm

Details on how to use the generic STM API can be found here [2].

[1]. Documentation/ABI/testing/sysfs-bus-coresight-devices-stm
[2]. Documentation/trace/stm.rst
[3]. https://github.com/Linaro/perf-opencsd