Documentation/trace/hwlat_detector.txt - third_party/linux - Git at Google

 Introduction:
 -------------

 The tracer hwlat_detector is a special purpose tracer that is used to
 detect large system latencies induced by the behavior of certain underlying
 hardware or firmware, independent of Linux itself. The code was developed
 originally to detect SMIs (System Management Interrupts) on x86 systems,
 however there is nothing x86 specific about this patchset. It was
 originally written for use by the "RT" patch since the Real Time
 kernel is highly latency sensitive.

 SMIs are not serviced by the Linux kernel, which means that it does not
 even know that they are occuring. SMIs are instead set up by BIOS code
 and are serviced by BIOS code, usually for "critical" events such as
 management of thermal sensors and fans. Sometimes though, SMIs are used for
 other tasks and those tasks can spend an inordinate amount of time in the
 handler (sometimes measured in milliseconds). Obviously this is a problem if
 you are trying to keep event service latencies down in the microsecond range.

 The hardware latency detector works by hogging one of the cpus for configurable
 amounts of time (with interrupts disabled), polling the CPU Time Stamp Counter
 for some period, then looking for gaps in the TSC data. Any gap indicates a
 time when the polling was interrupted and since the interrupts are disabled,
 the only thing that could do that would be an SMI or other hardware hiccup
 (or an NMI, but those can be tracked).

 Note that the hwlat detector should *NEVER* be used in a production environment.
 It is intended to be run manually to determine if the hardware platform has a
 problem with long system firmware service routines.

 Usage:
 ------

 Write the ASCII text "hwlat" into the current_tracer file of the tracing system
 (mounted at /sys/kernel/tracing or /sys/kernel/tracing). It is possible to
 redefine the threshold in microseconds (us) above which latency spikes will
 be taken into account.

 Example:

 	# echo hwlat > /sys/kernel/tracing/current_tracer
 	# echo 100 > /sys/kernel/tracing/tracing_thresh

 The /sys/kernel/tracing/hwlat_detector interface contains the following files:

 width			- time period to sample with CPUs held (usecs)
 			  must be less than the total window size (enforced)
 window			- total period of sampling, width being inside (usecs)

 By default the width is set to 500,000 and window to 1,000,000, meaning that
 for every 1,000,000 usecs (1s) the hwlat detector will spin for 500,000 usecs
 (0.5s). If tracing_thresh contains zero when hwlat tracer is enabled, it will
 change to a default of 10 usecs. If any latencies that exceed the threshold is
 observed then the data will be written to the tracing ring buffer.

 The minimum sleep time between periods is 1 millisecond. Even if width
 is less than 1 millisecond apart from window, to allow the system to not
 be totally starved.

 If tracing_thresh was zero when hwlat detector was started, it will be set
 back to zero if another tracer is loaded. Note, the last value in
 tracing_thresh that hwlat detector had will be saved and this value will
 be restored in tracing_thresh if it is still zero when hwlat detector is
 started again.

 The following tracing directory files are used by the hwlat_detector:

 in /sys/kernel/tracing:

  tracing_threshold	- minimum latency value to be considered (usecs)
  tracing_max_latency	- maximum hardware latency actually observed (usecs)
  tracing_cpumask	- the CPUs to move the hwlat thread across
  hwlat_detector/width	- specified amount of time to spin within window (usecs)
  hwlat_detector/window	- amount of time between (width) runs (usecs)

 The hwlat detector's kernel thread will migrate across each CPU specified in
 tracing_cpumask between each window. To limit the migration, either modify
 tracing_cpumask, or modify the hwlat kernel thread (named [hwlatd]) CPU
 affinity directly, and the migration will stop.
	Introduction:
	-------------

	The tracer hwlat_detector is a special purpose tracer that is used to
	detect large system latencies induced by the behavior of certain underlying
	hardware or firmware, independent of Linux itself. The code was developed
	originally to detect SMIs (System Management Interrupts) on x86 systems,
	however there is nothing x86 specific about this patchset. It was
	originally written for use by the "RT" patch since the Real Time
	kernel is highly latency sensitive.

	SMIs are not serviced by the Linux kernel, which means that it does not
	even know that they are occuring. SMIs are instead set up by BIOS code
	and are serviced by BIOS code, usually for "critical" events such as
	management of thermal sensors and fans. Sometimes though, SMIs are used for
	other tasks and those tasks can spend an inordinate amount of time in the
	handler (sometimes measured in milliseconds). Obviously this is a problem if
	you are trying to keep event service latencies down in the microsecond range.

	The hardware latency detector works by hogging one of the cpus for configurable
	amounts of time (with interrupts disabled), polling the CPU Time Stamp Counter
	for some period, then looking for gaps in the TSC data. Any gap indicates a
	time when the polling was interrupted and since the interrupts are disabled,
	the only thing that could do that would be an SMI or other hardware hiccup
	(or an NMI, but those can be tracked).

	Note that the hwlat detector should NEVER be used in a production environment.
	It is intended to be run manually to determine if the hardware platform has a
	problem with long system firmware service routines.

	Usage:
	------

	Write the ASCII text "hwlat" into the current_tracer file of the tracing system
	(mounted at /sys/kernel/tracing or /sys/kernel/tracing). It is possible to
	redefine the threshold in microseconds (us) above which latency spikes will
	be taken into account.

	Example:

	# echo hwlat > /sys/kernel/tracing/current_tracer
	# echo 100 > /sys/kernel/tracing/tracing_thresh

	The /sys/kernel/tracing/hwlat_detector interface contains the following files:

	width - time period to sample with CPUs held (usecs)
	must be less than the total window size (enforced)
	window - total period of sampling, width being inside (usecs)

	By default the width is set to 500,000 and window to 1,000,000, meaning that
	for every 1,000,000 usecs (1s) the hwlat detector will spin for 500,000 usecs
	(0.5s). If tracing_thresh contains zero when hwlat tracer is enabled, it will
	change to a default of 10 usecs. If any latencies that exceed the threshold is
	observed then the data will be written to the tracing ring buffer.

	The minimum sleep time between periods is 1 millisecond. Even if width
	is less than 1 millisecond apart from window, to allow the system to not
	be totally starved.

	If tracing_thresh was zero when hwlat detector was started, it will be set
	back to zero if another tracer is loaded. Note, the last value in
	tracing_thresh that hwlat detector had will be saved and this value will
	be restored in tracing_thresh if it is still zero when hwlat detector is
	started again.

	The following tracing directory files are used by the hwlat_detector:

	in /sys/kernel/tracing:

	tracing_threshold - minimum latency value to be considered (usecs)
	tracing_max_latency - maximum hardware latency actually observed (usecs)
	tracing_cpumask - the CPUs to move the hwlat thread across
	hwlat_detector/width - specified amount of time to spin within window (usecs)
	hwlat_detector/window - amount of time between (width) runs (usecs)

	The hwlat detector's kernel thread will migrate across each CPU specified in
	tracing_cpumask between each window. To limit the migration, either modify
	tracing_cpumask, or modify the hwlat kernel thread (named [hwlatd]) CPU
	affinity directly, and the migration will stop.