L2 data loads.
Mask[0] = 1: count I-state lines
Mask[1] = 1: count S-state lines
Mask[2] = 1: count E-state lines
Mask[3] = 1: count M-state lines
Mask[5:4]:
00H - Exclude hardware-prefetched lines
01H - Hardware-prefetched lines only
02H/03H - All (HW-prefetched and non-HW-prefetched lines)
EventSel=29H
CoreOnly
L2_LINES_IN
L2 lines allocated.
Mask[0] = 1: count I-state lines
Mask[1] = 1: count S-state lines
Mask[2] = 1: count E-state lines
Mask[3] = 1: count M-state lines
Mask[5:4]:
00H - Exclude hardware-prefetched lines
01H - Hardware-prefetched lines only
02H/03H - All (HW-prefetched and non-HW-prefetched lines)
EventSel=24H
CoreOnly
L2_LINES_OUT
L2 lines evicted.
Mask[0] = 1: count I-state lines
Mask[1] = 1: count S-state lines
Mask[2] = 1: count E-state lines
Mask[3] = 1: count M-state lines
Mask[5:4]:
00H - Exclude hardware-prefetched lines
01H - Hardware-prefetched lines only
02H/03H - All (HW-prefetched and non-HW-prefetched lines)
EventSel=26H
CoreOnly
L2_M_LINES_OUT
L2 M-state lines evicted.
Mask[0] = 1: count I-state lines
Mask[1] = 1: count S-state lines
Mask[2] = 1: count E-state lines
Mask[3] = 1: count M-state lines
Mask[5:4]:
00H - Exclude hardware-prefetched lines
01H - Hardware-prefetched lines only
02H/03H - All (HW-prefetched and non-HW-prefetched lines)
EventSel=27H
CoreOnly
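The Mask bit layout shared by these L2 events can be composed programmatically. The following is a minimal Python sketch; the helper name `l2_umask` and its string keys are hypothetical, and it assumes only the bit assignments listed above (MESI bits in Mask[3:0], prefetch filter in Mask[5:4]).

```python
# Sketch (assumption): compose the UMask byte for the MESI-qualified L2 events
# (L2_LD, L2_LINES_IN, L2_LINES_OUT, L2_M_LINES_OUT) from the Mask layout above.

# Mask[3:0]: one bit per MESI state to count.
MESI_BITS = {"I": 0x01, "S": 0x02, "E": 0x04, "M": 0x08}

# Mask[5:4]: hardware-prefetch filter.
PREFETCH_FILTER = {
    "exclude_hw_prefetch": 0b00,  # 00H
    "hw_prefetch_only": 0b01,     # 01H
    "all": 0b10,                  # 02H (03H selects the same set)
}


def l2_umask(states: str, prefetch: str = "all") -> int:
    """Return the UMask selecting the given MESI states and prefetch filter."""
    mask = 0
    for state in states.upper():
        if state not in MESI_BITS:
            raise ValueError(f"unknown MESI state: {state!r}")
        mask |= MESI_BITS[state]
    return mask | (PREFETCH_FILTER[prefetch] << 4)


# Example: all states, demand and HW-prefetched lines.
print(hex(l2_umask("MESI")))  # 0x2f
```

Setting 02H rather than 03H in Mask[5:4] is an arbitrary choice here; the table lists both as selecting all lines.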
DATA_MEM_REFS
All loads from any memory type. All stores to any memory type. Each part of a split is counted separately. The internal logic counts not only memory loads and stores, but also internal retries.
80-bit floating-point accesses are double counted, since they are decomposed into a 16-bit exponent load and a 64-bit mantissa load. Memory accesses are counted only when they are actually performed; for example, a load that is squashed because a previous cache miss to the same address is outstanding, and that is eventually performed, is counted only once.
Does not include I/O accesses, or other nonmemory accesses.
EventSel=43H,UMask=00H
CoreOnly
DCU_LINES_IN
Total lines allocated in DCU.
EventSel=45H,UMask=00H
CoreOnly
DCU_M_LINES_IN
Number of M state lines allocated in DCU.
EventSel=46H,UMask=00H
CoreOnly
DCU_M_LINES_OUT
Number of M state lines evicted from DCU.
This includes evictions via snoop HITM, intervention or replacement.
EventSel=47H,UMask=00H
CoreOnly
DCU_MISS_OUTSTANDING
Weighted number of cycles while a DCU miss is outstanding, incremented by the number of outstanding cache misses at any particular time.
Only cacheable read requests are considered; uncacheable requests are excluded.
Read-for-ownerships are counted, as well as line fills, invalidates, and stores. An access that also misses the L2 is short-changed by 2 cycles (that is, if the counter reports N cycles, the true value is N+2 cycles).
Subsequent loads to the same cache line do not result in any additional counts.
The count is not precise, but is still useful.
EventSel=48H,UMask=00H
CoreOnly
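Because DCU_MISS_OUTSTANDING accumulates the number of outstanding misses every cycle, dividing it by the number of misses gives a rough average miss latency (Little's law). A minimal sketch, assuming DCU_LINES_IN (45H) is an acceptable stand-in for the miss count, and that the caller supplies a separate, hypothetical count of accesses that also missed the L2 so the 2-cycle shortfall can be added back:

```python
def avg_dcu_miss_latency(dcu_miss_outstanding: int, dcu_lines_in: int,
                         l2_misses: int = 0) -> float:
    """Approximate average number of cycles each DCU miss is outstanding.

    dcu_miss_outstanding -- raw DCU_MISS_OUTSTANDING (48H) count
    dcu_lines_in         -- DCU_LINES_IN (45H) count, used as the miss count (assumption)
    l2_misses            -- hypothetical count of those misses that also missed the L2;
                            each is short-changed by 2 cycles, so 2 are added back
    """
    if dcu_lines_in <= 0:
        return 0.0
    corrected_cycles = dcu_miss_outstanding + 2 * l2_misses
    # Summed occupancy divided by the number of misses gives the mean time per miss.
    return corrected_cycles / dcu_lines_in


print(avg_dcu_miss_latency(12_000, 400, l2_misses=100))  # 30.5
```

Since the table notes the raw count is imprecise, the result is best read as a trend indicator rather than an exact latency.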
IFU_IFETCH
Number of instruction fetches, both cacheable and noncacheable, including UC fetches.
EventSel=80H,UMask=00H
CoreOnly
IFU_IFETCH_MISS
Number of instruction fetch misses.
All instruction fetches that do not hit the IFU (i.e., that produce memory requests). This includes UC accesses.
EventSel=81H,UMask=00H
CoreOnly
ITLB_MISS
Number of ITLB misses.
EventSel=85H,UMask=00H
CoreOnly
IFU_MEM_STALL
Number of cycles instruction fetch is stalled, for any reason.
Includes IFU cache misses, ITLB misses, ITLB faults, and other minor stalls.
EventSel=86H,UMask=00H
CoreOnly
ILD_STALL
Number of cycles that the instruction length decoder is stalled.
EventSel=87H,UMask=00H
CoreOnly
L2_IFETCH
Number of L2 instruction fetches. The count includes only L2 cacheable instruction fetches; it does not include UC instruction fetches.
This event indicates that a normal instruction fetch was received by the L2. It does not include ITLB miss accesses.
EventSel=28H,UMask=0FH
CoreOnly
L2_LD
Number of L2 data loads.
This event indicates that a normal, unlocked, load memory access was received by the L2.
It includes only L2 cacheable memory accesses; it does not include I/O accesses, other nonmemory accesses, or memory accesses such as UC/WT memory accesses.
It does include L2 cacheable TLB miss memory accesses.
EventSel=29H,UMask=0FH
CoreOnly
L2_ST
Number of L2 data stores.
This event indicates that a normal, unlocked, store memory access was received by the L2.
It indicates that the DCU sent a read-for-ownership request to the L2. It also includes Invalid-to-Modified requests sent by the DCU to the L2.
It includes only L2 cacheable memory accesses; it does not include I/O accesses, other nonmemory accesses, or memory accesses such as UC/WT memory accesses.
It includes TLB miss memory accesses.
EventSel=2AH,UMask=0FH
CoreOnly
L2_LINES_IN
Number of lines allocated in the L2.
EventSel=24H,UMask=00H
CoreOnly
L2_LINES_OUT
Number of lines removed from the L2 for any reason.
EventSel=26H,UMask=00H
CoreOnly
L2_M_LINES_INM
Number of modified lines allocated in the L2.
EventSel=25H,UMask=00H
CoreOnly
L2_M_LINES_OUTM
Number of modified lines removed from the L2 for any reason.
EventSel=27H,UMask=00H
CoreOnly
L2_RQSTS
Total number of L2 requests.
EventSel=2EH,UMask=0FH
CoreOnly
L2_ADS
Number of L2 address strobes.
EventSel=21H,UMask=00H
CoreOnly
L2_DBUS_BUSY
Number of cycles during which the L2 cache data bus was busy.
EventSel=22H,UMask=00H
CoreOnly
L2_DBUS_BUSY_RD
Number of cycles during which the data bus was busy transferring read data from L2 to the processor.
EventSel=23H,UMask=00H
CoreOnly
BUS_DRDY_CLOCKS
Number of clocks during which DRDY# is asserted.
Utilization of the external system data bus during data transfers. Unit Mask = 00H counts bus clocks when the processor is driving DRDY#.
Unit Mask = 20H counts in processor clocks when any agent is driving DRDY#.
EventSel=62H,UMask=00H
CoreOnly
BUS_LOCK_CLOCKS
Number of clocks during which LOCK# is asserted on the external system bus. Always counts in processor clocks.
EventSel=63H,UMask=00H
CoreOnly
BUS_REQ_OUTSTANDING
Number of bus requests outstanding.
This counter is incremented by the number of cacheable read bus requests outstanding in any given cycle. It counts only DCU full-line cacheable reads, not RFOs, writes, instruction fetches, or anything else. Each request is counted while “waiting for bus to complete” (that is, until the last data chunk is received).
EventSel=60H,UMask=00H
CoreOnly
BUS_TRAN_BRD
Number of burst read transactions.
EventSel=65H,UMask=00H
CoreOnly
BUS_TRAN_RFO
Number of completed read for ownership transactions.
EventSel=66H,UMask=00H
CoreOnly
BUS_TRANS_WB
Number of completed write back transactions.
EventSel=67H,UMask=00H
CoreOnly
BUS_TRAN_IFETCH
Number of completed instruction fetch transactions.
EventSel=68H,UMask=00H
CoreOnly
BUS_TRAN_INVAL
Number of completed invalidate transactions.
EventSel=69H,UMask=00H
CoreOnly
BUS_TRAN_PWR
Number of completed partial write transactions.
EventSel=6AH,UMask=00H
CoreOnly
BUS_TRANS_P
Number of completed partial transactions.
EventSel=6BH,UMask=00H
CoreOnly
BUS_TRANS_IO
Number of completed I/O transactions.
EventSel=6CH,UMask=00H
CoreOnly
BUS_TRAN_DEF
Number of completed deferred transactions.
EventSel=6DH,UMask=00H
CoreOnly
BUS_TRAN_BURST
Number of completed burst transactions.
EventSel=6EH,UMask=00H
CoreOnly
BUS_TRAN_ANY
Number of all completed bus transactions.
Address bus utilization can be calculated from this count if the minimum address bus occupancy per transaction is known.
Includes special cycles, etc.
EventSel=70H,UMask=00H
CoreOnly
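The address bus utilization estimate mentioned for BUS_TRAN_ANY could be sketched as below. The minimum address bus occupancy per transaction and the elapsed bus-clock count are assumed inputs that this table does not supply; multiplying the transaction count by the minimum occupancy yields only a lower bound on busy clocks.

```python
def address_bus_utilization(bus_tran_any: int, min_occupancy_bus_clocks: float,
                            elapsed_bus_clocks: int) -> float:
    """Lower-bound estimate of address bus utilization (0.0 to 1.0).

    min_occupancy_bus_clocks is an assumed platform parameter: the minimum
    number of bus clocks each transaction holds the address bus.
    """
    if elapsed_bus_clocks <= 0:
        raise ValueError("elapsed_bus_clocks must be positive")
    busy = bus_tran_any * min_occupancy_bus_clocks
    # Clamp, since an inaccurate occupancy guess could push the ratio past 1.
    return min(1.0, busy / elapsed_bus_clocks)


print(address_bus_utilization(50_000, 3, 1_000_000))  # 0.15
```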
BUS_TRAN_MEM
Number of completed memory transactions.
EventSel=6FH,UMask=00H
CoreOnly
BUS_DATA_RCV
Number of bus clock cycles during which this processor is receiving data.
EventSel=64H,UMask=00H
CoreOnly
BUS_BNR_DRV
Number of bus clock cycles during which this processor is driving the BNR# pin.
EventSel=61H,UMask=00H
CoreOnly
BUS_HIT_DRV
Number of bus clock cycles during which this processor is driving the HIT# pin. Includes cycles due to snoop stalls.
The event counts correctly, but BPMi (breakpoint monitor) pins function as follows based on the setting of the PC bits (bit 19 in the PerfEvtSel0 and PerfEvtSel1 registers):
• If the core-clock-to-bus-clock ratio is 2:1 or 3:1, and a PC bit is set, the BPMi pins will be asserted for a single clock when the counters overflow.
• If the PC bit is clear, the processor toggles the BPMi pins when the counter overflows.
• If the clock ratio is not 2:1 or 3:1, the BPMi pins will not function for these performance monitoring counter events.
EventSel=7AH,UMask=00H
CoreOnly
BUS_HITM_DRV
Number of bus clock cycles during which this processor is driving the HITM# pin. Includes cycles due to snoop stalls.
The event counts correctly, but BPMi (breakpoint monitor) pins function as follows based on the setting of the PC bits (bit 19 in the PerfEvtSel0 and PerfEvtSel1 registers):
• If the core-clock-to-bus-clock ratio is 2:1 or 3:1, and a PC bit is set, the BPMi pins will be asserted for a single clock when the counters overflow.
• If the PC bit is clear, the processor toggles the BPMi pins when the counter overflows.
• If the clock ratio is not 2:1 or 3:1, the BPMi pins will not function for these performance monitoring counter events.
EventSel=7BH,UMask=00H
CoreOnly
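The PC bit referred to above sits in the PerfEvtSel0/PerfEvtSel1 registers alongside the EventSel and UMask values listed throughout this table. A minimal encoder is sketched below; apart from PC (bit 19), the field positions come from the usual P6 PerfEvtSel layout rather than from this table, so treat them as assumptions to check against the register description in the manual. The function and parameter names are hypothetical.

```python
def perfevtsel(event: int, umask: int = 0, *, user: bool = True, kernel: bool = True,
               edge: bool = False, pin_control: bool = False, apic_int: bool = False,
               enable: bool = True, invert: bool = False, cmask: int = 0) -> int:
    """Compose a P6-family PerfEvtSel0/1 value (assumed field layout, see comments)."""
    for name, field in (("event", event), ("umask", umask), ("cmask", cmask)):
        if not 0 <= field <= 0xFF:
            raise ValueError(f"{name} must fit in 8 bits")
    value = event | (umask << 8) | (cmask << 24)  # bits 7:0, 15:8, 31:24
    value |= int(user) << 16         # USR: count while CPL is 1, 2, or 3
    value |= int(kernel) << 17       # OS: count while CPL is 0
    value |= int(edge) << 18         # E: edge detect
    value |= int(pin_control) << 19  # PC: BPMi pin behavior on overflow
    value |= int(apic_int) << 20     # INT: local APIC interrupt on overflow
    value |= int(enable) << 22       # EN: enables both counters (PerfEvtSel0 only)
    value |= int(invert) << 23       # INV: invert the CMASK comparison
    return value


# BUS_HIT_DRV (7AH) with the PC bit set.
print(hex(perfevtsel(0x7A, pin_control=True)))  # 0x4b007a
```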
BUS_SNOOP_STALL
Number of clock cycles during which the bus is snoop stalled.
EventSel=7EH,UMask=00H
CoreOnly
FLOPS
Number of computational floating-point operations retired.
Excludes floating-point computational operations that cause traps or assists.
Includes floating-point computational operations executed by the assist handler.
Includes internal sub-operations for complex floating-point instructions like transcendentals.
Excludes floating-point loads and stores. Counter 0 only.
EventSel=C1H,UMask=00H
CoreOnly
FP_COMP_OPS_EXE
Number of computational floating-point operations executed.
The number of FADD, FSUB, FCOM, FMULs, integer MULs and IMULs, FDIVs, FPREMs, FSQRTS, integer DIVs, and IDIVs.
This event counts operations, not cycles.
This event does not distinguish an FADD used in the middle of a transcendental flow from a separate FADD instruction. Counter 0 only.
EventSel=10H,UMask=00H
CoreOnly
FP_ASSIST
Number of floating-point exception cases handled by microcode. Counter 1 only.
This event includes counts due to speculative execution.
EventSel=11H,UMask=00H
CoreOnly
MUL
Number of multiplies.
This count includes integer as well as FP multiplies and is speculative. Counter 1 only.
EventSel=12H,UMask=00H
CoreOnly
DIV
Number of divides.
This count includes integer as well as FP divides and is speculative. Counter 1 only.
EventSel=13H,UMask=00H
CoreOnly
CYCLES_DIV_BUSY
Number of cycles during which the divider is busy and cannot accept new divides.
This includes integer and FP divides, FPREM, FPSQRT, etc. and is speculative. Counter 0 only.
EventSel=14H,UMask=00H
CoreOnly
LD_BLOCKS
Number of load operations delayed due to store buffer blocks.
Includes counts caused by preceding stores whose addresses are unknown, preceding stores whose addresses are known but whose data is unknown, and preceding stores that conflict with the load but incompletely overlap it.
EventSel=03H,UMask=00H
CoreOnly
SB_DRAINS
Number of store buffer drain cycles.
Incremented every cycle the store buffer is draining.
Draining is caused by serializing operations like CPUID, synchronizing operations like XCHG, interrupt acknowledgment, as well as other conditions (such as cache flushing).
EventSel=04H,UMask=00H
CoreOnly
MISALIGN_MEM_REF
Number of misaligned data memory references.
Incremented by 1 every cycle during which either the processor’s load or store pipeline dispatches a misaligned µop.
The µop is counted whether it is the first or second half of the access, and whether it is blocked, squashed, or missed.
In this context, misaligned means crossing a 64-bit boundary. MISALIGN_MEM_REF is only an approximation to the true number of misaligned memory references.
The value returned is roughly proportional to the number of misaligned memory accesses (the size of the problem).
EventSel=05H,UMask=00H
CoreOnly
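The 64-bit boundary test that defines "misaligned" for MISALIGN_MEM_REF can be expressed as a one-line check. This sketch (function name hypothetical) only classifies individual accesses; because the event is approximate and counts per dispatch cycle, it will not reproduce the counter value exactly.

```python
def crosses_64bit_boundary(address: int, size: int) -> bool:
    """True if a `size`-byte access at `address` spans two aligned 8-byte blocks."""
    if size <= 0:
        raise ValueError("size must be positive")
    return (address % 8) + size > 8


for addr, size in ((0x1000, 8), (0x1004, 8), (0x1006, 2), (0x1007, 2)):
    print(hex(addr), size, crosses_64bit_boundary(addr, size))
```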
EMON_KNI_PREF_DISPATCHED
Number of Streaming SIMD Extensions prefetch/weakly-ordered instructions dispatched (speculative prefetches are included in counting):
0: prefetch NTA
1: prefetch T1
2: prefetch T2
3: weakly ordered stores
Counters 0 and 1. Pentium III processor only.
EventSel=07H,UMask=00H, 01H, 02H, or 03H
CoreOnly
EMON_KNI_PREF_MISS
Number of prefetch/weakly-ordered instructions that miss all caches:
0: prefetch NTA
1: prefetch T1
2: prefetch T2
3: weakly ordered stores
Counters 0 and 1. Pentium III processor only.
EventSel=4BH,UMask=00H, 01H, 02H, or 03H
CoreOnly
INST_RETIRED
Number of instructions retired. A hardware interrupt received during or after the last iteration of the REP STOS flow causes the counter to undercount by 1 instruction. An SMI received while executing a HLT instruction causes the counter not to count the RSM instruction, so it undercounts by 1.
EventSel=C0H,UMask=00H
CoreOnly
UOPS_RETIRED
Number of µops retired.
EventSel=C2H,UMask=00H
CoreOnly
INST_DECODED
Number of instructions decoded.
EventSel=D0H,UMask=00H
CoreOnly
EMON_KNI_INST_RETIRED
Number of Streaming SIMD Extensions instructions retired:
0: packed and scalar
1: scalar
Counters 0 and 1. Pentium III processor only.
EventSel=D8H,UMask=00H or 01H
CoreOnly
EMON_KNI_COMP_INST_RET
Number of Streaming SIMD Extensions computation instructions retired:
0: packed and scalar
1: scalar
Counters 0 and 1. Pentium III processor only.
EventSel=D9H,UMask=00H or 01H
CoreOnly
HW_INT_RX
Number of hardware interrupts received.
EventSel=C8H,UMask=00H
CoreOnly
CYCLES_INT_MASKED
Number of processor cycles for which interrupts are disabled.
EventSel=C6H,UMask=00H
CoreOnly
CYCLES_INT_PENDING_AND_MASKED
Number of processor cycles for which interrupts are disabled and interrupts are pending.
EventSel=C7H,UMask=00H
CoreOnly
BR_INST_RETIRED
Number of branch instructions retired.
EventSel=C4H,UMask=00H
CoreOnly
BR_MISS_PRED_RETIRED
Number of mispredicted branches retired.
EventSel=C5H,UMask=00H
CoreOnly
BR_TAKEN_RETIRED
Number of taken branches retired.
EventSel=C9H,UMask=00H
CoreOnly
BR_MISS_PRED_TAKEN_RET
Number of taken mispredicted branches retired.
EventSel=CAH,UMask=00H
CoreOnly
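The retired-branch events above combine naturally into ratios. A minimal sketch, assuming the four counts were collected over the same workload (with only two counters on these processors, that typically means more than one run); the class and method names are illustrative, not part of any tool.

```python
from dataclasses import dataclass


@dataclass
class BranchCounts:
    """Raw counts from the retired-branch events (names as in the table)."""
    br_inst_retired: int         # C4H
    br_miss_pred_retired: int    # C5H
    br_taken_retired: int        # C9H
    br_miss_pred_taken_ret: int  # CAH

    @staticmethod
    def _ratio(num: int, den: int) -> float:
        return num / den if den else 0.0

    def mispredict_rate(self) -> float:
        """Fraction of retired branches that were mispredicted."""
        return self._ratio(self.br_miss_pred_retired, self.br_inst_retired)

    def taken_rate(self) -> float:
        """Fraction of retired branches that were taken."""
        return self._ratio(self.br_taken_retired, self.br_inst_retired)

    def taken_mispredict_rate(self) -> float:
        """Fraction of taken branches that were mispredicted."""
        return self._ratio(self.br_miss_pred_taken_ret, self.br_taken_retired)


counts = BranchCounts(1_000_000, 40_000, 600_000, 30_000)
print(counts.mispredict_rate(), counts.taken_rate(), counts.taken_mispredict_rate())
```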
BR_INST_DECODED
Number of branch instructions decoded.
EventSel=E0H,UMask=00H
CoreOnly
BTB_MISSES
Number of branches for which the BTB did not produce a prediction.
EventSel=E2H,UMask=00H
CoreOnly
BR_BOGUS
Number of bogus branches.
EventSel=E4H,UMask=00H
CoreOnly
BACLEARS
Number of times BACLEAR is asserted.
This is the number of times that a static branch prediction was made, in which the branch decoder decided to make a branch prediction because the BTB did not.
EventSel=E6H,UMask=00H
CoreOnly
RESOURCE_STALLS
Incremented by 1 during every cycle for which there is a resource related stall.
Includes register renaming buffer entries, memory buffer entries. Does not include stalls due to bus queue full, too many cache misses, etc.
In addition to resource related stalls, this event counts some other events.
Includes stalls arising during branch misprediction recovery, such as when retirement of the mispredicted branch is delayed, and stalls arising while the store buffer is draining due to synchronizing operations.
EventSel=A2H,UMask=00H
CoreOnly
PARTIAL_RAT_STALLS
Number of cycles or events for partial stalls. This includes flag partial stalls.
EventSel=D2H,UMask=00H
CoreOnly
SEGMENT_REG_LOADS
Number of segment register loads.
EventSel=06H,UMask=00H
CoreOnly
CPU_CLK_UNHALTED
Number of cycles during which the processor is not halted.
EventSel=79H,UMask=00H
CoreOnly
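CPU_CLK_UNHALTED together with INST_RETIRED and UOPS_RETIRED gives the usual cycles-per-instruction and µops-per-instruction figures. A minimal sketch; the function names are hypothetical, and the small INST_RETIRED undercounts noted earlier are ignored.

```python
def cpi(cpu_clk_unhalted: int, inst_retired: int) -> float:
    """Cycles per retired instruction over unhalted time."""
    if inst_retired <= 0:
        raise ValueError("inst_retired must be positive")
    return cpu_clk_unhalted / inst_retired


def uops_per_instruction(uops_retired: int, inst_retired: int) -> float:
    """Average number of µops retired per instruction retired."""
    if inst_retired <= 0:
        raise ValueError("inst_retired must be positive")
    return uops_retired / inst_retired


print(cpi(3_000_000, 2_000_000))                   # 1.5
print(uops_per_instruction(2_800_000, 2_000_000))  # 1.4
```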
MMX_INSTR_EXEC
Number of MMX Instructions Executed. Available in Intel Celeron, Pentium II and Pentium II Xeon processors only.
Does not account for MOVQ and MOVD stores from register to memory.
EventSel=B0H,UMask=00H
CoreOnly
MMX_SAT_INSTR_EXEC
Number of MMX Saturating Instructions Executed. Available in Pentium II and Pentium III processors only.
EventSel=B1H,UMask=00H
CoreOnly
MMX_UOPS_EXEC
Number of MMX µops executed. Available in Pentium II and Pentium III processors only.
EventSel=B2H,UMask=0FH
CoreOnly
MMX_INSTR_TYPE_EXEC
01H: MMX packed multiply instructions executed.
02H: MMX packed shift instructions executed.
04H: MMX pack operation instructions executed.
08H: MMX unpack operation instructions executed.
10H: MMX packed logical instructions executed.
20H: MMX packed arithmetic instructions executed.
Available in Pentium II and Pentium III processors only.
EventSel=B3H,UMask=01H, 02H, 04H, 08H, 10H, or 20H
CoreOnly
FP_MMX_TRANS
00H: Transitions from MMX instructions to floating-point instructions.
01H: Transitions from floating-point instructions to MMX instructions.
Available in Pentium II and Pentium III processors only.
EventSel=CCH,UMask=00H or 01H
CoreOnly
MMX_ASSIST
Number of MMX Assists (that is, the number of EMMS instructions executed). Available in Pentium II and Pentium III processors only.
EventSel=CDH,UMask=00H
CoreOnly
MMX_INSTR_RET
Number of MMX Instructions Retired. Available in Pentium II processors only.
EventSel=CEH,UMask=00H
CoreOnly
SEG_RENAME_STALLS
Number of segment register renaming stalls. Available in Pentium II and Pentium III processors only.
EventSel=D4H
CoreOnly
SEG_RENAME_STALLS
Segment register ES
EventSel=D4H,UMask=02H
CoreOnly
SEG_RENAME_STALLS
Segment register DS
EventSel=D4H,UMask=04H
CoreOnly
SEG_RENAME_STALLS
Segment register FS
EventSel=D4H,UMask=08H
CoreOnly
SEG_RENAME_STALLS
Segment registers ES + DS + FS + GS
EventSel=D4H,UMask=0FH
CoreOnly
SEG_REG_RENAMES
Number of segment register renames. Available in Pentium II and Pentium III processors only.
EventSel=D5H
CoreOnly
Segment register ES
EventSel=D5H,UMask=01H
CoreOnly
Segment register DS
EventSel=D5H,UMask=02H
CoreOnly
Segment register FS
EventSel=D5H,UMask=04H
CoreOnly
Segment register GS
EventSel=D5H,UMask=08H
CoreOnly
RET_SEG_RENAMES
Number of segment register rename events retired. Available in Pentium II and Pentium III processors only.
EventSel=D6H,UMask=00H
CoreOnly
DATA_READ
Number of memory data reads (internal data cache hit and miss combined). Split cycle reads are counted individually. Data Memory Reads that are part of TLB miss processing are not included. These events may occur at a maximum of two per clock. I/O is not included.
EventSel=00H
CoreOnly
DATA_WRITE
Number of memory data writes (internal data cache hit and miss combined). Split cycle writes are counted individually. These events may occur at a maximum of two per clock. I/O is not included.
EventSel=01H
CoreOnly
DATA_TLB_MISS
Number of misses to the data cache translation look-aside buffer.
EventSel=02H
CoreOnly
DATA_READ_MISS
Number of memory read accesses that miss the internal data cache whether or not the access is cacheable or noncacheable. Additional reads to the same cache line after the first BRDY# of the burst line fill is returned but before the final (fourth) BRDY# has been returned, will not cause the counter to be incremented additional times.
Data accesses that are part of TLB miss processing are not included. Accesses directed to I/O space are not included.
EventSel=03H
CoreOnly
DATA WRITE MISS
Number of memory write accesses that miss the internal data cache whether or not the access is cacheable or noncacheable. Data accesses that are part of TLB miss processing are not included. Accesses directed to I/O space are not included.
EventSel=04H
CoreOnly
WRITE_HIT_TO_M-_OR_E-STATE_LINES
Number of write hits to exclusive or modified lines in the data cache. These are the writes that may be held up if EWBE# is inactive. These events may occur at a maximum of two per clock.
EventSel=05H
CoreOnly
DATA_CACHE_LINES_WRITTEN_BACK
Number of dirty lines (all) that are written back, regardless of the cause. Replacements and internal and external snoops can all cause writeback and are counted.
EventSel=06H
CoreOnly
EXTERNAL_SNOOPS
Number of accepted external snoops whether they hit in the code cache or data cache or neither. Assertions of EADS# outside of the sampling interval are not counted, and no internal snoops are counted.
EventSel=07H
CoreOnly
EXTERNAL_DATA_CACHE_SNOOP_HITS
Number of external snoops that hit the data cache. Snoop hits to a valid line in either the data cache, the data line fill buffer, or one of the write back buffers are all counted as hits.
EventSel=08H
CoreOnly
MEMORY ACCESSES IN BOTH PIPES
Number of data memory reads or writes that are paired in both pipes of the pipeline. These accesses are not necessarily run in parallel due to cache misses, bank conflicts, etc.
EventSel=09H
CoreOnly
BANK CONFLICTS
Number of actual bank conflicts.
EventSel=0AH
CoreOnly
MISALIGNED DATA MEMORY OR I/O REFERENCES
Number of memory or I/O reads or writes that are misaligned. A 2- or 4-byte access is misaligned when it crosses a 4-byte boundary; an 8-byte access is misaligned when it crosses an 8-byte boundary. Ten-byte accesses are treated as two separate accesses of 8 and 2 bytes each.
EventSel=0BH
CoreOnly
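The misalignment rule for event 0BH can be checked per access as follows. This is a sketch, not a model of the hardware counter: the function names are hypothetical, and treating each part of a ten-byte access as a separate check is an assumption drawn from the rule's wording rather than a stated counting convention.

```python
def p5_is_misaligned(address: int, size: int) -> bool:
    """Apply the misalignment rule above to a single 1-, 2-, 4-, or 8-byte access."""
    if size == 1:
        return False  # a single byte cannot cross any boundary
    if size in (2, 4):
        boundary = 4
    elif size == 8:
        boundary = 8
    else:
        raise ValueError("size must be 1, 2, 4, or 8; split 10-byte accesses first")
    return (address % boundary) + size > boundary


def p5_misaligned_parts(address: int, size: int) -> int:
    """Number of misaligned component accesses (each part judged separately)."""
    if size == 10:
        # A ten-byte access is treated as an 8-byte access plus a 2-byte access.
        parts = [(address, 8), (address + 8, 2)]
    else:
        parts = [(address, size)]
    return sum(p5_is_misaligned(a, s) for a, s in parts)


print(p5_misaligned_parts(0x103, 10))
```

Note that a 2-byte access at an odd address such as 0x101 is not misaligned under this rule, because it stays within one 4-byte block.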
CODE READ
Number of instruction reads, whether the read is cacheable or noncacheable. Individual 8-byte noncacheable instruction reads are counted.
EventSel=0CH
CoreOnly
CODE TLB MISS
Number of instruction reads that miss the code TLB whether the read is cacheable or noncacheable. Individual 8-byte noncacheable instruction reads are counted.
EventSel=0DH
CoreOnly
CODE CACHE MISS
Number of instruction reads that miss the internal code cache, whether the read is cacheable or noncacheable. Individual 8-byte noncacheable instruction reads are counted.
EventSel=0EH
CoreOnly
ANY SEGMENT REGISTER LOADED
Number of writes into any segment register in real or protected mode including the LDTR, GDTR, IDTR, and TR. Segment loads are caused by explicit segment register load instructions, far control transfers, and task switches. Far control transfers and task switches causing a privilege level change will signal this event twice. Interrupts and exceptions may initiate a far control transfer.
EventSel=0FH
CoreOnly
Reserved
EventSel=10H
CoreOnly
Reserved
EventSel=11H
CoreOnly
Branches
Number of taken and not taken branches, including: conditional branches, jumps, calls, returns, software interrupts, and interrupt returns. Also counted as taken branches are serializing instructions, VERR and VERW instructions, some segment descriptor loads, hardware interrupts (including FLUSH#), and programmatic exceptions that invoke a trap or fault handler. The pipe is not necessarily flushed.
The number of branches actually executed is measured, not the number of predicted branches.
EventSel=12H
CoreOnly
BTB_HITS
Number of BTB hits that occur. Hits are counted only for those instructions that are actually executed.
EventSel=13H
CoreOnly
TAKEN_BRANCH_OR_BTB_HIT
Number of taken branches or BTB hits that occur. This event type is a logical OR of taken branches and BTB hits. It represents an event that may cause a hit in the BTB. Specifically, it is either a candidate for a space in the BTB or it is already in the BTB.
EventSel=14H
CoreOnly
PIPELINE FLUSHES
Number of pipeline flushes that occur.
Pipeline flushes are caused by BTB misses on taken branches, mispredictions, exceptions, interrupts, and some segment descriptor loads. The counter will not be incremented for serializing instructions (serializing instructions cause the prefetch queue to be flushed but will not trigger the Pipeline Flushed event counter) and software interrupts (software interrupts do not flush the pipeline).
EventSel=15H
CoreOnly
INSTRUCTIONS_EXECUTED
Number of instructions executed (up to two per clock). Invocations of a fault handler are considered instructions. All hardware and software interrupts and exceptions will also cause the count to be incremented. Repeat prefixed string instructions will only increment this counter once despite the fact that the repeat loop executes the same instruction multiple times until the loop condition is satisfied.
This applies to all the Repeat string instruction prefixes (i.e., REP, REPE, REPZ, REPNE, and REPNZ). This counter will also only increment once per each HLT instruction executed regardless of how many cycles the processor remains in the HALT state.
EventSel=16H
CoreOnly
INSTRUCTIONS_EXECUTED_V_PIPE
Number of instructions executed in the V-pipe.
The event indicates the number of instructions that were paired. This event is the same as the 16H event except it only counts the number of instructions actually executed in the V-pipe.
EventSel=17H
CoreOnly
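Events 16H and 17H can be combined into simple pairing figures. The sketch below assumes every V-pipe instruction was paired with one U-pipe instruction, so the U-pipe count is the difference of the two events; interrupts and fault-handler invocations, which 16H also counts, make the result approximate. The function name and dictionary keys are hypothetical.

```python
def pairing_stats(instructions_executed: int, v_pipe_executed: int) -> dict:
    """Derive U-/V-pipe figures from events 16H and 17H."""
    if instructions_executed <= 0:
        raise ValueError("instructions_executed must be positive")
    if not 0 <= v_pipe_executed <= instructions_executed:
        raise ValueError("V-pipe count must lie between 0 and the total")
    u_pipe = instructions_executed - v_pipe_executed
    return {
        "u_pipe": u_pipe,
        # Share of all executed instructions that issued as the second of a pair.
        "v_fraction": v_pipe_executed / instructions_executed,
        # Share of U-pipe instructions that found a V-pipe partner.
        "pairing_rate": v_pipe_executed / u_pipe if u_pipe else 0.0,
    }


print(pairing_stats(1_000_000, 300_000))
```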
BUS_CYCLE_DURATION
Number of clocks while a bus cycle is in progress.
This event measures bus use. The count includes HLDA, AHOLD, and BOFF# clocks.
EventSel=18H
CoreOnly
WRITE_BUFFER_FULL_STALL_DURATION
Number of clocks while the pipeline is stalled due to full write buffers. Full write buffers stall data memory read misses, data memory write misses, and data memory write hits to S-state lines. Stalls on I/O accesses are not included.
EventSel=19H
CoreOnly
WAITING_FOR_DATA_MEMORY_READ_STALL_DURATION
Number of clocks while the pipeline is stalled while waiting for data memory reads. Data TLB Miss processing is also included in the count. The pipeline stalls while a data memory read is in progress including attempts to read that are not bypassed while a line is being filled.
EventSel=1AH
CoreOnly
STALL ON WRITE TO AN E- OR M-STATE LINE
Number of stalls on writes to E- or M-state lines.
EventSel=1BH
CoreOnly
LOCKED BUS CYCLE
Number of locked bus cycles that occur as the result of the LOCK prefix or LOCK instruction, page-table updates, and descriptor table updates. Only the read portion of the locked read-modify-write is counted. Split locked cycles (SCYC active) count as two separate accesses. Cycles restarted due to BOFF# are not re-counted.
EventSel=1CH
CoreOnly
I/O READ OR WRITE CYCLE
Number of bus cycles directed to I/O space. Misaligned I/O accesses will generate two bus cycles. Bus cycles restarted due to BOFF# are not re-counted.
EventSel=1DH
CoreOnly
NONCACHEABLE_MEMORY_READS
Number of noncacheable instruction or data memory read bus cycles.
The count includes read cycles caused by TLB misses, but does not include read cycles to I/O space. Cycles restarted due to BOFF# are not re-counted.
EventSel=1EH
CoreOnly
PIPELINE_AGI_STALLS
Number of address generation interlock (AGI) stalls.
An AGI occurring in both the U- and V-pipelines in the same clock signals this event twice. An AGI occurs when the instruction in the execute stage of either the U- or V-pipeline is writing to either the index or base address register of an instruction in the D2 (address generation) stage of either the U- or V-pipeline.
EventSel=1FH
CoreOnly
Reserved
EventSel=20H
CoreOnly
Reserved
EventSel=21H
CoreOnly
FLOPS
Number of floating-point operations that occur. Number of floating-point adds, subtracts, multiplies, divides, remainders, and square roots are counted. The transcendental instructions consist of multiple adds and multiplies and will signal this event multiple times.
Instructions generating the divide-by-zero, negative square root, special operand, or stack exceptions will not be counted.
Instructions generating all other floating-point exceptions will be counted. The integer multiply instructions and other instructions which use the x87 FPU will be counted.
EventSel=22H
CoreOnly
BREAKPOINT MATCH ON DR0 REGISTER
Number of matches on register DR0 breakpoint. The counter is incremented regardless of whether the breakpoints are enabled. However, if breakpoints are not enabled, code breakpoint matches are not checked for instructions executed in the V-pipe and do not cause this counter to be incremented. (When breakpoints are not enabled, they are checked only on instructions executed in the U-pipe.)
These events correspond to the signals driven on the BP[3:0] pins. Refer to Chapter 17, “Debug, Branch Profile, TSC, and Intel® Resource Director Technology (Intel® RDT) Features” for more information.
EventSel=23H
CoreOnly
BREAKPOINT MATCH ON DR1 REGISTER
Number of matches on register DR1 breakpoint. See comment for 23H event.
EventSel=24H
CoreOnly
BREAKPOINT MATCH ON DR2 REGISTER
Number of matches on register DR2 breakpoint. See comment for 23H event.
EventSel=25H
CoreOnly
BREAKPOINT MATCH ON DR3 REGISTER
Number of matches on register DR3 breakpoint. See comment for 23H event.
EventSel=26H
CoreOnly
HARDWARE INTERRUPTS
Number of taken INTR and NMI interrupts.
EventSel=27H
CoreOnly
DATA_READ_OR_WRITE
Number of memory data reads and/or writes (internal data cache hit and miss combined). Split cycle reads and writes are counted individually. Data Memory Reads that are part of TLB miss processing are not included. These events may occur at a maximum of two per clock. I/O is not included.
EventSel=28H
CoreOnly
DATA_READ_MISS_OR_WRITE_MISS
Number of memory read and/or write accesses that miss the internal data cache, whether or not the access is cacheable or noncacheable. Additional reads to the same cache line after the first BRDY# of the burst line fill is returned but before the final (fourth) BRDY# has been returned, will not cause the counter to be incremented additional times.
Data accesses that are part of TLB miss processing are not included. Accesses directed to I/O space are not included.
EventSel=29H
CoreOnly
BUS_OWNERSHIP_LATENCY
(Counter 0)
The time from LRM bus ownership request to bus ownership granted (that is, the time from the earlier of a PBREQ (0), PHITM#, or HITM# assertion to a PBGNT assertion). The ratio of the 2AH events counted on counter 0 and counter 1 is the average stall time due to bus ownership conflict.
EventSel=2AH
CoreOnly
BUS OWNERSHIP TRANSFERS
(Counter 1)
The number of bus ownership transfers (that is, the number of PBREQ (0) assertions). The ratio of the 2AH events counted on counter 0 and counter 1 is the average stall time due to bus ownership conflict.
EventSel=2AH
CoreOnly
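Programming both counters with event 2AH at the same time yields the two counts whose ratio is described above. A minimal sketch of that division (the function name is hypothetical):

```python
def avg_bus_ownership_stall(latency_clocks: int, transfers: int) -> float:
    """Average stall per bus ownership transfer: 2AH on counter 0 / 2AH on counter 1."""
    if transfers <= 0:
        return 0.0
    return latency_clocks / transfers


print(avg_bus_ownership_stall(latency_clocks=48_000, transfers=1_600))  # 30.0
```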
MMX_INSTRUCTIONS_EXECUTED_U-PIPE
(Counter 0)
Number of MMX instructions executed in the U-pipe.
EventSel=2BH
CoreOnly
MMX_INSTRUCTIONS_EXECUTED_V-PIPE
(Counter 1)
Number of MMX instructions executed in the V-pipe.
EventSel=2BH
CoreOnly
CACHE_M-STATE_LINE_SHARING
(Counter 0)
Number of times a processor identified a hit to a modified line due to a memory access in the other processor (PHITM (O)). If the average memory latencies of the system are known, this event enables the user to count the Write Backs on PHITM(O) penalty and the Latency on Hit Modified(I) penalty.
EventSel=2CH
CoreOnly
CACHE_LINE_SHARING
(Counter 1)
Number of shared data lines in the L1 cache (PHIT (O)).
Number of transitions between MMX and floating-point instructions or vice versa.
An even count indicates the processor is in MMX state. An odd count indicates it is in FP state. This event counts the first floating-point instruction following an MMX instruction or first MMX instruction following a floating-point instruction.
The count may be used to estimate the penalty in transitions between floating-point state and MMX state.
Number of clocks the bus is busy due to the processor’s own activity (the bus activity that is caused by the processor).
EventSel=2EH
CoreOnly
WRITES_TO_NONCACHEABLE_MEMORY
(Counter 1)
Number of write accesses to noncacheable memory. The count includes write cycles caused by TLB misses and I/O write cycles.
Cycles restarted due to BOFF# are not re-counted.
Number of saturating MMX instructions executed, independently of whether they actually saturated.
EventSel=2FH
CoreOnly
SATURATIONS_PERFORMED
(Counter 1)
Number of MMX instructions that used saturating arithmetic when at least one of its results actually saturated. If an MMX instruction operating on 4 doublewords saturated in three out of the four results, the counter will be incremented by one only.
EventSel=2FH
CoreOnly
NUMBER_OF_CYCLES_NOT_IN_HALT_STATE
(Counter 0)
Number of cycles the processor is not idle due to the HLT instruction. This event enables the user to calculate “net CPI”. Note that while the processor is executing the HLT instruction, the Time-Stamp Counter is not disabled. Because this event is controlled by the Counter Controls CC0 and CC1, it can be used to calculate the CPI at CPL=3, which the TSC cannot provide.
EventSel=30H
CoreOnly
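A sketch of the "net CPI" calculation, assuming net CPI is taken as non-halted cycles divided by instructions executed (the Instruction Executed event, 16H), with both counts collected over the same interval and the same CPL filter. The TSC-based figure is shown for contrast because the TSC keeps running during HLT. Function names are hypothetical:

```python
def net_cpi(cycles_not_halted: int, instructions_executed: int) -> float:
    # cycles_not_halted: event 30H counter 0; instructions_executed: event 16H.
    if instructions_executed <= 0:
        raise ValueError("instructions_executed must be positive")
    return cycles_not_halted / instructions_executed


def tsc_cpi(tsc_delta: int, instructions_executed: int) -> float:
    # The TSC also counts cycles spent in HLT, so this overstates CPI when idle.
    if instructions_executed <= 0:
        raise ValueError("instructions_executed must be positive")
    return tsc_delta / instructions_executed


print(net_cpi(1_500_000, 1_000_000), tsc_cpi(3_000_000, 1_000_000))
```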
DATA_CACHE_TLB_MISS_STALL_DURATION
(Counter 1)
Number of clocks the pipeline is stalled due to a data cache translation lookaside buffer (TLB) miss
EventSel=30H
CoreOnly
MMX_INSTRUCTION_DATA_READS
(Counter 0)
Number of MMX instruction data reads
EventSel=31H
CoreOnly
MMX_INSTRUCTION_DATA_READ_MISSES
(Counter 1)
Number of MMX instruction data read misses
EventSel=31H
CoreOnly
FLOATING_POINT_STALLS_DURATION
(Counter 0)
Number of clocks the pipe is stalled due to a floating-point freeze
EventSel=32H
CoreOnly
TAKEN_BRANCHES
(Counter 1)
Number of taken branches
EventSel=32H
CoreOnly
D1_STARVATION_AND_FIFO_IS_EMPTY
(Counter 0)
Number of times the D1 stage cannot issue ANY instructions because the FIFO buffer is empty. The D1 stage can issue 0, 1, or 2 instructions per clock if those are available in an instruction FIFO buffer.
Number of times the D1 stage issues a single instruction (because the FIFO buffer had just one instruction ready). The D1 stage can issue 0, 1, or 2 instructions per clock if those are available in an instruction FIFO buffer.
When combined with the previously defined events, Instruction Executed (16H) and Instruction Executed in the V-pipe (17H), this event enables the user to calculate the number of times pairing rules prevented the issuing of two instructions.
EventSel=33H
CoreOnly
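One way to combine the three events, sketched under stated assumptions: every V-pipe instruction is paired with one U-pipe instruction, each unpaired U-pipe instruction corresponds to one single-instruction issue, and the only other cause of single issue is the FIFO holding one instruction (event 33H, counter 1). Other stall causes are ignored, so the result is an approximation. The function name is hypothetical:

```python
def pairing_blocked_issues(instr_executed: int, v_pipe_executed: int,
                           single_issue_fifo: int) -> int:
    # instr_executed: event 16H (both pipes); v_pipe_executed: event 17H.
    u_pipe = instr_executed - v_pipe_executed
    # Each V-pipe instruction is assumed to pair with one U-pipe instruction.
    unpaired_u = u_pipe - v_pipe_executed
    # Remove single issues explained by the FIFO holding only one instruction.
    blocked = unpaired_u - single_issue_fifo
    # Counts from separate runs can be noisy; never report a negative estimate.
    return max(blocked, 0)


print(pairing_blocked_issues(1000, 300, 100))
```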
MMX_ INSTRUCTION_ DATA_WRITES
(Counter 0)
Number of data writes caused by MMX instructions
EventSel=34H
CoreOnly
MMX_ INSTRUCTION_ DATA_WRITE_ MISSES
(Counter 1)
Number of data write misses caused by MMX instructions
Number of pipeline flushes due to wrong branch predictions resolved in either the E-stage or the WB-stage. The count includes any pipeline flush due to a branch that the pipeline did not follow correctly. It includes cases where a branch was not in the BTB, cases where a branch was in the BTB but was mispredicted, and cases where a branch was correctly predicted but to the wrong address.
Branches are resolved in either the Execute stage (E-stage) or the Writeback stage (WB-stage). In the latter case, the misprediction penalty is larger by one clock. The difference between the 35H event count in counter 0 and counter 1 is the number of E-stage resolved branches.
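The counter arithmetic above, sketched in code. This assumes counter 0 counts all mispredicted-branch flushes and counter 1 counts those resolved in the WB-stage (which is what makes their difference the E-stage count). `base_penalty` is a hypothetical, user-supplied E-stage penalty in clocks; the WB-stage penalty is one clock larger, as stated:

```python
def split_branch_flushes(flushes_total: int, flushes_wb: int) -> tuple:
    # flushes_total: event 35H counter 0; flushes_wb: event 35H counter 1.
    e_stage = flushes_total - flushes_wb
    return e_stage, flushes_wb


def mispredict_penalty(flushes_total: int, flushes_wb: int, base_penalty: int) -> int:
    e_stage, wb_stage = split_branch_flushes(flushes_total, flushes_wb)
    # WB-stage resolution costs one extra clock per flush.
    return e_stage * base_penalty + wb_stage * (base_penalty + 1)


print(split_branch_flushes(500, 120), mispredict_penalty(500, 120, 3))
```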
Number of returns predicted incorrectly or not predicted at all. The count is the difference between the total number of executed returns and the number of returns that were correctly predicted. Only RET instructions are counted (for example, IRET instructions are not counted).
EventSel=37H
CoreOnly
PREDICTED_ RETURNS
(Counter 1)
Number of predicted returns (whether they are predicted correctly or incorrectly). Only RET instructions are counted (for example, IRET instructions are not counted).
EventSel=37H
CoreOnly
MMX_MULTIPLY_ UNIT_INTERLOCK
(Counter 0)
Number of clocks the pipe is stalled because the destination of a previous MMX multiply instruction is not ready yet. The counter is not incremented if there is another cause for a stall. For each occurrence of a multiply interlock, this event is counted twice (if the stalled instruction comes on the next clock after the multiply) or once (if the stalled instruction comes two clocks after the multiply).
Number of clocks a MOVD/MOVQ instruction store is stalled in D2 stage due to a previous MMX operation with a destination to be used in the store instruction.
EventSel=38H
CoreOnly
RETURNS
(Counter 0)
Number of returns executed. Only RET instructions are counted; IRET instructions are not counted. Any exception taken on a RET instruction and any interrupt recognized by the processor on the instruction boundary prior to the execution of the RET instruction will also cause this counter to be incremented.
EventSel=39H
CoreOnly
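The three return-related counts can be combined into a breakdown of return predictions. A minimal sketch, assuming returns executed = correct + incorrect + not predicted; because RETURNS is also incremented by exceptions and interrupts around RET, the figures are approximate. Names are hypothetical:

```python
from typing import NamedTuple


class ReturnBreakdown(NamedTuple):
    correct: int
    incorrect: int
    not_predicted: int


def breakdown_returns(returns_executed: int, mispredicted_or_unpredicted: int,
                      predicted: int) -> ReturnBreakdown:
    # returns_executed: event 39H counter 0 (RETURNS)
    # mispredicted_or_unpredicted: event 37H counter 0
    # predicted: event 37H counter 1 (PREDICTED_RETURNS, correct or incorrect)
    correct = returns_executed - mispredicted_or_unpredicted
    incorrect = predicted - correct
    not_predicted = returns_executed - predicted
    return ReturnBreakdown(correct, incorrect, not_predicted)


print(breakdown_returns(1000, 150, 900))
```

As a consistency check, `incorrect + not_predicted` equals the 37H counter 0 value by construction.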
Reserved
EventSel=39H
CoreOnly
BTB_FALSE_ENTRIES
(Counter 0)
Number of false entries in the Branch Target Buffer. False entries are causes for misprediction other than a wrong prediction.