| EMON_EST_TRANS |
Number of Enhanced Intel SpeedStep technology transitions: Mask = 00H - All transitions; Mask = 02H - Only frequency transitions. |
EventSel=58H |
CoreOnly |
| EMON_THERMAL_TRIP |
Duration/Occurrences in thermal trip; to count number of thermal trips: bit 22 in PerfEvtSel0/1 needs to be set to enable edge detect. |
EventSel=59H |
CoreOnly |
| BR_INST_EXEC |
Branch instructions that were executed (not necessarily retired). |
EventSel=88H |
CoreOnly |
| BR_MISSP_EXEC |
Branch instructions executed that were mispredicted at execution. |
EventSel=89H |
CoreOnly |
| BR_BAC_MISSP_EXEC |
Branch instructions executed that were mispredicted at front end (BAC). |
EventSel=8AH |
CoreOnly |
| BR_CND_EXEC |
Conditional branch instructions that were executed. |
EventSel=8BH |
CoreOnly |
| BR_CND_MISSP_EXEC |
Conditional branch instructions executed that were mispredicted. |
EventSel=8CH |
CoreOnly |
| BR_IND_EXEC |
Indirect branch instructions executed. |
EventSel=8DH |
CoreOnly |
| BR_IND_MISSP_EXEC |
Indirect branch instructions executed that were mispredicted. |
EventSel=8EH |
CoreOnly |
| BR_RET_EXEC |
Return branch instructions executed. |
EventSel=8FH |
CoreOnly |
| BR_RET_MISSP_EXEC |
Return branch instructions executed that were mispredicted at execution. |
EventSel=90H |
CoreOnly |
| BR_RET_BAC_MISSP_EXEC |
Return branch instructions executed that were mispredicted at front end (BAC). |
EventSel=91H |
CoreOnly |
| BR_CALL_EXEC |
CALL instruction executed. |
EventSel=92H |
CoreOnly |
| BR_CALL_MISSP_EXEC |
CALL instructions executed that were mispredicted. |
EventSel=93H |
CoreOnly |
| BR_IND_CALL_EXEC |
Indirect CALL instructions executed. |
EventSel=94H |
CoreOnly |
| EMON_SIMD_INSTR_RETIRED |
Number of retired MMX instructions. |
EventSel=CEH |
CoreOnly |
| EMON_SYNCH_UOPS |
Sync micro-ops |
EventSel=D3H |
CoreOnly |
| EMON_ESP_UOPS |
Total number of micro-ops |
EventSel=D7H |
CoreOnly |
| EMON_FUSED_UOPS_RET |
Number of retired fused micro-ops: Mask = 0 - Fused micro-ops; Mask = 1 - Only load+Op micro-ops; Mask = 2 - Only std+sta micro-ops. |
EventSel=DAH |
CoreOnly |
| EMON_UNFUSION |
Number of unfusion events in the ROB. An unfusion occurs on an FP exception to a fused µop. |
EventSel=DBH |
CoreOnly |
| EMON_PREF_RQSTS_UP |
Number of upward prefetches issued. |
EventSel=F0H |
CoreOnly |
| EMON_PREF_RQSTS_DN |
Number of downward prefetches issued. |
EventSel=F8H |
CoreOnly |
| CPU_CLK_UNHALTED |
Number of cycles during which the processor is not halted, and not in a thermal trip. |
EventSel=79H |
CoreOnly |
| EMON_SSE_SSE2_INST_RETIRED |
Streaming SIMD Extensions Instructions Retired: Mask = 0 - SSE packed single and scalar single; Mask = 1 - SSE scalar-single; Mask = 2 - SSE2 packed-double; Mask = 3 - SSE2 scalar-double. |
EventSel=D8H |
CoreOnly |
| EMON_SSE_SSE2_COMP_INST_RETIRED |
Computational SSE Instructions Retired: Mask = 0 - SSE packed single; Mask = 1 - SSE scalar-single; Mask = 2 - SSE2 packed-double; Mask = 3 - SSE2 scalar-double. |
EventSel=D9H |
CoreOnly |
| L2_LD |
L2 data loads. Mask[0] = 1 - count I state lines; Mask[1] = 1 - count S state lines; Mask[2] = 1 - count E state lines; Mask[3] = 1 - count M state lines. Mask[5:4]: 00H - Excluding hardware-prefetched lines; 01H - Hardware-prefetched lines only; 02H/03H - All (hardware-prefetched and non-hardware-prefetched lines). |
EventSel=29H |
CoreOnly |
| L2_LINES_IN |
L2 lines allocated. Mask[0] = 1 - count I state lines; Mask[1] = 1 - count S state lines; Mask[2] = 1 - count E state lines; Mask[3] = 1 - count M state lines. Mask[5:4]: 00H - Excluding hardware-prefetched lines; 01H - Hardware-prefetched lines only; 02H/03H - All (hardware-prefetched and non-hardware-prefetched lines). |
EventSel=24H |
CoreOnly |
| L2_LINES_OUT |
L2 lines evicted. Mask[0] = 1 - count I state lines; Mask[1] = 1 - count S state lines; Mask[2] = 1 - count E state lines; Mask[3] = 1 - count M state lines. Mask[5:4]: 00H - Excluding hardware-prefetched lines; 01H - Hardware-prefetched lines only; 02H/03H - All (hardware-prefetched and non-hardware-prefetched lines). |
EventSel=26H |
CoreOnly |
| L2_ |
L2 M-state lines evicted. Mask[0] = 1 - count I state lines; Mask[1] = 1 - count S state lines; Mask[2] = 1 - count E state lines; Mask[3] = 1 - count M state lines. Mask[5:4]: 00H - Excluding hardware-prefetched lines; 01H - Hardware-prefetched lines only; 02H/03H - All (hardware-prefetched and non-hardware-prefetched lines). |
EventSel=27H |
CoreOnly |
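The unit-mask layout shared by the four L2 events above (one bit per MESI state in Mask[3:0], plus a two-bit prefetch selector in Mask[5:4]) can be composed with a small helper. This is a minimal sketch: the function name `l2_unit_mask` and its keyword arguments are hypothetical and are not part of the table.

```python
def l2_unit_mask(i=False, s=False, e=False, m=False, prefetch="exclude"):
    """Compose the unit mask described for the L2 events above.

    Bits 0..3 select the I, S, E and M line states; bits 5:4 select how
    hardware-prefetched lines are treated. The argument names are
    illustrative assumptions, not identifiers from the source table.
    """
    # Mask[5:4]: 00 = exclude HW-prefetched lines, 01 = HW-prefetched only,
    # 10 (or 11) = all lines.
    prefetch_field = {"exclude": 0b00, "only": 0b01, "all": 0b10}[prefetch]
    mesi = (int(i) << 0) | (int(s) << 1) | (int(e) << 2) | (int(m) << 3)
    return (prefetch_field << 4) | mesi


# All four line states, excluding hardware-prefetched lines.
print(f"{l2_unit_mask(i=True, s=True, e=True, m=True):02X}H")  # 0FH
# M-state lines only, hardware-prefetched lines only.
print(f"{l2_unit_mask(m=True, prefetch='only'):02X}H")  # 18H
```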
| _ |
All loads from any memory type. All stores to any memory type. Each part of a split is counted separately. The internal logic counts not only memory loads and stores, but also internal retries. 80-bit floating-point accesses are double counted, since they are decomposed into a 16-bit exponent load and a 64-bit mantissa load. Memory accesses are only counted when they are actually performed (such as a load that gets squashed because a previous cache miss is outstanding to the same address, and which finally gets performed, is only counted once). Does not include I/O accesses, or other nonmemory accesses. |
EventSel=43H,UMask=00H |
CoreOnly |
| _ |
Total lines allocated in DCU. |
EventSel=45H,UMask=00H |
CoreOnly |
| _ |
Number of M state lines allocated in DCU. |
EventSel=46H,UMask=00H |
CoreOnly |
| _ OUT |
Number of M state lines evicted from DCU. This includes evictions via snoop HITM, intervention or replacement. |
EventSel=47H,UMask=00H |
CoreOnly |
| _ OUTSTANDING |
Weighted number of cycles while a DCU miss is outstanding, incremented by the number of outstanding cache misses at any particular time. Only cacheable read requests are considered. Uncacheable requests are excluded. Read-for-ownerships are counted, as well as line fills, invalidates, and stores. An access that also misses the L2 is short-changed by 2 cycles (i.e., if the counter reports N cycles, the true value is N+2 cycles). Subsequent loads to the same cache line will not result in any additional counts. The count value is not precise, but still useful. |
EventSel=48H,UMask=00H |
CoreOnly |
| _ |
Number of instruction fetches, both cacheable and noncacheable, including UC fetches. |
EventSel=80H,UMask=00H |
CoreOnly |
| _ MISS |
Number of instruction fetch misses. All instruction fetches that do not hit the IFU (i.e., that produce memory requests). This includes UC accesses. |
EventSel=81H,UMask=00H |
CoreOnly |
| _ |
Number of ITLB misses. |
EventSel=85H,UMask=00H |
CoreOnly |
| _ |
Number of cycles instruction fetch is stalled, for any reason. Includes IFU cache misses, ITLB misses, ITLB faults, and other minor stalls. |
EventSel=86H,UMask=00H |
CoreOnly |
| _ |
Number of cycles that the instruction length decoder is stalled. |
EventSel=87H,UMask=00H |
CoreOnly |
| L2_IFETCH |
Number of L2 instruction fetches. The count includes only L2 cacheable instruction fetches; it does not include UC instruction fetches. This event indicates that a normal instruction fetch was received by the L2. It does not include ITLB miss accesses. |
EventSel=28H,UMask=0FH |
CoreOnly |
| L2_LD |
Number of L2 data loads. This event indicates that a normal, unlocked, load memory access was received by the L2. It includes only L2 cacheable memory accesses; it does not include I/O accesses, other nonmemory accesses, or memory accesses such as UC/WT memory accesses. It does include L2 cacheable TLB miss memory accesses. |
EventSel=29H,UMask=0FH |
CoreOnly |
| L2_ST |
Number of L2 data stores. This event indicates that a normal, unlocked, store memory access was received by the L2. It indicates that the DCU sent a read-for-ownership request to the L2. It also includes Invalid to Modified requests sent by the DCU to the L2. It includes only L2 cacheable memory accesses; it does not include I/O accesses, other nonmemory accesses, or memory accesses such as UC/WT memory accesses. It includes TLB miss memory accesses. |
EventSel=2AH,UMask=0FH |
CoreOnly |
| L2_ |
Number of lines allocated in the L2. |
EventSel=24H,UMask=00H |
CoreOnly |
| L2_ |
Number of lines removed from the L2 for any reason. |
EventSel=26H,UMask=00H |
CoreOnly |
| L2_ |
Number of modified lines allocated in the L2. |
EventSel=25H,UMask=00H |
CoreOnly |
| L2_ OUTM |
Number of modified lines removed from the L2 for any reason. |
EventSel=27H,UMask=00H |
CoreOnly |
| L2_RQSTS |
Total number of L2 requests. |
EventSel=2EH,UMask=0FH |
CoreOnly |
| L2_ADS |
Number of L2 address strobes. |
EventSel=21H,UMask=00H |
CoreOnly |
| L2_ |
Number of cycles during which the L2 cache data bus was busy. |
EventSel=22H,UMask=00H |
CoreOnly |
| L2_ RD |
Number of cycles during which the data bus was busy transferring read data from L2 to the processor. |
EventSel=23H,UMask=00H |
CoreOnly |
| _ CLOCKS |
Number of clocks during which DRDY# is asserted. Utilization of the external system data bus during data transfers. Unit Mask = 00H counts bus clocks when the processor is driving DRDY#. Unit Mask = 20H counts in processor clocks when any agent is driving DRDY#. |
EventSel=62H,UMask=00H |
CoreOnly |
| _ CLOCKS |
Number of clocks during which LOCK# is asserted on the external system bus. Always counts in processor clocks. |
EventSel=63H,UMask=00H |
CoreOnly |
| _ OUTSTANDING |
Number of bus requests outstanding. This counter is incremented by the number of cacheable read bus requests outstanding in any given cycle. Counts only DCU full-line cacheable reads, not RFOs, writes, instruction fetches, or anything else. Counts "waiting for bus to complete" (last data chunk received). |
EventSel=60H,UMask=00H |
CoreOnly |
| _ |
Number of burst read transactions. |
EventSel=65H,UMask=00H |
CoreOnly |
| _ |
Number of completed read for ownership transactions. |
EventSel=66H,UMask=00H |
CoreOnly |
| _ |
Number of completed write back transactions. |
EventSel=67H,UMask=00H |
CoreOnly |
| _ IFETCH |
Number of completed instruction fetch transactions. |
EventSel=68H,UMask=00H |
CoreOnly |
| _ L |
Number of completed invalidate transactions. |
EventSel=69H,UMask=00H |
CoreOnly |
| _ |
Number of completed partial write transactions. |
EventSel=6AH,UMask=00H |
CoreOnly |
| _ |
Number of completed partial transactions. |
EventSel=6BH,UMask=00H |
CoreOnly |
| _ |
Number of completed I/O transactions. |
EventSel=6CH,UMask=00H |
CoreOnly |
| _ |
Number of completed deferred transactions. |
EventSel=6DH,UMask=00H |
CoreOnly |
| _ BURST |
Number of completed burst transactions. |
EventSel=6EH,UMask=00H |
CoreOnly |
| _ |
Number of all completed bus transactions. Address bus utilization can be calculated knowing the minimum address bus occupancy. Includes special cycles, etc. |
EventSel=70H,UMask=00H |
CoreOnly |
| _ |
Number of completed memory transactions. |
EventSel=6FH,UMask=00H |
CoreOnly |
| _ |
Number of bus clock cycles during which this processor is receiving data. |
EventSel=64H,UMask=00H |
CoreOnly |
| _ |
Number of bus clock cycles during which this processor is driving the BNR# pin. |
EventSel=61H,UMask=00H |
CoreOnly |
| _ |
Number of bus clock cycles during which this processor is driving the HIT# pin. Includes cycles due to snoop stalls. The event counts correctly, but BPMi (breakpoint monitor) pins function as follows based on the setting of the PC bits (bit 19 in the PerfEvtSel0 and PerfEvtSel1 registers): • If the core-clock-to-bus-clock ratio is 2:1 or 3:1, and a PC bit is set, the BPMi pins will be asserted for a single clock when the counters overflow. • If the PC bit is clear, the processor toggles the BPMi pins when the counter overflows. • If the clock ratio is not 2:1 or 3:1, the BPMi pins will not function for these performance monitoring counter events. |
EventSel=7AH,UMask=00H |
CoreOnly |
| _ |
Number of bus clock cycles during which this processor is driving the HITM# pin. Includes cycles due to snoop stalls. The event counts correctly, but BPMi (breakpoint monitor) pins function as follows based on the setting of the PC bits (bit 19 in the PerfEvtSel0 and PerfEvtSel1 registers): • If the core-clock-to-bus-clock ratio is 2:1 or 3:1, and a PC bit is set, the BPMi pins will be asserted for a single clock when the counters overflow. • If the PC bit is clear, the processor toggles the BPMi pins when the counter overflows. • If the clock ratio is not 2:1 or 3:1, the BPMi pins will not function for these performance monitoring counter events. |
EventSel=7BH,UMask=00H |
CoreOnly |
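The two entries above refer to the PC bit (bit 19) of PerfEvtSel0 and PerfEvtSel1. As a rough sketch of how an EventSel/UMask pair and that bit might be packed into a register value: the helper name `perfevtsel` is hypothetical, and the placement of the event select in bits 7:0 and the unit mask in bits 15:8 is an assumption based on the usual P6-family PerfEvtSel layout, which this table does not state. Other control bits are deliberately left out.

```python
PC_BIT = 19  # pin-control bit, as named in the HIT#/HITM# entries above


def perfevtsel(event_select, unit_mask=0x00, pin_control=False):
    """Pack an event select and unit mask into a PerfEvtSel-style value.

    Assumed layout (not stated in the table): event select in bits 7:0,
    unit mask in bits 15:8. Only the PC bit is modeled here.
    """
    if not 0 <= event_select <= 0xFF:
        raise ValueError("event_select must fit in 8 bits")
    if not 0 <= unit_mask <= 0xFF:
        raise ValueError("unit_mask must fit in 8 bits")
    value = event_select | (unit_mask << 8)
    if pin_control:
        value |= 1 << PC_BIT
    return value


# EventSel=7BH, UMask=00H with the PC bit set.
print(hex(perfevtsel(0x7B, 0x00, pin_control=True)))  # 0x8007b
```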
| _ STALL |
Number of clock cycles during which the bus is snoop stalled. |
EventSel=7EH,UMask=00H |
CoreOnly |
| FLOPS |
Number of computational floating-point operations retired. Excludes floating-point computational operations that cause traps or assists. Includes floating-point computational operations executed by the assist handler. Includes internal sub-operations for complex floating-point instructions like transcendentals. Excludes floating-point loads and stores. Counter 0 only. |
EventSel=C1H,UMask=00H |
CoreOnly |
| _ EXE |
Number of computational floating-point operations executed. The number of FADD, FSUB, FCOM, FMULs, integer MULs and IMULs, FDIVs, FPREMs, FSQRTS, integer DIVs, and IDIVs. This number does not include the number of cycles, but the number of operations. This event does not distinguish an FADD used in the middle of a transcendental flow from a separate FADD instruction. Counter 0 only. |
EventSel=10H,UMask=00H |
CoreOnly |
| _ |
Number of floating-point exception cases handled by microcode. Counter 1 only. This event includes counts due to speculative execution. |
EventSel=11H,UMask=00H |
CoreOnly |
| MUL |
Number of multiplies. This count includes integer as well as FP multiplies and is speculative. Counter 1 only. |
EventSel=12H,UMask=00H |
CoreOnly |
| DIV |
Number of divides. This count includes integer as well as FP divides and is speculative. Counter 1 only. |
EventSel=13H,UMask=00H |
CoreOnly |
| _ BUSY |
Number of cycles during which the divider is busy, and cannot accept new divides. This includes integer and FP divides, FPREM, FPSQRT, etc. and is speculative. Counter 0 only. |
EventSel=14H,UMask=00H |
CoreOnly |
| _ |
Number of load operations delayed due to store buffer blocks. Includes counts caused by preceding stores whose addresses are unknown, preceding stores whose addresses are known but whose data is unknown, and preceding stores that conflict with the load but which incompletely overlap the load. |
EventSel=03H,UMask=00H |
CoreOnly |
| _ |
Number of store buffer drain cycles. Incremented every cycle the store buffer is draining. Draining is caused by serializing operations like CPUID, synchronizing operations like XCHG, interrupt acknowledgment, as well as other conditions (such as cache flushing). |
EventSel=04H,UMask=00H |
CoreOnly |
| MISALIGN_ _ |
Number of misaligned data memory references. Incremented by 1 every cycle during which either the processor's load or store pipeline dispatches a misaligned µop. Counting is performed if it is the first or second half, or if it is blocked, squashed, or missed. In this context, misaligned means crossing a 64-bit boundary. _ REF is only an approximation to the true number of misaligned memory references. The value returned is roughly proportional to the number of misaligned memory accesses (the size of the problem). |
EventSel=05H,UMask=00H |
CoreOnly |
| _ _DISPATCHED |
Number of Streaming SIMD extensions prefetch/weakly-ordered instructions dispatched (speculative prefetches are included in counting): 0: prefetch NTA; 1: prefetch T1; 2: prefetch T2; 3: weakly ordered stores. Counters 0 and 1. Pentium III processor only. |
EventSel=07H,UMask=00H 01H 02H 03H |
CoreOnly |
| _ _MISS |
Number of prefetch/weakly-ordered instructions that miss all caches: 0: prefetch NTA; 1: prefetch T1; 2: prefetch T2; 3: weakly ordered stores. Counters 0 and 1. Pentium III processor only. |
EventSel=4BH,UMask=00H 01H 02H 03H |
CoreOnly |
| _ |
Number of instructions retired. A hardware interrupt received during/after the last iteration of the REP STOS flow causes the counter to undercount by 1 instruction. An SMI received while executing a HLT instruction will cause the performance counter to not count the RSM instruction and undercount by 1. |
EventSel=C0H,UMask=00H |
CoreOnly |
| _ |
Number of µops retired. |
EventSel=C2H,UMask=00H |
CoreOnly |
| _ |
Number of instructions decoded. |
EventSel=D0H,UMask=00H |
CoreOnly |
| _ RETIRED |
Number of Streaming SIMD extensions retired: 0: packed and scalar; 1: scalar. Counters 0 and 1. Pentium III processor only. |
EventSel=D8H,UMask=00H 01H |
CoreOnly |
| _ COMP_ _ |
Number of Streaming SIMD extensions computation instructions retired: 0: packed and scalar; 1: scalar. Counters 0 and 1. Pentium III processor only. |
EventSel=D9H,UMask=00H 01H |
CoreOnly |
| _ |
Number of hardware interrupts received. |
EventSel=C8H,UMask=00H |
CoreOnly |
| _ MASKED |
Number of processor cycles for which interrupts are disabled. |
EventSel=C6H,UMask=00H |
CoreOnly |
| _ PENDING_ _ |
Number of processor cycles for which interrupts are disabled and interrupts are pending. |
EventSel=C7H,UMask=00H |
CoreOnly |
| _ RETIRED |
Number of branch instructions retired. |
EventSel=C4H,UMask=00H |
CoreOnly |
| _ RETIRED |
Number of mispredicted branches retired. |
EventSel=C5H,UMask=00H |
CoreOnly |
| _ RETIRED |
Number of taken branches retired. |
EventSel=C9H,UMask=00H |
CoreOnly |
| _ _ |
Number of taken mispredicted branches retired. |
EventSel=CAH,UMask=00H |
CoreOnly |
| _ DECODED |
Number of branch instructions decoded. |
EventSel=E0H,UMask=00H |
CoreOnly |
| _ |
Number of branches for which the BTB did not produce a prediction. |
EventSel=E2H,UMask=00H |
CoreOnly |
| _ |
Number of bogus branches. |
EventSel=E4H,UMask=00H |
CoreOnly |
| BACLEARS |
Number of times BACLEAR is asserted. This is the number of times that a static branch prediction was made, in which the branch decoder decided to make a branch prediction because the BTB did not. |
EventSel=E6H,UMask=00H |
CoreOnly |
| RESOURCE_ STALLS |
Incremented by 1 during every cycle for which there is a resource related stall. Includes register renaming buffer entries, memory buffer entries. Does not include stalls due to bus queue full, too many cache misses, etc. In addition to resource related stalls, this event counts some other events. Includes stalls arising during branch misprediction recovery, such as if retirement of the mispredicted branch is delayed and stalls arising while store buffer is draining from synchronizing operations. |
EventSel=A2H,UMask=00H |
CoreOnly |
| _ STALLS |
Number of cycles or events for partial stalls. This includes flag partial stalls. |
EventSel=D2H,UMask=00H |
CoreOnly |
| _ LOADS |
Number of segment register loads. |
EventSel=06H,UMask=00H |
CoreOnly |
| _ UNHALTED |
Number of cycles during which the processor is not halted. |
EventSel=79H,UMask=00H |
CoreOnly |
| _ EXEC |
Number of MMX Instructions Executed. Available in Intel Celeron, Pentium II and Pentium II Xeon processors only. Does not account for MOVQ and MOVD stores from register to memory. |
EventSel=B0H,UMask=00H |
CoreOnly |
| _ _ |
Number of MMX Saturating Instructions Executed. Available in Pentium II and Pentium III processors only. |
EventSel=B1H,UMask=00H |
CoreOnly |
| _ EXEC |
Number of MMX µops Executed. Available in Pentium II and Pentium III processors only. |
EventSel=B2H,UMask=0FH |
CoreOnly |
| _ _ |
MMX packed multiply instructions executed. MMX packed shift instructions executed. MMX pack operation instructions executed. MMX unpack operation instructions executed. MMX packed logical instructions executed. MMX packed arithmetic instructions executed. Available in Pentium II and Pentium III processors only. |
EventSel=B3H,UMask=01H 02H 04H 08H 10H 20H |
CoreOnly |
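Anyone consuming this table programmatically has to handle code cells that carry an event select alone, an event select with one unit mask, or an event select with several space-separated unit masks (as in the B3H entry above). The following is a sketch of one way to parse such a cell; `parse_event_code` is a hypothetical helper written for this flattened layout, not part of any tool.

```python
import re


def parse_event_code(cell):
    """Parse a cell such as 'EventSel=B3H,UMask=01H 02H |'.

    Returns (event_select, [unit_masks]) as integers. The list is empty
    when the cell carries no UMask field.
    """
    text = cell.strip().rstrip("|").strip()
    event = re.search(r"EventSel=([0-9A-Fa-f]+)H", text)
    if event is None:
        raise ValueError(f"no EventSel field in {cell!r}")
    # Everything after 'UMask=' is a space-separated list of hex values.
    umask_part = text.partition("UMask=")[2]
    umasks = [int(h, 16) for h in re.findall(r"([0-9A-Fa-f]+)H", umask_part)]
    return int(event.group(1), 16), umasks


event, umasks = parse_event_code("EventSel=B3H,UMask=01H 02H 04H 08H 10H 20H |")
print(hex(event), [hex(u) for u in umasks])
```

A stricter parser could reject malformed cells outright rather than matching the leading hex digits it finds.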
| _ |
Transitions from MMX instruction to floating-point instructions. Transitions from floating-point instructions to MMX instructions. Available in Pentium II and Pentium III processors only. |
EventSel=CCH,UMask=00H 01H |
CoreOnly |
| _ |
Number of MMX Assists (that is, the number of EMMS instructions executed). Available in Pentium II and Pentium III processors only. |
EventSel=CDH,UMask=00H |
CoreOnly |
| _ |
Number of MMX Instructions Retired. Available in Pentium II processors only. |
EventSel=CEH,UMask=00H |
CoreOnly |
| _ STALLS |
Number of Segment Register Renaming Stalls: Available in Pentium II and Pentium III processors only. |
EventSel=D4H |
CoreOnly |
| _ STALLS |
Segment register ES |
EventSel=D4H,UMask=02H |
CoreOnly |
| _ STALLS |
Segment register DS |
EventSel=D4H,UMask=04H |
CoreOnly |
| _ STALLS |
Segment register FS |
EventSel=D4H,UMask=08H |
CoreOnly |
| _ STALLS |
Segment register FS |
EventSel=D4H,UMask=0FH |
CoreOnly |
| _ RENAMES |
Number of Segment Register Renames: Available in Pentium II and Pentium III processors only. |
EventSel=D5H |
CoreOnly |
|
Segment register ES |
EventSel=D5H,UMask=01H |
CoreOnly |
|
Segment register DS |
EventSel=D5H,UMask=02H |
CoreOnly |
|
Segment register FS |
EventSel=D5H,UMask=04H |
CoreOnly |
|
Segment register FS |
EventSel=D5H,UMask=08H |
CoreOnly |
| _ RENAMES |
Number of segment register rename events retired. Available in Pentium II and Pentium III processors only. |
EventSel=D6H,UMask=00H |
CoreOnly |
| _ |
Number of memory data reads (internal data cache hit and miss combined). Split cycle reads are counted individually. Data Memory Reads that are part of TLB miss processing are not included. These events may occur at a maximum of two per clock. I/O is not included. |
EventSel=00H |
CoreOnly |
| _ |
Number of memory data writes (internal data cache hit and miss combined). Split cycle writes are counted individually. These events may occur at a maximum of two per clock. I/O is not included. |
EventSel=01H |
CoreOnly |
| _ |
Number of misses to the data cache translation look-aside buffer. |
EventSel=02H |
CoreOnly |
| _ |
Number of memory read accesses that miss the internal data cache whether or not the access is cacheable or noncacheable. Additional reads to the same cache line after the first BRDY# of the burst line fill is returned but before the final (fourth) BRDY# has been returned, will not cause the counter to be incremented additional times. Data accesses that are part of TLB miss processing are not included. Accesses directed to I/O space are not included. |
EventSel=03H |
CoreOnly |
| DATA WRITE MISS |
Number of memory write accesses that miss the internal data cache whether or not the access is cacheable or noncacheable. Data accesses that are part of TLB miss processing are not included. Accesses directed to I/O space are not included. |
EventSel=04H |
CoreOnly |
| _ M-_- _ |
Number of write hits to exclusive or modified lines in the data cache. These are the writes that may be held up if EWBE# is inactive. These events may occur a maximum of two per clock. |
EventSel=05H |
CoreOnly |
| _ LINES_ _ |
Number of dirty lines (all) that are written back, regardless of the cause. Replacements and internal and external snoops can all cause writeback and are counted. |
EventSel=06H |
CoreOnly |
| EXTERNAL_ SNOOPS |
Number of accepted external snoops whether they hit in the code cache or data cache or neither. Assertions of EADS# outside of the sampling interval are not counted, and no internal snoops are counted. |
EventSel=07H |
CoreOnly |
| _ _ HITS |
Number of external snoops to the data cache. Snoop hits to a valid line in either the data cache, the data line fill buffer, or one of the write back buffers are all counted as hits. |
EventSel=08H |
CoreOnly |
| MEMORY ACCESSES IN BOTH PIPES |
Number of data memory reads or writes that are paired in both pipes of the pipeline. These accesses are not necessarily run in parallel due to cache misses, bank conflicts, etc. |
EventSel=09H |
CoreOnly |
| BANK CONFLICTS |
Number of actual bank conflicts. |
EventSel=0AH |
CoreOnly |
| MISALIGNED DATA MEMORY OR I/O REFERENCES |
Number of memory or I/O reads or writes that are misaligned. A 2- or 4-byte access is misaligned when it crosses a 4-byte boundary; an 8-byte access is misaligned when it crosses an 8-byte boundary. Ten-byte accesses are treated as two separate accesses of 8 and 2 bytes each. |
EventSel=0BH |
CoreOnly |
| CODE READ |
Number of instruction reads; whether the read is cacheable or noncacheable. Individual 8-byte noncacheable instruction reads are counted. |
EventSel=0CH |
CoreOnly |
| CODE TLB MISS |
Number of instruction reads that miss the code TLB whether the read is cacheable or noncacheable. Individual 8-byte noncacheable instruction reads are counted. |
EventSel=0DH |
CoreOnly |
| CODE CACHE MISS |
Number of instruction reads that miss the internal code cache; whether the read is cacheable or noncacheable. Individual 8-byte noncacheable instruction reads are counted. |
EventSel=0EH |
CoreOnly |
| ANY SEGMENT REGISTER LOADED |
Number of writes into any segment register in real or protected mode including the LDTR, GDTR, IDTR, and TR. Segment loads are caused by explicit segment register load instructions, far control transfers, and task switches. Far control transfers and task switches causing a privilege level change will signal this event twice. Interrupts and exceptions may initiate a far control transfer. |
EventSel=0FH |
CoreOnly |
| Reserved |
|
EventSel=10H |
CoreOnly |
| Reserved |
|
EventSel=11H |
CoreOnly |
| Branches |
Number of taken and not taken branches, including: conditional branches, jumps, calls, returns, software interrupts, and interrupt returns. Also counted as taken branches are serializing instructions, VERR and VERW instructions, some segment descriptor loads, hardware interrupts (including FLUSH#), and programmatic exceptions that invoke a trap or fault handler. The pipe is not necessarily flushed. The number of branches actually executed is measured, not the number of predicted branches. |
EventSel=12H |
CoreOnly |
| _ |
Number of BTB hits that occur. Hits are counted only for those instructions that are actually executed. |
EventSel=13H |
CoreOnly |
| _ _ |
Number of taken branches or BTB hits that occur. This event type is a logical OR of taken branches and BTB hits. It represents an event that may cause a hit in the BTB. Specifically, it is either a candidate for a space in the BTB or it is already in the BTB. |
EventSel=14H |
CoreOnly |
| PIPELINE FLUSHES |
Number of pipeline flushes that occur. Pipeline flushes are caused by BTB misses on taken branches, mispredictions, exceptions, interrupts, and some segment descriptor loads. The counter will not be incremented for serializing instructions (serializing instructions cause the prefetch queue to be flushed but will not trigger the Pipeline Flushed event counter) and software interrupts (software interrupts do not flush the pipeline). |
EventSel=15H |
CoreOnly |
| INSTRUCTIONS_ EXECUTED |
Number of instructions executed (up to two per clock). Invocations of a fault handler are considered instructions. All hardware and software interrupts and exceptions will also cause the count to be incremented. Repeat-prefixed string instructions will only increment this counter once, even though the repeat loop executes the same instruction multiple times until the loop criteria are satisfied. This applies to all the repeat string instruction prefixes (i.e., REP, REPE, REPZ, REPNE, and REPNZ). This counter will also increment only once per HLT instruction executed, regardless of how many cycles the processor remains in the HALT state. |
EventSel=16H |
CoreOnly |
| INSTRUCTIONS_EXECUTED_V_PIPE |
Number of instructions executed in the V-pipe. The event indicates the number of instructions that were paired. This event is the same as the 16H event except it only counts the number of instructions actually executed in the V-pipe. |
EventSel=17H |
CoreOnly |
| _ DURATION |
Number of clocks while a bus cycle is in progress. This event measures bus use. The count includes HLDA, AHOLD, and BOFF# clocks. |
EventSel=18H |
CoreOnly |
| _ _ DURATION |
Number of clocks while the pipeline is stalled due to full write buffers. Full write buffers stall data memory read misses, data memory write misses, and data memory write hits to S-state lines. Stalls on I/O accesses are not included. |
EventSel=19H |
CoreOnly |
| _ _ _ DURATION |
Number of clocks while the pipeline is stalled while waiting for data memory reads. Data TLB Miss processing is also included in the count. The pipeline stalls while a data memory read is in progress including attempts to read that are not bypassed while a line is being filled. |
EventSel=1AH |
CoreOnly |
| STALL ON WRITE TO AN E- OR M- STATE LINE |
Number of stalls on writes to E- or M-state lines. |
EventSel=1BH |
CoreOnly |
| LOCKED BUS CYCLE |
Number of locked bus cycles that occur as the result of the LOCK prefix or LOCK instruction, page-table updates, and descriptor table updates. Only the read portion of the locked read-modify-write is counted. Split locked cycles (SCYC active) count as two separate accesses. Cycles restarted due to BOFF# are not re-counted. |
EventSel=1CH |
CoreOnly |
| I/O READ OR WRITE CYCLE |
Number of bus cycles directed to I/O space. Misaligned I/O accesses will generate two bus cycles. Bus cycles restarted due to BOFF# are not re-counted. |
EventSel=1DH |
CoreOnly |
| NONCACHEABLE_ _ |
Number of noncacheable instruction or data memory read bus cycles. The count includes read cycles caused by TLB misses, but does not include read cycles to I/O space. Cycles restarted due to BOFF# are not re-counted. |
EventSel=1EH |
CoreOnly |
| _ STALLS |
Number of address generation interlock (AGI) stalls. An AGI occurring in both the U- and V-pipelines in the same clock signals this event twice. An AGI occurs when the instruction in the execute stage of either the U- or V-pipeline is writing to either the index or base address register of an instruction in the D2 (address generation) stage of either the U- or V-pipeline. |
EventSel=1FH |
CoreOnly |
| Reserved |
|
EventSel=20H |
CoreOnly |
| Reserved |
|
EventSel=21H |
CoreOnly |
| FLOPS |
Number of floating-point operations that occur. Number of floating-point adds, subtracts, multiplies, divides, remainders, and square roots are counted. The transcendental instructions consist of multiple adds and multiplies and will signal this event multiple times. Instructions generating the divide-by-zero, negative square root, special operand, or stack exceptions will not be counted. Instructions generating all other floating-point exceptions will be counted. The integer multiply instructions and other instructions which use the x87 FPU will be counted. |
EventSel=22H |
CoreOnly |
| BREAKPOINT MATCH ON DR0 REGISTER |
Number of matches on register DR0 breakpoint. The counter is incremented regardless of whether the breakpoints are enabled. However, if breakpoints are not enabled, code breakpoint matches will not be checked for instructions executed in the V-pipe and will not cause this counter to be incremented. (When breakpoints are not enabled, they are checked only for instructions executed in the U-pipe.) These events correspond to the signals driven on the BP[3:0] pins. Refer to Chapter 17, "Debug, Branch Profile, TSC, and Intel® Resource Director Technology (Intel® RDT) Features" for more information. |
EventSel=23H |
CoreOnly |
| BREAKPOINT MATCH ON DR1 REGISTER |
Number of matches on register DR1 breakpoint. See comment for 23H event. |
EventSel=24H |
CoreOnly |
| BREAKPOINT MATCH ON DR2 REGISTER |
Number of matches on register DR2 breakpoint. See comment for 23H event. |
EventSel=25H |
CoreOnly |
| BREAKPOINT MATCH ON DR3 REGISTER |
Number of matches on register DR3 breakpoint. See comment for 23H event. |
EventSel=26H |
CoreOnly |
| HARDWARE INTERRUPTS |
Number of taken INTR and NMI interrupts. |
EventSel=27H |
CoreOnly |
| _ WRITE |
Number of memory data reads and/or writes (internal data cache hit and miss combined). Split cycle reads and writes are counted individually. Data Memory Reads that are part of TLB miss processing are not included. These events may occur at a maximum of two per clock. I/O is not included. |
EventSel=28H |
CoreOnly |
| _ _ MISS |
Number of memory read and/or write accesses that miss the internal data cache, whether or not the access is cacheable or noncacheable. Additional reads to the same cache line after the first BRDY# of the burst line fill is returned but before the final (fourth) BRDY# has been returned, will not cause the counter to be incremented additional times. Data accesses that are part of TLB miss processing are not included. Accesses directed to I/O space are not included. |
EventSel=29H |
CoreOnly |
| _ LATENCY (Counter 0) |
The time from LRM bus ownership request to bus ownership granted (that is, the time from the earlier of a PBREQ (0), PHITM# or HITM# assertion to a PBGNT assertion). The ratio of the 2AH events counted on counter 0 and counter 1 is the average stall time due to bus ownership conflict. |
EventSel=2AH |
CoreOnly |
| BUS OWNERSHIP TRANSFERS (Counter 1) |
The number of bus ownership transfers (that is, the number of PBREQ (0) assertions). The ratio of the 2AH events counted on counter 0 and counter 1 is the average stall time due to bus ownership conflict. |
EventSel=2AH |
CoreOnly |
| MMX_INSTRUCTIONS_EXECUTED_U_PIPE (Counter 0) |
Number of MMX instructions executed in the U-pipe |
EventSel=2BH |
CoreOnly |
| MMX_INSTRUCTIONS_EXECUTED_V_PIPE (Counter 1) |
Number of MMX instructions executed in the V-pipe |
EventSel=2BH |
CoreOnly |
| _- _ SHARING (Counter 0) |
Number of times a processor identified a hit to a modified line due to a memory access in the other processor (PHITM (O)). If the average memory latencies of the system are known, this event enables the user to count the Write Backs on PHITM(O) penalty and the Latency on Hit Modified(I) penalty. |
EventSel=2CH |
CoreOnly |
| _ SHARING (Counter 1) |
Number of shared data lines in the L1 cache (PHIT (O)) |
EventSel=2CH |
CoreOnly |
| EMMS_INSTRUCTIONS_EXECUTED (Counter 0) |
Number of EMMS instructions executed |
EventSel=2DH |
CoreOnly |
| TRANSITIONS_ _ _ INSTRUCTIONS (Counter 1) |
Number of transitions between MMX and floating-point instructions or vice versa. An even count indicates the processor is in MMX state; an odd count indicates it is in FP state. This event counts the first floating-point instruction following an MMX instruction or the first MMX instruction following a floating-point instruction. The count may be used to estimate the penalty in transitions between floating-point state and MMX state. |
EventSel=2DH |
CoreOnly |
| _ _ PROCESSOR_ ACTIVITY (Counter 0) |
Number of clocks the bus is busy due to the processor's own activity (the bus activity that is caused by the processor) |
EventSel=2EH |
CoreOnly |
| _ NONCACHEABLE_ MEMORY (Counter 1) |
Number of write accesses to noncacheable memory. The count includes write cycles caused by TLB misses and I/O write cycles. Cycles restarted due to BOFF# are not re-counted. |
EventSel=2EH |
CoreOnly |
| SATURATING_MMX_INSTRUCTIONS_EXECUTED (Counter 0) |
Number of saturating MMX instructions executed, independently of whether they actually saturated. |
EventSel=2FH |
CoreOnly |
| SATURATIONS_ PERFORMED (Counter 1) |
Number of MMX instructions that used saturating arithmetic when at least one of their results actually saturated. If an MMX instruction operating on 4 doublewords saturated in three out of the four results, the counter will be incremented by one only. |
EventSel=2FH |
CoreOnly |
| _ _ _ (Counter 0) |
Number of cycles the processor is not idle due to the HLT instruction. This event will enable the user to calculate "net CPI". Note that during the time that the processor is executing the HLT instruction, the Time-Stamp Counter is not disabled. Since this event is controlled by the Counter Controls CC0, CC1, it can be used to calculate the CPI at CPL=3, which the TSC cannot provide. |
EventSel=30H |
CoreOnly |
| _ _ _ (Counter 1) |
Number of clocks the pipeline is stalled due to a data cache translation lookaside buffer (TLB) miss |
EventSel=30H |
CoreOnly |
| MMX_INSTRUCTION_DATA_READS (Counter 0) |
Number of MMX instruction data reads |
EventSel=31H |
CoreOnly |
| MMX_INSTRUCTION_DATA_READ_MISSES (Counter 1) |
Number of MMX instruction data read misses |
EventSel=31H |
CoreOnly |
| _ _ (Counter 0) |
Number of clocks while pipe is stalled due to a floating-point freeze |
EventSel=32H |
CoreOnly |
| _ (Counter 1) |
Number of taken branches |
EventSel=32H |
CoreOnly |
| D1_STARVATION_ _ EMPTY (Counter 0) |
Number of times the D1 stage cannot issue ANY instructions since the FIFO buffer is empty. The D1 stage can issue 0, 1, or 2 instructions per clock if those are available in an instruction FIFO buffer. |
EventSel=33H |
CoreOnly |
| D1_STARVATION_ _ _ FIFO (Counter 1) |
Number of times the D1 stage issues a single instruction (since the FIFO buffer had just one instruction ready). The D1 stage can issue 0, 1, or 2 instructions per clock if those are available in an instruction FIFO buffer. When combined with the previously defined events, Instruction Executed (16H) and Instruction Executed in the V-pipe (17H), this event enables the user to calculate the number of times pairing rules prevented the issuing of two instructions. |
EventSel=33H |
CoreOnly |
| MMX_INSTRUCTION_DATA_WRITES (Counter 0) |
Number of data writes caused by MMX instructions |
EventSel=34H |
CoreOnly |
| MMX_INSTRUCTION_DATA_WRITE_MISSES (Counter 1) |
Number of data write misses caused by MMX instructions |
EventSel=34H |
CoreOnly |
| PIPELINE_FLUSHES_DUE_TO_WRONG_BRANCH_PREDICTIONS (Counter 0) |
Number of pipeline flushes due to wrong branch predictions resolved in either the E-stage or the WB-stage. The count includes any pipeline flush due to a branch that the pipeline did not follow correctly. It includes cases where a branch was not in the BTB, cases where a branch was in the BTB but was mispredicted, and cases where a branch was correctly predicted but to the wrong address. Branches are resolved in either the Execute stage (E-stage) or the Writeback stage (WB-stage). In the latter case, the misprediction penalty is larger by one clock. The difference between the 35H event count in counter 0 and counter 1 is the number of E-stage resolved branches. |
EventSel=35H |
CoreOnly |
| PIPELINE_FLUSHES_DUE_TO_WRONG_BRANCH_PREDICTIONS_RESOLVED_IN_WB_STAGE (Counter 1) |
Number of pipeline flushes due to wrong branch predictions resolved in the WB-stage. See note for event 35H (Counter 0). |
EventSel=35H |
CoreOnly |
| MISALIGNED_DATA_MEMORY_REFERENCE_ON_MMX_INSTRUCTIONS (Counter 0) |
Number of misaligned data memory references when executing MMX instructions |
EventSel=36H |
CoreOnly |
| PIPELINE_STALL_FOR_MMX_INSTRUCTION_DATA_MEMORY_READS (Counter 1) |
Number of clocks during pipeline stalls caused by waits for MMX instruction data memory reads |
EventSel=36H |
CoreOnly |
| MISPREDICTED_OR_UNPREDICTED_RETURNS (Counter 0) |
Number of returns predicted incorrectly or not predicted at all. The count is the difference between the total number of executed returns and the number of returns that were correctly predicted. Only RET instructions are counted (for example, IRET instructions are not counted). |
EventSel=37H |
CoreOnly |
| PREDICTED_ RETURNS (Counter 1) |
Number of predicted returns (whether they are predicted correctly or incorrectly). Only RET instructions are counted (for example, IRET instructions are not counted). |
EventSel=37H |
CoreOnly |
| _ _ (Counter 0) |
Number of clocks the pipe is stalled since the destination of a previous MMX multiply instruction is not ready yet. The counter will not be incremented if there is another cause for a stall. For each occurrence of a multiply interlock, this event will be counted twice (if the stalled instruction comes on the next clock after the multiply) or once (if the stalled instruction comes two clocks after the multiply). |
EventSel=38H |
CoreOnly |
| MOVD/MOVQ_ _ _ _ OPERATION (Counter 1) |
Number of clocks a MOVD/MOVQ instruction store is stalled in D2 stage due to a previous MMX operation with a destination to be used in the store instruction. |
EventSel=38H |
CoreOnly |
| RETURNS (Counter 0) |
Number of returns executed. Only RET instructions are counted; IRET instructions are not counted. Any exception taken on a RET instruction and any interrupt recognized by the processor on the instruction boundary prior to the execution of the RET instruction will also cause this counter to be incremented. |
EventSel=39H |
CoreOnly |
| Reserved |
|
EventSel=39H |
CoreOnly |
| _ ENTRIES (Counter 0) |
Number of false entries in the Branch Target Buffer. False entries are causes for misprediction other than a wrong prediction. |
EventSel=3AH |
CoreOnly |
| _ _ NOT-TAKEN_ BRANCH (Counter 1) |
Number of times the BTB predicted a not-taken branch as taken |
EventSel=3AH |
CoreOnly |
| FULL_WRITE_BUFFER_STALL_DURATION_WHILE_EXECUTING_MMX_INSTRUCTIONS (Counter 0) |
Number of clocks while the pipeline is stalled due to full write buffers while executing MMX instructions |
EventSel=3BH |
CoreOnly |
| STALL_ON_MMX_INSTRUCTION_WRITE_TO_E_OR_M_STATE_LINE (Counter 1) |
Number of clocks during stalls on MMX instructions writing to E- or M-state lines |
EventSel=3BH |
CoreOnly |
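
Several descriptions above define derived metrics from a counter 0 / counter 1 pair that shares one event selector. The following is a minimal sketch of two of them, assuming the raw counts have already been read from the two counters. The function names and the sample values are hypothetical and are not part of any Intel-provided interface.

```python
def avg_bus_ownership_stall(latency_clocks, ownership_transfers):
    """Event 2AH: counter 0 (latency) divided by counter 1 (transfers).

    Returns the average stall time, in clocks, due to bus ownership
    conflict, or None when no transfers were counted.
    """
    if ownership_transfers == 0:
        return None  # ratio is undefined without any transfers
    return latency_clocks / ownership_transfers


def e_stage_resolved_flushes(all_flushes, wb_stage_flushes):
    """Event 35H: counter 0 (E-stage or WB-stage) minus counter 1 (WB-stage only)."""
    return all_flushes - wb_stage_flushes


if __name__ == "__main__":
    # Hypothetical sample counts, chosen only to illustrate the arithmetic.
    print(avg_bus_ownership_stall(1200, 300))
    print(e_stage_resolved_flushes(500, 120))
```

Both helpers assume the two counts of a pair were collected over the same interval; otherwise the ratio or difference is not meaningful.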