ST6x86 P90+, P100+, P120+, P133+, P150+, P166+ 80 to 150 MHz 3.52 Volt ST6x86 CPU PRELIMINARY DATA Sixth-Generation Superscalar Superpipelined Architecture - Dual 7-stage integer pipelines - High performance on-chip FPU with 64-bit interface - Operating frequencies of 80 MHz and above - 16-KByte write-back cache Optimum Performance for Windows95 - Intelligent instruction dispatch - Out-of-order completion - Register renaming - Data forwarding - Branch prediction - Speculative execution X86 Instruction Set Compatible - Runs Windows 95, Windows NT, DOS, UNIX, Novell, OS/2, Solaris, and others 64-Bit Data Bus - P54C socket compatible for quick time to market. The ST6x86 microprocessor is a superscalar, superpipelined CPU that provides sixth-generation performance for x86 software. Since the ST6x86 CPU is fully compatible with the x86 instruction set, it is capable of executing a wide range of existing operating systems and applications, including Windows 95, DOS, Unix, Windows NT, Novell, OS/2, and Solaris. The ST6x86 CPU achieves top performance levels through the use of two optimized superpipelined integer units and an on-chip floating point unit. The superpipelined architecture reduces timing constraints and allows the ST6x86 CPU to operate at core frequencies starting at 80 MHz. In addition, the ST6x86 CPU's integer and floating point units are optimized for maximum instruction throughput by using advanced architectural techniques, including register renaming, out-of-order execution, data forwarding, branch prediction, and speculative execution. These design innovations eliminate many data dependencies and resource conflicts that provide optimum performance for Windows 95 software. BLOCK DIAGRAM May 1996 This is advanced information on a new product now in development or undergoing evaluation. Details are subject to change without notice. ST6x86 ii GENERAL INDEX Chapter Page 1.0 ARCHITECTURE OVERVIEW 1.1 1.2 1.3 1.4 1.5 1.6 Major Functional Blocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-1 Integer Unit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-2 Cache Units . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-12 Memory Management Unit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-14 Floating Point Unit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-15 Bus Interface Unit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-15 2.0 PROGRAMMING INTERFACE 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9 2.10 2.11 2.12 2.13 Processor Initialization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-1 Instruction Set Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-3 Register Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-4 System Register Set. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-11 Address Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-40 Memory Addressing Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-41 Memory Caches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-52 Interrupts and Exceptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-55 System Management Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-63 Shutdown and Halt . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-69 Protection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-71 Virtual 8086 Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-74 Floating Point Unit Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-75 3.0 ST6x86 BUS INTERFACE 3.1 3.2 3.3 Signal Description Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-2 Signal Descriptions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-6 Functional Timing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-24 4.0 ELECTRICAL SPECIFICATIONS 4.1 4.2 4.3 4.4 4.5 Electrical Connections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-1 Absolute Maximum Ratings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-2 Recommended Operating Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-3 DC Characteristics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-4 AC Characteristics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-5 5.0 MECHANICAL SPECIFICATIONS 5.1 5.2 296-Pin SPGA Package . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-1 Thermal Resistances . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-6 6.0 INSTRUCTION SET 6.1 6.2 6.3 6.4 6.5 Instruction Set Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-1 General Instruction Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-2 CPUID Instruction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-11 Instruction Set Tables. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-12 FPU Clock Counts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-40 iii LIST OF FIGURES Figure Page Figure 1.1. Integer Unit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-2 Figure 1.2. Cache Unit Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-13 Figure 1.3. Paging Mechanism within the Memory Management Unit . . . . . . . . . . . . . . . . . . . . . . . . 1-14 Figure 2.1. Application Register Set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-5 Figure 2.2. General Purpose Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-6 Figure 2.3. Segment Selector in Protected Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-7 Figure 2.4. EFLAGS Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-9 Figure 2.5. System Register Set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-12 Figure 2.6. Control Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-13 Figure 2.7. Descriptor Table Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-15 Figure 2.8. Application and System Segment Descriptors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-16 Figure 2.9. Gate Descriptor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-19 Figure 2.10. Task Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-20 Figure 2.11. 32-Bit Task State Segment (TSS) Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-21 Figure 2.12. 16-Bit Task State Segment (TSS) Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-22 Figure 2.13. ST6x86 Configuration Control Register 0 (CCRO) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-25 Figure 2.14. ST6x86 Configuration Control Register 1 (CCR1) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-26 Figure 2.15. ST6x86 Configuration Control Register 2 (CCR2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-27 Figure 2.16. ST6x86 Configuration Control Register 3 (CCR3) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-28 Figure 2.17. ST6x86 Configuration Control Register 4 (CCR4) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-29 Figure 2.18. ST6x86 Configuration Control Register 5 (CCR5) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-30 Figure 2.19. Address Region Registers (ARR0 - ARR7) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-31 Figure 2.20. Region Control Registers (RCR0-RCR7) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-34 Figure 2.21. Device Identification Register 0 (DIR0) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-36 Figure 2.22. Device Identification Register 1 (DIR1) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-36 Figure 2.23. Debug Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-37 Figure 2.24. Memory and I/O Address Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-40 Figure 2.25. Offset Address Calculation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-42 Figure 2.26. Real Mode Address Calculation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-43 Figure 2.27. Protected Mode Address Calculation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-43 Figure 2.28. Selector Mechanism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-44 Figure 2.29. Paging Mechanisms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-45 Figure 2.30. Traditional Paging Mechanism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-46 Figure 2.31. Directory and Page Table Entry (DTE and PTE) Format . . . . . . . . . . . . . . . . . . . . . . . . 2-46 Figure 2.32. TLB Test Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-48 Figure 2.33. Variable-Size Paging Mechanism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-51 Figure 2.34. Unified Cache . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-53 Figure 2.35. Cache Test Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-54 Figure 2.36. Error Code Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-62 Figure 2.37. System Management Memory Address Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-63 Figure 2.38. SMI Execution Flow Diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-64 Figure 2.39. SMM Memory Space Header . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-65 Figure 2.40. SMM and Suspend Mode State Diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-70 Figure 2.41. FPU Tag Word Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-76 Figure 2.42. FPU Status Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-76 Figure 2.43. FPU Mode Control Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-77 Figure 3.1. ST6x86 CPU Functional Signal Groupings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-1 Figure 3.2. RESET Timing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-24 Figure 3.3. ST6x86 CPU Bus State Diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-26 iv LIST OF FIGURES Figure Page Figure 3.4. Non-Pipelined Single Transfer Read Cycles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-29 Figure 3.5. Non-Pipelined Single Transfer Write Cycles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-30 Figure 3.6. Non-Pipelined Burst Read Cycles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-32 Figure 3.7. Burst Cycle with Wait States . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-33 Figure 3.8. "1+4" Burst Read Cycle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-34 Figure 3.9. Non-Pipelined Burst Write Cycles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-36 Figure 3.10. Pipelined Single Transfer Read Cycles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-37 Figure 3.11. Pipelined Burst Read Cycles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-38 Figure 3.12. Read Cycle Followed by Pipelined Write Cycle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-39 Figure 3.13. Interrupt Acknowledge Cycles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-40 Figure 3.14. SMIACT# Timing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-41 Figure 3.15. SMM I/O Trap Timing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-42 Figure 3.16. Cache Invalidation Using FLUSH# . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-43 Figure 3.17 External Write Buffer Empty (EWBE#) Timing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-44 Figure 3.18. Requesting Hold from an Idle Bus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-45 Figure 3.19. Requesting Hold During a Non-Pipelined Bus Cycle . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-46 Figure 3.20. Requesting Hold During a Pipelined Bus Cycle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-47 Figure 3.21. Back-Off Timing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-48 Figure 3.22. HOLD Inquiry Cycle that Hits on a Modified Line . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-50 Figure 3.23. BOFF# Inquiry Cycle that Hits on a Modified Line . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-51 Figure 3.24. AHOLD Inquiry Cycle that Hits on a Modified Line . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-52 Figure 3.25. AHOLD Inquiry Cycle During a Line Fill . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-53 Figure 3.26. APCHK# Timing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-54 Figure 3.27. BHOLD and DHOLD Timing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-55 Figure 3.28. CPU Upper Byte Read from 32-Bit Bus Using Scatter/Gather . . . . . . . . . . . . . . . . . . . . 3-56 Figure 3.29. CPU Upper Byte Write to 32-Bit Bus Using Scatter/Gather . . . . . . . . . . . . . . . . . . . . . . 3-57 Figure 3.30. Bus Master Read from 64-Bit Memory to 32-Bit Bus . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-58 Figure 3.31. Bus Master Write to 64-Bit Memory from 32-Bit Bus . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-59 Figure 3.32. SUSP# Initiated Suspend Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-60 Figure 3.33. HALT Initiated Suspend Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-61 Figure 3.34. Stopping CLK During Suspend Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-62 Figure 4.1. Drive Level and Measurement Points for Switching Characteristics . . . . . . . . . . . . . . . . . 4-6 Figure 4.2. CLK Timing and Measurement Points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-7 Figure 4.3. Output Valid Delay Timing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-8 Figure 4.4. Output Float Delay Timing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-9 Figure 4.5. Input Setup and Hold Timing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-11 Figure 4.6. TCK Timing and Measurement Points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-12 Figure 4.7. JTAG Test Timings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-13 Figure 4.8. Test Reset Timing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-13 Figure 5.1. 296-Pin SPGA Package Pin Assignments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-1 Figure 5.2. 296-Pin SPGA Package . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-4 Figure 5.3 Typical HeatSink/Fan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-7 Figure 6.1. Instruction Set Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-1 v LIST OF TABLES Table Page Table 1.1. Register Renaming with WAR Dependency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-5 Table 1.2. Register Renaming with WAW Dependency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-6 Table 1.3. Example of Operand Forwarding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-8 Table 1.4. Result Forwarding Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-9 Table 1.5. Example of Data Bypassing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-10 Table 2.1. Initialized Register Controls . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-2 Table 2.2. Segment Register Selection Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-8 Table 2.3. EFLAGS Bit Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-10 Table 2.4. CRO Bit Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-14 Table 2.5. Effects of Various Combinations of EM, TS, and MP Bits . . . . . . . . . . . . . . . . . . . . . . . . . 2-14 Table 2.6 Descriptor Table Registers and Descriptors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-15 Table 2.7. Segment Descriptor Bit Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-17 Table 2.8. TYPE Field Definitions with DT = 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-17 Table 2.9. TYPE Field Definitions with DT = 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-18 Table 2.10. Gate Descriptor Bit Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-19 Table 2.11. ST6x86 CPU Configuration Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-24 Table 2.12. CCRO Bit Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-25 Table 2.13. CCR1 Bit Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-26 Table 2.14. CCR2 Bit Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-27 Table 2.15. CCR3 Bit Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-28 Table 2.16. CCR4 Bit Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-29 Table 2.17. CCR5 Bit Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-30 Table 2.18. ARR0 - ARR7 Register Index Assignment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-32 Table 2.19. Bit Definitions for SIZE Field . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-32 Table 2.20. RCR0-RCR7 Bit Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-34 Table 2.21. DiR0 Bit Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-36 Table 2.22. DIR 1 Bit Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-36 Table 2.23. DR6 and DR7 Debug Register Field Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-38 Table 2.24. Memory Addressing Modules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-42 Table 2.25. Directory and Page Table Entry (DTE and PTE) Bit Definitions . . . . . . . . . . . . . . . . . . . 2-47 Table 2.26. TLB Test Register Bit Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-49 Table 2.27. TR6 Attribute Bit Pairs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-50 Table 2.28. TR6 Command Bits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-50 Table 2.29. Cache Test Register Bit Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-54 Table 2.30. Interrupt Vector Assignments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-58 Table 2.31. Interrupt and Exception Priorities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-60 Table 2.32. Exception Changes in Real Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-61 Table 2.33. Error Code Bit Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-62 Table 2.34. Requirements for Recognizing SMI# and SMINT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-64 Table 2.35. SMM Memory Space Header . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-66 Table 2.36. SMM Instruction Set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-67 Table 2.37. Descriptor Types Used for Control Transfer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-73 Table 2.38. FPU Status Register Bit Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-76 Table 2.39. FPU Mode Control Register Bit Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-77 Table 3.1. ST6x86 CPU Signals Sorted by Signal Name . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-2 Table 3.2. Pins Sampled During RESET . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-6 Table 3.3. Signal States During RESET . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-7 Table 3.4. Byte Enable Signal to Data Bus Byte Correlation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-8 Table 3.5. Parity Bit to Data Byte Correlation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-9 vi LIST OF TABLES Table Page Table 3.6. Bus Cycle Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-11 Table 3.7. Effects of WB/WT# on Cache Line State . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-14 Table 3.8. Signal States During Bus Hold . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-16 Table 3.9. Scatter/Gather Cycles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-19 Table 3.10. Byte Enable Map for Scatter/Gather Cycles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-20 Table 3.11. Signal States During Suspend Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-22 Table 3.12. ST6x86 CPU Bus States . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-25 Table 3.13. Bus State Transitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-27 Table 3.14. "1+4" Burst Address Sequence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-34 Table 3.15. Linear Burst Address Sequence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-35 Table 4.1. Pins Connected to Internal Pull-Up and Pull Down Resistors . . . . . . . . . . . . . . . . . . . . . . . 4-1 Table 4.2. Absolute Maximum Ratings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-2 Table 4.3. Recommended Operating Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-3 Table 4.4. DC Characteristics (at Recommended Operating Conditions) . . . . . . . . . . . . . . . . . . . . . . 4-4 Table 4.5. Drive Level and Measurement Points for Switching Characteristics . . . . . . . . . . . . . . . . . . 4-6 Table 4.6. Clock Specifications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-7 Table 4.7. Output Valid Delays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-8 Table 4.8. Output Float Delays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-9 Table 4.9. Input Setup Times . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-10 Table 4.10. Input Hold Times . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-10 Table 4.11. JTAG AC Specifications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-12 Table 5.1. 296-Pin SPGA Package Signal Names Sorted by Pin Number . . . . . . . . . . . . . . . . . . . . . 5-2 Table 5.2. 296-Pin SPGA Package Pin Numbers Sorted by Signal Name . . . . . . . . . . . . . . . . . . . . . 5-3 Table 5.3. 296-Pin SPGA Package Dimensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-5 Table 5.4 Required CA to Maintain 70C Case Temperature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-6 Table 5.5 Heatsink/Fan Dimensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-7 Table 6.1. Instruction Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-2 Table 6.2. Instruction Prefix Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-3 Table 6.3. w Field Encoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-4 Table 6.4. d Field Encoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-4 Table 6.5. s Field Encoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-5 Table 6.6. eee Field Encoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-5 Table 6.7. mod r/m Field Encoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-6 Table 6.8. mod r/m Field Encoding Dependent on w Field . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-7 Table 6.9. reg Field . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-7 Table 6.10. sreg3 Field Encoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-8 Table 6.11. sreg2 Field Encoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-8 Table 6.12. ss Field Encoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-9 Table 6.13. Index Field Encoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-9 Table 6.14. mod base Field Encoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-10 Table 6.15. CPUID Data Returned When EAX = 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-11 Table 6.16. CPUID Data Returned When EAX = 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-11 Table 6.17. CPU Clock Count Abbreviations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-13 Table 6.18. Flag Abbreviations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-13 Table 6.19. Action of Instruction on Flag . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-13 Table 6.20. ST6x86 CPU Instruction Set Clock Count Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-14 Table 6.21 FPU Clock Count Table Abbreviations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-40 Table 6.22 6x86 FPU Instruction Set Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-41 vii ST6x86 1.0 ARCHITECTURE OVERVIEW The ST6x86 CPU is a leader in the sixth generation of high performance, x86-compatible microprocessors. Increased performance is accomplished by the use of superscalar and superpipelined design techniques. The ST6x86 CPU is superscalar in that it contains two separate pipelines that allow multiple instructions to be processed at the same time. The use of advanced processing technology and the increased number of pipeline stages (s uperpi pel i ni ng) allow the ST6x86 CPU to achieve clocks rates of 80 MHz and above. Through the use of unique architectural features, the ST6x86 processor eliminates many data dependencies and resource conflicts, resulting in optimal performance for both 16-bit and 32-bit x86 software. The ST6x86 CPU contains two caches: a 16-KByte dual-ported unified cache and a 256-byte instruction line cache. Since the unified cache can store instructions and data in any ratio, the unified cache offers a higher hit rate than separate data and instruction caches of equal size. An increase in overall cache-to-integer unit bandwidth is achieved by supplementing the unified cache with a small, high-speed, fully associative instruction line cache. The inclusion of the instruction line cache avoids excessive conflicts between code and data accesses in the unified cache. The on-chip FPU allows floating point instructions to execute in parallel with integer instructions and features a 64-bit data interface. The FPU incorporates a four-deep instruction queue and a four-deep store queue to facilitate parallel execution. The ST6x86 CPU operates from a 3.52 volt power supply resulting in reasonable power consumption at all frequencies. In addition, the ST6x86 CPU incorporates a low power suspend mode, stop clock capability, and system management mode (SMM) for power sensitive applications. 1.1 Major Functional Blocks The ST6x86 processor consists of five major functional blocks, as shown in the overall block diagram on the first page of this manual: - Integer Unit Cache Unit Memory Management Unit Floating Point Unit Bus Interface Unit Instructions are executed in the X and Y pipelines within the Integer Unit and also in the Floating Point Unit (FPU). The Cache Unit stores the most recently used data and instructions to allow fast access to the information by the Integer Unit and FPU. 1-1 ST6x86 Physical addresses are calculated by the Memory Management Unit and passed to the Cache Unit and the Bus Interface Unit (BIU). The BIU provides the interface between the external system board and the processor's internal execution units. Figure 1.1. Integer Unit 1-2 1.2 Integer Unit The Integer Unit (Figure 1.1) provides parallel instruction execution using two seven-stage integer pipelines. Each of the two pipelines, X and Y, can process several instructions simultaneously. ST6x86 The Integer Unit consists of the following pipeline stages: - Instruction Fetch (IF) - Instruction Decode 1 (ID1) - Instruction Decode 2 (ID2) - Address Calculation 1 (AC1) - Address Calculation 2 (AC2) - Execute (EX) - Write-Back (WB) The instruction decode and address calculation functions are both divided into superpipelined stages. 1.2.1 Pipeline Stages The Instruction Fetch (IF) stage, shared by both the X and Y pipelines, fetches 16 bytes of code from the cache unit in a single clock cycle. Within this section, the code stream is checked for any branch instructions that could affect normal program sequencing. If an unc ondi t ional or conditional branch is detected, branch predic t i on logic within the IF stage generates a predicted target address for the instruction. The IF stage then begins fetching instructions at the predicted address. The superpipelined Instruction Decode function contains the ID1 and ID2 stages. ID1, shared by both pipelines, evaluates the code stream provided by the IF stage and determines the number of bytes in each instruction. Up to two instructions per clock are delivered to the ID2 stages, one in each pipeline. The Address Calculation function contains two stages, AC1 and AC2. If the instruction refers to a memory operand, the AC1 calculates a linear memory address for the instruction. The AC2 stage performs any required memory management functions, cache accesses, and register file accesses. If a floating point instruction is detected by AC2, the instruction is sent to the FPU for processing. The Execute (EX) stage executes instructions using the operands provided by the address calculation stage. The Write-Back (WB) stage is the last IU stage. The WB stage stores execution results either to a register file within the IU or to a write buffer in the cache control unit. 1.2.2 Out-of-Order Processing If an instruction executes faster than the previous instruction in the other pipeline, the instructions may complete out of order. All instructions are processed in order, up to the EX stage. While in the EX and WB stages, instructions may be completed out of order. If there is a data dependency between two instructions, the necessary hardware interlocks are enforced to ensure correct program execution. Even though instructions may complete out of order, exceptions and writes resulting from the instructions are always issued in program order. The ID2 stages decode instructions and send the decoded instructions to either the X or Y pipeline for execution. The particular pipeline is chosen, based on which instructions are already in each pipeline and how fast they are expected to flow through the remaining pipeline stages. 1-3 ST6x86 1.2.3 Pipeline Selection 1.2.4 Data Dependency Solutions In most cases, instructions are processed in either pipeline and without pairing constraints on the instructions. However, certain instructions are processed only in the X pipeline: When two instructions that are executing in parallel require access to the same data or register, one of the following types of data dependencies may occur: - Branch instructions - Floating point instructions - Exclusive instructions Branch and floating point instructions may be paired with a second instruction in the Y pipeline. Exclusive Instructions cannot be paired with instructions in the Y pipeline. These instructions typically require multiple memory accesses. Although exclusive instructions may not be paired, hardware from both pipelines is used to accelerate instruction completion. Listed below are the ST6x86 CPU exclusive instruction types: - Protected mode segment loads - Special register accesses (Control, Debug, and Test Registers) - String instructions - Multiply and divide - I/O port accesses - Push all (PUSHA) and pop all (POPA) - Intersegment jumps, calls, and returns 1-4 - Read-After-Write (RAW) - Write-After-Read (WAR) - Write-After-Write (WAW) Data dependencies typically force serialized execution of instructions. However, the ST6x86 CPU implements three mechanisms that allow parallel execution of instructions containing data dependencies: - Register Renaming - Data Forwarding - Data Bypassing The following sections provide detailed examples of these mechanisms. 1.2.4.1 Register Renaming The ST6x86 CPU contains 32 physical general purpose registers. Each of the 32 registers in the register file can be temporarily assigned as one of the general purpose registers defined by the x86 architecture (EAX, EBX, ECX, EDX, ESI, EDI, EBP, and ESP). For each register write operation a new physical register is selected to allow previous data to be retained temporarily. Register renaming effectively removes all WAW and WAR dependencies. The programmer does not have to consider register renaming; it is completely transparent to both the operating system and application software. ST6x86 Example #1 - Register Renaming Eliminates Write-After-Read (WAR) Dependency Note: In this and the following examples the original instruction order is shown in parentheses. A WAR dependency exists when the first in a pair of instructions reads a logical register, and the second instruction writes to the same logical register. This type of dependency is illustrated by the pair of instructions shown below: In the absence of register renaming, the ADD instruction in the Y pipe would have to be stalled to allow the MOV instruction in the X pipe to read the AX register. X PIPE Y PIPE (1) MOV BX, AX (2) ADD AX, CX BX AX AX AX + CX The ST6x86 CPU, however, avoids the Y pipe stall (Table 1.1). As each instruction executes, the results are placed in new physical registers to avoid the possibility of overwriting a logical register value and to allow the two instructions to complete in parallel (or out of order) rather than in sequence. Table 1.1. Register Renaming with WAR Dependency Physical Register Contents Action Instruction Reg0 Reg1 Reg2 (Initial) AX BX CX MOV BX, AX AX ADD AX, CX Reg3 CX BX CX BX Reg4 AX Pipe X Reg3 Reg0 Y Reg4 Reg0 + Reg2 Note: The representation of the MOV and ADD instructions in the final column of Table 1.1 are completely independent. 1-5 ST6x86 Example #2 - Register Renaming Eliminates Write-After-Write (WAW) Dependency A WAW dependency occurs when two consecutive instructions perform writes to the same logical register. This type of dependency is illustrated by the pair of instructions shown below: X PIPE Y PIPE (1) ADD AX, BX (2) MOV AX, [mem] AX AX + BX AX [mem] Without register renaming, the MOV instruction in the Y pipe would have to be stalled to guarantee that the ADD instruction in the X pipe would write its results to the AX register first. The ST6x86 CPU uses register renaming and avoids the Y pipe stall. The contents of the AX and BX registers are placed in physical registers (Table 1.2). As each instruction executes, the results are placed in new physical registers to avoid the possibility of overwriting a logical register value and to allow the two instructions to complete in parallel (or out of order) rather than in sequence. Table 1.2. Register Renaming with WAW Dependency Physical Register Contents Action Instruction (Initial) Reg0 Reg1 AX BX ADD AX, BX BX MOV AX, [mem] BX Reg2 Reg3 AX AX Pipe X Reg2 Reg0 + Reg1 Y Reg3 [mem] Note: All subsequent reads of the logical register AX will refer to Reg 3, the result of the MOV instruction. 1-6 ST6x86 1.2.4.2 Data Forwarding Register renaming alone cannot remove RAW dependencies. The ST6x86 CPU uses two types of data forwarding in conjunction with register renaming to eliminate RAW dependencies: - Operand Forwarding - Result Forwarding Operand forwarding takes place when the first in a pair of instructions performs a move from register or memory, and the data that is read by the first instruction is required by the second instruction. The ST6x86 CPU performs the read operation and makes the data read available to both instructions simultaneously. Result forwarding takes place when the first in a pair of instructions performs an operation (such as an ADD) and the result is required by the second instruction to perform a move to a register or memory. The ST6x86 CPU performs the required operation and stores the results of the operation to the destination of both instructions simultaneously. 1-7 ST6x86 Example #3 - Operand Forwarding Eliminates Read-After-Write (RAW) Dependency A RAW dependen c y occurs when the first in a pair of instructions performs a write, and the second instruction reads the same register. This type of depen denc y is illustrated by the pair of instructions shown below in the X and Y pipelines: X PIPE Y PIPE (1) MOV AX, [mem] (2) ADD BX, AX AX [mem] BX AX + BX The ST6x86 CPU uses operand forwarding and avoids a Y pipe stall (Table 1.3). Operand forwarding allows simultaneous execution of both instructions by first reading memory and then making the results available to both pipelines in parallel. Operand forwarding can only occur if the first instruction does not modify its source data. In other words, the instruction is a move type instruction (for example, MOV, POP, LEA). Operand forwarding occurs for both register and memory operands. The size of the first instruction destination and the second instruction source must match. Table 1.3. Example of Operand Forwarding Physical Register Contents Action Instruction (Initial) MOV AX, [mem] ADD BX, AX Reg0 Reg1 AX BX BX Reg2 Reg3 AX AX BX Pipe X Reg2 [mem] Y Reg3 [mem] + Reg1 1-8 ST6x86 Example #4 - Result Forwarding Eliminates Read-After-Write (RAW) Dependency In this example, a RAW d ependenc y occurs when the first in a pair of instructions performs a write, and the second instruction reads the same register. This dependency is illustrated by the pair of instructions in the X and Y pipelines, as shown below: X PIPE Y PIPE (1) ADD AX, BX (2) MOV [mem], AX AX AX + BX [mem] AX The ST6x86 CPU uses result forwarding and avoids a Y pipe stall (Table 1.4). Instead of transferring the contents of the AX register to memory, the result of the previous ADD instruction (Reg0 + Reg1) is written directly to memory, thereby saving a clock cycle. The second instruction must be a move instruction and the destination of the second instruction may be either a register or memory. Table 1.4. Result Forwarding Example Physical Register Contents Action Instruction (Initial) Reg0 Reg1 AX BX Reg2 Pipe ADD AX, BX BX AX X Reg2 Reg0 + Reg1 MOV [mem], AX BX AX Y [mem] Reg0 +Reg1 1-9 ST6x86 1.2.4.3 Data Bypassing In addition to register renaming and data forwarding, the ST6x86 CPU implements a third data dependenc y-res ol uti on technique called data bypassing. Data bypassing reduces the performance penalty of those memory data RAW dependencies that cannot be eliminated by data forwarding. Data bypassing is implemented when the first in a pair of instructions writes to memory and the second instruction reads the same data from memory. The ST6x86 CPU retains the data from the first instruction and passes it to the second instruction, thereby eliminating a memory read cycle. Data bypassing only occurs for cacheable memory locations. Example #1Data Bypassing Read-After-Write (RAW) Dependency with In this example, a RAW dependency occurs when the first in a pair of instructions performs a write to memory and the second instruction reads the same memory location. This dependency is illustrated by the pair of instructions in the X and Y pipelines as shown below: X PIPE Y PIPE (1) ADD [mem], AX (2) SUB BX, [mem] [mem] [mem] + AX BX BX - [mem] The ST6x86 CPU uses data bypassing and stalls the Y pipe for only one clock by eliminating the Y pipe's memory read cycle (Table 1.5). Instead of reading memory in the Y pipe, the result of the previous instruction ([mem] + Reg0) is used to subtract from Reg1, thereby saving a memory access cycle. Table 1.5. Example of Data Bypassing Physical Register Contents Action Instruction Reg0 Reg1 (Initial) AX BX ADD [mem], AX AX BX SUB BX, [mem] AX Reg2 BX Pipe X [mem] [mem] + Reg0 Y Reg2 Reg1 - {[mem] + Reg0} 1-10 ST6x86 1.2.5 Branch Control Branch instructions occur on average every four to six instructions in x86-compatible programs. When the normal sequential flow of a program changes due to a branch instruction, the pipeline stages may stall while waiting for the CPU to calculate, retrieve, and decode the new instruction stream. The ST6x86 CPU minimizes the performance degradation and latency of branch instructions through the use of branch prediction and speculative execution. 1.2.5.1 Branch Prediction The ST6x86 CPU uses a 256-entry, 4-way set associative Branch Target Buffer (BTB) to store branch target addresses and branch prediction information. During the fetch stage, the instruction stream is checked for the presence of branch instructions. If an unconditional branch instruction is encountered, the ST6x86 CPU accesses the BTB to check for the branch instruction's target address. If the branch instruction's target address is found in the BTB, the ST6x86 CPU begins fetching at the target address specified by the BTB. In case of conditional branches, the BTB also provides history information to indicate whether the branch is more likely to be taken or not taken. If the conditional branch instruction is found in the BTB, the ST6x86 CPU begins fetching instructions at the predicted target address. If the conditional branch misses in the BTB, the ST6x86 CPU predicts that the branch will not be taken, and instruction fetching continues with the next sequential instruction. The decision to fetch the taken or not taken target address is based on a four-state branch prediction algorithm. Once fetched, a conditional branch instruction is first decoded and then dispatched to the X pipeline only. The conditional branch instruction proceeds through the X pipeline and is then resolved in either the EX stage or the WB stage. The conditional branch is resolved in the EX stage, if the instruction responsible for setting the condition codes is completed prior to the execution of the branch. If the instruction that sets the condition codes is executed in parallel with the branch, the conditional branch instruction is resolved in the WB stage. Correctly predicted branch instructions execute in a single core clock. If resolution of a branch indicates that a misprediction has occurred, the ST6x86 CPU flushes the pipeline and starts fetching from the correct target address. The ST6x86 CPU prefetches both the predicted and the non-predicted path for each conditional branch, thereby eliminating the cache access cycle on a misprediction. If the branch is resolved in the EX stage, the resulting misprediction latency is four cycles. If the branch is resolved in the WB stage, the latency is five cycles. Since the target address of return (RET) instructions is dynamic rather than static, the ST6x86 CPU caches target addresses for RET instructions in an ei ght -en t ry return stack rather than in the BTB. The return address is pushed on the return stack during a CALL instruction and popped during the corresponding RET instruction. 1-11 ST6x86 1.2.5.2 Speculative Execution The ST6x86 CPU is capable of speculative execution following a floating point instruction or predicted branch. Speculative execution allows the pipelines to continuously execute instructions following a branch without stalling the pipelines waiting for branch resolution. The same mechanism is used to execute floating point instructions (see Section 1.5) in parallel with integer instructions. The ST6x86 CPU is capable of up to four levels of speculation (i.e., combinations of four conditional branches and floating point operations). After generating the fetch address using branch prediction, the CPU checkpoints the machine state (registers, flags, and processor environment), increments the speculation level counter, and begins operating on the predicted instruction stream. Once the branch instruction is resolved, the CPU decreases the speculation level. For a correctly predicted branch, the status of the checkpointed resources is cleared. For a branch misprediction, the ST6x86 processor generates the correct fetch address and uses the checkpointed values to restore the machine state in a single clock. In order to maintain compatibility, writes that result from speculatively executed instructions are not permitted to update the cache or external memory until the appropriate branch is resolved. Speculative execution continues until one of the following conditions occurs: 1-12 1) A branch or floating point operation is decoded and the speculation level is already at four. 2) An exception or a fault occurs. 3) The write buffers are full. 4) An attempt is made to modify a non-checkpointed resource (i.e., segment registers, system flags). 1.3 Cache Units The ST6x86 CPU employs two caches, the Unified Cache and the Instruction Line Cache (Figure 1.2). 1.3.1 Unified Cache The 16-KByte unified write-back cache functions as the primary data cache and as the secondary instruction cache. Configured as a four-way set-associative cache, the cache stores up to 16 KByt es of code and data in 512 lines. The cache is dual-ported and allows any two of the following operations to occur in parallel: - Code fetch - Data read (X pipe, Y pipeline or FPU) - Data write (X pipe, Y pipeline or FPU) The unified cache uses a pseudo-LRU replacement algorithm and can be configured to allocate new lines on read misses only or on read and write misses. More inf orm ati on concerning the unified cache can be found in 2.7.1 (Page 52). ST6x86 1.3.2 Instruction Line Cache The fully associative 256-byte instruction line cache serves as the primary instruction cache. The instruction line cache is filled from the unified cache through the data bus. Fetches from the integer unit that hit in the instruction line cache do not access the unified cache. If an instruction line cache miss occurs, the instruction line data from the unified cache is transferred to the instruction line cache and the integer unit, simultaneously. The instruction line cache uses a pseudo-LRU replacement algorithm. To ensure proper operation in the case of self-modifying code, any writes to the unified cache are checked against the contents of the instruction line cache. If a hit occurs in the instruction line cache, the appropriate line is invalidated. Figure 1.2. Cache Unit Operations Instruction Data Integer Unit Instruction Address IF Instruction Line Cache 256-Byte Fully Associative, 8 Lines X Y Pipe Pipe Data Bypass Aligner FPU Data Bus Bus Interface Unit Unified Cache 16-KByte, 4-Way Set Associative, 128 Lines/Set X, Y Linear Address Instruction Line Cache Miss Address Cache Tags Set 0 Set 1 Set 2 Set 3 Modified X, Y Physical Addresses Memory Management Unit (TLB) = Dual Bus = Single Bus 1739503 1-13 ST6x86 1.4 Memory Management Unit 1.4.1 Variable-Size Paging Mechanism The Memory Management Unit (MMU), shown in Figure 1.3, translates the linear address supplied by the IU into a physical address to be used by the unified cache and the bus interface. Memory management procedures are x86 compatible, adhering to standard paging mechanisms. The ST6x86 variable-size paging mechanism allows software to map pages between 4 KBytes and 4 GBytes in size. The large contiguous memories provided by this mechanism help avoid TLB (Translation Lookaside Buffer) thrashing [see Section 2.6.4] associated with some operating systems and applications. For example, use of a single large page instead of a series of small 4-KByte pages can greatly improve performance in an application using a large video memory buffer. The ST6x86 MMU includes two paging mechanisms (Figure 1.3), a traditional paging mechanism, and a ST6x86 variable-size paging mechanism. Figure 1.3. Paging Mechanism within the Memory Management Unit Control Variable-Size Paging Mechanism Linear Address 3 127 Main TLB DTE Cache 0 0 7 Victim TLB 0 DTE Directory Table PTE Physical Page Page Table Page Frame CR3 Control Register Traditional Paging Mechanism 1728901 1-14 ST6x86 1.4.2 Traditional Paging Mechanism FPU Parallel Execution The traditional paging mechanism has been enhanced on the ST6x86 CPU with the addition of the Directory Table Entry (DTE) cache and the Victim TLB. The main TLB (Translation Lookaside Buffer) is a direct-mapped 128-entry cache for page table entries. The ST6x86 CPU executes integer instructions in parallel with FPU instructions. Integer instructions may complete out of order with respect to the FPU instructions. The ST6x86 CPU maintains x86 compatibility by signaling exceptions and issuing write cycles in program order. The four-entry fully associative DTE cache stores the most recent DTE accesses. If a Page Table Entry (PTE) miss occurs followed by a DTE hit, only a single memory access to the PTE table is required. As previously discussed, FPU instructions are always dispatched to the integer unit's X pipeline. The address calculation stage of the X pipeline checks for memory management exceptions and accesses memory operands used by the FPU. If no exceptions are detected, the ST6x86 CPU checkpoints the state of the CPU and, during AC2, dispatches the floating point instruction to the FPU instruction queue. The ST6x86 CPU can then complete any subsequent integer instructions speculatively and out of order relative to the FPU instruction and relative to any potential FPU exceptions which may occur. The Victim TLB stores PTEs which have been displaced from the main TLB due to a TLB miss. If a PTE access occurs while the PTE is stored in the victim TLB, the PTE in the victim TLB is swapped with a PTE in the main TLB. This has the effect of selectively increasing TLB associativity. The ST6x86 CPU updates the eight-entry fully associative victim TLB on an oldest entry replacement basis. 1.5 Floating Point Unit The ST6x86 Floating Point Unit (FPU) interfaces to the integer unit and the cache unit through a 64-bit bus. The ST6x86 FPU is x87 instruction set compatible and adheres to the IEEE-754 standard. Since most applications contain FPU instructions mixed with integer instructions, the ST6x86 FPU achieves high performance by completing integer and FPU operations in parallel. As additional FPU instructions enter the pipeline, the ST6x86 CPU dispatches up to four FPU instructions to the FPU instruction queue. The ST6x86 CPU continues executing speculatively and out of order, relative to the FPU queue, until the ST6x86 CPU encounters one of the conditions that causes speculative execution to halt. As the FPU completes instructions, the speculation level decreases and the checkpointed resources are available for reuse in subsequent operations. The ST6x86 FPU also uses a set of four write buffers to prevent stalls due to speculative writes. 1.6 Bus Interface Unit The Bus Interface Unit (BIU) provides the signals and timing required by external circuitry. The signal descriptions and bus interface timing information is provided in Chapters 3 and 4 of this manual. 1-15 ST6x86 2.0 PROGRAMMING INTERFACE 2.1 Processor Initialization In this chapter, the internal operations of the ST6x86 CPU are described mainly from an application programmer's point of view. Included in this chapter are descriptions of processor initialization, the register set, memory addressing, various types of interrupts and the shutdown and halt process. An overview of real, virtual 8086, and protected operating modes is also included in this chapter. The FPU operations are described separately at the end of the chapter. The ST6x86 CPU is initialized when the RESET signal is asserted. The processor is placed in real mode and the registers listed in Table 2.1 are set to their initialized values. RESET invalidates and disables the cache and turns off paging. When RESET is asserted, the ST6x86 CPU terminates all local bus activity and all internal execution. During the entire time that RESET is asserted, the internal pipelines are flushed and no instruction execution or bus activity occurs. This manual does not--and is not intended to-- describe the ST6x86 microprocessor or its operations at the circuit level. Approximately 150 to 250 external clock cycles after RESET is negated, the processor begins executing instructions at the top of physical memory (address location FFFF FFF0h). Typically, an intersegment JUMP is placed at FFFF FFF0h. This instruction will force the processor to begin execution in the lowest 1 MByte of address space. Note: The actual time depends on the clock scaling in use. Also an additional 220 clock cycles are needed if self-test is requested 2-1 ST6x86 Table 2.1. Initialized Register Controls REGISTER REGISTER NAME INITIALIZED CONTENTS EAX Accumulator xxxx xxxxh EBX Base xxxx xxxxh ECX Count xxxx xxxxh EDX Data 05 + Device ID EBP Base Pointer xxxx xxxxh ESI Source Index xxxx xxxxh EDI Destination Index xxxx xxxxh COMMENTS 0000 0000h indicates self-test passed. Device ID = 31h or 33h (2X clock) Device ID = 35h or 37h (3X clock) ESP Stack Pointer xxxx xxxxh EFLAGS Flag Word 0000 0002h EIP Instruction Pointer 0000 FFF0h ES Extra Segment 0000h Base address set to 0000 0000h. Limit set to FFFFh. CS Code Segment F000h Base address set to FFFF 0000h. Limit set to FFFFh. SS Stack Segment 0000h Base address set to 0000 0000h. Limit set to FFFFh. DS Data Segment 0000h Base address set to 0000 0000h. Limit set to FFFFh. FS Extra Segment 0000h Base address set to 0000 0000h. Limit set to FFFFh. GS Extra Segment 0000h Base address set to 0000 0000h. Limit set to FFFFh. IDTR Interrupt Descriptor Table RegBase = 0, Limit = 3FFh ister GDTR Global Descriptor Table Register xxxx xxxxh, xxxxh LDTR Local Descriptor Table Register xxxx xxxxh, xxxxh TR Task Register xxxxh CR0 Machine Status Word 6000 0010h CR2 Control Register 2 xxxx xxxxh CR3 Control Register 3 xxxx xxxxh CCR (0-5) Configuration Control (0-5) 00h ARR (0-7) Address (0-7) 00h RCR (0-7) Region Control Registers (0-7) 00h DIR0 Device Identification 0 31h or 33h (2X clock) 35h or 37h (3X clock) DIR1 Device Identification 1 Step ID + Revision ID DR7 Debug Register 7 0000 0400h Region Note: x = Undefined value 2-2 Registers ST6x86 2.2 Instruction Set Overview The ST6x86 CPU instruction set performs nine types of general operations: - Arithmetic - Bit Manipulation - Control Transfer - Data Transfer - Floating Point - High-Level Language Support - Operating System Support - Shift/Rotate - String Manipulation All ST6x86 CPU instructions operate on as few as zero operands and as many as three operands. An NOP instruction (no operation) is an example of a zero operand instruction. Two operand instructions allow the specification of an explicit source and destination pair as part of the instruction. These two operand instructions can be divided into eight groups according to operand types: - Register to Register - Register to Memory - Memory to Register - Memory to Memory - Register to I/O - I/O to Register - Immediate Data to Register - Immediate Data to Memory An operand can be held in the instruction itself (as in the case of an immediate operand), in one of the processor's registers or I/O ports, or in memory. An immediate operand is prefetched as part of the opcode for the instruction. Chapter 6 of this manual lists each instruction in the ST6x86 CPU instruction set along with the associated opcodes, execution clock counts, and effects on the FLAGS register. 2.2.1 Lock Prefix The LOCK prefix may be placed before certain instructions that read, modify, then write back to memory. The prefix asserts the LOCK# signal to indicate to the external hardware that the CPU is in the process of running multiple indivisible memory accesses. The LOCK prefix can be used with the following instructions: - Bit Test Instructions (BTS, BTR, BTC) - Exchange Instructions (XADD, XCHG, CMPXCHG) - One-operand Arithmetic and Logical Instructions (DEC, INC, NEG, NOT) - Two-operand Arithmetic and Logical Instructions (ADC, ADD, AND, OR, SBB, SUB, XOR). An invalid opcode exception is generated if the LOCK prefix is used with any other instruction, or with the above instructions when no write operation to memory occurs (i.e., the destination is a register). The LOCK# signal can be negated to allow weak-locking for all of memory or on a regional basis. Refer to the descriptions of the NO-LOCK bit (within CCR1) and the WL bit (within RCRx) later in this chapter. Operand lengths of 8, 16, or 32 bits are supported as well as 64-or 80-bit associated with floating point instructions. Operand lengths of 8 or 32 bits are generally used when executing code written for 386- or 486-class (32-bit code) processors. Operand lengths of 8 or 16 bits are generally used when executing existing 8086 or 80286 code (16-bit code).The default length of an operand can be overridden by placing one or more instruction prefixes in front of the opcode. For example, by using prefixes, a 32-bit operand can be used with 16-bit code, or a 16-bit operand can be used with 32-bit code. 2-3 ST6x86 2.3 Register Sets 2.3.1 Application Register Set From the programmer's point of view there are 58 accessible registers in the ST6x86 CPU. These registers are grouped into two sets. The application register set contains the registers frequently used by application programmers, and the system register set contains the registers typically reserved for use by operating system programmers. The application register set, Figure 2.1 consists of the registers most often used by the applications programmer. These registers are generally accessible and are not protected from read or write access. The application register set is made up of general purpose registers, segment registers, a flag register, and an instruction pointer register. The system register set is made up of the remaining registers which include control registers, system address registers, debug registers, configuration registers, and test registers. Each of the registers is discussed in detail in the following sections. The General Purpose Register contents are frequently modified by assembly language instructions and typically contain arithmetic and logical instruction operands. Segment Registers in real mode contain the base address for each segment. In protected mode the segment registers contain segment selectors. The segment selectors provide indexing for tables (located in memory) that contain the base address and limit for each segment, as well as access control information. The Flag Register contains control bits used to reflect the status of previously executed instructions. This register also contains control bits that affect the operation of some instructions. The Instruction Pointer register points to the next instruction that the processor will execute. This register is automatically incremented by the processor as execution progresses. 2-4 ST6x86 Figure 2.1. Application Register Set 2.3.2 General Purpose Registers The general purpose registers are divided into four data registers, two pointer registers, and two index registers as shown in (Figure 2.2). The Data Registers are used by the applications programmer to manipulate data structures and to hold the results of logical and arithmetic operations. Different portions of the general data registers can be addressed by using different names. An "E" prefix identifies the complete 32-bit register. An "X" suffix without the "E" prefix identifies the lower 16 bits of the register. The lower two bytes of a data register can be addressed with an "H" suffix (identifies the upper byte) or an "L" suffix (identifies the lower byte). The _L and _H portions of a data registers act as independent registers. For example, if the AH register is written to by an instruction, the AL register bits remain unchanged. 2-5 ST6x86 Figure 2.2. General Purpose Registers 31 16 15 0 8 7 AX AH AL EAX (Accumulator) BL EBX (Base) CL ECX (Count) DL EDX (Data) BX BH CX CH DX DH SI ESI (Source Index) DI EDI (Destination Index) BP EBP (Base Pointer) SP ESP (Stack Pointer) 1746400 The Pointer and Index Registers are listed below. SI or ESI Source Index DI or EDI Destination Index SP or ESP Stack Pointer BP or EBP Base Pointer These registers can be addressed as 16- or 32-bit registers, with the "E" prefix indicating 32 bits. The pointer and index registers can be used as general purpose registers, however, some instructions use a fixed assignment of these registers. For example, repeated string operations always use ESI as the source pointer, EDI as the destination pointer, and ECX as the counter. The instructions using fixed registers include multiply and divide, I/O access, string operations, translate, loop, variable shift and rotate, and stack operations. 2-6 The ST6x86 CPU processor implements a stack using the ESP register. This stack is accessed during the PUSH and POP instructions, procedure calls, procedure returns, interrupts, exceptions, and interrupt/exception returns. The microprocessor automatically adjusts the value of the ESP during operation of these instructions.The EBP register may be used to reference data passed on the stack during procedure calls. Local data may also be placed on the stack and referenced relative to BP. This register provides a mechanism to access stack data in high-level languages. ST6x86 2.3.3 Segment Registers and Selectors Segmentation provides a means of defining data structures inside the memory space of the microprocessor. There are three basic types of segments: code, data, and stack. Segments are used automatically by the processor to determine the location in memory of code, data, and stack references. There are six 16-bit segment registers: CS Code Segment DS Data Segment ES Extra Segment SS Stack Segment FS Additional Data Segment GS Additional Data Segment. rent address size. In real mode and in virtual 8086 mode with paging disabled, the linear address is also the physical address. In virtual 8086 mode with paging enabled, the linear address is translated to the physical address using the current page tables. Paging is described in Section 2.6.4 (Page 45). In protected mode a segment register holds a Segment Selector containing a 13-bit index, a Table Indicator (TI) bit, and a two-bit Requested Privilege Level (RPL) field as shown in Figure 2.3. The Index points into a descriptor table in memory and selects one of 8192 (213) segment descriptors contained in the descriptor table. In real and virtual 8086 operating modes, a segment register holds a 16-bit segment base. The 16-bit segment is multiplied by 16 and a 16-bit or 32-bit offset is then added to it to create a linear address. The offset size is dependent on the cur- A segment descriptor is an eight-byte value used to describe a memory segment by defining the segment base, the segment limit, and access control information. To address data within a segment, a 16-bit or 32-bit offset is added to the segment's base address. Once a segment selector has been loaded into a segment register, an instruction needs only to specify the segment register and the offset. Figure 2.3. Segment Selector in Protected Mode Segment Selector 15 INDEX 3 2 TI 1 0 RPL 8191 Limit Descriptor Segment Base 0 Descriptor Table Main Memory 174170 1 2-7 ST6x86 The Table Indicator (TI) bit of the selector defines which descriptor table the index points into. If TI=0, the index references the Global Descriptor Table (GDT). If TI=1, the index references the Local Descriptor Table (LDT). The GDT and LDT are described in more detail in Section Table 2.6. Protected mode addressing is discussed further in Sections 2.6.2 and 2.6.3. The Requested Privilege Level (RPL) field in a segment selector is used to determine the Effective Privilege Level of an instruction (where RPL=0 indicates the most privileged level, and RPL=3 indicates the least privileged level). If the level requested by RPL is less than the Current Program Level (CPL), the RPL level is accepted and the Effective Privilege Level is changed to the RPL value. If the level requested by RPL is greater than CPL, the CPL overrides the requested RPL and Effective Privilege Level remains unchanged. When a segment register is loaded with a segment selector, the segment base, segment limit and access rights are loaded from the descriptor table entry into a user-invisible or hidden portion of the segment register (i.e., cached on-chip). The CPU does not access the descriptor table entry again until another segment register load occurs. If the descriptor tables are modified in memory, the segment registers must be reloaded with the new selector values by the software. The processor automatically selects an implied (default) segment register for memory references. Table 2.2 describes the selection rules. In general, data references use the selector contained in the DS register, stack references use the SS register and instruction fetches use the CS register. While some of these selections may be overridden, instruction fetches, stack operations, and the destination write of string operations cannot be overridden. Special segment override instruction prefixes allow the use of alternate segment registers including the use of the ES, FS, and GS segment registers. Table 2.2. Segment Register Selection Rules IMPLIED (DEFAULT) SEGMENT SEGMENT OVERRIDE PREFIX Code Fetch CS None Destination of PUSH, PUSHF, INT, CALL, PUSHA instructions SS None Source of POP, POPA, POPF, IRET, RET instructions SS None Destination of STOS, MOVS, REP STOS, REP MOVS instructions ES None Other data references with effective address using base registers of: EAX, EBX, ECX, EDX, ESI, EDI EBP, ESP DS CS, ES, FS, GS, SS SS CS,DS, ES, FS, GS TYPE OF MEMORY REFERENCE 2-8 ST6x86 2.3.4 Instruction Pointer Register 2.3.5 Flags Register The Instruction Pointer (EIP) register contains the offset into the current code segment of the next instruction to be executed. The register is normally incremented with each instruction execution unless implicitly modified through an interrupt, exception or an instruction that changes the sequential execution flow (e.g., JMP, CALL). The Flags Register, EFLAGS, contains status information and controls certain operations on the ST6x86 CPU microprocessor. The lower 16 bits of this register are referred to as the FLAGS register that is used when executing 8086 or 80286 code. The flag bits are shown in Figure 2.4 and defined in Table 2.3. Figure 2.4. EFLAGS Register Flags 3 1 2 2 4 3 0 0 0 0 0 0 0 0 0 0 Identification Alignment Check Virtual 8086 Mode Resume Flag Nested Task Flag I/O Privilege Level Overflow Direction Flag Interrupt Enable Trap Flag Sign Flag Zero Flag Auxiliary Carry Parity Flag Carry Flag 2 1 0 0 I D 1 1 1 9 8 7 0 0 A V C M 1 1 1 1 1 1 1 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 R N IO O D I T S Z A P C F 0 T PL F F F F F F 0 F 0 F 1 F S S S D S S A C S D A A A A A A = Arithmetic Flag, D = Debug Flag, S = System Flag, C = Control Flag 0 or 1 Indicates Reserved 1701105 2-9 ST6x86 Table 2.3. EFLAGS Bit Definitions BIT POSITIO N NAME 0 CF Carry Flag: Set when a carry out of (addition) or borrow into (subtraction) the most significant bit of the result occurs; cleared otherwise. 2 PF Parity Flag: Set when the low-order 8 bits of the result contain an even number of ones; cleared otherwise. 4 AF Auxiliary Carry Flag: Set when a carry out of (addition) or borrow into (subtraction) bit position 3 of the result occurs; cleared otherwise. 6 ZF Zero Flag: Set if result is zero; cleared otherwise. 7 SF Sign Flag: Set equal to high-order bit of result (0 indicates positive, 1 indicates negative). 8 TF Trap Enable Flag: Once set, a single-step interrupt occurs after the next instruction completes execution. TF is cleared by the single-step interrupt. 9 IF Interrupt Enable Flag: When set, maskable interrupts (INTR input pin) are acknowledged and serviced by the CPU. 10 DF Direction Flag: If DF=0, string instructions auto-increment (default) the appropriate index registers (ESI and/or EDI). If DF=1, string instructions auto-decrement the appropriate index registers. OF Overflow Flag: Set if the operation resulted in a carry or borrow into the sign bit of the result but did not result in a carry or borrow out of the high-order bit. Also set if the operation resulted in a carry or borrow out of the high-order bit but did not result in a carry or borrow into the sign bit of the result. 12, 13 IOPL I/O Privilege Level: While executing in protected mode, IOPL indicates the maximum current privilege level (CPL) permitted to execute I/O instructions without generating an exception 13 fault or consulting the I/O permission bit map. IOPL also indicates the maximum CPL allowing alteration of the IF bit when new values are popped into the EFLAGS register. 14 NT Nested Task: While executing in protected mode, NT indicates that the execution of the current task is nested within another task. 16 RF Resume Flag: Used in conjunction with debug register breakpoints. RF is checked at instruction boundaries before breakpoint exception processing. If set, any debug fault is ignored on the next instruction. 17 VM Virtual 8086 Mode: If set while in protected mode, the microprocessor switches to virtual 8086 operation handling segment loads as the 8086 does, but generating exception 13 faults on privileged opcodes. The VM bit can be set by the IRET instruction (if current privilege level=0) or by task switches at any privilege level. 18 AC Alignment Check Enable: In conjunction with the AM flag in CR0, the AC flag determines whether or not misaligned accesses to memory cause a fault. If AC is set, alignment faults are enabled. 21 ID Identification Bit: The ability to set and clear this bit indicates that the CPUID instruction is supported. The ID can be modified only if the CPUID bit in CCR4 is set. 11 2-10 FUNCTION ST6x86 2.4 System Register Set The system register set, shown in Figure 2.5, consists of registers not generally used by application programmers. These registers are typically employed by system level programmers who generate operating systems and memory management programs. The Control Registers control certain aspects of the ST6x86 microprocessor such as paging, coprocessor functions, and segment protection. When a paging exception occurs while paging is enabled, some control registers retain the linear address of the access that caused the exception. The Descriptor Table Registers and the Task Register can also be referred to as system address or memory management registers. These registers consist of two 48-bit and two 16-bit registers. These registers specify the location of the data structures that control the segmentation used by the ST6x86 microprocessor. Segmentation is one available method of memory management. The Configuration Registers are used to configure the ST6x86 CPU on-chip cache operation, power management features and System Management Mode. The configuration registers also provide information on the CPU device type and revision. The Debug Registers provide debugging facilities to enable the use of data access breakpoints and code execution breakpoints. The Test Registers provide a mechanism to test the contents of both the on-chip 16 KByte cache and the Translation Lookaside Buffer (TLB). In the following sections, the system register set is described in greater detail. 2-11 ST6x86 Figure 2.5. System Register Set 31 47 0 16 15 Page Fault Linear Address Register Page Directory Base Register 16 15 Limit Base Base Limit Selector Selector 31 Linear Breakpoint Address 0 Linear Breakpoint Address 1 Linear Breakpoint Address 2 Linear Breakpoint Address 3 Breakpoint Status Breakpoint Control 7 CCR = Configuration C ontrol Register 7 0 23 RCR0 RCR1 RCR2 RCR3 RCR4 RCR5 RCR6 RCR7 Address Address Address Address Address Address Region Region Region Region Region Region CR0 CR2 CR3 Control Registers 0 GDTR IDTR LDTR TR Descriptor Table Registers Task Register 0 DR0 DR1 DR2 DR3 DR6 DR7 Debug Registers 0 CCR0 CCR1 CCR2 CCR3 CCR4 CCR5 ARR0 CCR0 CCR1 CCR2 CCR3 CCR4 CCR5 Register 0 Register 1 Register 2 Register 3 Register 4 Register 5 ARR1 ARR2 Configuration Registers ARR3 ARR4 ARR5 ARR6 ARR7 DIR0 DIR1 Address Region Register 6 Address Region Register 7 DIR0 RCR = Region Control Register DIR1 DIR = Device Identification R eg ister 31 0 Cache Test Cache Test Cache Test TLB Test Control TLB Test Status TR3 TR4 TR5 TR6 TR7 Test Registers 17 282 00 2-12 ST6x86 2.4.1 Control Registers The Control Registers (CR0, CR2 and CR3), are shown in Figure 2-6. The CR0 register contains system control bits which configure operating modes and indicate the general state of the CPU. The lower 16 bits of CR0 are referred to as the Machine Status Word (MSW). The CR0 bit definitions are described in Table 2.4 and Table 2.5. The reserved bits in CR0 should not be modified. When paging is enabled and a page fault is generated, the CR2 register retains the 32-bit linear address of the address that caused the fault. When a double page fault occurs, CR2 contains the address for the second fault. Register CR3 contains the 20 most significant bits of the physical base address of the page directory. The page directory must always be aligned to a 4-KByte page boundary, therefore, the lower 12 bits of CR3 are not required to specify the base address. CR3 contains the Page Cache Disable (PCD) and Page Write Through (PWT) bits. During bus cycles that are not paged, the state of the PCD bit is reflected on the PCD pin and the PWT bit is driven on the PWT pin. These bus cycles include interrupt acknowledge cycles and all bus cycles, when paging is not enabled. The PCD pin should be used to control caching in an external cache. The PWT pin should be used to control write policy in an external cache. Figure 2.6. Control Registers F l ag s 3 1 0 2 4 0 0 0 0 0 0 0 I d e n t i fi c a t io n A l ig n m e n t C h e c k V ir t u a l 8 0 8 6 M o d e R e su m e F lag N e s t e d T a s k F la g I /O P r i v ile g e L e v e l O v e r f lo w D i r e c t io n F l a g I n t e r r u p t E n a b le T ra p F lag S i g n F la g Z e ro F lag A u x il ia r y C a r r y P a r it y F l a g C a r r y F la g 0 2 3 0 0 0 2 1 I D 1 9 0 0 1 8 A C 1 7 V M 1 6 R F 1 5 0 1 4 N T 1 3 1 2 IO PL 1 1 O F 1 0 D F 9 I F 8 T F 7 S F 6 Z F 5 0 4 A F 3 0 2 P F 1 1 0 C F S S S D S S A C S D A A A A A A = A r ith m e tic F l a g , D = D e b u g F l a g , 0 o r 1 In d ic a te s R e s e r v e d S = S y s te m F la g , C = C o n tr o l F l a g 1 7 0 11 0 5 2-13 ST6x86 Table 2.4. CRO Bit Definitions BIT POSITION NAME FUNCTION 0 PE Protected Mode Enable: Enables the segment based protection mechanism. If PE=1, protected mode is enabled. If PE=0, the CPU operates in real mode and addresses are formed as in an 8086-style CPU. 1 MP Monitor Processor Extension: If MP=1 and TS=1, a WAIT instruction causes Device Not Available (DNA) fault 7. The TS bit is set to 1 on task switches by the CPU. Floating point instructions are not affected by the state of the MP bit. The MP bit should be set to one during normal operations. 2 EM Emulate Processor Extension: If EM=1, all floating point instructions cause a DNA fault 7. 3 TS Task Switched: Set whenever a task switch operation is performed. Execution of a floating point instruction with TS=1 causes a DNA fault. If MP=1 and TS=1, a WAIT instruction also causes a DNA fault. 4 1 5 NE Numerics Exception. NE=1 to allow FPU exceptions to be handled by interrupt 16. NE=0 if FPU exceptions are to be handled by external interrupts. 16 WP Write Protect: Protects read-only pages from supervisor write access. WP=0 allows a read-only page to be written from privilege level 0-2. WP=1 forces a fault on a write to a read-only page from any privilege level. 18 AM Alignment Check Mask: If AM=1, the AC bit in the EFLAGS register is unmasked and allowed to enable alignment check faults. Setting AM=0 prevents AC faults from occurring. NW Not Write-Back: If NW=1, the on-chip cache operates in write-through mode. In write-through mode, all writes (including cache hits) are issued to the external bus. If NW=0, the on-chip cache operates in write-back mode. In write-back mode, writes are issued to the external bus only for a cache miss, a line replacement of a modified line, or as the result of a cache inquiry cycle. 30 CD Cache Disable: If CD=1, no further cache line fills occur. However, data already present in the cache continues to be used if the requested address hits in the cache. Writes continue to update the cache and cache invalidations due to inquiry cycles occur normally. The cache must also be invalidated to completely disable any cache activity. 31 PG Paging Enable Bit: If PG=1 and protected mode is enabled (PE=1), paging is enabled. After changing the state of PG, software must execute an unconditional branch instruction (e.g., JMP, CALL) to have the change take effect. 29 Reserved: Do not attempt to modify. Table 2.5. Effects of Various Combinations of EM, TS, and MP Bits CR0 BIT 2-14 INSTRUCTION TYPE EM TS MP WAIT ESC 0 0 0 Execute Execute 0 0 1 Execute Execute 0 1 0 Execute Fault 7 0 1 1 Fault 7 Fault 7 1 0 0 Execute Fault 7 1 0 1 Execute Fault 7 1 1 0 Execute Fault 7 1 1 1 Fault 7 Fault 7 ST6x86 Table 2.6 Descriptor Table Registers and Descriptors Descriptor Table Registers The Global, Interrupt, and Local Descriptor Table Registers (GDTR, IDTR and LDTR), shown in Figure 2.7, are used to specify the location of the data structures that control segmented memory management. The GDTR, IDTR and LDTR are loaded using the LGDT, LIDT and LLDT instructions, respectively. The values of these registers are stored using the corresponding store instructions. The GDTR and IDTR load instructions are privileged instructions when operating in protected mode. The LDTR can only be accessed in protected mode. The Global Descriptor Table Register (GDTR) holds a 32-bit linear base address and 16-bit limit for the Global Descriptor Table (GDT). The GDT is an array of up to 8192 8-byte descriptors. When a segment register is loaded from memory, the TI bit in the segment selector chooses either the GDT or the Local Descriptor Table (LDT) to locate a descriptor. If TI = 0, the index portion of the selector is used to locate the descriptor within the GDT table. The contents of the GDTR are completely visible to the programmer by using a SGDT instruction. The first descriptor in the GDT (location 0) is not used by the CPU and is referred to as the "null descriptor". The GDTR is initialized using a LGDT instruction. The Interrupt Descriptor Table Register (IDTR) holds a 32-bit linear base address and 16-bit limit for the Interrupt Descriptor Table (IDT). The IDT is an array of 256 interrupt descriptors, each of which is used to point to an interrupt service routine. Every interrupt that may occur in the system must have an associated entry in the IDT. The contents of the IDTR are completely visible to the programmer by using a SIDT instruction. The IDTR is initialized using the LIDT instruction. The Local Descriptor Table Register (LDTR) holds a 16-bit selector for the Local Descriptor Table (LDT). The LDT is an array of up to 8192 8-byte descriptors. When the LDTR is loaded, the LDTR selector indexes an LDT descriptor that must reside in the Global Descriptor Table (GDT). The base address and limit are loaded automatically and cached from the LDT descriptor within the GDT. Figure 2.7. Descriptor Table Registers 47 16 15 0 BASE ADDRESS LIMIT GDTR BASE ADDRESS LIMIT IDTR SELECTOR LDTR 1708003 2-15 ST6x86 Subsequent access to entries in the LDT use the hidden LDTR cache to obtain linear addresses. If the LDT descriptor is modified in the GDT, the LDTR must be reloaded to update the hidden portion of the LDTR. When a segment register is loaded from memory, the TI bit in the segment selector chooses either the GDT or the LDT to locate a segment descriptor. If TI = 1, the index portion of the selector is used to locate a given descriptor within the LDT. Each task in the system may be given its own LDT, managed by the operating system. The LDTs provide a method of isolating a given task's segments from other tasks in the system. The LDTR can be read or written by the LLDT and SLDT instructions. Descriptors There are three types of descriptors: - Application Segment Descriptors that define code, data and stack segments. - System Segment Descriptors that define an LDT segment or a Task State Segment (TSS) table described later in this text. - Gate Descriptors that define task gates, interrupt gates, trap gates and call gates. Application Segment Descriptors can be located in either the LDT or GDT. System Segment Descriptors can only be located in the GDT. Dependent on the gate type, gate descriptors may be located in either the GDT, LDT or IDT. Figure 2.8 illustrates the descriptor format for both Application Segment Descriptors and System Segment Descriptors. Table 2.7 lists the corresponding bit definitions. Figure 2.8. Application and System Segment Descriptors 31 BASE 31-24 24 23 22 21 20 19 16 15 14 13 12 11 A D TYPE G D 0 V LIMIT 19-16 P DPL T L BASE 15-0 LIMIT 15-0 87 0 BASE 23-16 +4 +0 1707803 2-16 ST6x86 Table 2.7. Segment Descriptor Bit Definitions BIT POSITION MEMORY OFFSET NAME 31-24 7-0 31-16 +4 +4 +0 BASE Segment base address. 32-bit linear address that points to the beginning of the segment. 19-16 15-0 +4 +0 LIMIT Segment limit. 23 +4 G Limit granularity bit: 0 = byte granularity, 1 = 4 KBytes (page) granularity. 22 +4 D Default length for operands and effective addresses. Valid for code and stack segments only: 0 = 16 bit, 1 = 32-bit. 20 +4 AVL Segment available. Segment present. DESCRIPTION 15 +4 P 14-13 +4 DPL Descriptor privilege level. 12 +4 DT Descriptor type: 0 = system, 1 = application. 11-8 +4 TYPE Segment type. See Tables 2-7 and 2-8. Table 2.8. TYPE Field Definitions with DT = 0 TYPE (BITS 11-8) DESCRIPTION 0001 TSS-16 descriptor, task not busy. 0010 LDT descriptor. 0011 TSS-16 descriptor, task busy. 1001 TSS-32 descriptor, task not busy 1011 TSS-32 descriptor, task busy. 2-17 ST6x86 Table 2.9. TYPE Field Definitions with DT = 1 TYPE APPLICATION DESCRIPTOR INFORMATION E C/D R/W A 0 0 x x data, expand up, limit is upper bound of segment 0 1 x x data, expand down, limit is lower bound of segment 1 0 x x executable, non-conforming 1 1 x x executable, conforming (runs at privilege level of calling procedure) 0 x 0 x data, non-writable 0 x 1 x data, writable 1 x 0 x executable, non-readable 1 x 1 x executable, readable x x x 0 not-accessed x x x 1 accessed 2-18 ST6x86 Gate Descriptors provide protection for executable segments operating at different privilege levels. Figure 2.9 illustrates the format for Gate Descriptors and Table 2.10 lists the corresponding bit definitions. Interrupt Gate Descriptors are used to enter a hardware interrupt service routine. Trap Gate Descriptors are used to enter exceptions or software interrupt service routines. Trap Gate and Interrupt Gate Descriptors can only be located in the IDT. Task Gate Descriptors are used to switch the CPU's context during a task switch. The selector portion of the task gate descriptor locates a Task State Segment. These descriptors can be located in the GDT, LDT or IDT tables. Call Gate Descriptors are used to enter a procedure (subroutine) that executes at the same or a more privileged level. A Call Gate Descriptor primarily defines the procedure entry point and the procedure's privilege level. Figure 2.9. Gate Descriptor 31 16 15 14 OFFSET 31-16 P 13 12 11 DPL 0 SELECTO R 15-0 8 7 TYPE 0 0 0 0 PARAMETERS OFFSET 15-0 +4 +0 1707903 Table 2.10. Gate Descriptor Bit Definitions BIT POSITION MEMORY OFFSET NAME 31-16 15-0 +4 +0 OFFSET 31-16 +0 SELECTOR 15 +4 P 14-13 +4 DPL Descriptor privilege level. Segment type: 0100 = 16-bit call gate 0101 = task gate 0110 = 16-bit interrupt gate 0111 = 16-bit trap gate 1100 = 32-bit call gate 1110 = 32-bit interrupt gate 1111 = 32-bit trap gate. 11-8 +4 TYPE 4-0 +4 PARAMETERS DESCRIPTION Offset used during a call gate to calculate the branch target. Segment selector used during a call gate to calculate the branch target. Segment present. Number of 32-bit parameters to copy from the caller's stack to the called procedure's stack (valid for calls). 2-19 ST6x86 2.4.2 Task Register The Task Register (TR) holds a 16-bit selector for the current Task State Segment (TSS) table as shown in Figure 2.10. The TR is loaded and stored via the LTR and STR instructions, respectively. The TR can only be accessed during protected mode and can only be loaded when the privilege level is 0 (most privileged). When the TR is loaded, the TR selector field indexes a TSS descriptor that must reside in the Global Descriptor Table (GDT). The contents of the selected descriptor are cached on-chip in the hidden portion of the TR. During task switching, the processor saves the current CPU state in the TSS before starting a new task. The TR points to the current TSS. The TSS can be either a 386/486-style 32-bit TSS Figure 2.11 or a 286-style 16-bit TSS type Figure 2.12. An I/O permission bit map is referenced in the 32-bit TSS by the I/O Map Base Address. Figure 2.10. Task Register 15 0 SELECTOR 1708103 2-20 ST6x86 Figure 2.11. 32-Bit Task State Segment (TSS) Table 31 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 I/O MAP 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 BASE 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 = RESERVED 0 16 15 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 T 0 SELECTOR FOR TASK'S LDT 0 GS 0 FS 0 DS 0 SS 0 CS 0 ES EDI ESI EBP ESP EBX EDX ECX EAX EFLAGS EIP CR3 0 0 0 0 SS for CPL = 2 ESP for CPL = 2 0 0 0 0 SS for CPL = 1 ESP for CPL = 1 0 0 0 0 SS for CPL = 0 ESP for CPL = 0 0 0 0 0 BACK LINK (OLD TSS SELECTOR) ADDRESS 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 +64h +60h +5Ch +58h +54h +50h +4Ch +48h +44h +40h +3Ch +38h +34h +30h +2Ch +28h +24h +20h +1Ch +18h +14h +10h +Ch +8h +4h +0h 1708203 2-21 ST6x86 Figure 2.12. 16-Bit Task State Segment (TSS) Table SE LEC TO R FO R T AS K'S LDT DS +28 h SS +26 h CS +24 h ES +22 h DI +20 h SI +1E h BP +16 h SP +1A h BX +18 h DX +16 h CX +14 h AX +12 h FL AG S +10 h IP +E h S S FO R P RIVILEG E LEV EL 2 +C h SP F O R P RIVILEG E LEV EL 2 +A h SS F O R P RIVILEG E LEV EL 1 +8h SP F O R P RIVILEG E LEV EL 1 +6h S S FO R P RIVILEG E LEV EL 0 +4h SP F O R P RIVILEG E LEV EL 0 +2h BA CK LIN K (O LD T SS SE LE C T OR ) +0h 170880 3 2-22 +2A h ST6x86 2.4.3 6x86 Configuration Registers A set of 24 on-chip ST6x86 configuration registers are used to enable features in the ST6x86 CPU. These registers assign non-cached memory areas, set up SMM, provide CPU identification information and control various features such as cache write policy, and bus locking control. There are four groups of registers within the ST6x86 configuration register set: - 6 Configuration Control Registers (CCRx) - 8 Address Region Registers (ARRx) - 8 Region Control Registers (RCRx) - 2 Device Identification Registers (DIRx) Access to the configuration registers is achieved by writing the register index number for the configuration register to I/O port 22h. I/O port 23h is then used for data transfer. Each I/O port 23h data transfer must be preceded by a valid I/O port 22h register index selection. Otherwise, the current 22h, and the second and later I/O port 23h operations communicate through the I/O port to produce external I/O cycles. All reads from I/O port 22h produce external I/O cycles. Accesses that hit within the on-chip configuration registers do not generate external I/O cycles. After reset, configuration registers with indexes CO-CFh and FE-FFh are accessible. To prevent potential conflicts with other devices which may use ports 22 and 23h to access their registers, the remaining registers (indexes D0-FDh) are accessible only if the MAPEN(3-0) bits in CCR3 are set to 1h. See Figure 2.16 for more information on the MAPEN(3-0) bit locations. If MAPEN[3-0] = 1h, any access to indexes in the range 00-FFh will not create external I/O bus cycles. Registers with indexes C0-CFh, FE, FFh are accessible regardless of the state of MAPEN[3-0]. If the register index number is outside the C0-CFh or FE-FFh ranges, and MAPEN[3-0] are set to 0h, external I/O bus cycles occur. Table 2.11 lists the MAPEN[3-0] values required to access each ST6x86 configuration register. All bits in the configuration registers are initialized to zero following reset unless specified otherwise. Valid register index numbers include C0h to E3h, E8h, E9h, FEh and FFh (if MAPEN[3-0] = 1). 2.4.3.1 Configuration Control Registers (CCR0 - CCR5) control several functions, including non-cacheable memory, write-back regions, and SMM features. A list of the configuration registers is listed in Table 2.11. The configuration registers are described in greater detail in the following pages. 2-23 ST6x86 Table 2.11. ST6x86 CPU Configuration Registers ACRONYM REGISTER INDEX WIDTH (Bits) MAPEN VALUE NEEDED FOR ACCESS Configuration Control 0 CCR0 C0h 8 x Configuration Control 1 CCR1 C1h 8 x Configuration Control 2 CCR2 C2h 8 x Configuration Control 3 CCR3 C3h 8 x Configuration Control 4 CCR4 E8h 8 1 Configuration Control 5 CCR5 E9h 8 1 Address Region 0 ARR0 C4h - C6h 24 x Address Region 1 ARR1 C7h - C9h 24 x Address Region 2 ARR2 CAh - CCh 24 x Address Region 3 ARR3 CDh - CFh 24 x Address Region 4 ARR4 D0h - D2h 24 1 Address Region 5 ARR5 D3h - D5h 24 1 Address Region 6 ARR6 D6h - D8h 24 1 Address Region 7 ARR7 D9h - DBh 24 1 Region Control 0 RCR0 DCh 8 1 Region Control 1 RCR1 DDh 8 1 Region Control 2 RCR2 DEh 8 1 Region Control 3 RCR3 DFh 8 1 Region Control 4 RCR4 E0h 8 1 Region Control 5 RCR5 E1h 8 1 Region Control 6 RCR6 E2h 8 1 Region Control 7 RCR7 E3h 8 1 Device Identification 0 DIR0 FEh 8 x Device Identification 1 DIR1 FFh 8 x REGISTER NAME Note: x = Don't Care 2-24 ST6x86 Figure 2.13. ST6x86 Configuration Control Register 0 (CCRO) 7 6 5 4 3 2 1 0 Reserved Reserved Reserved Reserved Reserved Reserved NC1 Reserved Table 2.12. CCRO Bit Definitions BIT POSITION NAME 1 NC1 DESCRIPTION No Cache 640 KByte - 1 MByte If = 1: Address region 640 KByte to 1 MByte is non-cacheable. If = 0: Address region 640 KByte to 1 MByte is cacheable. Note: Bits 0, 2 through 7 are reserved. 2-25 ST6x86 Figure 2.14. ST6x86 Configuration Control Register 1 (CCR1) 7 6 5 4 3 2 1 0 SM3 Reserved Reserved NO_LOCK Reserved SMAC USE_SMI Reserved Table 2.13. CCR1 Bit Definitions BIT POSITION NAME DESCRIPTION USE_SMI Enable SMM and SMIACT# Pins If = 1: SMI# and SMIACT# pins are enabled. If = 0: SMI# pin ignored and SMIACT# pin is driven inactive. SMAC System Management Memory Access If = 1: Any access to addresses within the SMM address space, access system management memory instead of main memory. SMI# input is ignored. Used when initializing or testing SMM memory. If = 0: No effect on access. 4 NO_LOCK Negate LOCK# If = 1: All bus cycles are issued with LOCK# pin negated except page table accesses and interrupt acknowledge cycles. Interrupt acknowledge cycles are executed as locked cycles even though LOCK# is negated. With NO_LOCK set, previously noncacheable locked cycles are executed as unlocked cycles and therefore, may be cached. This results in higher performance. Refer to Region Control Registers for information on eliminating locked CPU bus cycles only in specific address regions. 7 SM3 SMM Address Space Address Region 3 If = 1: Address Region 3 is designated as SMM address space. 1 2 Note: Bits 0, 3, 5 and 6 are reserved. 2-26 ST6x86 Figure 2.15. ST6x86 Configuration Control Register 2 (CCR2) 7 6 5 4 3 2 1 0 USE_SUSP Reserved Reserved WPR1 SUSP_HLT LOCK_NW Reserved Reserved Table 2.14. CCR2 Bit Definitions BIT POSITION NAME DESCRIPTION 2 LOCK_NW Lock NW If = 1: NW bit in CR0 becomes read only and the CPU ignores any writes to the NW bit. If = 0: NW bit in CR0 can be modified. 3 SUSP_HLT Suspend on Halt If = 1: Execution of the HLT instruction causes the CPU to enter low power suspend mode. 4 WPR1 Write-Protect Region 1 If = 1: Designates any cacheable accesses in 640 KByte to 1 MByte address region are write protected. 7 USE_SUSP Use Suspend Mode (Enable Suspend Pins) If = 1: SUSP# and SUSPA# pins are enabled. If = 0: SUSP# pin is ignored and SUSPA# pin floats. Note: Bits 0,1, 5 and 6 are reserved. 2-27 ST6x86 Figure 2.16. ST6x86 Configuration Control Register 3 (CCR3) 7 6 5 MAPEN 4 3 2 1 0 Reserved LINBRST NMI_EN SMI_LOCK Table 2.15. CCR3 Bit Definitions BIT POSITION NAME SMI_LOCK SMI Lock If = 1: The following SMM configuration bits can only be modified while in an SMI service routine: CCR1: USE_SMI, SMAC, SM3 CCR3: NMI_EN ARR3: Starting address and block size. Once set, the features locked by SMI_LOCK cannot be unlocked until the RESET pin is asserted. 1 NMI_EN NMI Enable If = 1: NMI interrupt is recognized while servicing an SMI interrupt. NMI_EN should be set only while in SMM, after the appropriate SMI interrupt service routine has been setup. 2 LINBRST If = 1: Use linear address sequence during burst cycles. If = 0: Use "1 + 4" address sequence during burst cycles. The "1 + 4" address sequence is compatible with Pentium's burst address sequence. MAPEN MAP Enable If = 1h: All configuration registers are accessible. If = 0h: Only configuration registers with indexes C0-CFh, FEh and FFh are accessible. 0 4-7 Note: Bit 3 is reserved. 2-28 DESCRIPTION ST6x86 Figure 2.17. ST6x86 Configuration Control Register 4 (CCR4) 7 6 5 4 3 2 CPUID Reserved Reserved DTE_EN Reserved 1 0 IORT Table 2.16. CCR4 Bit Definitions BIT POSITION 0-2 4 7 NAME DESCRIPTION IORT I/O Recovery Time Specifies the minimum number of bus clocks between I/O accesses: 0h = 1 clock delay 1h = 2 clock delay 2h = 4 clock delay 3h = 8 clock delay 4h = 16 clock delay 5h = 32 clock delay (default value after RESET) 6h = 64 clock delay 7h = no delay DTE_EN Enable Directory Table Entry Cache If = 1: the Directory Table Entry cache is enabled. CPUID Enable CPUID instruction. If = 1: the ID bit in the EFLAGS register can be modified and execution of the CPUID instruction occurs as documented in section 6.3. If = 0: the ID bit in the EFLAGS register can not be modified and execution of the CPUID instruction causes an invalid opcode exception. Note: Bits 3 and bits 5 and 6 are reserved. 2-29 ST6x86 Figure 2.18. ST6x86 Configuration Control Register 5 (CCR5) 7 6 5 4 3 2 1 0 Reserved Reserved ARREN LBR1 Reserved Reserved Reserved WT_ALLOC Table 2.17. CCR5 Bit Definitions BIT POSITION NAME DESCRIPTION 0 WT_ALLOC Write-Through Allocate If = 1: New cache lines are allocated for read and write misses. If = 0: New cache lines are allocated only for read misses. 4 LBR1 Local Bus Region 1 If = 1: LBA# pin is asserted for all accesses to the 640 KByte to 1 MByte address region. ARREN Enable ARR Registers If = 1: Enables all ARR registers. If = 0: Disables the ARR registers. If SM3 is set, ARR3 is enabled regardless of the setting of ARREN. 5 Note: Bits 1 through 3 and 6 though 7 are reserved. 2-30 ST6x86 2.4.3.2 Address Region Registers The Address Region Registers (ARR0 - ARR7) (Figure 2.19) are used to specify the location and size for the eight address regions. Attributes for each address region are specified in the Region Control Registers (RCR0-RCR7). ARR7 and RCR7 are used to define system main memory and differ from ARR0-6 and RCR0-6. With non-cacheable regions defined on-chip, the ST6x86 CPU delievers optimum performance by using advanced techniques to eliminate data dependencies and resource conflicts in its execution pipelines. If KEN# is active for accesses to regions defined as non-cacheable by the RCRs, the region is not cached. The RCRs take precedence in this case. A register index, shown in Table 2.18 is used to select one of three bytes in each ARR. The starting address of the ARR address region, selected by the START ADDRESS field, must be on a block size boundary. For example, a 128 KByte block is allowed to have a starting address of 0 KBytes, 128 KBytes, 256 KBytes, and so on. The SIZE field bit definition is listed in Table 2.19. If the SIZE field is zero, the address region is of zero size and thus disabled. Figure 2.19. Address Region Registers (ARR0 - ARR7) START ADDRESS Memory Address Bits A31-A24 7 SIZE Memory Address Bits A23-A16 0 7 Memory Address Bits A15-A12 0 7 4 Size Bits 3-0 3 0 2-31 ST6x86 Table 2.18. ARR0 - ARR7 Register Index Assignment ARR Register Memory Address (A31 - A24) Memory Address (A23 - A16) Memory Address (A15 - A12) Address Region Size (3 - 0) ARR0 C4h C5h C6h C6h ARR1 C7h C8h C9h C9h ARR2 CAh CBh CCh CCh ARR3 CDh CEh CFh CFh ARR4 D0h D1h D2h D2h ARR5 D3h D4h D5h D5h ARR6 D6h D7h D8h D8h ARR7 D9h DAh DBh DBh Table 2.19. Bit Definitions for SIZE Field BLOCK SIZE SIZE (3-0) ARR0-6 . 2-32 BLOCK SIZE SIZE (3-0) ARR7 ARR0-6 ARR7 0h Disabled Disabled 8h 512 KBytes 32 MBytes 1h 4 KBytes 256 KBytes 9h 1 MBytes 64 MBytes 2h 8 KBytes 512 KBytes Ah 2 MBytes 128 MBytes 3h 16 KBytes 1 MBytes Bh 4 MBytes 256 MBytes 4h 32 KBytes 2 MBytes Ch 8 MBytes 512 MBytes 5h 64 KBytes 4 MBytes Dh 16 MBytes 1 GBytes 6h 128 KBytes 8 MBytes Eh 32 MBytes 2 GBytes 7h 256 KBytes 16 MBytes Fh 4 GBytes 4 GBytes ST6x86 2.4.3.3 Region Control Registers The Region Control Registers (RCR0 - RCR7) specify the attributes associated with the ARRx address regions. The bit definitions for the region control registers are shown in Figure 2.20 and in Table 2.20. Cacheability, weak write ordering, weak locking, write gathering, cache write through policies and control of the LBA# pin can be activated or deactivated using the attribute bits. If an address is accessed that is not in a memory region defined by the ARRx registers, the following conditions will apply: Overlapping Conditions Defined. If two regions specified by ARRx registers overlap and conflicting attributes are specified, the following attributes take precedence: - LBA# pin is asserted Write-back is disabled Writes are not gathered Strong locking takes place Strong write ordering takes place The overlapping regions are non-cacheable. - LBA# pin is asserted - If the memory address is cached, write-back is enabled if WB/WT# is returned high. - Writes are not gathered - Strong locking takes place - Strong write ordering takes place - The memory access is cached, if KEN# is returned asserted. 2-33 ST6x86 Figure 2.20. Region Control Registers (RCR0-RCR7) 7 6 5 4 3 2 1 0 Reserved Reserved NLB WT WG WL WWO RCD / RCE* *Note: RCD is defined for RCR0-RCR6. RCE is defined for RCR7. Table 2.20. RCR0-RCR7 Bit Definitions RCRx BIT POSITION 0- 6 0 RCD If = 1: Disables caching for address region specified by ARRx. 7 0 RCE If = 1: Enables caching for address region ARR7. 0- 7 1 WWO If = 1: Weak write ordering for address region specified by ARRx. 0- 7 2 WL If = 1: Weak locking for address region specified by ARRx. 0- 7 3 WG If = 1: Write gathering for address region specified by ARRx. 0- 7 4 WT If = 1: Address region specified by ARRx is write-through. 0- 7 5 NLB If = 1:LBA# pin is not asserted for access to address region specified by ARRx NAME DESCRIPTION Note: Bits 6 and 7 are reserved. Region Cache Disable (RCD). Setting RCD to a one defines the address region as non-cacheable. Whenever possible, the RCRs should be used to define non-cacheable regions rather than using external address decoding and driving the KEN# pin. Region Cache Enable (RCE). Setting RCE to a one defines the address region as cacheable. RCE is used to define the system main memory as cacheable memory. It is implied that memory outside the region is non-cacheable. Weak Write Ordering (WWO). Setting WWO=1 enables weak write ordering for that address region. Enabling WWO allows the ST6x86 CPU to issue writes in its internal cache in an order different than their order in the code stream. External writes always occur in order (strong ordering). Therefore, this should only be enabled for memory regions that are NOT sensitive to this condition. WWO should not be enabled for memory mapped I/O. WWO only applies to memory regions that have been cached and designated as write-back. It also applies to previously cached addresses even if the cache has been disabled (CD=1). Enabling WWO removes the write-ordering restriction and improves performance due to reduced pipeline stalls. 2-34 Weak Locking (WL). Setting WL=1 enables weak locking for that address region. With WL enabled, all bus cycles are issued with the LOCK# pin negated except for page table accesses and interrupt acknowledge cycles. Interrupt acknowledge cycles are executed as locked cycles even though LOCK# is negated. With WL=1, previously non-cacheable locked cycles are executed as unlocked cycles and therefore, may be cached, resulting in higher performance. The NO_LOCK bit of CCR1 enables weak locking for the entire address space. The WL bit allows weak locking only for specific address regions. WL is independent of the cacheability of the address region. ST6x86 Write Gathering (WG). Setting WG=1 enables write gathering for the associated address region. Write gathering allows multiple byte, word, or dword sequential address writes to accumulate in the on-chip write buffer. (As instructions are executed, the results are placed in a series of output buffers. These buffers are gathered into the final output buffer). When access is made to a non-sequential memory location or when the 8-byte buffer becomes full, the contents of the buffer are written on the external 64-bit data bus. Performance is enhanced by avoiding as many as seven memory write cycles. WG should not be used on memory regions that are sensitive to write cycle gathering. WG can be enabled for both cacheable and non-cacheable regions. Write Through (WT). Setting WT=1 defines the address region as write-through instead of write-back, assuming the region is cacheable. Regions where system ROM are loaded (shadowed or not) should be defined as write-through. LBA# Not Asserted (NLB). Setting NLB=1 prevents the microprocessor from asserting the Local Bus Access (LBA#) output pin for accesses to that address region. The RCR regions may be used to define non-local bus address regions. The LBA# pin could then be asserted for all regions, except those defined by the RCRs. The LBA# signal may be used by the external hardware (e.g., chipsets) as an indication that local bus accesses are occurring. 2-35 ST6x86 2.4.3.4 Device Identification Registers The Device Identification Registers (DIR0, DIR1) contain CPU identification, CPU stepping and CPU revision information. Bit definitions are shown in (Figure 2.21), Table 2.21, Figure 2.22 and Table 2.22 respectively. Data in these registers cannot be changed. These registers can be read by using I/O ports 22 and 23. The register index for DIR0 is FEh and the register index for DIR1 is FFh. Figure 2.21. Device Identification Register 0 (DIR0) 7 0 DEVID Table 2.21. DiR0 Bit Definitions BIT POSITI ON 7-0 NAME DEVID DESCRIPTION CPU Device Identification Number (read only). Figure 2.22. Device Identification Register 1 (DIR1) 7 4 3 0 SID RID Table 2.22. DIR 1 Bit Definitions 2-36 BIT POSITION NAME 7-4 SID CPU Step Identification Number (read only). 3-0 RID CPU Revision Identification (read only). DESCRIPTION ST6x86 2.4.4 Debug Registers The Debug Address Registers (DR0-DR3) each contain the linear address for one of four possible breakpoints. Each breakpoint is further specified by bits in the Debug Control Register (DR7). For each breakpoint address in DR0-DR3, there are corresponding fields L, R/W, and LEN in DR7 that specify the type of memory access associated with the breakpoint. Six debug registers (DR0-DR3, DR6 and DR7), shown in Figure 2.23, support debugging on the ST6x86 CPU. The bit definitions for the debug registers are listed in Table 2.23. Memory addresses loaded in the debug registers, referred to as "breakpoints", generate a debug exception when a memory access of the specified type occurs to the specified address. A data breakpoint can be specified for a particular kind of memory access such as a read or a write. Code breakpoints can also be set allowing debug exceptions to occur whenever a given code access (execution) occurs. The R/W field can be used to specify instruction execution as well as data access breakpoints. Instruction execution breakpoints are always taken before execution of the instruction that matches the breakpoint. The Debug Status Register (DR6) reflects conditions that were in effect at the time the debug exception occurred. The contents of the DR6 register are not automatically cleared by the processor after a debug exception occurs and, therefore, should be cleared by software at the appropriate time. The size of the debug target can be set to 1, 2, or 4 bytes. The debug registers are accessed via MOV instructions which can be executed only at privilege level 0. Figure 2.23. Debug Registers 3 3 1 0 2 9 LE N 3 R /W 3 LE N 2 R /W 2 LE N 1 R/W 1 0 0 0 0 0 0 0 2 8 0 2 7 2 6 0 2 5 2 4 0 2 3 2 2 2 1 0 2 1 0 9 0 1 8 LEN 0 0 0 1 7 1 1 6 5 1 4 1 1 1 1 3 2 1 0 9 8 7 6 5 4 3 2 1 0 L G E 3 L 3 G L 2 2 G 1 L 1 G 0 L 0 1 B 3 B 2 B 1 B 0 D R6 R/W 0 0 0 G G D 0 0 1 E 0 B T B S 0 0 0 1 1 1 1 1 1 1 D R7 BRE A KP OINT 3 LINE A R A DD R ESS D R3 B RE AK POINT 2 LINE A R A DDR E SS D R2 B REA KP OINT 1 LINE A R ADD R ESS D R1 B REA KP OINT 0 LINE A R ADD R ESS D R0 ALL BITS MARKED AS 0 OR 1 ARE RESERVED AND SHOULD NOT BE MODIFIED. 1703203 2-37 ST6x86 Code execution breakpoints may also be generated by placing the breakpoint instruction (INT 3) at the location where control is to be regained. Additionally, the single-step feature may be enabled by setting the TF flag in the EFLAGS register. This causes the processor to perform a debug exception after the execution of every instruction. Table 2.23. DR6 and DR7 Debug Register Field Definitions REGISTER DR6 NUMBER OF BITS DESCRIPTION Bi 1 Bi is set by the processor if the conditions described by DRi, R/Wi, and LENi occurred when the debug exception occurred, even if the breakpoint is not enabled via the Gi or Li bits. BT 1 BT is set by the processor before entering the debug handler if a task switch has occurred to a task with the T bit in the TSS set. BS 1 BS is set by the processor if the debug exception was triggered by the single-step execution mode (TF flag in EFLAGS set). 2 Specifies type of break for the linear address in DR0, DR1, DR3, DR4: 00 - Break on instruction execution only 01 - Break on data writes only 10 - Not used 11 - Break on data reads or writes. LENi 2 Specifies length of the linear address in DR0, DR1, DR3, DR4: 00 - One byte length 01 - Two byte length 10 - Not used 11 - Four byte length. Gi 1 If set to a 1, breakpoint in DRi is globally enabled for all tasks and is not cleared by the processor as the result of a task switch. Li 1 If set to a 1, breakpoint in DRi is locally enabled for the current task and is cleared by the processor as the result of a task switch. GD 1 Global disable of debug register access. GD bit is cleared whenever a debug exception occurs. FIELD R/Wi DR7 38 ST6x86 2.4.5 Test Registers The test registers can be used to test the on-chip unified cache and to test the main TLB. The test registers are also used to enable ST6x86 CPU variable-size paging. Test registers TR3, TR4, and TR5 are used to test the unified cache. Use of these registers is described with the memory caches later in this chapter in Section 2.7.1.1. Test registers TR6 and TR7 are used to test the TLB. Use of these test registers is described in Section 2.6.4.2. 2-39 ST6x86 2.5 Address Space The ST6x86 CPU can directly address 64 KBytes of I/O space and 4 GBytes of physical memory (Fig. 2.24). Access can be made to memory addresses between 0000 0000h and Memory Address Space. FFFF FFFFh. This 4 GByte memory space can be accessed using byte, word (16 bits), or doubleword (32 bits) format. Words and doublewords are stored in consecutive memory bytes with the low-order byte located in the lowest address. The physical address of a word or doubleword is the byte address of the low-order byte. Figure 2.24. Memory and I/O Address Spaces Physical Memory Space FFFF FFFFh I/O Adress Space FFFF F FFFh Not Accessible Physical Memory 4 GBytes 0000 FFF Fh 64 KBytes 0000 0000h 0000 0000h 6x86 Configuration Register I/O Space 0000 0023h 0000 0022h 1730900 2-40 ST6x86 I/O Address Space 2.6 Memory Addressing Methods The ST6x86 I/O address space is accessed using IN and OUT instructions to addresses referred to as "ports". The accessible I/O address space size is 64 KBytes and can be accessed through 8-bit, 16-bit or 32-bit ports. The execution of any IN or OUT instruction causes the M/IO# pin to be driven low, thereby selecting the I/O space instead of memory space. With the ST6x86 CPU, memory can be addressed using nine different addressing modes (Table 2.24). These addressing modes are used to calculate an offset address often referred to as an effective address. Depending on the operating mode of the CPU, the offset is then combined using memory management mechanisms to create a physical address that actually addresses the physical memory devices. The accessible I/O address space ranges between locations 0000 0000h and 0000 FFFFh (64 KBytes). The I/O locations (ports) 22h and 23h can be used to access the ST6x86 configuration registers. Memory management mechanisms on the ST6x86 CPU consist of segmentation and paging. Segmentation allows each program to use several independent, protected address spaces. Paging supports a memory subsystem that simulates a large address space using a small amount of RAM and disk storage for physical memory. Either or both of these mechanisms can be used for management of the ST6x86 CPU memory address space. 2-41 ST6x86 2.6.1 Offset Mechanism Figure 2.25. Offset Address Calculation The offset mechanism computes an offset (effective) address by adding together one or more of three values: a base, an index and a displacement. When present, the base is the value of one of the eight 32-bit general registers. The index if present, like the base, is a value that is in one of the eight 32-bit general purpose registers (not including the ESP register). The index differs from the base in that the index is first multiplied by a scale factor of 1, 2, 4 or 8 before the summation is made. The third component added to the memory address calculation is the displacement. The displacement is a value of up to 32-bits in length supplied as part of the instruction. Figure 2.25 illustrates the calculation of the offset address. Index Displacement Base Scaling x1, x2, x4, x8 Nine valid combinations of the base, index, scale factor and displacement can be used with the ST6x86 CPU instruction set. These combinations are listed in Table 2.24. The base and index both refer to contents of a register as indicated by [Base] and [Index]. Offset Address (Effective Address) 1706 603 Table 2.24. Memory Addressing Modules ADDRESSING MODE BASE INDEX SCALE DISPLACEMENT FACTOR (DP) (SF) Direct x Register Indirect x Based x OA = DP OA = [BASE] Index x Scaled Index x Based Index x x Based Scaled Index x x Based Index with Displacement x x Based Scaled Index with Displacement x x 2-42 OFFSET ADDRESS (OA) CALCULATION x x OA = [BASE] + DP x OA = [INDEX] + DP x OA = ([INDEX] * SF) + DP OA = [BASE] + [INDEX] x x OA = [BASE] + ([INDEX] * SF) x OA = [BASE] + [INDEX] + DP x OA = [BASE] + ([INDEX] * SF) + DP ST6x86 2.6.2 Memory Addressing Protected Mode Memory Addressing Real Mode Memory Addressing In protected mode, three mechanisms calculate a physical memory address (Figure 2.27). In real mode operation, the ST6x86 CPU only addresses the lowest 1 MByte of memory. To calculate a physical memory address, the 16-bit segment base address located in the selected segment register is multiplied by 16 and then the 16-bit offset address is added. The resulting 20-bit address is then extended. Three hexadecimal zeros are added as upper address bits to create the 32-bit physical address. Figure 2.26 illustrates the real mode address calculation. The addition of the base address and the offset address may result in a carry. Therefore, the resulting address may actually contain up to 21 significant address bits that can address memory in the first 64 KBytes above 1 MByte. - Offset Mechanism that produces the offset or effective address as in real mode. - Selector Mechanism that produces the base address. - Optional Paging Mechanism that translates a linear address to the physical memory address. The offset and base address are added together to produce the linear address. If paging is not enabled, the linear address is used as the physical memory address. If paging is enabled, the paging mechanism is used to translate the linear address into the physical address. The offset mechanism is described earlier in this section and applies to both real and protected mode. The selector and paging mechanisms are described in the following paragraphs. Figure 2.26. Real Mode Address Calculation 000h Offset Address 16 12 Offset Mechanism 20 Selected Segment Register 16 X 16 32 Linear Address (Physical Address) 20 1708304 Figure 2.27. Protected Mode Address Calculation 32 Offset Mechanism Offset Address 32 32 Selector Mechanism Segment Base Address Linear Address Optional Paging Mechanism 32 Physical Memory Address 1706504 2-43 ST6x86 2.6.3 Selector Mechanism Using segmentation, memory is divided into an arbitrary number of segments, each containing usually much less than the 232 byte (4 GByte) maximum. The six segment registers (CS, DS, SS, ES, FS and GS) each contain a 16-bit selector that is used when the register is loaded to locate a segment descriptor in either the global descriptor table (GDT) or the local descriptor table (LDT). The segment descriptor defines the base address, limit, and attributes of the selected segment and is cached on the ST6x86 CPU as a result of loading the selector. The cached descriptor contents are not visible to the programmer. When a memory reference occurs in protected mode, the linear address is generated by adding the segment base address in the hidden portion of the segment register to the offset address. If paging is not enabled, this linear address is used as the physical memory address. Figure 2.28 illustrates the operation of the selector mechanism. Figure 2.28. Selector Mechanism SELECTOR LOAD INSTRUCTION 15 Selector INDEX In Segment Register SEGMENT REGISTER SELECTED BY DECODED INSTRUCTION 0 TI RPL Segment Register Identification Segment Descriptor TI=0 Global Descriptor Table TI=1 Segment Register File and Descriptor Cache Segment Base Address Segment Descriptor Local Descriptor Table 2-44 1739100 ST6x86 2.6.4 Paging Mechanisms The paging mechanisms (Figure 2.29) translate linear addresses to their corresponding physical addresses. For traditional paging, the page size is always 4 KBytes. If ST6x86 Variable-Size Paging is selected, a page size may be as large as 4 GBytes. Use of larger page sizes allows large memory areas such as video memory to be placed in a single page, eliminating page table thrashing. Paging is activated when the PG and the PE bits within the CR0 register are set. 2.6.4.1 Traditional Paging Mechanism The traditional paging mechanism translates the 20 most significant bits of a linear address to a physical address. The linear address is divided into three fields DTI, PTI, PFO (Figure 2.30). These fields respectively select: - an entry in the directory table, - an entry in the page table selected by the directory table - the offset in the physical page selected by the page table The directory table and all the page tables can be considered as pages as they are 4-KBytes in size and are aligned on 4-KByte boundaries. Each entry in these tables is 32 bits in length. The fields within the entries are detailed in Figure 2.30 and Table 2.25. A single page directory table can address up to 4 GBytes of virtual memory (1,024 page tables-- each table can select 1,024 pages and each page contains 4 KBytes). Translation Lookaside Buffer (TLB) is made up of three caches Figure 2.30. - the DTE Cache caches directory table entries - the Main TLB caches page tables entries - the Victim TLB stores PTEs that have been evicted from the Main TLB The DTE cache is a 4-entry fully associative cache, the main TLB is a 128-entry direct mapped cache and the victim TLB is an 8-entry fully associative cache.The DTE cache caches the four most recent DTEs so that future TLB misses only require a single page table read to calculate the physical address. The DTE cache is disabled following RESET and is enabled by setting the DTE_EN bit (CCR4 bit4). Figure 2.29. Paging Mechanisms Variable-Size Paging Mechanism Physical Address Linear Address Traditional Paging Mechanism 1739600 2-45 ST6x86 Figure 2.30. Traditional Paging Mechanism Linear Address 31 22 21 Directory Table Index (DTI) 12 11 0 Page Table Index (PTI) Page Frame Offset (PFO) 127 3 DTE Cache 4 Entry Fully Associative 0 Main TLB 128 Entry Direct Mapped 4 Gb 0 7 Victim TLB 8 Entry Fully Associative 0 4 Kb 4 Kb 4 Kb Physical Page PTE DTE 0 CR3 Directory Table 0 0 0 Page Table Memory Control Register 1728800 Figure 2.31. Directory and Page Table Entry (DTE and PTE) Format 12 11 31 BASE ADDRESS 10 9 AVAILABLE 8 6 5 4 3 2 1 0 RESERVED D A P C D P W T U / S W / R P 7 Note: In DTE format, bit 6 is reserved 2-46 1708503 ST6x86 Table 2.25. Directory and Page Table Entry (DTE and PTE) Bit Definitions BIT POSITION FIELD NAME DESCRIPTIO N BASE ADDRESS Specifies the base address of the page or page table. 11-9 -- Undefined and available to the programmer. 8-7 -- Reserved and not available to the programmer. 6 D Dirty Bit. If set, indicates that a write access has occurred to the page (PTE only, undefined in DTE). 5 A Accessed Flag. If set, indicates that a read access or write access has occurred to the page. 4 PCD Page Caching Disable Flag. If set, indicates that the page is not cacheable in the on-chip cache. 3 PWT Page Write-Through Flag. If set, indicates that writes to the page or page tables that hit in the on-chip cache must update both the cache and external memory. 2 U/S User/Supervisor Attribute. If set (user), page is accessible at privilege level 3. If clear (supervisor), page is accessible only when CPL 2. 1 W/R Write/Read Attribute. If set (write), page is writable. If clear (read), page is read only. P Present Flag. If set, indicates that the page is present in RAM memory, and validates the remaining DTE/PTE bits. If clear, indicates that the page is not present in memory and the remaining DTE/PTE bits can be used by the programmer. 31-12 0 For a TLB hit, the TLB eliminates accesses to external directory and page tables. The victim TLB increases the apparent associativity of the main TLB and helps eliminate TLB trashing (unproductive TLB management). When an entry in the main TLB is replaced, a copy of the replaced entry is sent to the victim TLB before the entry in the main TLB is overwritten. If the victim TLB receives a hit, its entry is swapped with a main TLB entry. 2.6.4.2 Translation Lookaside Buffer Testing The TLB can be tested by writing to a main TLB followed by performing a TLB lookup (TLB read) to see if the expected contents are within the TLB. TLB test operations are performed using test register TR6 and TR7 shown in Figure 2.32. Tables 2.26 through 2.28 list the bit definitions for TR6 and TR7. The TLB must be flushed by the software when entries in the page tables are changed. The TLB is flushed whenever the CR3 register is loaded. A particular page can be flushed from the TLB by using the INVLPG instruction. This instruction also flushes the entire DTE cache. 2-47 ST6x86 Main TLB Write. To perform a direct write to a main TLB entry, the TR7 register is configured with the desired physical address as well as the PCD and PWT bits. The BI, HV, HD and HB bits are not used. The TR6 register is then configured with the linear address, D, U, W and V bits. The D, U, and W bits must be complements of the D#, U#, and W# bits during a write. When the TR6 register is configured, the ST6x86 CPU writes the linear and physical address into the main TLB along with the A, D, U, and W bits. The main TLB entry is selected by bits 12 through 18 of the linear address field. TLB Lookup. During a TLB lookup, the ST6x86 CPU queries the TLB with a given linear address and expected A, W, U and D values. The query returns a corresponding physical address, and the Figure 2.32. TLB Test Registers 2-48 source of the address. The address source could be from the main TLB, from the victim TLB or from the variable-size paging mechanism. The TLB lookup involves a single TR6 register write. The CMD bits are set to 0x1. The D, U, W, D#, U# and W# bits are not used during TLB lookups. After a TLB lookup, the HV, HD and HB bits in TR7 indicate which (if any) PTEs were found with the requested linear address. If a TLB entry was found for a PTE in the victim or variable size-paging cache, the BI bit in the TR7 register will contain the index of the particular entry. If multiple entries respond, only the HV, HD and HB bits are valid and all TR7 fields are undefined. ST6x86 Table 2.26. TLB Test Register Bit Definitions REGISTER NAME TR7 NAME RANGE ADR7 31-12 PCD 11 Page-level cache disable bit (PCD). Corresponds to the PCD bit of a page table entry. PWT 10 Page-level cache write-through bit (PWT). Corresponds to the PWT bit of a page table entry. BI 9-7 Cell index for victim TLB and block cache operations. HV 5 Victim TLB hit. HD 4 Main TLB hit. HB 3 Variable-Size Paging Mechanism hit. ADR6 V D, D# TR6 U, U# W, W# A, A# CMD DESCRIPTION Physical address or variable page size mechanism mask. TLB lookup: data field from the TLB. TLB write: data field written into the TLB. 31-12 Linear Address. TLB lookup: The TLB is interrogated per this address. If one and only one match occurs in the TLB, the rest of the fields in TR6 and TR7 are updated per the matching TLB entry. TLB write: A TLB entry is allocated to this linear address. 11 PTE Valid. TLB write: If set, indicates that the TLB entry contains valid data. If clear, target entry is invalidated. 10-9 Dirty Attribute Bit and its complement. Refer to Table 2.27. 8-7 User/Supervisor Attribute Bit and its complement. Refer to Table 2.27. 6-5 Write Protect bit and its complement. Refer to Table 2.27. 4-3 Accessed Bit and its complement. Used for block cache entries only. Refer to Table 2.27. 2-0 Array Command Select. Determines TLB array command. Refer to Table 2.28. 2-49 ST6x86 Table 2.27. TR6 Attribute Bit Pairs BIT BIT# EFFECT ON TLB LOOKUP EFFECT ON TLB WRITE 0 0 Do not match. Undefined. 0 1 If bit = 0, match. Bit is cleared. 1 0 If bit = 1, match. The bit is set. 1 1 If bit = 0 or 1, match. Undefined. Note: "BIT" applies to A, D, U or W fields in TR6; "BIT#" applies to A#, D#, U#, or W# fields in TR6. Table 2.28. TR6 Command Bits CMD Command 0x0 Direct write to main TLB. 0x1 TLB lookup for a linear address in all arrays. 100 Write to variable page size mask only. 110 Write to variable page size linear and physical address fields. 101 Read variable page size mask and linear address. 111 Read variable page size cache physical and linear address. Note: x = don't care 2-50 ST6x86 2.6.5 Variable-Size Paging Mechanism The Variable-Size Paging Mechanism (VSPM) is an advanced alternative to traditional paging. As shown in Figure 2.33, VSPM allows the creation of pages ranging in size from 4 KBytes to 4 GBytes. The larger page size nearly eliminates page table thrashing associated with using multiple 4-KByte pages. For example, paging 1 MByte of memory requires 256 4-KByte pages using traditional paging. The software not only incurs overhead during setting up the 256 pages, but also incurs additional overhead accessing the page tables each time a page is not found in the on-chip TLB. In contrast, a single 1-MByte page virtually eliminates the overhead. Configuring Variable-Size Pages. The VSPM is configured using TLB test registers, TR6 and TR7 (These registers are also used to test the TLB). The VSPM configuration is performed in much the same manner as when writing to a line of the TLB (Refer to Section 2.6.4.2.). The major exception to this, is that a mask field is written to the VSPM as part of the VSPM configuration. The physical address, linear address, valid bit and attribute bits in a main TLB write all have the same meaning as in a main TLB read except for that CMD=110. The BI field is used to select the VSPM cell to be written. A VSPM mask setup operation is performed when CMD=100 and a test register write is performed. During a VSPM mask setup, the TR7 address field is used as the mask field. The mask field selectively masks linear address bits 31-12 from the VSPM tag compare. This has the effect of allowing the VSPM to map pages greater than 4 KBytes. Figure 2.33. Variable-Size Paging Mechanism Linear Address 4 GByte Variable-Size Paging Mechanism < 4 GByte Physical Address Physical Page 0 Memory 0 1739200 2-51 ST6x86 After a VSPM mask setup, the valid bit, attribute bits, and the linear address are left in undefined states. Therefore, the VSPM mask setup should be performed prior to other VSPM operations. Unlike the victim and main TLBs, the VSPM operations make use of the accessed bit. During a VSPM mask or physical address write the A and A# fields are written to the VSPM. VSPM Reads. VSPM reads are performed with the address of the entry to be read in the BI field of the TR7 register and with CMD=111. The entry's and physical address is read into the TR6 and TR7 address fields as well as the valid bit, and attribute bits. If CMD=101, the linear address, mask, valid bit and attribute bits are read. 2.7 Memory Caches The ST6x86 CPU contains two memory caches as described in Chapter 1. The Unified Cache acts as the primary data cache, and secondary instruction cache. The Instruction Line Cache is the primary instruction cache and provides a high speed instruction stream for the Integer Unit. The unified cache is dual-ported allowing simultaneous access to any two unique banks. Two different banks may be accessed at the same time permitting any two of the following operations to occur in parallel: - Code fetch - Data read (X pipe, Y pipe or FPU) - Data write (X pipe, Y pipe or FPU). 2-52 2.7.1 Unified Cache MESI States The unified cache lines are assigned one of four MESI states as determined by MESI bits stored in tag memory. Each 32-byte cache line is divided into two 16-byte sectors. Each sector contains its own MESI bits. The four MESI states are described below: Modified MESI cache lines are those that have been updated by the CPU, but the corresponding main memory location has not yet been updated by an external write cycle. Modified cache lines are referred to as dirty cache lines. Exclusive MESI lines are lines that are exclusive to the ST6x86 CPU and are not duplicated within another caching agent's cache within the same system. A write to this cache line may be performed without issuing an external write cycle. Shared MESI lines may be present in another caching agent's cache within the same system. A write to this cache line forces a corresponding external write cycle. Invalid MESI lines are cache lines that do not contain any valid data. ST6x86 2.7.1.1 Unified Cache Testing The unified cache can be tested through the use of TR3, TR4, and TR5 on-chip test registers. Fields within these test registers identify which area of cache will be selected for testing. Cache Organization. The unified cache (Figure 2.34) is divided into 32-bytes lines. This cache is divided into four sets. Since a set (as well as the cache) is smaller than main memory, each line in the set corresponds to more than one line in main memory. When a cache line is allocated, bits A31-A12 of the main memory address are stored in the cache line tag. The remaining address bits are used to identify the specific 32-byte cache line (A11-A5), and the specific 4-byte entry within the cache line (A4-A2). Test Initiation. A test register operation is initiated by writing to the TR5 register shown in Figure 2.35 using a special MOV instruction. The TR5 CTL field, detailed in Table 2.29, determines the function to be performed. For cache writes, the registers TR4 and TR3 must be initialized before a write is made to TR5. Eight 4-byte accesses are required to access a complete cache line. Figure 2.34. Unified Cache 32 Bytes of Data 128 Lines SET 0 SET 1 512 L ines SET 2 SET 3 Upper Sector ENT ENT ENT ENT = 4-byte entry Low er Sector ENT ENT ENT ENT ENT Typ ical Single Line 17 39 70 0 2-53 ST6x86 Figure 2.35. Cache Test Registers RESERVED SET 31 LINE 13 12 11 10 TAG (CACHE TAG ADDRESS) 8 RESERVED 12 11 10 31 9 9 ENT 7 6 5 4 3 MESIU MESIL 8 7 6 5 4 CTL TR5 2 1 0 MRU 3 2 1 TR4 0 TR3 DATA (CACHE DATA) 31 0 1729400 Table 2.29. Cache Test Register Bit Definitions REGISTER NAME FIELD NAME RANGE SET 13 - 12 Cache set selection (one of four "sets"). LINE 11 - 5 Cache line selection (one of 128 lines). ENT 4-2 Entry selection (one of eight 4-byte entries in a line). CTL 1-0 Control field If = 00: flush cache without invalidate If = 01: write cache If = 10: read cache If = 11: no cache or test register modification TAG 31 - 12 TR5 Physical address for selected line 7-6 If If If If = 00, Modified Upper Sector MESI bits = 01, Shared Upper Sector MESI bits = 10, Exclusive Upper Sector MESI bits = 11, Invalid Upper Sector MESI bits* 5-4 If If If If = 00, Modified Lower Sector MESI bits = 01, Shared Lower Sector MESI bits = 10, Exclusive Lower Sector MESI bits = 11, Invalid Lower Sector MESI bits* MRU 3-0 Used to determine the Least Recently Used (LRU) line. DATA 31 - 0 Data written or read during a cache test. MESIU TR4 MESIL TR3 DESCRIPTION *Note: All 32 bytes should contain valid data before a line is marked as valid. 2-54 ST6x86 Write Operations. During a write, the TR3 DATA (32-bits) and TAG field information is written to the address selected by the SET, LINE, and ENT fields in TR5. Read Operations. During a read, the cache address selected by the SET, LINE and ENT fields in TR5 are used to read data into the TR3 DATA (32-bits) field. The TAG, MESI and MRU fields in TR4 are updated with the information from the selected line. TR3 holds the selected read data. Cache Flushing. A cache flush occurs during a TR5 write if the CTL field is set to zero. During flushing, the CPU's cache controller reads through all the lines in the cache. "Modified" lines are redefined as "shared" by setting the shared MESI bit. Clean lines are left in their original state. 2.8 Interrupts and Exceptions The processing of either an interrupt or an exception changes the normal sequential flow of a program by transferring program control to a selected service routine. Except for SMM interrupts, the location of the selected service routine is determined by one of the interrupt vectors stored in the interrupt descriptor table. Hardware interrupts are generated by signal sources external to the CPU. All exceptions (including so-called software interrupts) are produced internally by the CPU. 2.8.1 Interrupts External events can interrupt normal program execution by using one of the three interrupt pins on the ST6x86 CPU. - Non-maskable Interrupt (NMI pin) - Maskable Interrupt (INTR pin) - SMM Interrupt (SMI# pin). For most interrupts, program transfer to the interrupt routine occurs after the current instruction has been completed. When the execution returns to the original program, it begins immediately following the last completed instruction. With the exception of string operations, interrupts are acknowledged between instructions. Long string operations have interrupt windows between memory moves that allow interrupts to be acknowledged. The NMI interrupt cannot be masked by software and always uses interrupt vector 2 to locate its service routine. Since the interrupt vector is fixed and is supplied internally, no interrupt acknowledge bus cycles are performed. This interrupt is normally reserved for unusual situations such as parity errors and has priority over INTR interrupts. Once NMI processing has started, no additional NMIs are processed until an IRET instruction is executed, typically at the end of the NMI service routine. If NMI is re-asserted prior to execution of the IRET instruction, one and only one NMI rising edge is stored and processed after execution of the next IRET. 2-55 ST6x86 During the NMI service routine, maskable interrupts may be enabled (unmasked). If an unmasked INTR occurs during the NMI service routine, the INTR is serviced and execution returns to the NMI service routine following the next IRET. If a HALT instruction is executed within the NMI service routine, the ST6x86 CPU restarts execution only in response to RESET, an unmasked INTR or an SMM interrupt. NMI does not restart CPU execution under this condition. The INTR interrupt is unmasked when the Interrupt Enable Flag (IF) in the EFLAGS register is set to 1. When an INTR interrupt occurs, the CPU performs two locked interrupt acknowledge bus cycles. During the second cycle, the CPU reads an 8-bit vector that is supplied by an external interrupt controller. This vector selects one of the 256 possible interrupt handlers which will be executed in response to the interrupt. The SMM interrupt has higher priority than either INTR or NMI. After SMI# is asserted, program execution is passed to an SMI service routine that runs in SMM address space reserved for this purpose. The remainder of this section does not apply to the SMM interrupts. SMM interrupts are described in greater detail later in this chapter. 2-56 2.8.2 Exceptions Exceptions are generated by an interrupt instruction or a program error. Exceptions are classified as traps, faults or aborts depending on the mechanism used to report them and the restartability of the instruction that first caused the exception. A Trap Exception is reported immediately following the instruction that generated the trap exception. Trap exceptions are generated by execution of a software interrupt instruction (INTO, INT 3, INT n, BOUND), by a single-step operation or by a data breakpoint. Software interrupts can be used to simulate hardware interrupts. For example, an INT n instruction causes the processor to execute the interrupt service routine pointed to by the nth vector in the interrupt table. Execution of the interrupt service routine occurs regardless of the state of the IF flag in the EFLAGS register. The one byte INT 3, or breakpoint interrupt (vector 3), is a particular case of the INT n instruction. By inserting this one byte instruction in a program, the user can set breakpoints in the code that can be used during debug. Single-step operation is enabled by setting the TF bit in the EFLAGS register. When TF is set, the CPU generates a debug exception (vector 1) after the execution of every instruction. Data breakpoints also generate a debug exception and are specified by loading the debug registers (DR0-DR7) with the appropriate values. ST6x86 A Fault Exception is reported prior to completion of the instruction that generated the exception. By reporting the fault prior to instruction completion, the CPU is left in a state that allows the instruction to be restarted and the effects of the faulting instruction to be nullified. Fault exceptions include divide-by-zero errors, invalid opcodes, page faults and coprocessor errors. Instruction breakpoints (vector 1) are also handled as faults. After execution of the fault service routine, the instruction pointer points to the instruction that caused the fault. An Abort Exception is a type of fault exception that is severe enough that the CPU cannot restart the program at the faulting instruction. The double fault (vector 8) is the only abort exception that occurs on the ST6x86 CPU. 2.8.3 Interrupt Vectors When the CPU services an interrupt or exception, the current program's FLAGS, code segment and instruction pointer are pushed onto the stack to allow resumption of execution of the interrupted program. In protected mode, the processor also saves an error code for some exceptions. Program control is then transferred to the interrupt handler (also called the interrupt service routine). Upon execution of an IRET at the end of the service routine, program execution resumes by popping from the stack, the instruction pointer, code segment, and FLAGS. Interrupt Vector Assignments Each interrupt (except SMI#) and exception is assigned one of 256 interrupt vector numbers Table 2.30. The first 32 interrupt vector assignments are defined or reserved. INT instructions acting as software interrupts may use any of the interrupt vectors, 0 through 255. 2-57 ST6x86 Table 2.30. Interrupt Vector Assignments INTERRUPT VECTOR FUNCTION EXCEPTION TYPE 0 Divide error FAULT 1 Debug exception TRAP/FAULT* 2 NMI interrupt 3 Breakpoint TRAP 4 Interrupt on overflow TRAP 5 BOUND range exceeded FAULT 6 Invalid opcode FAULT 7 Device not available FAULT 8 Double fault ABORT 9 Reserved 10 Invalid TSS FAULT 11 Segment not present FAULT 12 Stack fault FAULT 13 General protection fault TRAP/FAULT 14 Page fault FAULT 15 Reserved 16 FPU error FAULT 17 Alignment check exception FAULT 18-31 Reserved 32-255 Maskable hardware interrupts TRAP 0-255 Programmed interrupt TRAP *Note: Data breakpoints and single-steps are traps. All other debug exceptions are faults. 2-58 ST6x86 In response to a maskable hardware interrupt (INTR), the ST6x86 CPU issues interrupt acknowledge bus cycles used to read the vector number from external hardware. These vectors should be in the range 32 - 255 as vectors 0 - 31 are reserved. Interrupt Descriptor Table The interrupt vector number is used by the ST6x86 CPU to locate an entry in the interrupt descriptor table (IDT). In real mode, each IDT entry consists of a four-byte far pointer to the beginning of the corresponding interrupt service routine. In protected mode, each IDT entry is an eight-byte descriptor. The Interrupt Descriptor Table Register (IDTR) specifies the beginning address and limit of the IDT. Following reset, the IDTR contains a base address of 0h with a limit of 3FFh. The IDT can be located anywhere in physical memory as determined by the IDTR register. The IDT may contain different types of descriptors: interrupt gates, trap gates and task gates. Interrupt gates are used primarily to enter a hardware interrupt handler. Trap gates are generally used to enter an exception handler or software interrupt handler. If an interrupt gate is used, the Interrupt Enable Flag (IF) in the EFLAGS register is cleared before the interrupt handler is entered. Task gates are used to make the transition to a new task. 2.8.4 Interrupt and Exception Priorities As the ST6x86 CPU executes instructions, it follows a consistent policy for prioritizing exceptions and hardware interrupts. The priorities for competing interrupts and exceptions are listed in Table 2.31. Debug traps for the previous instruction and the next instructions always take precedence. SMM interrupts are the next priority. When NMI and maskable INTR interrupts are both detected at the same instruction boundary, the ST6x86 microprocessor services the NMI interrupt first. The ST6x86 CPU checks for exceptions in parallel with instruction decoding and execution. Several exceptions can result from a single instruction. However, only one exception is generated upon each attempt to execute the instruction. Each exception service routine should make the appropriate corrections to the instruction and then restart the instruction. In this way, exceptions can be serviced until the instruction executes properly. The ST6x86 CPU supports instruction restart after all faults, except when an instruction causes a task switch to a task whose task state segment (TSS) is partially not present. A TSS can be partially not present if the TSS is not page aligned and one of the pages where the TSS resides is not currently in memory. 2-59 ST6x86 Table 2.31. Interrupt and Exception Priorities PRIORITY 2-60 DESCRIPTION NOTES 0 Warm Reset Caused by the assertion of WM_RST. 1 Debug traps and faults from previous Includes single-step trap and data breakpoints specified instruction. in the debug registers. 2 Debug traps for next instruction. Includes instruction execution breakpoints specified in the debug registers. 3 Hardware Cache Flush Caused by the assertion of FLUSH#. 4 SMM hardware interrupt. SMM interrupts are caused by SMI# asserted and always have highest priority. 5 Non-maskable hardware interrupt. Caused by NMI asserted. 6 Maskable hardware interrupt. Caused by INTR asserted and IF = 1. 7 Faults resulting from fetching the next Includes segment not present, general protection fault instruction. and page fault. 8 Faults resulting from instruction decoding. Includes illegal opcode, instruction too long, or privilege violation. 9 WAIT instruction and TS = 1 and MP = 1. Device not available exception generated. 10 ESC instruction and EM = 1 or TS = 1. Device not available exception generated. 11 Floating point error exception. Caused by unmasked floating point exception with NE = 1. 12 Segmentation faults (for each memory reference required by the instruction) that pre- Includes segment not present, stack fault, and general vent transferring the entire memory protection fault. operand. 13 Page Faults that prevent transferring the entire memory operand. 14 Alignment check fault. ST6x86 2.8.5 Exceptions in Real Mode Other exceptions have slightly different meanings in real mode as listed in Table 2.32. Many of the exceptions described in Table 2.31 are not applicable in real mode. Exceptions 10, 11, and 14 do not occur in real mode. Table 2.32. Exception Changes in Real Mode VECTOR NUMBER PROTECTED MODE FUNCTION REAL MODE FUNCTION 8 Double fault. Interrupt table limit overrun. 10 Invalid TSS. x 11 Segment not present. x 12 Stack fault. SS segment limit overrun. 13 General protection fault. CS, DS, ES, FS, GS segment limit overrun. 14 Page fault. x Note: x = does not occur 2-61 ST6x86 2.8.6 Error Codes When operating in protected mode, the following exceptions generate a 16-bit error code: *Double Fault *Alignment Check *Page Fault The error code is pushed onto the stack prior to entering the exception handler. The error code format is shown in Figure 2.36 and the error code bit definitions are listed in Table 2.33. Bits 15-3 (selector index) are not meaningful if the error code was generated as the result of a page fault. The error code is always zero for double faults and alignment check exceptions. *Invalid TSS *Segment Not Present *Stack Fault *General Protection Fault Figure 2.36. Error Code Format 15 3 Selector Index 2 1 0 S2 S1 S0 Table 2.33. Error Code Bit Definitions SELECTOR INDEX (BITS 15-3) FAULT TYPE S2 (BIT 2) S1 (BIT 1) S0 (BIT 0) Double Fault or Align0 ment Check 0 0 Page Fault Fault caused by: 0 = not present page 1 = page-level protection violation. Fault occurred during: Fault occurred during: 0 = read access 0 = supervisor access 1 = write access. 1 = user access. Reserved. IDT Fault Index of faulty IDT selector. Reserved. Segment Fault Index of faulty selector. TI bit of faulty selector. 2-62 0 1 If = 1, exception occurred while trying to invoke exception or hardware interrupt handler. 0 If =1, exception occurred while trying to invoke exception or hardware interrupt handler. ST6x86 2.9 System Management Mode System Management Mode (SMM) provides an additional interrupt which can be used for system power management or software transparent emulation of I/O peripherals. SMM is entered using the System Management Interrupt (SMI#) that has a higher priority than any other interrupt, including NMI. An SMI interrupt can also be triggered via software using an SMINT instruction. After an SMI interrupt, portions of the CPU state are auto- matically saved, SMM is entered, and program execution begins at the base of SMM address space (Figure 2.37). Running in SMM address space, the interrupt routine does not interfere with the operating system or any application program. Eight SMM instructions have been added to the x86 instruction set that permit software initiated SMM, and saving and restoring of the total CPU state when in SMM mode. Two SMM pins, SMI# and SMIACT#, support SMM functions. Figure 2.37. System Management Memory Address Space Potential SMM Address Space Physical Memory Space FFFF FFFFh FFFF FFFFh Physical Memory 4 GBytes 0000 0000h 4 KBytes to 4 GBytes Defined SMM Address Space SMIACT# Active 0000 0000h Non-SMM Mode SMIACT# Negated SMM Mode 1713604 2-63 ST6x86 2.9.1 SMM Operation Figure 2.38. SMI Execution Flow Diagram SMM operation is summarized in Figure 2.38. Entering SMM requires the assertion of the SMI# pin for at least two CLK periods or execution of the SMINT instruction. For the SMI# or SMINT instruction to be recognized, the following configuration register bits must be set as shown in Figure 2.38. The configuration registers are discussed in detail earlier in this chapter. Table 2.34. Requirements for Recognizing SMI# and SMINT REGISTER (Bit) SMI# CPU State Stored in SMM Address Space Header SMINT SMI CCR1 (1) 1 1 SMAC CCR1 (2) 0 1 ARR3 SIZE (3-0) >0 >0 SM3 CCR1 (7) 1 1 After recognizing SMI# or SMINT and prior to executing the SMI service routine, some of the CPU state information is changed. Prior to modification, this information is automatically saved in the SMM memory space header located at the top of SMM memory space. After the header is saved, the CPU enters real mode and begins executing the SMI service routine starting at the SMM memory base address. The SMI service routine is user definable and may contain system or power management software. If the power management software forces the CPU to power down, or the SMI service routine modifies more than what is automatically saved, the complete CPU state information can be saved. 2-64 SMI# Sampled Active or SMINT Instruction Executed CPU Enters Real Mode Execution Begins at SMM Address Space Base Address RSM Instruction Restores CPU State Using Header Information Normal Execution Resumes 1713703 ST6x86 2.9.2 SMM Memory Space Header With every SMI interrupt or SMINT instruction, certain CPU state information is automatically saved in the SMM memory space header located at the top of SMM address space as shown Figure 2.39 and Table 2.35. The header contains CPU state information that is modified when servicing an SMI interrupt. Included in this information are two pointers. The Current IP points to the instruction that was executing when the SMI was detected. Figure 2.39. SMM Memory Space Header 31 0 DR7 Top of SMM Address Space -4h EFLAGS -8h CR0 -Ch Current IP 22 21 31 -10h Next IP 16 15 0 -14h Reserved CPL Reserved CS Selector -18h CS Descriptor (Bits 63-32) -1Ch 31 CS Descriptor (Bits 31-0) 4 3 2 1 0 -20h Reserved H S P I 16 15 I/O Write Data Size Reserved -24h I/O Write Address -28h I/O Write Data -2Ch ESI or EDI -30h 1713504 2-65 ST6x86 The Next IP points to the instruction that will be executed after exiting SMM. Also saved are the contents of debug register 7 (DR7), the extended flags register (EFLAGS), and control register 0 (CR0). If SMM has been entered due to an I/O trap for a REP INSx or REP OUTSx instruction, the Current IP and Next IP fields contain the same addresses and the I and P field contain valid information. If entry into SMM was caused by an I/O trap it is useful for the programmer to know the port address, data size and data value associated with that I/O operation. This information is also saved in the header and is only valid for an I/O write operation. The I/O write information is not restored within the CPU when executing a RSM instruction. Table 2.35. SMM Memory Space Header NAME DESCRIPTION SIZE DR7 The contents of Debug Register 7. 4 Bytes EFLAGS The contents of Extended Flags Register. 4 Bytes CR0 The contents of Control Register 0. 4 Bytes Current IP The address of the instruction executed prior to servicing SMI interrupt. 4 Bytes Next IP The address of the next instruction that will be executed after exiting SMM mode. 4 Bytes CS Selector Code segment register selector for the current code segment. 2 Bytes CPL Current privilege level for current code segment. 2 Bits CS Descriptor Code segment register descriptor for the current code segment. 8 Bytes H If set indicates the processor was in a halt or shutdown prior to servicing the SMM interrupt. 1 Bit S Software SMM Entry Indicator. S = 1, if current SMM is the result of an SMINT instruction. S = 0, if current SMM is not the result of an SMINT instruction. 1 Bit P REP INSx/OUTSx Indicator. P = 1 if current instruction has a REP prefix. P = 0 if current instruction does not have a REP prefix. 1 Bit I IN, INSx, OUT, or OUTSx Indicator. I = 1 if current instruction performed is an I/O WRITE. I = 0 if current instruction performed is an I/O READ. 1 Bit I/O Write Data Size Indicates size of data for the trapped I/O write. 01h = byte 03h = word 0Fh = dword 2 Bytes I/O Write Address Processor port used for the trapped I/O write. 2 Bytes I/O Write Data Data associated with the trapped I/O write. 4 Bytes ESI or EDI Restored ESI or EDI value. Used when it is necessary to repeat a REP OUTSx or REP INSx instruction when one of the I/O cycles caused an SMI# trap. 4 Bytes Note: INSx = INS, INSB, INSW or INSD instruction. Note: OUTSx = OUTS, OUTSB, OUTSW and OUTSD instruction. 2-66 ST6x86 2.9.3 SMM Instructions The ST6x86 CPU automatically saves the minimal amount of CPU state information when entering SMM which allows fast SMI service routine entry and exit. After entering the SMI service routine, the MOV, SVDC, SVLDT and SVTS instructions can be used to save the complete CPU state information. If the SMI service routine modifies more than what is automatically saved or forces the CPU to power down, the complete CPU state information must be saved. Since the CPU is a static device, its internal state is retained when the input clock is stopped. Therefore, an entire CPU state save is not necessary prior to stopping the input clock. The new SMM instructions, listed in Table 2.36, can only be executed if: 1) SMI# = 0 2) SM3 = 1 3) ARR3 SIZE > 0 4) Current Privilege Level = 0 5) SMAC bit is set or the CPU is in an SMI service routine. If the above conditions are not met and an attempt is made to execute an SVDC, RSDC, SVLDT, RSLDT, SVTS, RSTS, SMINT or RSM instruction, an invalid opcode exception is generated. These instructions can be executed outside of defined SMM space provided the above conditions are met. The SMINT instruction may be used as a software controlled mechanism to enter SMM. Table 2.36. SMM Instruction Set INSTRUCTION SVDC RSDC OPCODE FORMAT DESCRIPTION 0F 78 [mod sreg3 r/m] SVDC mem80, sreg3 0F 79 [mod sreg3 r/m] Restore Segment Register and Descriptor Restores reg (DS, ES, FS, GS, or SS) from mem80. RSDC sreg3, mem80 Use RSM to restore CS. Save Segment Register and Descriptor Saves reg (DS, ES, FS, GS, or SS) to mem80. Note: Processing "RSDC CS, Mem80" will produce an exception. SVLDT 0F 7A [mod 000 r/m] SVLDT mem80 Save LDTR and Descriptor Saves Local Descriptor Table (LDTR) to mem80. RSLDT 0F 7B [mod 000 r/m] RSLDT mem80 Restore LDTR and Descriptor Restores Local Descriptor Table (LDTR) from mem80. SVTS 0F 7C [mod 000 r/m] SVTS mem80 Save TSR and Descriptor Saves Task State Register (TSR) to mem80. RSTS 0F 7D [mod 000 r/m] RSTS mem80 Restore TSR and Descriptor Restores Task State Register (TSR) from mem80. SMINT 0F 7E SMINT Software SMM Entry CPU enters SMM mode. CPU state information is saved in SMM memory space header and execution begins at SMM base address. RSM 0F AA RSM Resume Normal Mode Exits SMM mode. The CPU state is restored using the SMM memory space header and execution resumes at interrupted point. Note: mem80 = 80-bit memory location 2-67 ST6x86 All of the SMM instructions (except RSM and SMINT) save or restore 80 bits of data, allowing the saved values to include the hidden portion of the register contents. 2.9.4 SMM Memory Space SMM memory space is defined by setting the SM3 bit and specifying the base address and size of the SMM memory space in the ARR3 register. The base address must be a multiple of the SMM memory space size. For example, a 32 KByte SMM memory space must be located at a 32 KByte address boundary. The memory space size can range from 4 KBytes to 4 GBytes. SMM memory space accesses are always non-cacheable. SMM accesses ignore the state of the A20M# input pin and drive the A20 address bit to the unmasked value. SMM memory space can be accessed while in normal mode by setting the SMAC bit in the CCR1 register. This feature may be used to initialize the SMM memory space. 2.9.5 SMI Service Routine Execution Upon entry into SMM, after the SMM header has been saved, the CR0, EFLAGS, and DR7 registers are set to their reset values. The Code Segment (CS) register is loaded with the base, as defined by the ARR3 register, and a limit of 4 GBytes. The SMI service routine then begins execution at the SMM base address in real mode. 2-68 The programmer must save the value of any registers that may be changed by the SMI service routine. For data accesses immediately after entering the SMI service routine, the programmer must use CS as a segment override. I/O port access is possible during the routine but care must be taken to save registers modified by the I/O instructions. Before using a segment register, the register and the register's descriptor cache contents should be saved using the SVDC instruction. While executing in the SMM space, execution flow can transfer to normal memory locations. Hardware interrupts, (INTRs and NMIs), may be serviced during a SMI service routine. If interrupts are to be serviced while executing in the SMM memory space, the SMM memory space must be within the 0 to 1 MByte address range to guarantee proper return to the SMI service routine after handling the interrupt. INTRs are automatically disabled when entering SMM since the IF flag is set to its reset value. Once in SMM, the INTR can be enabled by setting the IF flag. NMI is also automatically disable when entering SMM. Once in SMM, NMI can be enabled by setting NMI_EN in CCR3. If NMI is not enabled, the CPU latches one NMI event and services the interrupt after NMI has been enabled or after exiting SMM through the RSM instruction. Within the SMI service routine, protected mode may be entered and exited as required, and real or protected mode device drivers may be called. ST6x86 To exit the SMI service routine, a Resume (RSM) instruction, rather than an IRET, is executed. The RSM instruction causes the ST6x86 processor to restore the CPU state using the SMM header information and resume execution at the interrupted point. If the full CPU state was saved by the programmer, the stored values should be reloaded prior to executing the RSM instruction using the MOV, RSDC, RSLDT and RSTS instructions. CPU States Related to SMM and Suspend Mode The state diagram shown in Figure 2.40 illustrates the various CPU states associated with SMM and suspend mode. While in the SMI service routine, the ST6x86 CPU can enter suspend mode either by (1) executing a halt (HLT) instruction or (2) by asserting the SUSP# input. During SMM operations and while in SUSP# initiated suspend mode, an occurrence of SMI#, NMI, or INTR is latched. (In order for INTR to be latched, the IF flag must be set.) The INTR or NMI is serviced after exiting suspend mode. 2.10 Shutdown and Halt The Halt Instruction (HLT) stops program execution and prevents the processor from using the local bus until restarted. The ST6x86 CPU then issues a special Stop Grant bus cycle and enters a low-power suspend mode if the SUSP_HLT bit in CCR2 is set. SMI, NMI, INTR with interrupts enabled (IF bit in EFLAGS=1), WM_RST or RESET forces the CPU out of the halt state. If interrupted, the saved code segment and instruction pointer specify the instruction following the HLT. Shutdown occurs when a severe error is detected that prevents further processing. An NMI input can bring the processor out of shutdown if the IDT limit is large enough to contain the NMI interrupt vector and the stack has enough room to contain the vector and flag information. Otherwise, shutdown can only be exited by a processor reset. If suspend mode is entered via a HLT instruction from the operating system or application software, the reception of an SMI# interrupt causes the CPU to exit suspend mode and enter SMM. 2-69 ST6x86 Figure 2.40. SMM and Suspend Mode State Diagram NMI or INTR Suspend Mode (SUSPA# = 0) Interrupt Service Routine HLT* IRET* NMI or INTR SUSP#=0 OS/Application RESET SUSP#=1 Software Suspend Mode (SUSPA# = 0) (INTR, NMI and SMI latched) SMI#=0 SMI# = 0 SMINT* RSM* Non-SMM Operations SMM Operations SMI Service Routine (SMI#=0) HLT* Suspend Mode (SUSPA# = 0) IRET* INTR or NMI IRET* SUSP#=0 Interrupt Service Routine * Instructions 2-70 INTR and NMI SUSP#=1 Suspend Mode (SUSPA# = 0) (INTR and NMI latched) Interrupt Service Routine 1715903 ST6x86 2.11 Protection 2.11.1 Privilege Levels Segment protection and page protection are safeguards built into the ST6x86 CPU protected mode architecture which deny unauthorized or incorrect access to selected memory addresses. These safeguards allow multitasking programs to be isolated from each other and from the operating system. Page protection is discussed earlier in this chapter. This section concentrates on segment protection. The values for privilege levels range between 0 and 3. Level 0 is the highest privilege level (most privileged), and level 3 is the lowest privilege level (least privileged). The privilege level in real mode is effectively 0. Selectors and descriptors are the key elements in the segment protection mechanism. The segment base address, size, and privilege level are established by a segment descriptor. Privilege levels control the use of privileged instructions, I/O instructions and access to segments and segment descriptors. Selectors are used to locate segment descriptors. Segment accesses are divided into two basic types, those involving code segments (e.g., control transfers) and those involving data accesses. The ability of a task to access a segment depends on the: - segment type - instruction requesting access - type of descriptor used to define the segment - associated privilege levels (described below). Data stored in a segment can be accessed only by code executing at the same or a more privileged level. A code segment or procedure can only be called by a task executing at the same or a less privileged level. The Descriptor Privilege Level (DPL) is the privilege level defined for a segment in the segment descriptor. The DPL field specifies the minimum privilege level needed to access the memory segment pointed to by the descriptor. The Current Privilege Level (CPL) is defined as the current task's privilege level. The CPL of an executing task is stored in the hidden portion of the code segment register and essentially is the DPL for the current code segment. The Requested Privilege Level (RPL) specifies a selector's privilege level and is used to distinguish between the privilege level of a routine actually accessing memory (the CPL), and the privilege level of the original requestor (the RPL) of the memory access. The lesser of the RPL and CPL is called the effective privilege level (EPL). Therefore, if RPL = 0 in a segment selector, the effective privilege level is always determined by the CPL. If RPL = 3, the effective privilege level is always 3 regardless of the CPL. For a memory access to succeed, the effective privilege level (EPL) must be at least as privileged as the descriptor privilege level (EPL DPL). If the EPL is less privileged than the DPL (EPL > DPL), a general protection fault is generated. For example, if a segment has a DPL = 2, an instruction accessing the segment only succeeds if executed with an EPL 2. 2-71 ST6x86 2.11.2 I/O Privilege Levels 2.11.3 Privilege Level Transfers The I/O Privilege Level (IOPL) allows the operating system executing at CPL=0 to define the least privileged level at which IOPL-sensitive instructions can unconditionally be used. The IOPL-sensitive instructions include CLI, IN, OUT, INS, OUTS, REP INS, REP OUTS, and STI. Modification of the IF bit in the EFLAGS register is also sensitive to the I/O privilege level. The IOPL is stored in the EFLAGS register. A task's CPL can be changed only through intersegment control transfers using gates or task switches to a code segment with a different privilege level. Control transfers result from exception and interrupt servicing and from execution of the CALL, JMP, INT, IRET and RET instructions. An I/O permission bit map is available as defined by the 32-bit Task State Segment (TSS). Since each task can have its own TSS, access to individual processor I/O ports can be granted through separate I/O permission bit maps. If CPL IOPL, IOPL-sensitive operations can be performed. If CPL > IOPL, a general protection fault is generated if the current task is associated with a 16-bit TSS. If the current task is associated with a 32-bit TSS and CPL > IOPL, the CPU consults the I/O permission bitmap in the TSS to determine on a port-by-port basis whether or not I/O instructions (IN, OUT, INS, OUTS, REP INS, REP OUTS) are permitted, and the remaining IOPL-sensitive operations generate a general protection fault. 2-72 There are five types of control transfers that are summarized in Table 2.37. Control transfers can be made only when the operation causing the control transfer references the correct descriptor type. Any violation of these descriptor usage rules causes a general protection fault. Any control transfer that changes the CPL within a task results in a change of stack. The initial values for the stack segment (SS) and stack pointer (ESP) for privilege levels 0, 1, and 2 are stored in the TSS. During a CALL control transfer, the SS and ESP are loaded with the new stack pointer and the previous stack pointer is saved on the new stack. When returning to the original privilege level, the RET or IRET instruction restores the less-privileged stack. ST6x86 Table 2.37. Descriptor Types Used for Control Transfer TYPE OF CONTROL TRANSFER Intersegment within the same privilege level. OPERATION TYPES JMP, CALL, RET, IRET* DESCRIPTOR REFERENCED Code Segment CALL Gate Call Intersegment to the same or a more privileged Interrupt Instruction, level. Exception, External Trap or Interrupt Gate Interrupt within task (could change CPL level). Interrupt DESCRIPTOR TABLE GDT or LDT GDT or LDT IDT Intersegment to a less privileged level (changes RET, IRET* task CPL). Code Segment GDT or LDT Task Switch via TSS CALL, JMP Task State Segment GDT CALL, JMP Task Gate GDT or LDT Task Switch via Task Gate IRET**, Interrupt Instruction, Exception, External Task Gate Interrupt IDT * NT (Nested Task bit in EFLAGS) = 0 ** NT (Nested Task bit in EFLAGS) = 1 Gates Gate descriptors provide protection for privilege transfers among executable segments. Gates are used to transition to routines of the same or a more privileged level. Call gates, interrupt gates and trap gates are used for privilege transfers within a task. Task gates are used to transfer between tasks. Gates conform to the standard rules of privilege. In other words, gates can be accessed by a task if the effective privilege level (EPL) is the same or more privileged than the gate descriptor's privilege level (DPL). 2.11.4 Initialization and Transition to Protected Mode The ST6x86 microprocessor switches to real mode immediately after RESET. While operating in real mode, the system tables and registers should be initialized. The GDTR and IDTR must point to a valid GDT and IDT, respectively. The GDT must contain descriptors which describe the initial code and data segments. The processor can be placed in protected mode by setting the PE bit in the CR0 register. After enabling protected mode, the CS register should be loaded and the instruction decode queue should be flushed by executing an intersegment JMP. Finally, all data segment registers should be initialized with appropriate selector values. 2-73 ST6x86 2.12 Virtual 8086 Mode 2.12.2 V86 Protection Both real mode and virtual 8086 (V86) mode are supported by the ST6x86 CPU allowing execution of 8086 application programs and 8086 operating systems. V86 mode allows the execution of 8086-type applications, yet still permits use of the ST6x86 CPU paging mechanism. V86 tasks run at privilege level 3. When loaded, all segment limits are set to FFFFh (64K) as in real mode. All V86 tasks operate with the least amount of privilege (level 3) and are subject to all of the ST6x86 CPU protected mode protection checks. As a result, any attempt to execute a privileged instruction within a V86 task results in a general protection fault. 2.12.1 V86 Memory Addressing While in V86 mode, segment registers are used in an identical fashion to real mode. The contents of the segment register are multiplied by 16 and added to the offset to form the segment base linear address. The ST6x86 CPU permits the operating system to select which programs use the V86 address mechanism and which programs use protected mode addressing for each task. The ST6x86 CPU also permits the use of paging when operating in V86 mode. Using paging, the 1-MByte address space of the V86 task can be mapped to anywhere in the 4-GByte linear address space of the ST6x86 CPU. The paging hardware allows multiple V86 tasks to run concurrently, and provides protection and operating system isolation. The paging hardware must be enabled to run multiple V86 tasks or to relocate the address space of a V86 task to physical address space greater than 1 MByte. 2-74 In V86 mode, a slightly different set of instructions are sensitive to the I/O privilege level (IOPL) than in protected mode. These instructions are: CLI, INT n, IRET, POPF, PUSHF, and STI. The INT3, INTO and BOUND variations of the INT instruction are not IOPL sensitive. 2.12.3 V86 Interrupt Handling To fully support the emulation of an 8086-type machine, interrupts in V86 mode are handled as follows. When an interrupt or exception is serviced in V86 mode, program execution transfers to the interrupt service routine at privilege level 0 (i.e., transition from V86 to protected mode occurs) and the VM bit in the EFLAGS register is cleared. The protected mode interrupt service routine then determines if the interrupt came from a protected mode or V86 application by examining the VM bit in the EFLAGS image stored on the stack. The interrupt service routine may then choose to allow the 8086 operating system to handle the interrupt or may emulate the function of the interrupt handler. Following completion of the interrupt service routine, an IRET instruction restores the EFLAGS register (restores VM=1) and segment selectors and control returns to the interrupted V86 task. ST6x86 2.12.4 Entering and Leaving V86 Mode V86 mode is entered from protected mode by either executing an IRET instruction at CPL = 0 or by task switching. If an IRET is used, the stack must contain an EFLAGS image with VM = 1. If a task switch is used, the TSS must contain an EFLAGS image containing a 1 in the VM bit position. The POPF instruction cannot be used to enter V86 mode since the state of the VM bit is not affected. V86 mode can only be exited as the result of an interrupt or exception. The transition out must use a 32-bit trap or interrupt gate which must point to a non-conforming privilege level 0 segment (DPL = 0), or a 32-bit TSS. These restrictions are required to permit the trap handler to IRET back to the V86 program. 2.13 Floating Point Unit Operations The ST6x86 CPU includes an on-chip FPU that provides the user access to a complete set of floating point instructions (see Chapter 6). Information is passed to and from the FPU using eight data registers accessed in a stack-like manner, a control register, and a status register. The ST6x86 CPU also provides a data register tag word which improves context switching and performance by maintaining empty/non-empty status for each of the eight data registers. In addition, registers in the CPU contain pointers to (a) the memory location containing the current instruction word and (b) the memory location containing the operand associated with the current instruction word (if any). FPU Tag Word Register. The ST6x86 CPU maintains a tag word register ((Figure 2.41)) comprised of two bits for each physical data register. Tag Word fields assume one of four values depending on the contents of their associated data registers, Valid (00), Zero (01), Special (10), and Empty (11). Note: Denormal, Infinity, QNaN, SNaN and unsupported formats are tagged as "Special". Tag values are maintained transparently by the ST6x86 CPU and are only available to the programmer indirectly through the FSTENV and FSAVE instructions. FPU Control and Status Registers. The FPU circuitry communicates information about its status and the results of operations to the programmer via the status register. The FPU status register is comprised of bit fields that reflect exception status, operation execution status, register status, operand class, and comparison results. The FPU status register bit definitions are shown in Figure 2.42 and Table 2.38. The FPU Mode Control Register (MCR) is used by the CPU to specify the operating mode of the FPU. The MCR contains bit fields which specify the rounding mode to be used, the precision by which to calculate results, and the exception conditions which should be reported to the CPU via traps. The user controls precision, rounding, and exception reporting by setting or clearing appropriate bits in the MCR. The FPU mode control register bit definitions are shown in Figure 2.43 and Table 2.39. 2-75 ST6x86 Figure 2.41. FPU Tag Word Register 15 14 13 Tag(7) 12 11 Tag(6) 10 9 Tag(5) 8 7 Tag(4) 6 5 Tag(3) 4 3 Tag(2) 2 1 Tag(1) 0 Tag(0) Figure 2.42. FPU Status Register 15 B C3 S 12 11 S S 8 C2 C1 C0 7 ES 4 SF P U 3 O 0 Z Table 2.38. FPU Status Register Bit Definitions BIT POSITION NAME DESCRIPTION 15 B Copy of the ES bit. (ES is bit 7 in this table.) 14, 10 - 8 C3 - C0 Condition code bits. 13 - 11 SSS Top of stack register number which points to the current TOS. 7 ES Error indicator. Set to 1 if an unmasked exception is detected. 6 SF Stack Fault or invalid register operation bit. 5 P Precision error exception bit. 4 U Underflow error exception bit. 3 O Overflow error exception bit. 2 Z Divide by zero exception bit. 1 D Denormalized operand error exception bit. 0 I Invalid operation exception bit. 2-76 D I ST6x86 Figure 2.43. FPU Mode Control Register 15 - 12 - - - 11 RC 8 RC PC 7 4 PC - - P 3 U O 0 Z D I Table 2.39. FPU Mode Control Register Bit Definitions BIT POSITION NAME DESCRIPTIO N Rounding Control bits: 11 - 10 RC 00 01 10 11 Round to nearest or even Round towards minus infinity Round towards plus infinity Truncate Precision Control bits: 9-8 PC 00 01 10 11 24-bit mantissa Reserved 53-bit mantissa 64-bit mantissa 5 P Precision error exception bit mask. 4 U Underflow error exception bit mask. 3 O Overflow error exception bit mask. 2 Z Divide by zero exception bit mask. 1 D Denormalized operand error exception bit mask. 0 I Invalid operation exception bit mask. 2-77 ST6x86 3.0 ST6x86 BUS INTERFACE The signals used in the ST6x86 CPU bus interface are described in this chapter. Figure 3.1 shows the signal directions and the major signal groupings. A description of each signal and their reference to the text are provided in Table 3.1. Figure 3.1. ST6x86 CPU Functional Signal Groupings INTR Clock Control Reset NMI CLK CLKMUL SMI# RESET EWBE# WM_RST FLUSH# KEN# A31 - A3 Address Bus PCD Cache Control PWT BE7# - BE0# WB/WT# A20M# BOFF# BREQ Address Parity Interrupt Control AP APCHK# HOLD Bus Arbitration HLDA Data Bus D63 - D0 AHOLD EADS# Data Parity DP7 - DP0 6x86 HIT# CPU HITM# PCHK# CACHE# Cache Coherency INV FERR# IGNNE# FPU Error D/C# Bus Cycle Definition LOCK# BHOLD M/IO# DHOLD SCYC LBA# W/R# QDUMP# ADS# SUSPA# SUSP# Scatter Gather Buffer Power Management ADSC# Bus Cycle Control BRDY# BRDYC# NA# SMIACT# TCK TDI TDO JTAG TMS TRST# 1737900 3-1 ST6x86 3.1 Signal Description Table The Signal Summary Table (3.1) describes the signals in their active state unless otherwise mentioned. Signals containing slashes (/) have logic levels defined as "1/0." For example the signal W/R#, is defined as write when W/R#=1, and as read when W/R#=0. Signals ending with a "#" character are active low. Table 3.1. ST6x86 CPU Signals Sorted by Signal Name Signal Name A20M# A31-A3 ADS# ADSC# AHOLD AP APCHK# BE7#-BE0# BHOLD BOFF# BRDY# BRDYC# BREQ Description A20 Mask causes the CPU to mask (force to 0) the A20 address bit when driving the external address bus or performing an internal cache access. A20M# is provided to emulate the 1 MByte address wrap-around that occurs on the 8086. Snoop addressing is not effected. The Address Bus, in conjunction with the Byte Enable signals (BE7#-BE0#), provides addresses for physical memory and external I/O devices. During cache inquiry cycles, A31-A5 are used as inputs to perform cache line invalidations. Address Strobe begins a memory/I/O cycle and indicates the address bus (A31-A3, BE7#-BE0#) and bus cycle definition signals (CACHE#, D/C#, LOCK#, M/IO#, PCD, PWT, SCYC, W/R#) are valid. Cache Address Strobe performs the same function as ADS#. Address Hold allows another bus master access to the ST6x86 CPU address bus for a cache inquiry cycle. In response to the assertion of AHOLD, the CPU floats AP and A31-A3 in the following clock cycle. Address Parity is the even parity output signal for address lines A31-A5 (A4 and A3 are excluded). During cache inquiry cycles, AP is the even-parity input to the CPU, and is sampled with EADS# to produce correct parity check status on the APCHK# output. Address Parity Check Status is asserted during a cache inquiry cycle if an address bus parity error has been detected. APCHK# is valid two clocks after EADS# is sampled active. APCHK# will remain asserted for one clock cycle if a parity error is detected. The Byte Enables, in conjunction with the address lines, determine the active data bytes transferred during a memory or I/O bus cycle. Byte Enable Hold forces the byte enables (BE7#-BE0#) to float during the next clock cycle. The ST6x86 CPU continues to generate additional bus cycles while BHOLD is asserted. While BHOLD is asserted, the byte enables are driven by an external source and select which data bytes are accessed through the scatter/gather buffer. BHOLD is ignored if the scatter/gather interface is disabled. Back-Off forces the ST6x86 CPU to abort the current bus cycle and relinquish control of the CPU local bus during the next clock cycle. The ST6x86 CPU enters the bus hold state and remains in this state until BOFF# is negated. Burst Ready indicates that the current transfer within a burst cycle, or the current single transfer cycle, can be terminated. The ST6x86 CPU samples BRDY# in the second and subsequent clocks of a bus cycle. BRDY# is active during address hold states. Cache Burst Ready performs the same function as BRDY# and is logically ORed with BRDY# within the ST6x86 CPU. Bus Request is asserted by the ST6x86 CPU when an internal bus cycle is pending. The ST6x86 CPU always asserts BREQ, along with ADS#, during the first clock of a bus cycle. If a bus cycle is pending, BREQ is asserted during the bus hold and address hold states. If no additional bus cycles are pending, BREQ is negated prior to termination of the current cycle. I/O Reference Input Page 3-8 3-state I/O Page 3-8 Output Output Input Page 3-12 Page 3-12 Page 3-17 3-state I/O Page 3-9 Output Page 3-9 3-state I/O Page 3-8 Input Input Page 3-19 Page 3-15 Input Page 3-12 Input Page 3-12 Output Page 3-15 3-2 ST6x86 Table 3.1. ST6x86 CPU Signals Sorted by Signal Name (Continued) Signal Name CACHE# CLK CLKMUL D63-D0 D/C# DHOLD DP7-DP0 EADS# EWBE# FERR# FLUSH# HIT# HITM# 3-3 Description Cacheability Status indicates that a read bus cycle is a potentially cacheable cycle; or that a write bus cycle is a cache line write-back or line replacement burst cycle. If CACHE# is asserted for a read cycle and KEN# is asserted by the system, the read cycle becomes a cache line fill burst cycle. Clock provides the fundamental timing for the ST6x86 CPU. The frequency of the ST6x86 CPU input clock determines the operating frequency of the CPU's bus. External timing is defined referenced to the rising edge of CLK. The Clock Multiplier input is sampled during RESET to determine the ST6x86 CPU core operating frequency. If CLKMUL=0 or is left unconnected, the core frequency is 2x the frequency of the CLK input. If CLKMUL=1, the core frequency is 3x the frequency of CLK. Data Bus signals are three-state, bi-directional signals which provide the data path between the ST6x86 CPU and external memory and I/O devices. The data bus is only driven while a write cycle is active (state=T2). The data bus is floated when DHOLD is asserted. Data/Control Status. If high, indicates that the current bus cycle is an I/O or memory data access cycle. If low, indicates a code fetch or special bus cycle such as a halt, prefetch, or interrupt acknowledge bus cycle. D/C# is driven valid in the same clock as ADS# is asserted. Data Bus Hold forces the ST6x86 CPU to float the data bus (D63-D0) and the data parity lines (DP7-DP0) in the next clock. While DHOLD is asserted, only the data and data parity buses are disabled. The current bus cycle remains active and is completed in the normal fashion in response to BRDY#. The ST6x86 CPU generates additional bus cycles while DHOLD is asserted. DHOLD is ignored if the scatter/gather interface is disabled. Data Parity signals provide parity for the data bus, one data parity bit per data byte. Even parity is driven on DP7-DP0 for all data write cycles. DP7-DP0 are read by the ST6x86 CPU during read cycles to check for even parity. The data parity bus is only driven while a write cycle is active (state=T2). External Address Strobe indicates that a valid cache inquiry address is being driven on the ST6x86 CPU address bus (A31-A5) and AP. The state of INV at the time EADS# is sampled active determines the final state of the cache line. A cache inquiry cycle using EADS# may be run while the ST6x86 CPU is in the address hold or bus hold state. External Write Buffer Empty indicates that there are no pending write cycles in the external system. EWBE# is sampled only during I/O and memory write cycles. If EWBE# is negated, the ST6x86 CPU delays all subsequent writes to on-chip cache lines in the "exclusive" or "modified" state until EWBE# is asserted. FPU Error Status indicates an unmasked floating point error has occurred. FERR# is asserted during execution of the FPU instruction that caused the error. FERR# does not float during bus hold states. Cache Flush forces the ST6x86 CPU to flush the cache. External interrupts and additional FLUSH# assertions are ignored during the flush. Cache inquiry cycles are permitted during the flush. Cache Hit indicates that the current cache inquiry address has been found in the cache (modified, exclusive or shared states). HIT# is valid two clocks after EADS# is sampled active, and remains valid until the next cache inquiry cycle. Cache Hit Modified Data indicates that the current cache inquiry address has been found in the cache and dirty data exists in the cache line (modified state). The ST6x86 CPU does not accept additional cache inquiry cycles while HITM# is asserted. HITM# is valid two clocks after EADS#. I/O Reference Output Page 3-10 Input Page 3-6 Input Page 3-6 3-state I/O Page 3-9 Output Page 3-10 Input Page 3-20 3-state I/O Page 3-9 Input Page 3-17 Input Page 3-13 Output Page 3-18 Input Page 3-13 Output Page 3-17 Output Page 3-17 ST6x86 Table 3.1. ST6x86 CPU Signals Sorted by Signal Name (Continued) Signal Name HLDA HOLD IGNNE# INTR INV KEN# LBA# LOCK# M/IO# NA# NMI PCD PCHK# PWT Description I/O Hold Acknowledge indicates that the ST6x86 CPU has responded to the HOLD input and relinquished control of the local bus. The ST6x86 CPU Output continues to operate during bus hold as long as the on-chip cache can satisfy bus requests. Hold Request indicates that another bus master has requested control of Input the CPU's local bus. Ignore Numeric Error forces the ST6x86 CPU to ignore any pending unmasked FPU errors and allows continued execution of floating point instructions. Maskable Interrupt forces the processor to suspend execution of the current instruction stream and begin execution of an interrupt service routine. The INTR input can be masked (ignored) through the IF bit in the Flags Register. Invalidate Request is sampled with EADS# to determine the final state of the cache line in the case of a cache inquiry hit. An asserted INV directs the processor to change the state of the cache line to "invalid". A negated INV directs the processor to change the state of the cache line to "shared." Cache Enable allows the data being returned during the current cycle to be placed in the CPU's cache. When the ST6x86 CPU is performing a cacheable code fetch or memory data read cycle (CACHE# asserted), and KEN# is sampled asserted, the cycle is transformed into a 32-byte cache line fill. KEN# is sampled with the first asserted BRDY# or NA# for the cycle. Local Bus Access indicates that the current bus cycle is for an address within the local bus address region. If LBA# is asserted during a CPU write cycle with BE3#-BE0# negated, the ST6x86 CPU automatically maps the upper DWORD of data to the lower DWORD of the data bus. LBA# floats if scatter/gather pins are disabled. Lock Status indicates that other system bus masters are denied access to the local bus. The ST6x86 CPU does not enter the bus hold state in response to HOLD while LOCK# is asserted. Memory/IO Status. If high, indicates that the current bus cycle is a memory cycle (read or write). If low, indicates that the current bus cycle is an I/O cycle (read or write, interrupt acknowledge, or special cycle). Next Address requests the next pending bus cycle address and cycle definition information. If either the current or next bus cycle is a locked cycle, a line replacement, a write-back cycle, or if there is no pending bus cycle, the ST6x86 CPU does not start a pipelined bus cycle regardless of the state of NA#. Non-Maskable Interrupt Request forces the processor to suspend execution of the current instruction stream and begin execution of an NMI interrupt service routine. Page Cache Disable reflects the state of the PCD page attribute bit in the page table entry or the directory table entry. If paging is disabled, or for cycles that are not paged, the PCD pin is driven low. PCD is masked by the cache disable (CD) bit in CR0, and floats during bus hold states. Data Parity Check indicates that a data bus parity error has occurred during a read operation. PCHK# is only valid during the second clock immediately after read data is returned to the ST6x86 CPU (BRDY# asserted) and is inactive otherwise. Parity errors signaled by a logic low on PCHK# have no effect on processor execution. Page Write Through reflects the state of the PWT page attribute bit in the page table entry or the directory table entry. PWT pin is negated during cycles that are not paged, or if paging is disabled. PWT takes priority over WB/WT#. Reference Page 3-15 Page 3-15 Input Page 3-18 Input Page 3-12 Input Page 3-17 Input Page 3-14 Output Page 3-20 Output Page 3-10 Output Page 3-10 Input Page 3-12 Input Page 3-13 Output Page 3-14 Output Page 3-9 Output Page 3-14 3-4 ST6x86 Table 3.1. ST6x86 CPU Signals Sorted by Signal Name (Continued) Signal Name QDUMP# RESET SCYC SMI# SMIACT# SUSP# SUSPA# TCK TDI TDO TMS TRST# WB/WT# WM_RST W/R# 3-5 Description Q Buffer Dump is used to dump the contents of the scatter/gather buffer onto the data bus. The data bytes specified by the byte enables (BE7#-BE0#) are driven onto the data bus during the clock after QDUMP# is sampled asserted. QDUMP# is ignored if the scatter/gather pins are disabled. Reset suspends all operations in progress and places the ST6x86 CPU into a reset state. Reset forces the CPU to begin executing in a known state. All data in the on-chip caches is invalidated. Split Locked Cycle indicates that the current bus cycle is part of a misaligned locked transfer. SCYC is defined for locked cycles only. A misaligned transfer is defined as any transfer that crosses an 8-byte boundary. SMM Interrupt forces the processor to save the CPU state to the top of SMM memory and to begin execution of the SMI service routine at the beginning of the defined SMM memory space. An SMI is a higher-priority interrupt than an NMI. SMM Interrupt Active indicates that the processor is operating in System Management Mode. SMIACT# does not float during bus hold states. Suspend Request requests that the CPU enter suspend mode. SUSP# is ignored following RESET and is enabled by setting the SUSP bit in CCR2. Suspend Acknowledge indicates that the ST6x86 CPU has entered low-power suspend mode. SUSPA# floats following RESET and is enabled by setting the SUSP bit in CCR2. Test Clock (JTAG) is the clock input used by the ST6x86 CPU's boundary scan (JTAG) test logic. Test Data In (JTAG) is the serial data input used by the ST6x86 CPU's boundary scan (JTAG) test logic. Test Data Out (JTAG) is the serial data output used by the ST6x86 CPU's boundary scan (JTAG) test logic. Test Mode Select (JTAG) is the control input used by the ST6x86 CPU's boundary scan (JTAG) test logic. Test Mode Reset (JTAG) initializes the ST6x86 CPU's boundary scan (JTAG) test logic. Write-Back/Write-Through is sampled during cache line fills to define the cache line write policy. If high, the cache line write policy is write-back. If low, the cache line write policy is write-through. (PWT forces write-through policy when PWT=1.) Warm Reset forces the ST6x86 CPU to complete the current instruction and then places the ST6x86 CPU in a known state. Once WM_RST is sampled active by the CPU, the reset sequence begins on the next instruction boundary. WM_RST does not change the state of the configuration registers, the on-chip cache, the write buffers and the FPU registers. WM_RST is sampled during reset. Write/Read Status. If high, indicates that the current memory, or I/O bus cycle is a write cycle. If low, indicates that the current bus cycle is a read cycle. I/O Reference Input Page 3-20 Input Page 3-6 Output Page 3-10 Input Page 3-13 Output Page 3-12 Input Page 3-21 Output Page 3-21 Input Page 3-23 Input Page 3-23 Output Page 3-23 Input Page 3-23 Input Page 3-23 Input Page 3-14 Input Page 3-8 Output Page 3-10 ST6x86 3.2 Signal Descriptions 3.2.2 Reset Control The following paragraphs provide additional information about the ST6x86 CPU signals. For ease of this discussion, the signals are divided into 16 functional groups as illustrated in Figure 3.1. The ST6x86 CPU output signals are initialized to their reset states during the CPU reset sequence, as shown in Table 3.3. The signal states given in Table 3.3 assume that HOLD, AHOLD, and BOFF# are negated. 3.2.1 Clock Control The Clock Input (CLK) signal, supplied by the system, is the timing reference use by the ST6x86 CPU bus interface. All external timing parameters are defined with respect to the CLK rising edge. The CLK signal enters the ST6x86 CPU where it is doubled or tripled to produce the ST6x86 CPU internal clock signal. During power on, the CLK signal must be running even if CLK does not meet AC specifications. The Clock Multiplier (CLKMUL) input is sampled during RESET to determine the CPU's core operating frequency. If CLKMUL=0, the core frequency is 2x the frequency of the CLK input. If CLKMUL=1, the core frequency is 3x the frequency of the CLK input. The CLKMUL input is connected to an internal pull-down resistor. Therefore, if CLKMUL is left unconnected, the core frequency defaults to 2x the input CLK. CLKMUL should be connected to Vss, to Vcc through a pull-up, or left unconnected. CLKMUL should not be connected to a switching signal. Asserting RESET suspends all operations in progress and places the ST6x86 CPU in a reset state. RESET is an asynchronous signal but must meet specified setup and hold times to guarantee recognition at a particular clock edge. On system power-up, RESET must be held asserted for at least 1 msec after Vcc and CLK have reached specified DC and AC limits. This delay allows the CPU's clock circuit to stabilize and guarantees proper completion of the reset sequence. During normal operation, RESET must be asserted for at least 15 CLK periods in order to guarantee the proper reset sequence is executed. When RESET negates (on its falling edge), the pins listed in Table 3.2 determine if certain ST6x86 CPU functions are enabled. Table 3.2. Pins Sampled During RESET SIGNAL NAME FLUSH# QDUMP# WM_RST 3-6 DESCRIPTION If = 0, three-state test mode enabled. If = 0, scatter/gather interface enabled. If = 1, built-in self test initiated. ST6x86 Table 3.3. Signal States During RESET SIGNAL LINE A20M# A31-A3 ADS# ADSC# AHOLD AP APCHK# BE7#-BE0# BHOLD BOFF# BRDY# BRDYC# BREQ CACHE# D(63-0) D/C# DHOLD DP(7-0) EADS# EWBE# FERR# FLUSH# HIT# HITM# HLDA HOLD IGNNE# 3-7 STATE Ignored Undefined until first ADS# 1 1 Recognized Undefined until first ADS# 1 Undefined until first ADS# Ignored Recognized Ignored Ignored 0 Undefined until first ADS# Float Undefined until first ADS# Ignored Float Ignored Ignored 1 Initiates three-state test mode 1 1 Responds to HOLD Recognized Ignored SIGNAL LINE INTR INV KEN# LBA# LOCK# M/IO# NA# NMI PCD PCHK# PWT QDUMP# RESET SCYC SMI# SMIACT# SUSP# SUSPA# TCK TDI TDO TMS TRST# W/R# WB/WT# WM_RST STATE Ignored Ignored Ignored 1 1 Undefined until first ADS# Ignored Ignored Undefined until first ADS# 1 Undefined until first ADS# Enables scatter/gather interface pins 1 Undefined until first ADS# Ignored 1 Ignored Float Recognized Recognized Responds to TCK, TDI, TMS, TRST# Recognized Recognized Undefined until first ADS# Ignored Initiates self-test ST6x86 Warm Reset (WM_RST) allows the ST6x86 CPU to complete the current instruction and then places the ST6x86 CPU in a known state. WM_RST is an asynchronous signal, but must meet specified setup and hold times in order to guarantee recognition at a particular CLK edge. Once WM_RST is sampled active by the CPU, the reset sequence begins on the next instruction boundary. WM_RST differs from RESET in that the contents of the on-chip cache, the write buffers, the configuration registers and the floating point registers contents remain unchanged. Following completion of the internal reset sequence, normal processor execution begins even if WM_RST remains asserted. If RESET and WM_RST are asserted simultaneously, WM_RST is ignored and RESET takes priority. If WM_RST is asserted at the falling edge of RESET, built-in self test (BIST) is initiated. 3.2.3 Address Bus The Address Bus (A31-A3) lines provide the physical memory and external I/O device addresses. A31-A5 are bi-directional signals used by the ST6x86 CPU to drive addresses to both memory devices and I/O devices. During cache inquiry cycles the ST6x86 CPU receives addresses from the system using signals A31-A5. Using signals A31-A3, the ST6x86 CPU can address a 4-GByte memory address space. Using signals A15-A3, the ST6x86 CPU can address a 64-KByte I/O space through the processor's I/O ports. During I/O accesses, signals A31-A16 are driven low. A31-A3 float during bus hold and address hold states. 3-8 The Byte Enable (BE7#-BE0#) lines are bi-directional signals that define the valid data bytes within the 64-bit data bus. The correlation between the enable signals and data bytes is shown in Table 3.4. Table 3.4. Byte Enable Signal to Data Bus Byte Correlation BYTE ENABLE CORRESPONDING DATA BYTE BE7# BE6# BE5# BE4# BE3# BE2# BE1# BE0# D63-D56 D55-D48 D47-D40 D39-D32 D31-D24 D23-D16 D15-D8 D7-D0 During a cache line fill, (burst read or "1+4" burst read) the ST6x86 CPU expects data to be returned as if all data bytes are enabled, regardless of the state of the byte enables. BE7#-BE0# float during bus hold and byte enable hold states. Address Bit 20 Mask (A20M#) is an active low input which causes the ST6x86 CPU to mask (force low) physical address bit 20 when driving the external address bus or when performing an internal cache access. Asserting A20M# emulates the 1 MByte address wrap-around that occurs on the 8086. The A20 signal is never masked during write-back cycles, inquiry cycles, system management address space accesses or when paging is enabled, regardless of the state of the A20M# input. ST6x86 3.2.4 Address Parity 3.2.6 Data Parity Address Parity (AP) is a bi-directional signal which provides the parity associated with address lines A31-A5. (A4 and A3 are not included in the parity determination.) During ST6x86 CPU generated bus cycles, while the address bus lines are driven, AP becomes an output supplying even address parity. During cache inquiry cycles, AP becomes an input and is sampled by EADS#. During cache inquiry cycles, even-parity must be placed on the AP line to guarantee an accurate result on the APCHK# (Address Parity Check Status) pin. The Data Parity Bus (DP7-DP0) provides and receives parity data for each of the eight data bus bytes (Table 3.5). The ST6x86 CPU generates even parity on the bus during write cycles and accepts even parity from the system during read cycles. DP7-DP0 is driven only while a write cycle is active. Address Parity Check Status (APCHK#) is driven active by the CPU when an address bus parity error has been detected for a cache inquiry cycle. APCHK# is asserted two clocks after EADS# is sampled asserted, and remains valid for one clock only. Address parity errors signaled by APCHK# have no effect on processor execution. 3.2.5 Data Bus Data Bus (D63-D0) lines carry three-state, bi-directional signals between the ST6x86 CPU and the system (i.e., external memory and I/O devices). The data bus transfers data to the ST6x86 CPU during memory read, I/O read, and interrupt acknowledge cycles. Data is transferred from the ST6x86 CPU during memory and I/O write cycles. Data setup and hold times must be met for correct read cycle operation. The data bus is driven only while a write cycle is active. Table 3.5. Parity Bit to Data Byte Correlation PARITY BIT DATA BYTE DP7 DP6 DP5 DP4 DP3 DP2 DP1 DP0 D63-D56 D55-D48 D47-D40 D39-D32 D31-D24 D23-D16 D15-D8 D7-D0 Parity Check (PCHK#) is asserted when a data bus parity error is detected. Parity is checked during code, memory and I/O reads, and the second interrupt acknowledge cycle. Parity is not checked during the first interrupt acknowledge cycle. Parity is checked for only the active data bytes as determined by the active byte enable signals except during a cache line fill (burst read or "1+4" burst read). During a cache line fill, the ST6x86 CPU assumes all data bytes are valid and parity is checked for all data bytes regardless of the state of the byte enables. PCHK# is valid only during the second clock immediately after read data is returned to the ST6x86 CPU (BRDY# asserted). At other times PCHK# is not active. Parity errors signaled by the assertion of PCHK# have no effect on processor execution. 3-9 ST6x86 3.2.7 Bus Cycle Definition Each bus cycle is assigned a bus cycle type. The bus cycle types are defined by six three-state outputs: CACHE#, D/C#, LOCK#, M/IO#, SCYC, and W/R# as listed in Table 3.6. These bus cycle definition signals are driven valid while ADS# is active. D/C#, M/IO#, W/R#, SCYC and CACHE# remain valid until the clock following the earliest of two signals: NA# asserted, or the last BRDY# for the cycle. LOCK# continues asserted until after BRDY# is returned for the last locked bus cycle. The bus cycle definition signals float during bus hold states. Cache Cycle Indicator (CACHE#) is an output that indicates that the current bus cycle is a potentially cacheable cycle (for a read), or indicates that the current bus cycle is a cache line write-back or line replacement burst cycle (for a write). If CACHE# is asserted for a read cycle and the KEN# input is returned active by the system, the read cycle becomes a cache line fill burst cycle. Data/Control (D/C#) distinguishes between data and control operations. When high, this signal indicates that the current bus cycle is a data transfer to or from memory or I/O. When low, D/C# indicates that the current bus cycle involves a control function such as a halt, interrupt acknowledge or code fetch. 3-10 Bus Lock (LOCK#) is an active low output which, when asserted, indicates that other system bus masters are denied access to control of the CPU bus. The LOCK# signal may be explicitly activated during bus operations by including the LOCK prefix on certain instructions. LOCK# is also asserted during descriptor updates, page table accesses, interrupt acknowledge sequences and when executing the XCHG instruction. However, if the NO_LOCK bit in CCR1 is set, LOCK# is asserted only during page table accesses and interrupt acknowledge sequences. The ST6x86 CPU does not enter the bus hold state in response to HOLD while the LOCK# output is active. Memory/IO (M/IO#) distinguishes between memory and I/O operations. When high, this signal indicates that the current bus cycle is a memory read or memory write. When low, M/IO# indicates that the current bus cycle is an I/O read, I/O write, interrupt acknowledge cycle or special bus cycle. Split Cycle (SCYC) is an active high output that indicates that the current bus cycle is part of a misaligned locked transfer. SCYC is defined for locked cycles only. A misaligned transfer is defined as any transfer that crosses an 8-byte boundary. Write/Read (W/R#) distinguishes between write and read operations. When high, this signal indicates that the current bus cycle is a memory write, I/O write or a special bus cycle. When low, this signal indicates that the current cycle is a memory read, I/O read or interrupt acknowledge cycle. ST6x86 Table 3.6. Bus Cycle Types BUS CYCLE TYPE Interrupt Acknowledge Does not occur. Does not occur. Special Cycles: M/IO# D/C# W/R# CACHE# LOCK# 0 0 0 0 0 0 0 0 1 1 X X 0 1 0 0 0 1 1 1 0 0 0 1 1 1 1 0 X 0 1 X X 1 1 X 0 1 1 0 1 0 0 0 1 1 1 1 0 0 1 0 1 0 1 X 1 1 1 0 1 1 0 0 1 1 1 1 1 0 1 1 1 1 0 If BE(7-0)# = FEh: Shutdown If BE(7-0)# = FDh: Flush (INVD, WBINVD) If A4 = 0 and BE(7-0)# = FBh: Halt (HLT) If BE(7-0)# = F7h: Write-Back (WBINVD) If BE(7-0)# = EFh: Flush Acknowledge (FLUSH#) If A4 = 1 and BE(7-0)# = FBh: Stop Grant (SUSP#) Does not occur. I/O Data Read I/O Data Write Does not occur. Cacheable Memory Code Read (Burst Cycle if KEN# Returned Active) Non-cacheable Memory Code Read Does not occur. Locked Memory Data Read Cacheable Memory Data Read (Burst Cycle if KEN# Returned Active) Non-cacheable Memory Data Read Locked Memory Write Burst Memory Write 1 1 1 0 1* (Writeback or Line Replacement) Single Transfer Memory Write 1 1 1 1 1 Note: X = Don't Care *Note: LOCK# continues to be asserted during a write-back cycle that occurs following an aborted (BOFF# asserted) locked bus cycle. 3-11 ST6x86 3.2.8 Bus Cycle Control The bus cycle control signals (ADS#, ADSC#, BRDY#, BRDYC#, NA#, and SMIACT#) indicate the beginning of a bus cycle and allow system hardware to control bus cycle termination timing and address pipelining. Address Strobe (ADS#) is an active low output which indicates that the CPU has driven a valid address and bus cycle definition on the appropriate output pins. ADS# floats during bus hold states. Cache Address Strobe (ADSC#) performs the same function as ADS#. ADSC# is used to interface directly to a secondary cache controller. Burst Ready (BRDY#) is an active low input that is driven by the system to indicate that the current transfer within a burst cycle or the current single transfer bus cycle can be terminated. The CPU samples BRDY# in the second and subsequent clocks of a cycle. BRDY# is active during address hold states. Cache Burst Ready (BRDYC#) performs the same function as BRDY# and is logically ORed with BRDY internally by the CPU. BRDYC# is used to interface directly to a secondary cache controller. Next Address (NA#) is an active low input that is driven by the system to request the next pending bus cycle address and cycle definition information even though all data transfers for the current bus cycle are not complete. This new bus cycle is referred to as a "pipelined" cycle. If either the current or next bus cycle is a locked cycle, a line replacement, a write-back cycle or there is no pending bus cycle, the ST6x86 CPU does not start a pipelined bus cycle regardless of the state of the NA# input. 3-12 System Management Mode Active (SMIACT#) is an active low output which indicates that the CPU is operating in System Management Mode. SMIACT# is asserted in response to the assertion of SMI# or due to execution of the SMINT instruction. SMIACT# is also asserted during accesses to defined SMM memory if the SMAC bit in CCR1 is set. This bit allows access to SMM memory while not in SMM mode and is typically used for initialization purposes. While servicing an SMI# interrupt or SMINT instruction, SMIACT# remains asserted until a RSM instruction is executed. The RSM instruction causes the ST6x86 CPU to exit SMM mode and negate the SMIACT# output. If a cache inquiry cycle occurs while SMIACT# is active, any resulting write-back cycle is issued with SMIACT# asserted. This occurs even though the write-back cycle is intended for normal memory rather than SMM memory. During RESET, the USE_SMI bit in CCR1 is cleared. While USE_SMI is zero, SMIACT# is always negated. SMIACT# does not float during bus hold states. 3.2.9 Interrupt Control The interrupt control signals (INTR, NMI, SMI#) allow the execution of the current instruction stream to be interrupted and suspended. Maskable Interrupt Request (INTR) is an active high level-sensitive input which causes the processor to suspend execution of the current instruction stream and begin execution of an interrupt service routine. The INTR input can be masked (ignored) through the IF bit in the Flags Register. When not masked, the ST6x86 CPU responds to the INTR input by performing two locked interrupt acknowledge bus cycles. During the second interrupt acknowledge cycle, the ST6x86 CPU reads an 8-bit value, the interrupt vector, from the data bus. The 8-bit interrupt vector indicates the interrupt level that caused generation of the INTR and is used by the CPU to determine the beginning address of the interrupt service routine. To assure recognition of the INTR request, INTR must remain active until the start of the first interrupt acknowledge cycle. ST6x86 Non-Maskable Interrupt Request (NMI) is a rising-edge sensitive input which causes the processor to suspend execution of the current instruction stream and begin execution of an NMI interrupt service routine. The NMI interrupt cannot be masked by the IF bit in the Flags Register. Asserting NMI causes an interrupt which internally supplies interrupt vector 2h to the CPU core. Therefore, external interrupt acknowledge cycles are not issued. Once NMI processing has started, no additional NMIs are processed until an IRET instruction is executed, typically at the end of the NMI service routine. If NMI is re-asserted prior to execution of the IRET, one and only one NMI rising edge is stored and then processed after execution of the next IRET. System Management Interrupt Request (SMI#) is an interrupt input with higher priority than the NMI input. SMI# is a falling edge sensitive input and is sampled on every rising edge of the processor input clock. Asserting SMI# forces the processor to save the CPU state to the top of SMM memory and to begin execution of the SMI service routine at the beginning of the defined SMM memory space. After the processor internally acknowledges the SMI# interrupt, the SMIACT# output is driven low for the duration of the interrupt service routine. Once SMI# servicing has started, no additional SMI# interrupts are processed until a RSM instruction is executed. If SMI# is re-asserted prior to execution of a RSM instruction, one and only one SMI# falling edge is stored and then processed after execution of the next RSM. SMI# is ignored following reset and recognition is enabled by setting the USE_SMI bit in CCR1. 3.2.10 Cache Control The cache control signals (EWBE#, FLUSH#, KEN#, PCD, PWT, WB/WT#) are used to indicate cache status and control caching activity. External Write Buffer Empty (EWBE#) is an active low input driven by the system to indicate when there are no pending write cycles in the external system. The ST6x86 CPU samples EWBE# during write cycles (I/O and memory) only. If EWBE# is not asserted, the processor delays all subsequent writes to on-chip cache lines in the "exclusive" or "modified" state until EWBE# is asserted. Regardless of the state of EWBE#, all writes to the on-chip cache are delayed until any previously issued external write cycle is complete. This ensures that external write cycles occur in program order and is referred to as "strong write ordering". To enhance performance, "weak write ordering" may be allowed for specific address regions using the Address Region Registers (ARRs) and Region Control Registers (RCRs). Cache Flush (FLUSH#) is a falling-edge sensitive input that forces the processor to write-back all dirty data in the cache and then invalidate the entire cache contents. FLUSH# need only be asserted for a single clock but must meet specified setup and hold times to guarantee recognition at a particular clock edge. Once FLUSH# is sampled active, the ST6x86 CPU begins the cache flush sequence after completion of the current instruction. External interrupts and additional FLUSH# requests are ignored while the cache flush is in progress. However, cache inquiry cycles are permitted during the flush sequence. The ST6x86 CPU issues a flush acknowledge special cycle to indicate completion of the flush sequence. If the processor is in a halt or shutdown state, FLUSH# is recognized and the ST6x86 CPU returns to the halt or shutdown state following completion of the flush sequence. If FLUSH# is active at the falling edge of RESET, the processor enters three state test mode. 3-13 ST6x86 Cache Enable (KEN#) is an active low input which indicates that the data being returned during the current cycle is cacheable. When the ST6x86 CPU is performing a cacheable code fetch or memory data read cycle and KEN# is sampled asserted, the cycle is transformed into a cache line fill (4 transfer burst cycle) or a "1+4" cache line fill. KEN# is sampled with the first asserted BRDY# or NA# for the cycle. I/O accesses, locked reads, system management memory accesses and interrupt acknowledge cycles are never cached. Page Cache Disable (PCD) is an active high output that reflects the state of the PCD page attribute bit in the page table entry or the directory table entry. If paging is disabled or for cycles that are not paged, the PCD pin is driven low. PCD is masked by the cache disable (CD) bit in CR0 (driven high if CD=1) and floats during bus hold states. Page Write Through (PWT) is an active high output that reflects the state of the PWT page attribute bit in the page table entry or the directory table entry. If paging is disabled or for cycles that are not paged, the PWT pin is driven low. If PWT is asserted, PWT takes priority over the WB/WT# input. If PWT is asserted for either reads or writes, the cache line is saved in, or remains in, the shared (write-through) state. PWT floats during bus hold states. 3-14 The Write-Back/Write-Through (WB/WT#) input allows the system to define the write policy of the on-chip cache on a line-by-line basis. If WB/WT# is sampled high during a line fill cycle and PWT is low, the line is defined as write-back and is stored in the exclusive state. If WB/WT# is sampled high during a write to a write-through cache line (shared state) and PWT is low, the line is transitioned to write-back (exclusive state). If WB/WT# is sampled low or PWT is high, the line is defined as write-through and is stored in (line fill), or remains in (write), the shared state. Table 3.7 lists the effects of WB/WT# on the state of the cache line for various bus cycles. Table 3.7. Effects of WB/WT# on Cache Line State BUS CYCLE TYPE PWT WB/ WT# WRITE POLICY MESI STATE WriteShared through WriteLine Fill 0 1 Exclusive back WriteLine Fill 1 x Shared through Memory Write Write0 0 Shared (Note) through Memory Write Write0 1 Exclusive (Note) back Memory Write Write1 x Shared (Note) through Note: Only applies to memory writes to addresses that are currently valid in the cache. Line Fill 0 0 ST6x86 3.2.11 Bus Arbitration The bus arbitration signals (BOFF#, BREQ, HOLD, and HLDA) allow the ST6x86 CPU to relinquish control of its local bus when requested by another bus master device. Once the processor has released its bus, the bus master device can then drive the local bus signals. Back-Off (BOFF#) is an active low input that forces the ST6x86 CPU to abort the current bus cycle and relinquish control of the CPU's local bus in the next clock. The ST6x86 CPU responds to BOFF# by entering the bus hold state as listed in Table 3.8. The ST6x86 CPU remains in bus hold until BOFF# is negated. Once BOFF# is negated, the ST6x86 CPU restarts any aborted bus cycle in its entirety. Any data returned to the ST6x86 CPU while BOFF# is asserted is ignored. If BOFF# is asserted in the same clock that ADS# is asserted, the ST6x86 CPU may float ADS# while in the active low state. Bus Request (BREQ) is an active high output asserted by the ST6x86 CPU whenever a bus cycle is pending internally. The ST6x86 CPU always asserts BREQ in the first clock of a bus cycle with ADS# as well as during bus hold and address hold states if a bus cycle is pending. If no additional bus cycles are pending, BREQ is negated prior to termination of the current cycle. Bus Hold Request (HOLD) is an active high input used to indicate that another bus master requests control of the CPU's local bus. After recognizing the HOLD request and completing the current bus cycle or sequence of locked bus cycles, the ST6x86 CPU responds by floating the local bus and asserting the hold acknowledge (HLDA) output. The bus remains granted to the requesting bus master until HOLD is negated. Once HOLD is sampled negated, the ST6x86 CPU simultaneously drives the local bus and negates HLDA. Hold Acknowledge (HLDA) is an active high output used to indicate that the ST6x86 CPU has responded to the HOLD input and has relinquished control of its local bus. Table 3.8 lists the state of all the ST6x86 CPU signals during a bus hold state. The ST6x86 CPU continues to operate during bus hold states as long as the on-chip cache can satisfy bus requests. HLDA is asserted until HOLD is negated. Once HOLD is sampled negated, the ST6x86 CPU simultaneously drives the local bus and negates HLDA. 3-15 ST6x86 Table 3.8. Signal States During Bus Hold SIGNAL LINE A20M# A31-A3 ADS# ADSC# AHOLD AP APCHK# BE7#-BE0# BHOLD BOFF# BRDY# BRDYC# BREQ CACHE# D/C# D63-D0 DHOLD DP7-DP0 EADS# EWBE# FERR# FLUSH# HIT# HITM# HLDA HOLD IGNNE# STATE Recognized internally Float Float Float Ignored Float Driven Float Ignored Recognized Ignored Ignored Driven Float Float Float Ignored Float Recognized Recognized internally Driven Recognized Driven Driven Responds to HOLD Recognized Recognized internally SIGNAL LINE INTR INV KEN# LBA# LOCK# M/IO# NA# NMI PCD PCHK# PWT QDUMP# RESET SCYC SMI# SMIACT# SUSP# SUSPA# TCK TDI TDO TMS TRST# W/R# WB/WT# WM_RST STATE Recognized internally Recognized Ignored Float Float Float Ignored Recognized internally Float Driven Float Recognized Recognized Float Recognized Driven Recognized Driven Recognized Recognized Responds to TCK, TDI, TMS, TRST# Recognized Recognized Float Ignored Recognized 3-16 ST6x86 3.2.12 Cache Coherency The cache coherency signals (AHOLD, EADS#, HIT#, HITM#, and INV) are used to initiate and monitor cache inquiry cycles. These signals are intended to be used to ensure cache coherency in a uni-processor environment only. Contact Cyrix for additional specifications on maintaining coherency in a multi-processor environment. Address Hold Request (AHOLD) is an active high input which forces the ST6x86 CPU to float A31-A3 and AP in the next clock cycle. While AHOLD is asserted, only the address bus is disabled. The current bus cycle remains active and can be completed in the normal fashion. The ST6x86 CPU does not generate additional bus cycles while AHOLD is asserted except write-back cycles in response to a cache inquiry cycle. External Address Strobe (EADS#) is an active low input used to indicate to the ST6x86 CPU that a valid cache inquiry address is being driven on the ST6x86 CPU address bus (A31-A5) and AP. The ST6x86 CPU checks the on-chip cache for this address. If the address is present in the cache the HIT# signal is asserted. If the data associated with the inquiry address is "dirty" (modified state), the HITM# signal is also asserted. If dirty data exists, a write-back cycle is issued to update external memory with the dirty data. Additional cache inquiry cycles are ignored while HITM# is asserted. Hit on Cache Line (HIT#) is an active low output used to indicate that the current cache inquiry address has been found in the cache (modified, exclusive or shared states). HIT# is valid two clocks after EADS# is sampled active, and remains valid until the next cache inquiry cycle. Hit on Modified Data (HITM#) is an active low output used to indicate that the current cache inquiry address has been found in the cache and dirty data exists in the cache line (modified state). If HITM# is asserted, a write-back cycle is issued to update external memory. HITM# is valid two clocks after EADS# is sampled active, and remains asserted until two clocks after the last BRDY# of the write-back cycle is sampled active. The ST6x86 CPU does not accept additional cache inquiry cycles while HITM# is asserted. Invalidate Request (INV) is an active high input used to determine the final state of the cache line in the case of a cache inquiry hit. INV is sampled with EADS#. A logic one on INV directs the processor to change the state of the cache line to "invalid". A logic zero on INV directs the processor to change the state of the cache line to "shared." The state of the INV pin at the time EADS# is sampled active determines the final state of the cache line. If INV is sampled high, the final state of the cache line is "invalid". If INV is sampled low, the final state of the cache line is "shared." A cache inquiry cycle using EADS# may be run while the ST6x86 CPU is in either an address hold or bus hold state. The inquiry address must be driven by an external device. 3-17 ST6x86 3.2.13 FPU Error Interface 3.2.14 Scatter/Gather Buffer Interface The FPU interface signals FERR# and IGNNE# are used to control error reporting for the on-chip floating point unit. These signals are typically used for a PC-compatible system implementation. For other applications, FPU errors are reported to the ST6x86 CPU CPU core through an internal interface. The scatter/gather buffer interface signals (BHOLD, DHOLD, LBA#, QDUMP#), in conjunction with the byte enables (BE7#-BE0#) and address hold (AHOLD), can be used by the system hardware to transfer data to/from a 32-bit peripheral interface bus. A 64-bit buffer resides in the ST6x86 CPU to assist the system in these transfers. This buffer provides scatter/gather capability during four different types of transfers as listed in Table 3.9. Floating Point Error Status (FERR#) is an active low output asserted by the ST6x86 CPU when an unmasked floating point error occurs. FERR# is asserted during execution of the FPU instruction that caused the error. FERR# does not float during bus hold states. Ignore Numeric Error (IGNNE#) is an active low input which forces the ST6x86 CPU to ignore any pending unmasked FPU errors and allows continued execution of floating point instructions. When IGNNE# is not asserted and an unmasked FPU error is pending, the ST6x86 CPU only executes the following floating point instructions: FNCLEX, FNINIT, FNSAVE, FNSTCW, FNSTENV, and FNSTSW#. IGNNE# is ignored when the NE bit in CR0 is set to a 1. 3-18 ST6x86 Table 3.9. Scatter/Gather Cycles BHOLD USED DHOLD USED QDUMP# USED CPU Write to 32-Bit Bus x -- -- CPU Read from 32-Bit Bus x -- -- 32-Bit Bus Master Write to Memory * (1) Scatter/gather buffer load from 32-bit bus master. x x -- Data sampled 1 clock after byte enables asserted. x -- x Data driven 1 clock after QDUMP# asserted. x x -- Data sampled 1 clock after byte enables asserted. CYCLE TYPE (2) Scatter/gather buffer write to memory. 32-Bit Bus Master Read from Memory * (1) Scatter/gather buffer load from memory. DATA BUS TIMING Data driven 1 clock after byte enables asserted. Data sampled 1 clock after byte enables asserted. (2) Scatter/gather buffer write to 32-bit Data driven 1 clock after QDUMP# x -x bus master. asserted. *Note: Bus master transfers using the scatter/gather buffer must be initiated while the CPU bus is in a bus hold state or an idle state. These cycles cannot occur during CPU initiated bus cycles. Byte Enable Hold (BHOLD) is an active high input that causes the ST6x86 CPU to float the byte enable outputs (BE7#-BE0#) in the next clock. While BHOLD is asserted, only the byte enables are disabled. The current bus cycle remains active and can be completed in the normal fashion. The ST6x86 CPU continues to generate additional bus cycles while BHOLD is asserted, so BHOLD should only be asserted while AHOLD is asserted. BHOLD is asserted by the external system during scatter/gather buffer cycles. While BHOLD is asserted, the byte enables are driven by an external source and indicate which bytes of the data bus should be loaded into/written out of the scatter/gather buffer. The ST6x86 CPU samples the byte enables at each rising clock edge while BHOLD is asserted. Table 3.10 lists the byte enable mappings for the scatter/gather cycles. 3-19 ST6x86 Table 3.10. Byte Enable Map for Scatter/Gather Cycles CYCLE TYPE BE7-BE0# SOURCE DESTINATION CPU Data Bus Scatter/Gather Buffer No Transfer No Transfer CPU Read from 32-Bit Bus 31-0 31-0 31-0 63-32 63-0 63-0 Scatter/Gather Buffer CPU Data Bus FF No Transfer No Transfer CPU Write to 32-Bit Bus* Fx 31-0 31-0 xF 63-32 31-0 xx 63-0 63-0 CPU Data Bus Scatter/Gather Buffer FF No Transfer No Transfer Scatter/Gather Buffer Load for Fx 31-0 31-0 32-Bit Bus Master xF 31-0 63-32 xx 63-0 63-0 Scatter/Gather Buffer CPU Data Bus FF No Transfer No Transfer Scatter/Gather Buffer Dump Fx 31-0 31-0 using QDUMP# xF 63-32 31-0 xx 63-0 63-0 *Note: If LBA# is active during a CPU write cycle with BE3-BE0# inactive, the ST6x86 CPU automatically maps the upper dword of data (D63-D32) to the lower dword of the data bus (D31-D0). FF Fx xF xx Data Bus Hold (DHOLD) is an active high input that forces the ST6x86 CPU to float the data bus lines (D63-D0) and the data parity lines (DP7-DP0) in the next clock. While DHOLD is asserted, only the data and data parity buses are disabled. The current bus cycle remains active and is completed in the normal fashion in response to BRDY#. The ST6x86 CPU generates additional bus cycles while DHOLD is asserted. To avoid writing invalid data, during a write cycle, DHOLD and BRDY# should not be asserted at the same time, Local Bus Access (LBA#) is an active low output asserted by the ST6x86 CPU for any I/O bus cycle or for any bus access that resides within a "local bus" address region as specified by the on-chip configuration registers. LBA# is asserted during the clock that ADS# is asserted and remains asserted for only one clock. LBA# is used to indicate a cycle intended to address a device using the 32-bit peripheral bus. If LBA# is active during a CPU write cycle with BE(3-0)# inactive, the ST6x86 CPU automatically maps the upper dword of data to the lower dword of the data bus. The external system asserts DHOLD during scatter/gather buffer load cycles when the ST6x86 CPU is not the bus master. While DHOLD is asserted, the data bus is driven by an external source and the information is loaded into the scatter/gather buffer based on the state of the byte enables (BHOLD asserted). The data bus is sampled one clock after the clock edge at which an active byte enable is sampled. Q Buffer Dump (QDUMP#) is an active low input asserted by the external system to dump the contents of the scatter/gather buffer to the data bus. The data bytes specified by the asserted byte enables are driven onto the data bus during the clock after QDUMP# is sampled asserted. QDUMP# must be asserted at the falling edge of RESET to enable the scatter/gather interface pins. 3-20 ST6x86 3.2.15 Power Management Interface The two power management signals (SUSP#, SUSPA#) allow the ST6x86 CPU to enter and exit suspend mode. The ST6x86 CPU also enters suspend mode as the result of executing a HALT instruction if the HALT bit is set in CCR2. Suspend mode circuitry forces the ST6x86 CPU to consume minimal power while maintaining the entire internal CPU state. Suspend Request (SUSP#) is an active low input which requests that the ST6x86 CPU enter suspend mode. After recognition of an active SUSP# input, the ST6x86 CPU completes execution of the current instruction, any pending decoded instructions and associated bus cycles, issues a stop grant bus cycle, and then asserts the SUSPA# output. SUSP# is ignored following RESET and is enabled by setting the SUSP bit in CCR2. The Suspend Acknowledge (SUSPA#) output indicates that the ST6x86 CPU has entered low-power suspend mode as the result of either assertion of SUSP# or execution of a HALT instruction. SUSPA# remains asserted until SUSP# is negated, or until an interrupt is serviced if suspend mode was entered via the HALT instruction. If SUSP# is asserted and then negated prior to SUSPA# assertion, SUSPA# may toggle state after SUSP# negates. The ST6x86 CPU accepts cache flush requests and cache inquiry cycles while SUSPA# is asserted. If FLUSH# is asserted, the CPU exits the low power state and services the flush request. After completion of all required write-back cycles, the CPU returns to the low power state. SUSPA# negates during the write-back cycles. Before issuing the write-back cycle, the CPU may execute several code fetches. If AHOLD, BOFF# or HOLD is asserted while SUSPA# is asserted, the CPU exits the low power state in preparation for a cache inquiry cycle. After completion of any required write-back cycles resulting from the cache inquiry, the CPU returns to the low power state only if HOLD, BOFF# and AHOLD are negated. SUSPA# negates during the write-back cycle. Table 3.11 lists the ST6x86 CPU signal states for suspend mode when initiated by either SUSP# or the HALT instruction. SUSPA# is disabled (three-state) following RESET and is enabled by setting the SUSP bit in CCR2. 3-21 ST6x86 Table 3.11. Signal States During Suspend Mode SIGNAL LINE A20M# A31-A3 ADS# ADSC# AHOLD AP APCHK# BE7#-BE0# BHOLD BOFF# BRDY# BRDYC# BREQ CACHE# D/C# D63-D0 DHOLD DP7-DP0 EADS# EWBE# FERR# FLUSH# HIT# HITM# HLDA HOLD IGNNE# 3-22 SUSP# INITIATED/ HALT INITIATED Ignored Driven 1 1 Recognized Driven 1 Driven Ignored Recognized Ignored Ignored 0 Driven Driven Float Ignored Float Recognized Ignored 1 Recognized Driven 1 Driven in response to HOLD Recognized Ignored SIGNAL LINE INTR INV KEN# LBA# LOCK# M/IO# NA# NMI PCD PCHK# PWT QDUMP# RESET SCYC SMI# SMIACT# SUSP# SUSPA# TCK TDI TDO TMS TRST# W/R# WB/WT# WM_RST SUSP# INITIATED/ HALT INITIATED Latched/Recognized Recognized Ignored 1 1 Driven Ignored Latched/Recognized Driven 1 Driven Ignored Recognized Driven Latched/Recognized 1 0 / Recognized 0 Recognized Recognized Responds to TCK, TDI, TMS, TRST# Recognized Recognized Driven Ignored Latched/Recognized ST6x86 3.2.16 JTAG Interface The ST6x86 CPU can be tested using JTAG Interface (IEEE Std. 1149.1) boundary scan test logic. The ST6x86 CPU pin state can be set according to serial data supplied to the chip. The ST6x86 CPU pin state can also be recorded and supplied as serial data. Test Clock (TCK) is the clock input used by the ST6x86 CPU boundary scan (JTAG) test logic. The rising edge of TCK is used to clock control and data information into the ST6x86 CPU using the TMS and TDI pins. The falling edge of TCK is used to clock data information out of the ST6x86 CPU using the TDO pin. Test Data Input (TDI) is the serial data input used by the ST6x86 CPU boundary scan (JTAG) test logic. TDI is sampled on the rising edge of TCK. Test Data Output (TDO) is the serial data output used by the ST6x86 CPU boundary scan (JTAG) test logic. TDO is output on the falling edge of TCK. Test Mode Select (TMS) is the control input used by the ST6x86 CPU boundary scan (JTAG) test logic. TMS is sampled on the rising edge of TCK. Test Reset (TRST#) is an active low input used to initialize the ST6x86 CPU boundary scan (JTAG) test logic. 3-23 ST6x86 3.3 Functional Timing 3.3.1 Reset Timing Figure 3.2 illustrates the required RESET timing for both a power-on reset and a reset that occurs during operation. The WM_RST, FLUSH# and QDUMP# inputs are sampled at the falling edge of RESET to determine if the ST6x86 CPU should enter built-in self-test, enable tri-state test mode or enable the scatter-gather interface pins, respectively. WM_RST, FLUSH# and QDUMP# must be valid at least two clocks prior to the RESET falling edge. Figure 3.2. RESET Timing CLK Reset Inactive = 2 CLKs Min. RESET Reset after Power-On = 15 CLKs Min. Power-On Reset = 1 msec Min. WM_RST VALID FLUSH# VALID QDUMP# VALID Note 1. ADS# asserted approximately 150-200 clocks after RESET fal ling edge if no built-in self-t est Note 2. ADS# asserted approximately 2**19 clocks after RESET fal ling edge if built-in self-test requested. Note 3. Output pins driven to specified RESET state a maximum of 2 CLKs after RESET rising edge. 1734900 3-24 ST6x86 3.3.2 Bus State Definition The ST6x86 CPU bus controller supports non-pipelined and pipelined operation as well as single transfer and burst bus cycles. During each CLK period, the bus controller exists in one of six states as listed in Table 3.12. Each of these bus states and its associated state transitions is illustrated in (Figure 3.3) and listed in Table 3.13. Table 3.12. ST6x86 CPU Bus States STATE NAME DESCRIPTIO N Ti Idle Clock During Ti, no bus cycles are in progress. BOFF# and RESET force the bus to the idle state. The bus is always in the idle state while HLDA is active. T1 First Bus Cycle Clock During the first clock of a non-pipelined bus cycle, the bus enters the T1 state. ADS# is asserted during T1 along with valid address and bus cycle definition information. T2 During the second clock of a non-pipelined bus cycle, the bus enters the T2 state. The bus remains in the T2 state for subsequent clocks of the bus cycle as long as a pipelined cycle is not initiated. During T2, valid data is driven Second and Subsequent Bus during write cycles and data is sampled during reads. BRDY# is also samCycle Clock pled during T2. The bus also enters the T2 state to complete bus cycles that were initiated as pipelined cycles but complete as the only outstanding bus cycle. T12 During the first clock of a pipelined cycle, the bus enters the T12 state. DurFirst Pipelined Bus Cycle ing T12, data is being transferred and BRDY# is sampled for the current Clock cycle at the same time that ADS# is asserted and address/bus cycle definition information is driven for the next (pipelined) cycle. T2P During the second and subsequent clocks of a pipelined bus cycle where two cycles are outstanding, the bus enters the T2P state. During T2P, data is Second and Subsequent being transferred and BRDY# is sampled for the current cycle. However, Pipelined Bus Cycle Clock valid address and bus cycle definition information continues to be driven for the next pipelined cycle. Td 3-25 Dead Clock The bus enters the Td state if a pipelined cycle was initiated that requires one idle clock to turn around the direction of the data bus. Td is required for a read followed immediately by a pipelined write, and for a write followed immediately by a pipelined read. ST6x86 Figure 3.3. ST6x86 CPU Bus State Diagram P (from any state) Ti A B F T1 E C D T2 O G H N TD L T12 I J M K T2P 1741800 3-26 ST6x86 Table 3.13. Bus State Transitions TRANSITION 3-27 CURRENT STATE NEXT STATE EQUATION A Ti Ti No Bus Cycle Pending. B Ti T1 New or Aborted Bus Cycle Pending. C T1 T2 Always. D T2 T2 Not Last BRDY# and No New Bus Cycle Pending, or Not Last BRDY# and New Bus Cycle Pending and NA# Negated. E T2 T1 Last BRDY# and New Bus Cycle Pending and HITM# Negated. F T2 Ti Last BRDY# and No New Bus Cycle Pending, or Last BRDY# and HITM# Asserted. G T2 T12 Not Last BRDY# and New Bus Cycle Pending and NA# Sampled Asserted. H T12 T2 Last BRDY# and No Dead Clock Required. I T12 Td Last BRDY# and Dead Clock Required. J T12 T2P Not Last BRDY#. K T2P T2P Not Last BRDY#. L T2P T2 Last BRDY# and No Dead Clock Required. M T2P Td Last BRDY# and Dead Clock Required. N Td T12 New Bus Cycle Pending and NA# Sampled Asserted. O Td T2 No New Bus Cycle Pending, or New Bus Cycle Pending and NA# Negated. P Any State Ti RESET Asserted, or BOFF# Asserted. ST6x86 3.3.3 Non-pipelined Bus Cycles Non-pipelined bus operation may be used for all bus cycle types. The term "non-pipelined" refers to a mode of operation where the CPU allows only one outstanding bus cycle. In other words, the current bus cycle must complete before a second bus cycle is allowed to start. 3.3.3.1 Non-pipelined Single Transfer Cycles Single transfer read cycles occur during non-cacheable memory reads, I/O read cycles, and special cycles. A non-pipelined single transfer read cycle begins with address and bus cycle definition information driven on the bus during the first clock (T1 state) of the bus cycle. The CPU then monitors the BRDY# input at the end of the second clock (T2 state). If BRDY# is asserted, the CPU reads the appropriate data and data parity lines and terminates the bus cycle. If BRDY# is not active, the CPU continues to sample the BRDY# input at the end of each subsequent cycle (T2 states). Each of the additional clocks is referred to as a wait state. The CPU uses the data parity inputs to check for even parity on the active data lines. If the CPU detects an error, the parity check output (PCHK#) asserts during the second clock following the termination of the read cycle. Figure 3.4 illustrates the functional timing for two non-pipelined single-transfer read cycles. Cycle 2 is a potentially cacheable cycle as indicated by the CACHE# output. Because this cycle is potentially cacheable, the CPU samples the KEN# input at the same clock edge that BRDY# is asserted. If KEN# is negated, the cycle terminates as shown in the diagram. If KEN# is asserted, the CPU converts this cycle into a burst cycle as described in the next section. NA# must be negated for non-pipelined operation. Pipelined bus cycles are described later in this chapter. 3-28 ST6x86 Figure 3.4. Non-Pipelined Single Transfer Read Cycles Ti T1 T2 T1 T2 T2 T2 Ti Ti Ti CLK ADS# Address, AP CYCLE 1 CYCLE 2 VALID VALID CACHE# W/R# NA# BRDY# KEN# DATA, DP PCHK# VALID Cycle 1: Non-Cacheable, 0 WaitState Read 3-29 IN IN Cycle 2: Potentially Cacheable, 2 Wait-State Read VALID 1735000 ST6x86 Single transfer write cycles occur for writes that are neither line replacement nor write-back cycles. The functional timing of two non-pipelined single transfer write cycles is shown in Figure 3.5. During a write cycle, the data and data parity lines are outputs and are driven valid during the second clock (T2 state) of the bus cycle. Data and data parity remain valid during all wait states. If the write cycle is a write to a valid cache location in the "shared" state, the WB/WT# pin is sampled with BRDY#. If WB/WT# is sampled high, the cache line transitions from the "shared" to the "exclusive" state. Figure 3.5. Non-Pipelined Single Transfer Write Cycles Ti T1 T2 T1 T2 T2 T2 Ti CLK ADS# Address, AP CYCLE 1 CYCLE 2 VALID VALID CACHE# W/R# NA# BRDY# WB/WT# VALID VALID DATA, DP OUT Cycle 1: 0 Wait-State Write OUT Cycle 2: 2 Wait-State Write 1735100 3-30 ST6x86 3.3.3.2 Non-pipelined Burst Read Cycles The ST6x86 CPU uses burst read cycles to perform cache line fills. During a burst read cycle, four 64-bit data transfers occur to fill one of the CPU's 32-byte internal cache lines. A non-pipelined burst read cycle begins with address and bus cycle definition information driven on the bus during the first clock (T1 state) of the bus cycle. The CACHE# output is always active during a burst read cycle and is driven during the T1 clock. The CPU then monitors the BRDY# input at the end of the second clock (T2 state). If BRDY# is asserted, the CPU reads the data and data parity and also checks the KEN# input. If KEN# is negated, the CPU terminates the bus cycle as a single transfer cycle. If KEN# is asserted, the CPU converts the cycle into a burst (cache line fill) by continuing to sample BRDY# at the end of each subsequent clock. BRDY# must be asserted a total of four times to complete the burst cycle. WB/WT# is sampled at the same clock edge as KEN#. In conjunction with PWT and the on-chip configuration registers, WB/WT# determines the MESI state of the cache line for the current line fill. 3-31 Each time BRDY# is sampled asserted during the burst cycle, a data transfer occurs. The CPU reads the data and data parity busses and assigns the data to an internally generated burst address. Although the CPU internally generates the burst address sequence, only the first address of the burst is driven on the external address bus. System logic must predict the burst address sequence based on the first address. Wait states may be added to any transfer within a burst by delaying the assertion of BRDY# by the desired number of clocks. The CPU checks even data parity for each of the four transfers within the burst. If the CPU detects an error, the parity check output (PCHK#) asserts during the second clock following the BRDY# assertion of the data transfer. Figure 3.6 illustrates two non-pipelined burst read cycles. The cycles shown are the fastest possible burst sequences (2-1-1-1). NA# must be negated for non-pipelined operation as shown in the diagram. Pipelined bus cycles are described later in this chapter. Figure 3.7 depicts a burst read cycle with wait states. A 3-2-2-2 burst read is shown. ST6x86 Figure 3.6. Non-Pipelined Burst Read Cycles Ti T1 T2 T2 T2 T2 T1 T2 T2 T2 T2 Ti CLK ADS# CY CLE 1 CYC LE 2 Address, AP VALID VALID CACHE# W/R# NA# BRDY# KEN# WB/WT# DATA, DP VALID IN VALID IN PCHK# IN VALID Cycle 1: 2-1-1-1 Burst Read Cycle IN VALID IN VALID VALID IN IN VALID Cycle 2: 2-1-1-1 Burst Read Cycle IN VALID VALID 1735200 3-32 ST6x86 Figure 3.7. Burst Cycle with Wait States Ti T1 T2 T2 T2 T2 T2 T2 T2 T2 Ti Ti CLK ADS # C YCLE 1 A ddre ss, AP VA LID CA CHE # W /R# BRDY # K EN# WB /WT # VALI D DATA , D P IN IN P CHK # VALID IN VA LID IN VALID VA LID Cycle 1: 3-2-2-2 Burst Re ad Cycle 1735400 Burst Cycle Address Sequence. The ST6x86 CPU provides two different address sequences for burst read cycles. The ST6x86 CPU burst cycle address sequence modes are referred to as "1+4" and "linear". After reset, the CPU default mode is "1+4". In "1+4" mode, the CPU performs a single transfer read cycle prior to the burst cycle, if the desired first address is (...xx8). During this single transfer read cycle, the CPU reads the critical data. In addition, the ST6x86 CPU samples the state of KEN#. 3-33 If KEN# is active, the CPU then performs the burst cycle with the address sequence shown in Table 3.14. The ST6x86 CPU CACHE# output is not asserted during the single read cycle prior to the burst. Therefore, CACHE# must not be used to qualify the KEN# input to the processor. In addition, if KEN# is returned active for the "1" read cycle in the "1+4", all data bytes supplied to the CPU must be valid. The CPU samples WB/WT# during the "1" read cycle, and does not resample WB/WT# during the following burst cycle. Figure 3.8 illustrates a "1+4" burst read cycle. ST6x86 Table 3.14. "1+4" Burst Address Sequence BURST CYCLE FIRST ADDRESS SINGLE READ CYCLE PRIOR TO BURST BURST CYCLE ADDRESS SEQUENCE 0 None 0-8-10-18 8 Address 8 0-8-10-18 10 None 10-18-0-8 18 Address 18 10-18-0-8 Figure 3.8. "1+4" Burst Read Cycle Ti T1 T2 T1 T2 T2 T2 T2 Ti Ti C LK ADS # C Y C LE 1 C YC LE 2 A d d ress , A P V A L ID (A 4 -A 0 = 0 8 h o r 1 8 h ) V A L ID (A 4 -A 0 = 0 0 h o r 1 0 h ) CA CHE# W /R # NA # BRD Y# KEN # K E N # m u s t b e a s s e r te d f o r b o t h c y c le s . W B /W T # DA TA, D P V A L ID IN IN P CH K# V A L ID C y c l e 1 : S i n g le t r a n s f e r r e a d IN IN V A L ID IN V A L ID V A L ID V A L ID C y c le 2 : 2 - 1 - 1 - 1 B u r s t R e a d C y c le 1 74 0 3 00 3-34 ST6x86 The address sequences for the ST6x86 CPU's linear burst mode are shown in Table 3.15. Operating the CPU in linear burst mode minimizes processor bus activity resulting in higher system performance. Linear burst mode can be enabled through the ST6x86 CPU CCR3 configuration register. Table 3.15. Linear Burst Address Sequence BURST CYCLE FIRST ADDRESS BURST CYCLE ADDRESS SEQUENCE 0 0-8-10-18 8 8-10-18-0 10 10-18-0-8 18 18-0-8-10 3-35 ST6x86 3.3.3.3 Burst Write Cycles Burst write cycles occur for line replacement and write-back cycles. Burst writes are similar to burst read cycles in that the CACHE# output is asserted and four 64-bit data transfers occur. Burst writes differ from burst reads in that the data and data parity lines are outputs rather than inputs. Also, KEN# and WB/WT# are not sampled during burst write cycles. Data and data parity for the first data transfer are driven valid during the second clock (T2 state) of the bus cycle. Once BRDY# is sampled asserted for the first data transfer, valid data and data parity for the second transfer are driven during the next clock cycle. The same timing relationship between BRDY# and data applies for the third and fourth data transfers as well. Wait states may be added to any transfer within a burst by delaying the assertion of BRDY# by the required number of clocks. As on burst read cycles, only the first address of a burst write cycle is driven on the external address bus. System logic must predict the remaining burst address sequence based on the first address. Burst write cycles always begin with a first address ending in 0 (signals A4-A0=0) and follow an ascending address sequence for the remaining transfers (0-8-10-18). Figure 3.9 illustrates two non-pipelined burst write cycles. The cycles shown are the fastest possible burst sequences (2-1-1-1). As shown, an idle clock always exists between two back-to-back burst write cycles. Therefore, the second burst write cycle in a pair of back-to-back burst writes is always issued as a non-pipelined cycle regardless of the state of the NA# input. Figure 3.9. Non-Pipelined Burst Write Cycles Ti T1 T2 T2 T2 T2 Ti* T1 T2 T2 T2 T2 Ti CLK ADS# Address, AP CYC LE 2 CYC LE 1 VALID (A4-A0 = 00h) VALID (A4-A0 = 00h) CACHE# W/R# NA# BRDY# DATA, DP OUT OUT OUT C yc le 1 : 2 - 1 - 1 - 1 B ur st Wr i t e C y cl e OUT OUT OUT Cycle 2: 2-1-1-1 Burst Write Cycle OUT OUT 1735300 *N o t e : T i st a t e a l w a y s e x is t s b e t we e n t w o b a c k- t o- b a c k b u rs t w r it e cy c l e s . 3-36 ST6x86 3.3.4 Pipelined Bus Cycles Pipelined addressing is a mode of operation where the CPU allows up to two outstanding bus cycles at any given time. Using pipelined addressing, the address of the first bus cycle is driven on the bus and while the CPU waits for the data for the first cycle, the address for a second bus cycle is issued. Pipelined bus cycles occur for all cycle types except locked cycles and burst write cycles. Pipelined cycles are initiated by asserting NA#. The CPU samples NA# at the end of each T2, T2P and Td state. KEN# and WB/WT# are sampled at either the same clock as NA# is active, or at the same clock as the first BRDY# for that cycle, whichever occurs first. The CPU issues the next address a minimum of two clocks after NA# is sampled asserted. The CPU latches the state of the NA# pin internally. Therefore, even if a new bus cycle is not pending internally at the time NA# was sampled asserted, the CPU still issues a pipelined bus cycle if an internal bus request occurs prior to completion of the current bus cycle. Once NA# is sampled asserted, the state of NA# is ignored until the current bus cycle completes. If two cycles are outstanding and the second cycle is a read, the CPU samples KEN# and WB/WT# for the second cycle when NA# is sampled asserted. Figure 3.10 and Figure 3.11 illustrate pipelined single transfer read cycles and pipelined burst read cycles, respectively. Figure 3.10. Pipelined Single Transfer Read Cycles Ti T1 T2 T2 T12 T2 T2 Ti CLK CPU enters idle bus state because ADS# Address, AP CYCLE 1 CYCLE 2 no bus cycle pending internally. VALID 1 VALID 2 CACHE# W/R# NA# BRDY# KE N # s am pl ed w hen N A # sam pl ed as s er t ed. KEN# DATA, DP IN 1 PCHK# VALID 1 C y cl e 1: N on- C ac heabl e, 2 W ai t S t at e R ea d 3-37 IN 2 C y cl e 2: P ot ent ial l y Ca c heable, P i pel i ned R ead C y c l e VALID 2 1735500 ST6x86 Figure 3.11. Pipelined Burst Read Cycles Ti T1 T2 T2 T12 T2P T2 T2 T2 T2 Ti Ti CLK ADS# Address, AP CYCLE 1 CYCLE 2 VALID 1 VALID 2 CACHE# W/R# NA# BRDY# KEN# WB/WT# DATA, DP VALID IN 1 PCHK# VALID IN 1 IN 1 VALID 1 Cycle 1: 2-1-1-1 Burst Read Cycle IN 1 VALID 1 IN 2 VALID 1 Cycle 2: Pipelined Burst Read Cycle IN 2 VALID 1 IN 2 VALID 2 IN 2 VALID 2 VALID 2 VALID 2 1741500 3-38 ST6x86 3.3.4.1 Pipelined Back-to-Back Read/Write Cycles Figure 3.12 depicts a read cycle followed by a pipelined write cycle. Under this condition, the data bus must change from an input for the read cycle to an output for the write cycle. In order to accomplish this transition without causing data bus contention, the CPU automatically inserts a "dead" (Td) clock cycle. During the Td state, the data bus floats. The CPU then drives the write data onto the bus in the following clock. The CPU also inserts a Td clock between a write cycle and a pipelined read cycle to allow the data bus to smoothly transition from an output to an input. Figure 3.12. Read Cycle Followed by Pipelined Write Cycle Ti T1 T2 T2 T12 T2P Td T2 Ti CLK ADS# Address, AP CYCLE 1 CYCLE 2 VALID 1 VALID 2 CACHE# W/R# NA# BRDY# KEN# DATA, DP IN 1 PCHK# IN 1 VALID 1 Cycle 1: 2-1-1-1 Burst Read 3-39 IN 1 IN 1 VALID 1 Cycle 2: Pipelined Write OUT 2 VALID 1 VALID 1 1735700 ST6x86 3.3.5 Interrupt Acknowledge Cycles The CPU issues interrupt acknowledge bus cycles in response to an active INTR input. Interrupt acknowledge cycles are single transfer cycles and always occur in locked pairs as shown in Figure 3.13. The CPU reads the interrupt vector from the lower eight bits of the data bus at the completion of the second interrupt acknowledge cycle. Parity is not checked during the first interrupt acknowledge cycle. M/IO#, D/C# and W/R# are always logic low during interrupt acknowledge cycles. Additionally, the address bus is driven with a value of 0000 0004h for the first interrupt acknowledge cycle and with a value of 0000 0000h for the second. A minimum of one idle clock always occurs between the two interrupt acknowledge cycles. Figure 3.13. Interrupt Acknowledge Cycles Ti T1 T2 Ti T1 T2 Ti Ti CLK Idle States = 1 CLK MIN ADS# Address CYCLE 1 CYCLE 2 0000 0004h 0000 0000h M/IO#, D/C#, W/R# LOCK# RDY#, BRDY# DATA IN IN VALID Interrupt Vector Read During Second Interrupt Acknowledge Cycle. 1735800 3-40 ST6x86 3.3.6 SMI# Interrupt Timing The CPU samples the System Management Interrupt (SMI#) input at each clock edge. At the next appropriate instruction boundary, the CPU recognizes the SMI# and completes all pending write cycles. The CPU then asserts SMIACT# and begins saving the SMM header information to the SMM address space. SMIACT# remains asserted until after execution of a RSM instruction. Figure 3.14 illustrates the functional timing of the SMIACT# signal. To facilitate using SMI# to power manage I/O peripherals, the ST6x86 CPU implements a feature called I/O trapping. If the current bus cycle is an I/O cycle and SMI# is asserted a minimum of three clocks prior to BRDY#, the CPU immediately begins execution of the SMI service routine following completion of the I/O instruction. No additional instructions are executed prior to entering the SMI service routine. I/O trap timing requirements are shown in Figure 3.15. Figure 3.14. SMIACT# Timing CLK ADS# Normal Access Normal Access Normal Access SMI Handler RDY# or BRDY# SMI# SMIACT# 1 CLK MIN 4 CLK MIN 1 CLK MIN 4 CLK MIN 1739900 3-41 ST6x86 Figure 3.15. SMM I/O Trap Timing T1 I/O C ycle (Read or W rite) T2 T2 T2 T2 T2 C LK Address, Byte Enables VAL ID AD S# BR D Y# SM I# 3 CL K Min. 17 36000 3.3.7 Cache Control Timing 3.3.7.1 Invalidating the Cache Using FLUSH# The FLUSH# input forces the CPU to write-back and invalidate the entire contents of the on-chip cache. FLUSH# is sampled at each clock edge, latched internally and then recognized internally at the next instruction boundary. Once FLUSH# is recognized, the CPU issues a series of burst write cycles to write-back any "modified" cache lines. The cache lines are invalidated as they are written back. Following completion of the write-back cycles, the CPU issues a flush acknowledge special bus cycle. The latency between when FLUSH# occurs and when the cache invalidation actually completes varies depending on: (1) the state of the processor when FLUSH# is asserted, (2) the number of modified cache lines, (3) the number of wait states inserted during the write-back cycles. Figure 3.16 illustrates the sequence of events that occur on the bus in response to a FLUSH# request. 3-42 ST6x86 Figure 3.16. Cache Invalidation Using FLUSH# CLK ADS# BRDY# Address Write-Back Cycle 0000 0004h FLUSH# Wait for Processor to Complete Current Instruction 3-43 Write-Back of all Modified Lines in Internal Cache Flush Acknowledge Special Cycle 1736100 ST6x86 3.3.7.2 EWBE# Timing During memory and I/O write cycles, the ST6x86 CPU samples the external write buffer empty (EWBE#) input. If EWBE# is negated, the CPU does not write any data to "exclusive" or "modified" internal cache lines. After sampling EWBE# negated, the CPU continues to sample EWBE# at each clock edge until it asserts. Once EWBE# is asserted, all internal cache writes are allowed. Through use of this signal, the external system may enforce strong write ordering when external write buffers are used. EWBE# functional timing is shown in Figure 3.17. Figure 3.17. External Write Buffer Empty (EWBE#) Timing T1 T2 CLK ADS# W /R # DA TA OU T E W B E# BRDY# W rite Cycle: EW BE # sampled w ith each BRDY#. N o w rite s to E or M -Sta te lines tha t h it in the inte rna l cach e. EW B E# sam pled a t e ach clo ck edge . W rites to E or M-S tate lines that hit in the internal cache can complete. 1737800 3-44 Functional Timing ST6x86 3.3.8 Bus Arbitration 3.3.8.1 HOLD and HLDA An external bus master can take control of the CPU's bus using either the HOLD/HLDA handshake signals or the back-off (BOFF#) input. Both mechanisms force the ST6x86 CPU to enter the bus hold state. Using the HOLD/HLDA handshake, an external bus master requests control of the CPU's bus by asserting the HOLD signal. In response to an active HOLD signal, the CPU completes all outstanding bus cycles, enters the bus hold state by floating the bus, and asserts the HLDA output. The CPU remains in the bus hold state until HOLD is negated. Figures 3-18, 3-19 (Page 3-47) and 3-20 (Page 3-48) illustrate the timing associated with requesting HOLD during an idle bus, during a non-pipelined bus cycle and during a pipelined bus cycle, respectively. Figure 3.18. Requesting Hold from an Idle Bus Ti Ti Ti Ti Ti T1 T2 CLK ADS# VALID Address HOLD HLDA Min One Clock 3-45 MIN Zero Clocks 1736200 Functional Timing ST6x86 Figure 3.19. Requesting Hold During a Non-Pipelined Bus Cycle T1 T2 T2 Ti Ti Ti CLK ADS# Address VALID RDY# HOLD HLDA 1736300 3-46 Functional Timing ST6x86 Figure 3.20. Requesting Hold During a Pipelined Bus Cycle Ti T1 T2 T2 T12 T2 T2 Ti Ti Ti CLK ADS# Address, AP CYCLE 1 CYCLE 2 VALID 1 VALID 2 NA# BRDY# DATA, DP IN 1 IN 2 HOLD HLDA 1736400 3-47 Functional Timing ST6x86 3.3.8.2 Back-Off Timing An external bus master requests immediate control of the CPU's bus by asserting the back-off (BOFF#) input. The CPU samples BOFF# at each clock edge and responds by floating the bus in the next clock cycle as shown in Figure 3.21. The CPU remains in the bus hold state until BOFF# is negated. If the assertion of BOFF# interrupts a bus cycle, the bus cycle is restarted in its entirety following the negation of BOFF#. If KEN# was sampled by the processor before the cycle was aborted, it must be returned with the same value during the restarted cycle. The state of WB/WT# may be changed during the restarted cycle. If BOFF# and BRDY# are active at the same clock edge, the CPU ignores BRDY#. Any data returned to the CPU with the BRDY# is also ignored. If BOFF# interrupts a burst read cycle, the CPU does not cache any data returned prior to BOFF#. However, this data may be used for internal CPU execution. Figure 3.21. Back-Off Timing 3-48 Functional Timing ST6x86 3.3.9 Cache Inquiry Cycles Cache inquiry cycles are issued by the system with the CPU in either a bus hold or address hold state. Bus hold is requested by asserting either HOLD or BOFF#, and address hold is requested by asserting AHOLD. The system initiates the cache inquiry cycle by asserting the EADS# input. The system must also drive the desired inquiry address on the address lines, and a valid state on the INV input. In response to the cache inquiry cycle, the CPU checks to see if the specified address is present in the internal cache. If the address is present in the cache, the CPU checks the MESI state of the cache line. If the line is in the "exclusive" or "shared" state, the CPU asserts the HIT# output and changes the cache line state to "invalid" if the INV input was sampled logic high with EADS#. 3-49 If the line is in the "modified" state, the CPU asserts both HIT# and HITM#. The CPU then issues a bus cycle request to write the modified cache line to external memory. HITM# remains asserted until the write-back bus cycle completes. No additional cache inquiry cycles are accepted while HITM# is asserted. Writeback cycles always start at burst address 0. Once the write-back cycle has completed, the CPU changes the cache line state to "invalid" if the INV input was sampled logic high, or "shared" if the INV input was sampled low. In addition to checking the cache, the CPU also snoops the internal line fill and cache write-back buffers in response to a cache inquiry cycle. The following sections describe the functional timing for cache inquiry cycles and the corresponding write-back cycles for the various types of inquiry cycles. Functional Timing ST6x86 3.3.9.1 Inquiry Cycles Using HOLD/HLDA Figure 3.22 illustrates an inquiry cycle where HOLD is used to force the CPU into a bus hold state. In this case, the system asserts HOLD and must wait for the CPU to respond with HLDA before issuing the cache inquiry cycle. To avoid address bus contention, EADS# should not be asserted until the second clock after HLDA as shown in the diagram. If the inquiry address hits on a modified cache line, HIT# and HITM# are asserted during the second clock following EADS#. Once HITM# asserts, the system must negate HOLD to allow the CPU to run the corresponding write-back cycle. The first cycle issued following negation of HLDA is the write-back bus cycle. Figure 3.22. HOLD Inquiry Cycle that Hits on a Modified Line T2 Ti Ti Ti Ti Ti Ti Ti Ti T1 T2 T2 T2 T2 Ti Ti CLK ADS# Address FromCPU To CPU Write-Back Cycle BRDY# HOLD HLDA EADS# INV VALID HIT# HITM# 1736600 3-50 Functional Timing ST6x86 3.3.9.2 Inquiry Cycles Using BOFF# Figure 3.23 illustrates an inquiry cycle where BOFF# is used to force the CPU into a bus hold state. In this case, the system asserts BOFF# and the CPU immediately relinquishes control of the bus in the next clock. To avoid address bus contention, EADS# should not be asserted until the second clock edge after BOFF# as shown in the diagram. If the inquiry address hits on a modified cache line, HIT# and HITM# are asserted during the second clock following EADS#. Once HITM# asserts, the system must negate BOFF# to allow the CPU to run the corresponding write-back cycle. The first cycle issued following negation of BOFF# is the write-back bus cycle. Figure 3.23. BOFF# Inquiry Cycle that Hits on a Modified Line T1 Ti Ti Ti Ti Ti T1 T2 T2 T2 T2 Ti Ti T1 T2 CLK ADS# Address From CPU To CPU Write-Back Cycle Cycle 1 (Restarted ) BRDY# BOFF# EADS# INV VALID HIT# HITM# 1736700 3-51 Functional Timing ST6x86 3.3.9.3 Inquiry Cycles Using AHOLD Figure 3.24 illustrates an inquiry cycle where AHOLD is used to force the CPU into an address hold state. In this case, the system asserts AHOLD and the CPU immediately floats the address bus in the next clock. To avoid address bus contention, EADS# should not be asserted until the second clock edge after AHOLD as shown in the diagram. If the inquiry address hits on a modified cache line, the CPU asserts HIT# and HITM# during the second clock following EADS#. The CPU then issues the write-back cycle even if AHOLD remains asserted. ADS# for the write-back cycle asserts two clocks after HITM# is asserted. To prevent the address bus and data bus from switching simultaneously, the system must adhere to the restrictions on negation of AHOLD as shown in Figure 3.24. Figure 3.24. AHOLD Inquiry Cycle that Hits on a Modified Line T1 T2 Ti Ti Ti Ti Ti T1 T2 T2 T2 T2 Ti T2 Ti CLK ADS# Address From CPU To CPU Write-Bac k Cycle BRDY# OUT Data, D P OUT OUT OUT AHOLD EADS# INV VALID HIT# HITM# 1736800 Re s t r i c t i o n s o n n e g a t i n g A H O L D: 1 . D u r i n g a w r i t e c y c l e , AH O L D s h o u l d n o t b e n e ga t e d i n t h e s a m e c l o c k t h a t B RD Y # i s a s s e r t e d . 2 . D u r i n g p i p e l i n ed b u s c y c l e s , A HO L D s h o u l d n o t b e n e g a t e d d u r i n g t h e T d c l o c k b e t we e n a r e a d c y c l e f o l l o w e d b y a p i p e l i n ed wr i t e c y c l e . 3 . W h i l e HI T M # is a s s e r t e d , A HO L D s h o u l d n o t b e n e g a t ed i n t h e s a m e c l o c k t h a t A D S # i s a s s e r t e d . 3-52 Functional Timing ST6x86 Figure 3.25 depicts an AHOLD inquiry cycle during a line fill. In this case, the write-back cycle occurs after the line fill is completed. At least one idle clock exists between the final BRDY# of the line fill and the ADS# for the write-back cycle. If the inquiry cycle hits on the address of the line fill that is in progress, the data from the line fill cycle is always used to complete the pending internal operation. However, the data is not placed in the cache if INV is sampled asserted with EADS#. The data is placed in the cache in a "shared" state if INV is sampled negated. Figure 3.25. AHOLD Inquiry Cycle During a Line Fill T1 T2 T2 T2 T2 T2 Ti T2 T1 T2 T2 T2 T2 Ti Ti CLK Write-Back Cycle Line Fill ADS# Address From CPU To CPU BRDY# Data, DP IN IN IN IN OUT OUT OUT OUT AHOLD EADS# INV VALID HIT# HITM# Note: If the inquiry cycle hits on the line fill in progress, the data from the line fill will be used to complete the pending internal operation. The line is not placed in the cache if INV is sampled asserted with EADS#. The line is placed in the cache in a "shared" 1736900 state if INV is sampled negated with EADS#. 3-53 Functional Timing ST6x86 During cache inquiry cycles, the CPU performs address parity checking using A31-A5 and the AP signal. The CPU checks for even parity and asserts the APCHK# output if a parity error is detected. Figure 3.26 illustrates the functional timing of the APCHK# output. Figure 3.26. APCHK# Timing Tx Tx Tx Tx Tx CLK EADS# Address To CPU AP To CPU APCHK# VALID 1737000 3-54 Functional Timing ST6x86 3.3.10 Scatter/Gather Buffer Interface The scatter/gather buffer interface signals, in conjunction with the byte enables and address hold, can be used by the system hardware to transfer data to/from a 32-bit peripheral interface bus. A 64-bit buffer resides in the CPU to assist the system in these transfers. As shown in Figure 3.27 when BHOLD is asserted the CPU floats the byte enable outputs (BE7#-BE0#) in the next clock. While BHOLD is asserted, only the byte enables are disabled. The current bus cycle remains active and can be completed in the normal fashion. The CPU continues to generate additional bus cycles while BHOLD is asserted, so BHOLD should only be asserted while AHOLD is asserted. Figure 3.27 also illustrates DHOLD timing. DHOLD forces the CPU to float the data and data parity buses in the next clock. While DHOLD is asserted, the current bus cycle remains active and additional bus cycles may be generated by the CPU. Figure 3.27. BHOLD and DHOLD Timing CLK DHOLD D63-D0 D1 D1 BHOLD BE7#-BE0# BEx BEx 1737100 3-55 Functional Timing ST6x86 Figure 3.28 and Figure 3.29 illustrate CPU read and write cycles that access a 32-bit device using the scatter/gather buffer. Figure 3.28. CPU Upper Byte Read from 32-Bit Bus Using Scatter/Gather CLK ADS# LBA# D63-D32 D1 (to S/G Buffer)) D2 D31-D0 D1 D1 BE7#-BE0# BE# = 0xBF From CPU BE# = 0xBF To CPU BE# From CPU BHOLD BRDY# Controller detects CPU read of upper byte to 32-bit peripheral bus via LBA# and BE#s. BHOLD is asserted in order to issue the MUX command via the BE#s (BE# = 0xBFh). The clock following BE# = 0xBFh, the CPU maps D31-D0 to D63-D32 of the scatter/gather buffer to read byte 6. 1737200 3-56 Functional Timing ST6x86 Figure 3.29. CPU Upper Byte Write to 32-Bit Bus Using Scatter/Gather CLK ADS# LBA# D63-D32 D2 D2 D31-D0 D2 (from S/G Buffer) D1 BE7#-BE0# BE# = 0xBF From CPU BE# From CPU BHOLD BRDY# During the clock following BE# = 0xBFh, Controller detects CPU write of BHOLD need not be asserted the CPU maps D63-D32 to D31-D0 upper byte to 32-bit peripheral bus because the CPU automatically via LBA# and BE#s. maps D63-D32 to D31-D0 when for transfer on 32-bit bus. 1737300 LBA# asserted and BE3-BE0 = Fh. 3-57 Functional Timing ST6x86 Figures 3.30 and 3.31 illustrate bus master reads and writes between a 32-bit device and 64-bit main memory. The CPU bus must be idle when a bus master initiates a scatter/gather cycle. Figure 3.30. Bus Master Read from 64-Bit Memory to 32-Bit Bus CLK D63-D32 D2 from Memory D31-D0 D1 from Memory BE7#-BE0# BE#=0x00 BE#=0xFF D2 D2 from S/G Buffer BE#=0x0F D1 from S/G Buffer BE#=0xF0 BHOLD DHOLD QDUMP# Controller asserts BHOLD and DHOLD BE#=0x00 causes the 64-bit data from memory to be written into CPU's buffer. to transfer data from memory to CPU's internal scatter/gather buffer. The controller negates BE# (BE=0xFF) so that data in the scatter/gathe r buffer is not corrupted and tristates the data bus to allow for a scatter operation to proceed. The controller negates DHOLD and asserts BE#=0x0F followed by 0xF0 along with QDUMP# to transfer the upper word (D2=D63-D32) followed by the lower word (D1=D31-D0), 1737400 respectively, to the 32-bit bus. 3-58 Functional Timing ST6x86 Figure 3.31. Bus Master Write to 64-Bit Memory from 32-Bit Bus CLK D32-D63 D2 (to Memory) D0-D31 BE0-BE7 D2 (to S/G Buffer) BE#=0x0F BE#=0xF0 D1 (toS/G Buffer) BE#=FFh D1 (to Memory) BE#=0x00 BHOLD DHOLD QDUMP# Controller asserts BHOLD and DHOLD to transfer data from the 32-bit bus to CPU's internal scatter/gat her buffer. 3-59 The MUX command along with a word write is issued by the controller to write D1 from the 32-bit bus into D63-D32 of CPU's buffer followed by a 2nd word write to D31-D0. The controller relinquishes control of CPU data bus, negates DHOLD and asserts QDUMP# to dump the 64-bit data on to the CPU local bus for transfer to memory. 1737500 Functional Timing ST6x86 3.3.11 Power Management Interface SUSP# Initiated Suspend Mode The ST6x86 CPU enters suspend mode when the SUSP# input is asserted and execution of the current instruction, any pending decoded instructions and associated bus cycles are completed. A stop grant bus cycle is then issued and the SUSPA# output is asserted. The CPU responds to SUSP# and asserts SUSPA# only if the SUSP bit is set in the CCR2 configuration register. SUSP# is sampled (Figure 3.32) on the rising edge of CLK. SUSP# must meet specified setup and hold times to be recognized at a particular CLK edge. The time from assertion of SUSP# to activation of SUSPA# varies depending on which instructions were decoded prior to assertion of SUSP#. The minimum time from SUSP# sampled active to SUSPA# asserted is eight CLKs. As a maximum, the CPU may execute up to two instructions and associated bus cycles prior to asserting SUSPA#. The time required for the CPU to deactivate SUSPA# once SUSP# has been sampled inactive is five CLKs. If the CPU is in a hold acknowledge state and SUSP# is asserted, the CPU may or may not enter suspend mode depending on the state of the CPU internal execution pipeline. If the CPU is in a SUSP# initiated suspend mode, one occurrence of NMI, INTR and SMI# is stored for execution once suspend mode is exited. The ST6x86 CPU also recognizes and acknowledges the HOLD, AHOLD, BOFF# and FLUSH# signals while in suspend mode. Figure 3.32. SUSP# Initiated Suspend Mode Tx Tx Ti Ti Ti Ti Tx CLK SUSP# 8 CLKs 5 CLKs SUSPA# 1737600 3-60 Functional Timing ST6x86 HALT Initiated Suspend Mode The CPU also enters suspend mode as a result of executing a HALT instruction if the HALT bit in CCR2 is set. The SUSPA# output is asserted no later than 40 CLKs following BRDY# sampled active for the HALT bus cycle as shown in Figure 3.33. Suspend mode is then exited upon recognition of an NMI, an unmasked INTR or an SMI#. SUSPA# is deactivated 10 CLKs after sampling of an active interrupt. Figure 3.33. HALT Initiated Suspend Mode Non-Pipelined HALT T1 T2 Ti Ti Ti Ti Ti Ti CLK ADS# M/IO#, BE(0, 1, 3-7)#, W/R# A3-A31, BE#2, D/C#, IO# BRDY# 10 CLKs INTR, NMI 40 CLKs (Max) SUSPA# 1737700 3-61 ST6x86 Stopping the Input Clock Once the CPU has entered suspend mode, the input clock (CLK) can be stopped and restarted without loss of any internal CPU data. The CLK input can be stopped at either a logic high or logic low state. The CPU remains suspended until CLK is restarted and suspend mode is exited as described earlier. While the CLK is stopped, the CPU can no longer sample and respond to any input stimulus. Figure 3.34 illustrates the recommended sequence for stopping the CLK using SUSP# to initiate suspend mode. CLK may be started prior to or following negation of the SUSP# input. The system must allow sufficient time for the CPU's internal PLL to lock to the desired frequency before exiting suspend mode. Figure 3.34. Stopping CLK During Suspend Mode Tx Tx Tx Tx CLK SUSP# SUSPA# 1731901 3-62 ST6x86 4.0 ELECTRICAL SPECIFICATIONS 4.1.2 Pull-Up/Pull-Down Resistors 4.1 Electrical Connections Table 4.1 lists the input pins that are internally connected to pull-up and pull-down resistors. The pull-up resistors are connected to VCC and the pull-down resistors are connected to VSS. When unused, these inputs do not require connection to external pull-up or pull-down resistors. The SUSP# pin is unique in that it is connected to a pull-up resistor only when SUSP# is not asserted. This section provides information on electrical connections, absolute maximum ratings, recommended operating conditions, DC characteristics, and AC characteristics. All voltage values in Electrical Specifications are measured with respect to VSS unless otherwise noted. 4.1.1 Power and Ground Connections and Decoupling Testing and operating the ST6x86 CPU requires the use of standard high frequency techniques to reduce parasitic effects. The high clock frequencies used in the ST6x86 CPU and its output buffer circuits can cause transient power surges when several output buffers switch output levels simultaneously. These effects can be minimized by filtering the DC power leads with low-inductance decoupling capacitors, using low impedance wiring, and by utilizing all of the VCC and GND pins. The ST6x86 CPU contains 296 pins with 53 pins connected to VCC and 53 connected to VSS (ground). Table 4.1. Pins Connected to Internal Pull-Up and Pull Down Resistors SIGNAL PIN NO. RESISTOR BRDYC# Y3 20-k pull-up CLKMUL Y33 20-k pull-down QDUMP# AL7 SMI# AB34 20-k pull-up SUSP# Y34 TCK M34 TDI N35 TMS P34 TRST# Q33 Reserved J33 Reserved W35 Reserved Y35 Reserved AN35 20-k pull-up (see text) 20-k pull-up 20-k pull-down 4-1 Absolute Maximum Ratings ST6x86 4.1.3 4.2 Absolute Maximum Ratings Unused Input Pins All inputs not used by the system designer and not listed in Table 4.1 should be connected either to ground or to VCC. Connect active-high inputs to ground through a 20 k ( 10%) pull-down resistor and active-low inputs to VCC through a 20 k ( 10%) pull-up resistor to prevent possible spurious operation. 4.1.4 NC and Reserved Pins Pins designated NC have no internal connections. Pins designated RESV or RESERVED should be left disconnected. Connecting a reserved pin to a pull-up resistor, pull-down resistor, or an active signal could cause unexpected results and possible circuit malfunctions. The following table lists absolute maximum ratings for the ST6x86 CPU microprocessors. Stresses beyond those listed under Table 4.2 limits may cause permanent damage to the device. These are stress ratings only and do not imply that operation under any conditions other than those listed under "Recommended Operating Conditions" Table 4.3 is possible. Exposure to conditions beyond Table 4.2 may (1) reduce device reliability and (2) result in premature failure even when there is no immediately apparent sign of failure. Prolonged exposure to conditions at or near the absolute maximum ratings may also result in reduced useful life and reliability. Table 4.2. Absolute Maximum Ratings PARAMETER Operating Case Temperature Storage Temperature Supply Voltage, VCC Voltage On Any Pin Input Clamp Current, IIK Output Clamp Current, IOK 4-2 MIN MAX UNITS -65 -66 -0.5 -0.5 110 150 4.0 VCC +0.5 10 25 C C V V mA mA NOTES Power Applied Power Applied Power Applied Recommended Operating Conditions ST6x86 4.3 Recommended Operating Conditions Table 4.3 presents the recommended operating conditions for the ST6x86 CPU device. Table 4.3. Recommended Operating Conditions PARAMETER MIN MAX UNITS 0 70 C 3.15 3.7 V VIH High-Level Input Voltage 2.0 5.5 V VIL Low-Level Input Voltage -0.3 0.8 V IOH High-Level Output Current All outputs except A20-A3 and W/R# A20-A3 and W/R# -1.0 -2.0 mA VO=VOH(MIN) IOL Low-Level Output Current All outputs except A20-A3 and W/R# A20-A3 and W/R# 5.0 10.0 mA VO=VOL(MAX} TC Operating Case Temperature VCC Supply Voltage 4-3 NOTES Power Applied DC Characteristics ST6x86 4.4 DC Characteristics Table 4.4. DC Characteristics (at Recommended Operating Conditions) PARAMETER VOL TYP Output Low Voltage IOL = 5 mA VOH Output High Voltage IOH = -1 mA II MIN MAX UNITS 0.4 V 2.4 NOTES V Input Leakage Current For all pins except those listed in Table 4-1. 15 A 0 < VIN < VCC Input Leakage Current For all pins with internal pull-downs. 200 A VIH = 2.4 V See Table 4-1. IIL Input Leakage Current For all pins with internal pull-ups. -400 A VIL = 0.45 V See Table 4-1. ICC Active ICC 80 MHz 100 MHz 110 MHz 120 MHz 133 MHz 3.9 4.5 4.8 5.1 5.5 4.7 5.4 5.8 6.1 6.6 A ICCSM Suspend Mode ICC 80 MHz 100 MHz 110 MHz 120 MHz 133 MHz 43 48 50 51 54 75 80 83 105 115 ICCSS Standby ICC 0 MHz (Suspended/CLK Stopped) 35 75 mA Note 4,5 CINInput Capacitance 15 pF f = 1 MHz, Note 2 COUTOutput Capacitance 20 pF f = 1 MHz, Note 2 CIOI/O Capacitance 25 pF f = 1 MHz, Note 2 CCLKCLK Capacitance 15 pF f = 1 MHz, Note 2 IIH mA Note 1, 5 Note 1, 3, 5 Notes: 1. Frequency (MHz) ratings refer to the internal clock frequency. 2. Not 100% tested. 3. All inputs at 0.4 or VCC - 0.4 (CMOS levels). All inputs held static except clock and all outputs unloaded (static IOUT = 0 mA). 4. All inputs at 0.4 or VCC - 0.4 (CMOS levels). All inputs held static and all outputs unloaded (static IOUT = 0 mA). 5. Typical, measured at VCC = 3.3 V 4-4 AC Characteristics ST6x86 4.5 AC Characteristics Tables 4-6 through 4-11 list the AC characteristics including output delays, input setup requirements, input hold requirements and output float delays. These measurements are based on the measurement points identified in Figure 4.1 and Figure 4.2. The rising clock edge reference level VREF, and other reference levels are shown in Table 4.5. Input or output signals must cross these levels during testing. Figure 4.1 shows output delay (A and B) and input setup and hold times (C and D). Input setup and hold times (C and D) are specified minimums, defining the smallest acceptable sampling window a synchronous input signal must be stable for correct operation. 4-5 AC Characteristics ST6x86 Figure 4.1. Drive Level and Measurement Points for Switching Characteristics Tx VI H D VREF CL K : VREF VI L D A B OUTPUTS: MAX MIN Valid Output n VREF VREF C VIHD INPUTS : Valid Output n+1 D Valid VREF Input VR EF VILD LE GEND: A - Maximum Output Delay Specification B - Minimum Output Delay Specification C - Minimum Input Setup Specification 1709406 D - Minimum Input Hold Specification Table 4.5. Drive Level and Measurement Points for Switching Characteristics SYMBOL VOLTAGE (Volts) VREF 1.5 VIHD 2.3 VILD 0 Note: Refer to Figure 4-1. 4-6 AC Characteristics ST6x86 Table 4.6. Clock Specifications Tcase = 0 C to 70 C, See Figure 4.2 40-MHz BUS 50-MHz BUS 55-MHz BUS 60-MHz BUS 66-MHz BUS SYMBOL PARAMETER UNITS MIN CLK Frequency MAX MIN 40 MAX MIN 50 25 MAX MIN 55 20 MIN 60 CLK Period Stability T3 CLK High Time T4 CLK Low Time 9 T5 CLK Fall Time 0.15 2 0.15 2 0.15 1.5 0.15 1.5 0.15 1.5 ns T6 CLK Rise Time 0.15 2 0.15 2 0.15 1.5 0.15 1.5 0.15 1.5 ns 9 7 4.0 7 250 4.0 4.0 15.0 MHz CLK Period 250 33.33 66.6 T2 250 16.67 MAX T1 250 18 MAX 30.0 ns 250 ps 4.0 4.0 ns 4.0 ns Figure 4.2. CLK Timing and Measu r em en t Points T1 T3 V IH(MIN) V REF V IL(MAX) CLK T6 T4 T5 1740502 4-7 ST6x86 Table 4.7. Output Valid Delays C L=50 pF, Tcase = 0 C to 70 C, See Figure 4.3 40-MHz BUS 50-MHz BUS 55-MHz BUS 60-MHz BUS 66-MHz BUS SYMBOL PARAMETER UNITS MIN MAX MIN MAX MIN MAX MIN MAX MIN MAX T7 A31-A3,BE7#-BE0#, CACHE#, D/C#, LBA#, LOCK#,PCD,PWT, SCYC, SMIACT#, W/R# 3 14 1.0 12 1.0 7.0 1.0 8.0 1.0 7.0 ns T7b ADS#, M/IO# 3 14 1.0 12 1.0 7.5 1.0 7.5 1.0 6.0 ns T8 ADSC# 3 14 1.0 12 1.0 7.0 1.0 8.0 1.0 7.0 ns T9 AP 3 14 1.0 12 1.0 8.5 1.0 8.5 1.0 8.5 ns T10 APCHK#,PCHK#, FERR# 3 16 1.0 14 1.0 8.3 1.0 7.0 1.0 7.0 ns T11 D63-D0,DP7-DP0 (Write) 3 14 1.3 12 1.3 9.0 1.3 9.0 1.3 7.5 ns T12a HIT# 3 14 1.0 12 1.0 8.0 1.0 8.0 1.0 8.0 ns T12b HITM# 3 14 1.1 12 1.1 7.0 1.1 7.0 1.1 6.0 ns T13 BREQ, HLDA 3 14 1.0 12 1.0 8.0 1.0 8.0 1.0 8.0 ns T14 SUSPA# 3 16 1.0 14 1.0 8.0 1.0 8.0 1.0 8.0 ns Figure 4.3. Output Valid Delay Timing Tx Tx Tx Tx CLK MIN MAX T7 - T14 OUTPUTS VALID n VALID n+1 1740900 4-8 ST6x86 Table 4.8. Output Float Delays C L=50 pFcase = 0 C to 70 C, See Figure 4.4 40-MHz BUS 50-MHz BUS 55-MHz BUS 60-MHz BUS 66-MHz BUS SYMBOL PARAMETER UNITS MIN MAX MIN MAX MIN MAX MIN MAX MIN MAX T15 A31-A3,ADS#, BE7#-BE0#, BREQ,CACHE#, D/C#,LBA#, LOCK#,M/IO#, PCD,PWT, SCYC,SMIACT#, W/R# 19 16 10.0 10.0 10.0 ns T16 AP 19 16 10.0 10.0 10.0 ns T17 D63-D0,DP7-DP0 (Write) 19 16 10.0 10.0 10.0 ns Figure 4.4. Output Float Delay Timing Tx Tx Tx Tx CLK T15 - T 17 OUTPUTS MIN MAX VALID 1741000 4-9 ST6x86 Table 4.9. Input Setup Times Tcase = 0 C to 70 C, See Figure 4.5 SYMBOL PARAMETER 40-MHz BUS 50-MHz BUS 55-MHz BUS 60-MHz BUS 66-MHz BUS MIN MIN MIN MIN MIN UNITS T18 A20M#, FLUSH#, IGNNE#, SUSP# 5.0 5.0 5.0 5.0 5.0 ns T19 AHOLD, BHOLD, DHOLD, HOLD 5.0 5.0 5.0 5.0 5.0 ns T20 BRDY# 5.0 5.0 5.0 5.0 5.0 ns T21 BRDYC# 5.0 5.0 5.0 5.0 5.0 ns T22 A31-A3, AP, BE7#-BE0#, 5.0 5.0 5.0 5.0 5.0 ns T22a D63-D0(Read),DP7-DP0 (Read) 3.8 3.8 3.8 3.0 3.0 ns T23 EADS#, INV 5.0 5.0 5.0 5.0 5.0 ns T24 INTR, NMI, RESET, SMI#, WM_RST 5.0 5.0 5.0 5.0 5.0 ns T25 EWBE#,KEN#,NA#, WB/WT# 5.0 5.0 4.5 4.5 4.5 ns T26 QDUMP# 5.0 5.0 5.0 5.0 5.0 ns 40-MHz BUS 50-MHz BUS 55-MHz BUS 60-MHz BUS 66-MHz BUS MIN MIN MIN MIN MIN BOFF#, Table 4.10. Input Hold Times Tcase = 0 Cto 70 C, See Figure 4.5 SYMBOL 4-10 PARAMETER UNITS T27 A20M#,FLUSH#,IGNNE#, SUSP# 3.0 2.0 1.0 1.0 1.0 ns T28 AHOLD,BHOLD,BOFF#, DHOLD, HOLD 3.0 2.0 1.0 1.0 1.0 ns T29 BRDY# 3.0 2.0 1.0 1.0 1.0 ns T30 BRDYC# 3.0 2.0 1.0 1.0 1.0 ns T31a A31-A3,AP, BE7#-BE0# 3.0 2.0 1.0 1.0 1.0 ns T31b D63-D0,DP7-DP0(Read) 3.0 2.0 2.0 2.0 2.0 ns T32 EADS#, INV 3.0 2.0 1.0 1.0 1.0 ns T33 INTR,NMI,RESET,SMI#, WM_RST 3.0 2.0 1.0 1.0 1.0 ns T34 EWBE#,KEN#,NA#,WB/WT# 3.0 2.0 1.0 1.0 1.0 ns T35 QDUMP# 3.0 2.0 1.0 1.0 1.0 ns ST6x86 Figure 4.5. Input Setup and Hold Timing Tx Tx Tx Tx CLK T18 - T26 SETUP T27 - T35 HOLD 1740600 4-11 ST6x86 Table 4.11. JTAG AC Specifications ALL BUS FREQUENCIES SYMBOL PARAMETER UNITS MIN TCK Frequency (MHz) FIGURE MAX 20 ns T36 TCK Period 50 ns 4-6 T37 TCK High Time 25 ns 4-6 T38 TCK Low Time 25 ns 4-6 T39 TCK Rise Time 5 ns 4-6 T40 TCK Fall Time 5 ns 4-6 T41 TDO Valid Delay 3 20 ns 4-7 T42 Non-test Outputs Valid Delay 3 20 ns 4-7 T43 TDO Float Delay 25 ns 4-7 T44 Non-test Outputs Float Delay 25 ns 4-7 T45 TRST# Pulse Width 40 ns 4-8 T46 TDI, TMS Setup Time 20 ns 4-7 T47 Non-test Inputs Setup Time 20 ns 4-7 T48 TDI, TMS Hold Time 13 ns 4-7 T49 Non-test Inputs Hold Time 13 ns 4-7 Figure 4.6. TCK Timing and M easu re men t Points T36 T37 VIH VREF VIL TCK T39 T38 T40 1741102 4-12 ST6x86 Figure 4.7. JTAG Test Timings 1.5 V TC K T46 T48 TDI TMS T41 T43 T42 T44 TDO OU TPU T SIGNALS T47 T49 INPUT SIGNALS 1740400 Figure 4.8. Test Reset Timing T45 TRST# 1.5 V 1741200 4-13 ST6x86 5.0 MECHANICAL SPECIFICATIONS are listed by signal name in Table 5.1 (Page 2) and by pin number in Table 5.2 (Page 3). Dimensions are shown in Table 5.2 (Page 3) and Table 5.3 (Page 5). 5.1 296-Pin SPGA Package The pin assignments for the 6x86 CPUin a 296-pin SPGA package are shown in Figure 5.1. The pins Figure 5.1. 296-Pin SPGA Package Pin Assignments 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 AN VSS AM AL VSS VSS AE AC AA Z Y U S Q VCC VCC L J VCC VCC VCC E VCC A 6x86 CPU TOP VIEW MI/O# RESV NC D60 RESV D3 D6 DP5 D7 D8 D10 D11 D12 D14 D13 D15 D42 DP1 D17 D16 D18 D20 D22 D19 D23 D21 D26 D24 VSS VCC DP2 VSS VCC D28 D25 VSS VCC D30 D27 VSS VCC DP3 D29 VSS VCC D33 D31 VSS VCC D35 D37 D32 VSS VCC D34 VSS VCC D39 D36 VSS VCC D46 D40 D44 D38 VSS VCC DP4 VSS VCC 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 VCC 8 7 D52 D43 6 5 J H VCC G D54 E F D NC C B NC 3 L K A NC 4 N M D50 D47 D41 P DP6 D48 VSS R VCC D55 D45 Q VSS D51 D49 VCC VCC D57 D53 D5 T S VSS D56 U VCC VCC D61 D58 V VCC VSS D59 NC X VSS D63 D62 VCC W VSS RESV DP7 VCC VSS RESV FERR# Y VSS CACHE# LBA# VCC VSS AHOLD INV D0 DP0 NC BRDY# AA Z VSS BRDYC# AC AB VCC KEN# EWBE# TDO D1 D9 B NA# AE AD VCC VSS BOFF# AG AF VSS RESV AJ AH VCC WB/WT# RESV TCK D4 BREQ VSS HOL D TMS D2 AK VSS NC TRST# VSS AL NC NC NC VCC NC AM AP HLDA APCHK# BHOLD VSS PWT PCHK# VCC VSS AN NC PCD SMIACT# VCC SUSP# TDI 1 NC D/C# RESV VSS G F C HIT# NC NC VCC D A20M# 2 EADS# ADSC# BE0# QDUMP# HITM# BE1# DHOLD RESV VSS H BE2# BE3# 3 NC W/R# SMI# VSS VCC K BE4# BE5# FLUSH# VSS NMI VSS M BE6# BE7# VCC VSS ADS# RESV SUSPA# VSS P N SCYC CLK VCC VSS 4 INTR VSS R RESV RESET VCC VSS RESV CLKMUL VCC T A20 A19 VCC VSS IGNNE#WM_RST VCC V A18 A17 VCC VSS A23 VSS X W A16 A15 VCC VSS A27 NC VSS VCC A14 A13 VCC VSS A21 VSS AB A12 A9 VCC VSS 5 LO CK# NC VCC VCC VSS 6 A31 A24 VCC VCC VSS 7 A26 VSS AD A5 A25 VCC VCC VSS A11 A29 A22 AF A10 A8 A7 A28 AH AG A6 A4 A3 AK AJ RESV A30 8 2 1 1740200 5-1 296-Pin SPGA Package ST6x86 Table 5.1. 296-Pin SPGA Package Signal Names Sorted by Pin Number Pin Signal Pin Signal Pin Signal Pin Signal Pin Signal Pin Signal A3 NC C29 D21 J35 D2 U35 Vss AE35 NC AL21 A20 A5 D41 C31 D17 J37 Vcc U37 Vcc AE37 Vcc AL23 A18 A7 Vcc C33 D14 K2 Vss V2 Vss AF2 Vss AL25 A16 A9 Vcc C35 D10 K4 D59 V4 AHOLD AF4 PCHK# AL27 A14 A11 Vcc C37 D9 K34 D0 V34 SUSP# AF34 A21 AL29 A12 A13 Vcc D2 D50 K36 Vss V36 Vss AF36 Vss AL31 A11 A15 Vcc D4 D48 L1 Vcc W1 Vcc AG1 Vcc AL33 A7 A17 Vcc D6 D44 L3 D61 W3 EWBE# AG3 SMIACT# AL35 A3 A19 Vcc D8 D40 L5 D60 W5 KEN# AG5 PCD AL37 Vss A21 Vcc D10 D39 L33 Vcc W33 SUSPA# AG33 A27 AM2 ADSC# A23 Vcc D12 D37 L35 NC W35 Reserved AG35 A24 AM4 EADS# A25 Vcc D14 D35 L37 Vcc W37 Vcc AG37 Vcc AM6 W/R# A27 Vcc D16 D33 M2 Vss X2 Vss AH2 Vss AM8 Vss A29 Vcc D18 DP3 M4 D62 X4 BRDY# AH4 LOCK# AM10 Vss A31 D22 D20 D30 M34 TCK X34 Reserved AH34 A26 AM12 Vss A33 D18 D22 D28 M36 Vss X36 Vss AH36 A22 AM14 Vss A35 D15 D24 D26 N1 Vcc Y1 Vcc AJ1 BREQ AM16 Vss A37 NC D26 D23 N3 D63 Y3 BRDYC# AJ3 HLDA AM18 Vss B2 NC D28 D19 N5 DP7 Y5 NA# AJ5 ADS# AM20 Vss B4 D43 D30 DP1 N33 TDO Y33 CLKMUL AJ33 A31 AM22 Vss B6 Vss D32 D12 N35 TDI Y35 Reserved AJ35 A25 AM24 Vss B8 Vss D34 D8 N37 Vcc Y37 Vcc AJ37 Vss AM26 Vss B10 Vss D36 DP0 P2 Vss Z2 Vss AK2 AP AM28 Vss B12 Vss E1 D54 P4 NC Z4 BOFF# AK4 D/C# AM30 Vss B14 Vss E3 D52 P34 TMS Z34 NC AK6 HIT# AM32 A8 B16 Vss E5 D49 P36 Vss Z36 Vss AK8 A20M# AM34 A4 B18 Vss E7 D46 Q1 Vcc AA1 Vcc AK10 BE1# AM36 A30 B20 Vss E9 D42 Q3 Reserved AA3 Reserved AK12 BE3# AN1 NC B22 Vss E33 D7 Q5 FERR# AA5 WB/WT# AK14 BE5# AN3 NC B24 Vss E35 D6 Q33 TRST# AA33 WM_RST AK16 BE7# AN5 NC B26 Vss E37 Vcc Q35 NC AA35 IGNNE# AK18 CLK AN7 FLUSH# B28 Vss F2 DP6 Q37 Vcc AA37 Vcc AK20 RESET AN9 Vcc B30 D20 F4 D51 R2 Vss AB2 Vss AK22 A19 AN11 Vcc B32 D16 F6 DP5 R4 Reserved AB4 HOLD AK24 A17 AN13 Vcc B34 D13 F34 D5 R34 BHOLD AB34 SMI# AK26 A15 AN15 Vcc B36 D11 F36 D4 R36 Vss AB36 Vss AK28 A13 AN17 Vcc C1 NC G1 Vcc S1 Vcc AC1 Vcc AK30 A9 AN19 Vcc C3 D47 G3 D55 S3 Reserved AC3 Reserved AK32 A5 AN21 Vcc C5 D45 G5 D53 S5 LBA# AC5 NC AK34 A29 AN23 Vcc C7 DP4 G33 D3 S33 Reserved AC33 NMI AK36 A28 AN25 Vcc C9 D38 G35 D1 S35 DHOLD AC35 NC AL1 NC AN27 Vcc C11 D36 G37 Vcc S37 Vcc AC37 Vcc AL3 PWT AN29 Vcc C13 D34 H2 Vss T2 Vss AD2 Vss AL5 HITM# AN31 A10 C15 D32 H4 D56 T4 MI/O# AD4 NC AL7 QDUMP# AN33 A6 C17 D31 H34 NC T34 Vcc AD34 INTR AL9 BE0# AN35 Reserved C19 D29 H36 Vss T36 Vss AD36 Vss AL11 BE2# AN37 Vss C21 D27 J1 Vcc U1 Vcc AE1 Vcc AL13 BE4# C23 D25 J3 D57 U3 CACHE# AE3 NC AL15 BE6# C25 DP2 J5 D58 U5 INV AE5 APCHK# AL17 SCYC J33 Reserve d U33 Vcc AE33 A23 AL19 Reserved C27 D24 Note: Reserved pins are reserved for future use by SGS-Thomson only. Pins marked NC are not internally connected. 5-2 269-Pin SPGA Package ST6x86 Table 5.2. 296-Pin SPGA Package Pin Numbers Sorted by Signal Name Signal Pin Signal Pin Signal Pin Signal Pin Signal Pin Signal Pin A3 AL35 CLKMUL Y33 D48 D4 NC AN3 Vcc AA37 Vss AM12 A4 AM34 D/C# AK4 D49 E5 NC AN5 Vcc AC1 Vss AM14 A5 AK32 D0 K34 D50 D2 NC B2 Vcc AC37 Vss AM16 A6 AN33 D1 G35 D51 F4 NC C1 Vcc AE1 Vss AM18 A7 AL33 D2 J35 D52 E3 NC H34 Vcc AE37 Vss AM20 A8 AM32 D3 G33 D53 G5 NC L35 Vcc AG1 Vss AM22 A9 AK30 D4 F36 D54 E1 NC P4 Vcc AG37 Vss AM24 A10 AN31 D5 F34 D55 G3 NC Q35 Vcc AN11 Vss AM26 A11 AL31 D6 E35 D56 H4 NC Z34 Vcc AN13 Vss AM28 A12 AL29 D7 E33 D57 J3 NMI AC33 Vcc AN15 Vss AM30 A13 AK28 D8 D34 D58 J5 PCD AG5 Vcc AN17 Vss AM8 A14 AL27 D9 C37 D59 K4 PCHK# AF4 Vcc AN19 Vss AN37 A15 AK26 D10 C35 D60 L5 PWT AL3 Vcc AN21 Vss B6 A16 AL25 D11 B36 D61 L3 QDUMP# AL7 Vcc AN23 Vss B8 A17 AK24 D12 D32 D62 M4 RESET AK20 Vcc AN25 Vss B10 A18 AL23 D13 B34 D63 N3 SCYC AL17 Vcc AN27 Vss B12 A19 AK22 D14 C33 DHOLD S35 Reserved AA3 Vcc AN29 Vss B14 A20 AL21 D15 A35 DP0 D36 Reserved AC3 Vcc AN9 Vss B16 A20M# AK8 D16 B32 DP1 D30 Reserved AL19 Vcc E37 Vss B18 A21 AF34 D17 C31 DP2 C25 Reserved AN35 Vcc G1 Vss B20 A22 AH36 D18 A33 DP3 D18 Reserved J33 Vcc G37 Vss B22 A23 AE33 D19 D28 DP4 C7 Reserved Q3 Vcc J1 Vss B24 A24 AG35 D20 B30 DP5 F6 Reserved R4 Vcc J37 Vss B26 A25 AJ35 D21 C29 DP6 F2 Reserved S3 Vcc L1 Vss B28 A26 AH34 D22 A31 DP7 N5 Reserved S33 Vcc L33 Vss H2 A27 AG33 D23 D26 EADS# AM4 Reserved W35 Vcc L37 Vss H36 A28 AK36 D24 C27 EWBE# W3 Reserved X34 Vcc N1 Vss K2 A29 AK34 D25 C23 FERR# Q5 Reserved Y35 Vcc N37 Vss K36 A30 AM36 D26 D24 FLUSH# AN7 SMI# AB34 Vcc Q1 Vss M2 A31 AJ33 D27 C21 HIT# AK6 SMIACT# AG3 Vcc Q37 Vss M36 ADS# AJ5 D28 D22 HITM# AL5 SUSP# V34 Vcc S1 Vss P2 ADSC# AM2 D29 C19 HLDA AJ3 SUSPA# W33 Vcc S37 Vss P36 AHOLD V4 D30 D20 HOLD AB4 TCK M34 Vcc T34 Vss R2 AP AK2 D31 C17 IGNNE# AA35 TDI N35 Vcc U1 Vss R36 APCHK# AE5 D32 C15 INTR AD34 TDO N33 Vcc U33 Vss T2 BE0# AL9 D33 D16 INV U5 TMS P34 Vcc U37 Vss T36 BE1# AK10 D34 C13 KEN# W5 TRST# Q33 Vcc W1 Vss U35 BE2# AL11 D35 D14 LBA# S5 Vcc A7 Vcc W37 Vss V2 BE3# AK12 D36 C11 LOCK# AH4 Vcc A9 Vcc Y1 Vss V36 BE4# AL13 D37 D12 MI/O# T4 Vcc A11 Vcc Y37 Vss X2 BE5# AK14 D38 C9 NA# Y5 Vcc A13 Vss AB2 Vss X36 BE6# AL15 D39 D10 NC A3 Vcc A15 Vss AB36 Vss Z2 BE7# AK16 D40 D8 NC A37 Vcc A17 Vss AD2 Vss Z36 BHOLD R34 D41 A5 NC AC35 Vcc A19 Vss AD36 WB/WT# AA5 BOFF# Z4 D42 E9 NC AC5 Vcc A21 Vss AF2 W/R# AM6 BRDY# X4 D43 B4 NC AD4 Vcc A23 Vss AF36 WM_RST AA33 BRDYC# Y3 D44 D6 NC AE3 Vcc A25 Vss AH2 BREQ D45 C5 NC AE35 Vcc A27 Vss AJ37 CACHE# U3 AJ1 D46 E7 NC AL1 Vcc A29 Vss AL37 CLK D47 C3 NC AN1 Vcc AA1 Vss AM10 AK18 Note: Reserved pins are reserved for future use by SGS-Thomson only. Pins marked NC are not internally connected. 5-3 296-Pin SPGA Package ST6x86 Figure 5.2. 296-Pin SPGA Package 5-4 Thermal Characteristics ST6x86 Table 5.3. 296-Pin SPGA Package Dimensions MILLIM ETERS INCHES SYMBOL MIN MAX MIN MAX A 3.91 4.70 0.154 0.185 A1 0.33 0.43 0.013 0.017 A2 2.51 3.07 0.099 0.121 B 0.43 0.51 0.017 0.020 D 49.28 49.91 1.940 1.965 D1 45.47 45.97 1.790 1.810 D2 31.50 Sq. 32.00 Sq. 1.240 Sq. 1.260 Sq. D3 33.99 34.59 1.338 1.362 D4 8.00 9.91 0.315 0.390 E1 2.41 2.67 0.095 0.105 E2 1.14 1.40 0.045 0.055 F - 0.127 Diag. - 0.005 Diag. L 3.05 3.30 0.120 0.130 0.065 0.085 N S1 296 (Pin Count) 1.65 2.16 5-5 Thermal Characteristics ST6x86 5.2 Thermal Characteristics The ST6x86 processor is designed to operate when the case temperature at the top center of the package is between 0C and 70C. The maximum die (junction) temperature, TJ MAX, and the maximum ambient temperature, TA MAX, can be calculated by substituting thermal resistance and maximum values for case or junction temperature and power dissipation in the following equations: Table 5.4 lists the junction-to-case and case-to-ambient thermal resistances for the SPGA package. 5-6 TJ= TC + (P * JC) TA = TJ - (P * JA) where: TA = Ambient temperature (C) TJ= Average junction temperature (C) TC= Case temperature at top center of package (C) P= Power dissipation (W) JC = Junction-to-case thermal resistance (C/W) JA = Junction-to-ambient thermal resistance (C/W) ST6x86 Table 5.4. Thermal Resistances for SPGA Package With and Without Heatsinks Thermal Resistance JC C/W CA C/W Laminar Air Flow (ft/min) 0 0 100 200 400 600 800 1.95 x 1.95 x 0.25 Heatsink 0.9 8.4 7.4 6.0 4.0 3.1 2.6 1.95 x 1.95 x 0.40 Heatsink 0.9 7.7 6.6 4.9 3.2 2.7 2.1 1.95 x 1.95 x 0.65 Heatsink 0.9 5.9 4.7 3.2 2.1 1.7 1.4 Without Heatsink 1.4 14.7 11.5 9.1 7.3 7.0 6.2 Notes: For a ST6x86 processor with 1.25 x 1.25 x 0.40 inch CuW heat spreader. Heatsinks are omni-directional pin aluminum alloy. Features are based on standard extrusion practices for a given height. Heatsink attachment was made with 0.006 inch of thermal grease applied between heatsink and case. Maximum air temperature is assumed to be 40 C 5-7 ST6x86 5-8 ST6x86 6.0 INSTRUCTION SET 6.1 Instruction Set Summary This section summarizes the ST6x86 CPU instruction set and provides detailed information on the instruction encodings. All instructions are listed in the CPU Instruction Set Summary Table (Table 6-20, Page 6-14), and the FPU Instruction Set Summary Table (Table 6-22, Page 6-30). These tables provide information on the instruction encoding, and the instruction clock counts for each instruction. The clock count values for both tables are based on the assumptions described in Section 6.3. Depending on the instruction, the ST6x86 CPU instructions follow the general instruction format shown in Figure 6.1. These instructions vary in length and can start at any byte address. An instruction consists of one or more bytes that can include: prefix byte(s), at least one opcode byte(s), mod r/m byte, s-i-b byte, address displacement byte(s) and immediate data byte(s). An instruction can be as short as one byte and as long as 15 bytes. If there are more than 15 bytes in the instruction a general protection fault (error code of 0) is generated. Figure 6.1. Instruction Set Format P P P P P P P P T T T T T T T T 7 0 7 0 optional prefix byte(s) (one opcode or two bytes) P = prefix bit T = opcode bit R = opcode bit or reg bit mod 76 R R R r/m ss 543 2 10 76 mod r/m byte index 54 3 s-i-b byte register and address mode specifier base 32 2 10 16 8 none address displacement (4, 2, 1 bytes, or none) 32 16 8 none immediate data (4, 2, 1 bytes, or none) 17 031 03 6-1 ST6x86 6.2 General Instruction Fields The fields in the general instruction format at the byte level are listed in Table 6.1. Table 6.1. Instruction Fields FIELD NAME DESCRIPTION WIDTH Optional Prefix Byte(s) Specifies segment register override, address and operand size, 1 or more bytes repeat elements in string instruction, LOCK# assertion. Opcode Byte(s) Identifies instruction operation. 1 or 2 bytes mod and r/m Byte Address mode specifier. 1 byte s-i-b Byte Scale factor, Index and Base fields. 1 byte Address Displacement Address displacement operand. 1, 2 or 4 bytes Immediate data Immediate data operand. 1, 2 or 4 bytes 6.2.1 Optional Prefix Bytes Prefix bytes can be placed in front of any instruction. The prefix modifies the operation of the next instruction only. When more than one prefix is used, the order is not important. There are five type of prefixes as follows: 1. Segment Override explicitly specifies which segment register an instruction will use for effective address calculation. 2. Address Size switches between 16- and 32-bit addressing. Selects the inverse of the default. 3. Operand Size switches between 16- and 32-bit operand size. Selects the inverse of the default. 4. Repeat is used with a string instruction which causes the instruction to be repeated for each element of the string. 5. Lock is used to assert the hardware LOCK# signal during execution of the instruction. 6-2 ST6x86 Table 6.2 lists the encodings for each of the available prefix bytes. Table 6.2. Instruction Prefix Summary PREFIX ENCODING DESCRIPTION ES: 26h Override segment default, use ES for memory operand CS: 2Eh Override segment default, use CS for memory operand SS: 36h Override segment default, use SS for memory operand DS: 3Eh Override segment default, use DS for memory operand FS: 64h Override segment default, use FS for memory operand GS: 65h Override segment default, use GS for memory operand Operand Size 66h Make operand size attribute the inverse of the default Address Size 67h Make address size attribute the inverse of the default LOCK F0h Assert LOCK# hardware signal. REPNE F2h Repeat the following string instruction. REP/REPE F3h Repeat the following string instruction. 6-3 ST6x86 6.2.2 Opcode Byte The opcode field specifies the operation to be performed by the instruction. The opcode field is either one or two bytes in length and may be further defined by additional bits in the mod r/m byte. Some operations have more than one opcode, each specifying a different form of the operation. Some opcodes name instruction groups. For example, opcode 80h names a group of operations that have an immediate operand and a register or memory operand. The reg field may appear in the second opcode byte or in the mod r/m byte. 6.2.2.1 w Field The 1-bit w field (Table 6.3) selects the operand size during 16 and 32 bit data operations. Table 6.3. w Field Encoding OPERAND SIZE w FIELD 16-BIT DATA OPERATIONS 32-BIT DATA OPERATIONS 0 8 Bits 8 Bits 1 16 Bits 32 Bits 6.2.2.2 d Field The d field (Table 6.4) determines which operand is taken as the source operand and which operand is taken as the destination. Table 6.4. d Field Encoding d FIELD DIRECTION OF OPERATON DESTINATION OPERAND SOURCE OPERAND 0 Register --> Register or Register --> Memory reg mod r/m or mod ss-index-base 1 Register --> Register or Memory --> Register mod r/m or mod ss-index-base reg 6-4 ST6x86 6.2.2.3 s Field The s field (Table 6.5) determines the size of the immediate data field. If the S bit is set, the immediate field of the OP code is 8-bits wide and is sign extended to match the operand size of the opcode. Table 6.5. s Field Encoding Immediate Field Size s FIELD 8-Bit Operand Size 16-Bit Operand Size 32-Bit Operand Size 0 (or not present) 8 bits 16 bits 32 bits 1 8 bits 8 bits (sign extended) 8 bits (sign extended) 6.2.2.4 eee Field The eee field (Table 6.6) is used to select the control, debug and test registers in the MOV instructions. The type of register and base registers selected by the eee field are listed in Table 6.6. The values shown in Table 6.6 are the only valid encodings for the eee bits. Table 6.6. eee Field Encoding eee FIELD 6-5 REGISTER TYPE BASE REGISTER 000 Control Register CR0 010 Control Register CR2 011 Control Register CR3 000 Debug Register DR0 001 Debug Register DR1 010 Debug Register DR2 011 Debug Register DR3 110 Debug Register DR6 111 Debug Register DR7 011 Test Register TR3 100 Test Register TR4 101 Test Register TR5 110 Test Register TR6 111 Test Register TR7 ST6x86 6.2.3 mod and r/m Byte The mod and r/m fields (Table 6.7), within the mod r/m byte, select the type of memory addressing to be used. Some instructions use a fixed addressing mode (e.g., PUSH or POP) and therefore, these fields are not present. Table 6.7 lists the addressing method when 16-bit addressing is used and a mod r/m byte is present. Some mod r/m field encodings are dependent on the w field and are shown in Table 6.8(Page 6-7). Table 6.7. mod r/m Field Encoding mod and r/m fields 32-BIT ADDRESS MODE with mod r/m Byte and No s-i-b Byte Present 16-BIT ADDRESS MODE with mod r/m Byte 00 000 DS:[BX+SI] DS:[EAX] 00 001 DS:[BX+DI] DS:[ECX] 00 010 DS:[BP+SI] DS:[EDX] 00 011 DS:[BP+DI] DS:[EBX] 00 100 DS:[SI] s-i-b is present (See 6.2.4) 00 101 DS:[DI] DS:[d32] 00 110 DS:[d16] DS:[ESI] 00 111 DS:[BX] DS:[EDI] 01 000 DS:[BX+SI+d8] DS:[EAX+d8] 01 001 DS:[BX+DI+d8] DS:[ECX+d8] 01 010 DS:[BP+SI+d8] DS:[EDX+d8] 01 011 DS:[BP+DI+d8] DS:[EBX+d8] 01 100 DS:[SI+d8] s-i-b is present (See 6.2.4) 01 101 DS:[DI+d8] SS:[EBP+d8] 01 110 SS:[BP+d8] DS:[ESI+d8] 01 111 DS:[BX+d8] DS:[EDI+d8] 10 000 DS:[BX+SI+d16] DS:[EAX+d32] 10 001 DS:[BX+DI+d16] DS:[ECX+d32] 10 010 DS:[BP+SI+d16] DS:[EDX+d32] 10 011 DS:[BP+DI+d16] DS:[EBX+d32] 10 100 DS:[SI+d16] s-i-b is present (See 6.2.4) 10 101 DS:[DI+d16] SS:[EBP+d32] 10 110 SS:[BP+d16] DS:[ESI+d32] 10 111 DS:[BX+d16] DS:[EDI+d32] 11 000-11 111 See Table 6-8 See Table 6-8 6-6 ST6x86 Table 6.8. mod r/m Field Encoding Dependent on w Field 16-BIT OPERATION w=0 mod r/m 16-BIT OPERATION w=1 32-BIT OPERATION w=0 32-BIT OPERATION w=1 11 000 AL AX AL EAX 11 001 CL CX CL ECX 11 010 DL DX DL EDX 11 011 BL BX BL EBX 11 100 AH SP AH ESP 11 101 CH BP CH EBP 11 110 DH SI DH ESI 11 111 BH DI BH EDI 6.2.3.1 reg Field The reg field (Table 6.9) determines which general registers are to be used. The selected register is dependent on whether a 16 or 32 bit operation is current and the status of the w bit. Table 6.9. reg Field reg 6-7 16-BIT OPERATION w Field Not Present 32-BIT OPERATION w Field Not Present 16-BIT OPERATION w=0 16-BIT OPERATION w=1 32-BIT OPERATION w=0 32-BIT OPERATION w =1 000 AX EAX AL AX AL EAX 001 CX ECX CL CX CL ECX 010 DX EDX DL DX DL EDX 011 BX EBX BL BX BL EBX 100 SP ESP AH SP AH ESP 101 BP EBP CH BP CH EBP 110 SI ESI DH SI DH ESI 111 DI EDI BH DI BH EDI ST6x86 6.2.3.2 sreg3 Field The sreg3 field (Table 6.10) is 3-bit field that is similar to the sreg2 field, but allows use of the FS and GS segment registers. Table 6.10. sreg3 Field Encoding sreg3 FIELD SEGMENT REGISTER SELECTED 000 ES 001 CS 010 SS 011 DS 100 FS 101 GS 110 undefined 111 undefined 6.2.3.3 sreg2 Field The sreg2 field (Table 6.11) is a 2-bit field that allows one of the four 286-type segment registers to be specified. Table 6.11. sreg2 Field Encoding sreg2 FIELD SEGMENT REGISTER SELECTED 00 ES 01 CS 10 SS 11 DS 6-8 ST6x86 6.2.4 s-i-b Byte The s-i-b fields provide scale factor, indexing and a base field for address selection. 6.2.4.1 ss Field The ss field (Table 6.12) specifies the scale factor used in the offset mechanism for address calculation. The scale factor multiplies the index value to provide one of the components used to calculate the offset address. Table 6.12. ss Field Encoding ss FIELD SCALE FACTOR 00 x1 01 x2 10 x4 11 x8 6.2.4.2 index Field The index field (Table 6.13) specifies the index register used by the offset mechanism for offset address calculation. When no index register is used (index field = 100), the ss value must be 00 or the effective address is undefined. Table 6.13. Index Field Encoding 6-9 Index FIELD INDEX REGISTER 000 EAX 001 ECX 010 EDX 011 EBX 100 none 101 EBP 110 ESI 111 EDI ST6x86 6.2.4.3 Base Field In (Table 6.7), the note "s-i-b present" for certain entries forces the use of the mod and base field as listed in Table 6.14. The first two digits in the first column of Table 6.14 identifies the mod bits in the mod r/m byte. The last three digits in the first column of this table identifies the base fields in the s-i-b byte. Table 6.14. mod base Field Encoding mod FIELD WITHIN mode/rm BYTE base FIELD WITHIN s-i-b BYTE 32-BIT ADDRESS MODE with mod r/m and s-i-b Bytes Present 00 000 DS:[EAX+(scaled index)] 00 001 DS:[ECX+(scaled index)] 00 010 DS:[EDX+(scaled index)] 00 011 DS:[EBX+(scaled index)] 00 100 SS:[ESP+(scaled index)] 00 101 DS:[d32+(scaled index)] 00 110 DS:[ESI+(scaled index)] 00 111 DS:[EDI+(scaled index)] 01 000 DS:[EAX+(scaled index)+d8] 01 001 DS:[ECX+(scaled index)+d8] 01 010 DS:[EDX+(scaled index)+d8] 01 011 DS:[EBX+(scaled index)+d8] 01 100 SS:[ESP+(scaled index)+d8] 01 101 SS:[EBP+(scaled index)+d8] 01 110 DS:[ESI+(scaled index)+d8] 01 111 DS:[EDI+(scaled index)+d8] 10 000 DS:[EAX+(scaled index)+d32] 10 001 DS:[ECX+(scaled index)+d32] 10 010 DS:[EDX+(scaled index)+d32] 10 011 DS:[EBX+(scaled index)+d32] 10 100 SS:[ESP+(scaled index)+d32] 10 101 SS:[EBP+(scaled index)+d32] 10 110 DS:[ESI+(scaled index)+d32] 10 111 DS:[EDI+(scaled index)+d32] 6-10 ST6x86 6.3 CPUID Instruction The ST6x86 CPU executes the CPUID instruction (opcode 0FA2) as documented in this section only if the CPUID bit in the CCR4 configuration register is set. The CPUID instruction may be used by software to determine the vendor and type of CPU. When the CPUID instruction is executed with EAX = 0, the ASCII characters "CyrixInstead" are placed in the EBX, EDX, and ECX registers as shown in Table 6.15: Table 6.15. CPUID Data Returned When EAX = 0 CONTENTS (D31 - D0) REGISTER EBX 69 i 72 r 79 y 43 C* EDX 73 s 6E n 49 I 78 x* ECX 64 d 61 a 65 e 74 t* When the CPUID instruction is executed with EAX = 1, EAX and EDX contain the values shown in Table 6.16. Table 6.16. CPUID Data Returned When EAX = 1 REGISTER CONTENTS EAX(3-0) 0 EAX(7-4) 3 EAX(11-8) 5 EAX(13-12) 0 EAX(31-14) reserved EDX If EDX = 00, FPU not on-chip. If EDX = 01, FPU on-chip. *ASCII equivalent 6-11 ST6x86 6.4 Instruction Set Tables The ST6x86 CPU instruction set is presented in two tables: Table 6.20 ". ST6x86 CPU Instruction Set Clock Count Summary" and Table 6.21 ". FPU Clock Count Table Abbreviations". Additional information concerning the FPU Instruction Set is presented in section 6.5. 6.4.1 Assumptions Made in Determining Instruction Clock Count The assumptions made in determining instruction clock counts are listed below: 1. All clock counts refer to the internal CPU internal clock frequency. For example, the clock counts for a clock-doubled ST6x86 CPU-100 refer to 100 MHz clocks while the external clock is 50 MHz. 2. The instruction has been prefetched, decoded and is ready for execution. 3. Bus cycles do not require wait states. 4. There are no local bus HOLD requests delaying processor access to the bus. 5. No exceptions are detected during instruction execution. 6. If an effective address is calculated, it does not use two general register components. One register, scaling and displacement can be used within the clock count shown. However, if the effective address calculation uses two general register components, add 1 clock to the clock count shown. 6-12 7. All clock counts assume aligned 32-bit memory/IO operands. 8. If instructions access a 32-bit operand that crosses a 64-bit boundary, add 1 clock for read or write and add 2 clocks for read and write. 9. For non-cached memory accesses, add two clocks (ST6x86 CPU with 2x clock) or four clocks (ST6x86 CPU with 3x clock). (Assumes zero wait state memory accesses). 10. Locked cycles are not cacheable. Therefore, using the LOCK prefix with an instruction adds additional clocks as specified in paragraph 9 above. 11. No parallel execution of instructions. 6.4.2 CPU Instruction Set Summary Table Abbreviations The clock counts listed in the CPU Instruction Set Summary Table are grouped by operating mode and whether there is a register/cache hit or a cache miss. In some cases, more than one clock count is shown in a column for a given instruction, or a variable is used in the clock count. The abbreviations used for these conditions are listed in Table 6.17. ST6x86 Table 6.17. CPU Clock Count Abbreviations CLOCK COUNT SYMBOL 6.4.3 EXPLANATION / Register operand/memory operand. n Number of times operation is repeated. L Level of the stack frame. | Conditional jump taken | Conditional jump not taken. (e.g. "4|1" = 4 clocks if jump taken, 1 clock if jump not taken) \ CPL IOPL \ CPL > IOPL (where CPL = Current Privilege Level, IOPL = I/O Privilege Level) m Number of parameters passed on the stack. CPU Instruction Set Summary Table Flags Table The CPU Instruction Set Summary Table lists nine flags that are affected by the execution of instructions. The conventions shown in Table 6.18 are used to identify the different flags. Table 6.19 lists the conventions used to indicate what action the instruction has on the particular flag. Table 6.18. Flag Abbreviations ABBREVIATION NAME OF FLAG OF Overflow Flag DF Direction Flag IF Interrupt Enable Flag TF Trap Flag SF Sign Flag ZF Zero Flag AF Auxiliary Flag PF Parity Flag CF Carry Flag Table 6.19. Action of Instruction on Flag INSTRUCTION TABLE SYMBOL 6-13 ACTION x Flag is modified by the instruction. - Flag is not changed by the instruction. 0 Flag is reset to "0". 1 Flag is set to "1". u Flag is undefined following execution of the instruction. 6-14 D5 0A D4 0A 3F AAD ASCII Adjust AX before Divide AAM ASCII Adjust AX after Multiply AAS ASCII Adjust AL after Subtract 1 1 1 1 1 1 1 1 0 [000w] [mod reg r/m] 0 [001w] [mod reg r/m] 8 [00sw] [mod 000 r/m]### 0 [010w] ### Memory to Register Immediate to Register/Memory Immediate to Accumulator # = immediate 8-bit data ## = immediate 16-bit data ### = full immediate 32-bit data (8, 16, 32 bits) + = 8-bit signed displacement +++ = full signed displacement (16, 32 bits) x x Register to Memory x x 1 x 1 - 0 [00dw] [11 reg r/m] Register to Register - 1 1 1 [010w] ### Immediate to Accumulator - 1 1 8 [00sw] [mod 010 r/m]### Immediate to Register/Memory x 1 ADD Integer Add 1 1 7 13-21 7 7 1 x x 7 13-21 7 7 1 [001w] [mod reg r/m] x x x x u x u u x u x u x u x u u x 1 [000w] [mod reg r/m] x u x x u Memory to Register - - - - - Register to Memory - - - - - Reg/ Cache Hit 1 - - - - - Reg/ Cache Hit 1 x u u u u OF DF IF TF SF ZF AF PF CF b b Real Mode h h Protected Mode NOTES x = modified - = unchanged u = undefined PROTECTED MODE CLOCK COUNT 1 [00dw] [11 reg r/m] OPCODE FLAGS REAL MODE CLOCK COUNT Register to Register ADC Add with Carry 37 AAA ASCII Adjust AL after Add INSTRUCTION Table 6.20. ST6x86 CPU Instruction Set Clock Count Summary . ST6x86 - - - - - - - - - x - - x - - - - - - 20 0F C[1 reg] BSWAP Byte Swap 0F A3 [mod reg r/m] Register/Memory, Register 0F BB [mod reg r/m] Register/Memory, Register # = immediate 8-bit data ## = immediate 16-bit data ### = full immediate 32-bit data (8, 16, 32 bits) 0F BA [mod 111 r/m]# Register/Memory, Immediate BTC Test Bit and Complement 0F BA [mod 100 r/m]# Register/Memory, Immediate BT Test Bit 0F BD [mod reg r/m] Register, Register/Memory BSR Scan Bit Reverse Register, Register/Memory - - - - - - - - - - - - - - - - - - - - - - x - - x - x - - - - 5/6 3 5/6 2 4 3 + = 8-bit signed displacement +++ = full signed displacement (16, 32 bits) - - - - 3 - - - BSF Scan Bit Forward 0F BC [mod reg r/m] - - 11 62 [mod reg r/m] 63 [mod reg r/m] - If In Range If Out of Range (Int 5) BOUND Check Array Boundaries From Register/Memory - 5/6 3 5/6 2 4 3 3 11 20+INT 9 1 1 2 [010w] ### Immediate to Accumulator - 1 1 8 [00sw] [mod 100 r/m]### Immediate to Register/Memory ARPL Adjust Requested Privilege Level 1 1 2 [001w] [mod reg r/m] Memory to Register x 0 1 x u 1 x 2 [000w] [mod reg r/m] - Register to Memory - Reg/ Cache Hit 1 - Reg/ Cache Hit 1 0 OF DF IF TF SF ZF AF PF CF b b b b b, e a b Real Mode h h h h g,h,j,k,r h h Protected Mode NOTES x = modified - = unchanged u = undefined PROTECTED MODE CLOCK COUNT 2 [00dw] [11 reg r/m] OPCODE REAL MODE CLOCK COUNT Register to Register AND Boolean AND INSTRUCTION FLAGS Table 6.20. ST6x86 CPU Instruction Set Clock Count Summary (Continued) ST6x86 6-15 6-16 0F B3 [mod reg r/m] Register/Memory, Register 0F AB [mod reg r/m] Register (short form) # = immediate 8-bit data ## = immediate 16-bit data ### = full immediate 32-bit data (8, 16, 32 bits) 0F BA [mod 101 r/m] Register/Memory BTS Test Bit and Set 0F BA [mod 110 r/m]# OPCODE Register/Memory, Immediate BTR Test Bit and Reset INSTRUCTION - - - - - - - - - - - - - x - x 5/6 3 5/6 3 Reg/ Cache Hit REAL MODE CLOCK COUNT + = 8-bit signed displacement +++ = full signed displacement (16, 32 bits) - - OF DF IF TF SF ZF AF PF CF FLAGS Table 6.20. ST6x86 CPU Instruction Set Clock Count Summary (Continued) 5/6 3 5/6 3 Reg/ Cache Hit b b Real Mode h h Protected Mode NOTES x = modified - = unchanged u = undefined PROTECTED MODE CLOCK COUNT ST6x86 - - 5 4 3 96 118 110 35+2m 26 15 1 1/3 1 1/3 124 102 32-bit Task to 16-bit TSS 32-bit Task to 32-bit TSS 6-17 # = immediate 8-bit data ## = immediate 16-bit data ### = full immediate 32-bit data (8, 16, 32 bits) 32-bit Task to V86 Task 116 16-bit Task to V86 Task 122 114 40+2m 31 100 + = 8-bit signed displacement +++ = full signed displacement (16, 32 bits) 8 20 16-bit Task to 32-bit TSS 16-bit Task to 16-bit TSS Call Gate to Different Privilege Level m Par's Call Gate to Different Privilege No Parameters Call Gate to Same Privilege Indirect Intersegment FF [mod 011 r/m] - - 98 - 120 - 32-bit Task to V86 Task - Reg/ Cache Hit 32-bit Task to 32-bit TSS - Reg/ Cache Hit b Real Mode h,j,k,r Protected Mode NOTES x = modified - = unchanged u = undefined PROTECTED MODE CLOCK COUNT 112 - OF DF IF TF SF ZF AF PF CF REAL MODE CLOCK COUNT 32-bit Task to 16-bit TSS selector] 9A [unsigned full offset, 16-bit Task to 32-bit TSS 16-bit Task to V86 Task FF [mod 010 r/m] E8 +++ OPCODE 16-bit Task to 16-bit TSS Call Gate to Different Privilege m Par's Call Gate to Different Privilege No Parameters Call Gate to Same Privilege Direct Intersegment Register/Memory Indirect Within Segment Direct Within Segment CALL Subroutine Call INSTRUCTION FLAGS Table 6.20. ST6x86 CPU Instruction Set Clock Count Summary (Continued) ST6x86 6-18 F8 FC FA 0F 06 F5 CLC Clear Carry Flag CLD Clear Direction Flag CLI Clear Interrupt Flag CLTS Clear Task Switched Flag CMC Complement the Carry Flag 0F A2 99 98 27 CPUID CPU Identification CWD Convert Word to Doubleword CWDE Convert Word to Doubleword Extended DAA Decimal Adjust AL after Add - x x x - - - - - - - - - - - - - - - - - - - - 0 - - - - - - - - - - - - 0 - - - - - - - - - - - - - - - - - - x - - - x x x - - - - - - - x x - - - - - - x x x x x x - - - - - - - - - - - - - - x x - - - - - - x x x x x x - x - - - - - - - 0 - - - - OF DF IF TF SF ZF AF PF CF 9 2 2 12 11 11 9 2 2 12 11 11 5 1 1 5 1 1 1 1 2 10 7 7 1 2 3 Reg/ Cache Hit b b c Real Mode h h l m Protected Mode NOTES x = modified - = unchanged u = undefined PROTECTED MODE CLOCK COUNT 1 1 1 1 2 10 7 7 1 2 3 Reg/ Cache Hit REAL MODE CLOCK COUNT + = 8-bit signed displacement +++ = full signed displacement (16, 32 bits) 0F B [000w] [mod reg r/m] Memory, Register # = immediate 8-bit data ## = immediate 16-bit data ### = full immediate 32-bit data (8, 16, 32 bits) 0F B [000w] [11 reg2 reg1] A [011w] 3 [110w] ### 8 [00sw] [mod 111 r/m] ### 3 [100w] [mod reg r/m] 3 [101w] [mod reg r/m] Register1, Register2 CMPXCHG Compare and Exchange CMPS Compare String Immediate to Accumulator Immediate to Register/Memory Memory to Register Register to Memory Register to Register OPCODE 3 [10dw] [11 reg r/m] 99 CMP Compare Integers 98 CBW Convert Byte to Word CDQ Convert Doubleword to Quadword INSTRUCTION FLAGS Table 6.20. ST6x86 CPU Instruction Set Clock Count Summary (Continued) ST6x86 F [011w] [mod 110 r/m] DIV Unsigned Divide - - 5 5 + = 8-bit signed displacement +++ = full signed displacement (16, 32 bits) 17-45 u u - 16-28 x - 17-45 x - - 16-28 - - Doubleword - - Word - - - 16-20 - 16-20 F [011w] [mod 111 r/m] # = immediate 8-bit data ## = immediate 16-bit data ### = full immediate 32-bit data (8, 16, 32 bits) Divisor: Byte Accumulator by Register/Memo ry IDIV Integer (Signed) Divide HLT Halt - 10+L*3 10+L*3 F4 13 13 Level = 1 Level (L) > 1 10 10 Level = 0 ENTER Enter New Stack Frame 13-41 - u 1 1 9 13-41 - u 1 1 9 Doubleword - x x x 13-25 - x x x x 13-25 - x - x - x x Word - - - - Reg/ Cache Hit 13-17 - - - - Reg/ Cache Hit b,e b b,e b Real Mode e,h l h e,h h Protected Mode NOTES x = modified - = unchanged u = undefined PROTECTED MODE CLOCK COUNT 13-17 - - - x - - - OF DF IF TF SF ZF AF PF CF REAL MODE CLOCK COUNT Divisor: Byte C8 ##,# 4 [1 reg] Register (short form) Accumulator by Register/Memory F [111w] [mod 001 r/m] 2F OPCODE Register/Memory DEC Decrement by 1 DAS Decimal Adjust AL after Subtract INSTRUCTION FLAGS Table 6.20. ST6x86 CPU Instruction Set Clock Count Summary (Continued) ST6x86 6-19 6-20 with 4 [0 reg] 6 [110w] Register (short form) INS Input String from I/O Port # = immediate 8-bit data ## = immediate 16-bit data ### = full immediate 32-bit data (8, 16, 32 bits) F [111w] [mod 000 r/m] Register/Memory INC Increment by 1 E [110w] - - x x - - - - - - - - - - - - - x - x - - x x - - x u - - x - - - u x OF DF IF TF SF ZF AF PF CF 14 1 1 14 14 14/28 1 1 14/28 14/28 5 11 5 10 4 10 4 4 Reg/ Cache Hit b b b Real Mode h,m h m h Protected Mode NOTES x = modified - = unchanged u = undefined PROTECTED MODE CLOCK COUNT 11 10 4 10 4 4 Reg/ Cache Hit REAL MODE CLOCK COUNT + = 8-bit signed displacement +++ = full signed displacement (16, 32 bits) 6 [10s1] [mod reg r/m] ### Variable Port to E [010w] [#] Immediate 0F AF [mod reg r/m] F [011w] [mod 101 r/m] OPCODE Fixed Port IN Input from I/O Port Doubleword Multiplier: Word Register/Memory Register2 Doubleword Multiplier: Word Register with Register/Memory Doubleword Word Multiplier: Byte Accumulator by Register/Memory IMUL Integer (Signed) Multiply INSTRUCTION FLAGS Table 6.20. ST6x86 CPU Instruction Set Clock Count Summary (Continued) ST6x86 - - - - - x - 0 - - - - - - - - - - OF DF IF TF SF ZF AF PF CF INT 9 Reg/ Cache Hit Reg/ Cache Hit # = immediate 8-bit data ## = immediate 16-bit data ### = full immediate 32-bit data (8, 16, 32 bits) - - - - - - - 13 + = 8-bit signed displacement +++ = full signed displacement (16, 32 bits) - 13 0F 01 [mod 111 r/m] INVLPG Invalidate TLB Entry - 12 12 0F 08 INVD Invalidate Cache 6 15+INT 6 If OF==1 (INT 4) If OF==0 CE 46 V86 to Privilege 0 by Trap Gate/Int Gate INTO 102 V86 to 32-bit TSS by Task Gate INT 124 V86 to 16-bit TSS by Task Gate CC 102 32-bit Task to V86 by Task Gate INT 3 116 16-bit Task to V86 by Task Gate 124 100 16-bit Task to 32-bit TSS by Task Gate 32-bit Task to 32-bit TSS by Task Gate 122 16-bit Task to 16-bit TSS by Task Gate 16-bit Task to 16-bit TSS by Task Gate 32 114 Interrupt or Trap to Different Privilege t b,e Real Mode t g,j,k,r Protected Mode NOTES x = modified - = unchanged u = undefined PROTECTED MODE CLOCK COUNT 21 CD # OPCODE REAL MODE CLOCK COUNT Interrupt or Trap to Same Privilege Protected Mode: INT i INT Software Interrupt INSTRUCTION FLAGS Table 6.20. ST6x86 CPU Instruction Set Clock Count Summary (Continued) ST6x86 6-21 6-22 - - x - x - x - x - x - - x x 7 Reg/ Cache Hit - - - - - - - - - - - - - - - - 1 + = 8-bit signed displacement +++ = full signed displacement (16, 32 bits) - 1 0F 86 +++ E3 + 1 1 1 1 1 1 0F 82 +++ 76 + 1 1 72 + # = immediate 8-bit data ## = immediate 16-bit data ### = full immediate 32-bit data (8, 16, 32 bits) JCXZ/JECXZ Jump on CX/ECX Zero Full Displacement 8-bit Displacement JBE/JNA Jump on Below or Equal/Not Above Full Displacement 8-bit Displacement Carry - 105 32-bit Task to V86 Task JB/JNAE/JC Jump on Below/Not Above or Equal/ 119 127 103 16-bit Task to V86 Task 32-bit Task to 32-bit TSS 125 16-bit Task to 32-bit TSS 32-bit Task to 16-bit TSS 117 16-bit Task to 16-bit Task 26 - x Reg/ Cache Hit Within Task to Different Privilege x OF DF IF TF SF ZF AF PF CF Real Mode r r r g,h,j,k,r Protected Mode NOTES x = modified - = unchanged u = undefined PROTECTED MODE CLOCK COUNT 10 CF OPCODE REAL MODE CLOCK COUNT Within Task to Same Privilege Protected Mode: Real Mode IRET Interrupt Return INSTRUCTION FLAGS Table 6.20. ST6x86 CPU Instruction Set Clock Count Summary (Continued) ST6x86 - - - - - - - - - 1 1 + = 8-bit signed displacement +++ = full signed displacement (16, 32 bits) 1 - - - 1 - - - 0F 8E +++ - - - 7E + - - - 1 - - 0F 8C +++ - - Reg/ Cache Hit 1 # = immediate 8-bit data ## = immediate 16-bit data ### = full immediate 32-bit data (8, 16, 32 bits) Full Displacement 8-bit Displacement JLE/JNG Jump on Less or Equal/Not Greater Full Displacement 8-bit Displacement - - OF DF IF TF SF ZF AF PF CF REAL MODE CLOCK COUNT 7C + 0F 84 +++ Full Displacement JL/JNGE Jump on Less/Not Greater or Equal 74 + OPCODE 8-bit Displacement JE/JZ Jump on Equal/Zero INSTRUCTION FLAGS Table 6.20. ST6x86 CPU Instruction Set Clock Count Summary (Continued) 1 1 1 1 1 1 Reg/ Cache Hit Real Mode r r r Protected Mode NOTES x = modified - = unchanged u = undefined PROTECTED MODE CLOCK COUNT ST6x86 6-23 6-24 112 120 98 32-bit Task to 16-bit TSS 32-bit Task to 32-bit TSS 32-bit Task to V86 Task 121 99 115 123 101 16-bit Task to 32-bit TSS 16-bit Task to V86 Task 32-bit Task to 16-bit TSS 32-bit Task to 32-bit TSS 32-bit Task to V86 Task + = 8-bit signed displacement +++ = full signed displacement (16, 32 bits) 113 16-bit Task to 16-bit TSS # = immediate 8-bit data ## = immediate 16-bit data ### = full immediate 32-bit data (8, 16, 32 bits) 17 Call Gate Same Privilege Level 7 96 16-bit Task to V86 Task 5 118 16-bit Task to 32-bit TSS FF [mod 101 r/m] 110 16-bit Task to 16-bit TSS Indirect Intersegment 14 4 1 EA [unsigned full offset, selector] Direct Intersegment Call Gate Same Privilege Level 1/3 1/3 FF [mod 100 r/m] - - Register/Memory Indirect Within Segment 1 1 - E9 +++ - Full Displacement - Reg/ Cache Hit 1 - Reg/ Cache Hit 1 - OF DF IF TF SF ZF AF PF CF b Real Mode h,j,k,r Protected Mode NOTES x = modified - = unchanged u = undefined PROTECTED MODE CLOCK COUNT EB + OPCODE REAL MODE CLOCK COUNT 8-bit Displacement JMP Unconditional Jump INSTRUCTION FLAGS Table 6.20. ST6x86 CPU Instruction Set Clock Count Summary (Continued) ST6x86 Not Below on Not 0F 81 +++ # = immediate 8-bit data ## = immediate 16-bit data ### = full immediate 32-bit data (8, 16, 32 bits) - - - - - - - - - - - - - 1 1 + = 8-bit signed displacement +++ = full signed displacement (16, 32 bits) - - - - 1 1 1 - - - - 0F 8F +++ - - - 1 - - - 1 1 7F + Full Displacement or 71 + Less - - - - - - 1 - - - - 1 - - - - 0F 8D +++ - - - - 7D + 8-bit Displacement JNO Jump on Not Overflow Full Displacement 8-bit Displacement Jump JNLE/JG Equal/Greater Full Displacement 8-bit Displacement JNL/JGE Jump on Not Less/Greater or Equal 0F 85 +++ Full Displacement - - - 1 - - 1 - - Reg/ Cache Hit 0F 87 +++ - - OF DF IF TF SF ZF AF PF CF REAL MODE CLOCK COUNT 77 + 75 + or 8-bit Displacement JNE/JNZ Jump on Not Equal/Not Zero Full Displacement 8-bit Displacement on 0F 83 +++ JNBE/JA Jump Equal/Above 73 + Full Displacement OPCODE 8-bit Displacement JNB/JAE/JNC Jump on Not Below/Above or Equal/Not Carry INSTRUCTION FLAGS Table 6.20. ST6x86 CPU Instruction Set Clock Count Summary (Continued) 1 1 1 1 1 1 1 1 1 1 1 1 Reg/ Cache Hit Real Mode r r r r r r Protected Mode NOTES x = modified - = unchanged u = undefined PROTECTED MODE CLOCK COUNT ST6x86 6-25 6-26 0F 8B +++ Full Displacement 0F 89 +++ Full Displacement 0F 80 +++ Full Displacement 0F 8A +++ 78 + 0F 88 +++ 9F Full Displacement 8-bit Displacement Full Displacement LAHF Load AH with Flags # = immediate 8-bit data ## = immediate 16-bit data ### = full immediate 32-bit data (8, 16, 32 bits) With Index Register No Index Register 8D [mod reg r/m] C5 [mod reg r/m] LDS Load Pointer to DS LEA Load Effective Address 0F 02 [mod reg r/m] From Register/Memory LAR Load Access Rights JS Jump on Sign 7A + 8-bit Displacement JP/JPE Jump on Parity/Parity Even 70 + 8-bit Displacement JO Jump on Overflow 79 + 8-bit Displacement JNS Jump on Not Sign 7B + OPCODE 8-bit Displacement JNP/JPO Jump on Not Parity/Parity Odd INSTRUCTION - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - x - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 1 1 2 2 1 1 1 1 1 1 1 1 1 1 Reg/ Cache Hit REAL MODE CLOCK COUNT + = 8-bit signed displacement +++ = full signed displacement (16, 32 bits) - - - - - - - - - OF DF IF TF SF ZF AF PF CF FLAGS Table 6.20. ST6x86 CPU Instruction Set Clock Count Summary (Continued) 1 1 4 8 2 1 1 1 1 1 1 1 1 1 1 Reg/ Cache Hit b a Real Mode h,i,j g,h,j,p r r r r r Protected Mode NOTES x = modified - = unchanged u = undefined PROTECTED MODE CLOCK COUNT ST6x86 0F B4 [mod reg r/m] 0F 01 [mod 010 r/m] 0F B5 [mod reg r/m] 0F 01 [mod 011 r/m] LFS Load Pointer to FS LGDT Load GDT Register LGS Load Pointer to GS LIDT Load IDT Register 0F 00 [mod 011 r/m] # = immediate 8-bit data ## = immediate 16-bit data ### = full immediate 32-bit data (8, 16, 32 bits) From Register/Memory LTR Load Task Register LSS Load Pointer to SS From Register/Memory - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - x - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 2 1 1 1 3 13 5 8 2 8 2 2 4 + = 8-bit signed displacement +++ = full signed displacement (16, 32 bits) - - - - - - - - - - - - - - - 1 1 1 3 13 5 8 4 8 4 4 4 7 4 E1 + LOOPZ/LOOPE Offset - - - - - - - - - - Reg/ Cache Hit 0F B2 [mod reg r/m] E0 + LOOPNZ/LOOPNE Offset - - - - - - - - - - Reg/ Cache Hit a a a b b,c a b,c b b,c b b b Real Mode g,h,j,l h,i,j g,h,j,p r r r h h,l g,h,j,l h,l h,i,j h,l h,i,j h,i,j h Protected Mode NOTES x = modified - = unchanged u = undefined PROTECTED MODE CLOCK COUNT 8 E2 + LOOP Offset Loop/No Loop - - - - - - - - - OF DF IF TF SF ZF AF PF CF REAL MODE CLOCK COUNT 0F 03 [mod reg r/m] A [110 w] LODS Load String LSL Load Segment Limit 0F 01 [mod 110 r/m] From Register/Memory LMSW Load Machine Status Word From Register/Memory 0F 00 [mod 010 r/m] C4 [mod reg r/m] LLDT Load LDT Register C9 LEAVE Leave Current Stack Frame OPCODE LES Load Pointer to ES INSTRUCTION FLAGS Table 6.20. ST6x86 CPU Instruction Set Clock Count Summary (Continued) ST6x86 6-27 6-28 Move to/from Control/Debug/Test OPCODE 14 10 5 10 6 14 10 5 10 6 0F 26 [11 eee reg] 0F 24 [11 eee reg] 0F 26 [11 eee reg] 0F 24 [11 eee reg] + = 8-bit signed displacement +++ = full signed displacement (16, 32 bits) 16 16 1 1 0F 21 [11 eee reg] 1/3 1 0F 23 [11 eee reg] 1 1 14 1 1 14 1 1 0F 21 [11 eee reg] 1 1 1 1 1 1 1 1 16 - - - - 16 - - 0F 23 [11 eee reg] - - 6 - - 20/5/5 - - 6 - - Reg/ Cache Hit 20/5/5 - - Reg/ Cache Hit b Real Mode l h,i,j Protected Mode NOTES x = modified - = unchanged u = undefined PROTECTED MODE CLOCK COUNT 0F 20 [11 eee reg] - - OF DF IF TF SF ZF AF PF CF REAL MODE CLOCK COUNT 0F 22 [11 eee reg] 8C [mod sreg3 r/m] 8E [mod sreg3 r/m] A [001w] +++ A [000w] +++ B [w reg] ### C [011w] [mod 000 r/m] ### 8 [101w] [mod reg r/m] 8 [100w] [mod reg r/m] 8 [10dw] [11 reg r/m] # = immediate 8-bit data ## = immediate 16-bit data ### = full immediate 32-bit data (8, 16, 32 bits) TR6-TR7 to Register Register to TR6-TR7 TR3-5 to Register Register to TR3-5 DR6-DR7 to Register Register to DR6-DR7 DR0-DR3 to Register Register to DR0-DR3 CR0/CR2/CR3 to Register Register to CR0/CR2/CR3 MOV Regs Segment Register to Register/Memory Register/Memory to Segment Register Accumulator to Memory (short form) Memory to Accumulator (short form) Immediate to Register (short form) Immediate to Register/Memory Register/Memory to Register Register to Memory Register to Register MOV Move Data INSTRUCTION FLAGS Table 6.20. ST6x86 CPU Instruction Set Clock Count Summary (Continued) ST6x86 - - - - - - - - - - - - - - x x - - - - - - x x x u - - - - x x u x - - - - 4 1 1 4 F [011w] [mod 010 r/m] 0F FF NOT Boolean Complement OIO Official Invalid OpCode 0 [110w] ### - 0 - - - - x - - - 0 - - x - - - x - - - u - - - x 0 - - - - - - 1 1 1 1 1 1 8 - 125 1 1 1 10 4 4 1 1 4 Reg/ Cache Hit b b b b b b b Real Mode h h h h h h h Protected Mode NOTES x = modified - = unchanged u = undefined PROTECTED MODE CLOCK COUNT 1 1 1 1 1 1 1 1 + = 8-bit signed displacement +++ = full signed displacement (16, 32 bits) 8 [00sw] [mod 001 r/m] ### 0 [101w] [mod reg r/m] 0 [100w] [mod reg r/m] # = immediate 8-bit data ## = immediate 16-bit data ### = full immediate 32-bit data (8, 16, 32 bits) Immediate to Accumulator Immediate to Register/Memory Memory to Register Register to Memory Register to Register 0 [10dw] [11 reg r/m] 90 NOP No Operation OR Boolean OR F [011w] [mod 011 r/m] NEG Negate Integer 4 - - Reg/ Cache Hit 10 x x - - - OF DF IF TF SF ZF AF PF CF Doubleword F [011w] [mod 100 r/m] 0F B[011w] [mod reg r/m] 0F B[111w] [mod reg r/m] A [010w] OPCODE REAL MODE CLOCK COUNT Word Multiplier: Byte Accumulator with Register/Memory MUL Unsigned Multiply Register from Register/Memory MOVZX Move with Zero Extension Register from Register/Memory MOVSX Move with Sign Extension MOVS Move String INSTRUCTION FLAGS Table 6.20. ST6x86 CPU Instruction Set Clock Count Summary (Continued) ST6x86 6-29 6-30 E [111w] 6 [111w] Variable Port OUTS Output String 67 66 Address Size Prefix Operand Size Prefix 3E 26 64 65 36 DS ES FS GS SS # = immediate 8-bit data ## = immediate 16-bit data ### = full immediate 32-bit data (8, 16, 32 bits) 2E CS Segment Override Prefix F0 Assert Hardware LOCK Prefix PREFIX BYTES 9D POPF Pop Stack into FLAGS x x - - x - - x - - x - - x - - x - - - x x - 9 6 + = 8-bit signed displacement +++ = full signed displacement (16, 32 bits) - - 9 6 3 - 1 0F [10 sreg3 001] 61 Segment Register (FS, GS) POPA Pop All General Registers 3 1 14/28 14/28 14/28 [000 sreg2 111] 14 14 14 Segment Register (ES, SS, DS) - - - - - - 1 - - - 1 - - - 5 [1 reg] - - - Register (short form) - - - 1 - - - Reg/ Cache Hit 1 - - - Reg/ Cache Hit b b b b Real Mode m h,n h h,i,j h,m m Protected Mode NOTES x = modified - = unchanged u = undefined PROTECTED MODE CLOCK COUNT 8F [mod 000 r/m] - - - OF DF IF TF SF ZF AF PF CF REAL MODE CLOCK COUNT Register/Memory POP Pop Value off Stack E [011w] # OPCODE Fixed Port OUT Output to Port INSTRUCTION FLAGS Table 6.20. ST6x86 CPU Instruction Set Clock Count Summary (Continued) ST6x86 D [001w] [mod 010 r/m] C [000w] [mod 010 r/m] # Register/Memory by CL Register/Memory by Immediate D [001w] [mod 011 r/m] C [000w] [mod 011 r/m] # F3 6[110w] F3 A[110w] F3 A[010w] F3 6[111w] Register/Memory by CL Register/Memory by Immediate REP INS Input String REP LODS Load String REP MOVS Move String REP OUTS Output String # = immediate 8-bit data ## = immediate 16-bit data ### = full immediate 32-bit data (8, 16, 32 bits) D [000w] [mod 011 r/m] Register/Memory by 1 RCR Rotate Through Carry Right D [000w] [mod 010 r/m] Register/Memory by 1 RCL Rotate Through Carry Left - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - x - x - x - x - x - x - - - - 12+5n 9+n 10+n 12+5n 9 9 4 8 8 3 2 6 + = 8-bit signed displacement +++ = full signed displacement (16, 32 bits) - - - - u u x u u x - - 9C - PUSHF Push FLAGS Register - 60 PUSHA Push All General Registers 28+5n 12+5n\ 9+n 10+n 28+5n 12+5n\ 9 9 4 8 8 3 2 6 1 1 1 1 1 1 1 1 Reg/ Cache Hit b b b b b b b b b Real Mode h,m h h h,m h h h h h Protected Mode NOTES x = modified - = unchanged u = undefined PROTECTED MODE CLOCK COUNT 1 6 [10s0] ### - - Immediate - Segment Register (FS, GS) - 1 - 0F [10 sreg3 000] - Segment Register (ES, CS, SS, DS) - 5 [0 reg] [000 sreg2 110] - Reg/ Cache Hit FF [mod 110 r/m] - OF DF IF TF SF ZF AF PF CF Register (short form) OPCODE REAL MODE CLOCK COUNT Register/Memory PUSH Push Value onto Stack INSTRUCTION FLAGS Table 6.20. ST6x86 CPU Instruction Set Clock Count Summary (Continued) ST6x86 6-31 6-32 - - - - - - - x 1 D[001w] [mod 000 r/m] C[000w] [mod 000 r/m] # Register/Memory by CL Register/Memory by Immediate # = immediate 8-bit data ## = immediate 16-bit data ### = full immediate 32-bit data (8, 16, 32 bits) D[000w] [mod 000 r/m] Register/Memory by 1 - - - - - - - - - - - - - x - x 1 2 + = 8-bit signed displacement +++ = full signed displacement (16, 32 bits) u u 1 2 1 23 Intersegment Adding Immediate to SP ROL Rotate Left 23 Intersegment x 7 4 CA ## Protected Mode: Different Privilege Level 7 4 CB 10+2n 10+2n 10+2n 10+2n 10+n Intersegment Adding Immediate to SP - - 10+2n 10+2n 10+2n 10+2n 10+n Intersegment - x x x x x x x x - - 4 - x x x x - 4 - x x x x - C2 ## - x x x x - Within Segment Adding Immediate to SP - - - - - - 3 - - - - - - Reg/ Cache Hit 3 - - - - - - Reg/ Cache Hit b b b b b b b Real Mode h g,h,j,k,r h h h h h Protected Mode NOTES x = modified - = unchanged u = undefined PROTECTED MODE CLOCK COUNT C3 x x x x - OF DF IF TF SF ZF AF PF CF REAL MODE CLOCK COUNT Within Segment RET Return from Subroutine (Find AL/AX/EAX) REPNE SCAS Scan String (Find match) F2 A[111w] F2 A[011w] REPNE CMPS Compare String (Find non-AL/AX/EAX) F3 A[111w] F3 A[011w] F3 A[101w] OPCODE REPE SCAS Scan String (Find non-match) REPE CMPS Compare String REP STOS Store String INSTRUCTION FLAGS Table 6.20. ST6x86 CPU Instruction Set Clock Count Summary (Continued) ST6x86 C[000w] [mod 001 r/m] # 0F 79 [mod sreg3 r/m] Register/Memory by Immediate RSDC Restore Descriptor 0F AA 0F 7D [mod 000 r/m] 9E RSM Resume from SMM Mode RSTS Restore TSR and Descriptor SAHF Store AH in FLAGS D[001w] [mod 100 r/m] C[000w] [mod 100 r/m] # Register/Memory by CL Register/Memory by Immediate D[001w] [mod 111 r/m] C[000w] [mod 111 r/m] # Register/Memory by CL Register/Memory by Immediate # = immediate 8-bit data ## = immediate 16-bit data ### = full immediate 32-bit data (8, 16, 32 bits) D[000w] [mod 111 r/m] Register/Memory by 1 SAR Shift Right Arithmetic D[000w] [mod 100 r/m] Register/Memory by 1 SAL Shift Left Arithmetic 0F 7B [mod 000 r/m] RSLDT Restore LDTR and Descriptor Register and D[001w] [mod 001 r/m] Register/Memory by CL Segment D[000w] [mod 001 r/m] OPCODE Register/Memory by 1 ROR Rotate Right INSTRUCTION - - - - - - - - - x - - - - - - - - - - - - - - - - - x - x - - - - - x x x x x x x - x - - - - - - x x - x x - x x - - - - - x x - - - - - u u u u u u x - x - - - - - x x x x x x x x x x x x x x - - x x - - - - - x - x - x 1 2 1 1 2 1 1 6 40 6 6 1 2 1 Reg/ Cache Hit REAL MODE CLOCK COUNT + = 8-bit signed displacement +++ = full signed displacement (16, 32 bits) u u x u u x - x - - u u x OF DF IF TF SF ZF AF PF CF FLAGS Table 6.20. ST6x86 CPU Instruction Set Clock Count Summary (Continued) 1 2 1 1 2 1 1 6 40 6 6 1 2 1 Reg/ Cache Hit b b s s s s b Real Mode h h s s s s h Protected Mode NOTES x = modified - = unchanged u = undefined PROTECTED MODE CLOCK COUNT ST6x86 6-33 6-34 Set Byte Set on Byte on Less/Not 0F 9C [mod 000 r/m] 0F 94 [mod 000 r/m] 0F 96 [mod 000 r/m] 0F 92 [mod 000 r/m] # = immediate 8-bit data ## = immediate 16-bit data ### = full immediate 32-bit data (8, 16, 32 bits) To Register/Memory or Equal SETL/SETNGE Greater To Register/Memory SETE/SETZ Set Byte on Equal/Zero To Register/Memory Equal/Not Above SETBE/SETNA Set Byte on Below or To Register/Memory Above or Equal/Carry SETB/SETNAE/SETC Below/Not - - - - - - - - - - - - - - - - - x - - - - x - - - - x - - - - - - - - x x 1 1 1 1 2 + = 8-bit signed displacement +++ = full signed displacement (16, 32 bits) - - - - 1 1 1 1 2 A [111w] SCAS Scan String - 1 1 1[110w] ### Immediate to Accumulator (short form) - 1 1 8[00sw] [mod 011 r/m] ### Immediate to Register/Memory x 1 1 1[101w] [mod reg r/m] x x Memory to Register x 1 x 1 x 1[100w] [mod reg r/m] - Register to Memory - Reg/ Cache Hit 1 - Reg/ Cache Hit 1 x OF DF IF TF SF ZF AF PF CF b b Real Mode h h h h h h Protected Mode NOTES x = modified - = unchanged u = undefined PROTECTED MODE CLOCK COUNT 1[10dw] [11 reg r/m] OPCODE REAL MODE CLOCK COUNT Register to Register SBB Integer Subtract with Borrow INSTRUCTION FLAGS Table 6.20. ST6x86 CPU Instruction Set Clock Count Summary (Continued) ST6x86 Set Byte on Less or Set Byte on Not OPCODE 0F 91 [mod 000 r/m] 0F 9F [mod 000 r/m] 0F 9D [mod 000 r/m] 0F 95 [mod 000 r/m] 0F 97 [mod 000 r/m] 0F 93 [mod 000 r/m] 0F 9E [mod 000 r/m] # = immediate 8-bit data ## = immediate 16-bit data ### = full immediate 32-bit data (8, 16, 32 bits) To Register/Memory SETNO Set Byte on Not Overflow To Register/Memory Equal/Greater SETNLE/SETG Set Byte on Not Less or To Register/Memory or Equal SETNL/SETGE Less/Greater To Register/Memory SETNE/SETNZ Set Byte on No t Equal/Not Zero To Register/Memory Equal/Above SETNBE/SETA Set Byte on Not Below or To Register/Memory Above or Equal/Not Carry SETNB/SETAE/SETNC Set Byte on Not Below/ To Register/Memory Greater SETLE/SETNG Equal/Not INSTRUCTION - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 1 1 1 1 1 1 1 Reg/ Cache Hit REAL MODE CLOCK COUNT + = 8-bit signed displacement +++ = full signed displacement (16, 32 bits) - - - - - - - OF DF IF TF SF ZF AF PF CF FLAGS Table 6.20. ST6x86 CPU Instruction Set Clock Count Summary (Continued) 1 1 1 1 1 1 1 Reg/ Cache Hit Real Mode h h h h h h h Protected Mode NOTES x = modified - = unchanged u = undefined PROTECTED MODE CLOCK COUNT ST6x86 6-35 6-36 D [001w] [mod 100 r/m] C [000w] [mod 100 r/m] # Register/Memory by Immediate 0F A5 [mod reg r/m] Register/Memory by CL # = immediate 8-bit data ## = immediate 16-bit data ### = full immediate 32-bit data (8, 16, 32 bits) 0F A4 [mod reg r/m] # Register/Memory by Immediate SHLD Shift Left Double D [000w] [mod 100 r/m] Register/Memory by CL 0F 01 [mod 000 r/m] 0F 98 [mod 000 r/m] 0F 9A [mod 000 r/m] 0F 90 [mod 000 r/m] 0F 99 [mod 000 r/m] 0F 9B [mod 000 r/m] OPCODE Register/Memory by 1 SHL Shift Left Logical To Register/Memory SGDT Store GDT Register To Register/Memory SETS Set Byte on Sign To Register/Memory SETP/SETPE Set Byte on Parity/Parity Even To Register/Memory SETO Set Byte on Overflow To Register/Memory SETNS Set Byte on Not Sign To Register/Memory Parity/Parity Odd SETNP/SETPO Set Byte on Not INSTRUCTION - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - x x x x x x x x - - - - - - u u u u - - - - - - x x x x x x x x - - - - - - - - - - - - 5 4 1 2 1 4 1 1 1 1 1 Reg/ Cache Hit REAL MODE CLOCK COUNT + = 8-bit signed displacement +++ = full signed displacement (16, 32 bits) u u u x - - - - - - OF DF IF TF SF ZF AF PF CF FLAGS Table 6.20. ST6x86 CPU Instruction Set Clock Count Summary (Continued) 5 4 1 2 1 4 1 1 1 1 1 Reg/ Cache Hit b b b,c Real Mode h h h h h h h h Protected Mode NOTES x = modified - = unchanged u = undefined PROTECTED MODE CLOCK COUNT ST6x86 D [001w] [mod 101 r/m] C [000w] [mod 101 r/m] # Register/Memory by CL Register/Memory by Immediate To Register/Memory 0F 01 [mod 100 r/m] F9 FD FB A [101w] SMSW Store Machine Status Word STC Set Carry Flag STD Set Direction Flag STI Set Interrupt Flag STOS Store String # = immediate 8-bit data ## = immediate 16-bit data ### = full immediate 32-bit data (8, 16, 32 bits) To Register/Memory 0F 00 [mod 001 r/m] 0F 7E SMINT Software SMM Entry STR Store Task Register 0F 00 [mod 000 r/m] To Register/Memory SLDT Store LDT Register 0F 01 [mod 001 r/m] 0F AD [mod reg r/m] Register/Memory by CL SIDT Store IDT Register 0F AC [mod reg r/m] # Register/Memory by Immediate SHRD Shift Right Double D [000w] [mod 101 r/m] OPCODE Register/Memory by 1 SHR Shift Right Logical INSTRUCTION - - - - - - - - - - - 1 - - - - - - - - - - - 1 - - - - - - - - - - - - - - - - - - - - - - - - - - - - x - - - - - - - - - x x x x x x x - - - - - - - - - u u u u - - - - - - - - - 1 - - - - - - - - x x x x x x x x 2 7 7 1 6 55 4 5 4 1 2 1 Reg/ Cache Hit REAL MODE CLOCK COUNT + = 8-bit signed displacement +++ = full signed displacement (16, 32 bits) - - - - - u u u x OF DF IF TF SF ZF AF PF CF FLAGS Table 6.20. ST6x86 CPU Instruction Set Clock Count Summary (Continued) 4 2 7 7 1 6 55 1 4 5 4 1 2 1 Reg/ Cache Hit a b b,c s a b,c b b Real Mode h h m h s h h h h Protected Mode NOTES x = modified - = unchanged u = undefined PROTECTED MODE CLOCK COUNT ST6x86 6-37 6-38 9B 0F 09 WAIT Wait Until FPU Not Busy WBINVD Write-Back and Invalidate Cache # = immediate 8-bit data ## = immediate 16-bit data ### = full immediate 32-bit data (8, 16, 32 bits) 0F 00 [mod 101 r/m] 0F 00 [mod 100 r/m] A [100w] ### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - x - - - x - - x x x - - - x - - - - u - - - x - - - - - - - - x 0 - - - - - - x x 15 5 1 1 15 5 7 7 1 1 14 12 1 1 14 12 12 1 1 12 1 1 1 1 Reg/ Cache Hit t a a b s s s b Real Mode t g,h,j,p g,h,j,p h s s s h Protected Mode NOTES x = modified - = unchanged u = undefined PROTECTED MODE CLOCK COUNT 1 1 1 1 Reg/ Cache Hit REAL MODE CLOCK COUNT + = 8-bit signed displacement +++ = full signed displacement (16, 32 bits) F [011w] [mod 000 r/m] ### 8 [010w] [mod reg r/m] To Register/Memory VERW Verify Write Access To Register/Memory VERR Verify Read Access Immediate Data and Accumulator Immediate Data and Register/Memory Register/Memory and Register TEST Test Bits 0F 7C [mod 000 r/m] SVTS Save TSR and Descriptor - - 0F 7A [mod 000 r/m] SVLDT Save LDTR and Descriptor 0 x OF DF IF TF SF ZF AF PF CF - 2 [110w] ### 8 [00sw] [mod 101 r/m] ### 2 [101w] [mod reg r/m] 2 [100w] [mod reg r/m] 2 [10dw] [11 reg r/m] OPCODE SVDC Save Segment Register and Descrip0F 78 [mod sreg3 r/m] tor Immediate to Accumulator (short form) Immediate to Register/Memory Memory to Register Register to Memory Register to Register SUB Integer Subtract INSTRUCTION FLAGS Table 6.20. ST6x86 CPU Instruction Set Clock Count Summary (Continued) ST6x86 0F C[000w] [mod reg r/m] Memory, Register D7 XLAT Translate Byte 3 [010w] ### - - - - - - - - - - - - - - x - - x x - - x u - - x x 0 - - - - x x 1 1 1 1 1 1 4 2 2 2 2 Reg/ Cache Hit b b,f Real Mode Continued on next page... d. c. This instruction may be executed in Real Mode. In Real Mode, its purpose is primarily to initialize the CPU for Protected Mode. b. Exception 13 fault (general protection) will occur in Real Mode if an operand reference is made that partially or fully extends beyond the maximum CS, DS, ES, FS, or GS segment limit (FFFFH). Exception 12 fault (stack segment limit violation or not present) will occur in Real Mode if an operand reference is made that partially or fully extends beyond the maximum SS limit. h h f,h Protected Mode NOTES x = modified - = unchanged u = undefined PROTECTED MODE CLOCK COUNT 1 1 1 1 4 2 2 2 2 Reg/ Cache Hit a. This is a Protected Mode instruction. Attempted execution in Real Mode will result in exception 6 (invalid op-code). Notes a through c apply to Real Address Mode only: Instruction Notes for Instruction Set Summary 0 x OF DF IF TF SF ZF AF PF CF REAL MODE CLOCK COUNT + = 8-bit signed displacement +++ = full signed displacement (16, 32 bits) 8 [00sw] [mod 110 r/m] ### 3 [001w] [mod reg r/m] 3 [000w] [mod reg r/m] # = immediate 8-bit data ## = immediate 16-bit data ### = full immediate 32-bit data (8, 16, 32 bits) Immediate to Accumulator (short form) Immediate to Register/Memory Memory to Register Register to Memory Register to Register 3 [00dw] [11 reg r/m] 9[0 reg] Register with Accumulator XOR Boolean Exclusive OR 8[011w] [mod reg r/m] Register/Memory with Register XCHG Exchange 0F C[000w] [11 reg2 reg1] OPCODE Register1, Register2 XADD Exchange and Add INSTRUCTION FLAGS Table 6.20. ST6x86 CPU Instruction Set Clock Count Summary (Continued) ST6x86 6-39 Continued on next page... g. LOCK# is asserted during descriptor table accesses. f. LOCK# is automatically asserted, regardless of the presence or absence of the LOCK prefix. e. An exception may occur, depending on the value of the operand. Notes e through g apply to Real Address Mode and Protected Virtual Address Mode: .Instruction Notes for Instruction Set Summary ST6x86 6-40 t. The total clock count is the clock count shown plus the number of clocks required to write all "modified" cache lines to external memory. Note t applies to cache invalidation instructions with the cache operating in write-back mode: s. All memory accesses to SMM space are non-cacheable. An invalid opcode exception 6 occurs unless SMI is enabled and ARR3 size > 0, and CPL = 0 and [SMAC is set or if in an SMI handler]. Note s applies to SGS-Thomson specific SMM instructions: r. The destination of a JMP, CALL, INT, RET, or IRET must be in the defined limit of a code segment or an exception 13 fault will occur. q. If the coprocessor 's memory operand violates a segment limit or segment access rights, an exception 13 fault will occur before the ESC instruction is executed. An exception 12 fault will occur if the stack limit is violated by the operand's starting address. p. Any violation of privilege rules as apply to the selector operand does not cause a Protection exception, rather, the zero flag is cleared. o. The PE bit of the MSW (CR0) cannot be reset by this instruction. Use MOV into CRO if desiring to reset the PE bit. n. The IF bit of the flag register is not updated if CPL is greater than IOPL. The IOPL and VM fields of the flag register are updated only if CPL = 0. m.An exception 13 fault occurs if CPL is greater than IOPL. l. An exception 13 fault occurs if CPL is greater than 0 (0 is the most privileged level). k. JMP, CALL, INT, RET, and IRET instructions referring to another code segment will cause an exception 13, if an applicable privilege rule is violated. j. All segment descriptor accesses in the GDT or LDT made by this instruction will automatically assert LOCK# to maintain descriptor integrity in multiprocessor systems. i. For segment load operations, the CPL, RPL, and DPL must agree with the privilege rules to avoid an exception 13 fault. The segment's descriptor must indicate "present" or exception 11 (CS, DS, ES, FS, GS not present). If the SS register is loaded and a stack segment not present is detected, an exception 12 occurs. h. Exception 13 fault will occur if the memory operand in CS, DS, ES, FS, or GS cannot be used due to either a segment limit violation or an access rights violation. If a stack limit is violated, an exception 12 occurs. Notes h through r apply to Protected Virtual Address Mode only: ST6x86 6-41 ST6x86 6-42 ST6x86 6.5 FPU Clock Counts The CPU is functionally divided into the FPU, and the integer unit. The FPU processes floating point instructions only and does so in parallel with the integer unit. For example, when the integer unit detects a floating point instruction without memory operands, after two clock cycles the instruction passes to the FPU for execution. The integer unit continues to execute instructions while the FPU executes the floating point instruction. If another FPU instruc- tion is encountered, the second FPU instruction is placed in the FPU queue. Up to four FPU instructions can be queued. In the event of an FPU exception, while other FPU instructions are queued, the state of the CPU is saved to ensure recovery. 6.5.1 FPU Clock Count Table The clock counts for the FPU instructions are listed in Table 6.21. The abbreviations used in this table are listed in FPU Clock Count Table Abbreviations Table 6.21 FPU Clock Count Table Abbreviations ABBREVIATION MEANING n Stack register number TOS Top of stack register pointed to by SSS in the status register. ST(1) FPU register next to TOS ST(n) A specific FPU register, relative to TOS M.WI 16-bit integer operand from memory M.SI 32-bit integer operand from memory M.LI 64-bit integer operand from memory M.SR 32-bit real operand from memory M.DR 64-bit real operand from memory M.XR 80-bit real operand from memory M.BCD 18-digit BCD integer operand from memory CC FPU condition code Env Regs Status, Mode Control and Tag Registers, Instruction Pointer and Operand Pointer 6-43 6-44 - TOS 9 - 10 9 - 10 CC setby TOS- M.WI;then PopTOS CC setby TOS- M.SI;then Pop TOS DA [mod 011 r/m] DE [mod 011 r/m] D9 FF D9 F6 FCOS Function Evaluation: Cos(x) FDECSTP Decrement Stack Pointer Decrementtop of stackpointer 4 92 - 141 9 - 10 9 - 10 CC setby TOS- M.WI CC setby TOS- M.SI DA [mod 010 r/m] DE [mod 010 r/m] COS(TOS) 4 4 4 4 CC setby TOS- ST(n);then Pop TOS CC setby TOS- M.DR;then PopTOS CC setby TOS- M.SR; then PopTOS CC setby TOS- ST(1); then PopTOS and ST(1) D8 [1101 1 n] DC [mod 011 r/m] D8 [mod 011 r/m] DE D9 TOS <------- 4 4 4 5 3 2 8 - 14 8 - 14 4-9 4-9 4-9 4-9 4-9 92 - 108 2 CC setby TOS- ST(n) CC setby TOS- M.DR CC setby TOS- M.SR Wait then Clear Exceptions Clear Exceptions TOS <------- ST(n)+ TOS TOS + ST(n) TOS + M.DR TOS + M.SR ST(n) + TOS; then Pop TOS TOS + M.SI TOS + M.WI 2TOS-1 | TOS| CLOCK COUNT D8 [1101 0 n] DC [mod 010 r/m] D8 [mod 010 r/m] (9B)DB E2 DB E2 FCLEX Clear Exceptions FNCLEX Clear Exceptions <------<------<------<------<------- TOS <------TOS <------- ST(n) TOS TOS TOS ST(n) TOS <------TOS <-------- OPERATION FCOM Floating Point Compare 80-bit Register 64-bit Real 32-bit Real FCOMP Floating Point Compare, Pop 80-bit Register 64-bit Real 32-bit Real FCOMPP Floating Point Compare, Pop Two Stack Elements FICOM Floating Point Compare 32-bit integer 16-bit integer FICOMP Floating Point Compare 32-bit integer 16-bit integer D9 E0 FCHS Floating Change Sign [1100 0 n] [1100 0 n] [mod 000 r/m] [mod 000 r/m] [1100 0 n] DA [mod 000 r/m] DE [mod 000 r/m] DC D8 DC D8 DE D9 F0 D9 E1 OP CODE FADD Floating Point Add Top of Stack 80-bit Register 64-bit Real 32-bit Real FADDP Floating Point Add, Pop FIADD Floating Point Integer Add 32-bit integer 16-bit integer F2XM1 Function Evaluation 2 x-1 FABS Floating Absolute Value FPU INSTRUCTI ON Table 6.22 6x86 FPU Instruction Set Summary See Note1 See Note2 NOTES ST6x86 D9 E8 D9 [mod 101 r/m] D9 [mod 100 r/m] FLD1 Load Floating Const.= 1.0 FLDCW Load FPU Mode Control Register FLDENV Load FPU Environment [1100 0 n] [mod 000 r/m] [mod 000 r/m] [mod 100 r/m] DF [mod 101 r/m] DB [mod 000 r/m] DF [mod 000 r/m] D9 DD D9 DF FLD Load Data to FPU Reg. Top of Stack 64-bit Real 32-bit Real FBLD Load Packed BCD Data to FPU Reg. FILD Load Integer Data to FPU Reg. 64-bit Integer 32-bit Integer 16-bit Integer D9 F7 (9B)DB E3 DB E3 FINCSTP Increment Stack Pointer FINIT Initialize FPU FNINIT Initialize FPU Empty M.SI/ TOS M.WI/ TOS TOS / M.SI TOS / M.WI Ctl Word <-------Env Regs <-------- Memory Memory Push 1.0 onto stack Push M.LI onto stack Push M.SI onto stack Push M.WI onto stack Push ST(n) onto stack Push M.DR onto stack Push M.SR onto stack Push M.BCD onto stack Incrementtop of stackpointer Wait then initialize Initialize TAG(n) <-------- TOS <-------TOS <-------- DA [mod 111 r/m] DE [mod 111 r/m] DD [1100 0 n] TOS <-------TOS <-------- DA [mod 110 r/m] DE [mod 110 r/m] 4 30 4 4 -8 4 -6 3 -6 2 2 2 41 - 45 2 8 6 3 34 - 38 33 - 38 34 - 38 33 - 38 24 - 34 TOS / ST(n); then Pop TOS <-------- 24 - 34 24 - 34 24 - 34 24 - 34 ST(n)/ TOS TOS / ST(n) M.DR/ TOS M.SR/ TOS <-------<-------<-------<-------- TOS ST(n) TOS TOS ST(n) [11110 n] [11111 n] [mod 111 r/m] [mod 111 r/m] DC D8 DC D8 24 - 34 24 - 34 24 - 34 24 - 34 24 - 34 ST(n)/ TOS TOS / ST(n) TOS / M.DR TOS / M.SR ST(n)/ TOS;then Pop TOS <-------<-------<-------<-------<-------- CLOCK COUNT ST(n) TOS TOS TOS ST(n) OPERATION DE [11110 n] [11111 n] [11110 n] [mod 110 r/m] [mod 110 r/m] [11111 n] DC D8 DC D8 DE OP CODE FFREE Free Floating Point Register FDIVRP Floating Point Divide Reversed , Pop FIDIV Floating Point Integer Divide 32-bit Integer 16-bit Integer FIDIVR Floating Point Integer Divide Reversed 32-bit Integer 16-bit Integer FDIV Floating Point Divide Top of Stack 80-bit Register 64-bit Real 32-bit Real FDIVP Floating Point Divide, Pop FDIVR Floating Point Divide Reversed Top of Stack 80-bit Register 64-bit Real 32-bit Real FPU INSTRUCTION Table 6.22 6x86 FPU Instruction Set Summary NOTES ST6x86 6-45 6-46 D9 D0 D9F3 D9F8 D9F5 D9F2 D9FC DD [mod 100 r/m] (9B)DD[mod 110 r/m] DD [mod 110 r/m] D9 FD D9 FE D9 FB D9 FA FNOP No Operation FPATAN Function Eval: Tan-1(y/x) FPREM Floating Point Remainder FPREM1 Floating Point Remainder IEEE FPTAN Function Eval: Tan(x) FRNDINT Round to Integer FRSTOR Load FPU Environment and Reg. FSAVE Save FPU Environment and Reg FNSAVE Save FPU Environment and Reg FSCALE Floating Multiply by 2n FSIN Function Evaluation: Sin(x) FSINCOS Function Eval.: Sin(x)& Cos(x) FSQRT Floating Point Square Root ST(n)x TOS TOSx ST(n) TOS x M.DR TOSx M.SR ST(n)x TOS; then Pop TOS TOSx M.SI TOSx M.WI <------<------<------<------- Rem[TOS/ST(1)] Rem[TOS/ ST(1)] TAN(TOS);thenPush1.0 Round(TOS) ATAN[ST(1)/ TOS]; then Pop TOS TOS; SIN(temp); then TOSx 2(ST(1)) SIN(TOS) TOS <------- SquareRoot of TOS Push COS(temp)onto stack temp <------TOS <------- TOS <------TOS <------- Restore state. Wait thensave state. Save state. TOS TOS TOS TOS ST(1) <------- No Operation TOS <------TOS <------- <------<------<------<------<------- Log 2(e) onto stack Log2(10) onto stack Log 10(2) onto stack Log e(2) onto stack onto stack 0.0 onto stack DA[mod 001 r/m] DE[mod 001 r/m] Push Push Push Push Push Push ST(n) TOS TOS TOS ST(n) EA E9 EC ED EB EE OPERATION DC[1100 1 n] D8[1100 1 n] DC[mod 001 r/m] D8[mod 001 r/m] DE[1100 1 n] D9 D9 D9 D9 D9 D9 OP CODE FMUL Floating Point Multiply Top of Stack 80-bit Register 64-bit Real 32-bit Real FMULP Floating Point Multiply & Pop FIMUL Floating Point Integer Multiply 32-bit Integer 16-bit Integer FLDL2E Load Floating Const.= Log 2(e) FLDL2T Load Floating Const.= Log 2(10) FLDLG2 Load Floating Const.= Log 10(2) FLDLN2 Load Floating Const.= Ln(2) FLDPI Load Floating Const.= p FLDZ Load Floating Const.= 0.0 FPU INSTRUCTION Table 6.22 6x86 FPU Instruction Set Summary 59 - 60 145- 161 7 - 14 76 - 140 56 - 72 57 - 67 55 - 65 97 - 161 82 - 91 82 - 91 117- 129 10 - 20 2 9 - 11 8 - 10 4-9 4-9 4-8 4-6 4-9 4 4 4 4 4 4 CLOCK COUNT See Note1 See Note1 See Note1 See Note3 NOTES ST6x86 [1110 1 n] [1110 0 n] [mod 100 r/m] [mod 100 r/m] [1110 1 n] DC D8 DC D8 DE FSUB Floating Point Subtract Top of Stack 80-bit Register 64-bit Real 32-bit Real FSUBP Floating Point Subtract, Pop ST(n) TOS TOS TOS ST(n) <-------<-------<-------<-------<-------- ST(n)- TOS TOS - ST(n) TOS - M.DR TOS - M.SR ST(n)- TOS;thenPop TOS 4 -9 4 -9 4 -9 4 -9 4 -9 5 3 14 - 24 12 - 22 6 4 4 2 (9B)D9[mod 111 r/m] D9 [mod 111 r/m] (9B)D9[mod 110 r/m] D9 [mod 110 r/m] (9B)DD[mod 111 r/m] DD [mod 111 r/m] (9B)DF E0 DF E0 FSTCW Store FPU Mode Control Register FNSTCW Store FPU Mode Control Register FSTENV Store FPU Environment FNSTENV Store FPU Environment FSTSW Store FPU Status Register FNSTSW Store FPU Status Register FSTSW AX Store FPU Status Register to AX FNSTSW AX Store FPU Status Register to AX Control Mode Register Control Mode Register Env. Registers Env. Registers Status Register Status Register Status Register Status Register 10 - 13 8 - 13 7 - 10 TOS; then PopTOS TOS; then PopTOS TOS; then PopTOS M.LI <-------M.SI <-------M.WI <-------- DF[mod 111 r/m] DB[mod 011 r/m] DF[mod 011 r/m] <-------<-------<-------<-------<-------<-------<-------<-------- 8 - 13 7 - 10 TOS TOS M.SI <-------M.WI <-------- DB[mod 010 r/m] DF[mod 010 r/m] Wait Memory Memory Wait Memory Memory Wait Memory Memory Wait AX AX 2 2 2 2 57 - 63 TOS; TOS; TOS; TOS; TOS; <-------<-------<-------<-------<-------- ST(n) M.XR M.DR M.SR M.BCD DB[1101 1 n] DB[mod 111 r/m] DD[mod 011 r/m] D9[mod 011 r/m] DF[mod 110 r/m] then PopTOS then PopTOS then PopTOS then PopTOS then PopTOS 2 2 2 2 TOS TOS TOS TOS <-------<-------<-------<-------- CLOCK COUNT ST(n) M.XR M.DR M.SR OPERATION DD[1101 0 n] DB[mod 111 r/m] DD[mod 010 r/m] D9[mod 010 r/m] OP CODE FST Store FPU Register Top of Stack 80-bit Real 64-bit Real 32-bit Real FSTP Store FPU Register, Pop Top of Stack 80-bit Real 64-bit Real 32-bit Real FBSTP Store BCD Data, Pop FIST Store Integer FPU Register 32-bit Integer 16-bit Integer FISTP Store Integer FPU Register, Pop 64-bit Integer 32-bit Integer 16-bit Integer FPU INSTRUCTION Table 6.22 6x86 FPU Instruction Set Summary NOTES ST6x86 6-47 6-48 temp <------TOS <------- D9 [1100 1 n] D9 F4 FXCH Exchange Register with TOS D9F1 D9F9 FLY2X Function Eval. y x Log2(x) FLY2XP1 Function Eval. y x Log2(x+1) FXTRACT Extract Exponent TOS <-------> ST(n) Exchange D9 E5 FXAM Report Class of Operand ST(1) <------- ST(1) <------- ST(1) x Log 2(TOS); then Pop TOS ST(1) x Log 2(1+TOS); then Pop TOS TOS; exponent(temp); then Push significant (temp) ontostack CC <-------- Class of TOS Waitfor FPU notbusy 9B CC setby TOS- 0.0 CC setby TOS- ST(n) CC setby TOS- ST(n); then Pop TOS CC set by TOS - ST(I); then Pop TOS and ST(1) FWAIT Wait E4 [1110 0 n] [1110 1 n] E9 D9 DD DD DA 131- 133 145- 154 11 - 16 3 4 2 4 4 4 4 14 - 29 14 - 27 TOS <------- M.SI- TOS TOS <-------- M.WI- TOS 4-9 4-9 4-9 4-9 4-9 DA [mod 101 r/m] DE [mod 101 r/m] ST(n)- TOS TOS - ST(n) M.DR- TOS M.SR- TOS TOS - ST(n);then PopTOS 14 - 29 14 - 27 <------<------<------<------<------- TOS - M.SI TOS - M.WI TOS ST(n) TOS TOS ST(n) CLOCK COUNT TOS <------TOS <------- [1110 0 n] [1110 1 n] [mod 101 r/m] [mod 101 r/m] [1110 0 n] OPERATION DA [mod 100 r/m] DE [mod 100 r/m] DC D8 DC D8 DE OP CODE FTST Test Top of Stack FUCOM Unordered Compare FUCOMP Unordered Compare, Pop FUCOMPP Unordered Compare, Pop two elements FSUBR Floating Point Subtract Reverse Top of Stack 80-bit Register 64-bit Real 32-bit Real FSUBRP Floating Point Subtract Reverse , Pop FISUB Floating Point Integer Subtract 32-bit Integer 16-bit Integer FISUBR Floating Point Integer Subtract Reverse 32-bit Integer Reversed 16-bit Integer Reversed FPU INSTRUCTION Table 6.22 6x86 FPU Instruction Set Summary See Note4 NOTES ST6x86 Note 5: The following opcodes are reserved by Cyrix: D9D7, D9E2, D9E7, DDFC, DED8, DEDA, DEDC, DEDD, DEDE, DFFC. If a reserved opcode is executed, and unpredictable results may occur (exceptions are not generated). Note 4: For FYL2XP1, clock count is 170 if TOS is out of range and regular FYL2X is called. Note 3: For FPATAN,clock count is 97 if ST(1)/TOS < /32. Note 2: For F2XM1, clock count is 92 if absolute value of TOS < 0.5. For FSIN, clock count is 81 to 82 if absolute value of TOS < /4. For FCOS, clock count is 141 if TOS < /4 and clock count is 92 if /4 < TOS > /2. Note 1: For FCOS, FSIN, FSINCOS and FPTAN, time shown is for absolute value of TOS < 3/4. Add 90 clock counts for argument reduction if outside this range. A Push to the stack decrements the top of stack pointer. A Pop from the stack increments the top of stack pointer. Values Popped off the stack are discarded. All references to TOS and ST(n) refer to stack layout prior to execution. FPU Instruction Summary Notes ST6x86 6-49 ST6x86 6-50 ST6x86 Ordering Information Example: SGS Thomson Prefix Device Name ST 6X86 P120+ H Speed Equalizer P120+=100MHz(internal) P133+=110MHz(internal) P150+=120Mhz(internal) P166+=133MHz(internal) P200+=150MHz(internal) S Package Type H = CPGA Package B = BGA Package P = PPGA Package S= Supply Voltage (3.52 V Commercial Grade Temperature) Please contact your nearest SGS-THOMSON sales office to confirm availability of specific valid combinations and to check on newly released combinations. A-1 ST6x86 The ST6x86 CPU part numbers are listed below. ST6x86 Part Numbers PART NUMBER FREQUENCY (MHz) NOM Vcc (V) BUS INTERNAL ST6X86P120+HS 3.52 50 100 ST6X86P133+HS 3.52 55 110 ST6X86P150+HS 3.52 60 120 ST6X86P166+HS 3.52 66 133 A-2 ST6x86 A-3 INDEX Numerics 296-Pin SPGA Package ................................. 5-1 6x86 Configuration Registers ....................... 2-23 A Abort Exception ............................................ 2-57 Absolute Maximum Ratings ............................ 4-2 AC Characteristics .......................................... 4-5 Address Bus ................................................... 3-8 Address Parity ............................ .................... 3-9 Address Space ............................................. 2-40 Application Register Set ................................. 2-4 Architecture Overview .................................... 1-1 B Back-Off Timing ............................................ 3-48 Base Field .................................................... 6-10 Branch Control ............................ .................. 1-11 Branch Prediction ......................................... 1-11 Burst Cycle Address Sequence .................... 3-33 Burst Write Cycles ........................................ 3-36 Bus Arbitration ............................ .........3-15, 3-45 Bus Cycle Control ......................................... 3-12 Bus Cycle Definitions .................................... 3-10 Bus Interface .................................................. 3-1 Bus Interface Unit ......................................... 1-15 Bus State Definition ...................................... 3-25 C Cache Coherency ......................................... 3-17 Cache Control ............................................... 3-13 Cache Control Timing ................................... 3-42 Cache Inquiry Cycles .................................... 3-49 Cache Units .................................................. 1-12 Clock Control .................................................. 3-6 Configuration Control Registers ................... 2-23 Confirmation Registers ................................. 2-11 Control Register ............................................ 2-13 Control Registers .......................................... 2-11 CPU Instruction Set Summary Table Abbreviations ...................................................... 6-12 CPU Instruction Set Summary Table Flags Table 6-13 CPUID Instruction ......................................... 6-11 Current Priviledge Level (CPL) ..................... 2-71 D d Field ............................................................. 6-4 Data Bus ......................................................... 3-9 Data Bypassing ............................................ 1-10 Data Dependency Solutions .......................... 1-4 Data Forwarding ............................................ 1-7 Data Parity ..................................................... 3-9 Data Registers ............................................... 2-5 DC Characteristics ......................................... 4-4 Debug Registers ................................. 2-11, 2-37 Descriptor Priviledge Level (DPL) ................ 2-71 Descriptor Table Registers ................. 2-11, 2-15 Descriptors ................................................... 2-16 E eee Field ........................................................ 6-5 Electrical Connections ................................... 4-1 Entering and Leaving V86 Mode .................. 2-75 Error Codes .................................................. 2-62 EWBE# Timing ............................................. 3-44 Exceptions Abort .................................................... 2-57 Fault .............. .......................................2-57 Trap .............. .......................................2-56 Exceptions in Real Mode ............................. 2-61 Exclusive Instructions .................................... 1-4 F Fault Exception ............................................ 2-57 Fields Base .............. .......................................6-10 d ............................................................. 6-4 eee ......................................................... 6-5 index ...................................................... 6-9 mod and r/m ........................................... 6-6 sreg2 ...................................................... 6-8 sreg3 ...................................................... 6-8 ss ........................................................... 6-9 w ............................................................ 6-4 Flag Register .................................................. 2-4 Flags ..................... .......................................6-13 Flags Register ....... ......................................... 2-9 Floating Point Unit ........................................ 1-15 Floating Point Unit Operations ........ .............2-75 FPU Control and Status Registers ............... 2-75 FPU Error Interface ...................................... 3-18 FPU Parallel Execution ................................ 1-15 FPU Tag Word Register ............................... 2-75 Functional Blocks ........................................... 1-1 Functional Timing ......................................... 3-24 G Gate Descriptors .......................................... 2-19 Gates ........................................................... 2-73 I-1 INDEX General Instruction Fields ............................... 6-2 General Purpose Register .............................. 2-5 General Purpose Registers ............................ 2-4 Global Descriptor Table Register ................. 2-15 H HALT Initiated Suspend Mode ...................... 3-61 Hold and HLDA ............................................. 3-45 I I/O Address Space ..................... .................. 2-41 I/O Privilege Levels ..................... .................. 2-72 index Field ...................................................... 6-9 Index Register ............................ .................... 2-6 Initialization and Transition to Protected Mode .... 2-73 Inquiry Cycles Using AHOLD ....................... 3-52 Inquiry Cycles Using BOFF .......................... 3-51 Inquiry Cycles Using HOLD/HLDA ............... 3-50 Instrucion Line Cache ................................... 1-13 Instruction Clock Count Assumptions made in determining ....... 6-12 Instruction Pointer Register .....................2-4, 2-9 Instruction Set Formats .............. .................... 6-1 Instruction Set Overview ................................. 2-3 Instruction Set Summary ................................ 6-1 Instruction Set Tables ................................... 6-12 Instuction Prefix Summary ............................ 6-43 Integer Unit ..................................................... 1-2 Interrups SMM ................................... .................. 2-56 Interrupt Acknowledge Cycles ...................... 3-40 Interrupt and Exception Priorities ................. 2-59 Interrupt Control ............................................ 3-12 Interrupt Descriptor Table ............................. 2-59 Interrupt Descriptor Table Register .............. 2-15 Interrupt Handling V86 Mode ............................................. 2-74 Interrupt Vector Assignments ....................... 2-57 Interrupts ...................................................... 2-55 INTR ................................... .................. 2-56 NMI ....................................................... 2-55 Vectors ................................................. 2-57 INTR Interrupt ............................................... 2-56 Invalidating the Cache Using FLUSH# ......... 3-42 J JTAG Interface ............................................. 3-23 I-2 L Local Descriptor Table Register ................... 2-15 Lock Prefix ..................................................... 2-3 M Memory Address Space ............................... 2-40 Memory Addressing Protected Mode ................................... 2-43 Real Mode ........................................... 2-43 V86 Mode ............................................ 2-74 Memory Addressing Methods ...................... 2-41 Memory Management Unit ........................... 1-14 mod and r/m Byte ........................................... 6-6 Modes System Management ........................... 2-63 N NC and Reserved Pins .................................. 4-2 NMI Interrupt ................................................ 2-55 Non-Pipelined Bus Cycles ........................... 3-28 Non-Pipelined Single Transfer Cycles ......... 3-28 Non-Piplined Burst Read Cycles .................. 3-31 O Offset Address Calculation .......................... 2-42 Offset Mechanism ........................................ 2-42 Operand forwarding ....................................... 1-7 Opicode Byte ................................................. 6-4 Optional Prefix Byte ....................................... 6-2 Out-of-Order Processing ................................ 1-3 P Pins NC and Reserved .................................. 4-2 Unused Input ......................................... 4-2 Pipeline Selection .......................................... 1-4 Pipeline Stages Execute .................................................. 1-3 Instruction Fetch .................................... 1-3 Write-Back ............................................. 1-3 Pipelined Back-to-Back Read/Write Cycles . 3-39 Pipelined Bus Cycles ................................... 3-37 Pipleine Stages Address Calculation ............................... 1-3 Pointer Register ............................................. 2-6 Power and Ground Connections and Decoupling 4-1 Power Management Interface ............. 3-21, 3-60 Priviledge Levels Current ................................................. 2-71 INDEX Descriptor ............................................. 2-71 Requested ............................................ 2-71 Privilege Level Transfers .............................. 2-72 Processor Initialization .................................... 2-1 Programming Interface ................................... 2-1 Protected Mode Memory Addressing ........... 2-43 Protection ................................... .................. 2-71 V86 Mode ............................................. 2-74 Pull-Up/Pull-Down Resisters .......................... 4-1 R RAW Dependency Example ....... .................... 1-8 Real Mode Memory Addressing ................... 2-43 Recommended Operating Conditions ............ 4-3 Register Renaming ..................... .................... 1-4 Register Sets .................................................. 2-4 Registers 6x86 Configuration ............................... 2-23 Configuration Control ........................... 2-23 Confirmation ......................................... 2-11 Control .........................................2-11, 2-13 Data ........................................................ 2-5 Debug ..........................................2-11, 2-37 Descriptor Table .......................... 2-11, 2-15 Flag ........................................................ 2-4 Flags ................................... .................... 2-9 General Purpose .............................2-4, 2-5 Global Descriptor Table ........................ 2-15 Index ................................... .................... 2-6 Instruction Pointer .............. .............2-4, 2-9 Interrupt Descriptor Table ..................... 2-15 Local Descriptor Table Register ........... 2-15 Pointer .................................................... 2-6 Segment ..........................................2-4, 2-7 Task .............................................2-11, 2-20 Test .............................................2-11, 2-39 Requested Priviledge Level (RPL) ............... 2-71 Requested Privilege Level (RPL) ................... 2-8 Reset Control .................................................. 3-6 Reset Timing ................................................ 3-24 Result Forwarding .......................................... 1-7 S s Field ............................................................. 6-5 Scatter/Gather Buffer Interface ............ 3-18, 3-55 Section 2.6.4 (Page 45) .............. .................... 2-7 Segment Register ........................................... 2-7 Segment Registers ..................... .................... 2-4 Segment Selector ........................................... 2-7 Selector Mechanism ..................................... 2-44 Shutdown and Halt ....................................... 2-69 Signal Description Table ................................ 3-2 Signal Descriptions ........................................ 3-6 Signals INTR .................................................... 2-56 NMI ...................................................... 2-55 SMM .................................................... 2-56 SMI Service Routine Execution ................... 2-68 SMI# Interrupt Timing .................................. 3-41 SMM Interrupt ....... ....................................... 2-56 SMM Memory Space ................................... 2-68 SMM Memory Space Header ....................... 2-65 SMM Operation ............................................ 2-64 Speculative Execution .................................. 1-12 sreg2 Field ..................................................... 6-8 sreg3 Field ..................................................... 6-8 ss Field ........................................................... 6-9 Stopping the Input Clock .............................. 3-62 SUSP# Initiated Suspend Mode .................. 3-60 System Management Mode ......................... 2-63 System Register Set .................................... 2-11 T Table Indicator (TI) ......................................... 2-8 Task Register ...................................... 2-11, 2-20 Test Registers ....... .............................. 2-11, 2-39 Thermal Characteristics ................................. 5-6 Traditional Paging Mechanism ........ .............1-15 Trap Exception ............................................. 2-56 U Unified Cache ....... ....................................... 1-12 Unused Input Pins .......................................... 4-2 V V86 Interrupt Handling ................................. 2-74 V86 Memory Addressing .............................. 2-74 V86 Protection ............................................. 2-74 Variable-Size Paging Mechanism ................ 1-14 Vector Interrupt ............................................ 2-57 Virtual 8086 Mode ........................................ 2-74 W w Field ............................................................ 6-4 WAR Dependency Example .......................... 1-5 WAW Dependency Example .......................... 1-6 WAW Dependency Example 1-6 I-3 ST6x86 I-4 ST6x86 Information furnished is believed to be accurate and reliable. However, SGS-THOMSON Microelectronics assumes no responsibility for the consequences of use of such information nor for any infringement of patents or other rights of third parties which may result from its use. No license is granted by implication or otherwise under any patent or patent rights of SGS-THOMSON Microelectronics. Specifications mentioned in this publication are subject to change without notice. This publication supersedes and replaces all information previously supplied. SGS-THOMSON Microelectronics products are not authorized for use as critical components in life support devices or systems without express written approval of SGS-THOMSO N Microelectronics. 1996 SGS-TH OMSON Microelectronics - Printed In Italy - All rights reserved. SGS-THOMSON Microelectronics GROUP OF COMPANIES Australia - Brazil - Canada - China - France - Germany - Hong Kong - Italy - Japan - Korea - Malaysia - Malta - Morocco The Netherlands - Singapore - Spain - Sweden - Switzerland - Taiwan - Thailand - United Kingdom - U.S.A.