Chip123 Technology Application Innovation Platform

Title: The Xbox 360 CPU architecture

Author: masonchung    Time: 2007-2-9 10:39 AM
Title: The Xbox 360 CPU architecture
The Xbox 360 CPU architecture

The Xbox 360 system has a single chip (with 165 million transistors) for its CPU. This chip is in fact a three-way symmetric multiprocessor design. The three PowerPC cores are identical, except that they are physically reflected through the X and Y axes. Each of the CPU cores is a specialized PowerPC core with a VMX128 extension related to (and partially compatible with) the VMX instructions in the G4 and G5 CPUs. The three CPU cores share a 1MB Level 2 cache. Each core has 32KB each of Level 1 data and instruction cache. The chip's front-side bus/physical interface has a bandwidth of 21.6GB/second and runs at 5.4GHz. The high-frequency clocks are generated on-chip by four phase-locked loops: two for the core clocks, two for the PHY clock.
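As a quick sanity check on the figures above, this Python sketch derives two numbers the post does not state directly: the aggregate bytes moved per bus clock, and the total L1 capacity across the three cores. The function and constant names are made up for illustration. The calculation assumes decimal units (1GB = 10^9 bytes) and one transfer per 5.4GHz clock, neither of which the post confirms.

```python
# Figures quoted in the post; unit interpretation is an assumption.
FSB_BANDWIDTH_BYTES_PER_S = 21_600_000_000  # 21.6GB/second, assuming 1GB = 10**9 bytes
FSB_CLOCK_HZ = 5_400_000_000                # 5.4GHz
NUM_CORES = 3
L1_ICACHE_KB = 32
L1_DCACHE_KB = 32


def bytes_per_bus_clock(bandwidth_bytes_per_s, clock_hz):
    """Aggregate bytes transferred per clock, assuming one transfer per clock."""
    return bandwidth_bytes_per_s / clock_hz


def total_l1_kb(cores, icache_kb, dcache_kb):
    """Combined L1 instruction + data capacity over all cores, in KB."""
    return cores * (icache_kb + dcache_kb)


if __name__ == "__main__":
    print(bytes_per_bus_clock(FSB_BANDWIDTH_BYTES_PER_S, FSB_CLOCK_HZ))
    print(total_l1_kb(NUM_CORES, L1_ICACHE_KB, L1_DCACHE_KB))
```

Under those assumptions the bus moves 4 bytes per clock in aggregate. How that splits between the read and write directions is not stated in the post.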
The Xbox 360 CPU chip has testing and debug functions, including tracing, configuration control, and performance monitoring features. Access to these functions is through the block in Figure 1 labeled test/debug. The block labeled Miscellaneous IO provides a JTAG port, a POST monitor, and an interface for a serial EEPROM in case patch logic configuration was needed during bring-up.

To improve manufacturing yield, the SRAM arrays used in the L1 and L2 caches support both row and column redundancy. This redundancy is enabled at chip test by burning electronic fuses (eFuses). The eFuses are one of the unique capabilities of the IBM 90nm CMOS SOI technology in which the chip is fabricated. eFuses were also used to record a unique supply voltage to be used for each chip. Finally, to help reduce the potential impact of process variations on the operation of the PHY analog circuits, eFuses were used for parametric adjustment in the analog units.

The physical package of the chip matters, too. A crucial design goal for the CPU of a consumer electronics device is high-volume production with good yield and comparatively low cost. The package is a 2-2-2 FC-PBGA, measuring 31mm by 31mm.

The CPU core examined

The CPU cores (there are three) are the highest-frequency PowerPC cores currently available, running at 3.2GHz. Throughout, the CPU uses extensive clock gating, leaving pipelines shut down until there are instructions to be processed; this dramatically reduces power consumption under real-world loads. The basic design is a 64-bit PowerPC architecture, with the complete PowerPC ISA available.

The instruction unit is multithreaded, with two simultaneous threads. The instruction cache is 32KB. The core implements a two-issue, in-order execution microarchitecture. This means two instructions are issued at a time, but execution within the units is in sequential order. Execution is delayed to cover the load-use penalty without stalling the pipeline.
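To make "two-issue, in-order" concrete, here is a toy Python model that groups a short instruction sequence into per-cycle issue bundles. It is only a sketch under several assumptions that are not from the post: two instructions pair only if they use different pipes and the second does not read a register written by the first, every result is available one cycle later, and multithreading and the delayed-execution pipeline are ignored. The `Instr` type and `schedule_in_order` function are hypothetical names.

```python
from typing import List, NamedTuple, Tuple


class Instr(NamedTuple):
    name: str               # label used in the printed schedule
    pipe: str               # e.g. "load_store", "fixed", "float", "vmx", "branch"
    dest: str               # register written ("" if none)
    srcs: Tuple[str, ...]   # registers read


def schedule_in_order(program: List[Instr], width: int = 2) -> List[List[str]]:
    """Group instructions into issue bundles, strictly in program order."""
    cycles = []
    i = 0
    while i < len(program):
        bundle = [program[i]]  # the oldest instruction always issues
        i += 1
        while i < len(program) and len(bundle) < width:
            nxt = program[i]
            pipe_busy = any(nxt.pipe == b.pipe for b in bundle)
            raw_hazard = any(b.dest and b.dest in nxt.srcs for b in bundle)
            if pipe_busy or raw_hazard:
                break  # in-order: a blocked instruction blocks everything behind it
            bundle.append(nxt)
            i += 1
        cycles.append([ins.name for ins in bundle])
    return cycles


demo = [
    Instr("lwz r1", "load_store", "r1", ("r9",)),
    Instr("add r2", "fixed", "r2", ("r3", "r4")),
    Instr("add r5", "fixed", "r5", ("r1", "r2")),
    Instr("fadd f1", "float", "f1", ("f2", "f3")),
    Instr("fmul f4", "float", "f4", ("f1",)),
]

if __name__ == "__main__":
    for cycle, names in enumerate(schedule_in_order(demo)):
        print(cycle, names)
```

The point of the model is the `break`: unlike an out-of-order core, a younger independent instruction cannot slip past an older one that is unable to issue.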

The L1 instruction cache (Icache) is a 32KB cache with parity error checking. It is a two-way set-associative cache with 128B lines. First-level translation for instruction addresses is done using a 64-entry, two-way set-associative effective-to-real address translation cache (ERAT).
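The Icache parameters above determine its geometry. This short Python sketch works out the number of sets and shows how an address would split into tag, index, and line offset. It assumes conventional power-of-two indexing on the low address bits, which the post does not confirm. The helper names are illustrative only.

```python
def cache_geometry(size_bytes, ways, line_bytes):
    """Return (sets, offset_bits, index_bits) for a set-associative cache.

    Assumes size, ways, and line size are powers of two.
    """
    sets = size_bytes // (ways * line_bytes)
    offset_bits = line_bytes.bit_length() - 1
    index_bits = sets.bit_length() - 1
    return sets, offset_bits, index_bits


def split_address(addr, offset_bits, index_bits):
    """Split an address into (tag, set index, byte offset within the line)."""
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


if __name__ == "__main__":
    # L1 Icache from the post: 32KB, two-way, 128B lines.
    sets, offset_bits, index_bits = cache_geometry(32 * 1024, 2, 128)
    print(sets, offset_bits, index_bits)
    print(split_address(0x12345, offset_bits, index_bits))
```

With these parameters the Icache has 128 sets, so 7 address bits select the byte within a line and 7 more select the set.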
The two issued instructions can go to one of five execution pipes: Branch (which is really part of the instruction unit), Load/Store, Fixed Point, Floating Point, and VMX. Difficult instructions are implemented through microcode. At dispatch they are cracked and converted into multiple micro-ops.

The branch unit includes a 4KB two-way set-associative Branch History Table per thread.

The Fixed Point pipe actually has two units: one to handle simple operations (add/sub, cmp, logical ops, and rotate), and one to handle complex operations like multiply/divide.

The Load/Store pipe handles access to the L1 data cache and the storage hierarchy. Like the L1 Icache, the L1 Dcache is a 32KB cache with parity error checking. However, it is four-way set associative. It is "store through" and provides non-blocking access, so a cache miss does not hold up a subsequent hit.
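"Store through" (also called write-through) means every store is forwarded to the next level of the hierarchy, so the L1 never holds the only up-to-date copy of a line. The Python sketch below illustrates just that property. It is not a model of the real Dcache: sets, ways, line granularity, and non-blocking behavior are left out, and the choice not to allocate a line on a store miss is an assumption. `StoreThroughCache` is a hypothetical name.

```python
class StoreThroughCache:
    """Toy store-through cache in front of a dict that stands in for the next level."""

    def __init__(self, backing):
        self.backing = backing  # next level of the hierarchy: addr -> value
        self.lines = {}         # locally cached copies: addr -> value

    def load(self, addr):
        if addr not in self.lines:             # miss: fill from the next level
            self.lines[addr] = self.backing.get(addr, 0)
        return self.lines[addr]

    def store(self, addr, value):
        self.backing[addr] = value             # always written through
        if addr in self.lines:                 # assumption: no allocate on store miss
            self.lines[addr] = value

    def evict(self, addr):
        self.lines.pop(addr, None)             # never dirty, so nothing to write back


if __name__ == "__main__":
    memory = {}
    cache = StoreThroughCache(memory)
    cache.store(0x10, 7)
    print(memory[0x10], cache.load(0x10))
```

Because the next level is always current, eviction is a simple drop.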
A 64-entry, two-way set-associative ERAT handles first-level data address translation. Second-level translation for both data and instructions is handled by a 1K-entry, four-way set-associative TLB (translation lookaside buffer), which can be software- as well as hardware-managed.
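The two translation levels described above can be sketched as a lookup that tries the ERAT first and falls back to the TLB. This Python example is a simplification with labeled assumptions: a 4KB page size (the post gives none), plain dictionaries in place of set-associative arrays with no capacity limit or eviction, and no PowerPC segmentation step. All names are illustrative.

```python
PAGE_SHIFT = 12  # assumption: 4KB pages; the post does not give a page size

# Set counts implied by the entry counts and associativity in the post.
ERAT_SETS = 64 // 2     # 64 entries, two-way
TLB_SETS = 1024 // 4    # 1K entries, four-way


class TranslationMiss(Exception):
    """Neither level holds the page; a real system would reload the TLB here."""


def translate(ea, erat, tlb):
    """Translate an effective address; return (real address, level that hit)."""
    page = ea >> PAGE_SHIFT
    offset = ea & ((1 << PAGE_SHIFT) - 1)
    if page in erat:
        level = "erat"
    elif page in tlb:
        erat[page] = tlb[page]  # refill the first level from the second
        level = "tlb"
    else:
        raise TranslationMiss(hex(ea))
    return (erat[page] << PAGE_SHIFT) | offset, level


if __name__ == "__main__":
    erat, tlb = {}, {0x12: 0x80}
    print(translate(0x12345, erat, tlb))  # first access is served by the TLB
    print(translate(0x12345, erat, tlb))  # repeat access hits the ERAT
```

The `TranslationMiss` path is where the software- or hardware-managed TLB reload mentioned above would take over.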
http://www-128.ibm.com/developerworks/power/library/pa-fpfxbox/?ca=dgr-lnxw09XBoxDesign
[ Last edited by masonchung on 2007-2-9 10:41 AM ]
Author: armmips    Time: 2008-9-7 01:30 AM
Thanks for sharing, this is very useful material.
But wouldn't a 3+1-style multiprocessor architecture be difficult to program?



