# MICROPROCESSOR REPORT www.MPRonline.com THE INSIDER'S GUIDE TO MICROPROCESSOR HARDWARE

# VIA'S SPEEDY ISAIAH

New x86 Design Strikes a Different Balance of Power and Performance

By Tom R. Halfhill {3/10/08-01}


Glenn Henry feels vindicated. For years, the straight-talking founder of Centaur Technology waged a lonely campaign against clock-frequency pyrotechnics and profligate power consumption in x86 microprocessors. His low-power x86 chips with simple pipelines and small caches were like throwbacks to an earlier era. Meanwhile, Intel and AMD dueled each other with superscalar pipelining, speculative execution, out-of-order execution, ever-deeper superpipelines, and steroidal on-chip caches.

Then, a few years ago, Intel and AMD saw the light—or rather, felt the heat. As clock speeds exceeded 3.0GHz and power surpassed 100W, rising heat dissipation became unsustainable in air-cooled PCs and densely packed servers. The power problem forced both leading x86 vendors to radically change course. They turned to slower, smaller, cooler-running cores. They began pursuing higher performance by integrating multiple cores per chip.

Unfortunately for Centaur—an Austin-based subsidiary of Taiwan's VIA Technologies—Henry's foresight hasn't paid off in much market share. VIA's existing x86 processors enjoy some popularity in developing countries, low-end PCs, and embedded systems, but they have captured less than 2% of the worldwide x86 market. One obstacle is performance. Centaur processors are admirable power misers, but even the latest C7-family chips are slower than Intel's slowest Core 2 processors. Among other things, they are limited by an in-order uniscalar pipeline and a stripped-down FPU. (See *MPR 6/13/05-02*, "VIA's C7 Keeps Its Cool.")

In an ironic turn of events, Henry is now chasing higher performance by adopting some of the very techniques he once railed against. VIA's new Isaiah microarchitecture is a clean-slate x86-compatible design with superscalar pipelining, out-of-order instruction processing, speculative execution, multilevel dynamic branch prediction, larger on-chip caches, and one of the fastest FPUs in the industry. In addition, Isaiah is VIA's first 64-bit x86 processor. Henry previewed Isaiah (then known as Centaur CN) in 2004, but it's only now sampling in silicon and is scheduled to debut later this year. (See *MPR 10/5/04-02*, "Centaur CN Is Super(scalar) 64 Bits.")

# Intel's New Threat

Has Glenn Henry sold out? Not exactly. Somehow, his intrepid design team managed to cram all those new features into a relatively small chip that doubles or quadruples the C7's instructions-per-cycle throughput while holding power consumption at about the same level. Specifically, the initial implementation of Isaiah will consume 3.5W at 1.0GHz, or 6.0W at 1.5GHz. It can run as fast as 2.0GHz, but power consumption soars to 16W. Note that these numbers represent Isaiah's worst-case thermal design power (TDP) with thermal-protection and thermal-management features disabled. VIA's TDPs are more stringent than vague "typical" power estimates and other TDPs measured under aggressive thermal management.
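The three TDP points quoted above make it easy to check how power scales with clock speed. The following is a minimal sketch that divides each figure by its frequency; the function and variable names are illustrative, and only the wattages and frequencies come from the article.

```python
# TDP figures quoted in the article for the initial Isaiah implementation.
ISAIAH_TDP_WATTS = {1.0: 3.5, 1.5: 6.0, 2.0: 16.0}  # GHz -> watts


def watts_per_ghz(tdp_by_ghz):
    """Return {GHz: W/GHz}; a flat result would mean power scales linearly with clock."""
    return {ghz: watts / ghz for ghz, watts in tdp_by_ghz.items()}


ratios = watts_per_ghz(ISAIAH_TDP_WATTS)
for ghz in sorted(ratios):
    print(f"{ghz:.1f} GHz: {ratios[ghz]:.2f} W/GHz")
```

The watts-per-gigahertz figure roughly doubles between 1.5GHz and 2.0GHz, which suggests (though VIA hasn't said so) that the top speed grade needs a higher core voltage.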

VIA says future versions of Isaiah will consume less power than the initial version, which is intended for higher-performance systems within its low-power, low-cost domain. (VIA has demonstrated an Isaiah processor running at 1.2GHz in a fanless PC that can play high-definition video at 720p resolution.) Future versions of Isaiah will include low-voltage (LV) and ultralow-voltage (ULV) devices. With the ULV part, VIA is aiming at 5.0W worst-case TDP at 1.2GHz. In comparison, the ULV C7-M processor—VIA's lowest-power C7-family chip—dissipates 3.5W TDP at 1.0GHz, or 7.5W TDP at 1.5GHz. (The 1.0GHz ULV part requires only 796mV.) If Isaiah can actually deliver two to four times as much throughput at the same clock speed, its power/performance ratio easily beats the C7-M's.
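To put a number on that last claim, here is a rough sketch that compares performance per watt at equal clock speeds. It assumes VIA's two-to-four-times throughput claim holds and pairs the initial Isaiah's TDP with the ULV C7-M's TDP, as the article does; the helper name is made up for illustration.

```python
def perf_per_watt_gain(speedup, new_watts, old_watts):
    """Relative performance per watt of a new chip vs. an old one at the same clock.

    speedup   -- new throughput / old throughput at equal clock frequency
    new_watts -- power of the new chip at that clock
    old_watts -- power of the old chip at that clock
    """
    return speedup * old_watts / new_watts


# TDP figures quoted in the article: initial Isaiah vs. ULV C7-M.
at_1_0_ghz = [perf_per_watt_gain(s, new_watts=3.5, old_watts=3.5) for s in (2, 4)]
at_1_5_ghz = [perf_per_watt_gain(s, new_watts=6.0, old_watts=7.5) for s in (2, 4)]
```

Under these assumptions the gain is 2.0 to 4.0 times at 1.0GHz and 2.5 to 5.0 times at 1.5GHz, because Isaiah's quoted TDP at 1.5GHz is lower than the C7-M's.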

Isaiah still isn't as fast as the leading x86 cores from Intel and AMD. However, Isaiah's low power consumption, improved throughput, and small die size make it an attractive choice for affordable "green" PCs, ultramobile PCs (UMPC), and mininotebooks running desktop operating systems. It runs cool enough for fanless designs. It's pin- and socket-compatible with its predecessor, making it an ideal performance upgrade for systems currently using the C7-M. (For those who don't need Isaiah's higher performance, VIA will continue selling its existing processors.)

Isaiah would seem to significantly strengthen VIA's position in the x86 market. But Henry has a fresh thorn in his side. Intel's new Silverthorne processor—now branded "Atom"—is already sampling to early customers. Atom has the potential to radically change the x86 landscape. Intel says Atom's TDP is a mere 0.6W at 1.0GHz, rising to only 2.0W at 2.0GHz. Even the future ULV version of Isaiah won't match those impressive low-power numbers.

However, VIA says that Isaiah isn't intended to compete directly with Atom in the lowest-power x86 realm. Isaiah's saving grace may be higher throughput, which positions it somewhere between Atom and Intel's lowest-end processors based on the Core 2 microarchitecture. Atom isn't based on Core 2. Ironically, Atom slashes power consumption partly by reverting to a simpler x86 microarchitecture with narrower superscalar pipelining and in-order execution. Atom has more in common with the C7-M than it does with Core 2. In fact, Atom is Intel's simplest x86 design since the original Pentium in 1993. Moving in opposite directions, the Isaiah and Atom design teams have quietly passed each other like ships in the night.

Intel revealed the first technical details about Atom at the recent International Solid-State Circuits Conference (ISSCC) in San Francisco. *Microprocessor Report* is meeting with Intel's design team and will cover Atom in detail soon. Meanwhile, this article analyzes VIA's different approach to balancing throughput and power consumption in low-end x86 microprocessors.

# **Balancing Power and Performance**

For Henry, designing an x86 processor is all about performance per watt. Specifically, his processors optimize for instructions per cycle per watt. In contrast, most AMD and Intel x86 processors strive to deliver the most throughput within the maximum power envelopes permitted by their target platforms—whether those platforms are servers, desktop PCs, or notebooks. Lately, there has been a trend toward lower power envelopes, mainly to reduce the cooling burdens and electric bills of data centers. Even in the context of this trend, Atom is a major departure for Intel. It compromises throughput to cut power consumption far below Intel's other x86 processors. Intel's goal is to make the x86 more suitable for ultramobile devices and embedded systems.

Henry has been espousing a similar philosophy since founding Centaur Technology in 1997. The trick is finding the best balance of throughput and power consumption. Most techniques for improving throughput in microprocessors have been widely practiced since the 1990s, but all require adding logic. More logic means more transistors, and more transistors burn more power—especially when they are manufactured in today's deep-submicron fabrication processes, where current leakage has become a major problem. Another challenge is the 30-year-old x86 architecture, which has maddening complexities that work against efficiency.

For those well versed in microprocessor design, most of the techniques that Isaiah uses to improve performance over the Centaur C7 won't seem novel. Indeed, Intel's x86 processors have been using many of the same techniques since 1995, and the techniques weren't new even then. The novelty is that Henry's engineering team managed to implement these techniques while holding the line on power, relative to the C7.

One big step was moving from a 32-bit x86 architecture to 64 bits. Inflating the logic was unavoidable, because x86-64 has additional registers, wider datapaths, larger memory addressing, and other extensions. (See *MPR 3/29/04-01*, "AMD and Intel Harmonize on 64.") The 64-bit architecture isn't strictly necessary for VIA's target markets at this time, but VIA is looking forward. The Centaur 32-bit microarchitecture is 11 years old, so it's possible that Isaiah will need equal longevity. Another undeniable factor is marketing. To compete with the 64-bit cores from AMD and Intel, VIA needs 64-bit cores, too. And 64-bit processing is useful for some embedded applications. One example is packet routing under the new IPv6 standard, which expands Internet Protocol addresses from 32 bits to 128 bits.

While stretching the architecture to 64 bits, VIA made other architectural improvements as well. Like the C7, Isaiah supports third-generation Streaming SIMD Extensions (SSE3), but Isaiah adds support for Supplemental SSE3 (SSSE3)—16 new instructions that Intel added to Core 2. These instructions can manipulate four 32-bit operands at the same time and are particularly useful for video codecs. In addition, VIA says Isaiah will support SSE4.1 in the future. (AMD and Intel haven't yet harmonized on SSE4.1.)

To execute SSE instructions, Isaiah has two FPU/SSE units with 128-bit-wide datapaths, allowing each unit to manipulate four single-precision floats, two double-precision floats, or four 32-bit SSE integers at the same time. One FPU/SSE unit is optimized for fast floating-point multiplies, while the second unit handles other floating-point and SSE operations. VIA refers to these function units as the Media-A and Media-B units, although they also execute x87 general-purpose floating-point instructions, not just multimedia instructions. Table 1 summarizes the performance of these two function units.
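The lane counts above, combined with the issue intervals in Table 1, give a rough ceiling on floating-point throughput. The sketch below is only an estimate: it assumes that both media units can start an instruction in the same cycle and that the "throughput" rows in Table 1 are issue intervals in cycles. The function names are illustrative.

```python
DATAPATH_BITS = 128  # width of each FPU/SSE unit's datapath, per the article


def lanes(element_bits):
    """Number of elements one 128-bit SSE operation processes."""
    return DATAPATH_BITS // element_bits


def peak_ops_per_cycle(element_bits, issue_interval_cycles):
    """Element operations per cycle for one unit, given its issue interval
    (the "throughput" rows in Table 1: 1 means a new instruction every cycle)."""
    return lanes(element_bits) / issue_interval_cycles


sp_add = peak_ops_per_cycle(32, 1)  # Media-A, 32-bit FP SSE add
sp_mul = peak_ops_per_cycle(32, 1)  # Media-B, 32-bit FP SSE multiply
dp_add = peak_ops_per_cycle(64, 1)  # Media-A, 64-bit FP SSE add
dp_mul = peak_ops_per_cycle(64, 2)  # Media-B, 64-bit FP SSE multiply
```

On these assumptions the peak works out to eight single-precision or three double-precision operations per cycle, a best case that real code with dependencies and memory stalls won't sustain.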

#### More New Architectural Features

Isaiah supports the new x86 virtualization extensions, making it easier to simultaneously run multiple operating systems or multiple instances of the same operating system. This feature is crucial for servers, which can virtualize multiple machines in the same box to save power and money. Virtualization isn't terribly important for VIA's target markets. However, like the x86-64 extensions, it's useful for future software compatibility and for marketing purposes. Unfortunately, Henry says, virtualization was even harder to implement than x86-64 was. ("The x86 is the world's ugliest instruction set, and now you've got to have multiple ugliest instruction sets running at the same time," he grumbles. "It's ugliness squared.")

All together, Isaiah brings VIA's x86-compatible architecture up to date with the mainstream x86 architectures from AMD and Intel. It also goes a little further. Isaiah inherits some proprietary x86 extensions from the Centaur C7 while adding a few more. These "PadLock" extensions, unique to VIA, enable greater security. Because they are nonstandard, PCs are unlikely to use them, but they are valuable for some embedded applications. Among the PadLock features inherited from the C7 are a hardware random-number generator, hardware acceleration for the Advanced Encryption Standard (AES), and SHA-1/SHA-256 secure hashing. Isaiah's newest PadLock feature is the mysterious Secure Execution Mode.

Secure Execution Mode allows a small amount of critical program code to continue running even if the operating system crashes or is compromised by malware. To make this possible, Secure Execution Mode runs beneath the heretofore lowest-level modes in the x86 (Ring 0 and System Management Mode). The secure code runs in a few kilobytes of volatile on-chip memory that is segregated from the caches and hidden from the operating system. A small amount of nonvolatile on-chip memory can preserve some secure state from one session to another. Even instruction fetching is encrypted in this mode.

VIA says it will release more information about Secure Execution Mode later, but some details will be revealed only to prospective customers under a nondisclosure agreement. In some ways, VIA's PadLock and Secure Execution Mode resemble ARM's TrustZone and IBM's SecureBlue technologies. (See *MPR 8/25/03-01*, "ARM Dons Armor," and *MPR 5/8/06-01*, "IBM Offers Chip-Level Security.") VIA is ahead of the technology curve in this regard. *MPR* believes that all microprocessors will eventually incorporate hardware acceleration for these functions, as data encryption and secure signing become universal in software.

#### **Bigger L2 Cache Fattens the Die**

| Operation                    | Media-A Function Unit                  | Media-B Function Unit                   |
|------------------------------|----------------------------------------|-----------------------------------------|
| Most SIMD Integer            | Latency: 1 cycle; Throughput: 1 cycle  | —                                       |
| 32-Bit FP SSE Add (4x)       | Latency: 2 cycles; Throughput: 1 cycle | —                                       |
| 64-Bit FP SSE Add (2x)       | Latency: 2 cycles; Throughput: 1 cycle | —                                       |
| 32/64/80-Bit FP x87 Add (1x) | Latency: 2 cycles; Throughput: 1 cycle | —                                       |
| Most Simple FP               | Latency: 1 cycle; Throughput: 1 cycle  | —                                       |
| Integer Mul (2x–4x)          | —                                      | Latency: 3 cycles; Throughput: 1 cycle  |
| 32-Bit FP SSE Mul (4x)       | —                                      | Latency: 3 cycles; Throughput: 1 cycle  |
| 64-Bit FP SSE Mul (2x)       | —                                      | Latency: 4 cycles; Throughput: 2 cycles |
| 32/64/80-Bit FP x87 Mul (1x) | —                                      | Latency: 4 cycles; Throughput: 2 cycles |

**Table 1.** Isaiah's floating-point and media-processing performance. Two FPUs support SSE, SSE2, SSE3, and SSSE3 extensions, using 128-bit-wide datapaths. The Media-A unit handles addition, subtraction, simple floating-point operations (such as moves and compares), division, and square roots. The Media-B unit is dedicated to multiplication and fused floating-point multiply-add operations used by transcendental algorithms. In some cases, these function units have shorter latencies than SSE units in AMD and Intel processors—a first for VIA.

Isaiah's 1MB on-chip L2 cache is eight times larger than the 128KB L2 caches found in Centaur C7-family chips. The L1 instruction and data caches remain the same size (64KB each). Although Isaiah supports a 64-bit 1.333GHz front-side bus (FSB), current VIA system chipsets limit the FSB to 800MHz (200MHz base clock, quad-pumped), providing 6.4GB/s of maximum theoretical bandwidth. VIA says future chipsets will support a 1.333GHz FSB, boosting I/O bandwidth to 10.6GB/s. (Although VIA is withdrawing from the broader chipset market for AMD and Intel processors, the company will continue making chipsets for its own Centaur processors.)
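The bus-bandwidth figures quoted above follow directly from the bus width and the effective transfer rate. A minimal sketch of the arithmetic, with an illustrative function name:

```python
def fsb_bandwidth_gb_per_s(bus_width_bits, transfers_per_second):
    """Peak bus bandwidth in GB/s (1 GB = 10**9 bytes)."""
    return bus_width_bits / 8 * transfers_per_second / 1e9


current = fsb_bandwidth_gb_per_s(64, 200e6 * 4)  # 200MHz base clock, quad-pumped
future = fsb_bandwidth_gb_per_s(64, 1.333e9)     # 1.333GHz effective rate
```

The current chipsets work out to 6.4GB/s. The faster bus comes to a little under 10.7GB/s, which the article's 10.6GB/s figure appears to round down.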

It's a little curious that Isaiah debuts with the same 800MHz FSB as its immediate predecessor. If it's true that Isaiah can process instructions two to four times faster than the C7 and isn't I/O-bound, then either the C7's I/O bandwidth was lavishly overprovisioned, or Isaiah would be I/O-bound without its much larger L2 cache. (We suspect the latter.)

For years, VIA has boasted that it has the smallest, cheapest-to-make x86 processors in the industry. Expanding the on-chip L2 cache to 1MB works against that ideal. Isaiah debuts with a 63mm<sup>2</sup> die, twice as large as the C7-M. Moreover, Isaiah debuts in a 65nm fabrication process, whereas the latest C7-M chips are still manufactured in an older 90nm process. In the same fabrication process, Isaiah would be three to four times larger than the C7-M. If their caches were the same size, Isaiah would be only about twice as large. Figure 1 is a die plot of Isaiah.
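The "three to four times larger" estimate can be sanity-checked with an idealized shrink calculation. The sketch below assumes die area scales with the square of the process feature size, which real designs never quite achieve because pads and analog circuits shrink poorly. It also uses the approximate 30mm<sup>2</sup> C7-M die size quoted elsewhere in this article; the function name is illustrative.

```python
def scale_area(area_mm2, from_nm, to_nm):
    """Ideal-shrink estimate: area scales with the square of the feature size."""
    return area_mm2 * (to_nm / from_nm) ** 2


isaiah_at_90nm = scale_area(63.0, from_nm=65, to_nm=90)
c7m_at_90nm = 30.0  # approximate C7-M die size quoted in the article
ratio = isaiah_at_90nm / c7m_at_90nm
```

The ideal-scaling result is about 121mm<sup>2</sup>, or roughly four times the C7-M. That is the upper end of the quoted range, as expected from an optimistic scaling model.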

**Figure 1.** VIA Isaiah die plot. The most prominent feature is the 1MB on-chip L2 cache, which is eight times larger than the biggest L2 cache in VIA's Centaur C7-family processors. New features in Isaiah also account for larger blocks of logic. At the lower right is control logic for branch prediction, x86 instruction translation, and the duplicate sets of rename registers required for out-of-order speculative execution. At the bottom is additional control logic for superscalar scheduling and the instruction-reorder buffer (ROB). Near the center is the load/store block with its memory order buffer (MOB). The FPU/SSE units are at the lower left, near the cryptography-acceleration logic. The die is 63mm<sup>2</sup>. VIA will offer two packages: NanoBGA2 (21mm x 21mm) and Mobile-BGA (11mm x 11mm).

Isaiah's larger L2 cache is clearly a major factor in the chip's expansion. As the die photo makes obvious, the cache accounts for about one-third the die area. It's an efficient cache—unlike most Intel processors, Isaiah doesn't keep redundant data in the L1 and L2 caches. But even though Isaiah is still a small chip by x86 standards, its larger die probably makes it more expensive to manufacture than Intel's Atom. Even if Isaiah isn't intended to compete directly with Atom, this disadvantage could alter VIA's traditional position in the marketplace. VIA will no longer make the smallest, least expensive, lowest-power x86 chips. (This discussion excludes the much slower and less capable embedded processors based on lingering 186, 286, and 386 designs.)

Intel's Atom die measures only 24.2mm<sup>2</sup>, less than half the size of Isaiah. (The C7-M die is about 30mm<sup>2</sup>.) One factor in Atom's favor is that Intel is manufacturing the chip in its smallest, most advanced 45nm fabrication process. That process could make Atom more costly to produce at first, as Intel amortizes the huge investment in its new high-*k* metal-gate wizardry. But Atom can now claim bragging rights as the smallest-footprint x86, and eventually it will be cheaper to manufacture than Isaiah.

#### **Options for Better Throughput**

Two surprising features in Isaiah are superscalar pipelining and speculative out-of-order execution. Until now, Centaur processors have avoided such performance-enhancing features because of their complexity. It's not that Centaur wasn't up to the design challenge—Glenn Henry leads a talented team, despite its small size of only 20 logic engineers and 15 circuit designers. It was mainly the additional die area and power consumption that deterred them. Unfortunately, in-order processing through a uniscalar pipeline, no matter how power-efficient, leaves little room for improving throughput.

One route to greater throughput with a uniscalar pipeline is good old-fashioned clock-frequency scaling. But power consumption rises at least linearly with clock speed (faster still when a higher clock demands a higher voltage), and a faster clock doesn't improve the instructions-per-cycle efficiency. AMD and Intel have already explored that road—it ends in a meltdown. Another alternative is to add new instructions that do a better job of executing typical workloads. Isaiah already implements the AMD/Intel SSE extensions up to Supplemental SSE3, but VIA isn't influential enough to introduce additional extensions and establish them as an industry standard.

For all these reasons, superscalar pipelining was the logical next step for VIA. In terms of design complexity, out-of-order execution was an even bigger step, but it allows more flexibility with superscalar instruction scheduling. Speculative execution was another significant step, because it allows the processor to begin executing instructions at a predicted branch-target address, then discard the speculative results if the prediction proves wrong. All these features require additional transistors for duplicate registers and internal bookkeeping logic, but they help Isaiah double or quadruple its throughput over a Centaur C7-M processor running at the same clock speed.

Isaiah supports up to four-way multiprocessor systems, should an application need even higher performance. In addition, future implementations will have multiple cores. Right now, dual cores would exceed VIA's power-consumption and die-area budgets while adding little performance for the target applications. A future migration to a smaller fabrication process will make multiple cores more practical. Note that Intel's Atom is also a single-core processor, even though it will debut in Intel's state-of-the-art 45nm process, which is a full generation ahead of Isaiah's 65nm process.

Besides having only one core, the first implementation of Isaiah is single-threaded, too. Henry doesn't anticipate a future version with hardware multithreading in the fashion of Intel's Hyper-Threading. This technique mixes instructions from two different threads of execution in the pipelines at the same time, avoiding pipeline flushes and register swaps when switching contexts between those threads. The trade-off is a little more complexity, mainly in the form of duplicate register files and additional control logic. (See *MPR 9/17/01-01*, "Intel Embraces Multithreading.")

Atom supports dual Hyper-Threading, which allows the single-core processor to masquerade as a Core 2 Duo processor. Intel says Hyper-Threading improves Atom's throughput by as much as 35–45%. But Henry's view is that an out-of-order machine like Isaiah would realize much less benefit from hardware multithreading than an in-order machine like Atom does. When a multithreaded in-order pipeline stalls, it can quickly switch threads to continue working. When a single-threaded out-of-order pipeline stalls, it can speculatively execute newer instructions to continue working. In Henry's opinion, out-of-order execution tends to make hardware multithreading redundant. (Of course, out-of-order execution also makes hardware multithreading more difficult to implement.)

It's worth noting that Intel has never fully embraced hardware multithreading. After introducing dual Hyper-Threading in the Pentium 4 processor in 2003, Intel dropped the feature from some later processors and only recently began reviving it. Nevertheless, *MPR* expects hardware multithreading to become more popular as other techniques for improving throughput reach their plateaus.

#### Juggling x86 Instructions

Describing Isaiah's superscalar capabilities is tricky, because it belongs to the club of modern x86 processors that transforms x86 instructions into digestible pieces before execution. The resulting "micro-ops" often don't correspond directly to standard x86 instructions. VIA's existing x86 processors transform x86 instructions into micro-ops, but not as extensively as Isaiah does.

Some transformation is virtually required for higher performance, because the ancient x86 architecture is so complicated. Instructions can be as short as 8 bits or as long as 120 bits—or even longer, with redundant prefixes. It's a classic CISC architecture that mixes single-operation instructions with multiple-operation instructions, including some odd string-manipulation instructions rarely found in other instruction sets. At one time, engineers feared that designing a superscalar x86 might be impossible for mere mortals.

The solution is instruction reconstruction. All high-performance x86 processors treat standard x86 instructions as merely the starting point. At run time, the processor converts the CISC instruction stream into something more RISC-like for easier internal consumption. Long, complex instructions are broken into shorter, simpler operations. In other cases, two simple x86 instructions may be fused into a more efficient single operation. To the outside world of software, the processor is still x86 compatible. Internally, the core resembles the RISC architecture that the CPU architect would really rather be using.

In Isaiah, a micro-op may be a fragment of a long x86 instruction, a fusion of two short x86 instructions, or even a recombination of two fragments of an x86 instruction. VIA's previous x86 processors reduced x86 instructions to micro-ops, but Isaiah is the first Centaur processor that can fuse multiple x86 instructions into a single micro-op.

One example of a fused micro-op is a pair of compare and jump instructions. Isaiah can combine these instructions into a single micro-op, dispatch the fused operation to a single function unit, and execute the micro-op in a single clock cycle. VIA calls this technique "macro-fusion," because it combines two x86 instructions. In other cases, Isaiah may chop a complex x86 instruction into simpler micro-ops, then recombine two of those micro-ops to form a fused micro-op that the processor can execute in parallel in two different function units. VIA calls this technique "micro-fusion."
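The compare-and-jump case is easy to picture as a pass over the instruction stream. The following toy sketch is not VIA's decoder; the tuple encoding, the fused mnemonic, and the rule for spotting conditional jumps are all assumptions made for illustration.

```python
def macro_fuse(instructions):
    """Fuse each compare immediately followed by a conditional jump into one micro-op.

    `instructions` is a list of (mnemonic, operands) tuples. Mnemonics starting with
    "j" other than "jmp" are treated as conditional jumps, a simplification.
    """
    micro_ops = []
    i = 0
    while i < len(instructions):
        mnemonic, operands = instructions[i]
        nxt = instructions[i + 1] if i + 1 < len(instructions) else None
        if mnemonic == "cmp" and nxt and nxt[0].startswith("j") and nxt[0] != "jmp":
            # Two x86 instructions become a single fused micro-op.
            micro_ops.append(("cmp+" + nxt[0], operands + nxt[1]))
            i += 2
        else:
            micro_ops.append((mnemonic, operands))
            i += 1
    return micro_ops


program = [
    ("mov", ("eax", "[rsi]")),
    ("cmp", ("eax", "ebx")),
    ("jne", ("loop_top",)),
    ("add", ("ecx", "1")),
]
fused = macro_fuse(program)
```

Here four x86 instructions become three micro-ops. Real hardware presumably applies more restrictions on which pairs can fuse than this sketch does.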

#### A Healthy Bulge in the Pipeline

To support superscalar execution, the initial Isaiah implementation has seven function units. Two are the FPU/SSE units (Media-A and Media-B) described above. The others are two ALUs, a load unit, a store-data unit, and a store-address unit. The division of labor for load/store operations is common in modern x86 designs. It allows the processor to simultaneously execute I/O operations while separately computing the memory address for a store.

At the front of the pipeline, Isaiah can translate three x86 instructions into micro-ops per clock cycle. In the middle of the pipeline, Isaiah can dispatch a micro-op to each of the seven function units per cycle. At the end of the pipeline, Isaiah can retire three micro-ops per cycle. In the best case, the three retired micro-ops might correspond to three x86 instructions. Therefore, it's legitimate to describe Isaiah as a three-way superscalar machine.

Notice the bulge in the middle of the pipeline—seven function units. They can simultaneously execute seven micro-ops. (Actually, if a divide or square-root operation is executing in the primary FPU/SSE pipeline while another floating-point operation is dispatched there, Isaiah can simultaneously execute eight micro-ops.) It's common for superscalar processors to have surplus execution bandwidth in the middle of their pipelines, because it increases the odds that the processor will, on average, retire more than one result per clock cycle. In no case, however, can Isaiah retire more than three results per cycle. All things considered, *MPR* regards Isaiah as a three-way superscalar processor. Figure 2 is a block diagram of Isaiah.
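The reasoning behind the three-way label is that sustained throughput can't exceed the narrowest stage, however wide the middle is. A small sketch using the widths quoted above (the names are illustrative):

```python
DECODE_WIDTH = 3    # x86 instructions translated per cycle
DISPATCH_WIDTH = 7  # one micro-op to each of seven function units per cycle
RETIRE_WIDTH = 3    # micro-ops retired per cycle


def sustained_width(*stage_widths):
    """Long-run throughput is capped by the narrowest pipeline stage."""
    return min(stage_widths)


def cycles_to_retire(n_micro_ops):
    """Lower bound on cycles needed to retire n micro-ops (ceiling division)."""
    return -(-n_micro_ops // RETIRE_WIDTH)


width = sustained_width(DECODE_WIDTH, DISPATCH_WIDTH, RETIRE_WIDTH)
```

Even if seven micro-ops finish executing in the same cycle, retiring them takes at least three cycles, so the extra function units serve to absorb bursts, not to raise the ceiling.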

To relieve a bottleneck at the front of the pipeline, Isaiah has multiple x86 instruction decoders. Some x86 processors can decode only one complex x86 instruction per clock cycle. Decoding additional instructions in the same cycle requires those instructions to be of a simpler variety. Isaiah has no such limitation. It can decode three x86 instructions per cycle, and all three may be complex instructions. The only restriction is that all three must start within the same 16-byte memory region. Some complex x86 instructions are as long as 15 bytes, so occasionally this restriction prevents Isaiah from achieving its ideal of decoding three x86 instructions per cycle.
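The effect of the 16-byte restriction can be sketched by grouping instruction start addresses into per-cycle decode batches. This is a simplified model: it assumes the "16-byte memory region" is an aligned 16-byte block, which VIA hasn't confirmed, and it ignores everything else that can stall a decoder.

```python
def decode_groups(start_address, lengths, max_per_cycle=3, window=16):
    """Group instructions into per-cycle decode batches.

    An instruction joins the current batch only if the batch has room and the
    instruction starts in the same aligned `window`-byte block as the batch's first
    instruction. Aligned blocks are an assumption; the article says only "the same
    16-byte memory region".
    """
    groups, current, address = [], [], start_address
    for length in lengths:
        if current and (len(current) == max_per_cycle
                        or address // window != current[0] // window):
            groups.append(current)
            current = []
        current.append(address)  # record where this instruction starts
        address += length
    if current:
        groups.append(current)
    return groups


short_run = decode_groups(0, [3, 4, 2, 5, 3, 2])  # typical short instructions
long_run = decode_groups(0, [15, 15, 15])         # worst-case 15-byte instructions
```

In this model the six short instructions decode in three cycles, while the three 15-byte instructions need two cycles instead of one, because the third starts in the next 16-byte block.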

#### **Better Branch Prediction**

In microprocessor design, complexity begets complexity. All the new superscalar plumbing in Isaiah worsens the penalty for mispredicting a branch, because the processor must flush more pipelines and discard speculative results. Therefore, Isaiah's designers needed to improve the branch-prediction logic over the predictor in the Centaur C7.

**Figure 2.** Isaiah block diagram. Although it's basically a three-way superscalar processor, Isaiah can execute as many as seven micro-ops simultaneously. Seven function units include two ALUs, two FPU/SSE units (labeled Media-A and Media-B), a load unit, a store-data unit, and a store-address unit. The 64-bit front-side bus (FSB) supports an effective data rate of 1.333GHz, but current VIA chipsets restrict the effective frequency to 800MHz. Until VIA introduces a faster chipset, Isaiah's maximum theoretical bandwidth is 6.4GB/s. (The small block near the data cache labeled VSM is related to VIA's mysterious Secure Execution Mode. VIA is secretive about this block, but our guess is that VSM stands for "volatile state memory" or "volatile secure memory.")

Isaiah predicts branches at two stages—when fetching x86 instructions from the instruction cache, and when translating those instructions into micro-ops. The first branch predictor is by far the most sophisticated. It makes multiple predictions at multiple levels before integrating the answers into a single prediction. The second branch predictor is a safety net that catches any branches missed by the first predictor and occasionally overrides the first prediction.

Figure 3 illustrates the primary branch-prediction logic. It has two mechanisms—one for predicting branch-target addresses and another for predicting the directions of branches. To predict target addresses, Isaiah has a four-way set-associative branch-target address cache (BTAC). Each way stores 1,024 target addresses, for a total of 4,096. The second mechanism has four branch-history tables (BHT), each capable of storing a direction (forward or backward) for 8,192 conditional branches. Three BHTs independently vote their choice, and the fourth BHT decides which of those predictions to use, based on the history of that particular branch. By combining this result with the BTAC result, Isaiah predicts whether the next taken branch will jump forward or backward, and which address it will target.
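The three-voters-plus-chooser arrangement can be illustrated with a toy model. VIA hasn't disclosed how the tables are indexed or updated, so everything beyond the table count and size is an assumption here: the 2-bit saturating counters, the hash functions, the chooser scoring, and the treatment of each stored direction as a single boolean outcome.

```python
TABLE_ENTRIES = 8192  # entries per branch-history table, per the article


class VotingPredictor:
    """Three direction tables plus a chooser table (hypothetical indexing and updates)."""

    def __init__(self):
        # 2-bit saturating counters: 0-1 predict False, 2-3 predict True.
        self.tables = [[1] * TABLE_ENTRIES for _ in range(3)]
        # Chooser: per-branch trust score for each of the three tables.
        self.chooser = [[0, 0, 0] for _ in range(TABLE_ENTRIES)]
        self.history = 0  # recent outcomes of all branches, newest in bit 0

    def _indexes(self, pc):
        # Assumed hashes: address only, address XOR short history, address XOR long history.
        return [
            pc % TABLE_ENTRIES,
            (pc ^ (self.history & 0xF)) % TABLE_ENTRIES,
            (pc ^ (self.history & 0x1FFF)) % TABLE_ENTRIES,
        ]

    def predict(self, pc):
        """Return (chosen prediction, list of the three individual votes)."""
        votes = [self.tables[t][i] >= 2 for t, i in enumerate(self._indexes(pc))]
        scores = self.chooser[pc % TABLE_ENTRIES]
        return votes[scores.index(max(scores))], votes

    def update(self, pc, outcome):
        """Train all tables with the actual outcome of the branch at `pc`."""
        _, votes = self.predict(pc)
        scores = self.chooser[pc % TABLE_ENTRIES]
        for t, i in enumerate(self._indexes(pc)):
            # Reward or penalize each table in the chooser, then train its counter.
            scores[t] = max(-4, min(3, scores[t] + (1 if votes[t] == outcome else -1)))
            counter = self.tables[t][i]
            self.tables[t][i] = min(3, counter + 1) if outcome else max(0, counter - 1)
        self.history = ((self.history << 1) | int(outcome)) & 0x1FFF


predictor = VotingPredictor()
for _ in range(8):
    predictor.update(0x400, True)
prediction, votes = predictor.predict(0x400)
```

After a few repetitions of the same outcome, the chooser learns to trust the address-indexed table for this branch. A real design would tune the hashes and counter widths against benchmark traces.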

One limitation is that the primary predictor can handle only two branches within a 16-byte line fetched from the instruction cache. It's possible for a series of short x86 instructions to include more than two branches in a 16-byte line. In those cases, the secondary branch predictor saves the day, relying on a table of 2,048 recent branches to make its prediction.

When Isaiah translates x86 instructions into micro-ops, it compares the results of the primary and secondary predictors to forecast the branch direction and target address. Usually the primary prediction gets the nod, but sometimes the secondary predictor overrides the primary predictor.

The penalty in lost clock cycles for mispredicting a branch is uncertain. At this time, VIA won't disclose the depth of Isaiah's pipelines or the pipe stage in which the processor discovers it has mispredicted a branch. VIA says only that Isaiah's pipeline is "very similar" in depth to a C7-M pipeline, which has 16 stages. Assuming that Isaiah's pipeline is no deeper than 20 stages, and that mispredicted branches usually aren't discovered until relatively late in the game, Isaiah probably has to flush instructions from a dozen or so stages after guessing wrong. In addition, there are internal buffers and reservation stations that may need flushing, as well as speculative results that may need discarding. Isaiah's misprediction penalty is probably similar to that of other superscalar processors with out-of-order speculative execution: painful, but amortized by better overall throughput.

#### Improved Voltage/Frequency Scaling

Isaiah implements the same PowerSaver features as the C7-M and adds some new features, branded as Adaptive PowerSaver Technology. PowerSaver can automatically adjust the chip's core voltage and clock frequency in response to changing software workloads, supervised by the operating system. Isaiah can vary its voltage and frequency over a range of 0.75–1.2V and 400MHz–2.0GHz. PowerSaver is similar to Transmeta's LongRun, which first appeared in Transmeta Crusoe x86-compatible processors in 2000. VIA's voltage/frequency-scaling technology also appeared in 2000. Intel's similar version of this technology is Enhanced SpeedStep, which first appeared in the Mobile Pentium III Processor-M in 2001. (For an explanation of this basic concept, see the sidebar, "Transmeta Explains LongRun," in *MPR* 7/10/2000-02, "Top PC Vendors Adopt Crusoe.")

Figure 3. Isaiah's primary branch predictor. This sophisticated two-level predictor relies on a branch-target address cache (BTAC) and four branch-history tables (BHT). The four-way set-associative BTAC can store a total of 4,096 target addresses and the type of each branch. Each BHT can store the direction for 8,192 branches. Three BHTs make independent predictions, and the fourth BHT chooses among them. This result combines with the BTAC result to generate a final prediction. The return-address stack predicts addresses for RETURN instructions.

PowerSaver has some clever twists. One, called TwinTurbo, uses two independent PLLs to quickly adjust the voltage and frequency while allowing the processor to continue working during the transitions. Normally, a processor with only one PLL must stop fetching and executing instructions during these adjustments. It can take several hundred microseconds for the clocks and voltage to stabilize at the new settings, forcing the core and I/O bus to halt. To avoid these work stoppages, VIA's TwinTurbo alternates between the two PLLs when changing the voltage and frequency. As Figure 4 shows, one PLL continues regulating the processor while the other PLL changes to the next higher or lower setting. The processor keeps switching between PLLs in this manner until it reaches the target voltage and frequency.
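The alternating-PLL idea can be sketched in a few lines. This is a simplified model under stated assumptions, not VIA's implementation: the function name and the 200MHz step size are hypothetical, the voltage ramp is omitted, and the relock delay is represented only by the order of operations.

```python
def twin_pll_transition(current_mhz, target_mhz, step_mhz=200):
    """Return the (active_pll, frequency_mhz) settings visited during a transition.

    Hypothetical model: step_mhz is an assumed step size, and voltage
    changes are not modeled.
    """
    active = 0                            # PLL currently clocking the core
    plls = [current_mhz, current_mhz]     # frequency each PLL is locked to
    trace = [(active, current_mhz)]
    while plls[active] != target_mhz:
        idle = 1 - active
        delta = target_mhz - plls[active]
        # Move at most one step toward the target, in either direction.
        move = max(-step_mhz, min(step_mhz, delta))
        # The idle PLL relocks at the next setting while the active PLL
        # keeps the core running, so no instructions are stalled.
        plls[idle] = plls[active] + move
        # Once the idle PLL is stable, the core switches over to it.
        active = idle
        trace.append((active, plls[active]))
    return trace


print(twin_pll_transition(400, 1000))
```

In this model, each entry in the returned trace shows which PLL drives the core at that step. The two PLLs take turns until the target frequency is reached.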

Adaptive PowerSaver Technology, unique to Isaiah, adds more new twists. One is adaptive thermal clocking. By monitoring the die temperature, the processor can automatically adjust its core voltage and clock frequency to save power or increase performance. In one case, the processor might run at a lower-than-normal voltage to reach a particular clock speed if the die temperature is well within the maximum limit. For example, the processor may reach 2.0GHz at only 1.0V instead of the nominal 1.2V, thus saving power. In another case, the processor might overclock itself if the die temperature is well within range. For example, the processor might boost its frequency to 2.2GHz instead of the nominal maximum of 2.0GHz, thus increasing performance.
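A rough check on the two examples above: dynamic CMOS power scales approximately with the square of the supply voltage times the clock frequency. The sketch below applies that textbook relationship. It ignores leakage and any change in switched capacitance, so it is an approximation rather than a VIA specification, and the function name is hypothetical.

```python
def relative_dynamic_power(voltage, freq_ghz, ref_voltage=1.2, ref_freq_ghz=2.0):
    """Dynamic power relative to the nominal 1.2V / 2.0GHz operating point.

    Uses the approximation P ~ C * V^2 * f with capacitance held constant.
    Leakage power is ignored.
    """
    return (voltage / ref_voltage) ** 2 * (freq_ghz / ref_freq_ghz)


undervolted = relative_dynamic_power(1.0, 2.0)   # 2.0GHz at 1.0V
overclocked = relative_dynamic_power(1.2, 2.2)   # 2.2GHz at 1.2V
print(round(undervolted, 2), round(overclocked, 2))
```

By this estimate, running 2.0GHz at 1.0V instead of 1.2V cuts dynamic power to about 0.69 of nominal, and boosting to 2.2GHz at 1.2V raises it to about 1.10 of nominal.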

A related feature of Adaptive PowerSaver Technology, also unique to Isaiah, allows a program to hold the chip's operating temperature below a certain level—even if that temperature is lower than the chip's maximum rating. For example, if the maximum operating temperature is 125°C, the software could enforce a ceiling of 100°C. If the temperature threatens to exceed that threshold, the processor automatically downshifts to a lower core voltage and clock frequency. This feature is useful in systems with limited cooling.
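The temperature ceiling behaves like a simple governor. The sketch below is one plausible way to model it, not VIA's actual control logic. The operating-point table, the 5°C margin, and the step-back-up rule are all assumptions chosen to fall within the voltage and frequency ranges quoted earlier.

```python
# Hypothetical (MHz, volts) operating points, lowest to highest.
OPERATING_POINTS = [(400, 0.75), (800, 0.85), (1200, 0.95), (1600, 1.05), (2000, 1.20)]


def next_point(index, temp_c, ceiling_c=100.0, margin_c=5.0):
    """Pick the next operating-point index given the die temperature.

    Downshift when the temperature comes within margin_c of the
    software-set ceiling. Upshift only when comfortably below it.
    The margin and the upshift rule are assumed, not documented.
    """
    if temp_c >= ceiling_c - margin_c and index > 0:
        return index - 1      # temperature threatens the ceiling: downshift
    if temp_c < ceiling_c - 2 * margin_c and index < len(OPERATING_POINTS) - 1:
        return index + 1      # plenty of headroom: step back up
    return index              # inside the hysteresis band: hold


index = next_point(4, temp_c=97.0)     # running flat out, nearing 100 degrees C
print(OPERATING_POINTS[index])
```

The gap between the downshift and upshift thresholds is a design choice in this sketch. It keeps the governor from oscillating between two settings when the temperature hovers near the ceiling.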

Note that VIA measures Isaiah's TDP *without* using Adaptive PowerSaver Technology to scale the voltage and frequency. VIA says its TDPs truly represent worst-case power consumption. Other companies may use different methodologies, so direct comparisons may not be valid.

Isaiah is the first Centaur processor to support a new x86 deep-sleep mode called C-6. This mode also appears in Intel's Atom processors and Penryn mobile processors. C-6 mode reduces power to the lowest possible level without resetting the processor or losing critical state information. The processor saves its state (the program counter, registers, flags, stack pointer, etc.) in a small amount of internal memory. Then it powers down most of the chip, including the clocks and caches. The chip idles in this mode until receiving a wake-up call. Wake-up restores the original state, although the processor must reload the caches and reprime the pipelines. Core voltage in C-6 mode is the lowest allowed by the fabrication process in which the chip was manufactured. VIA hasn't disclosed Isaiah's power consumption in C-6 mode or the time required to enter and exit this mode.

#### Squeezing Between Atom and Core 2

Without a doubt, Isaiah is a major upgrade of Centaur's aging x86 designs. Although Centaur has steadily improved the original design over the last 11 years, it was due for an overhaul. Evidently, VIA concluded that performance was a greater restraint on sales than TDP. Hence, Isaiah's emphasis on improving raw throughput and instructions per cycle per watt. Isaiah implements several new power-saving techniques as well. However, those techniques basically offset the additional power consumed by the millions of additional transistors required to improve throughput. Relative to Centaur's C7-M, Isaiah is a faster, more efficient processor that holds the line on power but doesn't significantly reduce TDP.

# Price & Availability

The first implementations of VIA Technologies' Isaiah microarchitecture are sampling now. Production volumes are scheduled for introduction in 1H08. Initial speed grades will range from 1.0GHz to 2.0GHz. VIA hasn't announced pricing or the official brand name of Isaiah-based processors. For more information, visit:

• www.via.com.tw/en/products/processors/isaiah-arch/

Intel started from a very different point. Intel's x86 processors deliver plenty of throughput, but even the lowest-power chips that Intel makes for notebook PCs are too power-hungry and expensive for the ultramobile consumer-electronics devices and embedded systems that Intel covets. For a while, Intel flirted with XScale as a low-power solution, but XScale is based on the ARM architecture, not Intel's own x86. As Intel moves more aggressively into the embedded market, ARM becomes a larger potential competitor. In 2006, Intel sold most of its XScale business to Marvell Technology Group. (See *MPR 7/31/06-01*, "Intel's Embedded Future.") The XScale divestiture, coupled with the Atom project, confirmed that Intel was pushing the x86 into lower-power embedded applications. It's a logical move. For a long time, *MPR* has believed that the x86 has unrealized potential as an embedded-processor architecture. Intel has preferred to focus on the PC and server markets, which command higher average selling prices. But the embedded market has vastly higher volumes and is the breeding ground for new kinds of products. *MPR* has often wondered why Intel essentially surrendered that lucrative market to ARM and numerous other companies. At last, with Atom, Intel is putting up a fight.

To make a competitive x86 embedded processor, Intel had to dramatically slash power consumption. Fortunately for Intel, its PC processors already have plenty of throughput to trade off for lower power. At the same time, Intel doesn't want its lower-priced x86 embedded processors competing with its own higher-priced mobile PC processors. Therefore, Intel approached the Atom project with the goal of sacrificing throughput to significantly reduce power consumption. In contrast, VIA approached the Isaiah project with the goal of significantly improving throughput without worsening power consumption.

Both companies appear to have achieved their goals. VIA says Isaiah is two to four times faster than the Centaur C7-M at about the same TDP. Intel says Atom has a much lower TDP than the Core 2 Duo while retaining enough horsepower to drive a UMPC. The surprising result of these two successful projects is that Intel will almost certainly replace VIA as the maker of the smallest, lowest-power, cheapest-to-manufacture x86 microprocessor that can run major software. Without final silicon or independent benchmarks to evaluate, we can't conclude yet which processor is faster. It's also unclear which processor can execute the most instructions per cycle per watt, which is Glenn Henry's yardstick.

VIA must position Isaiah between Atom and Intel's mobile/embedded Core 2 processors. Even if Isaiah delivers higher throughput than Atom does, it will consume more power and cost more to manufacture. Isaiah will use less power and probably be less expensive to manufacture than Intel's Core 2-based processors, but it will also deliver less throughput. For VIA, the straits between Atom and the low end of the Core 2 line are a tight place to be. If future descendants of Atom offer better performance, and if future descendants of Core 2 offer better power consumption, Isaiah could be squeezed out. Of course, Intel is entering tight straits, too. Atom will squeeze between Isaiah and the Centaur C7 family, which remains available.

Until now, Intel has largely ignored VIA. The Taiwan-based company has been selling low-power, low-priced x86 processors into geographic regions and markets that Intel's x86 processors were too power-hungry and expensive to address. Now Intel is muscling in with a smaller, lower-power x86 chip. VIA has the advantage of existing design wins and established business relationships in these markets, which count for a lot, especially in Asia. In addition, VIA can probably exploit Intel's reluctance to cannibalize its more-profitable x86 processors based on Core 2. If necessary, VIA can sacrifice some profit margin, as AMD often does when caught in a bind. And perhaps VIA can reduce Isaiah's TDP closer to Atom's by migrating more rapidly to a 45nm process. One certainty is that Intel's new attention to low-power x86 processors will force VIA to navigate stormier waters in years to come.

To subscribe to Microprocessor Report, phone 480.483.4441 or visit www.MPRonline.com

© IN-STAT