Duo Core Processors
and Multiple Caches

What is the hype all about?

 

Doug Willoughby

February 20, 2007

 

Agenda

•      What is Duo Core, and what are the requirements for using it

•      Description of Cache memory

•      History of cache development

•      Some recent Developments

•      Some Duo Core Specs

 

What is Duo Core

•      A Processor Chip that has two processing units instead of one.

•      Built into a single package

•      Can run two applications or two processes simultaneously

 

News Flash February 12, 2007

•      Intel announced an experimental 80-core chip design intended to deliver enormous computing power with low power consumption

 

Applications/Processes

•      Consist of sequences of instructions and the data those instructions operate on

•      Most run sequentially, as a single thread

•      Some applications can have multiple threads which run simultaneously.

•      A single thread cannot take advantage of dual core; multiple threads can
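
As a rough illustration of an application with multiple threads, the sketch below splits one summation across two threads using Python's standard threading module. The helper name parallel_sum and the two-way split are assumptions made for this example, not something from the slides. In the standard CPython interpreter a global lock usually keeps CPU-bound threads from executing at the same instant, so this shows the structure of a two-thread program rather than a guaranteed dual core speedup.

```python
import threading

def parallel_sum(numbers, thread_count=2):
    """Sum `numbers` by giving each thread its own share (hypothetical helper)."""
    partial = [0] * thread_count  # one result slot per thread

    def worker(index):
        # Each thread sums every thread_count-th element, starting at its own index.
        partial[index] = sum(numbers[index::thread_count])

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(thread_count)]
    for t in threads:
        t.start()  # the threads may now be dispatched independently
    for t in threads:
        t.join()   # wait until every thread has finished
    return sum(partial)

total = parallel_sum(list(range(1, 101)))
print(total)  # 5050
```

With thread_count=1 the same function behaves like an ordinary single-thread program, which is the case that gains nothing from a second core.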

 

Operating System Requirement

•      Must implement a task dispatcher that can handle:

•      multiple threads of one application or

•      multiple applications or

•      multiple operating system processes or

•      combinations of the above

 

Single Core Processing

•      The operating system's task dispatcher implements preemptive multitasking

•      One process or application runs until a higher-priority task needs the processor; then a switch occurs

•      A switch can also occur if a process is delayed waiting for data (I/O from CD/DVD/Internet)
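
The priority rule on this slide can be sketched as a toy single-core simulation. Everything here is an assumption for illustration: the function name simulate, the tuple layout, the one-tick time step, and the convention that a larger number means higher priority. Switching on I/O delay is not modeled.

```python
def simulate(tasks):
    """Toy priority-preemptive dispatcher for one core.

    tasks: list of (name, arrival_tick, priority, ticks_needed).
    A larger priority number means more urgent (an assumption of this sketch).
    Returns the name of the task that ran during each tick.
    """
    remaining = {name: need for name, _, _, need in tasks}
    timeline = []
    tick = 0
    while any(remaining.values()):
        # Tasks that have arrived and still have work left.
        ready = [t for t in tasks if t[1] <= tick and remaining[t[0]] > 0]
        if not ready:
            timeline.append("idle")
        else:
            # The highest-priority ready task gets the core, preempting any other.
            name = max(ready, key=lambda t: t[2])[0]
            remaining[name] -= 1
            timeline.append(name)
        tick += 1
    return timeline

timeline = simulate([("editor", 0, 1, 4), ("audio", 2, 5, 2)])
print(timeline)
# ['editor', 'editor', 'audio', 'audio', 'editor', 'editor']
```

The low-priority task is interrupted at tick 2 when the high-priority task arrives, and resumes only after that task has finished.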

 

Hyper-Threading Technology

•      Runs multiple single-thread applications through a single processor by sharing otherwise unused cycles

•      A compromise between single-core and dual-core technology

 

•      http://www.intel.com/products/processor_number/flash/demo.html

 

Duo Core Technology

•      Runs multiple single-thread applications on two processor cores

•      http://www.intel.com/products/processor_number/flash/demo.html

 

Of What Use is Dual Core?

•      None if running only one single thread application or process

•      Not much if running multiple applications or processes with low processor utilization

•      Great if you have multiple high processor utilization applications which can run simultaneously
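
The three cases above can be put into rough numbers with a simple model. This is a sketch under stated assumptions: each application is single-threaded, its demand is expressed as the fraction of one core it wants, and scheduling overhead and cache effects are ignored. The function name slowdown and the sample demand values are made up for illustration.

```python
def slowdown(demands, cores):
    """Estimate how much longer the work takes than on an unloaded machine.

    demands: fraction of one core each single-thread application wants (0.0 to 1.0).
    Returns 1.0 when every application gets all the processor time it asks for.
    """
    total = sum(demands)
    return max(1.0, total / cores)

cases = {
    "one busy single-thread app": [1.0],
    "two light apps": [0.1, 0.1],
    "two busy apps": [1.0, 1.0],
}
for label, demands in cases.items():
    print(label, slowdown(demands, 1), slowdown(demands, 2))
# one busy single-thread app 1.0 1.0
# two light apps 1.0 1.0
# two busy apps 2.0 1.0
```

Only the last case changes when the second core is added: on one core the two busy applications take twice as long, on two cores neither is held up.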

 

News Flash February 19, 2007

•      AMD announces new Barcelona Quad Core processor chip.

•      Includes four cores with supporting circuits on one chip

•      Intel's quad core (Clovertown) puts two dual-core (Woodcrest) chips on the same module

 

L1 and L2 Cache

•      Why include a cache at all?

•      Cache is a much smaller, much faster memory that costs more per bit

•      Interfaces to off-chip RAM

•      RAM is a much larger, much slower memory that costs less per bit

•      Because of locality of reference, the combination behaves like the faster memory at close to the lower cost

  

News Flash Feb 14, 2007

•      IBM announces a breakthrough that allows eDRAM to be substituted for SRAM on processor chips.

•      Dramatically reduces the chip area needed for L2 cache and also L1 cache

 

Locality of Reference

•      The theory that a running application needs only a small kernel of instructions and data at any one time.

•      If that kernel is kept in a small, fast buffer, and the larger, slower RAM is accessed only when an item is not in the buffer, the combination operates at an average speed close to that of the buffer, at a cost close to that of RAM
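
The "average speed" claim can be checked with a short calculation. The sketch below weights the buffer time and the RAM time by the hit ratio. The 80 ns and 960 ns figures are borrowed from the "Cache 1968 vs Today" slide later in this deck, and the 96 and 99 percent hit ratios from the "Cache Development" slide. Treating the 80 ns cycle time as the buffer access time, and charging a miss the RAM time only, are simplifying assumptions; the function name is made up for this example.

```python
def average_access_ns(hit_ratio, buffer_ns, ram_ns):
    """Average access time when hits are served by the buffer and misses by RAM.

    Simplification: a miss is charged the RAM access time only.
    """
    return hit_ratio * buffer_ns + (1.0 - hit_ratio) * ram_ns

for hit_ratio in (0.96, 0.99):
    print(f"{hit_ratio:.2f} -> {average_access_ns(hit_ratio, 80, 960):.1f} ns")
# 0.96 -> 115.2 ns
# 0.99 -> 88.8 ns
```

Under these assumptions the combination runs within about 11 to 44 percent of buffer speed, even though almost all of the storage is the slow 960 ns memory.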

 

IBM Performance Evaluation

•      Complex computer designs required complex simulations driven by instruction streams

•      Streams were created by tracing real benchmark workloads

•      Traces included the addresses of instructions, the instructions themselves, and the addresses of data as well as the data itself

 

Cache Development

•      IBM Research extracted all the addresses from the streams

•      Confirmed that a small fast buffer could store most current instructions and data

•      All instructions and data remained stored in slower RAM

•      With a buffer hit ratio of 96 to 99 percent, all data appeared to be accessible at close to buffer speed
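
A trace-driven measurement of this kind can be sketched in a few lines. The code below replays a list of addresses through a small buffer and reports the hit ratio. The least-recently-used replacement rule, the function name hit_ratio, and the synthetic trace are all assumptions for illustration; the slides do not say which replacement policy or workloads IBM used.

```python
from collections import OrderedDict

def hit_ratio(trace, buffer_blocks):
    """Replay an address trace through a small buffer with LRU replacement."""
    buffer = OrderedDict()  # keys are addresses, ordered from least to most recently used
    hits = 0
    for address in trace:
        if address in buffer:
            hits += 1
            buffer.move_to_end(address)     # mark as most recently used
        else:
            if len(buffer) >= buffer_blocks:
                buffer.popitem(last=False)  # evict the least recently used address
            buffer[address] = True
    return hits / len(trace)

# Synthetic trace (made up): a loop over 8 addresses, repeated 50 times.
trace = list(range(8)) * 50
print(hit_ratio(trace, buffer_blocks=8))  # 0.98
print(hit_ratio(trace, buffer_blocks=4))  # 0.0
```

When the buffer holds the whole loop, only the first pass misses. When it holds half the loop, this particular access pattern evicts every address just before it is needed again, which is why buffer size had to be checked against real traces.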

 

Cache Development

•      L1 cache was first implemented in the System 360 Model 85 (circa 1968), then in smaller System 370 models

•      Later, separate instruction and data caches were implemented.

•      Most later computer and chip designs incorporate L1 and L2 caches as well as separate data and instruction caches

 

Cache 1968 vs Today

•      System 360 Model 85: more than $1,000,000

•      32K byte cache, 80 ns cycle time

•      4M byte RAM, 960 ns access time

•      Magnetic core memory

 

•      Pentium 4 PC: less than $1000

•      32K byte L1 cache, 3 cycle access

•      4M byte L2 cache, 12 cycle access

•      1G byte RAM

 

Examples of Use

•      Adobe Photoshop and Elements can run multiple threads.

•      Web Page Servers

•      Gaming applications

•      Applications with high computing requirements

•      Grid Computing applications