QCDOC Project Hardware Status
2004 BNL All-Hands Meeting
Date: March 26, 2004
Given By: Norman Christ
Physics: Lattice QCD is driven by a synergy between exponentially increasing computer resources and dramatically improved algorithms.
Architecture: Large gains are possible from an optimized design.
- Space-time homogeneity supports easy parallelization and a mesh
network.
- System-on-a-chip technology permits a highly scalable and
cost-effective design:
1. Entire node (including interconnect logic) on a single chip.
2. The only extra components:
- Serial nearest-neighbor wires.
- Commercial Ethernet tree for
booting, diagnostics and I/O.
- Low power, compact design.
Goal: $1/sustained Mflops
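Because every lattice site obeys the same equations, the lattice can be cut into identical sub-lattices, one per node, and each node only needs data from the faces of its neighbors' sub-lattices. A minimal sketch of that bookkeeping, assuming an even split of a 4-D lattice over a 4-D grid of nodes (the function names and example sizes are hypothetical, not QCDOC software):

```python
from math import prod


def local_volume(global_dims, node_grid):
    """Split a 4-D lattice (x, y, z, t) evenly across a 4-D grid of nodes.

    Returns the extent of the sub-lattice held by each node.
    """
    if len(global_dims) != len(node_grid):
        raise ValueError("lattice and node grid must have the same dimension")
    local = []
    for extent, nodes in zip(global_dims, node_grid):
        if extent % nodes:
            raise ValueError(f"extent {extent} not divisible by {nodes} nodes")
        local.append(extent // nodes)
    return tuple(local)


def surface_to_volume(local):
    """Ratio of boundary-face sites (two faces per dimension) to local volume.

    These face sites are the ones whose data must be exchanged with
    nearest-neighbor nodes.
    """
    vol = prod(local)
    face_sites = sum(2 * vol // l for l in local)
    return face_sites / vol


if __name__ == "__main__":
    sub = local_volume((32, 32, 32, 64), (8, 8, 8, 8))
    print(sub, surface_to_volume(sub))
```

For example, a 32³×64 lattice on an 8⁴ node grid gives each of the 4096 nodes a 4×4×4×8 sub-lattice with a face-to-volume ratio of 1.75; shrinking the local volume raises this ratio, which is one reason efficient small-packet, nearest-neighbor communication matters.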

Columbia (DOE)
- Norman Christ
- Saul Cohen
- Calin Cristian
- Zhihua Dong
- Changhoan Kim
- Ludmila Levkova
- Xiaodong Liao
- Guofeng Liu
- Robert Mawhinney
- Azusa Yamaguchi
UKQCD (PPARC)
- Peter Boyle
- Mike Clark
- Balint Joo
RBRC (RIKEN)
- Shigemi Ohta (KEK)
- Tilo Wettig (Yale)
IBM
- Dong Chen
- Alan Gara
- Design Groups:
- Yorktown Heights, NY
- Rochester, MN
- Raleigh, NC
BNL (DOE)
- Robert Bennett
- Chulwoo Jung
- Kostya Petrov
- Dave Stampf

- IBM-fabricated, single-chip node [50 million transistors, 5-6 W, 1.3 cm × 1.3 cm die]
- PowerPC 32-bit processor
- 1 Gflops peak, 64-bit IEEE FPU
- Memory Management
- GNU and XLC Compilers
- 4 MB on-chip memory (EDRAM) and up to 2.0 GB per node on a DIMM card
- 6-dimensional communications network:
  - Efficient for small packet sizes, ≈500 ns latency
  - Global sum/broadcast functionality
  - Minimal processor overhead
  - Lower-dimensional machine partitions
- 100 Mbit/s Fast Ethernet:
  - JTAG/Ethernet boot hardware
  - Host-node OS communication
  - Disk I/O
  - RISCWatch debugger
- ≈10 W and 15 in³ per node
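In a D-dimensional torus each node has one link in the + and one in the - direction of every axis, so a 6-dimensional network gives each node 12 nearest neighbors, with wrap-around at the machine edges. The sketch below enumerates them; the function name and the 4×4×4×4×4×4 machine shape in the example are hypothetical illustrations, not QCDOC's actual routing logic or machine geometry.

```python
from typing import List, Tuple

Coord = Tuple[int, ...]


def torus_neighbors(coord: Coord, dims: Coord) -> List[Coord]:
    """Return the 2*D nearest neighbors of `coord` on a periodic D-dim torus.

    Neighbors are listed axis by axis, + direction first, then - direction.
    On an axis of extent 2 the + and - neighbors are the same node, so that
    coordinate appears twice.
    """
    if len(coord) != len(dims):
        raise ValueError("coordinate and machine shape differ in dimension")
    out = []
    for axis, extent in enumerate(dims):
        for step in (+1, -1):
            nb = list(coord)
            nb[axis] = (nb[axis] + step) % extent  # periodic wrap-around
            out.append(tuple(nb))
    return out


if __name__ == "__main__":
    shape = (4, 4, 4, 4, 4, 4)  # hypothetical 4096-node 6-D machine
    nbs = torus_neighbors((0, 0, 0, 0, 0, 0), shape)
    print(len(nbs))  # 12
```

Sub-lattice faces of a 4-D problem map onto 8 of these 12 links; the remaining axes are what make it possible to fold the 6-D machine into lower-dimensional partitions.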

Complete Processor Node on a Single QCDOC Chip


- ECC on external DIMM and EDRAM
- Automatic recovery from single-bit communications errors
- Running check sum on both ends of each serial channel
- Number of components similar to QCDSP: 1-2 failures/week expected on a 10K-node machine
- Soft error rate estimated at <1/week on 10K nodes (low-α lead in solder balls)
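The slide does not spell out the link protocol, so the following is only an illustration under stated assumptions: each word travels with a parity bit, a parity mismatch triggers a resend (standing in for the hardware's automatic single-bit recovery), and both ends keep a running 32-bit sum of the words they sent or accepted, to be compared after a run. The function names and the parity-plus-resend scheme are hypothetical.

```python
import random

MASK = (1 << 32) - 1  # keep running sums to 32 bits


def parity(word: int) -> int:
    """Parity of a word: 1 if it has an odd number of set bits."""
    return bin(word).count("1") & 1


def send_with_retry(words, flip_prob, rng):
    """Send 32-bit words over a noisy link, resending on parity failure.

    Returns (sender_checksum, receiver_checksum, resends).
    Simplification: the parity bit itself is assumed to arrive intact.
    """
    send_sum = recv_sum = resends = 0
    for w in words:
        send_sum = (send_sum + w) & MASK         # sender's running checksum
        p = parity(w)                             # parity bit sent with the word
        while True:
            received = w
            if rng.random() < flip_prob:          # inject at most one bit flip
                received ^= 1 << rng.randrange(32)
            if parity(received) == p:             # parity matches: accept
                break
            resends += 1                          # mismatch: resend the word
        recv_sum = (recv_sum + received) & MASK  # receiver's running checksum
    return send_sum, recv_sum, resends


if __name__ == "__main__":
    data_rng = random.Random(1)
    words = [data_rng.getrandbits(32) for _ in range(1000)]
    s, r, n = send_with_retry(words, flip_prob=0.05, rng=random.Random(2))
    print(f"sender={s:#010x} receiver={r:#010x} resends={n} match={s == r}")
```

A single flipped bit always changes the parity, so in this model every such error is caught and resent and the two checksums agree; a mismatch at the end of a run would flag an error that slipped past the per-word check.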



Daughterboards

Motherboards


Single-Motherboard Cabinet


128-node Machine


8-Motherboard Backplane


8-Motherboard Cabinet


Serial Communications Simulated and Tested

- ASIC tape-out, April 8, 2003
- First five chips, June 5, 2003
- First five daughter cards, June 27, 2003
- Basic functionality verified, July 9, 2003
- First three motherboards, September 2003
- Full motherboard functioning, November 3, 2003
- Two motherboards functioning, November 18, 2003
- ASIC sign-off, November 2003
- Final daughterboard sign-off, March 1, 2004
- Backplane sign-off, March 23, 2004

- Final motherboard sign-off, April 7, 2004
- 384-node machine, April 9, 2004
- First water-cooled cabinet, May 17, 2004
- Two 2048-node machines, May 31, 2004
- Two 10,240-node machines, August 31, 2004 [RIKEN, UKQCD]
- Third 10,240-node machine, October 31, 2004 [U.S. LQCD Collaboration]

- Initial target:
Processor frequency: 500 MHz
Total cost per node: $500
Price/performance:
$500 / (2 flops/cycle × 500 MHz × 0.50 efficiency) = $1/Mflops
- Present status:
Processor frequency: 450 MHz
Total cost per node: $400
Price/performance:
$400 / (2 flops/cycle × 450 MHz × 0.50 efficiency) = $0.89/Mflops
- The goal of $1/sustained Mflops should be achieved (and possibly exceeded).
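The price/performance arithmetic can be checked directly. A small sketch, assuming (as the slide does) 2 flops per cycle and 50% sustained efficiency; the function name is hypothetical:

```python
def dollars_per_sustained_mflops(cost_per_node, freq_mhz,
                                 flops_per_cycle=2, efficiency=0.50):
    """Node cost divided by sustained Mflops per node.

    sustained Mflops = flops/cycle * clock (MHz) * sustained efficiency
    """
    sustained_mflops = flops_per_cycle * freq_mhz * efficiency
    return cost_per_node / sustained_mflops


if __name__ == "__main__":
    print(f"initial target: ${dollars_per_sustained_mflops(500, 500):.2f}/Mflops")
    # -> initial target: $1.00/Mflops
    print(f"present status: ${dollars_per_sustained_mflops(400, 450):.2f}/Mflops")
    # -> present status: $0.89/Mflops
```

The lower clock alone would raise the cost per Mflops; it is the drop in node cost from $500 to $400 that brings the present figure below the $1 target.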

Last Modified: February 1, 2008