
QCDOC Project Hardware Status


2004 BNL All-Hands Meeting
Date: March 26, 2004
Given By: Norman Christ



Outline

QCDOC Project Overview

Physics: Lattice QCD is driven by a synergy between exponentially increasing computer resources and dramatically improved algorithms.

Architecture: Large gains are possible from a design optimized for lattice QCD.

  • Space-time homogeneity supports easy parallelization and a mesh network.
  • System-on-a-chip technology permits a highly scalable and cost-effective design:
      1.  Entire node (including interconnect logic) on a single chip.
      2.  The only extra components:
            - Serial nearest-neighbor wires.
            - Commercial Ethernet tree for booting, diagnostics and I/O.
  • Low power, compact design.
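Because the lattice is homogeneous in space-time, each node can own an identical block of sites and exchange data only with its nearest neighbours on a mesh. The Python sketch below illustrates that decomposition under stated assumptions: the 16³×32 lattice and the 4×4×4×8 node grid are hypothetical example sizes, the mesh is treated as a periodic 4-D torus, and the function names are hypothetical. How QCDOC maps its 6-dimensional physical network onto such a logical grid is not modelled.

```python
from math import prod


def local_extent(global_dims, node_grid):
    """Sub-lattice extent held by each node; every axis must divide evenly."""
    local = []
    for extent, nodes in zip(global_dims, node_grid):
        if extent % nodes != 0:
            raise ValueError(f"extent {extent} is not divisible by {nodes} nodes")
        local.append(extent // nodes)
    return tuple(local)


def neighbours(coord, node_grid):
    """The 2*d nearest neighbours of a node in a periodic (torus) mesh."""
    result = []
    for axis, nodes in enumerate(node_grid):
        for step in (+1, -1):
            nb = list(coord)
            nb[axis] = (nb[axis] + step) % nodes  # wrap around the torus
            result.append(tuple(nb))
    return result


if __name__ == "__main__":
    global_dims = (16, 16, 16, 32)   # hypothetical x, y, z, t lattice
    node_grid = (4, 4, 4, 8)         # hypothetical logical node grid
    print(prod(node_grid), "nodes")
    print(local_extent(global_dims, node_grid))
    print(neighbours((0, 0, 0, 0), node_grid))
```

Every node runs the same code on an equal-sized block and needs links only to the 2d neighbours returned above, which is why a simple mesh suffices.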

Goal: $1 per sustained Mflops

 

QCDOC Collaboration

Columbia (DOE)

  • Norman Christ
  • Saul Cohen
  • Calin Cristian
  • Zhihua Dong
  • Changhoan Kim
  • Ludmila Levkova
  • Xiaodong Liao
  • Guofeng Liu
  • Robert Mawhinney
  • Azusa Yamaguchi

UKQCD (PPARC)

  • Peter Boyle
  • Mike Clark
  • Balint Joo

RBRC (RIKEN)

  • Shigemi Ohta (KEK)
  • Tilo Wettig (Yale)

IBM

  • Dong Chen
  • Alan Gara
  • Design Groups:
      - Yorktown Heights, NY
      - Rochester, MN
      - Raleigh, NC

BNL (DOE)

  • Robert Bennett
  • Chulwoo Jung
  • Kostya Petrov
  • Dave Stampf

 

Design

  • IBM-fabricated, single-chip node [50 million transistors, 5-6 W, 1.3 cm × 1.3 cm die]
  • 32-bit PowerPC processor
      - 1 Gflops peak, 64-bit IEEE FPU
      - Memory management
      - GNU and XLC compilers
  • 4 Mbyte on-chip memory (EDRAM) and up to 2.0 Gbyte/node on a DIMM card
  • 6-dimensional communications network:
      - Efficient for small packet sizes, 500 ns latency
      - Global sum/broadcast functionality
      - Minimal processor overhead
      - Lower-dimensional machine partitions
  • 100 Mbit/s Fast Ethernet:
      - JTAG/Ethernet boot hardware
      - Host-node OS communication
      - Disk I/O
      - RISCWatch debugger
  • 10 W and 15 in³ per node
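The per-node figures scale linearly to a full machine. A minimal sketch, assuming the 10 W and 15 in³ per-node values and 2 flops/cycle apply uniformly to every node and ignoring cooling, host, and disk overhead; the function name `machine_totals` is hypothetical, and the 10,240-node size is taken from the completion schedule.

```python
def machine_totals(nodes, clock_mhz=500.0, flops_per_cycle=2,
                   watts_per_node=10.0, in3_per_node=15.0):
    """Scale per-node design figures to a whole machine (node hardware only)."""
    peak_gflops_per_node = flops_per_cycle * clock_mhz / 1000.0
    return {
        "power_kW": nodes * watts_per_node / 1000.0,
        "volume_ft3": nodes * in3_per_node / 1728.0,   # 12**3 in^3 per ft^3
        "peak_Tflops": nodes * peak_gflops_per_node / 1000.0,
    }


if __name__ == "__main__":
    for mhz in (500.0, 450.0):
        t = machine_totals(10_240, clock_mhz=mhz)
        print(f"{mhz:.0f} MHz: {t['power_kW']:.1f} kW, "
              f"{t['volume_ft3']:.1f} ft^3, {t['peak_Tflops']:.2f} Tflops peak")
```

For a 10,240-node machine this gives about 102 kW and roughly 89 ft³ of node hardware, with 10.24 Tflops peak at 500 MHz or about 9.2 Tflops at 450 MHz.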

Complete Processor Node on a Single QCDOC Chip

Reliability

  • ECC on external DIMM and EDRAM
  • Automatic recovery from single-bit communications errors
  • Running check sum on both ends of each serial channel
  • Number of components similar to QCDSP: 1-2 failures/week expected on a 10K-node machine
  • Soft error rate estimated at <1/week on 10K nodes (low-α lead in solder balls)
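The slides do not say which checksum the serial channels use. The sketch below assumes a simple 32-bit additive running sum kept independently by the sender and the receiver; the class and function names are hypothetical. One plausible use is to compare the two sums at the end of a run: agreement means the words delivered matched the words sent, while a mismatch flags corruption that the automatic error recovery did not catch.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit words and a 32-bit running sum


class ChannelEnd:
    """One end of a serial link; accumulates a running checksum of every word."""

    def __init__(self) -> None:
        self.checksum = 0

    def record(self, word: int) -> None:
        # Add the word modulo 2**32 so the sum stays within 32 bits.
        self.checksum = (self.checksum + word) & MASK32


def link_consistent(sender: ChannelEnd, receiver: ChannelEnd) -> bool:
    """Compare the two running sums, e.g. at the end of a job."""
    return sender.checksum == receiver.checksum


if __name__ == "__main__":
    tx, rx = ChannelEnd(), ChannelEnd()
    for word in (0x12345678, 0x9ABCDEF0, 0x0F0F0F0F):
        tx.record(word)
        rx.record(word)              # clean transfer
    print(link_consistent(tx, rx))

    tx.record(0x00000010)
    rx.record(0x00000010 ^ 0x1)      # simulate one bit flipped in transit
    print(link_consistent(tx, rx))
```

A single flipped bit changes a word by ±2^k with k < 32, so it always changes a 32-bit additive sum; multiple compensating errors could in principle cancel, which a stronger checksum would guard against.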

Machine Overview


Daughter Boards

[Photos: Daughter Boards; Daughter Boards Testing]

Motherboards


Single Motherboard Cabinet


128-node Machine


8 Motherboard Backplane


8 Motherboard Cabinet


Serial Communications Simulated and Tested

Hardware Status

  • ASIC tape-out: April 8, 2003
  • First five chips: June 5, 2003
  • First five daughter cards: June 27, 2003
  • Basic functionality verified: July 9, 2003
  • First three motherboards: September 2003
  • Full motherboard functioning: November 3, 2003
  • Two motherboards functioning: November 18, 2003
  • ASIC sign-off: November 2003
  • Final daughterboard sign-off: March 1, 2004
  • Backplane sign-off: March 23, 2004

Completion Schedule

  • Final motherboard sign-off: April 7, 2004
  • 384-node machine: April 9, 2004
  • First water-cooled cabinet: May 17, 2004
  • Two 2048-node machines: May 31, 2004
  • Two 10,240-node machines: August 31, 2004 [RIKEN, UKQCD]
  • Third 10,240-node machine: October 31, 2004 [U.S. LQCD Collaboration]

Performance and Cost

  • Initial target:
      - Processor frequency: 500 MHz
      - Total cost per node: $500
      - Price/performance:
        $500 / (2 flops/cycle × 500 MHz × 0.50 efficiency) = $1/Mflops

  • Present status:
      - Processor frequency: 450 MHz
      - Total cost per node: $400
      - Price/performance:
        $400 / (2 flops/cycle × 450 MHz × 0.50 efficiency) = $0.89/Mflops

  • The goal of $1/Mflops should be achieved, and possibly exceeded.
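Both price/performance lines divide the cost per node by its sustained speed, where sustained Mflops = flops/cycle × clock (MHz) × efficiency. A minimal Python sketch that reproduces the two figures; the function name is hypothetical, and the 0.50 efficiency is the slide's assumed sustained fraction of peak.

```python
def dollars_per_mflops(cost_per_node, clock_mhz, flops_per_cycle=2, efficiency=0.50):
    """Price/performance in dollars per sustained Mflops."""
    # Sustained Mflops per node = flops/cycle x clock (MHz) x efficiency.
    sustained_mflops = flops_per_cycle * clock_mhz * efficiency
    return cost_per_node / sustained_mflops


if __name__ == "__main__":
    print(f"Initial target: ${dollars_per_mflops(500, 500):.2f}/Mflops")  # $1.00/Mflops
    print(f"Present status: ${dollars_per_mflops(400, 450):.2f}/Mflops")  # $0.89/Mflops
```

Because efficiency sits in the denominator, halving the sustained efficiency doubles the cost per Mflops, so reaching the goal depends as much on sustained efficiency as on clock speed or node price.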


Last Modified: February 1, 2008

