General Information

High Throughput Computing (HTC) Mode on NY Blue/P

Overview

In HTC mode, each compute node of a partition runs one, two, or four independent jobs, depending on whether the partition was booted in SMP, DUAL, or VN mode, respectively. HTC mode executables cannot use MPI, and the jobs must not need to exchange data with one another. HTC focuses on pushing a large number of relatively short jobs through the system. This differs from the usual high performance computing use of Blue Gene, in which a single (MPI) job runs the same executable on every node in the partition.
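The relationship between boot mode and job count can be sketched as follows. This is an illustrative Python snippet only, not part of the NY Blue/P tooling; the names PROCESSES_PER_NODE and max_concurrent_jobs are hypothetical.

```python
# Independent HTC jobs per compute node for each boot mode,
# as described in the overview (SMP: 1, DUAL: 2, VN: 4).
PROCESSES_PER_NODE = {"SMP": 1, "DUAL": 2, "VN": 4}


def max_concurrent_jobs(compute_nodes: int, mode: str) -> int:
    """Return how many HTC jobs can run at once on a partition.

    Hypothetical helper: compute_nodes is the number of compute nodes
    in the reserved partition, mode is the mode it was booted in.
    """
    if compute_nodes < 0:
        raise ValueError("compute_nodes must be non-negative")
    try:
        per_node = PROCESSES_PER_NODE[mode.upper()]
    except KeyError:
        raise ValueError(f"unknown mode: {mode!r}") from None
    return compute_nodes * per_node


if __name__ == "__main__":
    # 32 compute nodes is an arbitrary partition size chosen for illustration.
    for mode in ("SMP", "DUAL", "VN"):
        print(mode, max_concurrent_jobs(32, mode))
```

For a 32-node partition this gives 32, 64, and 128 simultaneous jobs in SMP, DUAL, and VN mode respectively.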

For additional information, see the "High Throughput Computing (HTC) Paradigm" chapter of the IBM Redbook "IBM System Blue Gene Solution: Blue Gene/P Application Development". The "Execution Process Modes" chapter of that Redbook describes the VN, DUAL, and SMP modes.

Another source of HTC information is the "High Throughput Computing" chapter of the IBM Redbook "IBM System Blue Gene Solution: Blue Gene/P System Administration".

Procedure for Running HTC Mode jobs on NY Blue/P

  1. On fenp, copy /bgsys/local/samplecodes/htc/submit_htc.pl to the directory from which you will submit your HTC mode job.
  2. Contact Len Slatest, who will reserve an HTC partition for your job. Tell him which of the three modes to boot it in: VN, DUAL, or SMP.
  3. Len will also tell you what modifications you must make to your copy of submit_htc.pl, so that the HTC mode jobs it submits reference the exact racks, midplanes, node cards, compute cards, and processors that he has reserved for you.
  4. You may have to make further changes to submit_htc.pl depending on the kind of HTC run you wish to make. For example, if you do not wish to run the same executable on each of the compute nodes in the partition, submit_htc.pl will have to be modified accordingly.
  5. Make sure the Unix owner permissions on your copy of submit_htc.pl include read and execute (rx), then run the script: ./submit_htc.pl
  6. For an illustration of this procedure, see the examples below.
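The permission check in step 5 can also be done programmatically. The following is a minimal Python sketch, assuming a POSIX file system; the function name owner_can_read_and_execute is hypothetical, and running ls -l on the file conveys the same information.

```python
import os
import stat


def owner_can_read_and_execute(path: str) -> bool:
    """Return True if the file's owner permission bits include both r and x."""
    mode = os.stat(path).st_mode
    needed = stat.S_IRUSR | stat.S_IXUSR  # owner read + owner execute
    return (mode & needed) == needed


if __name__ == "__main__":
    script = "submit_htc.pl"  # your copy of the sample script
    if os.path.exists(script):
        print(script, "owner rx:", owner_can_read_and_execute(script))
    else:
        print(script, "not found in the current directory")
```

If the check returns False, chmod u+rx submit_htc.pl adds the missing bits.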

Background Notes

In submit_htc.pl, R is used to denote the rack, and M is used to denote the rack's midplane (1 for the midplane in the upper half of the rack, 0 for the lower midplane).

N denotes the node card. A node card holds 32 compute nodes, and each NY Blue/P compute node has four processor cores, so a node card contains 128 processor cores.

J denotes the compute card, i.e. the compute node. Each compute card contains four processor cores. Locations J00 through J03 are I/O cards, so loops over J always begin with J04.

C denotes the processor core: 00, 01, 02, or 03.
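To make the naming scheme concrete, here is a small Python sketch that builds and parses location strings of the form R00-M1-N08-J04-C00. It is illustrative only: the function names are hypothetical, the rack is treated as an opaque two-character label, and the range checks cover only the limits stated above (midplane 0 or 1, compute cards J04 through J35, cores 00 through 03).

```python
import re

# Location strings look like R00-M1-N08-J04-C00.
_LOCATION_RE = re.compile(
    r"^R([0-9A-Za-z]{2})-M([01])-N(\d{2})-J(\d{2})-C(\d{2})$"
)


def format_location(rack: str, midplane: int, node_card: int,
                    compute_card: int, core: int) -> str:
    """Build a location string from its components."""
    if midplane not in (0, 1):
        raise ValueError("midplane must be 0 (lower) or 1 (upper)")
    if node_card < 0:
        raise ValueError("node_card must be non-negative")
    if not 4 <= compute_card <= 35:
        raise ValueError("compute cards run from J04 to J35")
    if not 0 <= core <= 3:
        raise ValueError("core must be 0, 1, 2, or 3")
    return f"R{rack}-M{midplane}-N{node_card:02d}-J{compute_card:02d}-C{core:02d}"


def parse_location(text: str) -> dict:
    """Split a location string back into its components."""
    match = _LOCATION_RE.match(text)
    if match is None:
        raise ValueError(f"not a valid location: {text!r}")
    rack, midplane, node_card, compute_card, core = match.groups()
    return {
        "rack": rack,
        "midplane": int(midplane),
        "node_card": int(node_card),
        "compute_card": int(compute_card),
        "core": int(core),
    }


if __name__ == "__main__":
    loc = format_location("00", 1, 8, 4, 0)
    print(loc)
    print(parse_location(loc))
```

Here format_location("00", 1, 8, 4, 0) yields R00-M1-N08-J04-C00: rack R00, upper midplane, node card 08, compute card 04, core 00.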

View Examples

Example 1: Suppose you wish to run the executable built from the following trivial Fortran code, multiply.f, in HTC mode on all 128 processor cores of one node card (32 compute nodes) in VN mode:

        program main
        x = 4.2
        y = 7.8
        z = x*y
        print *,'z=',z
        end

This first example is a bit silly: why would anyone run the exact same node-independent computation on multiple nodes? It serves only as an introduction; Example 2 below is more sensible.

Continuing with Example 1, first compile the source file to create the executable:
mpif77 -o multiply multiply.f

Then, ask Len to reserve a VN mode HTC partition for you that covers one node card (32 compute nodes, 128 processor cores).

Suppose Len then told you he had reserved for you the following:

Compute cards 04 through 35 inclusive and processor cores 00 through 03 inclusive, of rack R00, midplane 1, node card 08

Then you would modify submit_htc.pl in your fenp directory to be:

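#!/usr/bin/perl
# Submit one HTC job per reserved processor core.
my $debug = 1;           # set to 0 to suppress diagnostic output
my $mode  = "VN";        # mode the partition was booted in
my $exe   = "multiply";  # executable to run at every location
my $cwd   = `pwd`;
chomp($cwd);

# Build the list of locations: compute cards J04-J35, cores C00-C03.
my @locations;
for (my $cc = 4; $cc <= 35; $cc++) {
    for (my $proc = 0; $proc <= 3; $proc++) {
        push @locations, sprintf("R00-M1-N08-J%02d-C%02d", $cc, $proc);
    }
}

# Launch each job in the background, with its output sent to <location>.out.
foreach my $loc (@locations) {
    print "location = $loc\n" if $debug;
    my $submit_cmd = "submit --cwd $cwd --mode $mode --location $loc -exe $exe > $loc.out 2>&1 &";
    print "$submit_cmd\n" if $debug;
    my $result = `$submit_cmd`;
}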

Notice that the modified script specifies that the name of the executable to run in HTC mode is multiply and that the partition was booted up in VN mode.
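The script above runs the same executable at every location. As step 4 of the procedure notes, running different executables requires modifying the script. One possible approach is sketched below in Python, purely for illustration: it pairs each location with an executable from a list, cycling through the list if it is shorter than the list of locations, and builds the command lines without running them. The function names and the executable names in the usage section are hypothetical; the submit options are copied unchanged from the script above.

```python
from itertools import cycle, product


def build_locations(prefix="R00-M1-N08", cards=range(4, 36), cores=range(4)):
    """List every location: compute cards J04-J35, cores C00-C03."""
    return [f"{prefix}-J{cc:02d}-C{core:02d}"
            for cc, core in product(cards, cores)]


def build_submit_commands(locations, executables, cwd, mode="VN"):
    """Pair each location with an executable and return the command lines.

    Executables are reused in round-robin order when there are fewer
    executables than locations. Nothing is executed here.
    """
    executables = list(executables)
    if not executables:
        raise ValueError("at least one executable is required")
    return [
        f"submit --cwd {cwd} --mode {mode} --location {loc} "
        f"-exe {exe} > {loc}.out 2>&1 &"
        for loc, exe in zip(locations, cycle(executables))
    ]


if __name__ == "__main__":
    locations = build_locations()
    # "job_a" and "job_b" are placeholder executable names.
    commands = build_submit_commands(locations, ["job_a", "job_b"], "/path/to/run")
    print(len(commands), "commands")
    for command in commands[:2]:
        print(command)
```

Printing the commands before launching anything is a cheap way to confirm that the location list matches the reservation.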

When executed, this script would produce 128 output files, one per processor core, each containing the same single line:

z= 32.76000

Note also that because VN mode was used, four processes ran on each compute node J:

-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J04-C00.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J04-C01.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J04-C02.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J04-C03.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J05-C00.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J05-C01.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J05-C02.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J05-C03.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J06-C00.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J06-C01.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J06-C02.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J06-C03.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J07-C00.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J07-C01.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J07-C02.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J07-C03.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J08-C00.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J08-C01.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J08-C02.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J08-C03.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J09-C00.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J09-C01.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J09-C02.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J09-C03.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J10-C00.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J10-C01.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J10-C02.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J10-C03.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J11-C00.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J11-C01.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J11-C02.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J11-C03.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J12-C00.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J12-C01.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J12-C02.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J12-C03.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J13-C00.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J13-C01.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J13-C02.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J13-C03.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J14-C00.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J14-C01.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J14-C02.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J14-C03.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J15-C00.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J15-C01.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J15-C02.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J15-C03.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J16-C00.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J16-C01.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J16-C02.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J16-C03.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J17-C00.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J17-C01.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J17-C02.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J17-C03.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J18-C00.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J18-C01.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J18-C02.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J18-C03.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J19-C00.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J19-C01.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J19-C02.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J19-C03.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J20-C00.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J20-C01.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J20-C02.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J20-C03.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J21-C00.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J21-C01.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J21-C02.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J21-C03.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J22-C00.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J22-C01.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J22-C02.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J22-C03.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J23-C00.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J23-C01.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J23-C02.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J23-C03.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J24-C00.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J24-C01.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J24-C02.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J24-C03.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J25-C00.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J25-C01.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J25-C02.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J25-C03.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J26-C00.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J26-C01.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J26-C02.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J26-C03.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J27-C00.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J27-C01.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J27-C02.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J27-C03.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J28-C00.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J28-C01.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J28-C02.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J28-C03.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J29-C00.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J29-C01.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J29-C02.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J29-C03.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J30-C00.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J30-C01.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J30-C02.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J30-C03.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J31-C00.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J31-C01.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J31-C02.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J31-C03.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J32-C00.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J32-C01.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J32-C02.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J32-C03.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J33-C00.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J33-C01.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J33-C02.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J33-C03.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J34-C00.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J34-C01.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J34-C02.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J34-C03.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J35-C00.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J35-C01.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J35-C02.out
-rw-r--r-- 1 johndoe users 19 2008-11-20 13:50 R00-M1-N08-J35-C03.out
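After a run like this, it can be useful to confirm that every expected output file is present and that all of them have the same content. The following Python sketch shows one way to do that; the function name check_outputs and the glob pattern are assumptions chosen to match the file names in the listing above.

```python
from pathlib import Path


def check_outputs(directory, expected_count=128, pattern="R*-M*-N*-J*-C*.out"):
    """Count the HTC output files in a directory and compare their contents.

    Returns a dict with the number of files found, whether that number
    matches expected_count, and whether all files have identical content.
    """
    files = sorted(Path(directory).glob(pattern))
    distinct_contents = {f.read_text() for f in files}
    return {
        "files": len(files),
        "count_ok": len(files) == expected_count,
        "identical": len(distinct_contents) == 1,
    }


if __name__ == "__main__":
    # Check the current directory, i.e. where submit_htc.pl was run.
    print(check_outputs("."))
```

A missing file usually means the corresponding job did not start; a file with different content is worth opening, since it may hold an error message rather than the program's output.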

Example 2:

On fenp, copy the files htc_example1.c and Makefile.htc from /bgsys/local/samplecodes/htc to your directory, then build the executable:

make -f Makefile.htc htc_example1

Then run the resulting executable, htc_example1, in HTC mode, following the same procedure as in Example 1.

The executable will check whether the node it is running on was booted in HTC mode.
