SPEC MPI2007: Frequently Asked Questions

Updated for SPEC MPI2007 (new features are highlighted)

Last updated: 22 Mar 2007 cgp
(To check for possible updates to this document, please see http://www.spec.org/mpi2007/Docs/.)

Introduction

This document answers Frequently Asked Questions and provides background information about the SPEC MPI2007 benchmark suite. SPEC hopes that this material will help you understand what the benchmark suite can, and cannot, provide; and that it will help you make efficient use of the product.

Overall, SPEC designed SPEC MPI2007 to provide a comparative measure of compute intensive MPI-parallel performance across the widest practical range of cluster and SMP hardware. The product consists of source code benchmarks that are developed from real user applications. These benchmarks depend on the processors, memory, interconnect, compiler and file server on the tested system.

This document is organized as a series of questions and answers.

Background

Q1. What is SPEC?

Q2. What is a benchmark?

Q3. Should I benchmark my own application?

Q4. If not my own application, then what?

Scope

Q5. What does SPEC MPI2007 measure?

Q6. Why use SPEC MPI2007?

Q7. What are the limitations of SPEC MPI2007?

Overview of usage

Q8. What is included in the SPEC MPI2007 package?

Q9. What does the user of the SPEC MPI2007 suite have to provide?

Q10. What are the basic steps in running the benchmarks?

Q11. What source code is provided? What exactly makes up these suites?

Q12.A. How do I submit my results?

Q12.B. How do I edit a submitted report?

Metrics

Q13. What metrics can be measured?

Q14. What is the difference between a "base" metric and a "peak" metric?

Q15. What are the different data-set sizes?

Q16. Which SPEC MPI2007 metric should be used to compare performance?

MPI2007 vs. other SPEC suites

Q17. SPEC CPU2006 and OMP2001 are already available. Why create SPEC MPI2007? Will it show anything different?

Q18. What happens to SPEC HPC2002 after SPEC MPI2007 is released?

Q19. Is there a way to translate measurements of other suites to SPEC MPI2007 results or vice versa?

Benchmark selection

Q20. Some of the benchmark names sound familiar; are these comparable to other programs?

Q21. What criteria were used to select the benchmarks?

Q22. Aren't some of the SPEC MPI2007 benchmarks in SPEC CPU2006 and HPC2002? How are they different?

Q23. Why were most of the benchmarks not carried over from CPU2006, HPC2002 or OMP2001?

Miscellaneous

Q24. Why does SPEC use a reference machine? What machine is used for SPEC MPI2007?

Q25.A. How long does it take to run the SPEC MPI2007 benchmark suites on the reference platform?

Q25.B. How long does it take to run the SPEC MPI2007 benchmark suites on my platform?

Q26. What if the tools cannot be run or built on a system? Can the benchmarks be run manually?

Q27. Where are SPEC MPI2007 results available?

Q28. Can SPEC MPI2007 results be published outside of the SPEC web site? Do the rules still apply?

Q29. How do I contact SPEC for more information or for technical support?

Q30. Now that I have read this document, what should I do next?

Q1. What is SPEC?

SPEC is the Standard Performance Evaluation Corporation. SPEC is a non-profit organization whose members include computer hardware vendors, software companies, universities, research organizations, systems integrators, publishers and consultants. SPEC's goal is to establish, maintain and endorse a standardized set of relevant benchmarks for computer systems. Although no one set of tests can fully characterize overall system performance, SPEC believes that the user community benefits from objective tests which can serve as a common reference point.

Q2. What is a benchmark?

A benchmark is "a standard of measurement or evaluation" (Webster’s II Dictionary). A computer benchmark is typically a computer program that performs a strictly defined set of operations - a workload - and returns some form of result - a metric - describing how the tested computer performed. Computer benchmark metrics usually measure speed: how fast was the workload completed; or throughput: how many workload units per unit time were completed. Running the same computer benchmark on multiple computers allows a comparison to be made.

Q3. Should I benchmark my own application?

Ideally, the best comparison test for systems would be your own application with your own workload. Unfortunately, it is often impractical to get a wide base of reliable, repeatable and comparable measurements for different systems using your own application with your own workload. Problems might include generation of a good test case, confidentiality concerns, difficulty ensuring comparable conditions, time, money, or other constraints.

Q4. If not my own application, then what?

You may wish to consider using standardized benchmarks as a reference point. Ideally, a standardized benchmark will be portable, and may already have been run on the platforms that you are interested in. However, before you consider the results you need to be sure that you understand the correlation between your application/computing needs and what the benchmark is measuring. Are the benchmarks similar to the kinds of applications you run? Do the workloads have similar characteristics? Based on your answers to these questions, you can begin to see how the benchmark may approximate your reality.

Note: A standardized benchmark can serve as a reference point. Nevertheless, when you are doing vendor or product selection, SPEC does not claim that any standardized benchmark can replace benchmarking your own actual application.

Q5. What does SPEC MPI2007 measure?

SPEC MPI2007 focuses on performance of compute intensive applications using the Message-Passing Interface (MPI) standard 2.1, which means these benchmarks emphasize the performance of:

It is important to remember the contribution of all these components. SPEC MPI performance intentionally depends on more than just the processor.

SPEC MPI2007 is not intended to stress other computer components such as the operating system, graphics, or the I/O system. They may have an effect in some cases, if the component is exceptionally slow or if there is an unusually small resource bound, e.g. paging delays due to a slow disk or too little memory. Note that there are many other SPEC benchmarks, including benchmarks that specifically focus on graphics, distributed Java computing, webservers, and network file systems.

Q6. Why use SPEC MPI2007?

SPEC MPI2007 provides a comparative measure of MPI-parallel, floating point, compute intensive performance. If this matches with the type of workloads you are interested in, SPEC MPI2007 provides a good reference point.

Other advantages to using SPEC MPI2007 include:

Q7. What are the limitations of SPEC MPI2007?

As described above, the ideal benchmark for vendor or product selection would be your own workload on your own application. Please bear in mind that no standardized benchmark can provide a perfect model of the realities of your particular system and user community.

Q8. What is included in the SPEC MPI2007 package?

SPEC provides the following on the SPEC MPI2007 media (DVD):

Q9. What does the user of the SPEC MPI2007 suite have to provide?

Briefly, you need an SMP or cluster system running Unix, Linux or Microsoft Windows, with compilers; a minimum of 1 GB of free memory per rank, although more may be required, as described in system-requirements.html; and 8 GB of free disk space. For cluster configurations, the file system will need to be shared across the nodes.
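The minimums above can be turned into a quick pre-flight check. The following is only a sketch: the function names are hypothetical, the two thresholds are the figures quoted in this answer, and the free-memory value is assumed to be supplied by the caller as the total free memory available to all ranks. Individual workloads may need more than these minimums, so passing this check does not guarantee a successful run.

```python
import shutil

# Figures quoted in this FAQ; real workloads may need more memory.
MIN_MEM_GB_PER_RANK = 1.0
MIN_DISK_GB = 8.0


def free_disk_gb(path="."):
    """Free space, in GB, on the file system that holds `path`."""
    return shutil.disk_usage(path).free / 1024 ** 3


def check_requirements(ranks, free_mem_gb, free_disk_space_gb):
    """Return a list of problems; an empty list means the minimums are met.

    `free_mem_gb` is assumed to be the total free memory, in GB,
    available to all `ranks` MPI ranks together.
    """
    problems = []
    needed_mem_gb = ranks * MIN_MEM_GB_PER_RANK
    if free_mem_gb < needed_mem_gb:
        problems.append(
            f"need at least {needed_mem_gb:.0f} GB free memory "
            f"for {ranks} ranks, found {free_mem_gb:.1f} GB"
        )
    if free_disk_space_gb < MIN_DISK_GB:
        problems.append(
            f"need at least {MIN_DISK_GB:.0f} GB free disk space, "
            f"found {free_disk_space_gb:.1f} GB"
        )
    return problems
```

For example, check_requirements(16, 32.0, free_disk_gb(".")) checks a planned 16-rank run against 32 GB of free memory and the free space in the current directory.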

Note: links to SPEC MPI2007 documents on this web page assume that you are reading the page from a directory that also contains the other SPEC MPI2007 documents. If by some chance you are reading this web page from a location where the links do not work, try accessing the referenced documents at one of the following locations:

Q10. What are the basic steps in running the benchmarks?

Installation and use are covered in detail in the SPEC MPI2007 User Documentation. The basic steps are:

If you wish to generate results suitable for quoting in public, you will need to carefully study and adhere to the run rules.

Q11. What source code is provided? What exactly makes up these suites?

MPI2007 is composed of MPI-parallel compute-intensive applications provided as source code. 2 are written in C, 6 are written in Fortran, 4 combine C and Fortran, and 1 is written in C++. The benchmarks are:

Benchmark       Language    Application domain
104.milc        C           Physics: Quantum Chromodynamics (QCD)
107.leslie3d    Fortran     Computational Fluid Dynamics (CFD)
113.GemsFDTD    Fortran     Computational Electromagnetics (CEM)
115.fds4        Fortran     Computational Fluid Dynamics (CFD)
121.pop2        C/Fortran   Climate Modeling
122.tachyon     C           Graphics: Parallel Ray Tracing
126.lammps      C++         Molecular Dynamics Simulation
127.wrf2        C/Fortran   Weather Prediction
128.GAPgeofem   C/Fortran   Heat Transfer using Finite Element Methods (FEM)
129.tera_tf     Fortran     3D Eulerian Hydrodynamics
130.socorro     C/Fortran   Molecular Dynamics using Density-Functional Theory (DFT)
132.zeusmp2     Fortran     Physics: Computational Fluid Dynamics
137.lu          Fortran     Matrix Decomposition

Descriptions of the benchmarks, with reference to papers, web sites, and so forth, can be found in the individual benchmark descriptions (click the links above). Some of the benchmarks also provide additional details, such as documentation from the original program, in the nnn.benchmark/Docs directories in the SPEC benchmark tree.

The numbers used as part of the benchmark names provide an identifier to help distinguish programs from one another. For example, some programs in SPEC CPU2006 derive from the same sources and need to be distinguished from the MPI2007 versions. Note: even if a program has the same name as in another suite - for example, 127.wrf2 vs. 361.wrf_m from the HPC2002 suite - the updated workload and updated source code mean that it is not valid to compare the SPEC MPI2007 result to the result with the older SPEC HPC2002 benchmark.

Q12.A. How do I submit my results?

The runspec utility leaves a file with a path result/MPIM2007.*.rsf. If the run was reportable, you can mail this file (or attach it to a message) to [email protected].
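As a convenience, the result files matching the path pattern above can be listed with a few lines of Python. This is a minimal sketch: the function name is hypothetical, and top_dir is assumed to be the directory that contains the result/ subdirectory. Whether a given file comes from a reportable run still has to be checked separately.

```python
import glob
import os


def find_rsf_files(top_dir="."):
    """List result/MPIM2007.*.rsf files under top_dir, oldest first."""
    pattern = os.path.join(top_dir, "result", "MPIM2007.*.rsf")
    # Sort by modification time so the most recent run comes last.
    return sorted(glob.glob(pattern), key=os.path.getmtime)
```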

Q12.B. How do I edit a submitted report?

The SPEC Private website http://pro.spec.org/private/hpg/submit/mpi2007/UnderReview contains links to each report that is being reviewed, in various formats. You will need to log on with your company password. Download the sub file, edit it, and mail it to [email protected]. Note that the file will not be accepted if you edit anything below the line

# =============== do not edit below this point ===================
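Before mailing an edited file back, it can help to confirm that nothing at or below that line has changed. The sketch below compares the protected part of the downloaded file with the protected part of the edited copy. The function names are hypothetical, the marker is matched on its wording rather than on the exact run of "=" characters, and this is not a description of the check that SPEC itself applies.

```python
MARKER = "do not edit below this point"


def protected_part(text):
    """Return the marker line and every line after it, or None if absent."""
    lines = text.splitlines()
    for index, line in enumerate(lines):
        if MARKER in line:
            return lines[index:]
    return None


def protected_part_unchanged(original_text, edited_text):
    """True if both texts contain the marker and match from there onward."""
    before = protected_part(original_text)
    after = protected_part(edited_text)
    return before is not None and before == after
```

Reading both files into strings and calling protected_part_unchanged(original, edited) returns False if the marker is missing or if any line after it differs.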

Q13. What metrics can be measured?

After the benchmarks are run on the system under test (SUT), a ratio for each of them is calculated using the run time on the SUT and a SPEC-determined reference time. From these ratios, the following metrics are calculated:

Larger data sets will be added later on, with the metrics

In all cases, a higher score means "better performance" on the given workload.
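The calculation can be sketched as follows. This is an illustration under stated assumptions, not the SPEC tools' implementation: the ratio is taken as reference time divided by measured time (so that a faster run gives a higher score), the overall metric is taken as the geometric mean of the per-benchmark ratios, and all times below are made-up values rather than published reference times. The function names are hypothetical.

```python
import math


def ratio(reference_seconds, measured_seconds):
    """Per-benchmark ratio: higher means the tested system was faster."""
    return reference_seconds / measured_seconds


def overall_score(ratios):
    """Geometric mean of the per-benchmark ratios."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical (reference, measured) times in seconds; not SPEC data.
times = {
    "104.milc": (1500.0, 300.0),
    "107.leslie3d": (5000.0, 1250.0),
}
ratios = [ratio(ref, run) for ref, run in times.values()]
score = overall_score(ratios)
```

The geometric mean is used here instead of the arithmetic mean so that no single benchmark with a very large ratio dominates the overall figure.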

Q14. What is the difference between a "base" metric and a "peak" metric?

In order to provide comparisons across different computer hardware, SPEC provides the benchmarks as source code. Thus, in order to run the benchmarks, they must be compiled. There is agreement that the benchmarks should be compiled the way users compile programs. But how do users compile programs?

In addition to the above, a wide range of other types of usage models could also be imagined, ranging in a continuum from -Odebug at the low end, to inserting directives and/or re-writing the source code at the high end. Which points on this continuum should SPEC MPI2007 allow?

SPEC recognizes that any point chosen from that continuum might seem arbitrary to those whose interests lie at a different point. Nevertheless, choices must be made.

For MPI2007, SPEC has chosen to allow two types of compilation:

Note that options allowed under the base metric rules are a subset of those allowed under the peak metric rules. A legal base result is also legal under the peak rules but a legal peak result is NOT necessarily legal under the base rules.

A full description of the distinctions and required guidelines can be found in the SPEC MPI2007 Run and Reporting Rules.

Q15. What are the different data set sizes?

The MPI2007 suites include test and train data sets for the purpose of validating the correctness of the compile and the run environment. Note that the term train is a holdover from previous suites that allowed feedback-directed optimization using measurements from training runs. For the purpose of MPI2007 it refers to a larger testing data set.

The mref data set is the largest and is used in reportable runs. Since it runs longer, it gives a clearer picture of system performance. In the future it will be supplemented with even larger lref and xref data sets.

Q16. Which SPEC MPI2007 metric should be used to compare performance?

A base measurement is required in every reportable run. The corresponding peak measurement is optional, but it is useful for showing the highest performance your system can achieve.

When the larger lref and xref data sets become available, you will have to decide which to use. The larger data sets will scale to larger machines and so give a more accurate picture of the performance on large problems. On the other hand, more results will have been reported with the mref data set size, so it will provide a more common reference point of comparison.

Q17. SPEC CPU2006 and OMP2001 are already available. Why create SPEC MPI2007? Will it show anything different?

Technology is always changing. As the technology progresses, the benchmarks have to adapt to this. SPEC needed to address the following issues:

Application type:
Many native MPI-parallel applications have been developed for cluster systems and are widely used in industry and academia. SPEC feels that standard benchmarks need to be available for comparing cluster systems for the same reason that SPEC CPU was developed for measuring serial CPU performance and SPEC OMP was developed for measuring SMP performance.

Moving target:
OMP2001 has been available for SMP systems for six years. In the meantime, clusters have become increasingly popular as a low-cost and flexible alternative for configuring parallel systems. The HPC2002 suite allows MPI parallelism but is not an adequate bridge because it contains only 3 benchmarks, has too short a run time, and is not as strictly standardized as MPI2007 or OMP2001.

Application size:
As applications grow in complexity and size, older suites become less representative of what runs on current systems. For MPI2007, SPEC included some programs with both larger resource requirements and more complex source code than previous suites.

Run-time:
As of spring 2007, many of the OMP2001 benchmarks finish in less than 5 minutes on leading-edge processors and systems. Small changes or fluctuations in system state or measurement conditions can therefore account for a significant percentage of the observed run time. SPEC chose to make run times for the CPU2006 and MPI2007 benchmarks longer, to allow for future performance gains and to keep this from becoming an issue during the lifetime of the suites.

Q18. What happens to SPEC HPC2002 after SPEC MPI2007 is released?

The HPC2002 suite will be retired at the point that MPI2007 is released. No further results will be accepted for publication. The MPI2007 suite does a better job of measuring and standardizing MPI performance.

Q19. Is there a way to translate measurements of other suites to SPEC MPI2007 results or vice versa?

There is no formula for converting CPU2006, OMP2001 or any other measurements to MPI2007 results and vice versa; they are different products. We expect some correlation between any two given suites, i.e. machines with higher results with one suite tend to have higher results with another suite, but there is no universal formula for all systems.

SPEC encourages SPEC licensees to publish MPI2007 numbers on older platforms to provide a historical perspective on performance.

Q20. Some of the benchmark names sound familiar; are these comparable to other programs?

Many of the SPEC benchmarks are derived from publicly available application programs. The individual benchmarks in this suite may be similar, but are NOT identical to benchmarks or programs with similar names which may be available from sources other than SPEC. In particular, SPEC has invested significant effort to improve portability and to minimize hardware dependencies, to avoid unfairly favoring one hardware platform over another. For this reason, the application programs in this distribution may perform differently from commercially available versions of the same application.

Therefore, it is not valid to compare SPEC MPI2007 benchmark results with anything other than other SPEC MPI2007 benchmark results.

Q21. What criteria were used to select the benchmarks?

In the process of selecting applications to use as benchmarks, SPEC considered the following criteria:

Q22. Aren't some of the SPEC MPI2007 benchmarks in SPEC CPU2006 and HPC2002? How are they different?

Some of the benchmarks in CPU2006 and MPI2007 derive from the same source codes. CPU2006 benchmarks are serialized versions of the applications, while the corresponding MPI2007 benchmarks preserve the original MPI-parallel nature of the applications. They all have been given different workloads to better exercise a large parallel machine, and in some cases have additional source-code modifications for parallel scalability and portability. Therefore, for example, results with the CPU2006 benchmark 433.milc may be strikingly different from results with the MPI2007 benchmark 104.milc.

One benchmark, 127.wrf2, is derived from a more current version of the WRF source than is 361.wrf_m from the earlier HPC2002 suite. Its larger workload is better for exercising the performance of modern parallel systems. Further, the HPC2002 rules allow the use of OMP parallelism, while this capability has been removed from the MPI2007 source code. So, again, results with the HPC2002 benchmark 361.wrf_m may be strikingly different from results with the MPI2007 benchmark 127.wrf2.

Q23. Why were most of the benchmarks not carried over from CPU2006, HPC2002 or OMP2001?

Many applications in the CPU2006 suite were not designed to run with MPI parallelism, so they would not realistically measure MPI performance. Benchmarks from the HPC2002 and OMP2001 suites were left out for at least one of the following reasons:

  1. they are older applications that have been replaced by more general, more exact, or more efficient applications that solve the same problem,
  2. they are not native MPI-parallel applications,
  3. they do not scale well to the sizes of modern systems,
  4. it was not possible to create a longer-running or more robust workload for them, or
  5. SPEC felt that they did not add significant performance information compared to the other benchmarks under consideration.

Q24. Why does SPEC use a reference machine? What machine is used for SPEC MPI2007?

SPEC uses a reference machine to normalize the performance metrics used in the MPI2007 suites. Each benchmark is run and measured on this machine to establish a reference time for that benchmark. These times are then used in the SPEC calculations.

As a reference machine, SPEC/MPI2007 uses an 8-node cluster of Celestica A2210 (AMD "Serenade") systems connected by a TCP (GigE) Interconnect. Each A2210 system contains two 940 sockets, each holding one single-core AMD Opteron 848 processor chip running at 2200 MHz with 1 MB I+D L2 cache, plus 4 GB of DDR 333 memory per socket. The MPI implementation is MPICH2 version 1.0.3 running on SLES 9 SP3 Linux OS with Pathscale 2.5 compilers.

Note that this machine differs dramatically from the ones used for SPEC/OMP and SPEC/CPU, since the MPI2007 suites represent a fundamentally different class of applications.

Note also that when comparing any two systems measured with MPI2007, their performance relative to each other would remain the same even if a different reference machine were used. This is a consequence of the mathematics involved in calculating the individual and overall (geometric mean) metrics.
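A small numerical check illustrates why the reference machine drops out of such comparisons. The sketch assumes that each per-benchmark ratio is the reference time divided by the measured time and that the overall metric is the geometric mean of those ratios. All times are made-up values, and the function names are hypothetical.

```python
import math


def geometric_mean(values):
    return math.exp(sum(math.log(v) for v in values) / len(values))


def overall(ref_times, run_times):
    """Overall metric: geometric mean of reference time / measured time."""
    return geometric_mean([ref / run for ref, run in zip(ref_times, run_times)])


def relative_performance(ref_times, times_a, times_b):
    """How many times faster system A scores than system B."""
    return overall(ref_times, times_a) / overall(ref_times, times_b)


# Hypothetical run times (seconds) for three benchmarks on two systems.
times_a = [400.0, 900.0, 250.0]
times_b = [800.0, 600.0, 500.0]

# Two different hypothetical reference machines.
ref_1 = [2000.0, 3000.0, 1500.0]
ref_2 = [5000.0, 1000.0, 7000.0]

r1 = relative_performance(ref_1, times_a, times_b)
r2 = relative_performance(ref_2, times_a, times_b)
# The reference times cancel: both values equal the geometric mean of
# times_b[i] / times_a[i], up to floating-point rounding.
```

The cancellation depends on the geometric mean: with an arithmetic mean of the ratios, the reference times would not divide out, and the relative standing of two systems could change with the choice of reference machine.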

Q25.A. How long does it take to compile and run the SPEC MPI2007 benchmark suites on the reference platform?

The reference cluster takes about 1 hour to build the base versions of the MPI2007 benchmark executables and 24 hours to finish a rule-conforming (2-iteration) run of the base metrics for the medium-sized workload.

Q25.B. How long does it take to compile and run the SPEC MPI2007 benchmark suites on my platform?

This depends on the data set size, the compiler, and the machine that is running the benchmarks. The reference cluster was sold circa 2003 and is correspondingly slower than contemporary machines, so expect a 2-iteration base run of the medium workload to take less than 24 hours on a 16-core system.

Expect larger data set sizes to take longer to process than the medium sized data set.

Compile times can vary markedly between compilers offering different degrees of optimization. Likewise, the choice of compiler flags will affect compile times.

Q26. What if the tools cannot be run or built on a system? Can the benchmarks be run manually?

To generate rule-compliant results, an approved toolset must be used. If several attempts at using the SPEC-provided tools are not successful, you should contact SPEC for technical support. SPEC may be able to help you, but this is not always possible -- for example, if you are attempting to build the tools on a platform that is not available to SPEC.

If you just want to work with the benchmarks and do not care to generate publishable results, SPEC provides information about how to do so.

Q27. Where are SPEC MPI2007 results available?

Results for measurements submitted to SPEC are available at http://www.spec.org/mpi2007.

Q28. Can SPEC MPI2007 results be published outside of the SPEC web site? Do the rules still apply?

Yes, SPEC MPI2007 results can be freely published if all the run and reporting rules have been followed. The MPI2007 license agreement binds every purchaser of the suite to the run and reporting rules if results are quoted in public. A full disclosure of the details of a performance measurement must be provided on request.

SPEC strongly encourages that results be submitted for publication on SPEC's web site, since it ensures a peer review process and uniform presentation of all results.

The run and reporting rules for research and academic contexts recognize that it may not be practical to comply with the full set of rules in some contexts. It is always required, however, that non-compliant results be clearly distinguished from rule-compliant results.

Q29. How do I contact SPEC for more information or for technical support?

SPEC can be contacted in several ways. For general information, including other means of contacting SPEC, please see SPEC's Web Site at:

http://www.spec.org/

General questions can be emailed to: info@spec.org
MPI2007 Technical Support Questions can be sent to: mpi2007support@spec.org

Q30. Now that I have read this document, what should I do next?

If you haven't bought MPI2007, it is hoped that you will consider doing so. If you are ready to get started using the suite, then you should pick a system that meets the requirements as described in

system-requirements.html

and install the suite, following the instructions in

install-guide-unix.html or
install-guide-windows.html

Questions and answers were prepared by Kaivalya Dixit of IBM, Jeff Reilly of Intel Corp, and John Henning of Sun Microsystems, and adapted to SPEC MPI2007 by Carl Ponder of IBM. Dixit was the long-time President of SPEC, Reilly is Chair of the SPEC CPU Subcommittee, Henning is Vice-Chair/Secretary of the SPEC CPU Subcommittee, and Ponder is the HPG representative for IBM.


Copyright (C) 1995-2007 Standard Performance Evaluation Corporation
All Rights Reserved