Updated for SPEC MPI2007 (new features are highlighted)
Last updated: 22 Mar 2007 cgp
(To check for possible updates to this document, please see http://www.spec.org/mpi2007/Docs/.)
This document answers Frequently Asked Questions and provides background information about the SPEC MPI2007 benchmark suite. SPEC hopes that this material will help you understand what the benchmark suite can, and cannot, provide; and that it will help you make efficient use of the product.
Overall, SPEC designed SPEC MPI2007 to provide a comparative measure of compute intensive MPI-parallel performance across the widest practical range of cluster and SMP hardware. The product consists of source code benchmarks that are developed from real user applications. These benchmarks depend on the processors, memory, interconnect, compiler and file server on the tested system.
This document is organized as a series of questions and answers.
Background
Q1. What is SPEC?
Q2. What is a benchmark?
Q3. Should I benchmark my own application?
Q4. If not my own application, then what?
Scope
Q5. What does SPEC MPI2007 measure?
Q6. Why use SPEC MPI2007?
Q7. What are the limitations of SPEC MPI2007?
Overview of usage
Q8. What is included in the SPEC MPI2007 package?
Q9. What does the user of the SPEC MPI2007 suite have to provide?
Q10. What are the basic steps in running the benchmarks?
Q11. What source code is provided? What exactly makes up these suites?
Q12.A. How do I submit my results?
Q12.B. How do I edit a submitted report?
Metrics
Q13. What metrics can be measured?
Q14. What is the difference between a "base" metric and a "peak" metric?
Q15. What are the different data-set sizes?
Q16. Which SPEC MPI2007 metric should be used to compare performance?
MPI2007 vs. other SPEC suites
Q17. SPEC CPU2006 and OMP2001 are already available. Why create SPEC MPI2007? Will it show anything different?
Q18. What happens to SPEC HPC2002 after SPEC MPI2007 is released?
Q19. Is there a way to translate measurements of other suites to SPEC MPI2007 results or vice versa?
Benchmark selection
Q20. Some of the benchmark names sound familiar; are these comparable to other programs?
Q21. What criteria were used to select the benchmarks?
Q22. Aren't some of the SPEC MPI2007 benchmarks in SPEC CPU2006 and HPC2002? How are they different?
Q23. Why were most of the benchmarks not carried over from CPU2006, HPC2002 or OMP2001?
Miscellaneous
Q24. Why does SPEC use a reference machine? What machine is used for SPEC MPI2007?
Q25.A. How long does it take to run the SPEC MPI2007 benchmark suites on the reference platform?
Q25.B. How long does it take to run the SPEC MPI2007 benchmark suites on my platform?
Q26. What if the tools cannot be run or built on a system? Can the benchmarks be run manually?
Q27. Where are SPEC MPI2007 results available?
Q28. Can SPEC MPI2007 results be published outside of the SPEC web site? Do the rules still apply?
Q29. How do I contact SPEC for more information or for technical support?
Q30. Now that I have read this document, what should I do next?
SPEC is the Standard Performance Evaluation Corporation. SPEC is a non-profit organization whose members include computer hardware vendors, software companies, universities, research organizations, systems integrators, publishers and consultants. SPEC's goal is to establish, maintain and endorse a standardized set of relevant benchmarks for computer systems. Although no one set of tests can fully characterize overall system performance, SPEC believes that the user community benefits from objective tests which can serve as a common reference point.
A benchmark is "a standard of measurement or evaluation" (Webster’s II Dictionary). A computer benchmark is typically a computer program that performs a strictly defined set of operations - a workload - and returns some form of result - a metric - describing how the tested computer performed. Computer benchmark metrics usually measure speed: how fast was the workload completed; or throughput: how many workload units per unit time were completed. Running the same computer benchmark on multiple computers allows a comparison to be made.
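As a rough illustration of the two kinds of metric described above, the following Python sketch times a fixed workload and reports both a speed figure and a throughput figure. The function names and the stand-in workload are hypothetical and are not part of any SPEC tool.

```python
import time

def run_benchmark(workload, units):
    """Time a fixed workload; return (elapsed seconds, units per second)."""
    start = time.perf_counter()
    workload()
    elapsed = time.perf_counter() - start  # speed: how fast the workload completed
    throughput = units / elapsed           # throughput: workload units per unit time
    return elapsed, throughput

def sum_of_squares():
    # Hypothetical stand-in for a strictly defined set of operations.
    return sum(i * i for i in range(200_000))

elapsed, throughput = run_benchmark(sum_of_squares, units=200_000)
print(f"speed: {elapsed:.4f} s, throughput: {throughput:.0f} units/s")
```

Running the same script on two machines gives numbers that can be compared, which is the essence of a benchmark.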
Ideally, the best comparison test for systems would be your own application with your own workload. Unfortunately, it is often impractical to get a wide base of reliable, repeatable and comparable measurements for different systems using your own application with your own workload. Problems might include generation of a good test case, confidentiality concerns, difficulty ensuring comparable conditions, time, money, or other constraints.
You may wish to consider using standardized benchmarks as a reference point. Ideally, a standardized benchmark will be portable, and may already have been run on the platforms that you are interested in. However, before you consider the results you need to be sure that you understand the correlation between your application/computing needs and what the benchmark is measuring. Are the benchmarks similar to the kinds of applications you run? Do the workloads have similar characteristics? Based on your answers to these questions, you can begin to see how the benchmark may approximate your reality.
Note: A standardized benchmark can serve as a reference point. Nevertheless, when you are doing vendor or product selection, SPEC does not claim that any standardized benchmark can replace benchmarking your own actual application.
SPEC MPI2007 focuses on performance of compute intensive applications using the Message-Passing Interface (MPI) standard 2.1, which means these benchmarks emphasize the performance of:
It is important to remember the contribution of all these components. SPEC MPI performance intentionally depends on more than just the processor.
SPEC MPI2007 is not intended to stress other computer components such as the operating system, graphics, or the I/O system. They may have an effect in some cases, if the component is exceptionally slow or if there is an unusually small resource bound, e.g. paging delays due to a slow disk or too little memory. Note that there are many other SPEC benchmarks, including benchmarks that specifically focus on graphics, distributed Java computing, webservers, and network file systems.
SPEC MPI2007 provides a comparative measure of MPI-parallel, floating point, compute intensive performance. If this matches with the type of workloads you are interested in, SPEC MPI2007 provides a good reference point.
Other advantages to using SPEC MPI2007 include:
As described above, the ideal benchmark for vendor or product selection would be your own workload on your own application. Please bear in mind that no standardized benchmark can provide a perfect model of the realities of your particular system and user community.
SPEC provides the following on the SPEC MPI2007 media (DVD):
Briefly, you need an SMP or cluster system running Unix, Linux or Microsoft Windows, with compilers; a minimum of 1 GB of free memory per rank, although more may be required, as described in system-requirements.html; and 8 GB of free disk space. For cluster configurations, the file system will need to be shared across the nodes.
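The memory and disk minimums above can be turned into a quick sanity check. This is a minimal sketch, assuming 1 GB means 2^30 bytes; the helper names and the rank count are hypothetical, and the check covers only the stated minimums, not the fuller requirements in system-requirements.html.

```python
import shutil

GB = 1024 ** 3  # assumption: 1 GB taken as 2**30 bytes

def min_memory_bytes(ranks, gb_per_rank=1):
    """Lower bound on free memory: 1 GB per MPI rank (more may be required)."""
    return ranks * gb_per_rank * GB

def has_enough_disk(path, required_gb=8):
    """True if the file system holding `path` has at least `required_gb` GB free."""
    return shutil.disk_usage(path).free >= required_gb * GB

ranks = 16  # hypothetical rank count
print(f"{ranks} ranks need at least {min_memory_bytes(ranks) // GB} GB of free memory")
print("8 GB of free disk space available:", has_enough_disk("."))
```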
Note: links to SPEC MPI2007 documents on this web page assume that you are reading the page from a directory that also contains the other SPEC MPI2007 documents. If by some chance you are reading this web page from a location where the links do not work, try accessing the referenced documents at one of the following locations:
Installation and use are covered in detail in the SPEC MPI2007 User Documentation. The basic steps are:
If you wish to generate results suitable for quoting in public, you will need to carefully study and adhere to the run rules.
MPI2007 is composed of MPI-parallel compute-intensive applications provided as source code. 2 are written in C, 6 are written in Fortran, 4 combine C and Fortran, and 1 is written in C++. The benchmarks are:
Benchmark | Language | Application area
104.milc | C | Physics: Quantum Chromodynamics (QCD)
107.leslie3d | Fortran | Computational Fluid Dynamics (CFD)
113.GemsFDTD | Fortran | Computational Electromagnetics (CEM)
115.fds4 | Fortran | Computational Fluid Dynamics (CFD)
121.pop2 | C/Fortran | Climate Modeling
122.tachyon | C | Graphics: Parallel Ray Tracing
126.lammps | C++ | Molecular Dynamics Simulation
127.wrf2 | C/Fortran | Weather Prediction
128.GAPgeofem | C/Fortran | Heat Transfer using Finite Element Methods (FEM)
129.tera_tf | Fortran | 3D Eulerian Hydrodynamics
130.socorro | C/Fortran | Molecular Dynamics using Density-Functional Theory (DFT)
132.zeusmp2 | Fortran | Physics: Computational Fluid Dynamics
137.lu | Fortran | Matrix Decomposition
Descriptions of the benchmarks, with reference to papers, web sites, and so forth, can be found in the individual benchmark descriptions (click the links above). Some of the benchmarks also provide additional details, such as documentation from the original program, in the nnn.benchmark/Docs directories in the SPEC benchmark tree.
The numbers used as part of the benchmark names provide an identifier to help distinguish programs from one another. For example, some programs in SPEC CPU2006 derive from the same sources and need to be distinguished from the MPI2007 versions. Note: even if a program has the same name as in another suite - for example, 127.wrf2 vs. 361.wrf_m from the HPC2002 suite - the updated workload and updated source code mean that it is not valid to compare the SPEC MPI2007 result to the result with the older SPEC HPC2002 benchmark.
The runspec utility leaves a file with a path result/MPIM2007.*.rsf. If the run was reportable, you can mail this file (or attach it to a message) to [email protected].
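To locate the raw result files before mailing them, something like the following Python sketch may help. It assumes only the path pattern quoted above; the function name and the `top` parameter (the directory that contains `result/`) are hypothetical.

```python
import glob
import os

def find_raw_results(top="."):
    """Return files matching result/MPIM2007.*.rsf under `top`, sorted by name."""
    pattern = os.path.join(top, "result", "MPIM2007.*.rsf")
    return sorted(glob.glob(pattern))

for path in find_raw_results():
    print(path)
```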
The SPEC Private website http://pro.spec.org/private/hpg/submit/mpi2007/UnderReview contains links to each report that is being reviewed, in various formats. You will need to log on with your company password. Download the sub file, edit it, and mail it to [email protected]. Note that the file will not be accepted if you edit anything below the line
# =============== do not edit below this point ===================
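Before mailing an edited file, you may want to confirm that nothing at or below the marker line changed. The sketch below compares that protected region in the original and edited text. The function names are hypothetical, and locating the marker by its "do not edit below this point" phrase is an assumption made for this illustration; it is not a description of how SPEC validates submissions.

```python
MARKER_PHRASE = "do not edit below this point"

def protected_part(text):
    """Return the marker line and every line after it, or None if no marker is found."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if MARKER_PHRASE in line:
            return lines[i:]
    return None

def edit_is_safe(original_text, edited_text):
    """True if the region from the marker onward is identical in both versions."""
    original = protected_part(original_text)
    return original is not None and original == protected_part(edited_text)
```

A typical use would be to read the downloaded file and your edited copy into strings and call `edit_is_safe(original, edited)` before sending.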
After the benchmarks are run on the system under test (SUT), a ratio for each of them is calculated using the run time on the SUT and a SPEC-determined reference time. From these ratios, the following metrics are calculated:
Larger data sets will be added later on, with the metrics
In all cases, a higher score means "better performance" on the given workload.
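The calculation can be sketched in a few lines of Python. This is an illustration only: the times are made up, the ratio is assumed to be the reference time divided by the SUT time (consistent with higher being better), and the overall figure is taken as the geometric mean of the ratios, as mentioned in the discussion of the reference machine.

```python
import math

def ratios(ref_times, sut_times):
    """Per-benchmark ratio: reference time divided by time on the system under test."""
    return {name: ref_times[name] / sut_times[name] for name in ref_times}

def geometric_mean(values):
    """Geometric mean, computed via logarithms."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Hypothetical run times in seconds; these are not real SPEC measurements.
ref = {"104.milc": 1500.0, "107.leslie3d": 5000.0, "137.lu": 3600.0}
sut = {"104.milc": 500.0, "107.leslie3d": 1250.0, "137.lu": 1200.0}

per_benchmark = ratios(ref, sut)
overall = geometric_mean(per_benchmark.values())
print(per_benchmark)
print(f"overall: {overall:.2f}")
```

A system that halves every run time doubles every ratio, and therefore doubles the overall figure.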
In order to provide comparisons across different computer hardware, SPEC provides the benchmarks as source code. Thus, in order to run the benchmarks, they must be compiled. There is agreement that the benchmarks should be compiled the way users compile programs. But how do users compile programs?
Some users might experiment with many different compilers and compiler flags to achieve the best performance, and may be willing to develop multi-step make processes and "training" workloads.
Other users might prefer the relative simplicity of using a single set of switches and a single-step make process.
In addition to the above, a wide range of other types of usage models could also be imagined, ranging in a continuum from -Odebug at the low end, to inserting directives and/or re-writing the source code at the high end. Which points on this continuum should SPEC MPI2007 allow?
SPEC recognizes that any point chosen from that continuum might seem arbitrary to those whose interests lie at a different point. Nevertheless, choices must be made.
For MPI2007, SPEC has chosen to allow two types of compilation:
The base metrics (e.g. SPECmpiM_base2007) are required for all reported results and have stricter guidelines for compilation. For example, the same flags must be used in the same order for all benchmarks of a given language. This is the point closer to those who might prefer a relatively simple build process.
The peak metrics (e.g. SPECmpiM_peak2007) are optional and have less strict requirements. Different compiler options may be used on each benchmark. This point is closer to those who may be willing to invest more time and effort in development of build procedures.
Note that options allowed under the base metric rules are a subset of those allowed under the peak metric rules. A legal base result is also legal under the peak rules but a legal peak result is NOT necessarily legal under the base rules.
A full description of the distinctions and required guidelines can be found in the SPEC MPI2007 Run and Reporting Rules.
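The base requirement quoted above (the same flags, in the same order, for all benchmarks of a given language) can be pictured as a simple consistency check. The sketch below is a simplified model, not the SPEC tools' own validation: the function name, the data layout and the example flags are hypothetical, and the Run and Reporting Rules remain the authority.

```python
def base_flags_consistent(builds):
    """builds: iterable of (benchmark, language, flags).

    True if every benchmark of a given language uses the same flags in the same order.
    """
    seen = {}
    for _benchmark, language, flags in builds:
        expected = seen.setdefault(language, list(flags))
        if expected != list(flags):
            return False
    return True

# Hypothetical flag choices, for illustration only.
base_like = [
    ("104.milc", "C", ["-O2", "-g"]),
    ("122.tachyon", "C", ["-O2", "-g"]),
    ("137.lu", "Fortran", ["-O3"]),
]
peak_like = [
    ("104.milc", "C", ["-O2", "-g"]),
    ("122.tachyon", "C", ["-g", "-O2"]),  # same flags, different order
]
print(base_flags_consistent(base_like), base_flags_consistent(peak_like))
```

Under this model, the second set would be acceptable only as a peak build, since peak allows different options per benchmark.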
The MPI2007 suites include test and train data sets for the purpose of validating the correctness of the compile and the run environment. Note that the term train is a holdover from previous suites that allowed feedback-directed optimization using measurements from training runs. For the purpose of MPI2007 it refers to a larger testing data set.
The mref data-set is the largest and is used in reportable runs. Since it runs longer it gives a clearer picture of system performance. In the future it will be supplemented with even larger lref and xref data sets.
A base measurement is required in every reportable run. The corresponding peak measurement is optional, but it is useful for showing the best performance your system can achieve.
When the larger lref and xref data sets become available, you will have to decide which to use. The larger data sets will scale to larger machines and so give a more accurate picture of the performance on large problems. On the other hand, more results will have been reported with the mref data set size, so it will provide a more common reference point of comparison.
Technology is always changing. As the technology progresses, the benchmarks have to adapt to this. SPEC needed to address the following issues:
Application type:
Many native MPI-parallel applications have been developed for cluster systems and are widely used in industry and academia. SPEC feels that standard benchmarks need to be available for comparing cluster systems for the same reason that SPEC CPU was developed for measuring serial CPU performance and SPEC OMP was developed for measuring SMP performance.
Moving target:
OMP2001 has been available for SMP systems for six years. In the meantime, clusters have become increasingly popular as a low-cost and flexible alternative for configuring parallel systems. The HPC2002 suite allows MPI parallelism but is not an adequate bridge because it contains only 3 benchmarks, has too short a run time, and is not as strictly standardized as MPI2007 or OMP2001.
Application size:
As applications grow in complexity and size, older suites become less representative of what runs on current systems. For MPI2007, SPEC included some programs with both larger resource requirements and more complex source code than previous suites.
Run-time:
As of spring 2007, many of the OMP2001 benchmarks finish in less than 5 minutes on leading-edge processors/systems. Small changes or fluctuations in system state or measurement conditions can therefore account for a significant percentage of the observed run time. SPEC chose to make run times for the CPU2006 and MPI2007 benchmarks longer to allow for future performance gains and to keep this from becoming an issue during the lifetime of the suites.
The HPC2002 suite will be retired when MPI2007 is released, and no further HPC2002 results will be accepted for publication. The MPI2007 suite does a better job of measuring and standardizing MPI performance.
There is no formula for converting CPU2006, OMP2001 or any other measurements to MPI2007 results and vice versa; they are different products. We expect some correlation between any two given suites, i.e. machines with higher results with one suite tend to have higher results with another suite, but there is no universal formula for all systems.
SPEC encourages SPEC licensees to publish MPI2007 numbers on older platforms to provide a historical perspective on performance.
Many of the SPEC benchmarks are derived from publicly available application programs. The individual benchmarks in this suite may be similar, but are NOT identical to benchmarks or programs with similar names which may be available from sources other than SPEC. In particular, SPEC has invested significant effort to improve portability and to minimize hardware dependencies, to avoid unfairly favoring one hardware platform over another. For this reason, the application programs in this distribution may perform differently from commercially available versions of the same application.
Therefore, it is not valid to compare SPEC MPI2007 benchmark results with anything other than other SPEC MPI2007 benchmark results.
In the process of selecting applications to use as benchmarks, SPEC considered the following criteria:
Some of the benchmarks in CPU2006 and MPI2007 derive from the same source codes. CPU2006 benchmarks are serialized versions of the applications, while the corresponding MPI2007 benchmarks preserve the original MPI-parallel nature of the applications. All have been given different workloads to better exercise a large parallel machine, and in some cases have additional source-code modifications for parallel scalability and portability. Therefore, for example, results with the CPU2006 benchmark 433.milc may be strikingly different from results with the MPI2007 benchmark 104.milc.
One benchmark, 127.wrf2, is derived from a more current version of the WRF source than is 361.wrf_m from the earlier HPC2002 suite. Its larger workload is better for exercising modern parallel systems. Further, the HPC2002 rules allow the use of OMP parallelism, while this capability has been removed from the MPI2007 source code. So, again, results with the HPC2002 benchmark 361.wrf_m may be strikingly different from results with the MPI2007 benchmark 127.wrf2.
Many applications in the CPU2006 suite were not designed to run with MPI parallelism, so would not realistically measure MPI performance. The benchmarks in the HPC2002 and OMP2001 suites either
SPEC uses a reference machine to normalize the performance metrics used in the MPI2007 suites. Each benchmark is run and measured on this machine to establish a reference time for that benchmark. These times are then used in the SPEC calculations.
As a reference machine, SPEC/MPI2007 uses an 8-node cluster of Celestica A2210 (AMD "Serenade") systems connected by a TCP (GigE) Interconnect. Each A2210 system contains two 940 sockets, each holding one single-core AMD Opteron 848 processor chip running at 2200 MHz with 1 MB I+D L2 cache, plus 4 GB of DDR 333 memory per socket. The MPI implementation is MPICH2 version 1.0.3 running on SLES 9 SP3 Linux OS with Pathscale 2.5 compilers.
Note that this machine differs dramatically from the ones used for SPEC/OMP and SPEC/CPU, since the MPI2007 suites represent a fundamentally different class of applications.
Note also that when comparing any two systems measured with MPI2007, their performance relative to each other would remain the same even if a different reference machine were used. This is a consequence of the mathematics involved in calculating the individual and overall (geometric mean) metrics.
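A small numeric check makes this property concrete. In the Python sketch below, all run times are made up, and each ratio is assumed to be the reference time divided by the measured time. Two systems are scored against two different hypothetical reference machines; because the reference times cancel inside the geometric mean, the relative performance of the two systems comes out the same either way.

```python
import math

def geometric_mean(values):
    """Geometric mean, computed via logarithms."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))

def overall(ref_times, sut_times):
    """Overall metric: geometric mean of per-benchmark ratios (reference / measured)."""
    return geometric_mean(ref_times[b] / sut_times[b] for b in ref_times)

# Hypothetical run times in seconds.
ref_a = {"bench1": 1000.0, "bench2": 4000.0, "bench3": 2500.0}
ref_b = {"bench1": 300.0, "bench2": 9000.0, "bench3": 700.0}
system_x = {"bench1": 200.0, "bench2": 1000.0, "bench3": 500.0}
system_y = {"bench1": 400.0, "bench2": 800.0, "bench3": 1250.0}

relative_with_a = overall(ref_a, system_x) / overall(ref_a, system_y)
relative_with_b = overall(ref_b, system_x) / overall(ref_b, system_y)
print(relative_with_a, relative_with_b)
```

The individual scores differ between the two reference machines, but the ratio of the scores does not. An arithmetic mean of the ratios would not have this property.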
The reference cluster takes about 1 hour to build the base versions of the MPI2007 benchmark executables and 24 hours to finish a rule-conforming (2 iteration) run of the base metrics for the medium sized workload.
This depends on the data set size, the compiler, and the machine that is running the benchmarks. The reference cluster was sold circa 2003 and is correspondingly slower than contemporary machines, so expect a 2 iteration base run of the medium workload to take less than 24 hours on a 16-core system.
Expect larger data set sizes to take longer to process than the medium sized data set.
Compile times can vary markedly between compilers offering different degrees of optimization. Likewise, the choice of compiler flags will affect compile times.
To generate rule-compliant results, an approved toolset must be used. If several attempts at using the SPEC-provided tools are not successful, you should contact SPEC for technical support. SPEC may be able to help you, but this is not always possible -- for example, if you are attempting to build the tools on a platform that is not available to SPEC.
If you just want to work with the benchmarks and do not care to generate publishable results, SPEC provides information about how to do so.
Results for measurements submitted to SPEC are available at http://www.spec.org/mpi2007.
Yes, SPEC MPI2007 results can be freely published if all the run and reporting rules have been followed. The MPI2007 license agreement binds every purchaser of the suite to the run and reporting rules if results are quoted in public. A full disclosure of the details of a performance measurement must be provided on request.
SPEC strongly encourages that results be submitted for publication on SPEC's web site, since it ensures a peer review process and uniform presentation of all results.
The run and reporting rules for research and academic contexts recognize that it may not be practical to comply with the full set of rules in some contexts. It is always required, however, that non-compliant results be clearly distinguished from rule-compliant results.
SPEC can be contacted in several ways. For general information, including other means of contacting SPEC, please see SPEC's Web Site at:
General questions can be emailed to:
info@spec.org
MPI2007 Technical Support Questions can be sent to:
mpi2007support@spec.org
If you haven't bought MPI2007, it is hoped that you will consider doing so. If you are ready to get started using the suite, then you should pick a system that meets the requirements as described in
and install the suite, following the instructions in
install-guide-unix.html or
install-guide-windows.html
Questions and answers were prepared by Kaivalya Dixit of IBM, Jeff Reilly of Intel Corp, and John Henning of Sun Microsystems, and adapted to SPEC MPI2007 by Carl Ponder of IBM. Dixit was the long-time President of SPEC, Reilly is Chair of the SPEC CPU Subcommittee, Henning is Vice-Chair/Secretary of the SPEC CPU Subcommittee, and Ponder is the HPG representative for IBM.
Copyright (C) 1995-2007 Standard Performance Evaluation Corporation
All Rights Reserved