Copyright © 2006 Intel Corporation. All Rights Reserved.
Platform settings
One or more of the following settings may have been set. If so, the "General Notes" section of the report says so, and the descriptions below explain what each setting means.
KMP_STACKSIZE
Specify stack size to be allocated for each thread.
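As an illustration, the variable would normally be exported in the shell before the application is launched. The value 64m below is an assumed example, not a value taken from this report, and the unit-suffix form is how the Intel OpenMP runtime is generally documented to accept sizes.

```shell
# Assumed example value: 64 megabytes of stack for each OpenMP thread.
KMP_STACKSIZE=64m
export KMP_STACKSIZE
echo "KMP_STACKSIZE=$KMP_STACKSIZE"
```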
KMP_AFFINITY
KMP_AFFINITY = < physical | logical >, starting-core-id
Specifies the static mapping of user threads to physical cores. For example,
if you have a system configured with 8 cores, OMP_NUM_THREADS=8 and
KMP_AFFINITY=physical,0, then thread 0 will be mapped to core 0, thread 1 will be mapped to core 1, and
so on in a round-robin fashion.
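The mapping in this example can be sketched in a few lines of shell. This is only a model of the description above, assuming that "round-robin" means wrapping around modulo the number of cores; thread_core is a hypothetical helper name and is not part of the OpenMP runtime.

```shell
# Hypothetical helper: print the core that a thread would be mapped to.
#   $1 = thread number, $2 = starting-core-id, $3 = number of cores
# Assumption: the mapping wraps around modulo the core count.
thread_core() {
    echo $(( ($2 + $1) % $3 ))
}

# Model of KMP_AFFINITY=physical,0 with OMP_NUM_THREADS=8 on 8 cores.
for t in 0 1 2 3 4 5 6 7; do
    echo "thread $t -> core $(thread_core "$t" 0 8)"
done
```

With a starting-core-id of 0 the model maps each thread to the core of the same number; a nonzero starting core shifts the whole sequence.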
OMP_NUM_THREADS
Sets the maximum number of threads to use for OpenMP* parallel regions if no other value is specified in the application. This environment variable applies to both -openmp and -parallel (Linux and Mac OS X) or /Qopenmp and /Qparallel (Windows). Example syntax on a Linux system with 8 cores: export OMP_NUM_THREADS=8
Hardware Prefetcher:
This BIOS option allows the enabling/disabling of a processor mechanism to prefetch data into the cache according to a pattern-recognition algorithm.
In some cases, setting this option to Disabled may improve performance. Users should only disable this option after performing application benchmarking to verify improved performance in their environment.
Adjacent Cache Line Prefetch:
This BIOS option enables or disables a processor mechanism that, on a cache line miss, also fetches the adjacent cache line within the 128-byte sector that contains the needed data.
In some cases, setting this option to Disabled may improve performance. Users should only disable this option after performing application benchmarking to verify improved performance in their environment.
High Bandwidth Mode:
Enabling this option allows the chipset to defer memory transactions and process them out of order for optimal performance.
ulimit -s <n>
Sets the stack size to n kbytes, or unlimited to allow the stack size to grow without limit.
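A minimal sketch of applying the limit to a single command follows. run_with_stack is a hypothetical helper, not part of any tool named here; it sets the limit inside a subshell so that the calling shell keeps its own setting.

```shell
# Hypothetical helper: run a command under a given stack limit
# (in kbytes, or "unlimited") without changing the caller's limit.
run_with_stack() {
    stack_limit=$1
    shift
    ( ulimit -s "$stack_limit" && "$@" )
}

# Re-apply the current limit and let a child shell report what it inherited.
run_with_stack "$(ulimit -s)" sh -c 'ulimit -s'
```

Raising the limit, for example to unlimited, can be refused if the hard limit set by the system administrator is lower; in that case the helper does not run the command.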
submit= MYMASK=`printf '0x%x' $((1<<$SPECCOPYNUM))`; /usr/bin/taskset $MYMASK $command
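When running multiple copies of benchmarks, the SPEC config file feature submit is sometimes used to bind individual jobs to specific processors. This specific submit command is used for Linux. The elements of the command are: $SPECCOPYNUM, the number of the copy being run; MYMASK, a hexadecimal CPU mask in which only bit $SPECCOPYNUM is set; and /usr/bin/taskset $MYMASK $command, which launches $command with its CPU affinity restricted to that mask.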
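To make the mask computation concrete, the sketch below evaluates the same printf expression for the first four copy numbers. copy_mask is a hypothetical wrapper added for illustration; in a real run the SPEC tools supply SPECCOPYNUM.

```shell
# Hypothetical wrapper around the expression used in the submit line:
# print the taskset mask for copy number $1 (one bit set, at position $1).
copy_mask() {
    printf '0x%x' $((1<<$1))
}

for SPECCOPYNUM in 0 1 2 3; do
    echo "copy $SPECCOPYNUM -> mask $(copy_mask "$SPECCOPYNUM")"
done
```

Each mask has a single bit set, so copy 0 is bound to logical processor 0, copy 1 to logical processor 1, and so on.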
submit= $[top]/mysubmit.pl $SPECCOPYNUM "$command"
On Xeon 74xx series processors, some benchmarks at peak will run n/2 copies on a system with n logical processors.
The mysubmit.pl script assigns each copy in such a way that no two copies will share an L2 cache, for optimal performance.
The script looks in /proc/cpuinfo to come up with the list of cores that will satisfy this requirement.
The source code is shown below.
Source
******************************************************************************************************
#!/usr/bin/perl

use strict;
use Cwd;

# The order in which we want copies to be bound to cores
# Copies: 0, 1, 2, 3
# Cores: 0, 1, 3, 6

my $rundir = getcwd;

my $copynum = shift @ARGV;

my $i;
my $j;
my $tag;
my $num;
my $core;
my $numofcores;

my @proc;
my @cores;

open(INPUT, "/proc/cpuinfo") or die "can't open /proc/cpuinfo\n";

#open(OUTPUT, "STDOUT");

# proc[i][0] = logical processor ID
# proc[i][1] = physical processor ID
# proc[i][2] = core ID

$i = 0;
$numofcores = 0;

while(<INPUT>) {
    chop;

    ($tag, $num) = split(/\s+:\s+/, $_);

    if ($tag eq "processor") {
        $proc[$i][0] = $num;
    }

    if ($tag eq "physical id") {
        $proc[$i][1] = $num;
    }

    if ($tag eq "core id") {
        $proc[$i][2] = $num;
        $i++;
        $numofcores++;
    }
}

$i = 0;
$j = 0;

for $core (0, 4, 2, 1, 5, 3) {
    while ($i < $numofcores) {
        if ($proc[$i][2] == $core) {
            $cores[$j] = $proc[$i][0];
            $j++;
        }
        $i++;
    }
    $i=0;
}

open RUNCOMMAND, "> runcommand" or die "failed to create run file";
print RUNCOMMAND "cd $rundir\n";
print RUNCOMMAND "@ARGV\n";
close RUNCOMMAND;

system 'taskset', '-c', $cores[$copynum], 'sh', "$rundir/runcommand";
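The ordering loop in the script can be tried out without the target hardware. The sketch below re-creates only that loop, using shell and awk, on a made-up /proc/cpuinfo excerpt; sample_cpuinfo and core_list are hypothetical names, and the two-socket, six-core layout is an assumption chosen for illustration, not data from a measured system.

```shell
# Hypothetical /proc/cpuinfo excerpt: 2 sockets, core ids 0-5 on each,
# logical processors numbered 0-11.
sample_cpuinfo() {
    p=0
    for socket in 0 1; do
        for core in 0 1 2 3 4 5; do
            printf 'processor\t: %s\nphysical id\t: %s\ncore id\t\t: %s\n\n' \
                "$p" "$socket" "$core"
            p=$((p + 1))
        done
    done
}

# Print logical processor IDs grouped by core id, visiting the core ids
# in the same order as the script's loop: 0, 4, 2, 1, 5, 3.
core_list() {
    awk -F'[ \t]*:[ \t]*' '
        $1 == "processor" { lp = $2 }
        $1 == "core id"   { ids[$2] = (ids[$2] == "" ? lp : ids[$2] " " lp) }
        END {
            n = split("0 4 2 1 5 3", order, " ")
            out = ""
            for (k = 1; k <= n; k++)
                if (ids[order[k]] != "")
                    out = (out == "" ? "" : out " ") ids[order[k]]
            print out
        }'
}

sample_cpuinfo | core_list
```

The position of a logical processor in the printed list corresponds to the copy number that the script would bind to it. Whether that order actually avoids shared L2 caches depends on how the real processor numbers its cores.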