Baseline flags:
C: -KOMP,fast_GP2=2,V9,hardbarrier,largepage=1,prefetch=2,
preex,a8,mfunc,ilfunc -O5 -x-
F90: -KOMP,fast_GP2=2,V9,hardbarrier,largepage=1,prefetch_line=4,
prefetch_cache_level=3,mfunc=2 -O5 -Nautoobjstack
Portability Flags:
318.galgel_m: -Am -Fixed -w
328.fma3d_m: -Am
Extra Flags:
330.art_m: -DINTS_PER_CACHELINE=16 -DDBLS_PER_CACHELINE=8
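The two values above are consistent with a 64-byte cache line, assuming a 4-byte int and an 8-byte double (16 x 4 = 8 x 8 = 64). These notes do not say how the benchmark source uses the macros; a common use for constants like these is padding per-thread data so that two threads never write to the same line. The sketch below illustrates that idea only. The type and function names are hypothetical and are not taken from the 330.art_m sources.

```c
#include <stddef.h>

/* Defaults mirror the -D values in the notes; normally set on the command line. */
#ifndef INTS_PER_CACHELINE
#define INTS_PER_CACHELINE 16
#endif
#ifndef DBLS_PER_CACHELINE
#define DBLS_PER_CACHELINE 8
#endif

/* Hypothetical padded slots: each one spans a full cache line, so an
   array of them gives every thread its own line (only v[0] is used). */
typedef struct { int v[INTS_PER_CACHELINE]; } padded_int_t;
typedef struct { double v[DBLS_PER_CACHELINE]; } padded_dbl_t;

/* Line size in bytes implied by each macro on the current platform. */
size_t line_bytes_from_ints(void) { return sizeof(padded_int_t); }
size_t line_bytes_from_dbls(void) { return sizeof(padded_dbl_t); }
```

With an array such as `padded_int_t counters[nthreads]`, thread t would update only `counters[t].v[0]`, which avoids false sharing at the cost of unused space.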
Peak Optimization Flags:
310.wupwise_m:
F90: -KOMP,fast_GP2=2,V9,hardbarrier,largepage=1,prefetch_line=7,
prefetch_cache_level=3,mfunc=2,ilfunc -O5 -Nautoobjstack
312.swim_m:
F90: -KOMP,fast_GP2=2,V9,hardbarrier,largepage=1,prefetch_line=7,
prefetch_cache_level=3,mfunc=2 -O5 -Nautoobjstack
316.applu_m:
F90: -KOMP,fast_GP2=2,V9,hardbarrier,largepage=1,prefetch_line=4,
prefetch_cache_level=3,mfunc=2 -O5 -Nautoobjstack
-Kunroll=4
318.galgel_m:
F90: -KOMP,fast_GP2=2,V9,hardbarrier,largepage=1,prefetch_line=8,
prefetch_cache_level=3,mfunc=2 -O5 -Nautoobjstack
-Kunroll=4,commonpad=16
326.gafort_m:
F90: -KOMP,fast_GP2=2,V9,hardbarrier,largepage=1,prefetch_line=4,
prefetch_cache_level=3,mfunc=2 -O5 -Nautoobjstack
-Kunroll=4
328.fma3d_m:
F90: -KOMP,fast_GP2=2,V9,hardbarrier,largepage=1,prefetch_line=6,
prefetch_cache_level=3,mfunc=2 -O5 -Nautoobjstack
-Kcommonpad=8,prefetch_infer
Alternate sources:
Add critical region around update of linked list in parallel loop.
Compulsory src.alt available as ompm-purdue1-20040324.tar.gz
Used for 330.art_m base and peak.
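The kind of change described above can be sketched as follows. This is a minimal illustration of guarding a shared linked-list update with an OpenMP critical region inside a parallel loop; it is not the actual src.alt patch. The node type, function names, and critical-region name are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical node type; not taken from the 330.art_m sources. */
typedef struct node {
    int value;
    struct node *next;
} node_t;

/* Push the values 0..n-1 onto a shared list from a parallel loop.
   Returns the list head; *failed is set to 1 if any allocation failed. */
node_t *build_list(int n, int *failed)
{
    node_t *head = NULL;
    int err = 0;
    int i;

    #pragma omp parallel for
    for (i = 0; i < n; i++) {
        node_t *item = malloc(sizeof *item);
        if (item == NULL) {
            #pragma omp critical (list_update)
            err = 1;
            continue;
        }
        item->value = i;
        /* Only one thread at a time may relink the shared head;
           without this the two statements below race. */
        #pragma omp critical (list_update)
        {
            item->next = head;
            head = item;
        }
    }
    *failed = err;
    return head;
}

/* Count the nodes in the list. */
int list_length(const node_t *head)
{
    int len = 0;
    while (head != NULL) {
        len++;
        head = head->next;
    }
    return len;
}

/* Release every node in the list. */
void free_list(node_t *head)
{
    while (head != NULL) {
        node_t *next = head->next;
        free(head);
        head = next;
    }
}
```

The allocation and initialization of each node stay outside the critical region, so only the short pointer update is serialized. When compiled without OpenMP support the pragmas are ignored and the loop runs serially.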
Peak sources:
SPEC OMPL2001 source for 64-bit systems, modified for SPEC OMPM2001.
Available as ompl src.alt in the SPEC OMP v3.0 release.
Used for 320.equake_m, 326.gafort_m, and 328.fma3d_m.
System tunables:
/etc/system:
set shmsys:shminfo_shmmax=2147483648
set shmsys:shminfo_shmmni=256
set shmsys:shminfo_shmseg=256
set autoup=172800
set memscrub_period_sec=345600
/etc/opt/FJSVpnrm/lpg.conf:
JOB=408G,SHMSEGSIZE=2048M
/etc/opt/FJSVpnrm/cpursc.conf:
CPU_USE=0,1:0,1
NQS:
Per-process data size limit = UNLIMITED
Per-process permanent file size limit = UNLIMITED
Per-process memory size limit = 128 gigabytes
Per-request memory size limit = 128 gigabytes
Per-process number of cpus limit = 126
Per-process stack size limit = 4 gigabytes
Per-process CPU time limit = 2147483646.000
Execution mode = Simplex
Jobclass = 0
Submitting the runspec to NQS:
Run the qsub command with the following sh script.
cd /spec/omp2001
. ./shrc
OMP_NUM_THREADS=124
export OMP_NUM_THREADS
runspec --config=Fujitsu --reportable --tune=all medium