SEMCAD X

Benchmark 1: CAD model of car, driver, and cell phone with simulated electric fields overlaid 


Benchmark 1: Detail of electric field in phone and hand


Benchmark 2: Human male head with deep brain stimulation implant and box


Benchmark 3: Full CAD of PCB


Benchmark 4: Full detailed CAD of cell phone


Benchmark 5: Birdcage with human male model


Benchmark 6: CAD model of car with driver and Bluetooth antenna


Fig. 1: Comparison of simulation size vs. solver speed on Fermi hardware


Fig. 2: Performance of the MPI-CPU solver on the Ben Arabí cluster (FPC Murcia)

PERFORMANCE

SEMCAD X sets new standards in computational electromagnetics (CEM) by offering some of the fastest finite-difference time-domain (FDTD) solvers and enhancements on the market. With impressive speedups, excellent RAM efficiency, and convenient features such as auto-termination, solving very large problems has never been easier.

SEMCAD X was the first FDTD toolkit on the market to offer hardware acceleration. Hardware acceleration in SEMCAD X is achieved with the Acceleware library or, alternatively, with the new CUDA library available from SPEAG. With simulation speeds of 700 – 3000+ MCells/s, SEMCAD X is in a league of its own in solver performance.
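To put such throughput figures in context, the short sketch below shows how a solver speed in MCells/s translates into an idealized run time for a given grid size and number of time steps. It is a back-of-the-envelope estimate only; the numbers in the example are hypothetical, and real wall-clock times also include setup, field output, and any additional excitations.

```python
def estimated_runtime_hours(domain_mcells, time_steps, speed_mcells_per_s):
    """Idealized FDTD run time: total cell updates divided by solver throughput.

    domain_mcells      -- grid size in millions of cells
    time_steps         -- number of FDTD time steps
    speed_mcells_per_s -- solver throughput in MCells/s
    """
    total_updates_mcells = domain_mcells * time_steps   # millions of cell updates
    return total_updates_mcells / speed_mcells_per_s / 3600.0

# Hypothetical example: a 300-MCell grid run for 10,000 time steps at 2500 MCells/s
# needs roughly 300e6 * 1e4 / 2.5e9 s = 1200 s, i.e. about 0.33 h.
print(f"{estimated_runtime_hours(300, 10_000, 2500):.2f} h")
```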

Please contact us to request performance information for your specific applications. The following examples highlight the performance of SEMCAD X in very large, very complex geometries, simulated with the Acceleware library 11.0.3 (Floriana) and the new SPEAG CUDA solver:

SEMCAD X Simulation                      Benchmark 1          Benchmark 2         Benchmark 3
Model                                    car, driver, phone   head with implant   PCB
Frequency (MHz)                          1800                 64                  500 – 2000
No. of time steps                        5896                 406435              404530
Computational domain (million cells)     296.6                355.8               286.2
Excitation                               antenna              resonator           coax line
ABC                                      UPML, 8 layers       UPML, 11 layers     UPML, 9 layers

Solver Performance                       Benchmark 1          Benchmark 2         Benchmark 3
Acceleware GPU solver speed (MCells/s)*  3075                 2815                2373
Acceleware GPU solver time (hh:mm)       00:59                28:40               120:11
SPEAG CUDA GPU solver speed (MCells/s)*  2990                 2782                2723
SPEAG CUDA GPU solver time (hh:mm)       01:03                29:03               104:44
CPU solver speed (MCells/s)**            27                   19.8                15.6
CPU solver time (hh:mm)                  52:21                2043:00             280:00

*  System with four quad-core Intel Xeon E5620 processors, 74 GB of RAM, and four NVIDIA Tesla C2070 cards
** HP xw9400 workstation with a dual-core AMD Opteron 2216 at 2.4 GHz and 16 GB of RAM

SEMCAD X also performs very well for smaller simulation domains that fit in the memory of a single GPU. The examples in the following table were executed on a single NVIDIA Tesla C2070/C2075 card. For a comparison with dual- and quad-core CPU architectures, see Fig. 1.


SEMCAD X Simulation                      Benchmark 4       Benchmark 5                      Benchmark 6
Model                                    cell phone        birdcage with human male model   car, driver, Bluetooth antenna
Frequency (MHz)                          1130 – 2630       64                               2100
No. of time steps                        66622             31079                            9600
Computational domain (million cells)     11.34             21.97                            23.65
Excitation                               antenna           birdcage                         antenna
ABC                                      UPML, 8 layers    UPML, 8 layers                   UPML, 6 layers

Solver Performance                       Benchmark 4       Benchmark 5                      Benchmark 6
Acceleware GPU solver speed (MCells/s)   469               519                              503
Acceleware GPU solver time (hh:mm)       00:27             00:21                            00:07
SPEAG CUDA GPU solver speed (MCells/s)   593               651                              565
SPEAG CUDA GPU solver time (hh:mm)       00:20             00:18                            00:06
CPU solver speed (MCells/s)              15.5              20                               17.2
CPU solver time (hh:mm)                  13:34             09:29                            03:39

NVIDIA's Fermi architecture adds ECC support and improves double-precision throughput and overall performance. In addition, multiple cards can be connected in parallel to achieve higher performance and larger domain sizes. The performance of the Tesla 20-Series GPUs is markedly improved compared to that of the previous 10-Series, and far exceeds the solver speed of CPU-only systems. Solving large-scale, high-resolution problems with CPU-based software becomes too burdensome to be practical, as demonstrated in Fig. 1.
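A rough way to see why multiple cards enable larger domains is to estimate the memory footprint per FDTD cell. The sketch below assumes six single-precision field components plus a one-byte material index per cell and a fixed usable-memory fraction; these are illustrative assumptions only, not the solver's actual storage layout.

```python
def max_cells_millions(gpu_mem_gb, n_gpus=1, bytes_per_cell=25, usable_fraction=0.8):
    """Rough upper bound on the domain size (in MCells) that fits in GPU memory.

    bytes_per_cell  -- assumed 6 float32 field components (24 B) + 1 B material index
    usable_fraction -- assumed share of memory left for field data (rest: buffers, PML, ...)
    """
    usable_bytes = gpu_mem_gb * 1e9 * n_gpus * usable_fraction
    return usable_bytes / bytes_per_cell / 1e6

# One 6 GB card vs. four of them (assumed values, ignoring halo duplication):
print(f"{max_cells_millions(6):.0f} MCells on one GPU")
print(f"{max_cells_millions(6, n_gpus=4):.0f} MCells on four GPUs")
```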

With the Acceleware Cluster library, SEMCAD X's numerical solver allows multiple computers networked in a Beowulf-style cluster to run a single FDTD simulation. The simulation space is partitioned across the compute nodes into message passing interface (MPI) processes that are synchronized at every time step. This CPU cluster solution is now available for 64-bit Linux architectures.
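As an illustration of this domain-partitioning idea (not SEMCAD X's actual implementation), the mpi4py sketch below splits a 1D field array across MPI ranks and exchanges one-cell halos with its neighbours at every time step, which is the synchronization pattern an MPI-parallel FDTD solver relies on. The update rule is a placeholder stencil chosen only to keep the example short.

```python
# Minimal 1D domain-decomposition sketch with per-time-step halo exchange.
# Run with, e.g.:  mpiexec -n 4 python halo_demo.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

n_local = 1000                                        # cells owned by this rank
left = rank - 1 if rank > 0 else MPI.PROC_NULL
right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

f = np.zeros(n_local + 2)                             # local field with one ghost cell per side
f[1:-1] = rank                                        # arbitrary initial data

for step in range(100):
    # Exchange halos with neighbouring subdomains (synchronizes every time step).
    comm.Sendrecv(sendbuf=f[1:2],   dest=left,  recvbuf=f[-1:], source=right)
    comm.Sendrecv(sendbuf=f[-2:-1], dest=right, recvbuf=f[:1],  source=left)
    # Placeholder stencil update standing in for the FDTD field update.
    f[1:-1] = 0.5 * f[1:-1] + 0.25 * (f[:-2] + f[2:])

print(f"rank {rank}: mean field value {f[1:-1].mean():.3f}")
```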

The Ben Arabí cluster at FPC Murcia, Spain, consists of 102 nodes with a total of 816 cores (quad-core Intel Xeon E5450 processors at 3 GHz) and 1072 GB of distributed memory. As a benchmark, a generic phone model gridded at 195 MCells was simulated on the cluster; the solver speed as a function of the number of cores executing the simulation is shown in Fig. 2.
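For readers who want to reduce such scaling measurements to a single number, the sketch below computes speedup and parallel efficiency from throughput measured at different core counts. The sample values are purely illustrative placeholders, not the data behind Fig. 2.

```python
def scaling_metrics(cores, speed_mcells_per_s):
    """Speedup and parallel efficiency relative to the smallest core count measured."""
    base_cores, base_speed = cores[0], speed_mcells_per_s[0]
    for n, s in zip(cores, speed_mcells_per_s):
        speedup = s / base_speed
        efficiency = speedup / (n / base_cores)
        print(f"{n:5d} cores: speedup {speedup:5.1f}x, efficiency {efficiency:.0%}")

# Illustrative placeholder numbers only (not measured Ben Arabí data).
scaling_metrics([8, 64, 256, 816], [20, 140, 480, 1200])
```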
