Dear all,
Since BSP 2.2 makes it possible to build the C6Run package, we have started running the C6Runlib
examples to compare ARM and DSP performance in calculations. We expected the DSP to outperform
the ARM, but we found the opposite.
Here are the results of running the cfft example program:
Quote:
DM-37x# ./cfft_arm
N=16,nTimes=100: 0.001007 s
N=32,nTimes=100: 0.002045 s
N=64,nTimes=100: 0.004791 s
N=128,nTimes=100: 0.010772 s
N=256,nTimes=100: 0.024628 s
N=512,nTimes=100: 0.055725 s
N=1024,nTimes=100: 0.124604 s
N=2048,nTimes=100: 0.272583 s
N=4096,nTimes=100: 0.596222 s
N=8192,nTimes=100: 1.29132 s
N=16384,nTimes=100: 2.74329 s
DM-37x# ./cfft_dsp
N=16,nTimes=100: 0.126648 s
N=32,nTimes=100: 0.14206 s
N=64,nTimes=100: 0.177978 s
N=128,nTimes=100: 0.260376 s
N=256,nTimes=100: 0.451263 s
N=512,nTimes=100: 0.872955 s
N=1024,nTimes=100: 1.81073 s
N=2048,nTimes=100: 3.87048 s
N=4096,nTimes=100: 8.3595 s
N=8192,nTimes=100: 18.1759 s
N=16384,nTimes=100: 39.5378 s
Segmentation fault
Given that the BSP provides no NEON or Cortex-A8 optimization options, it is strange that the DSP performs worse than the ARM. Do you see the same behavior in your tests?
Thanks and Best Regards,
Joaquim Duran