We are using the torpedo Linux BSP v2.4.4 and arm-none-linux-gnueabi-g++ (Sourcery CodeBench Lite 2011.09-70) v4.6.1.
We have a series of routines that perform many single-precision floating-point (float) multiply-accumulate operations. We currently build with the -mfpu=vfpv3 option, but want to take advantage of the SIMD capabilities of the NEON unit.
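To make the workload concrete, here is a minimal sketch of the kind of loop we mean. It is illustrative only and not our production code; the function name mac_f32 and its signature are hypothetical, and it assumes C99 (for restrict and the loop-scoped index).

```c
#include <stddef.h>

/* Hypothetical example: dot-product style multiply-accumulate over two
 * float arrays. `restrict` promises the compiler that the inputs do not
 * alias, which is one of the things an auto-vectorizer has to establish. */
float mac_f32(const float *restrict a, const float *restrict b, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        acc += a[i] * b[i];   /* one multiply and one accumulate per element */
    }
    return acc;
}
```

Note that a float reduction like this can only be vectorized if the compiler is allowed to reorder the additions, so the result may depend on math-related flags as well as on -ftree-vectorize.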
We are following the guidance from here:
http://processors.wiki.ti...x-A8#What_is_Neon.3F
We build with the following compiler options:
- arm-none-linux-gnueabi-gcc -O3 -march=armv7-a -mtune=cortex-a8 -mfpu=neon -ftree-vectorize -mfloat-abi=softfp
However, we see no improvement in processing time. Our guess is that the NEON coprocessor is not enabled. Do we need to execute the assembly instructions ourselves, as the TI guidance specifies? Or is there a kernel or BSP configuration option that turns on the NEON core?
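One build-side check that does not depend on timing is to confirm that the compiler actually saw NEON as enabled. The sketch below relies on GCC predefining __ARM_NEON__ when -mfpu=neon is in effect; the function name built_with_neon is hypothetical.

```c
/* Returns 1 if this translation unit was compiled with NEON enabled
 * (GCC predefines __ARM_NEON__ under -mfpu=neon; newer compilers also
 * define __ARM_NEON), otherwise 0. This says nothing about whether the
 * vectorizer actually emitted NEON instructions for a given loop. */
int built_with_neon(void)
{
#if defined(__ARM_NEON__) || defined(__ARM_NEON)
    return 1;
#else
    return 0;
#endif
}
```

Two related points, as far as we understand them: the GCC manual's entry for -mfpu states that with NEON selected, floating-point operations are not auto-vectorized unless -funsafe-math-optimizations is also given, because NEON is not fully IEEE 754 compliant, and that flag is not in our option list above. Disassembling the object file with arm-none-linux-gnueabi-objdump -d and looking for instructions such as vmla.f32 or vmul.f32 on q registers should show whether NEON code was generated at all.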
Here is the assembly code required to enable NEON/VFP, per the TI guidance:
MRC p15, #0, r1, c1, c0, #2 ; r1 = Access Control Register
ORR r1, r1, #(0xf << 20) ; enable full access for p10,11
MCR p15, #0, r1, c1, c0, #2 ; Access Control Register = r1
MOV r1, #0
MCR p15, #0, r1, c7, c5, #4 ; flush prefetch buffer because of FMXR below
; and CP 10 & 11 were only just enabled
; Enable VFP itself
MOV r0,#0x40000000
FMXR FPEXC, r0 ; FPEXC = r0
Finally, is there an easy way to tell at runtime whether NEON or VFP is being used? Beyond the expected speedup, we want to confirm that it is enabled when our application code starts. Thanks!
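As a starting point for the runtime question, here is a minimal sketch that looks for a feature flag on the "Features" line of /proc/cpuinfo, which ARM Linux kernels populate with entries such as vfp, vfpv3 and neon. The helper names features_line_has and cpu_reports are hypothetical, and the sketch assumes C99. It only shows what the kernel reports as available, not whether a particular routine in our binary uses NEON instructions.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if `line` is a "Features" line that lists `flag` as a whole
 * whitespace-separated word, otherwise 0. Whole-word matching keeps
 * "vfp" from matching "vfpv3". */
int features_line_has(const char *line, const char *flag)
{
    if (strncmp(line, "Features", 8) != 0)
        return 0;
    const char *p = strchr(line, ':');
    if (p == NULL)
        return 0;
    ++p;
    size_t flen = strlen(flag);
    while (*p != '\0') {
        p += strspn(p, " \t\r\n");               /* skip separators */
        size_t wlen = strcspn(p, " \t\r\n");     /* length of next word */
        if (wlen == flen && strncmp(p, flag, flen) == 0)
            return 1;
        p += wlen;
    }
    return 0;
}

/* Returns 1 if the kernel reports `flag`, 0 if it does not, and -1 if
 * /proc/cpuinfo cannot be opened. */
int cpu_reports(const char *flag)
{
    FILE *f = fopen("/proc/cpuinfo", "r");
    char line[512];
    int found = 0;
    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        if (features_line_has(line, flag)) {
            found = 1;
            break;
        }
    }
    fclose(f);
    return found;
}
```

An application could call cpu_reports("neon") once at startup and log or abort if the result is not 1.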