
Technical Discussion Group Forum

This forum is provided for user discussion. While Beacon EmbeddedWorks support staff and engineers participate, Beacon EmbeddedWorks does not guarantee the accuracy of all information within the Technical Discussion Group (TDG).

How to enable NEON SIMD via compiler
1 Reply (last post 01 Feb 2018 03:38 PM by Adam Ford)

Erick Roane (New Member, Posts: 1)
01 Nov 2017 02:51 PM

    We are using the Torpedo Linux BSP v2.4.4 and arm-none-linux-gnueabi-g++ (Sourcery CodeBench Lite 2011.09-70) v4.6.1.

    We have a series of routines that perform many single-precision (float) multiply and accumulate operations. We are currently using the -mfpu=vfpv3 option, but want to leverage the SIMD capabilities of the NEON core.

    We are following the guidance from here:

    http://processors.wiki.ti...x-A8#What_is_Neon.3F

    We are using the following compiler options:

    • arm-none-linux-gnueabi-gcc -O3 -march=armv7-a -mtune=cortex-a8 -mfpu=neon -ftree-vectorize -mfloat-abi=softfp

    However, there appears to be no improvement in processing time. We suspect the NEON coprocessor is not enabled. Do we need to add our own assembly instructions, as the TI guidance specifies? Or is there a kernel or BSP configuration option we need to set to turn on the NEON core?
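The flags only pay off if the hot loop has a shape GCC's auto-vectorizer recognizes. A minimal sketch of such a multiply-accumulate routine follows, assuming contiguous float arrays that do not overlap; the name mac_f32 is hypothetical, not from the BSP. Note that restrict and the loop-scoped counter need -std=gnu99 on GCC 4.6, whose default dialect is gnu89.

```c
#include <stddef.h>

/*
 * Hypothetical routine: y[i] += a[i] * x[i] for i in [0, n).
 * - 'restrict' promises the three arrays do not overlap, so GCC need not
 *   emit runtime alias checks before using vector loads and stores.
 * - Unit-stride float accesses map onto 4-lane (float32x4) NEON operations.
 * - Each element is independent, so no reassociation of additions is needed
 *   (unlike a dot-product style sum into a single accumulator).
 */
void mac_f32(float *restrict y, const float *restrict a,
             const float *restrict x, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        y[i] += a[i] * x[i];
}
```

With GCC 4.6, adding -ftree-vectorizer-verbose=2 to the compile line makes the compiler report which loops it vectorized and, for those it did not, why (newer GCC releases replace this option with -fopt-info-vec).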

    Here is the assembly code required to enable NEON/VFP per the TI guidance:

         MRC p15, #0, r1, c1, c0, #2 ; r1 = Access Control Register
         ORR r1, r1, #(0xf << 20) ; enable full access for p10,11
         MCR p15, #0, r1, c1, c0, #2 ; Access Control Register = r1
         MOV r1, #0
         MCR p15, #0, r1, c7, c5, #4 ; flush prefetch buffer because of FMXR below
         ; and CP 10 & 11 were only just enabled
         ; Enable VFP itself
         MOV r0,#0x40000000
         FMXR FPEXC, r0 ; FPEXC = r0

    Finally, is there an easy way to tell if the NEON or VFP is being used during runtime?  Other than the processing improvement, we want to make sure it is enabled when we start our application code.  Thanks!
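On Linux, the coprocessor-access (CPACR) and FPEXC writes in the TI snippet are privileged instructions; a kernel built with VFP/NEON support (CONFIG_VFP, CONFIG_VFPv3, CONFIG_NEON) performs that setup itself, so application code cannot execute them from user space. One user-space check is to look for the neon and vfpv3 tokens on the Features line of /proc/cpuinfo, which the kernel fills in from the capabilities it detected. The sketch below assumes that line format; the function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/*
 * Return 1 if 'feature' appears as a whole token on a "Features" line of
 * /proc/cpuinfo-style text, else 0. On a 32-bit ARM kernel that line
 * typically looks like:
 *   Features\t: swp half thumb fastmult vfp edsp neon vfpv3 tls
 */
int cpuinfo_has_feature(const char *text, const char *feature)
{
    size_t flen = strlen(feature);
    const char *line = text;

    while (line != NULL && *line != '\0') {
        const char *eol = strchr(line, '\n');
        size_t len = eol ? (size_t)(eol - line) : strlen(line);
        const char *end = line + len;

        if (strncmp(line, "Features", 8) == 0) {
            const char *p = memchr(line, ':', len);
            if (p != NULL) {
                p++;                                    /* skip the ':' */
                while (p < end) {
                    while (p < end && (*p == ' ' || *p == '\t'))
                        p++;                            /* skip separators */
                    const char *tok = p;
                    while (p < end && *p != ' ' && *p != '\t')
                        p++;                            /* scan one token */
                    if (flen > 0 && (size_t)(p - tok) == flen &&
                        strncmp(tok, feature, flen) == 0)
                        return 1;
                }
            }
        }
        line = eol ? eol + 1 : NULL;                    /* next line */
    }
    return 0;
}

/* Read up to size-1 bytes of /proc/cpuinfo into buf; returns 0 on success. */
int read_cpuinfo(char *buf, size_t size)
{
    if (size == 0)
        return -1;
    FILE *f = fopen("/proc/cpuinfo", "r");
    if (f == NULL)
        return -1;
    size_t n = fread(buf, 1, size - 1, f);
    buf[n] = '\0';                                      /* NUL-terminate */
    fclose(f);
    return 0;
}
```

Calling read_cpuinfo into a buffer at start-up and then cpuinfo_has_feature(buf, "neon") tells you the kernel exposes NEON. That confirms hardware and kernel support only; to confirm a given routine actually uses NEON, disassemble it (for example with arm-none-linux-gnueabi-objdump -d) and look for instructions on q registers such as vmul.f32 or vmla.f32.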


    Adam Ford (Advanced Member, Posts: 788)
    01 Feb 2018 03:38 PM
    Erick,

    I apologize for not replying sooner. We've had intermittent forum outages, and I only just noticed this. Our 2.4-4 BSP includes a speedtest app that compiles and tests NEON coprocessor performance against the ARM core alone.

    The speedtest-neon app is built several ways to compare NEON and ARM-only performance; the following shows how it is compiled for NEON:

    make O=neon CFLAGS="$(CFLAGS) -DBUILD_NEON -O4 -fsigned-char -march=armv7-a -mtune=cortex-a8 -mfloat-abi=softfp -mfpu=neon -ftree-vectorize -ffast-math -funsafe-math-optimizations -fsingle-precision-constant" speedtest-neon

    I would expect "-mfpu=neon -ftree-vectorize -ffast-math -funsafe-math-optimizations" to be the important flags, but I would suggest exploring that example app and its compiler flags.
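One detail in GCC's ARM options documentation may explain the missing speed-up: when -mfpu selects NEON, the auto-vectorizer does not generate floating-point NEON instructions unless -funsafe-math-optimizations is also specified, because NEON does not fully implement IEEE 754 (denormals are treated as zero). The command in the question omits that flag; -ffast-math, used by speedtest-neon, implies it. The sketch below (hypothetical names) shows a dot-product reduction, which additionally needs permission to reassociate float additions, plus compile-time checks of the macros GCC predefines for these options.

```c
#include <stddef.h>

/*
 * Dot product: the single accumulator 'sum' makes this a reduction. To split
 * it across vector lanes GCC must reorder the float additions, which it only
 * does with -fassociative-math (implied by -funsafe-math-optimizations and
 * therefore by -ffast-math).
 */
float dot_f32(const float *restrict a, const float *restrict b, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i)
        sum += a[i] * b[i];
    return sum;
}

/* 1 if this translation unit was compiled with NEON enabled: GCC defines
 * __ARM_NEON__ for -mfpu=neon; newer compilers also define __ARM_NEON. */
int built_with_neon(void)
{
#if defined(__ARM_NEON__) || defined(__ARM_NEON)
    return 1;
#else
    return 0;
#endif
}

/* 1 if this translation unit was compiled with -ffast-math. */
int built_with_fast_math(void)
{
#ifdef __FAST_MATH__
    return 1;
#else
    return 0;
#endif
}
```

Printing built_with_neon() and built_with_fast_math() at application start-up is a cheap way to confirm that a release build actually picked up -mfpu=neon and -ffast-math; it reflects only the compiler flags, not whether the kernel has enabled NEON.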

    adam