Technical Discussion Group Forum

Last Post 20 Jan 2005 07:20 PM by  colin howard
Execution speed on the SDK
 5 Replies
Author Messages
colin howard
New Member
16 Jan 2005 04:39 PM
    Hi

    I am trying to get a handle on execution speed on my SDK. I have written some extremely simple code to toggle a port:


    while (1)
    {
        GPIO->pedr ^= 0x03;
    }


    This is done with interrupts off and is running from external RAM (0xc00c0000 ...) after being loaded in via LOLO. The info from LOSH is

    sect c0000000 -> c0000000 WBC dom:4

    Which I think tells me that cache is enabled for this range.

    The execution time of one pass of the loop is 180ns as measured with a CRO. My code does not modify any clock settings on the chip and relies on the initialisation done by BOLO/LOLO. I have confirmed by register inspection and using the CRO that the PLL has been set up to 200MHz. 180ns seems a bit slow to execute 4 instructions (I checked the assembler listing for this).
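
For scale, the measurement can be converted into core clock cycles. The sketch below is only arithmetic on the figures quoted above (180 ns, 200 MHz, 4 instructions); `cycles_in` is a hypothetical helper name, not part of any Logic or Sharp code.

```c
#include <assert.h>
#include <stdint.h>

/* Whole clock cycles that fit in a measured interval.
 * A clock of N MHz ticks N times per 1000 ns, so
 * cycles = interval_ns * clock_mhz / 1000 (rounded down). */
static inline uint32_t cycles_in(uint32_t interval_ns, uint32_t clock_mhz)
{
    assert(clock_mhz > 0);
    return (interval_ns * clock_mhz) / 1000u;
}
```

Calling `cycles_in(180, 200)` gives 36, so one pass of the 4-instruction loop costs roughly 9 core cycles per instruction. That is far more than a cached register-only loop would need, which suggests the time is going somewhere other than instruction execution.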

    Am I missing something or is this reasonable? (BTW - I'll admit I'm new to the ARM)

    Cheers
    Colin H.
    Anonymous
    17 Jan 2005 12:41 PM
    Hello,
    From the Sharp LH7A400 user's guide I found the following definition for the Advanced Peripheral Bus: "Defined in the AMBA specification, the APB connects the lower performance peripheral blocks. In the LH7A400, the APB connects the RTC, the WDT, the Timers, the GPIO, the SSP, the BMI, the UARTS, the USB Interface, the MMC, the Audio Codec, the AC97, the SCI, the DC-DC Interface, and the Interrupt Controller. The APB connects to the AHB via the APB Bridge."

    Logic has set the PCLK up to run at 50 MHz.

    I was unable to find a guaranteed throughput or any timing diagrams for the GPIO in the Sharp manuals.

    The fastest possible external access would use an area or a peripheral connected to the AHB, which uses the HCLK. Logic has set the HCLK up to run at 100 MHz. Timing diagrams for the different memory areas and some of the AHB peripherals can be found in the Sharp LH7A400 technical data sheet.

    Bruce
    mikee@logicpd.com
    New Member
    18 Jan 2005 01:36 PM
    colin,

    One question, did you launch your application using 'exec' or 'jump' from the LOSH prompt?

    If you used 'jump', then LoLo is still running in the background. So you will have interrupts happening every millisecond, etc.

    If you used 'exec', then you are correct that just your code should be running. However, 'exec' flushes the cache and turns it off, so you are no longer running with the cache on. If you want the cache on and also want to use the 'exec' command, you will have to enable it yourself. If you don't mind the overhead of LoLo running in the background, you can use its API and the cache routines it provides.
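
As a rough illustration of enabling the cache yourself, here is a minimal sketch for the instruction cache only. It assumes an ARM9-family core with the usual CP15 control register layout (I bit at bit 12), GCC-style inline assembly, and a privileged mode; `ctrl_with_icache` and `enable_icache` are hypothetical names and this is not LoLo's own routine. On these cores the data cache generally also needs the MMU and page tables set up, which is not shown.

```c
#include <stdint.h>

/* CP15 control register (c1) bits on ARMv4/v5 cores (assumed layout). */
#define CP15_CTRL_MMU    (1u << 0)   /* M: MMU enable */
#define CP15_CTRL_DCACHE (1u << 2)   /* C: data cache enable */
#define CP15_CTRL_ICACHE (1u << 12)  /* I: instruction cache enable */

/* Pure helper so the bit manipulation can be checked on any host. */
static inline uint32_t ctrl_with_icache(uint32_t ctrl)
{
    return ctrl | CP15_CTRL_ICACHE;
}

#if defined(__arm__) && defined(__GNUC__)
/* Hypothetical routine: invalidate, then enable, the instruction cache. */
static inline void enable_icache(void)
{
    uint32_t ctrl;

    /* Invalidate the whole I-cache so no stale lines are used. */
    __asm__ volatile("mcr p15, 0, %0, c7, c5, 0" : : "r"(0u));
    /* Read the control register, set the I bit, write it back. */
    __asm__ volatile("mrc p15, 0, %0, c1, c0, 0" : "=r"(ctrl));
    ctrl = ctrl_with_icache(ctrl);
    __asm__ volatile("mcr p15, 0, %0, c1, c0, 0" : : "r"(ctrl) : "memory");
}
#endif
```

The hardware access is kept behind the `#if` so the file still compiles on a non-ARM host, where only the bit helper is available.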

    Bruce's explanation concerning accesses to the GPIO peripheral via the APB still stands.

    Regards,
    --mikee
    colin howard
    New Member
    18 Jan 2005 06:28 PM
    Hi Bruce/Mike

    Thanks for your follow-up.

    I have had a look at the data sheet and can't see anything that would explain the performance (then again, it's a long time since I have had to look at timing diagrams in this detail; if you can point me at a specific diagram I would be eternally grateful). I would expect that with the peripheral clock at 50 MHz, sequential GPIO accesses could only occur on successive peripheral clock edges, i.e. every 20 ns. A read followed by a write (which is what the code does) would then require 40 ns of "waiting" to sync with the peripheral bus. I could be completely off track here, so correct me if so.

    As far as Mike's question goes, I am using a jump so that for now I don't need to worry about cache init, MMU etc. I use the Sharp ABL disable_irq_fiq() macro:


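    asm
    (
        "MRS %0, CPSR" "\n\t"        /* read the current CPSR */
        "ORR %1, %0, #0xC0" "\n\t"   /* set the IRQ and FIQ disable bits */
        "MSR CPSR_c, %1" "\n\t"      /* write the control field back */
        : "=r" (ret), "=r" (tmp)
    );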


    to disable interrupts, so I don't think LOLO could be interfering too much. Even if it was I would expect to only see "hits" to the execution time of my loop every 1ms or whatever LOLO has the timer set for, however the waveform on the CRO looks pretty rock steady.
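
For reference, the 0xC0 constant in that macro is the pair of interrupt mask bits in the ARM CPSR: I (bit 7, IRQ disable) and F (bit 6, FIQ disable). The host-side sketch below only models the bit manipulation; the macro and function names are hypothetical and are not taken from the Sharp ABL.

```c
#include <stdint.h>

#define CPSR_IRQ_DISABLE (1u << 7)  /* I bit: IRQs masked when set */
#define CPSR_FIQ_DISABLE (1u << 6)  /* F bit: FIQs masked when set */

/* Value the ORR instruction produces: the old CPSR with both
 * interrupt mask bits set and every other bit left unchanged. */
static inline uint32_t cpsr_with_irq_fiq_masked(uint32_t cpsr)
{
    return cpsr | CPSR_IRQ_DISABLE | CPSR_FIQ_DISABLE;
}
```

For example, a CPSR control field of 0x13 (supervisor mode, interrupts enabled) becomes 0xD3 after masking.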

    Any further pointers would be greatly appreciated.

    Cheers
    Colin
    mikee@logicpd.com
    New Member
    19 Jan 2005 09:54 AM
    colin,

    Can you post the assembled C code here? Also, are any other interfaces running? That is to say, do you have an LCD screen running, or the Ethernet interface enabled, etc.?

    --mikee
    colin howard
    New Member
    20 Jan 2005 07:20 PM
    Hi Mike

    Below is the disassembled output of my main (I hope this was what you wanted).


    main
    $a
    i.main
    0xc00c03f0: e10f0000 .... MRS r0,CPSR
    0xc00c03f4: e38000c0 .... ORR r0,r0,#0xc0
    0xc00c03f8: e121f000 ..!. MSR CPSR_c,r0
    0xc00c03fc: e59f103c <... LDR r1,[pc,#60] ; [0xc00c0440] = 0x80003000
    0xc00c0400: e3a00000 .... MOV r0,#0
    0xc00c0404: e581001c .... STR r0,[r1,#0x1c]
    0xc00c0408: e3a00102 .... MOV r0,#0x80000000
    0xc00c040c: e5901e2c ,... LDR r1,[r0,#0xe2c]
    0xc00c0410: e3c11003 .... BIC r1,r1,#3
    0xc00c0414: e5801e2c ,... STR r1,[r0,#0xe2c]
    0xc00c0418: e5901e24 $... LDR r1,[r0,#0xe24]
    0xc00c041c: e3811003 .... ORR r1,r1,#3
    0xc00c0420: e5801e24 $... STR r1,[r0,#0xe24]
    0xc00c0424: e5901e20 ... LDR r1,[r0,#0xe20]
    0xc00c0428: e3811003 .... ORR r1,r1,#3
    0xc00c042c: ea000001 .... B 0xc00c0438 ; main + 72
    0xc00c0430: e5901e20 ... LDR r1,[r0,#0xe20]
    0xc00c0434: e2211003 ..!. EOR r1,r1,#3
    0xc00c0438: e5801e20 ... STR r1,[r0,#0xe20]
    0xc00c043c: eafffffb .... B 0xc00c0430 ; main + 64
    $d


    The last 4 lines are the loop. I am starting to think it may just be a limitation of the peripheral bus that is producing the "slow" execution. The reason I am thinking this is the fact that the above doesn't produce a perfect square wave - the high / low times are 180ns / 200ns. If I insert a few extra lines of code to increase the execution time of the loop the waveform does take on a 50% duty cycle. To me this implies that the core is having to wait for the peripheral bus when the loop time is short.
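
One quick check of that idea is to express the measured high and low times in peripheral clock periods. The sketch below assumes the 50 MHz PCLK mentioned earlier in the thread and treats the scope readings as exact, which they may not be; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Clock period in ns; only exact when the frequency divides 1000. */
static inline uint32_t period_ns(uint32_t clock_mhz)
{
    assert(clock_mhz > 0 && 1000u % clock_mhz == 0);
    return 1000u / clock_mhz;
}

/* Number of whole clock periods in an interval (rounded down). */
static inline uint32_t whole_clocks(uint32_t interval_ns, uint32_t clock_mhz)
{
    return interval_ns / period_ns(clock_mhz);
}

/* True when the interval is an exact multiple of the clock period. */
static inline bool is_whole_clocks(uint32_t interval_ns, uint32_t clock_mhz)
{
    return interval_ns % period_ns(clock_mhz) == 0;
}
```

With a 20 ns PCLK period, 180 ns is 9 periods and 200 ns is 10 periods. Both being whole multiples is consistent with the writes being aligned to peripheral clock edges, though it does not prove it.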

    Please don't waste too much of your time on this. The reason I have been looking at this is to get a handle on the processing speed. I don't really need to be toggling a port pin at these sorts of rates - I would be curious to know how fast you can get a port pin toggling though.

    Cheers
    Colin