Technical Discussion Group Forum

Last Post 06 Jul 2006 01:31 PM by  peter.barada@logicpd.com
Performance: ColdFire vs ARM. Is this as good as it gets?
tbob
New Member
27 Jun 2006 07:45 AM
    Hello All,
    A colleague runs the included C++ program on his ARM system and gets times of around 3.6 seconds. On my Logic PD development board I'm getting about 16 seconds. I know there are lots of variables here, but I'm hoping to get an idea from the group as to whether we should expect more from the ColdFire or not.
    Thanks

    The arm system:
    # uname -a
    Linux (none) 2.6.17 #7 Fri Jun 23 15:54:11 UTC 2006 armv4tl unknown
    # cat /proc/cpuinfo
    Processor : ARM920Tid(wb) rev 0 (v4l)
    BogoMIPS : 88.47
    Features : swp half thumb
    CPU implementer : 0x41
    CPU architecture: 4T
    CPU variant : 0x1
    CPU part : 0x920
    CPU revision : 0
    Cache type : write-back
    Cache clean : cp15 c7 ops
    Cache lockdown : format A
    Cache format : Harvard
    I size : 16384
    I assoc : 64
    I line length : 32
    I sets : 8
    D size : 16384
    D assoc : 64
    D line length : 32
    D sets : 8


    The logicpd coldfire system:
    [root@localhost perftest]# uname -a
    Linux localhost 2.4.26 #3 Tue May 2 11:58:17 EDT 2006 m68k unknown
    [root@localhost perftest]# cat /proc/cpuinfo
    CPU: MCF5484 Rev. 1
    MMU: ColdFire V4e
    FPU: ColdFire V4e
    Clocking: 99.9MHz
    BogoMips: 199.88
    Calibration: 999424 loops

    The benchmark:
    #include <cstdio>
    #include <cstdlib>
    #include <string>
    #include <list>

    #include <sys/time.h>


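    static void fill_list(std::list<std::string> &slist)
    {
        char tmpbuf[256];

        // Append 2000 strings drawn from 500 possible values, so many are dupes
        for (int i = 0; i < 2000; i++) {
            sprintf(tmpbuf, "String number %d\n", rand() % 500);
            slist.push_back(tmpbuf);
        }

        // Remove duplicate entries. unique() only removes consecutive equal
        // elements; because it runs before sort(), most duplicates among the
        // newly appended strings survive until the next call.
        slist.unique();

        // Sort the list
        slist.sort();
    }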


    int main()
    {
        std::list<std::string> slist;
        struct timeval tv1, tv2;
        double elapsed;

        srand(0);

        gettimeofday(&tv1, NULL);

        // Fill the list, then eliminate dupes and sort, 50 times.
        // The same list is reused and never cleared between passes.
        for (int i = 0; i < 50; i++)
            fill_list(slist);

        gettimeofday(&tv2, NULL);

        // Elapsed wall-clock time in seconds
        elapsed = (tv2.tv_sec - tv1.tv_sec) + (tv2.tv_usec - tv1.tv_usec) / 1e6;
        printf("Total time: %.4f seconds\n", elapsed);

        return 0;
    }
    peter.barada@logicpd.com
    New Member
    Posts: 72
    06 Jul 2006 01:31 PM
    I ran the program (compiled with g++-3.4.3, linked against glibc-2.3.5) on a 5475 board (266 MHz) running linux-2.4.26 and got a total time of 9.62 seconds, of which 7.05 seconds is spent in user space and the other 2.57 seconds in the kernel.

    The ColdFire MMU relies on support code in the kernel to walk the page tables on a TLB (translation lookaside buffer) miss, whereas the ARM walks the page tables in hardware. This explains the 2+ seconds in the kernel: there are 1887570 TLB misses (a page is 8K), and each miss takes on average 1.361 microseconds to handle. The program has *very* poor locality of reference.

    Part of the TLB code in the kernel has to flush the cache when handling a TLB miss, since when a page is removed from the hardware MMU, the cache lines for that page need to be flushed/invalidated. Currently I believe that the *entire* cache is flushed/invalidated instead of only the affected lines, since it's simpler to do. It is an area of the kernel that is ripe for performance review.

    Which compiler and libstdc++ did your friend use to build the test code for the ARM? Which ARM is it? Which kernel is it running on?