Hi Mike
Below is the disassembled output of my main (I hope this was what you wanted).
main
$a
i.main
0xc00c03f0: e10f0000 .... MRS r0,CPSR
0xc00c03f4: e38000c0 .... ORR r0,r0,#0xc0
0xc00c03f8: e121f000 ..!. MSR CPSR_c,r0
0xc00c03fc: e59f103c <... LDR r1,[pc,#60] ; [0xc00c0440] = 0x80003000
0xc00c0400: e3a00000 .... MOV r0,#0
0xc00c0404: e581001c .... STR r0,[r1,#0x1c]
0xc00c0408: e3a00102 .... MOV r0,#0x80000000
0xc00c040c: e5901e2c ,... LDR r1,[r0,#0xe2c]
0xc00c0410: e3c11003 .... BIC r1,r1,#3
0xc00c0414: e5801e2c ,... STR r1,[r0,#0xe2c]
0xc00c0418: e5901e24 $... LDR r1,[r0,#0xe24]
0xc00c041c: e3811003 .... ORR r1,r1,#3
0xc00c0420: e5801e24 $... STR r1,[r0,#0xe24]
0xc00c0424: e5901e20 ... LDR r1,[r0,#0xe20]
0xc00c0428: e3811003 .... ORR r1,r1,#3
0xc00c042c: ea000001 .... B 0xc00c0438 ; main + 72
0xc00c0430: e5901e20 ... LDR r1,[r0,#0xe20]
0xc00c0434: e2211003 ..!. EOR r1,r1,#3
0xc00c0438: e5801e20 ... STR r1,[r0,#0xe20]
0xc00c043c: eafffffb .... B 0xc00c0430 ; main + 64
$d
The last four instructions of the listing (0xc00c0430 to 0xc00c043c) are the loop. I am starting to think it may simply be a limitation of the peripheral bus that is producing the "slow" execution. My reasoning is that the code above doesn't produce a perfect square wave: the high and low times are 180 ns and 200 ns. Every pass through the loop runs the same LDR / EOR / STR / B sequence, so the two halves of the waveform ought to be equal. If I insert a few extra instructions to lengthen the loop, the waveform does settle to a 50% duty cycle. To me this implies that the core has to wait for the peripheral bus when the loop time is short.
Please don't spend too much of your time on this. I have only been looking at it to get a feel for the processing speed. I don't really need to toggle a port pin at these rates, though I would be curious to know how fast you can get one toggling.
Cheers
Colin