Using swap, ext4, and the gcc-mips native compiler on the Ben--they all work!

Delbert Franz ddf at sonic.net
Thu Dec 2 15:11:02 EST 2010


I have been having fun, working on the Ben after not doing anything 
for some months.  Here are some highlights: 

1.  I noticed that there was a gcc-mips compiler option in the config 
of the new toolchain, so I added it to my image.  

2.  A recent post on the web reported that ext4 is often significantly 
faster at writing and reading on solid-state devices.  So I ran "make 
kernel_menuconfig" and enabled ext4.  

3.  My 8 GB Transcend microSD card was divided into three partitions: 
    p1: about 500 MB, ext2, for the rootfs 
    p2: about 70 MB, for swap 
    p3: the remainder, ext4, for my main projects 

4.  I changed /etc/inittab so that gmenu2x never gets launched: I 
prefer using the command line on such a small device, at least for my 
own work.  

5.  To enable swap and also to mount p3 on the card, I manually 
modified /etc/config/fstab to look like this: 

config global automount
	option from_fstab 1
	option anon_mount 1
	
config global autoswap
	option from_fstab 1
	option anon_swap 0
	
config mount
	option target	/pj
	option device	/dev/mmcblk0p3
	option fstype	ext4
	option options	rw,sync
	option enabled	1
	option enabled_fsck 0

config swap
	option device	/dev/mmcblk0p2
	option enabled	1

I know that the uci system exists for editing the config files, but for 
the most part I find it easier to just go in and change them 
directly :) 

The device names look a bit "strange", but that is how Linux "talks" to 
devices like SD cards :)  The card in the Ben's only slot is the first 
MMC block device, so it is mmcblk0 (the numbering starts at 0).  Its 
partitions are numbered starting at 1 and prefixed with "p", for "partition". 

6.  To test gcc, I found and downloaded a C-language version of the 
long-used LINPACK benchmark for floating-point performance.  It has 
been in use for decades, and its published list of results goes back to 
computers from the 1970s at least, if not a bit earlier.  The 
program is about 1200 lines long, counting code, white space and 
comments: not very large, but still of interest to me.  
It compiled the first time with just 

gcc linpack_bench.c

and the compile took about 20 seconds.  The program was set to solve a 
1000-equation linear system, which took close to 9 minutes to finish, so I 
reduced the equation count to 100 and then got run times close to 1 
second!  However, the timer used in the code has a resolution of 0.01 
second, so I increased the size to 200, for a run time close to 8 seconds.  
This reduced the timing noise, and the Mflops reported are the average 
over five runs.  Here is a summary of what I found by experimenting a 
bit with gcc options: 

Options used           Mflops   Compile time (s)   a.out size (bytes)
---------------------  ------   ----------------   ------------------
(none)                 0.710          17                 332878
-Os                    0.743          35                 327136
-Os -march=mips2       0.750          33                 327136
-Os -march=mips32      0.745          33                 327136

Observations:  

a.  The optimizations do make a small difference.  I noted that the 
toolchain itself is compiled with -Os and the mips32 option.  -O1, -O2 
and -O3 gave slightly poorer results than -Os, so for this program, 
optimizing for a smaller a.out also made it a bit faster (about 
5 percent: 0.743 vs. 0.710 Mflops).  

b.  Adding other options (such as -march) to -Os did not change the 
size of a.out, but did make tiny differences in the Mflops.  Judging 
from the five runs for each and the pattern they followed, the 
differences could be real, even though small, but maybe not.  I didn't 
do a byte-by-byte comparison of the a.out files :)

c.  Comparing with the published LINPACK Mflops list shows that the 
Ben, for a single double-precision floating-point application, is only 
slightly slower than the IBM 370/165 mainframe from the 1970s, which 
showed 0.78 Mflops!  My current desktop, with an Intel i7-860, gets 
more than 1600 Mflops on this program, over 2000 times the Ben's rate.  
So the Ben is slow relative to what we have today, but it is fast and 
has large RAM compared to my early days in computing, when RAM was 
measured in KB, not MB.  GB was not even in the language yet :) 

7.  Finally, I wanted to see when swap would come into play.  
According to htop, compiling the benchmark did not even come close to 
using any swap.  Therefore, I recompiled the benchmark for a 
1000-equation system.  I then started htop in one of the four terminals 
and started, one by one, an instance of the benchmark program in each of 
the remaining three terminals.  There was no swap use with one instance 
and none with two instances, but with the third, swap went as high as 
20 MB and then settled down to around 7 MB as the Ben was "working its 
heart out" on three instances of a problem that could not have been 
solved on any computer in the world in the 1960s!  It was quite 
apparent that lots of time was being spent moving pages between swap 
and RAM, because the aggregate CPU percentage for the three benchmark 
processes was below 50 percent.  Killing one benchmark process made the 
CPU usage jump to about 95 percent for the two remaining benchmark 
runs.  

Swap works, but it is best avoided because it radically slows 
computation.  But then we all knew that :) 

This E-mail is a bit long, but I wanted to report on what I found, 
since nothing has appeared on these topics yet.  Now I have to get 
back to "real" project work:) 

                   Delbert







More information about the discussion mailing list

