16 Dec, 2008, Scandum wrote in the 41st comment:
Votes: 0
DavidHaley said:
That is a disk problem. But frankly, having even hundreds of files shouldn't cause a program to halt for a few seconds unless the files are stored on a network, or the disk is already under pretty heavy load…

(If you don't believe me, just try running ls in /usr/bin or something on your local machine, it should be very fast; in fact, it should take longer to print than it takes to read the files!)

I don't think ls uses the same code as fopen. It's quite obvious in practice if you can time commands:

>finger elvira
(34654 usec to execute command)
>finger elvira
(2744 usec to execute command)
>finger elvira
(2768 usec to execute command)

The first attempt takes a lot longer, and it's worse in large directories. I'm not sure what the exact cause is, but as far as I know all Linux systems have the issue.
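Roughly the kind of timing wrapper I'm using, if anyone wants to reproduce it; just a sketch, and the path is only an example:

#include <stdio.h>
#include <sys/time.h>

/* Sketch: time a single fopen()/fclose() in microseconds.
   The path is only an example. The first call after boot (or after
   the cached data has been evicted) has to hit the disk; repeats
   should be served from the cache. */
long time_fopen(const char *path)
{
    struct timeval start, end;
    FILE *fp;

    gettimeofday(&start, NULL);
    fp = fopen(path, "r");
    if (fp)
        fclose(fp);
    gettimeofday(&end, NULL);

    return (end.tv_sec - start.tv_sec) * 1000000L
         + (end.tv_usec - start.tv_usec);
}

int main(void)
{
    printf("%ld usec\n", time_fopen("../players/e/elvira"));
    printf("%ld usec\n", time_fopen("../players/e/elvira"));
    return 0;
}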

Zeno said:
Hm. Well I was getting this lag on "Xio", so I looked in the "x" directory. Only 2 files.

Figures. Your problem looks more like resource exhaustion by other VPSes on the server, given that it hangs on trivial commands.
16 Dec, 2008, David Haley wrote in the 42nd comment:
Votes: 0
It's probably just loading the inode tables into memory, which are then cached until the disk cache fills up. I don't see why ls wouldn't have to do this, although it would only get the hit once.
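A quick way to see it would be to stat() everything in the directory twice and compare the passes; a rough sketch (the directory name is just an example):

#include <stdio.h>
#include <string.h>
#include <dirent.h>
#include <sys/stat.h>
#include <sys/time.h>

/* Sketch: stat() every entry in a directory and return the elapsed
   microseconds. The first pass has to pull the inodes off the disk;
   the second pass should be served from the inode/dentry cache. */
static long stat_pass(const char *dir)
{
    struct timeval start, end;
    struct dirent *de;
    struct stat sb;
    char path[1024];
    DIR *dp = opendir(dir);

    if (!dp)
        return -1;

    gettimeofday(&start, NULL);
    while ((de = readdir(dp)) != NULL) {
        snprintf(path, sizeof(path), "%s/%s", dir, de->d_name);
        stat(path, &sb);
    }
    gettimeofday(&end, NULL);
    closedir(dp);

    return (end.tv_sec - start.tv_sec) * 1000000L
         + (end.tv_usec - start.tv_usec);
}

int main(void)
{
    /* "../players/x" is only an example directory. */
    printf("cold: %ld usec\n", stat_pass("../players/x"));
    printf("warm: %ld usec\n", stat_pass("../players/x"));
    return 0;
}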
16 Dec, 2008, Scandum wrote in the 43rd comment:
Votes: 0
DavidHaley said:
It's probably just loading the inode tables into memory, which are then cached until the disk cache fills up. I don't see why ls wouldn't have to do this, although it would only get the hit once.

I'm not sure either; ls probably uses low-level routines that circumvent caching. Possibly open() doesn't have the same issues as fopen() when called with the O_DIRECT flag.
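If someone wants to try it, this is roughly what I mean; just a sketch, and keep in mind that O_DIRECT needs _GNU_SOURCE and a block-aligned buffer (the path is only an example):

#define _GNU_SOURCE             /* needed for O_DIRECT */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>

/* Sketch: read a file with O_DIRECT, which bypasses the page cache.
   O_DIRECT requires the buffer (and the read length) to be aligned
   to the filesystem block size, hence posix_memalign(). */
int main(void)
{
    void *buf;
    int fd;
    ssize_t n;

    if (posix_memalign(&buf, 4096, 4096) != 0)
        return 1;

    fd = open("../players/e/elvira", O_RDONLY | O_DIRECT);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    n = read(fd, buf, 4096);
    printf("read %zd bytes without going through the page cache\n", n);

    close(fd);
    free(buf);
    return 0;
}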
16 Dec, 2008, David Haley wrote in the 44th comment:
Votes: 0
Why would circumventing caching speed things up? The whole point of caching is to avoid having to hit the disk again. :wink:
16 Dec, 2008, Scandum wrote in the 45th comment:
Votes: 0
DavidHaley said:
Why would circumventing caching speed things up? The whole point of caching is to avoid having to hit the disk again. :wink:

Only if you repeatedly access the same file. It's my understanding you get the largest advantage by only caching stuff that is accessed regularly, so you don't fill up the cache with junk.
16 Dec, 2008, David Haley wrote in the 46th comment:
Votes: 0
I thought that repeatedly accessing the same thing is what you were doing in your speed tests…

Well, anyhow, this is starting to drift a bit…
16 Dec, 2008, quixadhal wrote in the 47th comment:
Votes: 0
Remember though, you're talking about a virtual machine. As such, disk access patterns will be unpredictable from inside the VM. Your "disk" might be a file on a RAID5 system, it might be a network share mounted from the Windows 2000 server machine's FAT32 system, it could be an external hard drive on a USB port.

The lag spikes might be the times another VM sucks up all the disk buffering, or times when the host OS is making a snapshot backup of your VM, or even the operator loading up Fallout 3 on the console. :)
16 Dec, 2008, elanthis wrote in the 48th comment:
Votes: 0
DavidHaley said:
It's probably just loading the inode tables into memory, which are then cached until the disk cache fills up. I don't see why ls wouldn't have to do this, although it would only get the hit once.


Or the time to load and link the program and its shared libraries. Cold loading anything from disk is a lot slower than loading it warm from the cache. :)

Windows can sometimes seem to cheat when running programs because it has things like SmartBoost (or whatever it's called) that automatically caches frequently used files and programs into memory when the machine first boots. Linux has some tools for precaching a fixed set of files, but not anything that dynamically determines the user's frequently accessed data.
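For a fixed set of files you can precache them yourself at boot; a rough sketch using the Linux-specific readahead() call (the path is just a placeholder):

#define _GNU_SOURCE             /* for readahead() */
#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>

/* Sketch: pull one file into the page cache ahead of time, e.g. from
   a small program run at boot. The file name is just a placeholder. */
static void precache(const char *path)
{
    struct stat sb;
    int fd = open(path, O_RDONLY);

    if (fd < 0 || fstat(fd, &sb) < 0) {
        if (fd >= 0)
            close(fd);
        return;
    }

    /* Ask the kernel to read the whole file into the page cache;
       later opens should then be warm. */
    readahead(fd, 0, sb.st_size);
    close(fd);
}

int main(void)
{
    precache("/path/to/frequently/used/file");
    return 0;
}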

Scandum said:
Only if you repeatedly access the same file. It's my understanding you get the largest advantage by only caching stuff that is accessed regularly, so you don't fill up the cache with junk.


You can't really fill the cache with junk. The kernel will cache anything and everything it can (after all, if an item isn't already in the cache, it doesn't know that you just accessed it a second ago), and when it needs memory for something else, it'll evict the least recently used items first. This is why adding more memory always speeds up a system: the more RAM you have, the more memory the OS uses for its cache.

It is possible in some very specialized workloads to make the kernel think some data is important for the cache because you're accessing it over and over, even though you're going to be done with it shortly. Generally, instead of killing the performance of the system by avoiding the cache (and thereby causing the disk I/O layer to grind), you should just fix your application. This situation really is very rare with any well-behaved application (or benchmark).
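If an application really does stream through a pile of data it won't touch again, the right fix is to tell the kernel as much rather than bypass the cache entirely; a rough sketch using posix_fadvise() (the file name is just an example):

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

/* Sketch: after streaming through data you won't reuse, tell the
   kernel it can drop those pages instead of letting them crowd out
   more useful cache entries. The path is only an example. */
int main(void)
{
    char buf[8192];
    int fd = open("/var/log/some_big_log", O_RDONLY);

    if (fd < 0) {
        perror("open");
        return 1;
    }

    while (read(fd, buf, sizeof(buf)) > 0)
        ;   /* process the data once */

    /* Hint: we're done with this file's pages. */
    posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
    close(fd);
    return 0;
}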