
fincore()

By Jonathan Corbet
January 27, 2010

Linux has long had the mincore() system call, which allows an application to determine whether a given page is in RAM. There is no easy way, though, to tell whether a given page of a file is in the page cache. An application can mmap() the file and use mincore() on the mapping, but that can be slow. So Chris Frost has proposed a new fincore() system call to handle this task:

    int fincore(int fd, loff_t start, loff_t len, unsigned char *vec);

A call to fincore() will look at the pages of the file associated with fd in the range indicated by start and len. For each page of the file in that range, one byte of vec will be set to a non-zero value if that page is in memory. Naturally, this answer is an approximation; the situation can change while the system call is running.
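Since fincore() is only a proposal, the sketch below is a userspace stand-in rather than the patch itself. It takes the same arguments but relies on the mmap() and mincore() workaround described above. The name fincore_emul(), the return value (the number of vec bytes filled in), and the rounding of start down to a page boundary are all assumptions made for illustration. The caller must supply a vec with one byte per page touched by the range.

```c
#define _DEFAULT_SOURCE            /* exposes mincore() on glibc */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical userspace stand-in shaped like the proposed call.
 * Assumes [start, start + len) lies within the file and that vec has
 * one byte per page touched by that range. Returns the number of vec
 * bytes filled in, or -1 with errno set.
 */
long fincore_emul(int fd, off_t start, off_t len, unsigned char *vec)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    if (start < 0 || len <= 0) {
        errno = EINVAL;
        return -1;
    }

    off_t first = start - start % page;           /* round down to a page */
    size_t span = (size_t)(start + len - first);  /* bytes to map */
    size_t npages = (span + page - 1) / page;

    /* Creating the mapping does not by itself read the file in. */
    void *map = mmap(NULL, span, PROT_READ, MAP_SHARED, fd, first);
    if (map == MAP_FAILED)
        return -1;

    int rc = mincore(map, span, vec);
    int saved = errno;                            /* munmap() may clobber errno */
    munmap(map, span);
    if (rc != 0) {
        errno = saved;
        return -1;
    }

    for (size_t i = 0; i < npages; i++)
        vec[i] &= 1;       /* only the low bit of each byte is defined */
    return (long)npages;
}
```

Taking the page size from sysconf() at run time means the caller never has to guess how many bytes of the file each vec entry covers. The mapping that this version sets up and tears down on every query is the overhead a real fincore() would avoid.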

That, however, can be good enough for Chris's needs. His objective is to speed up applications which perform large numbers of non-sequential file reads. The traditional readahead code deals poorly with this kind of application, since the access pattern cannot be predicted ahead of time. But the application often does know about a sequence of reads in advance; if the kernel could be told to pull in those pages ahead of time, it could order the I/O operations optimally and make the whole thing go faster. When doing this for sqlite and the GIMP, Chris reports significant speedups.

The fadvise() system call can be used to request prefetching of file data. But there's a problem: it's hard for a prefetch library to know how much system memory is available. If too little data is prefetched, the performance gains will not be what they could be. Prefetching too much data, however, can lead to thrashing. Hence the fincore() system call: if prefetched pages are no longer present by the time the application gets around to using them, the library knows that it is asking for too much and can back off.
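The back-off idea can be sketched roughly as follows. This is not libprefetch's actual algorithm: prefetch_hint() and next_window() are hypothetical names, and the 90% threshold, the halving, and the 25% growth step are arbitrary assumptions. Only posix_fadvise() with POSIX_FADV_WILLNEED is an existing interface. The count of pages still present would come from a residency check such as the proposed fincore().

```c
#define _POSIX_C_SOURCE 200112L    /* exposes posix_fadvise() */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * Tell the kernel that [offset, offset + len) of fd will be read soon.
 * Returns 0 on success or an error number; posix_fadvise() reports
 * errors through its return value, not through errno.
 */
int prefetch_hint(int fd, off_t offset, off_t len)
{
    return posix_fadvise(fd, offset, len, POSIX_FADV_WILLNEED);
}

/*
 * Hypothetical policy: choose the next prefetch window (in pages) from
 * how many of the `requested` pages were still `present` when the
 * application came to use them. All thresholds are assumptions.
 */
size_t next_window(size_t window, size_t present, size_t requested,
                   size_t min_window, size_t max_window)
{
    assert(min_window <= max_window);
    if (requested == 0)
        return window;                 /* nothing measured, no change */

    size_t next = window;
    if (present * 10 < requested * 9)
        next = window / 2;             /* over 10% evicted: asking too much */
    else if (present == requested)
        next = window + window / 4;    /* everything survived: probe upward */

    if (next < min_window)
        next = min_window;
    if (next > max_window)
        next = max_window;
    return next;
}
```

A library built this way would call prefetch_hint() for the next window of upcoming reads, check residency just before the application consumes them, and feed the counts into next_window() to size the following batch.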

Andrew Morton likes the patch:

I must say, the syscall appeals to my inner geek. Lot of applications are leaving a lot of time on the floor due to bad disk access patterns. A really smart library which uses this facility could help all over the place.

Jamie Lokier, though, wondered if it might not be a better idea to find a way to inform applications more directly that their pages are being evicted prior to use.

This is the first posting for this system call, so it has not gotten a lot of attention yet; more discussion will certainly be necessary before it could be merged. In the meantime, the libprefetch site has more information on this whole project.

Index entries for this article
Kernel: Prefetching
Kernel: System calls/fincore()



fincore()

Posted Jan 28, 2010 4:49 UTC (Thu) by bradfitz (subscriber, #4378) [Link]

Nice! I hope this gets in, in some form.

fincore()

Posted Jan 28, 2010 17:18 UTC (Thu) by iabervon (subscriber, #722) [Link]

It seems to me like, once you're using both fadvise() and a non-portable syscall, you could inform the kernel in more detail about your actual usage pattern, and it could store the info and decide what to do. The system should be able to make good choices if you could tell the kernel, in order, the pages you intend to use (up until the kernel tells you it doesn't want to store any more hints for you), and have the kernel only prefetch up to where it can fit in memory, prefer to drop the ones which are further out (and likely newest rather than oldest), and remember what you've requested that isn't in memory such that when you drop your interest in the pages you've passed, it can prefetch new ones then.

It doesn't make sense to have a userspace heuristic for figuring out kernel limits when you need kernel support to implement it, particularly if the information you're getting only helps if you are right about the kernel's heuristics. Maybe the kernel will stop evicting pages that have been requested but not used when asked to prefetch more pages, and heuristics based on checking whether pages are in core and an assumption as to the kernel's use of the hints will give entirely wrong results.

fincore()

Posted Jan 29, 2010 17:38 UTC (Fri) by giraffedata (guest, #1954) [Link]

I agree. First of all, fadvise() does not request prefetch. It advises the kernel that you are going to access a certain part of the file soon. It is up to the prefetcher to decide how to exploit that information.

Only the prefetcher, in the kernel, can properly decide how much memory to allocate for prefetching this particular file. Memory is a resource shared between processes, and coordinating resource usage between processes is fundamentally the kernel's responsibility. The application should just look out for itself.

fincore()

Posted Jan 29, 2010 6:19 UTC (Fri) by kleptog (subscriber, #1183) [Link]

Hmm, it seems there's an assumption here about the page size. It's not immediately clear if you mean 4K or 1K pages. So I think that should be made explicit somehow, like by vec getting a length argument.

But other than that I think it's a fabulous idea. Although you could achieve much the same benefits if you could do a read() and have it also return a flag indicating if the data was all in memory or not.


Copyright © 2010, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds