New Features and Performance Improvements for System.IO
Microsoft is planning some simple but much-welcomed performance improvements for the core System.IO functionality. These include convenience methods for reading and writing text-based files, significantly faster directory enumeration, and support for memory-mapped files.
The first improvement is a replacement for the convenience method File.ReadAllLines. For small files this is a perfectly acceptable function, but as the file size increases so do the problems. The fundamental flaw is that ReadAllLines does just what its name says: it pauses the program until the entire file has been read into an array of strings.
The replacement is File.ReadLines, which returns an enumerable sequence of strings (IEnumerable<string>). The file is read lazily as the sequence is enumerated, just as if you had used the lower-level stream objects. Also available are new overloads of File.WriteAllLines and File.AppendAllLines, both of which now accept an IEnumerable<string> instead of only an array.
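As a minimal sketch of the lazy approach described above, the following C# helper counts and copies matching lines without first loading the whole file into an array. The class name, method names, and the search string are hypothetical and chosen only for illustration; File.ReadLines and the IEnumerable<string> overload of File.WriteAllLines are the only System.IO calls relied on.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper class, for illustration only.
static class ReadLinesSketch
{
    // Counts the lines containing `needle`. File.ReadLines hands back
    // lines one at a time as Count() walks the sequence, so the file
    // is never held in memory as a single array.
    public static int CountMatches(string path, string needle)
    {
        return File.ReadLines(path).Count(line => line.Contains(needle));
    }

    // Streams matching lines from one file into another. The target
    // must be a different file, because the source stays open while
    // the sequence is being enumerated.
    public static void CopyMatches(string source, string target, string needle)
    {
        IEnumerable<string> matches =
            File.ReadLines(source).Where(line => line.Contains(needle));

        // Uses the overload that accepts IEnumerable<string>.
        File.WriteAllLines(target, matches);
    }
}
```

Because nothing is read until the sequence is enumerated, a loop that stops early (for example with First or Take) should only touch the beginning of the file.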
DirectoryInfo.GetFiles has the same array problem, but there lies an even more serious issue underneath. When retrieving a list of files, the Win32 API also returns basic information such as size and last-modified date. Unfortunately this information is discarded by .NET instead of being passed to the FileInfo objects. So when the program starts to loop through the files, perhaps to determine the directory's overall size, it has to requery the file system for each file, one by one. What you end up with is a classic 1+N problem: one call to list the files, then N more calls to fetch details that were already available. Both DirectoryInfo.GetFiles and the new DirectoryInfo.EnumerateFiles fix this problem.
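The directory-size scenario mentioned above might look like the following sketch. The class and method names are hypothetical; the code assumes only that DirectoryInfo.EnumerateFiles yields FileInfo objects lazily and that, as the article states, each FileInfo arrives with its size already populated.

```csharp
using System.IO;
using System.Linq;

// Hypothetical helper class, for illustration only.
static class DirectorySizeSketch
{
    // Sums the sizes of the files directly inside `directory`.
    // EnumerateFiles returns FileInfo objects as the sequence is
    // walked, so no full array of entries is built first.
    public static long TotalBytes(string directory)
    {
        var info = new DirectoryInfo(directory);
        return info
            .EnumerateFiles("*", SearchOption.TopDirectoryOnly)
            .Sum(file => file.Length);
    }
}
```

Passing SearchOption.AllDirectories instead would include subdirectories, at the cost of walking the whole tree.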
Another major performance boost for .NET is support for memory-mapped files. Memory-mapped files are an operating system feature that links a block of memory to a file. Once linked, you can read and write to any part of the file as if it were nothing more than just an array of unmanaged memory. The operating system handles important details like paging different parts of the file into and out of memory as needed. Memory-mapped files allow applications to work with incredibly large files, even in excess of a gigabyte, in a highly efficient manner.
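To make the array-like access concrete, here is a small sketch that maps an existing file and overwrites a single byte in place. The class and method names are hypothetical; the calls used are MemoryMappedFile.CreateFromFile and CreateViewAccessor from the System.IO.MemoryMappedFiles namespace. It assumes the file already exists and is not empty.

```csharp
using System.IO;
using System.IO.MemoryMappedFiles;

// Hypothetical helper class, for illustration only.
static class MappedFileSketch
{
    // Overwrites the byte at `offset` and returns the value that was
    // there before. Only a one-byte view is created, so the rest of
    // the file is not touched by this accessor.
    public static byte SwapByte(string path, long offset, byte newValue)
    {
        using (var mmf = MemoryMappedFile.CreateFromFile(path, FileMode.Open))
        using (var view = mmf.CreateViewAccessor(offset, 1))
        {
            byte old = view.ReadByte(0);   // positions are relative to the view
            view.Write(0, newValue);
            return old;
        }
    }
}
```

For very large files, creating a view over just the region of interest, as done here, avoids reserving address space for the entire file.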
In addition to raw file I/O, memory-mapped files provide a powerful means of communication between processes. If two applications open the same memory-mapped file, changes made by one application will be immediately visible to the other application.
Despite the name, memory-mapped files are not necessarily real files. They can also be purely in-memory objects with no file on disk behind them; the operating system backs them with the system paging file instead. While potentially useful within an application, these are particularly applicable to cross-process communication.
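A rough sketch of the in-memory variant follows. The class name, the map name "demo_shared_map", and the 4096-byte capacity are all assumptions made for illustration. One side creates a named map with MemoryMappedFile.CreateNew and writes a value; the other side, which would normally run in a second process, opens it by name with MemoryMappedFile.OpenExisting. Named maps are a Windows feature, so this sketch assumes a Windows host.

```csharp
using System.IO.MemoryMappedFiles;

// Hypothetical helper class, for illustration only.
static class SharedMemorySketch
{
    // Assumed name; both sides must agree on it.
    public const string MapName = "demo_shared_map";

    // Creates a map with no file on disk and stores `value` at
    // position 0. The caller must keep the returned object alive
    // for as long as other processes need to open the map.
    public static MemoryMappedFile CreateAndWrite(int value)
    {
        var mmf = MemoryMappedFile.CreateNew(MapName, 4096);
        using (var view = mmf.CreateViewAccessor())
        {
            view.Write(0, value);
        }
        return mmf;
    }

    // Opens the map by name and reads the value back. In a real
    // scenario this would run in a different process.
    public static int ReadFromExisting()
    {
        using (var mmf = MemoryMappedFile.OpenExisting(MapName))
        using (var view = mmf.CreateViewAccessor())
        {
            return view.ReadInt32(0);
        }
    }
}
```

The sketch leaves out synchronization. Two processes writing to the same region would still need something like a named Mutex to coordinate their access.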
Jim Driscoll Dec 08, 2013