latency, transfer bandwidth, spatial density, power consumption, and cooling costs [13]. Table 1 and some derived metrics in Table 2 illustrate this point (all metrics derived on 4/11/2007 from dramexchange.com, dvnation.com, buy.com, seagate.com, and samsung.com).

Given that the number of CPU instructions possible during the time required for one disk I/O has steadily increased, an intermediate memory in the storage hierarchy is desirable. Flash memory seems to be a highly probable candidate, as has been observed many times by now.

Many architecture details remain to be worked out. For example, in the hardware architecture, will flash memory be accessible via a DIMM slot, a SATA (serial ATA) disk interface, or yet another hardware interface? Given the effort and delay in defining a new hardware interface, adaptations of existing interfaces are likely.

A major question is whether flash memory is considered a special part of either main memory or persistent storage. Asked differently: if a system includes 1GB traditional RAM, 8GB flash memory, and 250GB traditional disk, does the software treat it as 250GB of persistent storage and a 9GB buffer pool, or as 258GB of persistent storage and a 1GB buffer pool? The second goal of this article is to answer this question and, in fact, to argue for different answers in file systems and database systems.

Many design decisions depend on the answer to this question. For example, if flash memory is part of the buffer pool, pages must be considered “dirty” if their contents differ from the equivalent page in persistent storage. Synchronizing the file system or checkpointing a database must force disk writes in those cases. If flash memory is part of persistent storage, these write operations are not required.
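The consequence for checkpoints can be made concrete with a small sketch. It is only an illustration under stated assumptions: the `Page` descriptor, its `changed` flag (contents newer than the last durable copy), and the function `checkpoint_writes` are hypothetical names, not part of any actual file system or database system.

```python
from dataclasses import dataclass

@dataclass
class Page:
    page_id: int
    location: str   # "ram" or "flash" (hypothetical descriptor field)
    changed: bool   # True if contents are newer than the last durable copy

def checkpoint_writes(pages, flash_is_persistent):
    """Return the ids of pages a checkpoint must write to persistent storage."""
    to_write = []
    for page in pages:
        if not page.changed:
            continue  # clean pages never need to be written
        if page.location == "ram":
            # RAM is volatile in both designs, so changed pages must be written.
            to_write.append(page.page_id)
        elif page.location == "flash" and not flash_is_persistent:
            # Flash treated as buffer pool: a changed page there is "dirty".
            to_write.append(page.page_id)
        # Flash treated as persistent storage: the page is already durable.
    return to_write

pages = [
    Page(1, "ram", True),
    Page(2, "flash", True),
    Page(3, "flash", False),
    Page(4, "ram", False),
]
print(checkpoint_writes(pages, flash_is_persistent=False))
print(checkpoint_writes(pages, flash_is_persistent=True))
```

With flash treated as part of the buffer pool, the changed page held in flash joins the set of writes; with flash treated as persistent storage, only the changed page in RAM remains.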

Designers of operating systems and file systems will want to use flash memory as an extended buffer pool (extended RAM), whereas database systems will benefit from flash memory as an extended disk (extended persistent storage). Multiple aspects of file systems and database systems consistently favor these two designs. Presenting the case for these designs is the third goal of this article.

Finally, the characteristics of flash memory suggest some substantial differences in the management of B-tree pages and their allocation. Beyond optimization of page sizes, B-trees can use different units of I/O for flash memory and disks. These page sizes lead to two new five-minute rules. Introducing these two new rules is the fourth goal of this article.

 

Table 1: Prices and performance of flash and disks.

| | RAM | Flash disk | SATA disk |
|---|---|---|---|
| Price and capacity | $3 for 8×64Mbit | $999 for 32GB | $80 for 250GB |
| Access latency | | 0.1ms | 12ms average |
| Transfer bandwidth | | 66MB/s API | 300MB/s API |
| Active power | | 1W | 10W |
| Idle power | | 0.1W | 8W |
| Sleep power | | 0.1W | 1W |

Table 2: Relative costs for flash memory and disks.

| | NAND flash | SATA disk |
|---|---|---|
| Price and capacity | $999 for 32GB | $80 for 250GB |
| Price per GB | $31.20 | $0.32 |
| Time to read a 4KB page | 0.16ms | 12.01ms |
| 4KB reads per second | 6,200 | 83 |
| Price per 4KB read per second | $0.16 | $0.96 |
| Time to read a 256KB page | 3.98ms | 12.85ms |
| 256KB reads per second | 250 | 78 |
| Price per 256KB read per second | $3.99 | $1.03 |
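The derived metrics in Table 2 appear to follow from the prices, latencies, and bandwidths in Table 1 by simple arithmetic: the time to read a page is the access latency plus the transfer time, and the remaining rows follow from that. The sketch below reproduces this calculation. The function names are illustrative, and the unit convention (1MB = 1000KB) is an assumption chosen because it matches the published numbers.

```python
# Inputs copied from Table 1; dictionary keys and helper names are illustrative.
DEVICES = {
    "NAND flash": {"price_usd": 999.0, "capacity_gb": 32,
                   "latency_ms": 0.1, "bandwidth_mb_s": 66},
    "SATA disk": {"price_usd": 80.0, "capacity_gb": 250,
                  "latency_ms": 12.0, "bandwidth_mb_s": 300},
}

def price_per_gb(device):
    return device["price_usd"] / device["capacity_gb"]

def read_time_ms(device, page_kb):
    """Access latency plus transfer time for one page, in milliseconds."""
    # Assuming 1MB = 1000KB, KB divided by MB/s yields milliseconds directly.
    return device["latency_ms"] + page_kb / device["bandwidth_mb_s"]

def reads_per_second(device, page_kb):
    return 1000.0 / read_time_ms(device, page_kb)

def price_per_read_per_second(device, page_kb):
    return device["price_usd"] / reads_per_second(device, page_kb)

for name, device in DEVICES.items():
    print(f"{name}: ${price_per_gb(device):.2f} per GB")
    for page_kb in (4, 256):
        print(f"  {page_kb}KB page: "
              f"{read_time_ms(device, page_kb):.2f}ms per read, "
              f"{reads_per_second(device, page_kb):.0f} reads/s, "
              f"${price_per_read_per_second(device, page_kb):.2f} per read/s")
```

Table 2 rounds intermediate values (for example, 6,200 reads per second), so the computed figures agree with the table only to about two significant digits.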

Assumptions

Forward-looking research relies on many assumptions. This section lists the assumptions that led to the conclusions put forth in this article. Some of these assumptions are fairly basic, whereas others are more speculative.

One assumption is that file systems and database systems assign the same data to the flash memory between RAM and the disk drive. Both software systems favor pages with some probability that they will be touched in the future but not with sufficient probability to warrant keeping them in RAM. The estimation and administration of such probabilities follow the usual lines, such as LRU (least recently used).

We assume that the administration of such information uses data structures in RAM, even for pages whose contents have been removed from RAM to flash memory. For example, the LRU chain in a file system’s buffer pool might cover both RAM and flash memory, or there might be two separate LRU chains. A page is loaded into RAM and inserted at the head of the first chain when it is needed by an application. When it reaches the tail of the first chain, the page is moved to flash memory and its descriptor to the head of the second LRU chain. When it reaches the tail of the second chain, the page is moved to disk and removed from the LRU chain. Other replacement algorithms would work mutatis mutandis.
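The two-chain variant described above can be sketched as follows. This is a minimal illustration, not an actual buffer-pool implementation: the class `TwoChainLRU`, its capacities counted in pages, and the set standing in for the disk are all hypothetical. One further assumption not stated in the text is that a page found in flash memory is removed from the second chain when it is loaded back into RAM.

```python
from collections import OrderedDict

class TwoChainLRU:
    """Descriptors for both chains live in RAM; only page ids are tracked here."""

    def __init__(self, ram_pages, flash_pages):
        self.ram_pages = ram_pages      # capacity of the first chain
        self.flash_pages = flash_pages  # capacity of the second chain
        # In each OrderedDict the last entry is the head (most recently used)
        # and the first entry is the tail (next to be replaced).
        self.ram_chain = OrderedDict()
        self.flash_chain = OrderedDict()
        self.on_disk = set()            # stand-in for pages evicted to disk

    def access(self, page_id):
        """Bring a page into RAM; return where it was found."""
        if page_id in self.ram_chain:
            self.ram_chain.move_to_end(page_id)  # back to the head
            return "ram"
        if page_id in self.flash_chain:
            del self.flash_chain[page_id]        # assumption: leaves flash
            found = "flash"
        else:
            self.on_disk.discard(page_id)
            found = "disk"
        self.ram_chain[page_id] = True           # head of the first chain
        self._replace()
        return found

    def _replace(self):
        # Tail of the first chain moves to the head of the second chain.
        while len(self.ram_chain) > self.ram_pages:
            victim, _ = self.ram_chain.popitem(last=False)
            self.flash_chain[victim] = True
        # Tail of the second chain moves to disk and leaves the LRU chains.
        while len(self.flash_chain) > self.flash_pages:
            victim, _ = self.flash_chain.popitem(last=False)
            self.on_disk.add(victim)

pool = TwoChainLRU(ram_pages=2, flash_pages=2)
for page_id in (1, 2, 3, 4, 5):
    pool.access(page_id)
print(list(pool.ram_chain), list(pool.flash_chain), sorted(pool.on_disk))
```

A single chain covering both RAM and flash memory would work similarly; two separate chains are used here only because they make the movement between levels explicit.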

Such fine-grained LRU replacement of individual pages is in contrast to assigning entire files, directories, tables, or databases to different storage units. Page replacement seems to be the appropriate granularity in buffer pools. Moreover, proven methods exist for loading and replacing buffer-pool contents entirely automatically, without assistance from tuning tools and without directives from users or administrators. An extended buffer pool in

References:

http://dramexchange.com

http://dvnation.com

http://buy.com

http://seagate.com

http://samsung.com
