Hard Disk Drives

Everything You (N)Ever Wanted to Know About Hard Drives


Craig M. Buchek

St. Louis UNIX Users Group

June 14, 2006


"Opportunity is missed by most people because it is dressed in overalls and looks like work."

                                                                                                              -- Thomas Edison

History of the Hard Disk Drive

  • Invented by IBM in 1956
    • RAMAC 350 - Random Access Method of Accounting and Control
      • 50 platters, each 24 inches across
      • Stored 5 MB total
  • IBM model 3340, introduced in 1973
    • Stored 30 MB on each of 2 spindles
    • Nicknamed "Winchester", after the .30-30 rifle
    • Ancestor of the modern hard drive
  • Seagate ST-412
    • Available with IBM PC/XT
    • 10 MB version of ST-506 (first 5.25" hard drive, 5 MB)

RAMAC

[Images: RAMAC hard drive; RAMAC spindle]

Physical Components

  • Platters
    • Disc-shaped, glass or aluminum substrate
    • Coated with particles of ferric oxide or a cobalt-based alloy
      • Particles hold a magnetic orientation
        • Typically 500-1000 particles per magnetic domain
      • Modern drives use a thin-film layer
        • Alloy of cobalt, chromium, platinum, boron
  • Spindle
    • Collection of platters that spin together
  • Spindle motor
    • Rotates the spindle at high speeds
  • Actuator motor
    • Moves the read/write heads from track to track
    • Read/write arm pivots on an axis, using strong magnets and voice coil
    • Positioning done with servo system feedback loop
      • Servo information on drive media between data helps position the heads
  • Read/write heads

Read/Write Heads

  • Originally similar to tape read/write heads
  • Detect/modify magnetic domains in the media
  • One read head and one write head per platter surface
    • Both on same armature
  • Floats on an air bearing a couple nanometers above the disk surface
    • If it touches the surface, it is called a head crash
    • Requires (mostly) sealed enclosure and environmental monitoring
  • Most expensive and technologically-advanced part of the drive
    • Very tiny, very sensitive
  • Read head technologies:
    • Ferrite
    • Metal in Gap
    • TF - Thin Film
    • MR - Magneto-Resistive
    • GMR - Giant Magneto-Resistive
    • TMR - Tunnel Magneto-Resistive
    • CMR - Colossal Magneto-Resistive (not in use yet)
  • This video from IBM Research shows how it works

Physical Characteristics

  • Platter size
    • 3.5", 2.5", 1.8", 1.0", 0.85" (historically 5.25", 8", and larger)
    • Smaller platters can spin faster
      • Less mass to move, outer edge not going as fast at same RPM
    • Smaller platters have several other advantages
      • Less power required, less friction (and heat)
      • Less turbulence, less wobble
    • Drive enclosure is slightly larger than platter size
      • 3.5" drive is about 4" x 6"
  • Number of read/write heads (and platters)
    • Each platter has 1 or 2 read/write heads, one for each surface
    • Modern drives typically have 1 to 5 platters, 1-10 read/write heads
  • Spindle speed (RPM)
    • 4200, 5400, 7200, 10000, 15000
  • Power and data interfaces
  • Areal density
    • TPI - tracks per inch
      • Typically ~5500 tpi - each track about 5% the thickness of a piece of paper
    • BPI - bits per inch
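
The track-width comparison above can be checked with a little arithmetic. This is only a sketch: the 0.1 mm paper thickness is an assumption (the slides give no figure), and the function names are made up for illustration.

```python
MM_PER_INCH = 25.4
PAPER_THICKNESS_UM = 100.0  # assumption: ordinary office paper, about 0.1 mm


def track_pitch_um(tpi):
    """Width of one track (including its share of the gap), in micrometers."""
    return MM_PER_INCH / tpi * 1000.0


def areal_density(tpi, bpi):
    """Bits per square inch: tracks per inch times bits per inch along a track."""
    return tpi * bpi


pitch = track_pitch_um(5500)
print(f"track pitch: {pitch:.2f} um")
print(f"fraction of paper thickness: {pitch / PAPER_THICKNESS_UM:.1%}")
```

At 5500 tpi the pitch works out to a little under 5 micrometers, which is consistent with the "about 5%" figure if paper is roughly 0.1 mm thick.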

Perpendicular Recording

  • Superparamagnetic effect is limiting maximum BPI
    • Particles so small and close, they become magnetically unstable
    • They can be changed by the ambient temperature
  • Each domain is already wider (TPI direction) than long (BPI direction)
    • About 20 times wider, trying to get to 4-5 times
    • But they are not very deep
  • Perpendicular recording stands the magnetic domains "on end"
  • Expected 10-times improvement in density over longitudinal recording
  • Toshiba 2.5" 200 GB, Seagate 3.5" 750 GB, Toshiba 1.8" drive

[Image: perpendicular recording]

Encodings

  • Data is stored in magnetic domains
    • Magnetized area pointing "North" or "South"
    • Read heads can't tell which is "N" and which is "S"
      • If a bunch of "S" were in a row it would be hard to tell how many there were
      • They read flux reversals ("R") -- changes from "N" to "S" or "S" to "N"
    • Encodings prevent too many non-reversals in a row
  • FM - Frequency Modulation
    • 1.5 reversals per bit
    • 0 = RN, 1 = RR
  • MFM - Modified Frequency Modulation
    • 0.75 reversals per bit
  • RLL - Run Length Limited
    • 0.46 reversals per bit
    • 11 = RNNN, 10 = NRNN, ... 0011 = NNNNRNNN
  • PRML - Partial Response Maximum Likelihood
    • Statistical methods must be used, because flux reversals partially overlap
    • Samples analog signals instead of expecting discrete reversals
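
The reversals-per-bit figures for FM and MFM can be reproduced with a small sketch. The FM rule (0 = RN, 1 = RR) is the one given above. The MFM rule used here (1 = NR; 0 = RN after a 0, NN after a 1) is the commonly documented one and is not spelled out in the slides. Function names are hypothetical.

```python
def fm_encode(bits):
    """FM: every bit cell starts with a clock reversal; a 1 adds a data reversal."""
    return "".join("RR" if b == "1" else "RN" for b in bits)


def mfm_encode(bits, prev="0"):
    """MFM: a 1 is NR; a 0 is RN after a 0, or NN after a 1.

    `prev` is the data bit assumed to precede the input.
    """
    out = []
    for b in bits:
        if b == "1":
            out.append("NR")
        elif prev == "0":
            out.append("RN")
        else:
            out.append("NN")
        prev = b
    return "".join(out)


def reversals_per_bit(encoded, nbits):
    """Average number of flux reversals (R) used per data bit."""
    return encoded.count("R") / nbits


# In this repeating pattern each two-bit sequence (00, 01, 11, 10)
# occurs equally often, which approximates random data.
data = "0011" * 4
print(fm_encode("10"))
print(reversals_per_bit(fm_encode(data), len(data)))
print(reversals_per_bit(mfm_encode(data, prev="1"), len(data)))
```

MFM gets its savings by dropping the clock reversal whenever a neighboring 1 already provides a reversal to synchronize on.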

Hardware Interfaces

  • ST-506
    • Two ribbon cables: one for data (20 pins), one for control (34 pins)
    • Driver had to do all the work on raw data from read/write heads and actuator
    • MFM or RLL encoding
  • ESDI
    • Enhanced Small Device Interface
    • Moved some of the intelligence to the drive
    • Somewhere between ST-506 and IDE in technology
  • IDE (Integrated Drive Electronics)
    • Controller logic is on the drive itself
    • Generic term actually applies to ATA and SCSI, but common usage implies ATA
  • ATA, Serial ATA
  • SCSI
  • External interfaces
    • USB
    • IEEE 1394 (FireWire)
    • eSATA

ATA

  • AT Attachment
    • Controller on the original IBM AT
      • Connected directly to the ISA bus
        • 16-bit data transfers
  • Most common interface - found on most PCs
    • 40-pin ribbon cable
      • Pin 1 marked with red stripe
    • 80-conductor cable (still 40-pin connectors) for ATA/66 and faster
    • 44-pin connector for 2.5" notebook drives (4 added for power)
  • Lots of marketing names
    • IDE, EIDE, FastATA, UltraATA, Ultra/66, Ultra/100, Ultra/133
  • Master/Slave
    • Both drives on same data / control channel
    • Must configure drives with jumpers
    • Master drive controls the slave’s controller
    • Hurts performance

ATA (Continued)

  • ATA-1 through ATA-7
    • 3.3 MB/s to 133 MB/s
    • Transfer modes
      • PIO - Programmed IO
        • CPU reads IO ports to get the data
        • PIO-0 to PIO-4: 3.3 MB/s to 16.7 MB/s
      • DMA - Direct Memory Access
        • DMA controller transfers the data to memory
        • Bus Master - Drive controller transfers the data to memory itself
        • Single-word DMA Mode 0 to Multiword DMA Mode 2: 2.1 MB/s to 16.7 MB/s
      • UltraDMA
        • Transfers on the clock edges (twice as often)
        • UDMA-0 to UDMA-6: 16.7 MB/s to 133 MB/s
  • ATAPI - ATA Packet Interface
    • Extension to allow CD-ROMs and other device types
  • MMC - Multi-Media Commands
    • Uses SCSI command set for CD-ROM access

Serial ATA

  • Currently reaching the limits of parallel interface
  • No more master/slave (1 drive per cable)
  • Smaller cables (fewer signal lines)
    • Data cable has 7 conductors on 8 mm wide wafer connector
    • Up to 1 m long, twice as long as the max for PATA
    • Keyed, so you can't plug them in wrong
    • Power connector is a 15-pin wafer connector (3 voltages: 3.3 V, 5 V, 12 V)
    • Hot pluggable
    • 3.5" and 2.5" drives now share the same connectors
  • SATA 1 (SATA/150)
    • 1.5 Gb/s, 150 MB/s
  • SATA II - new features
  • SATA 3.0 Gb/s (SATA/300)
  • Native Command Queuing (NCQ)
    • More than 1 I/O request can be sent to drive before it responds
    • Drive can respond to requests in optimal order
  • eSATA - external interface
    • Hot pluggable
    • 3 times faster than USB 2.0 or FireWire
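
The relationship between the SATA line rate (in gigabits) and the quoted transfer rate (in megabytes) can be sketched as follows. It relies on the fact that SATA uses 8b/10b encoding, which the slides do not mention: every 8 data bits travel as 10 bits on the wire. The function name is hypothetical.

```python
def sata_payload_mb_per_s(line_rate_gbps):
    """Usable throughput in MB/s for a serial link using 8b/10b encoding."""
    line_bits_per_s = line_rate_gbps * 1e9
    data_bits_per_s = line_bits_per_s * 8 / 10  # 10 line bits carry 8 data bits
    return data_bits_per_s / 8 / 1e6            # bits -> bytes -> megabytes


for rate in (1.5, 3.0):
    print(f"{rate} Gb/s -> {sata_payload_mb_per_s(rate):.0f} MB/s")
```

This is why the marketing names SATA/150 and SATA/300 correspond to 1.5 Gb/s and 3.0 Gb/s links.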

SCSI

  • Small Computer Systems Interface
  • Various flavors
    • SCSI-1, SCSI-2, SCSI-3
    • Fast, Ultra, Ultra2, Ultra3, Ultra160, Ultra320
    • 5 MB/s to 320 MB/s
    • Narrow, wide
      • 50-pin, 68-pin ribbon cable
    • SCA (80-pin connector)
      • Hot pluggable
    • FC-AL (Fibre-Channel Arbitrated Loop)

SCSI (Continued)

  • Higher performance than ATA
    • Optimized for server-type loads
      • Queues requests better than ATA
    • Allows access to more devices simultaneously
      • 8 / 16 (including controller)
  • Much higher price than ATA
    • You do get more features/performance for the higher price
    • Manufacturers rarely make the exact same drive with both interfaces
    • SCSI drives now tend to come in smaller capacities than ATA
  • Termination
    • Last device on each end of the channel must be properly terminated
    • Prevents signal from "bouncing back"
    • Used to require setting jumpers or resistors
  • SCSI IDs
    • Each SCSI device must be set with a unique ID on the channel
    • Use jumpers, or auto-configure on new drives

Serial Attached SCSI (SAS)

  • 3 Gb/s
  • Choice of 3 physical connectors
    • Same as SATA (1 device)
    • New SAS connector (4 internal devices)
    • Same as InfiniBand (4 external devices)
  • Can attach an SATA drive to an SAS controller
    • But not vice-versa
    • Connectors are keyed to enforce this
  • SCSI IDs are globally unique
    • Much like a MAC address
  • 3 transport protocols
    • Serial SCSI Protocol (SSP)
    • SATA Tunneling Protocol (STP)
    • Serial Management Protocol (SMP, for enclosures and "expanders")
  • Basically a layered network protocol family

NAS and SAN

  • NAS - Network Attached Storage
    • Device speaks application-layer protocols
    • Filesystem-level access (NFS, SMB)
    • Standard network technology (Ethernet, TCP/IP)
  • SAN - Storage Area Network
    • Raw access to disk device blocks
    • SAN-specific network hardware
    • Higher throughput
  • iSCSI
    • SCSI command protocols on top of TCP/IP
  • ATA over Ethernet (AoE)
    • Developed by Coraid, a Linux-friendly NAS company
    • Free Linux, Solaris, and BSD drivers available
    • Windows drivers available
    • Non-routable Ethernet protocol
      • Not IP-based, sits next to IP at OSI layer 3

Block Addressing

  • Data is accessed one block at a time
    • Blocks are also known as sectors
  • CHS - Cylinders / Heads / Sectors
  • LBA - Logical Block Addressing
    • Each block gets a single sequential number
  • BIOS Limitations
    • 528 MB (1024x16x63)
    • 8.4 GB (1024x256x63)
    • 137 GB (ATA 28-bit LBA)
    • 144 PB (ATA 48-bit LBA)
    • Boot loaders require boot code within BIOS-accessible areas
  • OS limits
    • DOS had 2 GB partition limit
    • Some older OSes have a 4 GB file size limit
  • File system limits
    • FAT has a 4 GB file size limit
    • File system limits are usually very large
  • Other limits are generally BIOS bugs
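
The BIOS limits listed above follow directly from the address field sizes. The sketch below assumes 512-byte sectors (standard for drives of this era, though not stated in the slides). It also shows the conventional CHS-to-LBA mapping, which is likewise not from the slides. Function names are hypothetical.

```python
SECTOR_BYTES = 512  # assumption: standard sector size


def chs_capacity(cylinders, heads, sectors):
    """Bytes addressable with the given CHS geometry limits."""
    return cylinders * heads * sectors * SECTOR_BYTES


def lba_capacity(address_bits):
    """Bytes addressable with an LBA field of the given width."""
    return (2 ** address_bits) * SECTOR_BYTES


def chs_to_lba(c, h, s, heads, sectors):
    """Conventional mapping; sectors count from 1, cylinders and heads from 0."""
    return (c * heads + h) * sectors + (s - 1)


limits = {
    "CHS 1024x16x63": chs_capacity(1024, 16, 63),
    "CHS 1024x256x63": chs_capacity(1024, 256, 63),
    "ATA 28-bit LBA": lba_capacity(28),
    "ATA 48-bit LBA": lba_capacity(48),
}
for name, size in limits.items():
    print(f"{name}: {size:,} bytes")
```

The marketing figures (528 MB, 8.4 GB, 137 GB, 144 PB) use decimal units, so they match these byte counts divided by powers of 1000.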

RAID

  • Redundant Array of Inexpensive Disks
  • Level 0
    • Striping (no redundancy)
    • Related: spanning, JBOD
  • Level 1
    • Mirroring
  • Level 5
    • Striping with parity distributed across all the disks
  • Level 10
    • 1+0, not one of the original levels
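
To illustrate how parity provides redundancy at level 5, here is a minimal sketch of XOR parity over one stripe. The block contents are made up, and a real array also rotates the parity block from disk to disk, which this sketch leaves out.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe: three data blocks (hypothetical contents) plus their parity.
data = [b"disk", b"zero", b"ones"]
parity = xor_blocks(data)

# Lose the second block, then rebuild it from the survivors plus parity.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]
print(rebuilt)
```

Because XOR is its own inverse, any single missing block in a stripe can be recomputed from the others, at the cost of one block of capacity per stripe.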

Partitions

  • Partitions divide the drive into logical "chunks"
  • Partitions contain file systems
  • First sector of a partition may hold the boot sector
  • BSD uses slices
    • Slice is what DOS calls a (primary) partition
    • Slice contains BSD partitions
  • Creating and modifying partitions
    • fdisk
    • cfdisk
    • GUI versions are distro-specific

Partition Table

  • MBR (Master Boot Record)
    • First sector on the disk
    • Contains boot code and the partition table
    • MBR code actually defines the structure of the partition table
  • Partition table has room for only 4 partitions
  • Primary partitions
    • Partition defined directly in the partition table
    • DOS/Windows/NT require boot partition to be a primary
  • Extended partitions
    • Takes up a primary partition slot
    • Contains a "sub" partition table
      • Still only 4 partitions, but extended partitions can be chained
    • Only one extended partition is allowed
  • Logical partitions
    • Partitions contained within extended partitions
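
The layout described above can be sketched as a small parser. It assumes the classic PC MBR layout, which the slides do not spell out: 446 bytes of boot code, four 16-byte table entries, and a 0x55 0xAA signature in the last two bytes. The function name and the sample values are hypothetical.

```python
import struct


def parse_mbr(sector):
    """Return the non-empty primary partition entries from a 512-byte MBR."""
    if len(sector) != 512 or sector[510:512] != b"\x55\xaa":
        raise ValueError("not a valid MBR sector")
    entries = []
    for i in range(4):
        raw = sector[446 + 16 * i : 446 + 16 * (i + 1)]
        # Fields: status, CHS start (3 bytes, skipped), type,
        # CHS end (3 bytes, skipped), starting LBA, sector count.
        status, ptype, lba_start, count = struct.unpack("<B3xB3xII", raw)
        if ptype != 0:  # type 0 marks an unused slot
            entries.append({
                "slot": i + 1,
                "bootable": status == 0x80,
                "type": ptype,
                "lba_start": lba_start,
                "sectors": count,
            })
    return entries


# Build a synthetic sector rather than reading a real disk:
# one bootable partition of type 0x83 (Linux) starting at sector 63.
sector = bytearray(512)
sector[446:462] = struct.pack("<B3xB3xII", 0x80, 0x83, 63, 1000000)
sector[510:512] = b"\x55\xaa"
print(parse_mbr(bytes(sector)))
```

An entry whose type marks it as an extended partition (0x05 is the classic value) would point at a further table describing the logical partitions.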

Block Devices

  • Linux naming conventions
    • hda - entire ATA hard drive
    • sda - entire SCSI hard drive
    • hda1 - hda4 - primary partitions
    • hda5 - hda63 - logical partitions
  • FreeBSD naming conventions
    • sd0 - entire hard drive
    • sd0s1d - 4th partition (d) on 1st slice (s1) of first disk (sd0)
  • Boot loaders
    • Some use Linux conventions
    • Some use numbers instead of letters
    • Some start numbering at 0
  • /dev/rdsk
    • Raw disk interface (no buffering)
  • /dev/dsk
    • Block disk interface (buffered)
  • In OpenBSD, partition c (the third) is the whole disk

File Systems

  • Logical structure of files and directories
  • Many to choose from:
    • Ext2, Ext3
    • ReiserFS, Reiser4
    • XFS, JFS
    • NTFS
    • FAT
      • FAT12, FAT16, FAT32
      • FAT, VFAT (long file names)
  • Creating a file system:
    • mkfs, mke2fs
  • Checking a file system:
    • fsck (file system check)

Performance

  • Spindle speed
    • Faster means more data passes by the read/write heads in a given amount of time
  • Access / seek times
    • Seek - time to move from one track to another
    • Access - seek time plus latency for desired sector to come around
    • Full-stroke - from inner-most to outer-most track
    • Average - average time from one random track to another
    • Definitions not precisely agreed upon (Marketing Alert!)
  • Storage Review FAQ says:
    • Access time is possibly the biggest factor that affects a drive's performance
  • Larger capacity drives (at a given RPM) are generally faster
    • Higher TPI, so seeking n tracks is quicker
    • Higher BPI, so each track holds more, so less need to seek to another track
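
The latency part of access time depends only on spindle speed: on average the desired sector is half a revolution away. A minimal sketch follows. The 8.5 ms seek time in the last line is a made-up example value, and the function names are hypothetical.

```python
def avg_rotational_latency_ms(rpm):
    """Average wait for a sector to come around: half of one revolution."""
    ms_per_revolution = 60_000.0 / rpm
    return ms_per_revolution / 2


def avg_access_ms(avg_seek_ms, rpm):
    """Access time as defined above: seek time plus rotational latency."""
    return avg_seek_ms + avg_rotational_latency_ms(rpm)


for rpm in (4200, 5400, 7200, 10000, 15000):
    print(f"{rpm:>5} RPM: {avg_rotational_latency_ms(rpm):.2f} ms average latency")

# Hypothetical drive: 8.5 ms average seek at 7200 RPM.
print(f"access time: {avg_access_ms(8.5, 7200):.2f} ms")
```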

I/O Performance

  • Transfer rates
    • Disk to controller
      • Depends a lot on physical characteristics
    • Controller to RAM
      • Most commonly used rating
      • You'll never reach this rate in real life
        • Command overhead
        • Unlikely you'd ask for contiguous sectors that are cached on controller
  • PIO, DMA, UltraDMA
    • Make sure controller, device, and OS drivers all use fastest available
  • Buffers
    • Larger buffers can cache more data on the drive (2-8 MB typical)
    • Allows for read-ahead of sectors you may ask for next

OS-Level Performance

  • Use the appropriate file system
  • Partitioning schemes
    • Use read-only partitions when possible
    • Keep partitions that may fill up separate
    • Place commonly-used partitions first (outer part of drive, where transfer rates are highest)
  • Tune the file system
  • Tune the controller / driver
    • hdparm
    • Make sure block transfers are as large as possible
    • Make sure DMA is enabled
    • Make sure fastest DMA/UltraDMA mode is used

Summary

  • Main components: platters, read/write head, spindle motor, actuator motor
  • Storage access is split into several layers to abstract various functions
  • Hard drives have gained more built-in intelligence
    • Removes complexity from the computer and OS
    • Allows added features, such as bad block remapping
    • Increases performance (buffering, etc.)
  • BIOS controls the initialization of the hard drive
    • Transfers control to the OS in several steps
  • SCSI vs. ATA
  • Partitions divide up a drive into logical chunks
    • Partitions contain file systems
  • Performance depends on factors from each layer of abstraction
  • A lot has changed in the past several years
  • Becoming more layered, and more like networking

Presentation Info

[Image: Stallman as St. IGNUcius, with a hard drive platter on his head]