NET EXPRESS

sales[at]netexpresslabs[dot]com, Silicon Valley, California


SCSI Controllers, FireWire and RAID Adapters

Technical Considerations

RAID Performance

Several components of a RAID controller determine its performance; the type of I/O microprocessor, the type of SCSI host controller chip and the type of cache RAM are key among them. The I/O microprocessor is the brains of the RAID controller. Currently most RAID controllers use either an Intel i960 I/O processor or an Intel StrongARM I/O processor (a design Intel recently acquired from DEC). The SCSI channels are generally controlled by separate Symbios, LSI, BusLogic Harpoon, Adaptec or QLogic SCSI chip sets. High performance RAID controllers use faster SDRAM cache, while low end models use FPM or EDO cache.

Processor                           Clock Speed    Benchmark   RAID Controllers
SA-11XX StrongARM                   600MHz         750 MIPS    (none yet; Intel projects the StrongARM will eventually reach 750 MIPS)
SA-110 StrongARM                    233MHz         268 MIPS    Mylex ExtremeRAID 1100
80960HD DAC i960 HD (2X HD Core)    33/66MHz MM    60 MIPS     DPT SmartRAID V Millennium series
80960RN DAC i960 RN (3X Jt Core)    33/100MHz      55 MIPS     ICP-Vortex GDT RN, Adaptec 3000, Mylex AcceleRAID 352, AMI Elite 1600
80960RS DAC i960 RS (3X Jt Core)    33/100MHz      55 MIPS     ICP-Vortex GDT RS, Mylex AcceleRAID 170, Adaptec 2000
80960RM DAC i960 RM (3X Jt Core)    33/100MHz      55 MIPS     AMI Express 500, Intel SRCU21
80960RD DAC i960 RD (2X JD Core)    33/66MHz MM    45 MIPS     Mylex DAC960PJ, Mylex AcceleRAID, ICP-Vortex GDT RD, DPT SmartRAID V Century
80960RP DAC i960 RP (1X JF Core)    33/33MHz       31 MIPS     Mylex DAC960PG, ICP-Vortex GDT RP, DPT SmartRAID V Decade

Entry level controllers generally use the slower Intel 33MHz i960 RP "Mustang" processor and support slower EDO cache rather than SDRAM. The 33MHz i960RP microprocessor operates at approximately 28 MIPS. Examples of this class include the Mylex DAC960PG Ultra Wide SCSI controllers, the ICP-Vortex GDT RP Ultra Wide SCSI controllers and the DPT SmartRAID V Decade Ultra2 Wide SCSI controllers. The Mylex DAC960PG uses a Harpoon (FlashPoint) Ultra Wide SCSI controller chip for each channel, which is slightly faster than the Symbios Ultra Wide SCSI chip used on the ICP-Vortex. The DPT model uses an even faster Ultra2 Wide Symbios SCSI controller chip set. In general all of these controllers are much faster than controllers offered by AMI or Adaptec.

Mid-tier SCSI RAID controllers feature a clock doubled 66MHz i960RD chip set running at approximately 50 MIPS along with Symbios Ultra2 Wide SCSI controller chips. Examples include the Mylex DAC960PJ Ultra2 Wide SCSI controller, the Mylex AcceleRAID Ultra2 Wide SCSI controllers, the ICP-Vortex GDT RD Ultra2 Wide SCSI controllers and the DPT SmartRAID V Century Ultra2 Wide SCSI controllers.

High end RAID controllers use the faster i960 RN and HD microprocessors or even the faster StrongARM processor. The StrongARM and i960 RN processors support faster SDRAM, while the i960 HD only supports EDO RAM. The DPT SmartRAID V Millennium series RAID controller uses the 66MHz i960HD processor (60 MIPS) and EDO memory. In contrast, the ICP-Vortex GDT RN (Fiber Channel) RAID controller uses the 100MHz i960 RN processor (55 MIPS) and SDRAM. Both use Symbios SCSI controller chip sets.

The Mylex ExtremeRAID was the first controller to use the new 233MHz StrongARM SA-110 processor running at 268 MIPS, and it is unparalleled in performance. The ExtremeRAID also supports SDRAM and uses the Symbios Ultra2 Wide SCSI host controller chip set. Intel claims it will eventually release a 600MHz version of the StrongARM processor running at 750 MIPS.

Linux, BSD and UNIX RAID Support

In our opinion the best Ultra2 Wide SCSI RAID controller for Linux, SCO UnixWare and SCO OpenServer is currently the Mylex ExtremeRAID 1100 series. Leonard Zubkoff's Linux drivers for the Mylex ExtremeRAID 1100 as well as the Mylex DAC960PG, Mylex DAC960PJ and Mylex AcceleRAID series can be found on the Linux DAC960 RAID Page. The ICP-Vortex GDT RN, RP and RD series are excellent Linux alternatives; support for these controllers is already present in all recent Linux kernels. Solaris x86 now supports the Mylex DAC960PJ and DAC960PG RAID controllers, but you must download the latest update disks from Sun. Of the two, the Mylex DAC960PJ is the better choice for Solaris. FreeBSD users may wish to consider some of the older DPT SmartCache IV offerings. For operating systems that lack reasonable native RAID support, you can create an excellent hardware RAID solution by combining any supported SCSI controller with the Mylex DAC960SXI SCSI-to-SCSI RAID controller. This is a much slower solution than native support for a PCI RAID controller because SCSI-to-SCSI configurations access disks via LUNs, but performance is acceptable for small file servers. We will install and configure a RAID solution for any operating system upon request.

RAID 0, 1, 3, 5, 10 and 0+1: Selecting a RAID Solution

Redundant Array of Inexpensive Disks (RAID) technology was developed at U.C. Berkeley by Patterson, Gibson and Katz in 1988 with the goal of improving performance and data integrity. Performance is improved when data is written and read in blocks of 8KB-64KB in round robin fashion across a group of disks in an array. In this way a series of blocks can be written or read in parallel from each disk in the array, and the total aggregate throughput of the array can approach the sum of the throughput of the individual disks. Such an allocation of blocks of data is known as RAID 0, or striping. The size of the stripe can impact performance. In general the controller's default stripe size should be used, but it may be adjusted for specialized applications. For I/O intensive operations a smaller 8K stripe is preferred; for accessing large multi-media and CAD files a larger 64K stripe is often used.

Striping is also used along with other data protection measures in RAID 3, 4, 5 and 0+1. Data can be protected in two ways. In disk mirroring, RAID 1, a pair of disks is used and all the data on the master disk is duplicated, or mirrored, on a second disk; hence half of the total disk space is lost to the mirror. The second method uses parity. Data is striped as in RAID 0, and in addition a parity block is calculated for each stripe of blocks and written to the disk array. The parity block is the logical XOR of the other blocks in the stripe. Should one disk fail, its data can be reconstructed by XORing the parity block with the remaining data blocks in the stripe, as sketched below. If N disks are used in this fashion, the available disk space is that of N-1 disks. Parity is used in RAID 3, 4 and 5.
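
For illustration, here is a minimal sketch of the parity arithmetic using shell arithmetic on hypothetical one-byte values standing in for whole blocks:

# Three data blocks in one stripe (hypothetical one-byte values)
d1=0xA5; d2=0x3C; d3=0x0F
# The parity block is the XOR of the data blocks
p=$(( d1 ^ d2 ^ d3 ))
printf 'parity    = 0x%02X\n' $p           # parity    = 0x96
# If the disk holding d2 fails, XOR the surviving blocks with the parity block
recovered=$(( d1 ^ d3 ^ p ))
printf 'recovered = 0x%02X\n' $recovered   # recovered = 0x3C, the lost block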

When creating an array, several RAID levels may be combined. For example, if six disks are used, 50% of the space may be allocated to RAID 5 and 50% to RAID 0.

Use of Cache on RAID Controllers

RAID controllers include either faster SDRAM cache memory or slower EDO cache memory. For applications involving intensive random I/O, cache is not important; indeed, in a few cases cache may even slow down random I/O performance. An example would be an active news server with many tiny history and overview files that must be continuously updated. In contrast, applications that involve long sequential reads and writes are greatly enhanced by larger cache memory. Examples include large databases and many multi-media applications. For optimal performance the cache should be set to "write back" mode. In write back mode dirty bits (data bits that have been altered) are written from the processor to the cache, and the cache in turn writes the data to disk as time allows. In "write through" mode the cache is effectively disabled for write operations; the processor waits for the data to be written to the disk before marking the operation as complete. Write back mode does leave the system vulnerable: if the system loses power before being shut down, the dirty bits in the cache are lost. This can be prevented if a cache battery backup module is installed on the cache. Alternatively, a good UPS reduces the chances of a power failure. Battery backup modules may be added to most RAID controllers for an additional charge; others, like the Mylex ExtremeRAID 1100 series, include a battery backup by default.

UNIX, Linux and FreeBSD Support for SCSI Controllers

The Adaptec Ultra2 Wide SCSI controller family based on the new 789X series chip sets is now the best choice for 32-bit multi-tasking OS's. Adaptec has finally surpassed our old favorite Mylex/BusLogic as the king of the *UNIX hill. The Adaptec 789X chips are used on the Adaptec 2940U2W PCI SCSI host adapters. Many of the newer 440BX boards also include either an integrated Adaptec AIC7890 Ultra2 Wide SCSI controller chip or the Adaptec AIC7895 PCI to dual channel Ultra Wide SCSI controller chip. These chips are not compatible with the older 2940UW-style Adaptec AIC7880 Ultra Wide SCSI controller chips. Drivers for the new AIC789X chips are currently available for Linux, Solaris x86, FreeBSD, SCO UnixWare, SCO OpenServer, NetWare, OS/2 and all Microsoft OS's. Doug Ledford has released a new driver for the AIC789X series controller chips in the stable 2.0.36 and 2.2.X Linux kernels; the latest information about this driver can be found at Dialnet. Justin Gibbs has released a stable AIC789X CAM driver for FreeBSD 3.0. Several features of the AIC789X series controller chips make them a better choice than the previous models; a synopsis of the new feature set can be found in the ahc.4 FreeBSD man page. If you're interested in Linux and the Linux BusLogic drivers, you should visit Leonard Zubkoff's Linux site at Dandelion Digital.
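
Under Linux the AIC789X chips are handled by the aic7xxx driver. As a quick sketch, assuming the driver was built as a module (the detection messages vary by kernel version):

modprobe aic7xxx        # load the Adaptec AIC7xxx driver module
dmesg | grep -i aic7    # review the adapter detection messages
cat /proc/scsi/scsi     # list the SCSI devices found on the bus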

Note that NeXTStep does not currently have drivers for these controllers. For NeXTStep we suggest the Mylex 958 with the firmware flashed down.

For Sparc Solaris we suggest the IntraServer ITI-6101U2-N and ITI-6100U2-N Ultra2 Wide LVD SCSI host adapters featuring the Symbios SCSI chip set. Note that to use these controllers on a Sparc you must purchase the Sparc driver set for an additional $150. This does not apply to Solaris x86, which fully supports the Adaptec 789X series chip sets and the IntraServer ITI-6101U2-N and ITI-6100U2-N Ultra2 Wide LVD SCSI host adapters with the latest free drivers in the Solaris x86 disk update set.

SCSI vs. IDE Performance: Command Tag Queuing, Mailboxes and Disconnect Features of SCSI

SCSI is much faster than Ultra DMA (IDE). The issue is less one of total bandwidth, as most people think, and more one of the inherent design of SCSI. Tagged queuing is probably the most significant SCSI property. Non-tag-queued devices (IDE) can only process one command at a time: when a device is held up in a long I/O process such as a seek, read or write, the next command may not be issued until the I/O process has completed. Tag queuing allows the SCSI controller to process multiple commands in a queue without waiting for completion of outstanding I/O processes. Incoming commands are received and queued in mailboxes by the host adapter; in many cases 256 commands may be stored at once. The commands are serviced in round robin fashion. The controller may then optimize the execution order (out of order execution) and process some commands in parallel; in some cases up to 64 commands may be executed in parallel. Tag queuing is extremely important for disk intensive workloads like those found in servers. Indeed, tag queuing is more important in most cases than the sheer bandwidth available to the disk channel.

Moreover, on SCSI systems each device may be issued a set of commands, and while these are being processed the SCSI device will disconnect from the SCSI bus. This allows other SCSI devices to negotiate data and commands on the bus while that device is busy. In contrast, when IDE devices receive a command they cannot disconnect; they lock the entire IDE channel exclusively for their use until the command completes, and while this is going on no other device on the channel can be used. This causes IDE systems to feel as if they are freezing while the user waits on devices to finish various operations. SCSI systems appear more fluid to the user.

Further, SCSI controllers allow each device to operate at its maximum transfer rate. In contrast, IDE controllers must lower the channel transfer rate to that of the slowest device on the channel, so one slow device will drag down a much faster device.

SCSI Cable Length Limitations

The total length of all cables on a single SCSI controller is limited. The total lengths depend on the speed of the SCSI bus as follows:

Clock Speed    Type                               Speed      Total Length
5MHz           Wide SCSI                          10MB/s     6 meters (20 feet)
10MHz          Fast Wide SCSI                     20MB/s     3 meters (10 feet)
20MHz          Ultra Wide SCSI                    40MB/s     1.5 meters (5 feet)
40MHz          Ultra2 Wide LVD SCSI               80MB/s     12 meters (about 40 feet)
80MHz          Ultra160 Wide SCSI                 160MB/s    12 meters (about 40 feet)
-              Fiber Channel over copper          100MB/s    25 meters
-              Fiber Channel over fiber optics    100MB/s    500 meters

For non-Fiber Channel SCSI devices the maximum length is the sum of all SCSI cables on a single SCSI bus. For example, on an Adaptec 2940UW or Mylex 958 with two cables connected to the host adapter, the maximum length is the sum of both cables; you could connect two 2.5 foot cables, one on the Ultra Wide SCSI port and one on the narrow Ultra SCSI port. Note that this is not the case with the newer Ultra2 Wide LVD controllers. Those controllers are split into two independent buses: an Ultra2 Wide SCSI bus and a second Ultra SCSI bus (with an Ultra Wide SCSI port and a narrow Ultra SCSI port). So, for example, you can attach 12 meters of cable on the Ultra2 Wide SCSI port and an additional 1.5 meters on the Ultra SCSI bus.

Ultra Wide SCSI controllers are limited to using two of the three ports

Only two ports on a SCSI controller may be used at the same time. For example, if you use a BusLogic 958 or an Adaptec 2940UW and you attach a 50-pin CD-ROM to one internal port and a 68-pin Wide SCSI hard disk to the other internal port, then you cannot use the external port on the controller. If you need to use the external port you must stop using one of the internal ports. One easy way around this is to buy a 50-to-68-pin converter for the CD-ROM. This allows you to put the 50-pin CD-ROM on the wide cable with the Wide SCSI disk, which frees up the internal 50-pin port and allows you to use the external port. We sell such converters on our cable page.

Running Linux on a Mylex RAID Controller

Leonard N. Zubkoff of Dandelion Digital, in cooperation with Mylex, has written an exemplary Linux device driver for the Mylex RAID controllers. The driver and the latest release notes are available here. We have implemented the Mylex RAID controllers under Linux in a large number of major projects and in every case we have found these solutions to be stable, reliable and extremely fast. We recommend this configuration without reservation. Indeed, we use the Mylex ExtremeRAID 1100 series under Linux for our own file server at Net Express.

The device driver is implemented at the block device level for optimal performance. Up to 8 controllers are supported, with up to 15 devices per channel. The driver does not support arbitrary SCSI devices such as SCSI CD-ROMs and SCSI tape drives. Prior to Linux installation the array and disks must be configured using the BIOS Configuration Utility or DACCF. Hitting <ALT>-M during BIOS initialization allows one to set the disk geometry parameter; this defaults to a 2GB geometry of 128 heads and 32 sectors per track. In order to keep the cylinder count below 65,535 for larger disks, the 8GB BIOS geometry of 255 heads and 63 sectors per track should be selected. Hitting <ALT>-R during BIOS initialization allows one to configure the array. The disks should be low level formatted and then added into one or more drive groups. Each drive group can be subdivided into 1 to 32 logical drives. Logical drives are seen as block devices under Linux: for example, "/dev/rd/c0d0". Each logical drive can be set up with a unique RAID level and caching policy. Up to seven partitions may be created on a logical drive using fdisk; if a greater number of partitions is required, more than one logical drive should be created accordingly. An extended partition counts as one partition.

Both Red Hat Linux and SuSE Linux support the Mylex RAID controllers out of the box. These distributions contain updated kernel support, device files, boot floppies, and fdisk and lilo binaries that support the Mylex RAID controllers. Either installation can be set up to boot directly from the Mylex RAID array. The Mylex RAID controller is accessed as a block device. The block device file corresponding to Logical Drive "D" on Mylex Controller "C" is referred to as /dev/rd/cCdD, and partition "P" on this block device is called /dev/rd/cCdDpP. These device names can be used as the arguments of the fdisk or mke2fs commands or in the /etc/lilo.conf or /etc/fstab configuration files. To format a partition on a logical drive to match the default 64KB stripe size, the command "mke2fs -b 4096 -R stride=16 /dev/rd/cCdDpP" should be executed. This sets a 4KB block size and a 16 block stride (64KB / 4KB = 16).
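
Putting this together, here is a minimal sketch of preparing and mounting the first logical drive on the first controller; the partition number, mount point and 64KB stripe size are assumptions for illustration:

fdisk /dev/rd/c0d0                             # create partition 1 on Logical Drive 0
mke2fs -b 4096 -R stride=16 /dev/rd/c0d0p1     # 4KB blocks, stride 16 for the 64KB stripe
mkdir /raid
mount /dev/rd/c0d0p1 /raid                     # mount the new file system
echo "/dev/rd/c0d0p1 /raid ext2 defaults 1 2" >> /etc/fstab    # mount automatically at boot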

The DAC960 driver includes extensive error logging and online configuration management capabilities via the /proc/rd file system and via console logging. For Controller "C" the initial and current status can be read by issuing "cat /proc/rd/cC/initial_status" and "cat /proc/rd/cC/current_status" respectively. The summary status in /proc/rd/status is either "OK" or "ALERT" if an array is not running properly. Configuration commands are issued by directing them into /proc/rd/cC/user_command with "echo "<configuration-command>" > /proc/rd/cC/user_command", and executing "cat /proc/rd/cC/user_command" returns the result. The configuration commands are listed below, followed by a short usage example:

flush-cache
The "flush-cache" command flushes the controller's cache. The system automatically flushes the cache at shutdown or if the driver module is unloaded, so this command is only needed to be certain a write back cache is flushed to disk before the system is powered off by a command to a UPS. Note that the flush-cache command also stops an asynchronous rebuild or consistency check, so it should not be used except when the system is being halted.

kill <channel>:<target-id>
The "kill" command marks the physical drive <channel>:<target-id> as DEAD. This command is provided primarily for testing, and should not be used during normal system operation.

make-online <channel>:<target-id>
The "make-online" command changes the physical drive <channel>:<target-id> from status DEAD to status ONLINE. In cases where multiple physical drives have been killed simultaneously, this command may be used to bring them back online, after which a consistency check is advisable. Warning: make-online should only be used on a dead physical drive that is an active part of a drive group, never on a standby drive.

make-standby <channel>:<target-id>
The "make-standby" command changes physical drive <channel>:<target-id> from status DEAD to status STANDBY. It should only be used in cases where a dead drive was replaced after an automatic rebuild was performed onto a standby drive. It cannot be used to add a standby drive to the controller configuration if one was not created initially; the BIOS Configuration Utility must be used for that currently.

rebuild <channel>:<target-id>
The "rebuild" command initiates an asynchronous rebuild onto physical drive <channel>:<target-id>. It should only be used when a dead drive has been replaced.

check-consistency <logical-drive-number>
The "check-consistency" command initiates an asynchronous consistency check of <logical-drive-number> with automatic restoration. It can be used whenever it is desired to verify the consistency of the redundancy information.

cancel-rebuild
cancel-consistency-check
The "cancel-rebuild" and "cancel-consistency-check" commands cancel any rebuild or consistency check operations previously initiated.

Demonstration I - Mylex RAID Drive Failure Under Linux Without A Standby Drive

This demonstration of the Mylex RAID controller running under Linux was conducted by the author of the Linux Mylex RAID Controller device driver, Leonard N. Zubkoff of Dandelion Digital and is copyright 1998-1999 by Leonard N. Zubkoff:

The following annotated logs demonstrate the controller configuration and online status monitoring capabilities of the Linux DAC960 Driver. The test configuration comprises six 1GB Quantum Atlas I disk drives on two channels of a DAC960PJ controller. The physical drives are configured into a single drive group without a standby drive, and the drive group has been configured into two logical drives, one RAID-5 and one RAID-6. First, here is the current status of the RAID configuration:

gwynedd:/u/lnz# cat /proc/rd/c0/current_status
***** DAC960 RAID Driver Version 2.0.0 of 23 March 1999 *****
Copyright 1998-1999 by Leonard N. Zubkoff <lnz@dandelion.com>
Configuring Mylex DAC960PJ PCI RAID Controller
Firmware Version: 4.06-0-08, Channels: 3, Memory Size: 8MB
PCI Bus: 0, Device: 19, Function: 1, I/O Address: Unassigned
PCI Address: 0xFD4FC000 mapped at 0x8807000, IRQ Channel: 9
Controller Queue Depth: 128, Maximum Blocks per Command: 128
Driver Queue Depth: 127, Maximum Scatter/Gather Segments: 33
Stripe Size: 64KB, Segment Size: 8KB, BIOS Geometry: 255/63
Physical Devices:
0:1 - Disk: Online, 2201600 blocks
0:2 - Disk: Online, 2201600 blocks
0:3 - Disk: Online, 2201600 blocks
1:1 - Disk: Online, 2201600 blocks
1:2 - Disk: Online, 2201600 blocks
1:3 - Disk: Online, 2201600 blocks
Logical Drives:
/dev/rd/c0d0: RAID-5, Online, 5498880 blocks, Write Thru
/dev/rd/c0d1: RAID-6, Online, 3305472 blocks, Write Thru
No Rebuild or Consistency Check in Progress

gwynedd:/u/lnz# cat /proc/rd/status
OK

The above messages indicate that everything is healthy, and /proc/rd/status returns "OK" indicating that there are no problems with any DAC960 controller in the system. For demonstration purposes, while I/O is active Physical Drive 1:1 is now disconnected, simulating a drive failure. The failure is noted by the driver within 10 seconds of the controller's having detected it, and the driver logs the following console status messages indicating that Logical Drives 0 and 1 are now CRITICAL as a result of Physical Drive 1:1 being DEAD:

DAC960#0: Logical Drive 0 (/dev/rd/c0d0) is now CRITICAL
DAC960#0: Logical Drive 1 (/dev/rd/c0d1) is now CRITICAL
DAC960#0: Physical Drive 1:2 Error Log: Sense Key = 6, ASC = 29, ASCQ = 02
DAC960#0: Physical Drive 1:3 Error Log: Sense Key = 6, ASC = 29, ASCQ = 02
DAC960#0: Physical Drive 1:1 killed because of timeout on SCSI command
DAC960#0: Physical Drive 1:1 is now DEAD

The Sense Keys logged here are just Check Condition / Unit Attention conditions arising from a SCSI bus reset that is forced by the controller during its error recovery procedures. Concurrently with the above, the driver status available from /proc/rd also reflects the drive failure. The status message in /proc/rd/status has changed from "OK" to "ALERT":

gwynedd:/u/lnz# cat /proc/rd/status
ALERT

and /proc/rd/c0/current_status has been updated:

gwynedd:/u/lnz# cat /proc/rd/c0/current_status
...
Physical Devices:
0:1 - Disk: Online, 2201600 blocks
0:2 - Disk: Online, 2201600 blocks
0:3 - Disk: Online, 2201600 blocks
1:1 - Disk: Dead, 2201600 blocks
1:2 - Disk: Online, 2201600 blocks
1:3 - Disk: Online, 2201600 blocks
Logical Drives:
/dev/rd/c0d0: RAID-5, Critical, 5498880 blocks, Write Thru
/dev/rd/c0d1: RAID-6, Critical, 3305472 blocks, Write Thru
No Rebuild or Consistency Check in Progress

Since there are no standby drives configured, the system can continue to access the logical drives in a performance degraded mode until the failed drive is replaced and a rebuild operation completed to restore the redundancy of the logical drives. Once Physical Drive 1:1 is replaced with a properly functioning drive, or if the physical drive was killed without having failed (e.g., due to electrical problems on the SCSI bus), the user can instruct the controller to initiate a rebuild operation onto the newly replaced drive:

gwynedd:/u/lnz# echo "rebuild 1:1" > /proc/rd/c0/user_command
gwynedd:/u/lnz# cat /proc/rd/c0/user_command
Rebuild of Physical Drive 1:1 Initiated

The echo command instructs the controller to initiate an asynchronous rebuild operation onto Physical Drive 1:1, and the status message that results from the operation is then available for reading from /proc/rd/c0/user_command, as well
as being logged to the console by the driver.

Within 10 seconds of this command the driver logs the initiation of the asynchronous rebuild operation:

DAC960#0: Rebuild of Physical Drive 1:1 Initiated
DAC960#0: Physical Drive 1:1 Error Log: Sense Key = 6, ASC = 29, ASCQ = 01
DAC960#0: Physical Drive 1:1 is now WRITE-ONLY
DAC960#0: Rebuild in Progress: Logical Drive 0 (/dev/rd/c0d0) 1% completed

and /proc/rd/c0/current_status is updated:

gwynedd:/u/lnz# cat /proc/rd/c0/current_status
...
Physical Devices:
0:1 - Disk: Online, 2201600 blocks
0:2 - Disk: Online, 2201600 blocks
0:3 - Disk: Online, 2201600 blocks
1:1 - Disk: Write-Only, 2201600 blocks
1:2 - Disk: Online, 2201600 blocks
1:3 - Disk: Online, 2201600 blocks
Logical Drives:
/dev/rd/c0d0: RAID-5, Critical, 5498880 blocks, Write Thru
/dev/rd/c0d1: RAID-6, Critical, 3305472 blocks, Write Thru
Rebuild in Progress: Logical Drive 0 (/dev/rd/c0d0) 6% completed

As the rebuild progresses, the current status in /proc/rd/c0/current_status is updated every 10 seconds:

gwynedd:/u/lnz# cat /proc/rd/c0/current_status
...
Physical Devices:
0:1 - Disk: Online, 2201600 blocks
0:2 - Disk: Online, 2201600 blocks
0:3 - Disk: Online, 2201600 blocks
1:1 - Disk: Write-Only, 2201600 blocks
1:2 - Disk: Online, 2201600 blocks
1:3 - Disk: Online, 2201600 blocks
Logical Drives:
/dev/rd/c0d0: RAID-5, Critical, 5498880 blocks, Write Thru
/dev/rd/c0d1: RAID-6, Critical, 3305472 blocks, Write Thru
Rebuild in Progress: Logical Drive 0 (/dev/rd/c0d0) 15% completed

and every minute a progress message is logged to the console by the driver:

DAC960#0: Rebuild in Progress: Logical Drive 0 (/dev/rd/c0d0) 32% completed
DAC960#0: Rebuild in Progress: Logical Drive 0 (/dev/rd/c0d0) 63% completed
DAC960#0: Rebuild in Progress: Logical Drive 0 (/dev/rd/c0d0) 94% completed
DAC960#0: Rebuild in Progress: Logical Drive 1 (/dev/rd/c0d1) 94% completed

Finally, the rebuild completes successfully. The driver logs the status of the
logical and physical drives and the rebuild completion:

DAC960#0: Logical Drive 0 (/dev/rd/c0d0) is now ONLINE
DAC960#0: Logical Drive 1 (/dev/rd/c0d1) is now ONLINE
DAC960#0: Physical Drive 1:1 is now ONLINE
DAC960#0: Rebuild Completed Successfully

/proc/rd/c0/current_status is updated:

gwynedd:/u/lnz# cat /proc/rd/c0/current_status
...
Physical Devices:
0:1 - Disk: Online, 2201600 blocks
0:2 - Disk: Online, 2201600 blocks
0:3 - Disk: Online, 2201600 blocks
1:1 - Disk: Online, 2201600 blocks
1:2 - Disk: Online, 2201600 blocks
1:3 - Disk: Online, 2201600 blocks
Logical Drives:
/dev/rd/c0d0: RAID-5, Online, 5498880 blocks, Write Thru
/dev/rd/c0d1: RAID-6, Online, 3305472 blocks, Write Thru
Rebuild Completed Successfully

and /proc/rd/status indicates that everything is healthy once again:

gwynedd:/u/lnz# cat /proc/rd/status
OK

Demonstration II - Mylex RAID Drive Failure Under Linux With A Standby Drive

This demonstration of the Mylex RAID controller running under Linux was conducted by the author of the Linux Mylex RAID Controller device driver, Leonard N. Zubkoff of Dandelion Digital and is copyright 1998-1999 by Leonard N. Zubkoff:

The following annotated logs demonstrate the controller configuration and online status monitoring capabilities of the Linux DAC960 Driver. The test configuration comprises six 1GB Quantum Atlas I disk drives on two channels of a DAC960PJ controller. The physical drives are configured into a single drive group with a standby drive, and the drive group has been configured into two logical drives, one RAID-5 and one RAID-6. First, here is the current status of the RAID configuration:

gwynedd:/u/lnz# cat /proc/rd/c0/current_status
***** DAC960 RAID Driver Version 2.0.0 of 23 March 1999 *****
Copyright 1998-1999 by Leonard N. Zubkoff <lnz@dandelion.com>
Configuring Mylex DAC960PJ PCI RAID Controller
Firmware Version: 4.06-0-08, Channels: 3, Memory Size: 8MB
PCI Bus: 0, Device: 19, Function: 1, I/O Address: Unassigned
PCI Address: 0xFD4FC000 mapped at 0x8807000, IRQ Channel: 9
Controller Queue Depth: 128, Maximum Blocks per Command: 128
Driver Queue Depth: 127, Maximum Scatter/Gather Segments: 33
Stripe Size: 64KB, Segment Size: 8KB, BIOS Geometry: 255/63
Physical Devices:
0:1 - Disk: Online, 2201600 blocks
0:2 - Disk: Online, 2201600 blocks
0:3 - Disk: Online, 2201600 blocks
1:1 - Disk: Online, 2201600 blocks
1:2 - Disk: Online, 2201600 blocks
1:3 - Disk: Standby, 2201600 blocks
Logical Drives:
/dev/rd/c0d0: RAID-5, Online, 4399104 blocks, Write Thru
/dev/rd/c0d1: RAID-6, Online, 2754560 blocks, Write Thru
No Rebuild or Consistency Check in Progress

gwynedd:/u/lnz# cat /proc/rd/status
OK

The above messages indicate that everything is healthy, and /proc/rd/status returns "OK" indicating that there are no problems with any DAC960 controller in the system. For demonstration purposes, while I/O is active Physical Drive 1:2 is now disconnected, simulating a drive failure. The failure is noted by the driver within 10 seconds of the controller's having detected it, and the
driver logs the following console status messages:

DAC960#0: Logical Drive 0 (/dev/rd/c0d0) is now CRITICAL
DAC960#0: Logical Drive 1 (/dev/rd/c0d1) is now CRITICAL
DAC960#0: Physical Drive 1:1 Error Log: Sense Key = 6, ASC = 29, ASCQ = 02
DAC960#0: Physical Drive 1:3 Error Log: Sense Key = 6, ASC = 29, ASCQ = 02
DAC960#0: Physical Drive 1:2 killed because of timeout on SCSI command
DAC960#0: Physical Drive 1:2 is now DEAD
DAC960#0: Physical Drive 1:2 killed because it was removed

Since a standby drive is configured, the controller automatically begins rebuilding onto the standby drive:

DAC960#0: Physical Drive 1:3 is now WRITE-ONLY
DAC960#0: Rebuild in Progress: Logical Drive 0 (/dev/rd/c0d0) 4% completed

Concurrently with the above, the driver status available from /proc/rd also reflects the drive failure and automatic rebuild. The status message in /proc/rd/status has changed from "OK" to "ALERT":

gwynedd:/u/lnz# cat /proc/rd/status
ALERT

and /proc/rd/c0/current_status has been updated:

gwynedd:/u/lnz# cat /proc/rd/c0/current_status
...
Physical Devices:
0:1 - Disk: Online, 2201600 blocks
0:2 - Disk: Online, 2201600 blocks
0:3 - Disk: Online, 2201600 blocks
1:1 - Disk: Online, 2201600 blocks
1:2 - Disk: Dead, 2201600 blocks
1:3 - Disk: Write-Only, 2201600 blocks
Logical Drives:
/dev/rd/c0d0: RAID-5, Critical, 4399104 blocks, Write Thru
/dev/rd/c0d1: RAID-6, Critical, 2754560 blocks, Write Thru
Rebuild in Progress: Logical Drive 0 (/dev/rd/c0d0) 4% completed

As the rebuild progresses, the current status in /proc/rd/c0/current_status is updated every 10 seconds:

gwynedd:/u/lnz# cat /proc/rd/c0/current_status
...
Physical Devices:
0:1 - Disk: Online, 2201600 blocks
0:2 - Disk: Online, 2201600 blocks
0:3 - Disk: Online, 2201600 blocks
1:1 - Disk: Online, 2201600 blocks
1:2 - Disk: Dead, 2201600 blocks
1:3 - Disk: Write-Only, 2201600 blocks
Logical Drives:
/dev/rd/c0d0: RAID-5, Critical, 4399104 blocks, Write Thru
/dev/rd/c0d1: RAID-6, Critical, 2754560 blocks, Write Thru
Rebuild in Progress: Logical Drive 0 (/dev/rd/c0d0) 40% completed

and every minute a progress message is logged on the console by the driver:

DAC960#0: Rebuild in Progress: Logical Drive 0 (/dev/rd/c0d0) 40% completed
DAC960#0: Rebuild in Progress: Logical Drive 0 (/dev/rd/c0d0) 76% completed
DAC960#0: Rebuild in Progress: Logical Drive 1 (/dev/rd/c0d1) 66% completed
DAC960#0: Rebuild in Progress: Logical Drive 1 (/dev/rd/c0d1) 84% completed

Finally, the rebuild completes successfully. The driver logs the status of the logical and physical drives and the rebuild completion:

DAC960#0: Logical Drive 0 (/dev/rd/c0d0) is now ONLINE
DAC960#0: Logical Drive 1 (/dev/rd/c0d1) is now ONLINE
DAC960#0: Physical Drive 1:3 is now ONLINE
DAC960#0: Rebuild Completed Successfully

/proc/rd/c0/current_status is updated:

***** DAC960 RAID Driver Version 2.0.0 of 23 March 1999 *****
Copyright 1998-1999 by Leonard N. Zubkoff <lnz@dandelion.com>
Configuring Mylex DAC960PJ PCI RAID Controller
Firmware Version: 4.06-0-08, Channels: 3, Memory Size: 8MB
PCI Bus: 0, Device: 19, Function: 1, I/O Address: Unassigned
PCI Address: 0xFD4FC000 mapped at 0x8807000, IRQ Channel: 9
Controller Queue Depth: 128, Maximum Blocks per Command: 128
Driver Queue Depth: 127, Maximum Scatter/Gather Segments: 33
Stripe Size: 64KB, Segment Size: 8KB, BIOS Geometry: 255/63
Physical Devices:
0:1 - Disk: Online, 2201600 blocks
0:2 - Disk: Online, 2201600 blocks
0:3 - Disk: Online, 2201600 blocks
1:1 - Disk: Online, 2201600 blocks
1:2 - Disk: Dead, 2201600 blocks
1:3 - Disk: Online, 2201600 blocks
Logical Drives:
/dev/rd/c0d0: RAID-5, Online, 4399104 blocks, Write Thru
/dev/rd/c0d1: RAID-6, Online, 2754560 blocks, Write Thru
Rebuild Completed Successfully

and /proc/rd/status indicates that everything is healthy once again:

gwynedd:/u/lnz# cat /proc/rd/status
OK

Note that the absence of a viable standby drive does not create an "ALERT" status. Once dead Physical Drive 1:2 has been replaced, the controller must be told that this has occurred and that the newly replaced drive should become the
new standby drive:

gwynedd:/u/lnz# echo "make-standby 1:2" > /proc/rd/c0/user_command
gwynedd:/u/lnz# cat /proc/rd/c0/user_command
Make Standby of Physical Drive 1:2 Succeeded

The echo command instructs the controller to make Physical Drive 1:2 into a standby drive, and the status message that results from the operation is then available for reading from /proc/rd/c0/user_command, as well as being logged to the console by the driver. Within 60 seconds of this command the driver logs:

DAC960#0: Physical Drive 1:2 Error Log: Sense Key = 6, ASC = 29, ASCQ = 01
DAC960#0: Physical Drive 1:2 is now STANDBY
DAC960#0: Make Standby of Physical Drive 1:2 Succeeded

and /proc/rd/c0/current_status is updated:

gwynedd:/u/lnz# cat /proc/rd/c0/current_status
...
Physical Devices:
0:1 - Disk: Online, 2201600 blocks
0:2 - Disk: Online, 2201600 blocks
0:3 - Disk: Online, 2201600 blocks
1:1 - Disk: Online, 2201600 blocks
1:2 - Disk: Standby, 2201600 blocks
1:3 - Disk: Online, 2201600 blocks
Logical Drives:
/dev/rd/c0d0: RAID-5, Online, 4399104 blocks, Write Thru
/dev/rd/c0d1: RAID-6, Online, 2754560 blocks, Write Thru
Rebuild Completed Successfully


Copyright (c) 1989 - 2008 Net Express All Rights Reserved.