RAID - Redundant Array of Independent Disks
RAID is defined as Redundant Array of Independent Disks or Redundant Array of
Inexpensive Disks. The key point about RAID is redundancy or fault
tolerance, which protects stored data from loss should a hard drive fail.
RAID implementations can be as simple as a pair of mirrored hard drives, as complex as
multiple RAID 5 arrays striped together to create one or more large virtual hard disks, or
more complex still, such as storage shared across an enterprise-level installation in a
Storage Area Network (SAN). Implementing RAID often requires special hardware or
software.
A RAID implementation is only as good as its
backbone: a fault tolerant drive array is only as fault tolerant as the components
of which it is made. Unless all the components of the array are fault tolerant,
redundant, or hot swappable, the array can still suffer a complete failure. A fault
tolerant array should use hot swappable, redundant power supplies and hot swappable
fans, controllers, and hard drives. In some cases it is even advisable to have
failover servers.
Software-based RAID solutions, like those
built into Windows NT Workstation, Windows NT Server, and Windows 2000, draw precious
system resources from the host processor(s) to control the RAID storage, while hardware-based
RAID solutions leave the host processor(s) free to handle other applications, such as a
nonlinear editing program. Whenever RAID is implemented with redundancy or with
parity for fault tolerance, maximum data throughput, particularly for writes, will generally
be lower than that of RAID without fault tolerance or parity. In video applications this
may or may not be a problem, and there are ways to increase throughput if necessary.
Most nonlinear editing systems use RAID 0
for maximum performance, accepting the risk of losing data if a drive fails, while other
video systems use some form of fault tolerant RAID to survive a drive failure. RAID 0
is striped disks only, without fault tolerance. Servers, on the other hand, require
data to be available at all times. Servers use fault tolerant RAID levels like RAID 3,
4, 5, and 6, and many also use mirrored disks in addition to the RAID 3, 4, 5, and
6 arrays. There are other "levels" of RAID too, but they tend to be
variants of RAID levels 0, 1, 3, 4, 5, or 6. Many RAID controllers do not support
RAID 6, and the majority of RAID controllers support either RAID 3 or RAID 4 but not both.
In the following sections, RAID will be discussed in depth.
Solumedia follows the RAID Advisory Board
definition of RAID, which is based on a 1988 paper titled "A Case for Redundant Arrays
of Inexpensive Disks (RAID)" by David A. Patterson, Garth A. Gibson, and Randy
H. Katz, presented at the ACM SIGMOD Conference on Management of Data in Chicago,
Illinois. This paper has become known as the Berkeley Paper, and the RAID types it
defines are known as Berkeley RAID levels. The definitive source of information on
RAID technology is The RAIDbook, published by the RAID Advisory Board. It
and other sources were used as references in the development of this paper. The
RAIDbook costs approximately $49.
RAID 0
RAID 0 is not one of the Berkeley RAID levels
discussed in the 1988 paper above, because it offers no protection against hardware
failure, but it is an array of disks and is generally considered a RAID level.
RAID 0 is known as disk striping, in which data is spread, mapped, or
interleaved across multiple disks in parallel to increase the data transfer rate
substantially. For the most part, the data transfer rate of a RAID 0 stripe set is
the sum of the transfer rates of the drives included in the set, minus data and controller
overhead. SCSI and Fibre Channel architectures allow multiple commands to be in
progress simultaneously on different disks, which increases data throughput. This is
not so with EIDE hard drives, because they can handle only a single command or I/O request
at a time.
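The round-robin placement that disk striping performs can be sketched in a few lines of Python. This is a minimal illustration, not any controller's actual algorithm: it assumes a fixed stripe unit measured in blocks (the hypothetical `stripe_blocks` parameter) and plain round-robin ordering across the members.

```python
def raid0_map(logical_block: int, num_disks: int, stripe_blocks: int) -> tuple[int, int]:
    """Map a logical block of a RAID 0 stripe set to (disk index, block on that disk).

    stripe_blocks is the stripe unit size in blocks: that many consecutive
    logical blocks land on one disk before the next disk is used.
    """
    stripe_unit, within_unit = divmod(logical_block, stripe_blocks)
    row, disk = divmod(stripe_unit, num_disks)   # round-robin across the members
    return disk, row * stripe_blocks + within_unit


# Three disks with a 4-block stripe unit: blocks 0-3 go to disk 0,
# 4-7 to disk 1, 8-11 to disk 2, then 12-15 wrap back to disk 0.
for block in (0, 4, 8, 12, 13):
    print(block, raid0_map(block, num_disks=3, stripe_blocks=4))
```

Because consecutive stripe units land on different disks, a long sequential transfer keeps every member busy at once, which is where the summed transfer rate comes from.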
Spreading data among several disks also
increases data throughput, because data can be written to one disk of the stripe set while
the platters of another disk in the stripe set rotate into position (rotational latency) to
write the next section. The same occurs when reading data from the hard disks in a
stripe set. The main limit on the number of disks that can be striped together is
the maximum bandwidth of the SCSI or Fibre Channel bus. However, since a RAID 0
stripe set has no redundancy, if any one drive fails, all data in that stripe set is lost,
and every drive added raises the chance that one of them will fail. Consequently, in
most cases, RAID 0 stripe sets are limited to 3 to 5 hard drives.
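The reason stripe sets are kept small can be put in numbers. The sketch below assumes every drive fails independently with the same probability over a given period; the 3% figure is purely hypothetical, not a measured failure rate, and the function name is illustrative.

```python
def array_loss_probability(p_drive: float, num_drives: int) -> float:
    """Chance that at least one of num_drives independent drives fails.

    In RAID 0 any single failure loses the whole stripe set, so this is
    also the chance of losing the array over the same period.
    """
    return 1.0 - (1.0 - p_drive) ** num_drives


for drives in (1, 3, 5, 8):
    # 0.03 is a hypothetical per-drive failure probability for the period.
    print(drives, f"{array_loss_probability(0.03, drives):.1%}")
```

The risk grows with every member, while a fault tolerant level would need two failures (or more) before data is lost.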
A RAID 0 array can be of any size up to the
maximum supported by the operating system, but it is typically built from partitions
on 2 or 3 hard drives rather than from entire drives striped together. The maximum size
of a RAID 0 array is determined by the smallest partition included in the stripe
set, because each member contributes only that much space. In other words, if a 20 MB
partition is included with two 100 MB partitions, the maximum stripe set size is 60 MB
(20 + 20 + 20).
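The capacity rule above reduces to one line: the member count times the smallest member. A minimal sketch (the function name is hypothetical):

```python
def stripe_set_capacity(partition_sizes_mb: list[int]) -> int:
    """Usable size of a RAID 0 stripe set: each member contributes only
    as much space as the smallest partition."""
    if not partition_sizes_mb:
        raise ValueError("a stripe set needs at least one partition")
    return len(partition_sizes_mb) * min(partition_sizes_mb)


print(stripe_set_capacity([20, 100, 100]))   # 60
```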
RAID 0 arrays are used where data transfer
rate is the primary factor, data safety is not, and most data is sequential.
RAID 0 is therefore most often used for video applications, where sustained, scalable data
transfer rates of 18 MB per second for uncompressed D1 video are common. Video
editors are used to redigitizing video from a video tape deck should a RAID 0 drive
fail. Now imagine that the same array contained time-consuming finished video
compositing from programs such as Adobe After Effects. If that data were lost
and had not been backed up, the time needed to recreate the effect could be
substantial. This is where a production house has to weigh maximum data transfer
rate against a slightly slower transfer rate with a fault tolerant array. However,
if the RAID 0 array was based on 20 MB/second Fast Wide SCSI-2, simply upgrading to Ultra2
LVD SCSI at 80 MB/second in RAID 3 or Fibre Channel at 100 MB/second in RAID 5 would more
than make up for any speed lost to fault tolerant, redundant RAID storage.
Of course, another option is to use multiple RAID arrays: RAID 0 for video and
RAID 3 or 5 for rendered video and animations.
Striping Primer
How a hard drive array is physically
striped has a major effect on the performance of the array, with the greatest impact on AV
drive arrays used for video streaming and nonlinear editing. Only AV drive arrays
are addressed in this section. Most modern hard drives, notably those from
Seagate, use Zone Bit Recording, in which more sectors are packed onto the longer outer
tracks of each platter than onto the shorter inner tracks. Because the platter spins at a
constant number of revolutions per minute, the linear velocity of the track passing under
the heads is highest at the outer edge and decreases steadily toward the center, so each
revolution of an outer track delivers more data. Therefore, the inner tracks of each
platter have about a 35% slower internal transfer rate than the outer tracks. Until
hard drives are developed that equalize the transfer rate across all the tracks, we have
to work around this fact, which may or may not impact the application. Striping enough
hard drives together makes even the reduced inner-track rate sufficient, essentially
negating the effect of the transfer rate dropping as the heads move toward the spindle.
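The link between constant spindle speed and zoned recording can be checked with a rough calculation. In this sketch the sector counts per track are hypothetical, chosen only so that the inner zone holds 35% fewer sectors than the outer zone to match the figure quoted above; real zone tables vary by drive model, and 512-byte sectors are assumed.

```python
def media_rate_mb_s(sectors_per_track: int, rpm: int, bytes_per_sector: int = 512) -> float:
    """Internal transfer rate while reading one track, in MB/s (1 MB = 10**6 bytes).

    At a constant rpm every track passes under the head the same number of
    times per second, so a track holding more sectors delivers more bytes.
    """
    revolutions_per_second = rpm / 60
    return sectors_per_track * bytes_per_sector * revolutions_per_second / 1e6


# Hypothetical zone geometry for a 10,000 RPM drive (not from a data sheet).
outer = media_rate_mb_s(sectors_per_track=400, rpm=10_000)
inner = media_rate_mb_s(sectors_per_track=260, rpm=10_000)
print(f"outer {outer:.1f} MB/s, inner {inner:.1f} MB/s, drop {1 - inner / outer:.0%}")
```

The drop comes entirely from the sector count per track; the spindle speed is the same for both zones.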
A common way of striping hard drives is to
split two drives equally into two partitions each, then stripe the outer partitions
together and the inner partitions together. Of course, as discussed above, the
throughput of the inner stripe set will be lower than the throughput of the outer stripe
set. Whether the difference in throughput matters depends on the
maximum throughput required. If 15 MB/second is required and the inner stripe set of
two drives can sustain only 13 MB/second, one option is to restripe the array
using 3 or 4 hard drives. Another option is to stripe the array in other
ways. We like to call these other nonproprietary methods Striped Ape Technology
because they have the power to equalize the throughput across the drives, while having the
potential to make you act like a primate trying to set it up. But once set up, which
really isn't difficult, the hard drive array will run like a striped ape.
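The restriping option is simple arithmetic if throughput is assumed to scale linearly with the number of drives, which the RAID 0 section notes is only approximately true once controller and bus overhead are counted. A sketch using the 13 MB/second two-drive figure from the example (the function name is hypothetical):

```python
import math


def drives_needed(required_mb_s: float, per_drive_mb_s: float) -> int:
    """Stripe set members needed, assuming throughput scales linearly with drive count."""
    return math.ceil(required_mb_s / per_drive_mb_s)


per_drive_inner = 13 / 2                      # two-drive inner stripe set sustains 13 MB/s
print(drives_needed(15, per_drive_inner))     # 3
```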
Some companies provide proprietary
drive striping solutions that lock you into their hardware, which also
limits your alternatives. There is no reason to do this. Solumedia offers
nonproprietary solutions to the throughput drop that occurs as a hard
drive gradually fills toward its zone of inner tracks. Rather than striping the outer
partitions of a pair of drives together and the inner partitions together, stripe
the outer partition of one drive with the inner partition of the other, and repeat this
with the other drive pairs. This lowers the maximum throughput of the
outer stripe set and increases the virtual maximum throughput of the inner partition and its
tracks, giving a more consistent average throughput across all the stripe sets in the
hard drive array. This is what we term Striped Ape Technology (SAT).
This technique can be extended beyond the
two drive array above to even out the sustained throughput further, giving nearly the same
maximum or average throughput across the entire array, by striping
3 drives together in 3 partitions each. In this case, the middle partitions are
striped together, and the outer and inner partitions are cross-striped as in the
example above. However, if striping the outer partitions of 3 drives together, the middle
partitions together, and the inner partitions together provides sufficient throughput on
each stripe set for the video application, including dual stream real time effects,
there is no reason to get creative with drive striping using Striped Ape Technology.
A prime example is the Pinnacle TARGA 2000 RTX dual stream video capture board.
In single stream mode, the board supports up to about 450 KB/frame, but because of its
chipset architecture it supports a maximum of 220 KB/frame in dual stream mode,
which translates to 13.2 MB/second (220 KB/frame × 30 frames/second × 2 streams).
Two striped, AV optimized Seagate Cheetah hard drives have no problem supporting that
rate even on the inner cylinders.
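The conversion from per-frame data size to a sustained rate is a single multiplication. This sketch assumes 30 frames per second (NTSC is actually 29.97) and decimal units with 1 MB = 1000 KB, which is what reproduces the 13.2 MB/second figure for two streams at 220 KB per frame; the function name is hypothetical.

```python
def stream_rate_mb_s(kb_per_frame: float, frames_per_second: float = 30.0,
                     streams: int = 1) -> float:
    """Sustained rate in MB/s for a fixed frame size (1 MB = 1000 KB)."""
    return kb_per_frame * frames_per_second * streams / 1000


print(stream_rate_mb_s(220, streams=2))   # 13.2, dual stream limit
print(stream_rate_mb_s(450))              # 13.5, single stream limit
```

Comparing this figure with the sustained inner-track rate of the stripe set shows whether any special partition layout is needed at all.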
Another technique used to increase striped
drive throughput is to stripe across host bus adapters. This can be as simple as
striping the 2 hard drives connected to controller A with the 2 hard drives
connected to controller B, or as complex as striping whole RAID arrays together across two
RAID controllers. In either case, the increase in performance usually removes any
need to be creative about which partitions are striped together.
RAID 1
RAID level 1 provides data redundancy only, which
is obtained by disk mirroring (called duplexing when each disk of the mirror has its own
controller). In disk mirroring, data is written in duplicate to a second set of disks
that mirror the primary set, so reliability is high. Unfortunately, it is also the most
expensive level to implement, because everything must be duplicated. RAID 1 is not
available in the software RAID built into Microsoft Windows NT Workstation, but it is
available in Windows NT Server. RAID 1 can, however, be implemented under either
operating system by using a hardware RAID controller, either a PCI card inside the
computer or an external RAID controller.
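A toy model of mirroring shows why a single drive failure does not interrupt access. This is an in-memory sketch with hypothetical class and method names, not a driver; a real controller also has to resynchronize a replacement drive, which is omitted here.

```python
class Mirror:
    """Toy RAID 1 set: writes go to every healthy member; reads use any healthy one."""

    def __init__(self, num_members: int = 2) -> None:
        self.members: list[dict[int, bytes]] = [{} for _ in range(num_members)]
        self.failed: set[int] = set()

    def write(self, block: int, data: bytes) -> None:
        for index, member in enumerate(self.members):
            if index not in self.failed:
                member[block] = data          # the same data is written to each disk

    def read(self, block: int) -> bytes:
        for index, member in enumerate(self.members):
            if index not in self.failed:
                return member[block]          # first healthy copy wins
        raise OSError("every mirror member has failed")

    def fail(self, index: int) -> None:
        self.failed.add(index)


mirror = Mirror()
mirror.write(0, b"frame 0001")
mirror.fail(0)                      # primary drive dies
print(mirror.read(0))               # still served from the second drive
```

Every block is stored twice, which is exactly the doubled cost the text describes.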
Hard drives can be striped into a RAID 0
stripe set, and by mirroring that set onto identical hard drives, a RAID 0+1 array
can be created that offers fault tolerance along with the performance of a RAID 0 stripe
set. But once again, such configurations cost double the price of a
single array, and with Microsoft Windows NT Workstation the cost of a
hardware RAID controller must be added. However, if high maximum data
transfer rates are required along with a safety net in case of a drive failure, RAID 0+1
is appropriate. The controller itself remains a single point of failure, and in such
cases redundant controllers are often used. RAID 0+1 is an excellent choice for
nonlinear video editing if cost is no object and redigitizing from the original media
is not a viable option.
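Which drive failures a RAID 0+1 array survives follows from its structure: two identical stripe sets, where a stripe set with any failed member is normally treated as lost. The sketch below assumes exactly two stripe sets using the same round-robin layout as RAID 0; the names are hypothetical.

```python
def raid01_copies(logical_block: int, disks_per_set: int,
                  stripe_blocks: int) -> list[tuple[int, int, int]]:
    """Every physical copy of a logical block as (mirror_set, disk, block_on_disk).

    Both stripe sets use the same round-robin layout, so the copies sit at
    the same disk position and offset in set 0 and set 1.
    """
    stripe_unit, within_unit = divmod(logical_block, stripe_blocks)
    row, disk = divmod(stripe_unit, disks_per_set)
    offset = row * stripe_blocks + within_unit
    return [(mirror_set, disk, offset) for mirror_set in (0, 1)]


def array_survives(failed: set[tuple[int, int]]) -> bool:
    """True if at least one stripe set has no failed disk.

    failed holds (mirror_set, disk) pairs; a stripe set with any failed
    member is treated as lost.
    """
    damaged_sets = {mirror_set for mirror_set, _disk in failed}
    return len(damaged_sets) < 2


print(raid01_copies(13, disks_per_set=3, stripe_blocks=4))
print(array_survives({(0, 1)}))             # one failure: survives
print(array_survives({(0, 1), (0, 2)}))     # both failures in set 0: survives
print(array_survives({(0, 1), (1, 0)}))     # one failure in each set: lost
```

Two failures are survivable only if both fall in the same stripe set, which is why a failed drive should be replaced promptly.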
RAID 2
RAID levels 2 and 3 fall under a broad
classification of parallel access arrays. RAID 2, however, uses an error
correcting code often used in memory chips, known as Hamming code.
Unfortunately, Hamming code, when used in a RAID array, limits the size of the array,
and because of this it is rare to find a RAID adapter that supports it.
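Hamming code can be illustrated with the classic (7,4) code, which protects 4 data bits with 3 check bits and can locate and correct any single flipped bit. In a RAID 2 style layout each bit of the 7-bit code word would sit on its own disk, so this particular code implies 4 data disks plus 3 check disks; that mapping is a simplified illustration, and the function names are hypothetical.

```python
def hamming74_encode(data: list[int]) -> list[int]:
    """Encode 4 data bits as a 7-bit code word (positions 1-7, check bits at 1, 2, 4)."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4          # checks positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4          # checks positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4          # checks positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_correct(word: list[int]) -> list[int]:
    """Fix at most one flipped bit and return the 4 data bits."""
    c = list(word)                          # work on a copy
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_position = s1 + 2 * s2 + 4 * s3   # 0 means no error detected
    if error_position:
        c[error_position - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


word = hamming74_encode([1, 0, 1, 1])
word[4] ^= 1                            # one "disk" returns a bad bit
print(hamming74_correct(word))          # [1, 0, 1, 1]
```

In this layout 3 of the 7 disks hold only check bits.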
RAID 3
RAID level 3 is the most widely supported fault
tolerant RAID level for nonlinear editing systems because, in general, it
offers the highest data throughput after RAID 0 for both reading and writing.
RAID level 3 uses byte striping of data and is optimized for
high data transfer rates, unlike RAID 4 and 5, which are optimized for transaction
processing and small file transfers. Both RAID 4 and 5 are designed more for reading
from the disks, which is the primary activity in databases. Digital video uses
mostly large sequential data files and requires high sustained data transfer
rates as well as a close balance between reading and writing. It does no good to
digitize video at 240 KB per frame if it can be read back at only 200 KB per
frame. RAID 3 under SCSI is the best RAID level when high transfer rates and fault
tolerance are required and cost is a concern. Companies such as Avid Technology
use RAID 3 in some of their nonlinear editing storage options.
RAID 3 adds parity data to RAID technology.
Parity is a type of checksum based on the Boolean exclusive OR function that is
written to one or more disks in the array as error correction information. Parity
data allows damaged information to be regenerated from the remaining disks of the
array. As such, it allows the information lost in a disk drive failure to be rebuilt,
but it also consumes a substantial part of one or more hard drives. RAID 3, 4, and 5
use parity. RAID 3 and 4, however, store the parity information on a single dedicated
disk, which can create a write bottleneck, while RAID 5 spreads (stripes) it equally
among the disks in the array and avoids that bottleneck. The bottleneck exists in
RAID 3 and 4 because every write requires an additional write to the same parity disk.
RAID 5 is often the more cost effective solution. Note, though, that RAID 3, 4, and 5
all give up the equivalent of one drive's capacity to parity, so the parity share of
total capacity (1/n for n drives) shrinks as drives are added and is 20% or less when
5 or more drives make up the array. There is also less of a write penalty in RAID 5.
If the parity disk in RAID 3 fails, the stored data is still available, but it is no
longer protected against another disk failure until the parity disk is replaced and the
array is rebuilt. During the rebuild, the array is usually available, but its
operation may be slowed. RAID 3, through its parallel access, splits each disk block
equally among all the disks used to create the RAID 3 virtual disk. RAID 4 and 5
arrays, on the other hand, map each block of the virtual disk to their individual
disks independently and do not need to access every disk for each read and write.
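Parity regeneration rests on one property of exclusive OR: XOR-ing the parity with all surviving blocks yields the missing block. A minimal sketch with three data disks and one parity disk, using arbitrary sample bytes (the function name is hypothetical):

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise exclusive OR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


data = [b"\x10\x20\x30", b"\x0f\x0f\x0f", b"\xaa\x55\x00"]
parity = xor_blocks(data)               # what a RAID 3/4 parity disk would hold

# Disk 1 fails: XOR the surviving data blocks with the parity to regenerate it.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt.hex())                    # 0f0f0f
```

The same operation, repeated for every stripe, is what an array rebuild performs, which is why rebuilds read every surviving disk in full.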
RAID 3 is more efficient if the spindles of
the members of the array are synchronized. Rotational latency is the time spent waiting
for the platters to bring the proper sector under the heads before a read or write can
begin; on average it is half a revolution. Because a RAID 3 transfer must wait for every
drive, unsynchronized spindles add extra latency. By synchronizing spindles, that extra
latency is essentially eliminated, because the heads on each drive are always in position
to read or write the corresponding sector simultaneously, without additional platter rotation.
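The benefit of spindle synchronization can be estimated under a simple model. This sketch assumes each unsynchronized drive's rotational position is independent and uniformly random; since a parallel-access RAID 3 transfer cannot start until the slowest member is in position, the expected wait for n drives is n/(n + 1) of a revolution, against half a revolution when the spindles are synchronized. The function name is hypothetical.

```python
def expected_wait_ms(rpm: int, num_drives: int, synchronized: bool) -> float:
    """Expected rotational wait before a parallel-access transfer can start.

    Unsynchronized: the array waits for the slowest of num_drives drives whose
    positions are assumed independent and uniform, i.e. n / (n + 1) of a
    revolution. Synchronized: every drive reaches the sector together, so the
    wait is that of a single drive, half a revolution on average.
    """
    revolution_ms = 60_000 / rpm
    if synchronized:
        return revolution_ms / 2
    return revolution_ms * num_drives / (num_drives + 1)


for drives in (1, 3, 5):
    print(drives, expected_wait_ms(10_000, drives, synchronized=False),
          expected_wait_ms(10_000, drives, synchronized=True))
```

At 10,000 RPM one revolution takes 6 ms, so five unsynchronized drives wait about 5 ms on average, against 3 ms for a synchronized set.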
As stated in the RAID 2 section, RAID 3 is a
parallel access array, which means all disks in the array must be accessed for every read
and write, so only one I/O request can be handled at a time. For video
editing on shared storage, this can be a configuration challenge. As also
mentioned previously, RAID 3 uses byte striping on the disks. Fibre Channel
is a serial architecture and is less efficient with parallel access arrays and byte
striping, but its high bandwidth can overshadow the parallel access penalty it suffers
with RAID 3. Software RAID 3 is not integrated into Microsoft Windows NT Server or
Windows NT Workstation.
RAID 4
RAID 4 is considered an independent access
array rather than a parallel access array, so it is a better choice than RAID 3 for Fibre
Channel. Like RAID 3, RAID 4 uses a dedicated parity disk, but
unlike RAID 3 it does not require synchronized hard drive spindles. RAID 4 also
favors disk reads over writes, to the extent that small writes are even slower than they
would be on a single non-RAID disk. Because RAID 4 disks operate independently, multiple
I/O requests can be executed simultaneously, which greatly increases I/O request performance
over a RAID 3 array. RAID 4 arrays are suited to transaction processing, with
its high rate of I/O requests for small chunks of data, rather than to video editing
applications. However, if the hard drives in the array are fast enough, such as the
newer 10,000 RPM SCSI or Fibre Channel hard drives, even RAID 4 can deliver the data
transfer rate needed for sustained video streams and nonlinear editing. RAID 4 is not
available as an integrated software RAID solution in Windows NT Server or Windows NT
Workstation.
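Why small writes to RAID 4 (and RAID 5) cost more than a write to a single disk can be seen from how parity is updated. Because parity is the XOR of the data blocks, the controller can XOR out the old data and XOR in the new data, but it must first read the old data and old parity, so one logical write becomes two reads and two writes. A minimal sketch (the function name is hypothetical):

```python
def updated_parity(old_data: bytes, new_data: bytes, old_parity: bytes) -> bytes:
    """Parity after overwriting one data block: old parity XOR old data XOR new data."""
    return bytes(p ^ o ^ n for p, o, n in zip(old_parity, old_data, new_data))


d0, d1, d2 = b"\x01\x02", b"\x10\x20", b"\x0a\x0b"
parity = bytes(a ^ b ^ c for a, b, c in zip(d0, d1, d2))

# Overwrite d1. The controller reads old d1 and old parity (2 reads),
# then writes new d1 and new parity (2 writes); d0 and d2 are untouched.
new_d1 = b"\xff\x00"
parity = updated_parity(d1, new_d1, parity)
print(parity == bytes(a ^ b ^ c for a, b, c in zip(d0, new_d1, d2)))   # True
```

In RAID 4 every such update also lands on the single parity disk, which is where concurrent small writes queue up.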
RAID 5
RAID 5 is probably the most common
implementation of RAID on business servers, because of its fault tolerance, its cost
effectiveness compared with RAID 1, and its freedom from the dedicated parity disk
bottleneck of RAID 3 and 4. RAID 5 distributes the parity information across all
the hard drives in the array, so each drive holds both user data and a share of the
parity. When a hard drive fails, the lost data on that drive is regenerated onto the
replacement drive during the array rebuild process, using the parity information on the
remaining drives. Once again, the array has no fault protection until the failed drive
is replaced and regenerated.
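Spreading parity across the drives means the parity block moves from disk to disk on successive stripes. The sketch below prints one common rotation, where parity starts on the last disk and moves one disk to the left on each stripe; actual controllers may use other layouts, and the function name is hypothetical.

```python
def raid5_parity_disk(stripe: int, num_disks: int) -> int:
    """Index of the disk holding parity for a stripe (parity moves left each stripe)."""
    return (num_disks - 1) - (stripe % num_disks)


num_disks = 4
for stripe in range(5):
    parity = raid5_parity_disk(stripe, num_disks)
    row = ["P" if disk == parity else "D" for disk in range(num_disks)]
    print(f"stripe {stripe}: " + " ".join(row))
```

With four disks, each disk holds parity for one stripe in four, so parity updates are shared among all members instead of queuing on one disk.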
As with RAID 4, the read transfer rate of a RAID 5 array
is comparable to disk striping, but its small-write performance is considerably lower than
even that of a single disk. However, much of this write penalty can be offset by memory
cache on the RAID controller. Such cache is generally not available with software RAID
implementations.
RAID 6
RAID level 6 is even more fault tolerant than
RAID 5, but is also slower than RAID 5 for disk writes. RAID level 6 protects
data against two simultaneous drive failures and is basically a RAID 5 array with a second
set of parity information. Once again, the disk write penalty comes from having to update
the parity information, in this case both sets. Solumedia RAID controllers do not support
RAID 6.
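The capacity cost of each level can be compared directly. This sketch assumes equal-size drives, an even drive count for RAID 1 and 0+1, and one drive's worth of capacity per set of parity (two for RAID 6); the drive count and size in the example are hypothetical, as are the names.

```python
REDUNDANT_DRIVES = {"0": 0, "3": 1, "4": 1, "5": 1, "6": 2}


def usable_capacity_gb(level: str, num_drives: int, drive_gb: float) -> float:
    """Usable capacity of an array built from equal-size drives.

    RAID 1 and 0+1 store everything twice (an even drive count is assumed);
    parity levels give up one drive's worth per set of parity.
    """
    if level in ("1", "0+1"):
        return num_drives * drive_gb / 2
    return (num_drives - REDUNDANT_DRIVES[level]) * drive_gb


for level in ("0", "0+1", "5", "6"):
    print(f"RAID {level}: {usable_capacity_gb(level, 6, 18):g} GB usable")
```

The second set of parity in RAID 6 costs a second drive's worth of capacity on top of its slower writes.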
Cache
The write penalty that occurs with parity
RAID can be minimized by using cache on the RAID controller, but this introduces its own
risk: if a failure occurs before a cached write is completed to the hard drive, that data
can be lost. Moreover, with the large sequential files generated in digital video, any
cache is quickly filled and has limited effect.
Write-behind cache is quite fast because it writes
the actual data to the hard drives after the host has been notified that the data has been
written. This can create a problem if a drive fails before the data is written to it and
the parity information is updated. Likewise, if the RAID controller fails before the
data is written to the hard drives, the host believes no data was lost even though it was.
The key precautions here are uninterruptible power supplies and battery backup for the
cache.
Write-back cache is another name for
write-behind cache. Write-through cache, by contrast, notifies the host that the data has
been written to the hard drives only after it has physically been written to the drives.
Therefore, write-through is the safer, though slower, caching algorithm.
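The difference between acknowledging from cache and acknowledging after the drives can be modeled with a toy controller. In this sketch, with hypothetical names, `ack_after_disk=False` models the write-behind behavior described above and `ack_after_disk=True` models a controller that acknowledges only once the drives hold the data; a power loss clears the cache, as it would without battery backup.

```python
class ToyController:
    """Tracks what the host was told versus what actually reached the drives."""

    def __init__(self, ack_after_disk: bool) -> None:
        self.ack_after_disk = ack_after_disk
        self.cache: dict[int, bytes] = {}   # volatile controller memory
        self.disk: dict[int, bytes] = {}    # data physically on the drives

    def write(self, block: int, data: bytes) -> str:
        if self.ack_after_disk:
            self.disk[block] = data         # reach the drives first, then acknowledge
        else:
            self.cache[block] = data        # acknowledge now, write to the drives later
        return "written"                    # what the host is told in both cases

    def flush(self) -> None:
        self.disk.update(self.cache)        # the deferred writes reach the drives
        self.cache.clear()

    def power_loss(self) -> None:
        self.cache.clear()                  # unprotected cache contents are lost


fast = ToyController(ack_after_disk=False)
safe = ToyController(ack_after_disk=True)
for controller in (fast, safe):
    print(controller.write(7, b"render"), end=" ")
    controller.power_loss()
    print(7 in controller.disk)
```

Both controllers tell the host "written", but only the one that acknowledged after the drives still has the block after the power loss; battery-backed cache closes that gap by preserving the cache until it can be flushed.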
The bus speed of the cache is critical because
it determines the maximum speed most controllers can support, regardless of the type
of transfer protocol. Cache used only to store command queues has a minimal effect
on the controller's maximum throughput, but cache used to store data and commands can
become a significant bottleneck. Assuming one byte is moved per clock cycle, a 66 MHz
data cache bus can support only up to 66 MB/second, while a 100 MHz cache bus can support
up to 100 MB/second. An Ultra2 SCSI controller with a 66 MHz cache can therefore transfer
data at only up to 66 MB/second, which is substantially less than the 80 MB/second Ultra2
bandwidth. Many RAID controller manufacturers are currently upgrading to 100 MHz cache bus
speeds to eliminate this bottleneck.
Choosing a RAID Controller
Choosing a RAID controller is not as easy as
one might think, especially with newer host interfaces like Ultra2 SCSI and Fibre
Channel. Many RAID controllers have been upgraded from SCSI-2 and Fast Wide SCSI-2
versions and do not support the maximum bandwidth of Ultra2 SCSI or Fibre Channel.
These controllers tend to be advertised as having 80 MB/second throughput for Ultra2 SCSI
or 100 MB/second for Fibre Channel, but the fine print states that these values apply
only to one side of the controller, either to the hard drives or to the host computer.
Often these controllers use Fast Wide SCSI-2 or Ultra SCSI interfaces to the hard drives,
with Ultra2 SCSI or Fibre Channel to the host computer. Fast Wide SCSI-2 and Ultra
SCSI are limited to cable lengths of 1.5 to 3 meters per channel, while Ultra2 SCSI
supports 12 meters per channel and Fibre Channel supports 25-30 meters
or more, depending on the type of cable. The Fibre Channel bus can extend to 10
kilometers between nodes using certain types of fiber optic cable.
Solumedia has carefully matched its RAID
controllers to the capabilities of the interface used, which means Ultra2 SCSI to
both the host and the hard drives, or Fibre Channel to both. However,
using Ultra2 SCSI to the hard drives and Fibre Channel to the host computer is acceptable,
because two striped Ultra2 SCSI channels can meet or exceed Fibre Channel speed,
and the combination leverages SCSI drives the customer already owns.
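The rule that a controller is only as fast as its slowest stage can be written as a minimum over the host interface, the combined drive channels, and the cache bus. This sketch keeps the one-byte-per-clock assumption behind the 66 MHz to 66 MB/second rule above; the function and parameter names are hypothetical.

```python
def controller_ceiling_mb_s(host_mb_s: float, drive_channel_mb_s: float,
                            drive_channels: int, cache_bus_mhz: float,
                            bytes_per_clock: int = 1) -> float:
    """Highest sustained rate through a RAID controller: its slowest stage.

    bytes_per_clock=1 is an assumption matching the 66 MHz -> 66 MB/s rule
    of thumb; a wider cache bus would move more bytes per clock.
    """
    drive_side = drive_channel_mb_s * drive_channels
    cache_side = cache_bus_mhz * bytes_per_clock
    return min(host_mb_s, drive_side, cache_side)


# Ultra2 SCSI (80 MB/s) to host and drives, but a 66 MHz cache bus.
print(controller_ceiling_mb_s(80, 80, 1, 66))
# Fibre Channel host (100 MB/s) over one Ultra Wide SCSI (40 MB/s) drive channel.
print(controller_ceiling_mb_s(100, 40, 1, 100))
# Fibre Channel host over two striped Ultra2 SCSI channels.
print(controller_ceiling_mb_s(100, 80, 2, 100))
```

The advertised host-side figure only matters when neither the drive side nor the cache bus is slower.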
Some RAID controllers support only RAID 3 or
5, while others support RAID 0, 0+1, 3 or 4, and 5. Proper selection depends on
the application to be supported and on whether the RAID controller is optimized to offset
differences between RAID levels, such as those between RAID 3, 4, and 5.
Some RAID controllers are available in a hot
swap configuration to keep the system up at all times, while others require taking the
entire system offline while a replacement is made. Once again, proper choice depends
on the application.
Some Fibre Channel RAID controllers and host
adapters do not support switched fabric technology. Consequently, choosing an
appropriate adapter or controller is critical, especially when using switched fabric for
large scale installations of up to 16 million nodes.
Fibre Channel and Ultra2 SCSI offer the added
benefit of allowing hard drive arrays to be placed far enough from the work area to eliminate
the noise they create. Additionally, both topologies can support shared
storage solutions with the proper hardware and software. Shared storage often
requires server clustering and cluster-aware software, as well as good array management.
While it is possible to use shared storage, such as a SAN, to concurrently store
media files from several nonlinear editing systems, those systems cannot access each
other's data for collaborative editing unless their software is designed to support that
feature through networking. It is possible, however, to import media from another
workstation, provided the editor is given access permission to that data by the editor of
the system that owns it.
Summary
Every level of RAID has its drawbacks, and an
appropriate RAID level must be selected based on the applications to be used.
Layering or striping multiple arrays together can increase performance substantially over
a single RAID array. It is possible to combine disks of different capacities into a
single RAID array, although each member contributes only as much space as the smallest,
and with mirroring the two halves of the mirror must be identical. Choose RAID
controllers and hard disks carefully.
As the areal density of hard drive platters
continues to increase, so do hard drive storage capacities. At some point in the
not too distant future, the areal density of hard drive platters will hit a physical wall,
and other types of storage media will be required. Meanwhile, as drive
manufacturers continue to increase storage capacity, they will phase out many lower
capacity hard drives. While this is not an issue for single and dual hard drive
systems, it is a problem for RAID systems. For example, a customer may need at most
72 GB, but the smallest hard drives available are 36 GB.
If the optimal RAID 5 configuration uses five hard drives, the majority of the storage
capacity on each hard drive will never be used. Consequently, storage
costs per usable megabyte will actually increase compared with using 9 GB or 18 GB hard drives.
On the other hand, companies that currently
have large data centers to house their storage systems will be able to cut their costs by
using fewer hard drives and fewer tower or rackmount enclosures.
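The 72 GB example above can be checked with a few lines. This sketch assumes RAID 5 with one drive's worth of parity and measures what fraction of the raw capacity actually holds the required data; drive prices are not modeled, so it shows utilization rather than cost per megabyte, and the function name is hypothetical.

```python
def raid5_utilization(required_gb: float, num_drives: int, drive_gb: float) -> float:
    """Fraction of raw capacity holding required data in a RAID 5 set."""
    usable_gb = (num_drives - 1) * drive_gb      # one drive's worth goes to parity
    if usable_gb < required_gb:
        raise ValueError("array is too small for the requirement")
    return required_gb / (num_drives * drive_gb)


for drive_gb in (18, 36):
    share = raid5_utilization(72, 5, drive_gb)
    print(f"five {drive_gb} GB drives: {share:.0%} of raw capacity used")
```

With 36 GB drives, 60% of the raw capacity sits idle, while five 18 GB drives provide exactly the 72 GB needed.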
Copyright 1998 - 2003 Solumedia. All rights reserved.
Legal Info: This
information may not be duplicated in any manner without the express written permission of
Solumedia. The RAIDbook is copyrighted by the RAID Advisory Board.
This page was last updated on 04/29/05.