RAID is a way of combining several independent, relatively small disks into a single large logical storage device. The disks included in the array are called its members. Disks can be combined into an array in various ways, called RAID levels. Depending on the level, the goal is to increase reliability, to increase data transfer speed, or both. The acronym RAID first appeared in 1988 at the University of California, Berkeley, in a paper by Patterson, Gibson, and Katz. A series of papers by these three authors and others identified and classified several data protection and performance models for disk arrays.
RAID 0
RAID 0 is used to increase disk system performance. Data is split into small pieces and written evenly across all disks in the array. Because the I/O load is spread over several channels, the data transfer rate ideally grows in proportion to the number of disks. For disks of equal size, the capacity of the array equals the sum of the disk capacities. RAID 0 is also simple to design and implement.
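The round-robin placement described above can be sketched as an address mapping from a logical block to a disk and an offset on that disk. This is a minimal Python sketch, not any particular controller's algorithm; the function name `raid0_locate` and the chunk size are illustrative assumptions.

```python
def raid0_locate(logical_block: int, num_disks: int, chunk_blocks: int) -> tuple[int, int]:
    """Map a logical block number to (disk index, block offset on that disk)."""
    chunk = logical_block // chunk_blocks     # which chunk the block falls in
    within = logical_block % chunk_blocks     # position of the block inside its chunk
    disk = chunk % num_disks                  # chunks are dealt to disks round-robin
    stripe = chunk // num_disks               # complete rows of chunks before this one
    return disk, stripe * chunk_blocks + within


if __name__ == "__main__":
    # Hypothetical array: 3 disks, 4-block chunks.
    # Blocks 0-3 go to disk 0, 4-7 to disk 1, 8-11 to disk 2, then 12-15 wrap to disk 0.
    for lb in (0, 5, 11, 12, 13):
        print(lb, raid0_locate(lb, num_disks=3, chunk_blocks=4))
```

Consecutive chunks land on different disks, which is what lets a large sequential transfer keep every disk busy at once.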
The disadvantage of this layout is low data reliability: if any one disk fails, all information in the array is lost. RAID 0 is therefore not recommended for systems that store important information.
It suits computers used for photo and video editing, print preparation, and other tasks that need a high-throughput disk system.
RAID 1
A RAID 1 array is also called a mirror, because the same data is stored simultaneously on two or more disks. It has become popular because of its simplicity and high reliability. Reads can be served by any member, so read performance can improve, while write speed stays roughly that of a single disk. The drawback is cost: every piece of data is stored at least twice, so at most half of the raw capacity is usable.
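A toy in-memory model makes the mirroring rule concrete: every write goes to all surviving members, and a read can be served by any one of them. The `Mirror` class below is hypothetical; it ignores read balancing and resynchronization of a replaced disk.

```python
class Mirror:
    """In-memory model of a RAID 1 mirror: every write goes to all working members."""

    def __init__(self, copies: int, size: int):
        self.disks = [bytearray(size) for _ in range(copies)]
        self.failed: set[int] = set()           # indices of failed members

    def write(self, offset: int, data: bytes) -> None:
        for i, disk in enumerate(self.disks):
            if i not in self.failed:
                disk[offset:offset + len(data)] = data

    def read(self, offset: int, length: int) -> bytes:
        # Any working member holds a full copy, so the first one found is enough.
        for i, disk in enumerate(self.disks):
            if i not in self.failed:
                return bytes(disk[offset:offset + length])
        raise IOError("all mirror members have failed")


if __name__ == "__main__":
    mirror = Mirror(copies=2, size=16)
    mirror.write(0, b"ledger")
    mirror.failed.add(0)            # simulate failure of the first disk
    print(mirror.read(0, 6))        # still readable from the second copy
```

However many copies are kept, the usable capacity equals the size of one member, which is the cost mentioned above.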
RAID 1 is recommended for use on PCs used for payroll, accounting, working with financial programs and other tasks requiring high reliability.
RAID 2
Data storage in a RAID 2 array is based on the Hamming code. Separate disks are used for data and for error correction. If $k$ is the number of disks used for error correction, data is stored on $2^k - k - 1$ disks, and $2^k - 1$ disks are required in total. RAID 2 therefore becomes more space-efficient than RAID 1 only from seven disks upward ($k = 3$: four for data and three for error correction), when $4/7$ of the raw capacity holds data instead of $1/2$.
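The disk-count formula and the Hamming code itself can be checked with a short sketch. It uses the classic Hamming(7,4) code, which matches the seven-disk example ($k = 3$); in RAID 2 each bit of a 7-bit code word would be stored on its own disk. The function names are illustrative, and real RAID 2 hardware is not claimed to work exactly this way.

```python
def disks_for_check_disks(k: int) -> tuple[int, int]:
    """Return (data disks, total disks) for a Hamming-coded array with k check disks."""
    total = 2 ** k - 1
    return total - k, total


def hamming74_encode(d: list[int]) -> list[int]:
    """Encode 4 data bits into 7 bits; list index i holds Hamming position i + 1."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4          # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4          # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4          # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_correct(code: list[int]) -> list[int]:
    """Fix a single flipped bit (e.g. one bad disk) and return the 4 data bits."""
    c = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if c[pos - 1]:
            syndrome ^= pos    # XOR of the positions of all 1-bits
    if syndrome:
        c[syndrome - 1] ^= 1   # a nonzero syndrome is the position of the bad bit
    return [c[2], c[4], c[5], c[6]]


if __name__ == "__main__":
    print(disks_for_check_disks(3))          # 4 data disks, 7 in total
    word = hamming74_encode([1, 0, 1, 1])
    word[5] ^= 1                             # corrupt the bit stored on disk 6
    print(hamming74_correct(word))           # original data bits are recovered
```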
Although this scheme provides high reliability and performance, it is practically never used because it requires a large number of drives.
RAID 3
In RAID 3, one disk holds parity, and the remaining disks hold data split into bytes. If any one disk fails, its contents can be reconstructed from the parity and the data on the remaining disks. The advantages of RAID 3 are high transfer rates for both reading and writing, fault tolerance, and relatively low cost, since only one disk is spent on redundancy.
However, the parity disk is accessed on every write, so it carries a high load and tends to be less reliable than the data disks. Other disadvantages are high CPU usage when the array is implemented in software and a complex controller when it is implemented in hardware.
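Recovery relies on a property of XOR: the parity is the XOR of all data, so XOR-ing the parity with the surviving disks yields the missing one. The sketch below shows this on three tiny byte strings standing in for disks; the helper name `xor_blocks` is illustrative, and the same arithmetic applies whether data is split into bytes (RAID 3) or blocks (RAID 4, RAID 5).

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


data = [b"\x10\x20\x30", b"\x0f\x0e\x0d", b"\xaa\xbb\xcc"]   # three data disks
parity = xor_blocks(data)                                  # contents of the parity disk

# Disk 1 fails: XOR of the surviving data disks and the parity gives its contents back.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]
```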
RAID 3 is used for image editing, streaming, video editing and other applications requiring high bandwidth.
RAID 4
In RAID 4, information is divided not into bytes but into blocks, so operations on small files are fairly fast. As in RAID 3, reliability is provided by a dedicated parity disk. A RAID 4 array requires at least three drives. Its advantages are good read and transfer speeds and low cost, since only one disk is used for parity.
The disadvantages are a complex controller and the heavy load on the parity disk, which is written on every write operation and therefore tends to fail more often than the data disks.
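The load on the parity disk follows from how a single block is updated. A common technique, read-modify-write, computes the new parity from the old parity, the old data, and the new data, so the parity disk is written every time any data disk is written. A minimal sketch; `xor` and `small_write` are hypothetical helper names.

```python
def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def small_write(old_data: bytes, new_data: bytes, old_parity: bytes) -> bytes:
    """Return the new parity block for a single-block update (read-modify-write)."""
    # P_new = P_old XOR D_old XOR D_new: only the changed block and the parity
    # block are read, but the parity disk is written on every update.
    return xor(xor(old_parity, old_data), new_data)


stripe = [b"\x01\x02", b"\x10\x20", b"\x0a\x0b"]            # three data disks
parity = bytes(a ^ b ^ c for a, b, c in zip(*stripe))       # dedicated parity disk

new_block = b"\xff\x00"
parity = small_write(stripe[1], new_block, parity)          # update parity first
stripe[1] = new_block                                       # then the data block

# The incrementally updated parity matches a full recomputation.
assert parity == bytes(a ^ b ^ c for a, b, c in zip(*stripe))
```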
RAID 5
In RAID 5, data and checksums (parity) are distributed across all disks in the array. Because no single disk holds all the parity, several reads and writes can proceed in parallel. The parity is computed with the exclusive-OR (XOR) operation. As in RAID 3 and RAID 4, at least three drives are required. The usable capacity of the array is $(k - 1) \times \mathit{size}_{HDD}$, where $k$ is the number of drives and $\mathit{size}_{HDD}$ is the capacity of one drive (if the drives differ in size, the smallest one is used). RAID 5 offers high read performance, good reliability, and low cost.
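Distributing the parity means its position changes from stripe to stripe. The sketch below prints one simple rotation scheme, assumed here for illustration: the parity starts on the last disk and moves one disk to the left in each stripe. Real controllers use several variants of this layout.

```python
def raid5_layout(num_disks: int, num_stripes: int) -> list[list[str]]:
    """Return a table [stripe][disk] of labels 'D<n>' (data chunk n) or 'P' (parity)."""
    rows = []
    chunk = 0
    for stripe in range(num_stripes):
        parity_disk = (num_disks - 1 - stripe) % num_disks   # parity moves left each stripe
        row = []
        for disk in range(num_disks):
            if disk == parity_disk:
                row.append("P")
            else:
                row.append(f"D{chunk}")
                chunk += 1
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in raid5_layout(num_disks=4, num_stripes=4):
        print(" ".join(f"{cell:>3}" for cell in row))
```

With four disks, each disk holds parity for exactly one of every four stripes, so no single disk becomes the write bottleneck that the parity disk is in RAID 4.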
If one of the drives in a RAID 5 array fails, the array has no redundancy left until it is rebuilt. Recovery requires a large number of reads and writes across the remaining disks, and this heavy load can sometimes cause another disk in the array to fail. During the rebuild, previously undetected read errors may also surface in rarely accessed data, which can make recovery impossible. Another drawback is the complex controller design.
RAID 5 is the most versatile way to store data. It is used in web, mail, file, and other servers.
RAID 6
RAID 6 extends RAID 5 with additional fault tolerance. It requires at least four disks: the equivalent of two disks holds data and the equivalent of the other two holds recovery information. As in RAID 5, data and parity are distributed across all disks. The data remains intact even if any two drives fail. However, because two parity values must be computed on every write, the controller is more heavily loaded and performance is roughly 10-15% lower than that of RAID 5.
RAID 6 is well suited to mission-critical applications.
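The second recovery value in RAID 6 is commonly computed as a Reed-Solomon style syndrome: P is the plain XOR used in RAID 5, while Q weights each data byte by a power of a generator in the Galois field GF(2^8). The sketch below assumes the field polynomial 0x11d and generator 2 (the choice made by the Linux md driver) and models each disk as a single byte. It shows how a data disk can be rebuilt from Q alone when the P disk has also failed; recovering two lost data disks needs a longer derivation not shown here.

```python
from typing import Optional

# GF(2^8) log/antilog tables for x^8 + x^4 + x^3 + x^2 + 1 (0x11d), generator 2.
EXP = [0] * 512
LOG = [0] * 256
x = 1
for i in range(255):
    EXP[i] = x
    LOG[x] = i
    x <<= 1
    if x & 0x100:
        x ^= 0x11D             # reduce modulo the field polynomial
for i in range(255, 512):
    EXP[i] = EXP[i - 255]      # duplicate so LOG[a] + LOG[b] never needs a modulo


def gf_mul(a: int, b: int) -> int:
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def gf_pow2(n: int) -> int:
    """g**n for the generator g = 2; negative n gives the multiplicative inverse."""
    return EXP[n % 255]


def syndromes(data: list[Optional[int]]) -> tuple[int, int]:
    """P = XOR of all data bytes; Q = XOR of g**i * D_i over the data disks."""
    p = q = 0
    for i, d in enumerate(data):
        p ^= d
        q ^= gf_mul(gf_pow2(i), d)
    return p, q


def recover_data_from_q(data: list[Optional[int]], lost: int, q: int) -> int:
    """Rebuild data byte `lost` when both it and P are gone, using Q only."""
    partial = list(data)
    partial[lost] = 0                        # treat the missing byte as zero
    _, q_partial = syndromes(partial)
    # Q XOR Q_partial = g**lost * D_lost, so multiply by g**(-lost).
    return gf_mul(q ^ q_partial, gf_pow2(-lost))


if __name__ == "__main__":
    disks = [0x11, 0x22, 0x33, 0x44]
    p, q = syndromes(disks)
    survivors = [0x11, 0x22, None, 0x44]     # disk 2 and the P disk are lost
    print(hex(recover_data_from_q(survivors, 2, q)))
```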
RAID 7
RAID 7 is not an independent, standard RAID level: it is a trademark of Storage Computer Corporation. One disk holds the parity used for recovery, and the remaining disks hold data. Writes are cached in RAM, so an uninterruptible power supply (UPS) is required for reliable operation: if power is lost, cached data can be corrupted.
The architecture of RAID 7 includes an integrated communication channel managed by the operating system, a high-speed internal cache data bus (X-bus), and parity generation built into the cache. As a result, the overall write speed is reportedly 25-90% higher than that of a single hard drive and 1.5-6 times higher than that of other RAID levels. Adding drives to the array increases write performance and reduces data access time.
Keep in mind, however, that RAID 7 is a proprietary solution from a single vendor.
RAID 0 + 1 (RAID 01)
RAID 0 + 1 combines RAID 0 and RAID 1: two striped (RAID 0) sets mirror each other. At least four drives are required: data is striped across two of them, and a mirror copy is kept on the other two. Data access speed is comparable to RAID 0, and fault tolerance to RAID 1. However, the total raw capacity must be twice the amount of data stored.
This layout is commonly used in file servers and for image processing.
RAID 1 + 0 (RAID 10)
RAID 1 + 0 is also built from mirrors: data is broken into small pieces and striped across several mirrored pairs of drives. In other words, it is a RAID 0 array whose members are RAID 1 arrays. A RAID 10 array requires at least four drives.
A RAID 1 + 0 array combines high speed with very good reliability. It can even keep working after several simultaneous drive failures, as long as no mirrored pair loses both of its drives.
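The difference between RAID 1 + 0 and RAID 0 + 1 shows up when two of four drives fail. The sketch below counts the surviving combinations, assuming that in RAID 0 + 1 a whole stripe set goes offline as soon as any of its drives fails (typical behaviour, but implementation-dependent). Disks are numbered from 0, and the function names are illustrative.

```python
from itertools import combinations

DISKS = range(4)


def raid10_survives(failed: set[int]) -> bool:
    """RAID 1+0: stripe over mirror pairs (0,1) and (2,3); each pair needs one live disk."""
    pairs = [(0, 1), (2, 3)]
    return all(not set(pair) <= failed for pair in pairs)


def raid01_survives(failed: set[int]) -> bool:
    """RAID 0+1: mirror of stripe sets (0,1) and (2,3); one stripe set must be fully intact."""
    stripe_sets = [(0, 1), (2, 3)]
    return any(not (set(s) & failed) for s in stripe_sets)


if __name__ == "__main__":
    for name, survives in (("RAID 10", raid10_survives), ("RAID 01", raid01_survives)):
        ok = sum(survives(set(c)) for c in combinations(DISKS, 2))
        print(f"{name}: survives {ok} of 6 two-disk failures")
```

Under these assumptions RAID 10 survives 4 of the 6 possible two-disk failures and RAID 01 only 2, which is why RAID 1 + 0 is usually preferred.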
The disadvantage of this method of data storage is its high cost and low scalability.
RAID 1 + 0 arrays are most often used in database servers, where both high performance and fault tolerance matter.
RAID 1E
RAID 1E is an enhanced RAID 1. It is built as a mirrored array, but it can also use an odd number of drives.
There are two ways to organize a RAID 1E array:
- Near (also called striped). The first portion of data is written to drives 1 and 2, the next portion to drives 3 and 4, and so on. When the physical drives run out, writing wraps around to the first drive. For example, with three drives, the first portion is written to drives 1 and 2, the second to drives 3 and 1, and so on.
- Interleaved. Data is written sector by sector: a portion of data is written in one sector, and its full copy in the next sector on the next drive. On the first drive, the first sector receives the first portion of data, and its copy goes to the second sector of the second drive. The next portion goes to the first sector of the second drive, with its copy in the second sector of the third drive.
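Both RAID 1E layouts can be expressed as a mapping from a data portion to two (drive, row) positions. This sketch numbers drives and rows from 0, so drive 0 here is the text's "drive 1"; the function names are hypothetical.

```python
def near_layout(num_disks: int, num_chunks: int) -> dict[int, list[tuple[int, int]]]:
    """Map chunk -> [(disk, row), (disk, row)] for the 'near' RAID 1E layout.

    The two copies of each chunk occupy consecutive slots when slots are
    numbered row by row across all disks, wrapping to disk 0 on the next row.
    """
    layout = {}
    for chunk in range(num_chunks):
        slots = (2 * chunk, 2 * chunk + 1)
        layout[chunk] = [(s % num_disks, s // num_disks) for s in slots]
    return layout


def interleaved_layout(num_disks: int, num_chunks: int) -> dict[int, list[tuple[int, int]]]:
    """Map chunk -> [(disk, row), (disk, row)] for the 'interleaved' RAID 1E layout.

    Data rows alternate with copy rows; the copy of a chunk sits one row lower
    and one disk to the right, wrapping around to disk 0.
    """
    layout = {}
    for chunk in range(num_chunks):
        disk, stripe = chunk % num_disks, chunk // num_disks
        layout[chunk] = [(disk, 2 * stripe), ((disk + 1) % num_disks, 2 * stripe + 1)]
    return layout


if __name__ == "__main__":
    for chunk, copies in near_layout(3, 3).items():
        print("near", chunk, copies)
    for chunk, copies in interleaved_layout(3, 3).items():
        print("interleaved", chunk, copies)
```

In both layouts the two copies of a portion always land on different drives, which is what allows any single drive to fail even when the number of drives is odd.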
A RAID 1E array requires at least three storage devices.
The advantages of this method are high performance, short access time and the ability to use mirror storage of information on an odd number of devices.
The disadvantages are the high cost of storage (as in RAID 1, only half of the total capacity of all drives is available for data) and the fact that only one drive failure is guaranteed to be survivable. For an even number of drives, RAID 10 is the recommended choice.
Implementation
RAID can be created in two different ways:
- Using operating system drivers: this is called software RAID.
- Using dedicated hardware (a RAID controller): this is called hardware RAID.
Software implementation
Software support for RAID levels is one of the most inexpensive ways to create a RAID array. Nowadays, almost any operating system has a built-in RAID capability, although not for all RAID levels.
Windows Home editions allow only RAID 0, while RAID 1 and RAID 5 can be created only on Windows Server editions. A RAID array created in Windows is tied to that operating system, so its volumes cannot be used in a dual-boot configuration.
Linux-based operating systems with kernel 2.5.28 or later support RAID 0, RAID 1, RAID 4, RAID 5, RAID 6, and RAID 10, and booting is supported from any of these levels.
Starting with version 7.2, FreeBSD supports RAID 0, 1, 5, and 6 arrays.
Software RAID uses the computer's CPU, which reduces overall system performance. For RAID 0 and RAID 1 the CPU load is negligible, but for parity-based levels (RAID 2, RAID 3, RAID 4, RAID 5, RAID 6) it can range from 1 to 5 percent, depending on CPU power and the number of hard drives.
Software RAID also places restrictions on booting. In many implementations, only a RAID 1 array can hold the boot partition, and booting from software RAID 0 or RAID 5 is not possible.
Hardware implementation
Hardware RAID is created with dedicated hardware (a RAID controller). Compared with software RAID, it has several advantages:
- it does not load the computer's CPU;
- it allows bootable partitions at any RAID level;
- it supports hot swapping of drives.
Features
Each of the disk arrays and RAID levels has its own individual characteristics:
- Fault tolerance: the ability to keep data safe when one or more hard drives fail.
- Performance: how much faster the array reads and writes compared with a single disk.
- Array capacity: the amount of user data that can be written to the array. It depends on the RAID level and usually does not equal the sum of the capacities of the member drives.
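The capacity rules for the common levels, together with the number of drive failures each level survives in any combination, can be summarized in a small helper. It assumes identical drives, or treats every drive as the size of the smallest one; the function names and the 4000 GB drive size are illustrative.

```python
def usable_capacity(level: str, sizes: list[int]) -> int:
    """Usable capacity, treating every drive as the size of the smallest one."""
    n, smallest = len(sizes), min(sizes)
    if n < 2:
        raise ValueError("an array needs at least two drives")
    if level == "10" and n % 2:
        raise ValueError("this sketch models RAID 10 as mirrored pairs, so n must be even")
    data_drives = {
        "0": n,        # pure striping, no redundancy
        "1": 1,        # every drive holds the same data
        "5": n - 1,    # one drive's worth of parity
        "6": n - 2,    # two drives' worth of parity (P and Q)
        "10": n // 2,  # half the drives hold mirror copies
    }[level]
    return data_drives * smallest


def guaranteed_failures(level: str, n: int) -> int:
    """Drive failures survived in ANY combination (RAID 10 may survive more by luck)."""
    return {"0": 0, "1": n - 1, "5": 1, "6": 2, "10": 1}[level]


if __name__ == "__main__":
    for level in ("0", "1", "5", "6", "10"):
        cap = usable_capacity(level, [4000] * 4)   # four 4000 GB drives
        print(f"RAID {level}: {cap} GB usable, survives {guaranteed_failures(level, 4)} failure(s)")
```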
Brief description
To help you choose the right array, here is a brief description of each level:
- RAID 0: fast data processing but no fault tolerance. At least two drives are required.
- RAID 1: high reliability; write speed is no higher than that of a single drive. At least two drives are required.
- RAID 2: it has a good speed and reliability of information storage, but is rarely used, since it requires a lot of hard drives to organize it.
- RAID 3: has good access speeds and medium reliability. The cost of storing information is relatively low.
- RAID 4: has good speed, fault tolerance and low cost of data storage. The disadvantage is the complexity of the hardware controller.
- RAID 5: high speed and good fault tolerance. It is considered the most versatile way of organizing data storage.
- RAID 6: guarantees high fault tolerance.
- RAID 7: a proprietary solution from the American company Storage Computer Corporation (SCC). A UPS is required for reliable operation.
- RAID 01 and RAID 10 are a combination of RAID 0 and RAID 1.
- RAID 1E: an enhanced RAID 1. Unlike standard RAID 1 and RAID 10, it is a mirrored array that can use an odd number of drives.
Depending on what you need (high performance, reliability or low cost), you can choose the method of data storage that is right for you.