Every disk, regardless of type, will eventually fail, so using any form of RAID increases system availability and is a step in the right direction. If continuity of operation and retention of the most recently written data are not priorities, a backup copy can be an equally good solution, allowing the system to be restored after a complete failure.
In small setups with a few disks we actually have a lot of freedom: you can use disks from the same manufacturer, but mixing disks from different manufacturers (with the same parameters) can also work, as long as it does not upset the controller/software (in most cases it will not). In large solutions with dozens or hundreds of drives, using drives from different production runs of a manufacturer may be a good idea. The highest-availability solutions use two or more arrays with data replication, located in different DC rooms or even in geographically dispersed DCs.
Where performance matters, a hardware controller is a good choice; where high availability is also required, two hardware controllers in an active-active or active-passive setup, dual power supplies, multiple network interfaces, two power paths, two LAN paths, etc. will be useful. In less demanding environments, software RAID on general-purpose components is an equally good idea.
It is also important to use a class of solution appropriate to the intended use, for example matching the disk types mentioned earlier to the type of load. As for load types, a fairly contrasting comparison is: a disk in a CCTV recorder (a continuous stream from the cameras, with occasional reads of recorded footage), a home NAS for backups and shared files (mostly writes, plus small streams reading video/audio/photo files), and shared storage for many virtual machines (large amounts of I/O).
The drives themselves also differ, but a special drive produced for a specific array vendor is largely a myth, or at least a rarity; most often an existing product of a given type gets the vendor's "sticker", possibly plus additional testing and modified firmware agreed between the HDD and storage manufacturers. The exceptions are the few companies that manufacture both HDDs and storage systems; they can afford drives dedicated to their own products.
For comparison, for a desktop HDD Seagate specifies the maximum annual workload as 55 TB/year, while for desktop SSDs, and sometimes also for larger enterprise-class SSDs, the DWPD value may be fractional: https://www.kingston.com/pl/ssd/dwpd
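To make the DWPD figure concrete, below is a minimal sketch (in Python) of the usual DWPD-to-TBW conversion; the 0.3 DWPD, 1 TB and 5-year values are only illustrative assumptions, not figures from the Kingston page.
Code:
# Hedged sketch: converting a DWPD rating into total terabytes written (TBW).
# TBW = DWPD * capacity (TB) * warranty length in days (the common vendor definition).

def dwpd_to_tbw(dwpd: float, capacity_tb: float, warranty_years: float) -> float:
    """Total terabytes that may be written over the warranty period."""
    return dwpd * capacity_tb * warranty_years * 365

# Illustrative example: a 1 TB desktop SSD rated at a fractional 0.3 DWPD for 5 years
print(dwpd_to_tbw(0.3, 1.0, 5))  # -> 547.5 TBW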
I have written ~15 TB to my laptop drive over 3 years, but this of course strongly depends on how the PC is used. It is worth installing the free software from the SSD manufacturer, which tracks changes in SMART attributes, assesses the condition of the drive and reports, among other things, the number of TB written, any problems, etc.
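For those who prefer scripting to vendor tools, here is a rough sketch of how the "TB written" figure can be pulled from SMART with smartmontools; it assumes smartctl 7.0+ (for JSON output) and a drive that exposes a Total_LBAs_Written-style attribute counted in 512-byte units, which varies by vendor.
Code:
# Rough sketch: estimating total TB written from SMART data via smartmontools.
# Assumes smartctl >= 7.0 (JSON output) and an SSD that reports a
# "Total_LBAs_Written"-style attribute in 512-byte units (vendor-dependent).
import json
import subprocess

def tb_written(device: str = "/dev/sda", lba_size: int = 512) -> float:
    out = subprocess.run(["smartctl", "-A", "-j", device],
                         capture_output=True, text=True, check=True).stdout
    for attr in json.loads(out).get("ata_smart_attributes", {}).get("table", []):
        if "Total_LBAs_Written" in attr["name"]:
            return attr["raw"]["value"] * lba_size / 1e12
    raise RuntimeError("drive does not expose a Total_LBAs_Written attribute")

if __name__ == "__main__":
    print(f"~{tb_written():.1f} TB written")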
Since we are exchanging information and comparing ideas fairly loosely here, what do you think about a "proactive controller"? The idea is a system/controller that understands the data directed to a pool of disks, predicts from historical statistics when a given load will occur, and monitors the condition of the disks via SMART and response times; this data is also stored as trends.
Let's assume that one of the disks starts behaving suspiciously. The controller could deliberately direct a large artificial I/O load at that disk while its production traffic is low (it understands the block structure, so it can issue both artificial reads and writes). If, under such artificial load, the SMART parameters start changing rapidly (or the disk fails outright), a rebuild onto a spare disk can be started (or it can be started even earlier). We anticipate the failure, in fact we provoke it ourselves and remove the failing element, while in the background a replacement disk is already being filled with data.
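To make the idea more tangible, here is a toy sketch of that probing logic; the thresholds, pass count and the read-only synthetic load are my own assumptions for illustration, not any real controller's behaviour.
Code:
# Toy sketch of the "proactive controller" probe described above. The device
# path, thresholds, pass count and the read-only synthetic load are all
# illustrative assumptions, not a real controller's behaviour.
import json
import os
import random
import subprocess
import time

def reallocated_count(device: str) -> int:
    """Read Reallocated_Sector_Ct via smartmontools (needs smartctl >= 7.0 for -j)."""
    data = json.loads(subprocess.run(["smartctl", "-A", "-j", device],
                                     capture_output=True, text=True, check=True).stdout)
    for attr in data.get("ata_smart_attributes", {}).get("table", []):
        if attr["name"] == "Reallocated_Sector_Ct":
            return attr["raw"]["value"]
    return 0

def synthetic_read_load(device: str, total_mib: int = 4096) -> None:
    """Deliberately exercise a suspicious disk with random 1 MiB reads."""
    with open(device, "rb", buffering=0) as f:
        size = f.seek(0, os.SEEK_END)
        for _ in range(total_mib):
            f.seek(random.randrange(0, size - 2**20))
            f.read(2**20)

def probe_suspicious_disk(device: str, passes: int = 3, limit: int = 5) -> bool:
    """True if SMART degrades rapidly under artificial load, i.e. it is time
    to start rebuilding onto the spare."""
    before = reallocated_count(device)
    for _ in range(passes):
        synthetic_read_load(device)
        time.sleep(60)                 # let the drive settle between passes
        now = reallocated_count(device)
        if now - before > limit:
            return True
        before = now
    return False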
Another, crazier idea: if everything is fine, we pick one disk from a common production series and deliberately load it with I/O. If it holds up, we can predict that the other disks from that group will statistically last longer than the one under load.
Of course, not everything can be predicted; a disk may at some point stop responding without any warning (rare, but it happens). Still, such measures can statistically increase reliability on a more global scale.
What do you think about such conceptual ideas?
My opinion is that traditional "simple" RAID, where the controller rebuilds disks 1:1, still has its applications, but the future lies in RAID virtualization, SDS and, for large big-data workloads, object storage and distributed systems such as Hadoop and GlusterFS. Even though I have a fondness for mechanical drives and LTO libraries when it comes to archiving (LTO-7 tapes reached 6 TB native, and LTO-8 was planned for 12 TB), I may be wrong, but the future belongs to flash...
On the hardware side there are, for example, the chunklets mentioned earlier in the HPE implementation:
Let's assume that one of the disks starts behaving suspiciously
This is IMHO reason enough to replace the disk, without digging into the cause or wondering whether the disk is actually failing. The disk costs pennies; the data on it costs a fortune.
It's clear that in smaller installations, if we have any doubts, we can simply throw the media out rather than take the risk.
The problem is that on large-capacity drives, reallocated sectors, for example, are not necessarily a reason to replace the drive; what matters more is the trend. If, say, 3 sectors failed and were reallocated and then nothing happens for a month, the disk can stay. If one more sector reallocates every day, it is time to run. The same goes for other phenomena visible in SMART: problems with the platters, heads, or interface electronics (where the cause may also lie in the backplane, cables, or controller). A problematic spin-up (buzzing) may be related to the disk's power supply, but also to the mechanics, etc.
The same applies to latency: a disk may, for example, show a 300 ms delay when writing a sector (because it was relocating that sector), whereas a delay an order of magnitude larger than usual is dangerous; if we know the cause and it is sporadic, the disk can be kept under observation instead of being replaced.
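A minimal sketch of that "watch the trend, not the absolute value" rule; the thresholds are illustrative assumptions, and the daily counts would come from whatever SMART poller is already in place.
Code:
# Minimal sketch of the "watch the trend, not the absolute value" rule above.
# The daily counts would come from whatever SMART poller is already in place;
# the thresholds are illustrative assumptions, not vendor guidance.
from datetime import date

def reallocation_verdict(history: dict[date, int]) -> str:
    """history maps sample date -> Reallocated_Sector_Ct observed that day."""
    days = sorted(history)
    if len(days) < 2:
        return "not enough data"
    growth = history[days[-1]] - history[days[0]]
    per_day = growth / (days[-1] - days[0]).days
    if per_day >= 1:
        return "replace: reallocations are growing daily"
    if growth == 0:
        return "keep: count is stable"
    return "observe: slow growth, keep watching the trend"

# Example from the post: 3 sectors reallocated once, then nothing for a month
print(reallocation_verdict({date(2024, 1, 1): 3, date(2024, 2, 1): 3}))  # keep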
It's even more interesting with SSDs: they really do have a life of their own, with many internal mechanisms (e.g. wear levelling, TRIM, garbage collection) to keep the flash healthy. The drive performs many of these operations autonomously, and sometimes this can affect access times.
The datasheets (DS) of these drives do not contain this information: Ultrastar He12, ST12000DM0007, ST10000NM0206, AL14SXBxxEx Series, MG07ACAxxx Series. So which DS does contain it? Or where in the DS can I find the above, because maybe I missed it.
And for those that do not contain this information, it means it will not be taken into account in any warranty claims.
You clearly didn't look very hard. The first one on your list:
Quote:
Ultrastar® He12 HDD. Designed to handle workloads up to 550 TB per year
I saw this parameter, but this disk has a 2.5M-hour MTTF. The manufacturer is therefore claiming it should withstand at least 156,963 TB of writes and/or reads. For a 14 TB drive that is over 11,000 full-drive writes; for smaller ones even more.
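For clarity, the arithmetic behind that figure (using the numbers from the post) looks roughly like this:
Code:
# Quick check of the arithmetic above, using the figures quoted in the post:
# 2.5 million hours MTTF combined with the 550 TB/year workload rating.
MTTF_HOURS = 2_500_000
WORKLOAD_TB_PER_YEAR = 550
HOURS_PER_YEAR = 8760

years = MTTF_HOURS / HOURS_PER_YEAR           # ~285.4 years at the rated MTTF
total_tb = years * WORKLOAD_TB_PER_YEAR       # ~156,963 TB read and/or written
full_writes_14tb = total_tb / 14              # ~11,212 full passes of a 14 TB drive

print(round(total_tb), round(full_writes_14tb))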
Moreover, you wrote, and I quote:
Piotrus_999 wrote:
In addition, the number of surface remagnetizations. Good server drives have this 500-1000.
So I would like you to show me where you got the "surface remagnetization limit" from; where can I find it in the DS?
Workload doesn't mean you can write that much there. I suggest you familiarize yourself with the topic rather than theorize.
Quoting the MG07ACA DS: "Workload is defined as the amount of data written, read or verified by commands from host system". Sure, who needs a disk that will only ever be written to, but the definition says that workload means *any* data transfer; it could just as well be writes alone.
Piotrus_999 wrote:
For someone who buys drives, AFR is much more important than MTBF.
You've already answered all the questions I didn't ask. Will you finally answer the one I did? -> "So I would like you to show me where you got the 'surface remagnetization limit' from; where can I find it in the DS?" So where did you get this (exact) remagnetization limit from? Just this question; leave all other topics aside.
I deleted the personal comments. Please maintain the level of the discussion and respond substantively.
I didn't write it
Sareph wrote:
"surface remagnetization limit"
Only this:
Piotrus_999 wrote:
This is related to the number of head movements. Eventually the mechanism doesn't hold up. In addition, the number of surface remagnetizations.
This mainly affects the degradation of the disk's operational "reliability".
Sareph wrote:
Just this question; leave all other topics aside.
As you can see, this "limit" is just a figment of your imagination, and the 500-1000 is TB/year and was a reply to another post.
Moderated By tmf:
I deleted the personal comments. Please maintain the level of the discussion and respond substantively.
MTBF/AFR, if you have one, two or five disks, matters only as a reference point for build quality. It does not guarantee that a specific disk will not fail after five minutes or a month of operation. But if you have a system in which you must ensure data availability and durability with a given level of certainty, if you have these figures and several hundred disks, and if you know the load and its impact on how those figures change, you can calculate the amount of redundancy needed.
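As a hedged illustration of that kind of calculation, the sketch below estimates how likely it is that more disks fail in a year than you have planned spares or redundancy for; it assumes independent failures and a constant AFR, and ignores rebuild windows and correlated batch defects, so treat it as a rough planning aid only.
Code:
# Rough planning sketch: with n disks at a given AFR, how likely is it that
# more of them fail within a year than we have spares/redundancy for?
# Assumes independent failures and a constant AFR; ignores rebuild windows
# and correlated batch defects, so it is only a first approximation.
from math import comb

def p_more_failures_than(n_disks: int, afr: float, covered: int) -> float:
    """P(more than `covered` of `n_disks` fail within one year)."""
    p_at_most = sum(comb(n_disks, k) * afr**k * (1 - afr)**(n_disks - k)
                    for k in range(covered + 1))
    return 1 - p_at_most

# Illustrative example: 200 disks at 1.5% AFR with 5 spares stocked for the year
print(f"{p_more_failures_than(200, 0.015, 5):.3f}")  # roughly 0.08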
I deleted the personal comments. Please maintain the level of the discussion.
Piotrus_999 wrote:
As you can see, this "limit" is just a figment of your imagination, and the 500-1000 is TB/year and was a reply to another post.
OK, I misunderstood then, but:
* How is one supposed to know whether "this" refers to "workload" from the quote or to the preceding sentence in which the limit appeared?
* I asked clearly, several times, about the *remagnetization limit*, and instead of answering "it wasn't about that, it referred to the quote above, not to the preceding sentence", which would probably have ended the discussion, you... nitpick at everything.
* So what is this limit, if it exists and matters?
Piotrus_999 wrote:
But it seems too difficult for you to understand.
Maybe, or maybe you can't construct statements sensibly. But it's not for me to judge; I sometimes have trouble understanding people, so maybe, maybe. Well, since you *finally* wrote what you meant, that pretty much closes the topic for me.
But answer me this: how do you hold a conversation that goes like this:
- Cockatoos are also nice, but I prefer cats.
- Well, they have nice fur. Beautiful wings. And those whiskers too.
It's common knowledge that cats don't have wings, but still... ;D
I looked in here and just threw up my hands. In general, it is not worth talking into the wind or nitpicking over details.
It's good to write when you have something sensible to say, remembering that we can always be wrong and that a sensible discussion may show us that we are.
If you have time, better to write something that will interest others; arguing is just a waste of your time, my friend tmf.
When someone talks a lot without backing it up with sources, links or examples, over time they stop being taken seriously... and it is a pity when you come across a person who sometimes has something interesting to say, but it gets lost in a thicket of less sensible statements.
Coming back to the topic of RAID on SD cards: it is surprising that an idea that sounded like a joke turned into a product you can actually buy:
Good SD cards cost money, and sometimes even branded cards fail.
It's better to invest in an inexpensive SSD with MLC chips; it will end up both cheaper and more reliable.
When I bought my drives, I went by the price per GB. After checking prices, it turned out that 2 TB drives offered the best price per GB. A larger number of disks also made it possible to set up RAID6.
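As a quick sketch of that comparison (with made-up example prices, not the actual ones):
Code:
# Quick price-per-GB comparison in the spirit of the post above.
# The prices are made-up placeholders, not the actual ones paid.
drives = {1000: 220, 2000: 330, 4000: 780}   # capacity in GB -> price

for capacity_gb, price in sorted(drives.items()):
    print(f"{capacity_gb // 1000} TB: {price / capacity_gb:.3f} per GB")

# With numbers like these the 2 TB model wins on price per GB, and buying
# several smaller drives also leaves enough spindles for RAID6 (minimum 4).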
The downsides of this approach are increased power consumption (and heat) and the extra physical space needed.
The drives get very hot; without fans it would be bad.
I usually go for budget solutions, so the largest RAID I have is RAID 5 on four WD Green 3 TB drives, in a completely non-server box (an i3) running Ubuntu. The array is, of course, handled by dmraid. After turning off the "green" features it even worked fine. I'm gradually switching to WD Purple; I don't see any difference compared with WD Red, and the price is slightly lower... I often got drives that were defective right out of the box, but once I found a good one, it usually just kept working.
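For monitoring such a setup, a minimal health check might look like the sketch below; it assumes a Linux md software array visible in /proc/mdstat (the post mentions dmraid, so take the md assumption as illustrative only).
Code:
# Minimal health check for a Linux software array, assuming an md array
# visible in /proc/mdstat (the post mentions dmraid; an md-based setup is
# assumed here purely for illustration).
import re

def degraded_arrays(mdstat_path: str = "/proc/mdstat") -> list[str]:
    """Return names of md arrays whose member-status field shows a failed disk."""
    degraded = []
    current = None
    with open(mdstat_path) as f:
        for line in f:
            m = re.match(r"^(md\d+)\s*:", line)
            if m:
                current = m.group(1)
            # the status field looks like "[UUUU]" when healthy, "[UU_U]" when degraded
            elif current and (status := re.search(r"\[(U|_)+\]", line)):
                if "_" in status.group(0):
                    degraded.append(current)
                current = None
    return degraded

if __name__ == "__main__":
    bad = degraded_arrays()
    print("degraded arrays:", bad or "none")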
Mine is a home-built box with an Athlon II X2 250, but it has an HP P400 server RAID card. It runs Debian.
Previously I used a more energy-efficient i3 550, but the board did not get along with the RAID card (it froze while loading the RAID card's BIOS).
At the beginning I had two disks that got damaged out of nowhere. One showed reallocated sectors after a full surface scan in MHDD.
Seven hours for a 2 TB drive, so it took me two days to check them all. Then an RMA for two disks: one dead (bad sectors right after booting) and the other with reallocated sectors.
I have a 3-year warranty on them, so I use them.
Many people with such arrays run long-term tests first, before putting the drives into service. Duds will usually show reallocated sectors right away. It is a way of screening out the weakest links.
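A hedged sketch of what such a pre-service burn-in can look like in its simplest read-only form (the real tools, MHDD or badblocks, do far more); the block size and "slow block" threshold here are arbitrary illustrative choices, and scanning a raw device usually requires root.
Code:
# Hedged sketch of a read-only surface scan in the spirit of MHDD/badblocks.
# Block size and the "slow block" threshold are arbitrary illustrative choices;
# run against a block device (e.g. /dev/sdX), which usually requires root.
import os
import time

def surface_scan(device: str, block_size: int = 1 << 20, slow_ms: float = 500.0):
    """Sequentially read the whole device; report unreadable and slow blocks."""
    errors, slow = [], []
    with open(device, "rb", buffering=0) as f:
        size = f.seek(0, os.SEEK_END)
        offset = 0
        while offset < size:
            t0 = time.monotonic()
            try:
                f.seek(offset)
                f.read(min(block_size, size - offset))
            except OSError:
                errors.append(offset)          # unreadable region, candidate for reallocation
            elapsed_ms = (time.monotonic() - t0) * 1000
            if elapsed_ms > slow_ms:
                slow.append((offset, elapsed_ms))
            offset += block_size
    return errors, slow

# Example (read-only): errors, slow = surface_scan("/dev/sdb")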
The discussion revolves around the feasibility and implications of using different HDD brands in a RAID6 configuration for enhanced reliability. Participants express mixed opinions on the practice, weighing the potential benefits of avoiding common manufacturing defects against the risks of performance inconsistencies and RAID controller errors. Some argue that using drives from different manufacturers can mitigate the risk of simultaneous failures due to shared defects, while others caution against the complications arising from differing specifications and performance characteristics. The conversation also touches on the importance of using enterprise-class drives for critical applications, the role of SMART monitoring in assessing drive health, and the significance of workload ratings in determining drive suitability for RAID setups. Overall, the consensus leans towards caution, emphasizing the need for careful selection and testing of drives in RAID configurations. Summary generated by the language model.