Every disk, regardless of type, will eventually fail, so using any form of RAID increases system availability and is a step in the right direction. If continuity of operation and retention of the most recently written data are not priorities, a backup copy can be an equally good solution, allowing the system to be restored after a complete failure.
In small setups with a few disks we actually have a lot of freedom: you can use disks from the same manufacturer, but mixing disks from different manufacturers (with the same parameters) can also work, as long as it does not upset the controller/software (in most cases it will not). In large solutions with dozens or hundreds of drives, using drives from different production runs of a manufacturer may be a good idea. The highest-availability solutions use two or more arrays with data replication, located in different DC rooms or even in geographically dispersed DCs.
Where performance matters, a hardware controller is a good choice; where high availability is also required, two hardware controllers in an active-active or active-passive setup, dual power supplies, multiple network interfaces, two power paths, two LAN paths, etc. will be useful. In less demanding environments, software RAID on general-purpose components is an equally good idea.
It is also important to use a class of solution appropriate to the intended use, for example matching the disk types mentioned earlier to the type of load. As for load types, a fairly contrasting comparison is: a disk in a CCTV recorder (a continuous stream from the cameras, with occasional reads of recorded footage), a home NAS for backups and shared files (mostly writes, plus small streams reading video/audio/photo files), and shared storage for many virtual machines (large amounts of I/O).
The drives themselves also differ, but a special drive produced for a specific array vendor is largely a myth, or at least a rarity; most often an existing product of a given type gets the vendor's "sticker", possibly plus additional testing and modified firmware agreed between the HDD and storage manufacturers. The exceptions are the few companies that manufacture both HDDs and storage systems; they can afford drives dedicated to their own products.
For comparison, for a desktop HDD Seagate specifies the maximum annual workload as 55 TB/year, while for desktop SSDs, and sometimes also for larger enterprise-class SSDs, the DWPD value may be fractional: https://www.kingston.com/pl/ssd/dwpd
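To make the DWPD figure concrete, below is a minimal sketch (in Python) of the usual DWPD-to-TBW conversion; the 0.3 DWPD, 1 TB and 5-year values are only illustrative assumptions, not figures from the Kingston page.
Code:
# Hedged sketch: converting a DWPD rating into total terabytes written (TBW).
# TBW = DWPD * capacity (TB) * warranty length in days (the common vendor definition).

def dwpd_to_tbw(dwpd: float, capacity_tb: float, warranty_years: float) -> float:
    """Total terabytes that may be written over the warranty period."""
    return dwpd * capacity_tb * warranty_years * 365

# Illustrative example: a 1 TB desktop SSD rated at a fractional 0.3 DWPD for 5 years
print(dwpd_to_tbw(0.3, 1.0, 5))  # -> 547.5 TBW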
I have written ~15 TB to my laptop drive over 3 years, but this of course strongly depends on how the PC is used. It is worth installing the free software from the SSD manufacturer, which tracks changes in SMART attributes, assesses the condition of the drive and reports, among other things, the number of TB written, any problems, etc.
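For those who prefer scripting to vendor tools, here is a rough sketch of how the "TB written" figure can be pulled from SMART with smartmontools; it assumes smartctl 7.0+ (for JSON output) and a drive that exposes a Total_LBAs_Written-style attribute counted in 512-byte units, which varies by vendor.
Code:
# Rough sketch: estimating total TB written from SMART data via smartmontools.
# Assumes smartctl >= 7.0 (JSON output) and an SSD that reports a
# "Total_LBAs_Written"-style attribute in 512-byte units (vendor-dependent).
import json
import subprocess

def tb_written(device: str = "/dev/sda", lba_size: int = 512) -> float:
    out = subprocess.run(["smartctl", "-A", "-j", device],
                         capture_output=True, text=True, check=True).stdout
    for attr in json.loads(out).get("ata_smart_attributes", {}).get("table", []):
        if "Total_LBAs_Written" in attr["name"]:
            return attr["raw"]["value"] * lba_size / 1e12
    raise RuntimeError("drive does not expose a Total_LBAs_Written attribute")

if __name__ == "__main__":
    print(f"~{tb_written():.1f} TB written")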
Since we are exchanging information and comparing ideas fairly loosely here, what do you think about a "proactive controller"? The idea is a system/controller that understands the data directed to a pool of disks, predicts from historical statistics when a given load will occur, and monitors the condition of the disks via SMART and response times; this data is also stored as trends.
Let's assume that one of the disks starts behaving suspiciously. The controller could deliberately direct a large artificial I/O load at that disk while its production traffic is low (it understands the block structure, so it can issue both artificial reads and writes). If, under such artificial load, the SMART parameters start changing rapidly (or the disk fails outright), a rebuild onto a spare disk can be started (or it can be started even earlier). We anticipate the failure, in fact we provoke it ourselves and remove the failing element, while in the background a replacement disk is already being filled with data.
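To make the idea more tangible, here is a toy sketch of that probing logic; the thresholds, pass count and the read-only synthetic load are my own assumptions for illustration, not any real controller's behaviour.
Code:
# Toy sketch of the "proactive controller" probe described above. The device
# path, thresholds, pass count and the read-only synthetic load are all
# illustrative assumptions, not a real controller's behaviour.
import json
import os
import random
import subprocess
import time

def reallocated_count(device: str) -> int:
    """Read Reallocated_Sector_Ct via smartmontools (needs smartctl >= 7.0 for -j)."""
    data = json.loads(subprocess.run(["smartctl", "-A", "-j", device],
                                     capture_output=True, text=True, check=True).stdout)
    for attr in data.get("ata_smart_attributes", {}).get("table", []):
        if attr["name"] == "Reallocated_Sector_Ct":
            return attr["raw"]["value"]
    return 0

def synthetic_read_load(device: str, total_mib: int = 4096) -> None:
    """Deliberately exercise a suspicious disk with random 1 MiB reads."""
    with open(device, "rb", buffering=0) as f:
        size = f.seek(0, os.SEEK_END)
        for _ in range(total_mib):
            f.seek(random.randrange(0, size - 2**20))
            f.read(2**20)

def probe_suspicious_disk(device: str, passes: int = 3, limit: int = 5) -> bool:
    """True if SMART degrades rapidly under artificial load, i.e. it is time
    to start rebuilding onto the spare."""
    before = reallocated_count(device)
    for _ in range(passes):
        synthetic_read_load(device)
        time.sleep(60)                 # let the drive settle between passes
        now = reallocated_count(device)
        if now - before > limit:
            return True
        before = now
    return False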
Another, crazier idea: if everything is fine, we pick one disk from a common production series and deliberately load it with I/O. If it holds up, we can predict that the other disks from that group will statistically last longer than the one under load.
Of course, not everything can be predicted; a disk may at some point stop responding without any warning (rare, but it happens). Still, such measures can statistically increase reliability on a more global scale.
What do you think about such conceptual ideas?
My opinion is that traditional "simple" RAID, where the controller rebuilds disks 1:1, still has its applications, but the future lies in RAID virtualization, SDS and, for large big-data workloads, object storage and distributed systems such as Hadoop and GlusterFS. Even though I have a fondness for mechanical drives and LTO libraries when it comes to archiving (LTO-7 tapes reached 6 TB native, and LTO-8 was planned for 12 TB), I may be wrong, but the future belongs to flash...
On the hardware side there are, for example, the chunklets mentioned earlier in the HPE implementation:
Let's assume that one of the disks starts behaving suspiciously
This is IMHO reason enough to replace the disk, without digging into the cause or wondering whether the disk is actually failing. The disk costs pennies; the data on it costs a fortune.
It's clear that in smaller installations, if we have any doubts, we can simply throw the media out rather than take the risk.
The problem is that on large-capacity drives, reallocated sectors, for example, are not necessarily a reason to replace the drive; what matters more is the trend. If, say, 3 sectors failed and were reallocated and then nothing happens for a month, the disk can stay. If one more sector reallocates every day, it is time to run. The same goes for other phenomena visible in SMART: problems with the platters, heads, or interface electronics (where the cause may also lie in the backplane, cables, or controller). A problematic spin-up (buzzing) may be related to the disk's power supply, but also to the mechanics, etc.
The same applies to latency: a disk may, for example, show a 300 ms delay when writing a sector (because it was relocating that sector), whereas a delay an order of magnitude larger than usual is dangerous; if we know the cause and it is sporadic, the disk can be kept under observation instead of being replaced.
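A minimal sketch of that "watch the trend, not the absolute value" rule; the thresholds are illustrative assumptions, and the daily counts would come from whatever SMART poller is already in place.
Code:
# Minimal sketch of the "watch the trend, not the absolute value" rule above.
# The daily counts would come from whatever SMART poller is already in place;
# the thresholds are illustrative assumptions, not vendor guidance.
from datetime import date

def reallocation_verdict(history: dict[date, int]) -> str:
    """history maps sample date -> Reallocated_Sector_Ct observed that day."""
    days = sorted(history)
    if len(days) < 2:
        return "not enough data"
    growth = history[days[-1]] - history[days[0]]
    per_day = growth / (days[-1] - days[0]).days
    if per_day >= 1:
        return "replace: reallocations are growing daily"
    if growth == 0:
        return "keep: count is stable"
    return "observe: slow growth, keep watching the trend"

# Example from the post: 3 sectors reallocated once, then nothing for a month
print(reallocation_verdict({date(2024, 1, 1): 3, date(2024, 2, 1): 3}))  # keep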
It's even more interesting with SSDs: they really do have a life of their own, with many internal mechanisms (e.g. wear levelling, TRIM, garbage collection) to keep the flash healthy. The drive performs many of these operations autonomously, and sometimes this can affect access times.
The datasheets (DS) of these drives do not contain this information: Ultrastar He12, ST12000DM0007, ST10000NM0206, AL14SXBxxEx Series, MG07ACAxxx Series. So which DS does contain it? Or where in the DS can I find the above, because maybe I missed it.
And for those that do not contain this information, it means it will not be taken into account in any warranty claims.
You clearly didn't look very hard. The first one on your list:
Quote:
Ultrastar® He12 HDD. Designed to handle workloads up to 550 TB per year
I saw this parameter, but this disk has a 2.5M-hour MTTF. The manufacturer is therefore claiming it should withstand at least 156,963 TB of writes and/or reads. For a 14 TB drive that is over 11,000 full-drive writes; for smaller ones even more.
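For clarity, the arithmetic behind that figure (using the numbers from the post) looks roughly like this:
Code:
# Quick check of the arithmetic above, using the figures quoted in the post:
# 2.5 million hours MTTF combined with the 550 TB/year workload rating.
MTTF_HOURS = 2_500_000
WORKLOAD_TB_PER_YEAR = 550
HOURS_PER_YEAR = 8760

years = MTTF_HOURS / HOURS_PER_YEAR           # ~285.4 years at the rated MTTF
total_tb = years * WORKLOAD_TB_PER_YEAR       # ~156,963 TB read and/or written
full_writes_14tb = total_tb / 14              # ~11,212 full passes of a 14 TB drive

print(round(total_tb), round(full_writes_14tb))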
Moreover, you wrote, and I quote:
Piotrus_999 wrote:
In addition, the number of surface remagnetizations. Good server drives have this 500-1000.
So I would like you to show me where you got the "surface remagnetization limit" from; where can I find it in the DS?
Workload doesn't mean you can write that much there. I suggest you familiarize yourself with the topic rather than theorize.
Quoting the MG07ACA DS: "Workload is defined as the amount of data written, read or verified by commands from host system". Sure, who needs a disk that will only ever be written to, but the definition says that workload means *any* data transfer; it could just as well be writes alone.
Piotrus_999 wrote:
For someone who buys drives, AFR is much more important than MTBF.
You've already answered all the questions I didn't ask. Will you finally answer the one I did? -> "So I would like you to show me where you got the 'surface remagnetization limit' from; where can I find it in the DS?" So where did you get this (exact) remagnetization limit from? Just this question; leave all other topics aside.
I deleted the personal comments. Please maintain the level of the discussion and respond substantively.
I didn't write it
Sareph wrote:
"surface remagnetization limit"
Only this:
Piotrus_999 wrote:
This is related to the number of head movements. Eventually the mechanism doesn't hold up. In addition, the number of surface remagnetizations.
This mainly affects the degradation of the disk's operational "reliability".
Sareph wrote:
Just this question; leave all other topics aside.
As you can see, this "limit" is just a figment of your imagination, and the 500-1000 is TB/year and was a reply to another post.
Moderated By tmf:
I deleted the personal comments. Please maintain the level of the discussion and respond substantively.
MTBF/AFR, if you have one, two or five disks, matters only as a reference point for build quality. It does not guarantee that a specific disk will not fail after five minutes or a month of operation. But if you have a system in which you must ensure data availability and durability with a given level of certainty, if you have these figures and several hundred disks, and if you know the load and its impact on how those figures change, you can calculate the amount of redundancy needed.
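As a hedged illustration of that kind of calculation, the sketch below estimates how likely it is that more disks fail in a year than you have planned spares or redundancy for; it assumes independent failures and a constant AFR, and ignores rebuild windows and correlated batch defects, so treat it as a rough planning aid only.
Code:
# Rough planning sketch: with n disks at a given AFR, how likely is it that
# more of them fail within a year than we have spares/redundancy for?
# Assumes independent failures and a constant AFR; ignores rebuild windows
# and correlated batch defects, so it is only a first approximation.
from math import comb

def p_more_failures_than(n_disks: int, afr: float, covered: int) -> float:
    """P(more than `covered` of `n_disks` fail within one year)."""
    p_at_most = sum(comb(n_disks, k) * afr**k * (1 - afr)**(n_disks - k)
                    for k in range(covered + 1))
    return 1 - p_at_most

# Illustrative example: 200 disks at 1.5% AFR with 5 spares stocked for the year
print(f"{p_more_failures_than(200, 0.015, 5):.3f}")  # roughly 0.08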
I deleted the personal comments. Please maintain the level of the discussion.
Piotrus_999 wrote:
As you can see, this "limit" is just a figment of your imagination, and the 500-1000 is TB/year and was a reply to another post.
OK, I misunderstood then, but:
* How is one supposed to know whether "this" refers to "workload" from the quote or to the preceding sentence in which the limit appeared?
* I asked clearly, several times, about the *remagnetization limit*, and instead of answering "it wasn't about that, it referred to the quote above, not to the preceding sentence", which would probably have ended the discussion, you... nitpick at everything.
* So what is this limit, if it exists and matters?
Piotrus_999 wrote:
But it seems too difficult for you to understand.
Maybe, or maybe you can't construct statements sensibly. But it's not for me to judge; I sometimes have trouble understanding people, so maybe, maybe. Well, since you *finally* wrote what you meant, that pretty much closes the topic for me.
But answer me this: how do you hold a conversation that goes like this:
- Cockatoos are also nice, but I prefer cats.
- Well, they have nice fur. Beautiful wings. And those whiskers too.
It's common knowledge that cats don't have wings, but still... ;D
I looked in here and just threw up my hands. In general, it is not worth talking into the wind or nitpicking over details.
It's good to write when you have something sensible to say, remembering that we can always be wrong and that a sensible discussion may show us that we are.
If you have time, better to write something that will interest others; arguing is just a waste of your time, my friend tmf.
When someone talks a lot without backing it up with sources, links or examples, over time they stop being taken seriously... and it is a pity when you come across a person who sometimes has something interesting to say, but it gets lost in a thicket of less sensible statements.
Coming back to the topic of RAID on SD cards: it is surprising that an idea that sounded like a joke turned into a product you can actually buy:
Good SD cards cost money, and sometimes even branded cards fail.
It's better to invest in an inexpensive SSD with MLC chips; it will end up both cheaper and more reliable.
When I bought my drives, I went by the price per GB. After checking prices, it turned out that 2 TB drives offered the best price per GB. A larger number of disks also made it possible to set up RAID6.
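As a quick sketch of that comparison (with made-up example prices, not the actual ones):
Code:
# Quick price-per-GB comparison in the spirit of the post above.
# The prices are made-up placeholders, not the actual ones paid.
drives = {1000: 220, 2000: 330, 4000: 780}   # capacity in GB -> price

for capacity_gb, price in sorted(drives.items()):
    print(f"{capacity_gb // 1000} TB: {price / capacity_gb:.3f} per GB")

# With numbers like these the 2 TB model wins on price per GB, and buying
# several smaller drives also leaves enough spindles for RAID6 (minimum 4).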
The downsides of this approach are increased power consumption (and heat) and the extra physical space needed.
The drives get very hot; without fans it would be bad.
I usually go for budget solutions, so the largest RAID I have is RAID 5 on four WD Green 3 TB drives, in a completely non-server box (an i3) running Ubuntu. The array is, of course, handled by dmraid. After turning off the "green" features it even worked fine. I'm gradually switching to WD Purple; I don't see any difference compared with WD Red, and the price is slightly lower... I often got drives that were defective right out of the box, but once I found a good one, it usually just kept working.
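For monitoring such a setup, a minimal health check might look like the sketch below; it assumes a Linux md software array visible in /proc/mdstat (the post mentions dmraid, so take the md assumption as illustrative only).
Code:
# Minimal health check for a Linux software array, assuming an md array
# visible in /proc/mdstat (the post mentions dmraid; an md-based setup is
# assumed here purely for illustration).
import re

def degraded_arrays(mdstat_path: str = "/proc/mdstat") -> list[str]:
    """Return names of md arrays whose member-status field shows a failed disk."""
    degraded = []
    current = None
    with open(mdstat_path) as f:
        for line in f:
            m = re.match(r"^(md\d+)\s*:", line)
            if m:
                current = m.group(1)
            # the status field looks like "[UUUU]" when healthy, "[UU_U]" when degraded
            elif current and (status := re.search(r"\[(U|_)+\]", line)):
                if "_" in status.group(0):
                    degraded.append(current)
                current = None
    return degraded

if __name__ == "__main__":
    bad = degraded_arrays()
    print("degraded arrays:", bad or "none")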
Mine is a home-built box with an Athlon II X2 250, but it has an HP P400 server RAID card. It runs Debian.
Previously I used a more energy-efficient i3 550, but the board did not get along with the RAID card (it froze while loading the RAID card's BIOS).
At the beginning I had two disks that got damaged out of nowhere. One showed reallocated sectors after a full surface scan in MHDD.
Seven hours for a 2 TB drive, so it took me two days to check them all. Then an RMA for two disks: one dead (bad sectors right after booting) and the other with reallocated sectors.
I have a 3-year warranty on them, so I use them.
Many people with such arrays run long-term tests first, before putting the drives into service. Duds will usually show reallocated sectors right away. It is a way of screening out the weakest links.
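A hedged sketch of what such a pre-service burn-in can look like in its simplest read-only form (the real tools, MHDD or badblocks, do far more); the block size and "slow block" threshold here are arbitrary illustrative choices, and scanning a raw device usually requires root.
Code:
# Hedged sketch of a read-only surface scan in the spirit of MHDD/badblocks.
# Block size and the "slow block" threshold are arbitrary illustrative choices;
# run against a block device (e.g. /dev/sdX), which usually requires root.
import os
import time

def surface_scan(device: str, block_size: int = 1 << 20, slow_ms: float = 500.0):
    """Sequentially read the whole device; report unreadable and slow blocks."""
    errors, slow = [], []
    with open(device, "rb", buffering=0) as f:
        size = f.seek(0, os.SEEK_END)
        offset = 0
        while offset < size:
            t0 = time.monotonic()
            try:
                f.seek(offset)
                f.read(min(block_size, size - offset))
            except OSError:
                errors.append(offset)          # unreadable region, candidate for reallocation
            elapsed_ms = (time.monotonic() - t0) * 1000
            if elapsed_ms > slow_ms:
                slow.append((offset, elapsed_ms))
            offset += block_size
    return errors, slow

# Example (read-only): errors, slow = surface_scan("/dev/sdb")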
The discussion revolves around the feasibility and implications of using different HDD brands in a RAID6 configuration for enhanced reliability. Participants express mixed opinions on the practice, weighing the potential benefits of avoiding common manufacturing defects against the risks of performance inconsistencies and RAID controller errors. Some argue that using drives from different manufacturers can mitigate the risk of simultaneous failures due to shared defects, while others caution against the complications arising from differing specifications and performance characteristics. The conversation also touches on the importance of using enterprise-class drives for critical applications, the role of SMART monitoring in assessing drive health, and the significance of workload ratings in determining drive suitability for RAID setups. Overall, the consensus leans towards caution, emphasizing the need for careful selection and testing of drives in RAID configurations. Summary generated by the language model.