Mixing Different HDD Brands in RAID6 for Reliability: Pros and Cons?

And! 14232 47
The content has been translated from Polish to English.
  • #1 17104160
    And!
    Admin of Design group
    RAID, i.e. storing data across a group of disks, increases data safety (resistance to HDD failure) and can also increase data access speed, because striping splits the data stream sequentially across several HDDs. The RAID level and the HDD type determine the characteristics of the resulting storage. We can focus on safety and use several redundant HDDs. We can build an archive on high-capacity SATA HDDs, e.g. 10 TB each, achieve more IOPS on smaller SAS HDDs, or build an all-flash array on SSDs.

    I have another question: is it worth using different drives from different manufacturers in RAID6, e.g. for 5 HDDs?

    It sounds strange, because the same HDD model provides the same access time and transfer, but the same drives also carry the risk of a common manufacturing defect, firmware error, simultaneous aging and common failure. What do you think about such an unusual idea?

    As for HDD models, when performance is not the priority and the goal is capacious storage, I like WD Red. When more IOPS are needed, Seagate 2.5" SAS 10k drives are durable and provide high density.

    What HDD do you use in small and large RAID groups?

    Following numerous reports, a note: this topic was deliberately created in the 'after hours' part of the design section. I want to exchange experiences and ideas with people visiting the section, and to run the topic in a slightly more relaxed atmosphere in accordance with: https://www.elektroda.pl/rtvforum/topic3382856.html because the topic is somewhat research-oriented :)
  • #2 17104193
    Sareph
    Level 24  
    And! wrote:
    but the same drives also carry the risk of a common manufacturing defect, firmware error, simultaneous aging and common failure.
    This doesn't happen, at least not all at once. So personally I don't see the point of it as a "preventive measure". IMO it makes more sense as "I have a lot of similar drives from different manufacturers and I want to do something with them".

    And! wrote:
    What HDD do you use in small and large RAID groups?
    They have always been Seagate Barracudas, they still are, none of them have failed so far, maybe they have comfortable working conditions. ;)
  • #3 17104300
    Anonymous
    Anonymous  
  • #4 17104357
    Karaczan
    Level 42  
    @Piotrus_999 Have you heard about the bug in the firmware in Barracuda 7200.11? Or the Hungarian DeskStars? ;)
    The cases may be extreme, but they show that nothing can be ruled out.

    But my colleague's comments regarding controller stability are absolutely correct.

    A power supply failure is far more likely than the unexpected failure of two drives :D And an online UPS will not help here either.
    And then, in the worst case, it will damage even the most redundant array.
  • #5 17104440
    Anonymous
    Anonymous  
  • #6 17104488
    And!
    Admin of Design group
    I'm glad I managed to spark an unconventional discussion.
    Anyone who knows storage well may feel an internal contradiction when looking at this topic. On the one hand, it is generally known 'what should and should not be done'; on the other, a professional knows exactly why, and sometimes goes beyond those limits to gain benefits.
    With traditional 'hardware' controllers there may indeed be problems with I/O synchronization across different drives (although this is less and less of a problem, since HDDs have similar parameters).
    Another matter is 'RAID' based on 'chunks', which has less in common with traditional RAID and lets us decouple more and more from the hardware. SDS is yet another case, and a side topic.
    Firmware errors, uniform I/O load and common aging mean a non-zero risk that a second HDD fails during the rebuild process, when the HDDs receive a large I/O load.
    Thanks for your comments so far; I'm waiting for more. I suspect I will learn something, especially from the unconventional and rebellious ideas.
  • #7 17105129
    GrandMasterT
    Level 26  
    However, in a very large company, where the administrators answer with their heads for business continuity, I saw with my own eyes arrays deliberately built from different disks with similar parameters. Unfortunately, I was not able to learn the details of the configuration, but I know they did it in case a whole series of disks turned out to be defective. Note that for a large, heavily loaded array, rebuilding to 100% of the intended redundancy after a failure may take quite a while, so it is worth thinking in advance about what happens if another failure occurs during the rebuild.
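
The rebuild-window concern can be put into rough numbers. The following is a minimal sketch, assuming drives fail independently at a constant rate derived from an annualized failure rate (AFR); the function names and the example figures are illustrative and not taken from the thread. The independence assumption is exactly what a shared defective series would violate, so for same-batch drives the result is best read as a lower bound.

```python
import math

HOURS_PER_YEAR = 8760


def p_fail_within(afr: float, hours: float) -> float:
    """Probability that a single drive fails within `hours`,
    assuming a constant failure rate derived from the AFR."""
    if not 0.0 <= afr < 1.0:
        raise ValueError("afr must be in [0, 1)")
    if afr == 0.0:
        return 0.0
    rate_per_hour = -math.log(1.0 - afr) / HOURS_PER_YEAR
    return 1.0 - math.exp(-rate_per_hour * hours)


def p_at_least(k: int, n: int, p: float) -> float:
    """Binomial probability that at least k of n independent drives fail."""
    return sum(
        math.comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1)
    )


def raid6_loss_during_rebuild(total_drives: int, afr: float, rebuild_hours: float) -> float:
    """One drive has already failed. RAID6 survives one more failure,
    so data is lost only if two or more of the survivors fail
    before the rebuild completes."""
    survivors = total_drives - 1
    p = p_fail_within(afr, rebuild_hours)
    return p_at_least(2, survivors, p)


if __name__ == "__main__":
    # Hypothetical inputs: 5 drives, 2% AFR, 24 h versus 72 h rebuild.
    for hours in (24, 72):
        print(hours, raid6_loss_during_rebuild(5, 0.02, hours))
```

Under these assumptions the risk grows quickly with the length of the rebuild, which is why long rebuilds on large, busy arrays deserve the advance thought described above.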

    I have experience (so far) only with small and medium-sized arrays, the largest of which has maybe 40 disks. Except for one important case, which I describe below, I have not noticed any particular problems so far when the disks had more or less the same parameters but came from different manufacturers. However, using very different ones sometimes causes complications, such as an unnatural drop in performance, much greater than a rough estimate would suggest, so I try to avoid it.

    The special case that can actually end in failure is disk size. Disks from different manufacturers, especially SATA (I don't remember whether I have encountered this among SAS drives), may differ slightly in capacity, and depending on how the array was configured, it may turn out that a 4TB disk fails and cannot be replaced with another 4TB one because it is a few kilobytes short. I ran into this once and have paid attention ever since: when creating "software" arrays, e.g. on MD, I leave the last few MB of each disk unused to keep a small reserve in case one of the disks has to be replaced with a different one in the future.
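
The reserve trick described above comes down to simple arithmetic. This is a minimal sketch, assuming the member size is chosen as the smallest disk minus a fixed reserve, rounded down to a whole MiB; the function names, the 100 MiB default and the second disk size are hypothetical values for illustration only.

```python
MIB = 1024 * 1024


def member_size_bytes(disk_sizes_bytes, reserve_mib=100):
    """Size to use for every array member: the smallest disk minus a
    reserve, rounded down to a 1 MiB boundary."""
    usable = min(disk_sizes_bytes) - reserve_mib * MIB
    if usable <= 0:
        raise ValueError("reserve is larger than the smallest disk")
    return (usable // MIB) * MIB


def fits(replacement_bytes, member_bytes):
    """A replacement disk is usable if it is at least as large as the member size."""
    return replacement_bytes >= member_bytes


if __name__ == "__main__":
    # Two nominally "4TB" disks; the second size is a made-up, slightly smaller one.
    sizes = [4_000_787_030_016, 4_000_785_948_160]
    member = member_size_bytes(sizes)
    # A hypothetical replacement that is 50 MiB short of the smallest original disk.
    replacement = min(sizes) - 50 * MIB
    print(member, fits(replacement, member))
```

With the reserve in place, a replacement that is slightly smaller than the original disks still fits; without it, the same replacement would be rejected.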
  • #8 17105254
    Anonymous
    Anonymous  
  • #9 17105504
    GrandMasterT
    Level 26  
    And I think those guys will keep their heads until they retire. They are not some bunch picked at random, and any "amateurism" would have been caught long ago by the audits they regularly undergo. There must be a plan and a justification behind the adopted solutions, otherwise it would certainly not have happened. In general, I saw a few other solutions there that were strange to me. Different drives don't surprise me that much, because I have often heard the advice that, if possible, you should at least avoid taking all drives from the same series.

    However, I would be afraid to mess around with a mix of disks in a larger array, but there is a difference between the "poverty" in the IT department and a company that can afford long tests of the selected configuration before it goes into production.
  • #10 17105670
    And!
    Admin of Design group
    @GrandMasterT thanks for the practical information confirming my seemingly crazy idea; it's good to know that someone has tried it in production. More than one person has probably fallen into the disk-size trap...

    Generally, in RAID disk systems with dozens or hundreds of disks, a single RAID group usually does not exceed a dozen or so HDDs. This means we have many groups, and the degradation of one does not affect the operation of the others. Groups are logically combined into one or more storage pools, so while the degradation of one group does not affect another, the loss of a single group will unfortunately result in data loss.

    We can add parity disks within one system (e.g. RAID-Z1, Z2, Z3 in ZFS), making the groups resistant to the simultaneous failure of one, two or three HDDs respectively.
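
The trade-off between parity and capacity can be sketched as follows. This is a simplified illustration, assuming equal-sized drives and ignoring filesystem metadata, padding and spare drives; the table and function name are hypothetical helpers, not part of any RAID or ZFS tool.

```python
# Number of drives' worth of capacity spent on parity per level.
PARITY_DRIVES = {
    "raid5": 1,
    "raidz1": 1,
    "raid6": 2,
    "raidz2": 2,
    "raidz3": 3,
}


def group_summary(level: str, drives: int, drive_tb: float) -> dict:
    """Usable capacity and number of tolerated drive failures for one group."""
    parity = PARITY_DRIVES[level]
    if drives <= parity:
        raise ValueError("a group needs more drives than parity drives")
    return {
        "usable_tb": (drives - parity) * drive_tb,
        "tolerated_failures": parity,
    }


if __name__ == "__main__":
    # The 5-drive RAID6 from the first post, with 10 TB drives as an example size.
    print(group_summary("raid6", 5, 10.0))
```

For the 5-drive RAID6 from the first post, two drives' worth of capacity goes to parity, so three remain usable.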

    The next step is arrays/systems with synchronous or asynchronous replication, which allow business continuity in the event of a complete failure of the device or even of the whole data center. Nowadays cloud solutions sometimes serve as such an element of an HA cluster.

    Chunklets, a type of RAID virtualization, go beyond the capabilities of traditional RAID:
    https://blog.mwpreston.net/2015/11/07/learnin...nklets-logical-disk-cpgs-and-virtual-volumes/
    this gives you, for example, the ability to add another HDD to the group and start evening out HDD occupancy with 'chunklets'.

    The next step is distributed data storage systems such as Apache Hadoop and GlusterFS.

    However, an even higher level of abstraction is probably object storage.

    BTW, have you heard about HDDs with a GbE interface? The idea is that the controller disappears: there is only a group of powered disks and a network switch, with IP transport and the logic embedded in a software controller.

    @Piotrus_999 yes, these are "different" drives. The difference often lies in the included support service/free replacement; the modified firmware provides additional features and compatibility with the product (and sometimes blocks "unauthorized" HDDs, much as, in a simpler form, used to be done with SPD data in RAM). Sometimes such a special drive has WORM-type capabilities implemented in firmware, and sometimes it is simply a drive that has undergone additional tests and carries a company sticker.

    I think that good R&D departments make much more complicated attempts than we describe in this topic :)
  • #11 17105883
    Anonymous
    Anonymous  
  • #12 17105975
    And!
    Admin of Design group
    Regardless of what "matching and mixing" method was used, we come back to the point, i.e. the idea of avoiding failure of all HDDs at a similar time.

    These "different design" drives are not manufactured by the array manufacturer but by the disk manufacturer (with a few minor exceptions),
    so what you can count on is optimization, vendor stickers applied to a larger general series, and possibly tuned firmware.

    Desktop drives are not suitable for 24/7 operation (not that anyone here proposed them :) unless SATA drives were meant, but SATA does not directly imply PC drives), and e.g. CCTV drives are not suitable for operation in a NAS, etc. HDD product lines intended for specific applications have existed for a long time, e.g. https://www.wdc.com/pl-pl/products/internal-storage.html

    As you can see, since the beginning of the topic we have moved from the statement "this is not done" to the statement "sometimes it is done and it even has a name" :) That's good; sometimes it's worth doing what seems unwise, because:

    "The reasonable man adapts himself to the world; the unreasonable one persists in trying to adapt the world to himself. Therefore all progress depends on the unreasonable man."
    George Bernard Shaw

    ;)
  • #13 17106068
    GrandMasterT
    Level 26  
    Depending on the application, however, the physical construction of the disks differs greatly, even the relatively ordinary ones intended for use in e.g. I even saw a bracket that was probably used to support the motor from the other side; you won't find such things in regular ones.

    However, I do not agree with the theory that ordinary drives, at least the more solid ones and not the cheapest junk, are unsuitable for 24/7 operation. Obviously they will fail fairly quickly under too much load, but apart from that I think they do just as well, if not better, than drives that are often turned off. Hard Disk Sentinel has an interesting option that compares the SMART data of your disk with other identical disks, and my old dinosaurs look much better there than the average while having 4-5x more operating time. Some of them have already exceeded the operating time assumed by the manufacturer and are still doing well. I should have replaced them long ago, but I keep them running, among other reasons, out of curiosity. I have 2 tin boxes used for heavier calculations and they are turned off practically only a few times a year.
  • #14 17106101
    Anonymous
    Anonymous  
  • #15 17106124
    And!
    Admin of Design group
    @GrandMasterT a large number of start/stops may be more destructive than the number of hours worked (both parameters are in SMART). My definition of 24/7 was rather imprecise; I was thinking more of the load in terms of the amount of data transferred.

    @Piotrus_999 Those statements are too vague to comment on, but that's how it is: everything is relative. Even the "serious" uses mentioned above depend on what we consider serious. SATA appears in good, cost-effective solutions, even with 2 controllers, because the disk bays contain a piece of electronics that provides two interfaces as in SAS. There are many solutions, and each has its specific applications.
  • #16 17106160
    GrandMasterT
    Level 26  
    Piotrus_999 wrote:
    It's obvious you don't know what you're talking about. Desktop drives, even if they are powered on 24/7, do not work 24/7. The motors switch off, the rotational speed is reduced, and the heads are not moving back and forth all the time.


    You must be joking at this point. In the power options, just set the disk to never turn off and from then on it will run 24/7/365. Most disks have a start/stop count, which increases with every stop and start of the motor; some have head flying hours, which roughly measures the time the heads were not parked but hovering somewhere above the platters. So far I have had only one disk that slowed down when idle, and it was a 2.5" disk. I know they exist, but if I were buying a new one now, I would rather avoid them unless I had no choice.
    Besides, the disks in my computers never sit completely idle; if they were just spinning uselessly, I would take them out, as it would be a waste of electricity.

    Below is an example: 78k hours, ~1300 start/stop cycles; the graph shows that it has been turned off several dozen times since 2016. Most of the start/stop cycles accumulated when the motor was failing: on each start attempt it switched on, stood for a while, switched off, and so on.

    [Attached image: the graph described above]
  • #17 17106163
    Anonymous
    Anonymous  
  • #18 17106194
    And!
    Admin of Design group
    Not necessarily large systems, the 5-disk example from the first post is not one of them.

    However, in large systems the coexistence of SATA, SAS and SSD fits well into the tiering mechanism. In laptops, SSDs are practically replacing HDDs; we will see if and when flash comes to dominate the DC. Each write to an SSD is a step towards wearing out the cell. Here the dependence of wear on the amount of data written is clearly visible.

    SMART and its usefulness and effectiveness or ineffectiveness is another topic regarding storage reliability.
  • #19 17106210
    GrandMasterT
    Level 26  
    Piotrus_999 wrote:
    GrandMasterT wrote:
    You must be joking at this point.
    I see that you don't see any difference between disk operation on a busy server and on a desktop. The server is constantly processing data; the desktop runs some programs that want something from the disk from time to time. These are completely incomparable things.


    So, in your opinion, every desktop is a toy? :D Strange, it seemed to me that there are also computers that are not servers, but are used for serious work and are loaded more than many servers.

    And! wrote:

    SMART and its usefulness and effectiveness or ineffectiveness is another topic regarding storage reliability.


    In my opinion, SMART is very effective on drives whose manufacturers took care of it. Unfortunately, on some disks half of the parameters are a grim joke: the thresholds are set so low that the values only fall to the alarm level long after the disk is ready for scrapping.
    You may notice something a little earlier by checking SMART manually from time to time. However, it is probably best to use something that collects historical parameter values; then you can usually see when something suddenly starts to change, and in such a case the changes are unlikely to be positive. For some disk models you can even spot characteristic behavior that signals an impending failure or progressive degradation. I personally use Hard Disk Sentinel for this on the several computers where I have a license, and smartmontools on the rest. In both cases you can set up notifications that warn you when something starts happening.
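
The idea of watching history rather than thresholds can be sketched in a few lines. This is a minimal illustration, assuming SMART raw values have already been collected into per-snapshot dictionaries by whatever tool is in use; the function name and the sample numbers are hypothetical, and the attribute names follow the labels commonly shown by smartmontools.

```python
# Attributes whose raw value should normally stay flat on a healthy drive.
WATCHED = (
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
)


def rising_attributes(history, watched=WATCHED):
    """Compare the oldest and newest snapshot and report watched
    attributes whose raw value has grown.

    `history` is a list of dicts (attribute name -> raw value),
    ordered from oldest to newest.
    """
    if len(history) < 2:
        return {}
    first, last = history[0], history[-1]
    changes = {}
    for name in watched:
        if name in first and name in last and last[name] > first[name]:
            changes[name] = (first[name], last[name])
    return changes


if __name__ == "__main__":
    # Made-up snapshots for illustration only.
    history = [
        {"Reallocated_Sector_Ct": 0, "Current_Pending_Sector": 0},
        {"Reallocated_Sector_Ct": 0, "Current_Pending_Sector": 0},
        {"Reallocated_Sector_Ct": 8, "Current_Pending_Sector": 0},
    ]
    print(rising_attributes(history))
```

A rising raw value is reported even while the vendor's normalized value is still far above its alarm threshold, which is the case described above where thresholds are set too low to be useful.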
  • #20 17106246
    Anonymous
    Anonymous  
  • #21 17106392
    And!
    Admin of Design group
    As an interesting aside: a few years ago, a disk with one specific HP part number for an HP NL server was delivered once as a 3.5" Seagate HDD from the Constellation series and once as a WD StorageWorks drive with an HP sticker. This was probably down to the supply chain and the fact that it made no difference to the controller; it was only about cost and availability, and reliability was not a consideration here.
  • #23 17109613
    And!
    Admin of Design group
    Interesting statistics, though it is a bit of a problem that the sample sizes are not similar for each manufacturer.
  • #24 17109722
    Anonymous
    Anonymous  
  • #25 17109795
    And!
    Admin of Design group
    @Piotrus_999 It's worth noting that everyone is right in this topic, because it's a loose discussion, an exchange of ideas and observations, so it's hard to reach one definitive conclusion. :) Also, where does this conviction that it is all just marketing come from? From your own experience or from this topic, and if the latter, from which post?
    As I mentioned in post #12, HDDs are available for various applications and both their prices and properties vary, which is probably no surprise to anyone.

    The topic generally does not apply to desktops, and the RAID6 on 5 HDD mentioned in the first post in a desktop solution will be a rather marginal solution.

    I wonder what will happen next with HDD. On the one hand, the price per GB in 12TB HDD will probably be lower than in SSD, but SSD will make up for it in terms of performance, energy consumption and heat production. There are also announcements of what may appear in the future, e.g. 50TB SSD: http://www.vikingtechnology.com/products/storage-overview/uhc-silo-35ssd/

    It seems that only price can decide the HDD's survival?

    In the desktop segment touched on in this topic, hybrid HDD + SSD solutions, i.e. SSHD, also appeared, but they are probably dying out?
  • #26 17109914
    Anonymous
    Anonymous  
  • #27 17109942
    komatssu
    Level 29  
    Everyone worries about the limited number of writes on SSDs, but for HDDs manufacturers also specify a so-called workload rating, i.e. the amount of data the user transfers to or from the drive. For example, for the popular WD Red drives this parameter is 180 TB/year, regardless of disk capacity.
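
To get a feel for what 180 TB/year means in practice, the figure can be converted into an average transfer rate and into the number of full-capacity passes per year. This is a small worked example, assuming decimal units (1 TB = 10^12 bytes) and a 365-day year; the function names and the 10 TB example capacity are illustrative.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def average_rate_mb_s(tb_per_year: float) -> float:
    """Sustained average rate (MB/s) that uses up the yearly workload rating."""
    return tb_per_year * 1e12 / SECONDS_PER_YEAR / 1e6


def full_passes_per_year(tb_per_year: float, capacity_tb: float) -> float:
    """How many times the whole drive can be read or written within the rating."""
    return tb_per_year / capacity_tb


if __name__ == "__main__":
    print(round(average_rate_mb_s(180), 2))   # about 5.71 MB/s on average
    print(full_passes_per_year(180, 10))      # 18 full passes of a 10 TB drive
```

Because the rating does not scale with capacity, a larger drive gets fewer full passes per year, which matters for arrays that run regular scrubs or long rebuilds.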
  • #28 17110031
    Anonymous
    Anonymous  
  • #29 17110159
    pawelr98
    Level 39  
    In the case of Backblaze, the issue is the disks' operating conditions.

    They use standard drives in server conditions, i.e. higher temperatures, vibrations and more intensive operation.
    If someone has a smaller home server with much more lightly loaded disks, the service life may turn out to be much better.

    If a standard disk can withstand server conditions, that is a good sign. Then there is the price difference compared to the competition: if it is small, buy it, but if it is large, I would think twice.

    You can also see the influence of head parking on service life. So if we want good service life, we should not let the drives spin down.

    I have 6x2TB Seagate drives in RAID6 (HP P400 256MB with battery), a UPS, and additional transils (TVS diodes, on both 12V and 5V) soldered onto the hard drive power lines. It has been working 24/7 under sporadic, light load (transfer limited by the 1Gbit/s network bandwidth) for a long time without any problems.

    The best part is that the disks are mounted on angle brackets from the DIY store :D . Significant savings on the enclosure.
  • #30 17110595
    Anonymous
    Anonymous  

Topic summary

The discussion revolves around the feasibility and implications of using different HDD brands in a RAID6 configuration for enhanced reliability. Participants express mixed opinions on the practice, weighing the potential benefits of avoiding common manufacturing defects against the risks of performance inconsistencies and RAID controller errors. Some argue that using drives from different manufacturers can mitigate the risk of simultaneous failures due to shared defects, while others caution against the complications arising from differing specifications and performance characteristics. The conversation also touches on the importance of using enterprise-class drives for critical applications, the role of SMART monitoring in assessing drive health, and the significance of workload ratings in determining drive suitability for RAID setups. Overall, the consensus leans towards caution, emphasizing the need for careful selection and testing of drives in RAID configurations.
Summary generated by the language model.