Does RAID Zero Help Parallel Plotting in Windows?

As discussed in some of my previous posts, plotting depends on three main system components: CPU, RAM, and temp space. The best temp-space hardware right now is the NVMe SSD, with some exceptions; it uses the PCIe bus to achieve blazing-fast speeds. Some Chia farmers, like myself, are lucky enough to have two of these NVMe drives. In this post, I test whether it's better to run the two drives in RAID0 or to leave them separate. If you think about it, keeping them separate means less I/O per drive: although RAID0 roughly doubles the speed, every active plotter touches both NVMe drives instead of just one of them. Here is the environment for these tests:

  • Version 1.1.2 was used. It's an older version, but I kept it constant to reduce variables. The Chia team is really pushing out fixes; much thanks to them.
  • My system specs are listed on my About page.
  • The drives used NTFS with a 64K allocation unit size; the 64K choice came out of my previous testing.
  • A script was used to launch all plotters at the same time, with no delay or stagger (a rough sketch of the idea is shown after this list).
  • The built-in Windows 10 RAID was used to stripe the drives; it's the easiest RAID for someone on Windows to set up. You can see the CrystalDiskMark results at the bottom of this post.
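
For anyone who wants to reproduce the setup, here is a minimal sketch of the idea behind that launch script. It's written in Python rather than the script I actually used, the drive letters and paths are placeholders, and it assumes the chia CLI is reachable on the PATH with the standard `chia plots create` flags (-k size, -r threads, -b buffer MiB, -t temp dir, -d destination).

```python
# Hypothetical sketch of a "launch everything at once" runner (not my exact script).
# Drive letters and paths are placeholders; adjust them for your system.
import subprocess

PLOTTERS = 2                                     # 2, 4, or 8 depending on the test
TEMP_DIRS = [r"D:\chia-temp", r"E:\chia-temp"]   # two separate NVMe drives...
# TEMP_DIRS = [r"R:\chia-temp"]                  # ...or the single RAID0 volume
DEST_DIR = r"F:\plots"

procs = []
for i in range(PLOTTERS):
    temp = TEMP_DIRS[i % len(TEMP_DIRS)]         # round-robin plotters across temp drives
    cmd = [
        "chia", "plots", "create",
        "-k", "32", "-r", "6", "-b", "3416",     # -r was dropped to 3 for the 8-plotter test
        "-t", temp, "-d", DEST_DIR, "-n", "1",
    ]
    log = open(f"plotter_{i + 1}.log", "w")
    # Popen returns immediately, so every plotter starts at the same time (no stagger).
    procs.append((subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT), log))

for proc, log in procs:
    proc.wait()                                  # block until every plotter finishes
    log.close()
```

Switching between the RAID0 run and the separate-drive run is just a matter of swapping the TEMP_DIRS list.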

Let's look at the first test, 2 plotters in parallel:

2 Plotters (-r6/-b3416), times in seconds:

| Config | Plotter | Phase 1 | Phase 2 | Phase 3 | Phase 4 | Total |
|--------|---------|---------|---------|---------|---------|-------|
| RAID0  | P1      | 5091    | 2665    | 5487    | 404     | 13649 |
| RAID0  | P2      | 5066    | 2670    | 5495    | 401     | 13633 |
| 1 Each | P1      | 4983    | 2631    | 5298    | 390     | 13305 |
| 1 Each | P2      | 5061    | 2628    | 5312    | 406     | 13408 |

2 plotters in parallel. Time in seconds.

Above you can see the phase times for each plotter under the two NVMe configurations, with the plotter settings shown above the table. These times come straight from the plotter logs; they are in seconds, with the digits past the decimal dropped. For this test, keeping the NVMe drives separate is faster, though not by much: roughly 200-300 seconds. Let's proceed to 4 plotters in parallel:

4 Plotters (-r6/-b3416), times in seconds:

| Config           | Plotter | Phase 1 | Phase 2 | Phase 3 | Phase 4 | Total |
|------------------|---------|---------|---------|---------|---------|-------|
| RAID0            | P1      | 6389    | 3060    | 6257    | 460     | 16166 |
| RAID0            | P2      | 6390    | 3057    | 6255    | 453     | 16157 |
| RAID0            | P3      | 6391    | 3066    | 6250    | 455     | 16164 |
| RAID0            | P4      | 6392    | 3059    | 6253    | 458     | 16163 |
| 2 Each (drive 1) | P1      | 6492    | 3116    | 6356    | 473     | 16438 |
| 2 Each (drive 1) | P2      | 6491    | 3122    | 6367    | 503     | 16484 |
| 2 Each (drive 2) | P3      | 6386    | 2974    | 5734    | 425     | 15520 |
| 2 Each (drive 2) | P4      | 6489    | 3122    | 6366    | 502     | 16481 |

4 plotters in parallel. Time in seconds.

RAID0 won this 4-plotter test, an interesting reversal from the previous one, and its times are more consistent too. Keeping the NVMe drives separate did worse here, except for one outlier: one plotter finished much faster than the other three. I don't have an explanation for it, which is why I'm showing the raw data. For the final test, I ran 8 plotters in parallel. I had to drop the thread count (-r) to 3 for this one so that the plotters would not have to fight over CPU threads:

8 Plotters (-r3/-b3416), times in seconds:

| Config           | Plotter | Phase 1 | Phase 2 | Phase 3 | Phase 4 | Total |
|------------------|---------|---------|---------|---------|---------|-------|
| RAID0            | P1      | 10435   | 3966    | 7662    | 584     | 22649 |
| RAID0            | P2      | 10368   | 3982    | 7679    | 686     | 22716 |
| RAID0            | P3      | 10345   | 3978    | 7684    | 640     | 22649 |
| RAID0            | P4      | 10482   | 3940    | 7652    | 595     | 22670 |
| RAID0            | P5      | 10393   | 3983    | 7665    | 599     | 22642 |
| RAID0            | P6      | 10345   | 3981    | 7681    | 640     | 22648 |
| RAID0            | P7      | 10301   | 3961    | 7690    | 574     | 22528 |
| RAID0            | P8      | 10325   | 3969    | 7696    | 589     | 22578 |
| 4 Each (drive 1) | P1      | 10249   | 3971    | 7826    | 592     | 22639 |
| 4 Each (drive 1) | P2      | 10375   | 3932    | 7812    | 593     | 22713 |
| 4 Each (drive 1) | P3      | 10222   | 3965    | 7816    | 595     | 22600 |
| 4 Each (drive 1) | P4      | 10572   | 3846    | 7788    | 703     | 22911 |
| 4 Each (drive 2) | P5      | 10495   | 3862    | 7807    | 660     | 22825 |
| 4 Each (drive 2) | P6      | 10261   | 3964    | 7820    | 601     | 22646 |
| 4 Each (drive 2) | P7      | 10229   | 3962    | 7807    | 593     | 22592 |
| 4 Each (drive 2) | P8      | 10584   | 3848    | 7775    | 913     | 23122 |

8 plotters in parallel. Time in seconds.
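
To make the comparison concrete before drawing conclusions, here is a small sketch that averages the Total column from each of the three tests above; the numbers are copied straight from the tables.

```python
# Average total plot time (seconds) per configuration, taken from the Total
# column of the three tables above.
totals = {
    "2 plotters, RAID0":  [13649, 13633],
    "2 plotters, 1 each": [13305, 13408],
    "4 plotters, RAID0":  [16166, 16157, 16164, 16163],
    "4 plotters, 2 each": [16438, 16484, 15520, 16481],
    "8 plotters, RAID0":  [22649, 22716, 22649, 22670, 22642, 22648, 22528, 22578],
    "8 plotters, 4 each": [22639, 22713, 22600, 22911, 22825, 22646, 22592, 23122],
}

for name, times in totals.items():
    print(f"{name:20s} avg: {sum(times) / len(times):8.1f} s")

# Separate drives win the 2-plotter test by about 285 s on average;
# RAID0 wins the 4- and 8-plotter tests by about 68 s and 121 s respectively.
```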

Looking at the results, it is easy to add up the total numbers and declare a winner. The issue, however, is that neither configuration is vastly superior to the other; even in the earlier tests, nothing screamed that one was clearly better. So is RAID0 worth it? In my opinion, no. It's added complexity that doesn't buy much; in my case it smoothed out the plot times a little. Does this mean RAID0 is a waste of time in every hardware configuration? No. This is one dataset under one configuration. Maybe hardware RAID is better? Maybe 16 parallel plotters is where RAID shines? Maybe RAID pulls ahead when you use a delay between plotters? I can't go higher than 8 because I'm limited in both RAM and CPU threads (more on that below). Ultimately, my recommendation is to keep your system simple; the simpler it is, the fewer things can go wrong.
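
For reference, the RAM and thread math behind that limit is straightforward. This is just a sketch: the per-plotter values are the ones from the 8-plotter test above, while the totals you compare them against depend on your own machine (mine are on the About page).

```python
# Rough resource budget for N parallel plotters. Plug in your own thread count
# and RAM; the per-plotter values match the 8-plotter test above.
plotters   = 8
threads    = 3      # -r per plotter (dropped from 6 for the 8-plotter test)
buffer_mib = 3416   # -b per plotter

threads_needed = plotters * threads             # 24 threads
ram_needed_gib = plotters * buffer_mib / 1024   # ~26.7 GiB

print(f"{plotters} plotters need {threads_needed} threads and "
      f"{ram_needed_gib:.1f} GiB of RAM, before Windows takes its share")
```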

There is one last data point I would like to present. During my testing I accidentally ran all 8 plotters on a single 2TB NVMe drive. We've been told that one plotter uses 256GB of temp space, so I assumed they would all fail, but to my surprise they all finished. Here is the data:

8 Plotters (-r3/-b3416), times in seconds:

| Config    | Plotter | Phase 1 | Phase 2 | Phase 3 | Phase 4 | Total |
|-----------|---------|---------|---------|---------|---------|-------|
| Solo NVMe | P1      | 10842   | 4493    | 9966    | 727     | 26030 |
| Solo NVMe | P2      | 10841   | 4480    | 9979    | 823     | 26124 |
| Solo NVMe | P3      | 10841   | 4489    | 9971    | 809     | 26111 |
| Solo NVMe | P4      | 10933   | 4576    | 9940    | 622     | 26073 |
| Solo NVMe | P5      | 10841   | 4480    | 9972    | 829     | 26124 |
| Solo NVMe | P6      | 10841   | 4472    | 9981    | 830     | 26125 |
| Solo NVMe | P7      | 10877   | 4527    | 9927    | 787     | 26119 |
| Solo NVMe | P8      | 10877   | 4514    | 9939    | 798     | 26130 |

8 plotters in parallel, solo NVMe. Time in seconds.

These plotters all started at the same time, which means the temp space per plot isn't really 256GB. Assuming all eight peaked at roughly the same time, it has to be under 250GB, though I don't know the exact number. With this news, I've modified my formula here to reflect the change. Comparing this data with the previous test, you can also see the benefit of having two NVMe drives: a single NVMe at its limit is roughly 3,400 seconds slower per plot, which is close to an hour of extra time.
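
The back-of-the-envelope math behind that ceiling is simple. The sketch below assumes the drive's nominal 2,000 decimal GB; the formatted NTFS volume is a bit smaller, so the real per-plot ceiling is slightly lower still.

```python
# Upper bound on per-plot temp space, given that 8 plotters fit on one 2 TB drive.
# Assumes the nominal 2,000 GB (decimal); the formatted NTFS volume is a bit smaller.
drive_gb = 2000
plotters = 8

ceiling_gb  = drive_gb / plotters             # 250.0 GB per plot, at most
ceiling_gib = ceiling_gb * 1000**3 / 1024**3  # ~232.8 GiB

print(f"Each plot's temp usage peaked below {ceiling_gb:.0f} GB ({ceiling_gib:.1f} GiB)")
```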

Check out my other blog posts here; there is tons of useful information. Keep on plotting on.

10 thoughts on “Does RAID Zero Help Parallel Plotting in Windows?”

  1. Been saying this all along. Don't use RAID if you can plot in parallel.

    1. I have two 512GB NVMe drives in RAID0. On one 512GB drive you can make one plot, and on the other 512GB drive you can make one plot.
      In a 1TB RAID0 made from both, you can make 3 plots!! Size matters 🙂

  2. 1. Chia plotting does not write large enough sustained chunks of data to exceed the NVMe interface or NAND controller limits.
    2. Chia I/O does not exceed what a decent NVMe can handle in IOPS terms.

    Result: no performance gain from RAID0 in a plotting workload.

    1. So if this is the case, why does plotting time increase as we add more parallel plots? If the drive can handle Chia I/O, then adding another parallel plot shouldn't increase the plot time much, no? What's causing it to slow down as we add more parallel plots?

      1. If I had to guess, then it is your SSD.

        One Chia plotting job would need 250-300 MB/s of throughput. If your SSD can only sustain 600 MB/s, going anything above 2 jobs will slow it down.

        A decent drive like the 970 Evo Plus can sustain 1800 MB/s, so doing 6 jobs in parallel is a breeze.

  3. I am not quite sure what this test was supposed to show, but I don't believe you have tested what you think you were testing.

    The results also kind of show this: you basically arrive at the same speed for both setups, which generally doesn't make sense unless there is another bottleneck that saturates before the one you are testing.

    Some points:
    – Windows RAID isn’t great
    – Windows plotting isn’t great either
    – 2 of those SSDs can still easily create a bottleneck for parallel plots
    – *NOT* staggering your plots kind of invalidates the test to start with
    (ALL parallel plots will hit the disk heavy or the CPU heavy test at the same time, making very inefficient use of available resources)
    – Why did you not measure average IO latency/IOwait times? That will give you data on how the storage performed
    – I get that measuring total time can also work but, as said, judging from your test results you are hitting another bottleneck first, and measuring actual storage latency and wait times would have shown this.
    – Why do your plot times increase the more you add? You are hitting a bottleneck (which might, or might not be storage)!

    I have 2x 5900X boxes running 4x 1TB NVMe drives in MDADM RAID0 (correctly aligned) with XFS on top, which do plots in 5-6 hours when running 12 in parallel. For each plot there was maybe a minute of IOwait at most, and generally the box sits at 1% IOwait load. Which means storage is no longer the bottleneck; in my case the CPU has become it.

    Hope it helps in figuring out a way to re-do your testing since the current results just don’t make sense.

    1. Hi Quindor, welcome from the ChiaForum to my blog. I’ll hopefully be able to answer all your points.
      1) Currently, this is purely a Windows blog. There are people who won’t touch Linux because they are unfamiliar with it. This is a service to them.
      2) I chose not to stagger so I could fully see whether there was any benefit between Windows RAID and using the drives natively. I did see some nice write spikes launching eight plotters all at once, but none came close to the max rating of the drives. So they did hit the disks hard, but not enough to max out the PCI-e lanes. I know it could possibly be an I/O limit issue. Also, in one of my previous tests, I found that Windows RAID0 did perform better than a single NVMe during solo plotting.
      3) I actually did capture the disk queue length of each drive for each test, but I did not use the data because the total plot times were not significantly different. In the last test of 8 parallel plots, the separate NVMes had disk queue lengths of at most 4, while the RAID had disk queue lengths of at most 8.
      4) Plot times increasing the more you add might be a Windows-specific effect. I've seen people talk about times increasing as more plotters run in parallel. I need to do more research to see if it's Windows-specific.
      5) My 8 plotters in parallel finish in roughly 6.5 hours; accounting for the 10%, it would be maybe 6 hours. I can't do 12 in parallel due to not having enough RAM, which I hopefully can test next week when the new RAM comes in. Then I might update this with a 12 v 12 test.
      6) I won't be redoing the testing here since it's Windows-specific and anyone can reproduce it without extra hardware. I plan to reference this if we find out that NTFS is doing something horrible and another file system is better. Oh, that reminds me: Btrfs does better than XFS, as storagejm has found. Have you tried that with your RAID?

