For some of the tinkerers and gamers out there, optimization comes with the territory. The feeling that you are squeezing every ounce of performance from a system and leaving nothing on the table is a sweet one. The same can be said for Chia plotting. It has so many switches and knobs that it’s almost irresistible to figure out what each one does. Doing this, however, like the image above, takes a lot of trial and error. You’ll bend nails and break nails with different hammering techniques. In this post, I’ll talk about each aspect of plotting and what I learned from experimenting with it.
Let’s start at a high level. Chia plotting has two markers: Total Plot Speed and Total TiB per Day. Plot Speed is more like a badge of honor that you can display. “I can pump out a plot in X seconds!…with no other plots running.” It’s almost like the speedrunning community. The real stat you want to maximize is TiB/Day. This stat tells you how fast you can fill up that fat hard drive with plots and begin to farm them. So how do you do this? There are three main factors in TiB per day: CPU, RAM, and temp drives. This may get lengthy, but stay with me.
Number one, CPU. The number of threads a CPU has is one of the things that determines how many plots you can run in parallel, meaning how many plotters you can run at the same time on your system. We’re talking total threads here, not cores. In the plotter settings, you can choose how many threads to dedicate to each plotter. From experience, 2 threads is way faster than 1 thread. Always pick 2 if you can. With my CPU (5900x – 24 threads) I did a speed test at 2 threads and 3389 RAM on an NVMe and came to 15510:
I then upped the CPU threads while holding everything else constant. 4 threads was thirty minutes faster than 2 threads. 6 threads, however, was only five minutes faster than 4 threads. And in my case 8 threads and beyond was actually slower than 6 threads. There are diminishing returns past 4 threads. My theory is that 6 threads beats 8 threads because of the CCXs that the 5900x has.
Some additional info: plotters work in four phases. You have probably noticed this in the plotter logs. The thread setting only affects the first phase. Phases two, three and four are all single-threaded. Your plotters can technically oversubscribe the CPU and it won’t crash; it will just take longer to generate the plots. With my 5900x, I’m running 12 plotters with four threads each, kicking off 2 plotters every forty minutes until reaching 12 plotters. I put the forty minutes in there so that the running plotters are well into phase 1 before the next plotters start.
Number two, RAM. This also has a hand in determining how many plots you can run in parallel, but in a different sense. Something that is a bit undocumented is that RAM requirements change depending on how many threads you assign to the plotter. The default of 3389 is OK for 2 threads. If you are going to use 4 threads, I’ve found that 3408 works perfectly. 6 threads? 3416. 8 threads? 3432. How can you tell if it’s enough RAM? Let’s take a look at one of the plotter’s log lines from when it’s generating plots in phase one:
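The staggered start described above is easy to sketch in a few lines of Python. This is only an illustration of the timing, not the script I actually launch plots with; the function name `stagger_schedule` and its parameters are made up for the example.

```python
def stagger_schedule(total_plotters, batch_size, interval_minutes):
    """Return (plotter_number, start_offset_minutes) for every plotter.

    Plotters start in batches of `batch_size`, one batch every
    `interval_minutes`, until `total_plotters` are running.
    """
    schedule = []
    for index in range(total_plotters):
        batch = index // batch_size  # 0 for the first batch, 1 for the second, ...
        schedule.append((index + 1, batch * interval_minutes))
    return schedule


# The setup from the text: 12 plotters, 2 at a time, forty minutes apart.
schedule = stagger_schedule(total_plotters=12, batch_size=2, interval_minutes=40)
for plotter, offset in schedule:
    print(f"plotter {plotter:2d} starts at +{offset} min")
```

With these numbers the last pair starts 200 minutes after the first, so it takes a little over three hours before all 12 plotters are running.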
Bucket 0 uniform sort. Ram: 3.261GiB, u_sort min: 1.125GiB, qs min: 0.281GiB.
Let’s break this down:
- Bucket 0 – This is the current bucket it’s working on. How many buckets it needs to work on is set in the plotter configuration settings. I have experimented with this setting, trying 32, 64, and 256. It made no difference in overall plot speed, but it changes the RAM requirements drastically. 64 buckets needs double the RAM of 128 buckets, and 32 buckets needs double the RAM of 64. 256, however, needs half the RAM of 128. You can see the pattern here. If you are RAM limited, going to 256 will reduce your RAM requirements, but you are also doubling the I/O requests on your temp drive. Use caution here.
- uniform sort. – This tells you what sorting method it used for the bucket. Uniform sort means that the entire bucket was able to fit into memory, so the processor can work on it and put it back. The other method is called QuickSort, or QS for short. QuickSort breaks up the data into smaller pieces so that it can fit into the allocated RAM.
- If you see QS here, don’t be alarmed. Some of the plotting actually requires QS, usually the last bucket and some other parts of phase 3. You will know it’s mandatory if the end of the line says “force_qs: 1”. If you see a QS and “force_qs: 0”, that means it used QS because not enough RAM was available. This is also not a bad thing. I am RAM limited and recently found that running more plotters with reduced RAM gave me a higher TiB/day than running fewer plotters with the optimal RAM allocation. My plot times increase by 1-2 hours, but there are more plotters, so the total output over time is higher.
- Ram: 3.261GiB – This is the amount of RAM the plotter is configured with. Sometimes this might show up as half the amount, but again, that comes from a plotter-defined section of the plot process, not a configuration error. Also, note the notation here! This is GiB. The RAM you configure on the plotter is in MiB. This is a really easy way to check that you set the correct RAM. Just start a plot and wait for the first bucket to be processed. With 128 buckets configured, you’re shooting for a number at or higher than 3.250GiB.
- u_sort min: 1.125GiB – OK, here is a juicy bit of information. This is the minimum amount of RAM needed to perform a uniform sort on the current bucket. Got more than this? Perfect. Got less? The plotter does a QuickSort instead. This is the metric you use to optimize your RAM settings if you want minimal QS to happen.
- qs min: 0.281GiB – As you may have guessed, this is the minimum amount of RAM needed to perform a QuickSort on the bucket. What happens if you have less than this? It crashes and you lose the plot being worked on. Do not oversubscribe your RAM, because you run the risk of losing all the plots being worked on. With Windows, I normally leave 4GB of RAM free for the operating system so that the swap file isn’t used. This has, so far, worked very well.
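If you would rather not eyeball these lines, the breakdown above can be turned into a small Python check. This is a rough sketch, not part of any Chia tooling: the regular expression is written against the one sample line shown above, the trailing “force_qs” field is assumed to be appended in the form quoted in the text, and `parse_sort_line` and `classify` are names made up for the example.

```python
import re

# Pattern written against the sample line above; the optional trailing
# "force_qs: N" field is an assumption based on how the text quotes it.
LINE_PATTERN = re.compile(
    r"Bucket (?P<bucket>\d+) (?P<method>.+?)\. "
    r"Ram: (?P<ram>[\d.]+)GiB, "
    r"u_sort min: (?P<u_sort_min>[\d.]+)GiB, "
    r"qs min: (?P<qs_min>[\d.]+)GiB\."
    r"(?: force_qs: (?P<force_qs>\d))?"
)


def parse_sort_line(line):
    """Return the fields of one bucket sort line as a dict, or None if no match."""
    match = LINE_PATTERN.search(line)
    if match is None:
        return None
    return {
        "bucket": int(match["bucket"]),
        "method": match["method"],
        "ram_gib": float(match["ram"]),
        "u_sort_min_gib": float(match["u_sort_min"]),
        "qs_min_gib": float(match["qs_min"]),
        "force_qs": match["force_qs"] == "1",
    }


def classify(info):
    """Compare the available RAM against the two minimums described above."""
    if info["ram_gib"] < info["qs_min_gib"]:
        return "below qs min"
    if info["ram_gib"] >= info["u_sort_min_gib"]:
        return "uniform sort fits"
    return "QuickSort needed"


sample = "Bucket 0 uniform sort. Ram: 3.261GiB, u_sort min: 1.125GiB, qs min: 0.281GiB."
info = parse_sort_line(sample)
print(info["bucket"], info["method"], classify(info))
```

Running it over a whole plotter log and counting how often "QuickSort needed" shows up would be one way to judge whether a reduced RAM setting is costing you many uniform sorts.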
My system has 32GB of RAM. So, in order to run 12 plotters in parallel and keep the 4GB of free memory, I had to set my plotters to 2400 RAM. This means that the plotters must do QS on the buckets that require the most RAM. But this is OK because in the end, my TiB/day is higher with 12 @ 2400 RAM than with 8 @ 3408 RAM.
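The RAM budgeting above is simple arithmetic, sketched here in Python. It assumes the 32GB of installed RAM is really 32 GiB (32768 MiB) and treats the plotter RAM setting as the whole memory footprint of a plotter, which is a simplification. The function names are made up for the example.

```python
MIB_PER_GIB = 1024


def ram_per_plotter_mib(total_ram_gib, reserve_gib, plotters):
    """Largest per-plotter RAM setting (MiB) that keeps `reserve_gib` free."""
    usable_mib = (total_ram_gib - reserve_gib) * MIB_PER_GIB
    return usable_mib // plotters


def max_plotters(total_ram_gib, reserve_gib, ram_setting_mib):
    """How many plotters fit at a given RAM setting while keeping the reserve."""
    usable_mib = (total_ram_gib - reserve_gib) * MIB_PER_GIB
    return usable_mib // ram_setting_mib


print(ram_per_plotter_mib(32, 4, 12))  # 2389
print(max_plotters(32, 4, 3408))       # 8
```

Under these assumptions 12 plotters get about 2389 MiB each, which is in line with the roughly 2400 setting used above, and the 3408 setting tops out at 8 plotters.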
Finally, Temp Drive. The last of the trifecta of plotting. Each plotter requires 256GB (238GiB) of temp space. It used to be 356GB before version 1.0.4, so it is much improved. Speaking of which, I’m sure you have noticed the GiB notation. For the uninitiated, the difference is that a GiB is 1 073 741 824 bytes and a GB is 1 000 000 000 bytes. What makes this confusing is that Windows labels space as “GB” but actually uses GiB in the background, while hard drive manufacturers use pure GB notation. That is why a 12TB hard drive shows up as 10.9TB in Windows.
Back on topic: temp drive speed and temp drive interface type are both important. There are many types, but here is a boiled-down list.
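The two conversions quoted above can be checked with a couple of lines of Python. The helper names are made up for the example; the byte counts are the ones given in the text.

```python
BYTES_PER_GB = 1_000_000_000   # decimal gigabyte, as used by drive manufacturers
BYTES_PER_GIB = 1_073_741_824  # binary gibibyte (2**30), as used by Windows


def gb_to_gib(gb):
    """Convert decimal gigabytes to binary gibibytes."""
    return gb * BYTES_PER_GB / BYTES_PER_GIB


def tb_to_tib(tb):
    """Convert decimal terabytes to binary tebibytes."""
    return tb * 1000**4 / 1024**4


print(f"{gb_to_gib(256):.1f} GiB")  # 238.4 GiB of temp space per plotter
print(f"{tb_to_tib(12):.1f} TiB")   # 10.9, what Windows shows for a 12TB drive
```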
- Non-Volatile Memory Express (NVMe) – This is the best interface to plot with. Running on the PCI-Express bus, it has the highest speeds and typically the highest IOPS available. It is a solid state drive, which means that write endurance is something to consider. In my machine I’m using two Inland Premium 2TB drives (amazon affiliate link). I chose these drives based on recommendations from the community. Their write endurance is good (3200 Terabytes Written) and their price was also good (at the time, $220). They also do very well with multiple plotters using them for temp drive space (I have 6 plotters per 2TB NVMe at the moment).
- If you go with these, ensure each one has a heat sink attached. The plotters put a lot of stress on the drives, and you want running temps to stay below 60 degrees Celsius so that they reach their life expectancy.
- These drives use 4x PCI-E lanes each. Processors have a limited number of PCI-E lanes available. You need to ensure that you have enough PCI-E lanes for the number of drives you want to use.
- Also, some motherboards don’t have on-board M.2 slots for these drives. Know that there are PCI-E to M.2 cards that will allow you to run these on older motherboards.
- Finally, older motherboards may have multiple M.2 slots, but some of those slots may only run in SATA mode instead of NVMe mode. SATA mode reduces speed significantly, so please check your motherboard manual.
- Solid State Drives (SSD) – These are 2.5″ drives that run on the SATA bus. These will produce plots slower than their NVMe counterparts (with the exception of enterprise-grade SSDs) but still faster than a spinning-disk hard drive. I haven’t really tested how well they do with multiple plotters. The most I ran was two plotters on my SSD. Even then, those two plotters produced plots slower than the four plotters on my NVMe. Once again, look at the write endurance if you are planning on purchasing one.
- Hard Disk Drive (HDD) – These are the standard 3.5″ hard drives that have spinning disks inside. These are not great at plotting. They will get the job done, but you can only run one plotter at a time. Trying two plotters will slow things down significantly due to seek times. Some people have been successful using an external USB3 HDD as a temp drive, however. Definitely doable, but slow.
I have two of the Inland Premiums so in my system I have 6 plotters on each, spaced forty minutes apart from each other. With all of these settings and equipment, I roughly get 3.6TiB/Day (12 plots every 8 hours).
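As a sanity check on those numbers, here is the arithmetic as a short Python sketch. The final plot size of 101.4 GiB is my assumption for a standard k=32 plot and is not stated above; the 256GB of temp space per plotter is the figure from earlier in the post. The function names are made up for the example.

```python
PLOT_SIZE_GIB = 101.4  # assumed size of one finished k=32 plot


def tib_per_day(plots_per_cycle, cycle_hours, plot_size_gib=PLOT_SIZE_GIB):
    """Daily output in TiB when `plots_per_cycle` plots finish every `cycle_hours`."""
    plots_per_day = plots_per_cycle * 24 / cycle_hours
    return plots_per_day * plot_size_gib / 1024


def temp_space_needed_gb(plotters, temp_gb_per_plotter=256):
    """Worst-case temp space (GB) if every plotter peaks at the same time."""
    return plotters * temp_gb_per_plotter


print(f"{tib_per_day(12, 8):.1f} TiB/day")  # 3.6 TiB/day
print(temp_space_needed_gb(6), "GB")        # 1536 GB, fits on a 2TB (2000GB) drive
```

Because the plotters are staggered they rarely all peak together, so the 1536GB figure is a ceiling rather than what the drive usually holds.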
Here is another pro tip that has given me good results: use a staging drive as the final directory for the plotters. A staging drive is where all of the plotters put their finished plots, and I then have a robocopy loop script that moves them over the network to my farming machine. This avoids the backup that can build up when plotters write to a spinning hard drive directly. I use a 1TB consumer NVMe for this purpose. The plotters finish writing their plots to the staging drive in 2 minutes (as shown in the image showing plot speed) and proceed with the next plot, while robocopy takes the 15-20 minutes to transfer each plot over the network. Below is the batch script I use to do this. Just paste it into a new Notepad document, select “All Files” as the file type when saving, and save it as “plotrobocopy.bat”.
@echo off
:loop
set "source=D:\plot"
set "destination=\\<Your 2nd machine>\<your plot folder>"
robocopy "%source%" "%destination%" /mov *.plot
timeout /t 30
goto loop
Before running the batch file, connect to the folder you’re sharing on your second machine. Then replace the source path with your source path and the destination with your destination path. To run the script, just double click it to open. The script checks every 30 seconds for a new file to “move”. Once the copy is done, it will delete the plot from the source.
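If you are not on Windows, or just prefer Python, the same move loop can be sketched with the standard library. This is not what I run, and it is only a rough equivalent of the batch file: it moves every file matching `*.plot`, does no retrying, and the function names and paths are placeholders.

```python
import shutil
import time
from pathlib import Path


def move_finished_plots(source, destination):
    """Move every *.plot file from source to destination; return the names moved."""
    source = Path(source)
    destination = Path(destination)
    moved = []
    for plot in sorted(source.glob("*.plot")):
        # shutil.move copies and then deletes when crossing drives or shares.
        shutil.move(str(plot), str(destination / plot.name))
        moved.append(plot.name)
    return moved


def watch(source, destination, interval_seconds=30):
    """Check for new plots every `interval_seconds`, like the batch loop does."""
    while True:
        for name in move_finished_plots(source, destination):
            print(f"moved {name}")
        time.sleep(interval_seconds)
```

To use it, call `watch()` with your staging folder and the mounted network folder of your farming machine, and leave it running in a terminal.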
This should be a good starting point for optimizing your system. A tool that has helped me monitor CPU, RAM and temp directory usage is HWInfo. I downloaded the portable version, and when it boots up I open the sensors. This gives you the min and max for each sensor. The cool thing, however, is that you can right-click a sensor, choose “Show Graph”, and see a nice graph of that sensor. I don’t turn on any logging because that will use disk space. I monitor the graphs and tweak settings. Good luck to you all; optimizing in Chia is not for the faint of heart.
Edit: Below is the PowerShell script I use to kick off my plots:
For an explanation on how the script functions, visit my page here.