There is not much posted about how to actually get a high-performance yet cool 4-way machine to work. I’m working on this build at PCPartPicker. It is definitely a niche product, but here are the goals:

  • Reliability. If you are running two-week ML jobs, you do not want the thing to crash. That means adding ECC memory, and that means going to Xeon. It means lots and lots of clean power as well.
  • Performance. Well, with these things we are experimenting, but clearly these jobs take a long time. Saving 10% makes a nice snappy machine or gives you 4 fps in a game, but it would save you nearly a day and a half on a 14-day machine learning run. The keys are 128GB+ memory, m.2 NVMe disks, and at least 4 double-width slots, each with 16x PCI Express lanes.
  • Quiet and cool. These two goals compete a little bit, since a machine like this is going to be loud and also hot when all those GPUs are running. Having an efficient case and good quiet fans really matters.
  • Connectivity. It would be nice to have USB 3.1 with its 10Gbps and ideally Thunderbolt 3 (or at least Thunderbolt 2) with their 40 and 20Gbps respectively for drive arrays if the data sets get too big. And with big data sets, having dual 1Gbps Ethernet or ideally dual 10Gbps would be amazing.
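
To put the performance goal in numbers, here is a minimal sketch of the time-saved arithmetic. The function name `days_saved` is made up for illustration, and it assumes the speedup applies evenly across the whole run.

```python
def days_saved(run_days: float, speedup_fraction: float) -> float:
    """Days saved when a run that takes run_days finishes speedup_fraction sooner."""
    return run_days * speedup_fraction


# The 14-day run and 10% saving are the figures from the Performance goal.
saved = days_saved(14, 0.10)
print(f"Saved {saved:.1f} days, or about {saved * 24:.0f} hours")
```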

Heck you are spending $7,000 on a computer, but at least it isn’t a $120K box from nVidia!

Intel Xeon E5-1620 or E5-1650 v3 Processor

Well, there is only one family of processor that really works, the Haswell-E. These chips are overclockable and about the same price as their desktop variants, the i7-5xxx series. The value leader is the E5-1620 v3 for $300, running at 3.5GHz, which should overclock to 4GHz or even 4.4GHz with good cooling. Much harder to find is its bin-sorted sibling, the E5-1630 v3 at 3.7GHz, which might overclock better, at $400. Finally, the fastest chip is the hexacore E5-1650 v3 at $600.
These all give you ECC and 40 PCI Express lanes. You need every lane you can get with these builds. The only way to figure out if hexacore is needed (there is much folklore on this) is by some benchmarking. I suspect the E5-1620 v3 is good enough, but since most of the cost of the machine is the GPUs, the choice doesn’t make that much of a difference.
Note that we are using the older Haswell-EP (Efficient Performance) line, which does overclock. The next version is Broadwell-EP (aka v4), which is just about to be announced, but do not hold your breath: improvements are on the order of maybe 5% for the new chips. Gone are the days of rapid Intel improvement, and Broadwell was definitely sandwiched between its own delays and Skylake coming. This makes it a hard time to buy systems right now.

nVidia GTX-1080 Pascal GPUs

Well, there is a big tradeoff now. The GTX-1080 is coming May 28 but has just 8GB (did I just say that!) of VRAM, although it is much more power efficient and at least 2x as fast. If you must have 12GB, then you are stuck with the older-generation Titan X, so if you need more VRAM I’d say wait for the updated Titan X. The other nice thing is that these boards need less power, at 180 watts per board vs 250 watts per board for the older Maxwell architecture.

Kingston or Corsair RDIMM ECC 16GB x 4

With ECC, you do not get as fast overclocking, but you do get up to 128GB of memory if you pick the right motherboard. The two value leaders seem to be the Kingston 4×16 set and the Corsair 2×16 Registered ECC at DDR4-2133. No one seems to have tried overclocking these parts. Machine learning seems to want to buffer very large sets of images in main memory, so 64GB seems like the minimum. And you definitely want to install them in sets of four, as these are four-channel systems.
If you need more than that, then you need to go to a true server board that supports LRDIMMs, that is, Load-Reduced DIMMs. These are incredibly expensive, but you can go to 32GB modules, and with 8 slots that gives you 256GB of RAM?!

Samsung 950 Pro and SanDisk Extreme 960 SSDs

Samsung 950 Pro or XP951 512GB m.2 SSDs. What’s clear is that these are currently the big winners in the performance battle. They require 4x PCI Express, and you have to make sure that the motherboard doesn’t disable any slots when you put one in. But with NVMe, you get super-high-speed 2GBps performance plus a 10x reduction in latency for short IOs. These are the ideal boot and processing drives. Beware that some motherboards require a daughter card to use these, and we won’t have any open slots.
You want to arbitrage between the consumer 950 Pro and the XP951, which is an OEM version that can sometimes be $10 cheaper.
SanDisk Extreme 960 1TB SATA SSD. If you have large data sets, you need to put them on a backing drive, and although expensive at $300, this is the fastest SATA drive made, running easily at 600MBps.

eVGA SuperNova P2 1600 or SeaSonic 1250 PSU

There isn’t much choice at the top end, and these things are going to need lots of power, so at $320 for the eVGA or $180 for the SeaSonic it is a fortune. Start with Haswell-E (145 watts) plus 4x GTX-1080 (4×180 watts = 720 watts), so call it a 900 watt or so base. Most folks do not realize this, but power supplies age and provide less over time, so you need a 20% buffer, which means that 900 watt base needs at least 1100 watts. And when overclocking you draw more and probably need a little overvoltage, so call it 1200 watts minimum if you go to GTX-1080 or 1500 watts if you stay on Titan X.
Remember that power supplies are most efficient at 50% use, so you do want to size it right. There is also some difference between Gold (87% at 100% usage), Platinum (89%) and Titanium (90%), and if you plan to run this 24×7 you need at least Gold. eVGA helpfully makes this easy with a T2 ($380), P2 ($333) and G2 ($300), so you can see the price tradeoffs. As usual, the P2 is a decent price (save the planet!), although the G2 is a good budget choice, if you can call it that. For the lower 1200 watts, there is the Seasonic X 1200 Platinum for $230 or the SeaSonic Series X Gold for $180.
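
The sizing arithmetic above can be sketched roughly as follows. This is only a back-of-the-envelope estimate using the wattages quoted in this post. It counts the CPU and GPUs only, so the motherboard, drives, fans and any overclocking come on top, which is why the figures in the text are rounded up. The function name `psu_estimate` is made up for illustration.

```python
CPU_WATTS = 145                                  # Haswell-E TDP quoted above
GPU_WATTS = {"GTX-1080": 180, "Titan X": 250}    # per-board figures quoted above
GPU_COUNT = 4
AGING_BUFFER = 0.20                              # 20% headroom for an aging supply


def psu_estimate(gpu_model):
    """Return (base_watts, watts_with_buffer) for the CPU plus GPUs only."""
    base = CPU_WATTS + GPU_COUNT * GPU_WATTS[gpu_model]
    return base, round(base * (1 + AGING_BUFFER))


for model in GPU_WATTS:
    base, sized = psu_estimate(model)
    print(f"{model}: base {base} W, with 20% buffer {sized} W")
```

Both results land below the 1200 and 1500 watt recommendations, which leaves room for the rest of the system and for overclocking.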

Scythe Kotetsu cooler

For a build like this with Haswell-E, power is a real issue. With Skylake, the lower TDP of 95 watts makes it easy, but 145 watts is quite a bit more. At SilentPCReview, this is currently the air-cooled winner. It cools better than my favorite Noctua by 2-4C (which is quite a bit) and is just as whisper quiet. The best coolers are pretty much unavailable: the Prolimatech Genesis is really hard to find, and the Thermalright Silver Arrow seems to be obsolete. So the next one on the list is the Scythe Kotetsu. I’ll report on how this does vs the trusty Noctuas we’ve been using.

Fractal Design Define XL R2

While we have gotten a really big case, the fact is that you can usually reuse cases, as they have not changed in years. But if you want a new one that is quiet, silentpcreview.com seems like the best source. In our case we want lots of cooling, so the so-called gaming cases seem the best. We are giving up some noise for good airflow, but here are some other good quiet choices:

  • Fractal Design Define XL R2 (73 liters, eATX). This is a huge case but has eATX support and is half the price of the Raven RV01. The Arc XL is a more expensive stablemate.
  • Silverstone Raven RV01 (eATX). This is a huge case, but it does support eATX and the Thermalright Silver Arrow, so it’s really the choice here. This is pretty much required for 4-way SLI.
  • Fractal Design Define R5 (55 liters, ATX). It is a SilentPC Review editor’s choice. The R5 is a monster case, and the S model doesn’t have the front drive bays. The Define S is $30 cheaper, so it is good if you want to put a lot of water coolers in, do your own modding or put a water pump in it. It is larger and will fit the Thermalright but not an eATX board.
  • Silverstone Raven RV05 (64 liters, ATX). This uses a rotated motherboard and is very cool. The upgrade is an aluminum wraparound version called the FT05. It doesn’t accept eATX but does take ATX, and in most cases the extra slot of the eATX is wasted anyway with 4 dual-slot GPUs. This is a smaller case and it has height limits, so the Thermalright Silver Arrow will not fit.

Since we are at it, we might as well refresh our microATX and mini-ITX case recommendations, which can fit full-size 305mm (12 inch) graphics cards for mini machine learning setups:

  • Lian Li PC-Q18 (12 liters, mITX, $120). This allows a full card slot, unlike the Antec small boxes I’ve been using, so you can put a single graphics card in.
  • Silverstone TJ08-E (28 liters, mATX, $110). This replaces the quite similar SG-10B that we have been using successfully and uses a rotated motherboard for improved cooling.

ASRock X99 WS-E or ASUS X99-E WS/USB3.1 Motherboard?

I spent the most time researching this choice. You want to get four true slots with 16x lanes available. This pushes you to find motherboards with the PLX PCI Express switch chips. Note that with 4 cards in, you cannot get any additional small PCI Express cards in. Those motherboards with m.2 accessory cards, for instance, do not work.
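
To see why the switch chips matter, here is a rough lane budget, a sketch using the 40-lane CPU figure from the processor section and assuming one 4x NVMe drive. The names are made up for illustration, and it does not model how a switch actually shares upstream lanes between slots. It only shows the gap between what the slots ask for and what the CPU provides.

```python
CPU_LANES = 40        # PCI Express lanes on the Xeons discussed above
GPU_COUNT = 4
LANES_PER_GPU = 16    # a full-width 16x slot per card
NVME_LANES = 4        # assumption: one m.2 NVMe drive at 4x


def lanes_wanted(gpu_count, lanes_per_gpu, nvme_lanes):
    """Total lanes the GPUs and the NVMe drive would use at full width."""
    return gpu_count * lanes_per_gpu + nvme_lanes


wanted = lanes_wanted(GPU_COUNT, LANES_PER_GPU, NVME_LANES)
shortfall = wanted - CPU_LANES
print(f"Wanted {wanted} lanes, CPU provides {CPU_LANES}, shortfall {shortfall}")
```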
There were not a lot of ideal candidates and availability is not good. I wonder if this is because there is going to be a model switchover with new Skylake Xeon parts coming. These will not be overclockable and require the C series of server parts. So in order of recommendations:

  • ASRock X99 WS-E ($500 at Newegg with 4 stars but out of stock, $567 from Amazon, $443 at SuperBiiz). It does look like the WS-E is hard to get and is only at SuperBiiz, if you dare. It uses two PLX PEX 8747 chips to give you 4×16 PCI Express, and there are two M.2 slots that disable SATA ports. The only things missing are USB 3.1 and Thunderbolt 3. And the reviews are not super, but at least not as bad as those for the ASUS X99-E WS.
  • ASUS X99-E WS/USB3.1 ($494 SuperBiiz, $558 Amazon). This is a variant that adds USB 3.1 but is only available from third parties for $800 from Amazon?! Although B&H shows it is available for $530 this coming week.
  • ASUS X99-E WS ($600). There are quite a few flavors of this workstation-oriented board, but the one on the site has dual gigabit Ethernet and dual PLX switches to give a full 4×16 in the GPU slots. This is what nVidia uses on their dedicated machine learning build. It has terrible reviews on Newegg for reliability, though, and it is currently out of stock everywhere, so it may be obsolete. Too bad! All X99 and Z97 boards from ASUS now support NVMe, and it does have a 4x PCIe m.2 slot.
  • ASRock X99 Extreme11 ($553 at Newegg with three stars after $40 rebate). Like the ASUS X99-E WS, this has two PLX 8747 chips, but unlike that model, it uses the last 8 lanes to go directly into an 8-drive SAS system. It is in the ATX form factor, but be sure to get a case that fits EATX, as the last slot will have a graphics card that overhangs the motherboard. It also has two m.2 PCI Express x4 slots (these disable various SATA ports, which is fine for this build as we do not need lots of drives) and dual gigabit Ethernet adapters. It is also one of the few boards with onboard SAS drive support, via the LSI 3008 chip, so you can put in enterprise-class drives. The main issue is the number of DOA and other problems with it. It draws nearly 250 watts itself.
  • ASRock X99 WS-E/10G has a single M.2, as 8 lanes are used for the two 10Gbps controllers. It’s not clear if you need 10Gbps, but it’s pretty cheap to have on this board given that a single NIC is right now $250 and takes up 4x and a full slot. Also remember that with 10Gbps (so total possible bandwidth is 2GBps!) you rapidly hit memory limits, as a RAM to RAM transfer peaks at about 2.5GBps. Put another way, you need lots of multiple streams to fill up 10Gbps, so it is mainly good for servers.
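
Since link speeds are quoted in gigabits per second while drives and memory are quoted in gigabytes per second, a quick conversion helps when comparing them. This is a minimal sketch using the nominal link rates mentioned in this post. The results are raw line rates before any protocol overhead, so real throughput will be lower.

```python
BITS_PER_BYTE = 8


def gbps_to_gbytes_per_sec(gbps: float) -> float:
    """Convert a raw link rate in gigabits/s to gigabytes/s."""
    return gbps / BITS_PER_BYTE


# Nominal link rates in Gbps, as quoted in this post.
links = {
    "1Gbps Ethernet": 1,
    "10Gbps Ethernet": 10,
    "dual 10Gbps Ethernet": 20,
    "USB 3.1": 10,
    "Thunderbolt 2": 20,
    "Thunderbolt 3": 40,
}

for name, gbps in links.items():
    print(f"{name}: {gbps} Gbps is {gbps_to_gbytes_per_sec(gbps):.3f} GBps raw")
```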

Monitor

I’ve loved the Philips BDM40 as a 40 inch monitor with VA panel and most importantly true 60 Hertz plus full 4:4:4 color at $700. Now there is the 43 inch version with an IPS panel, so need to see if that is better. Lists for $800 so worth it to get a little more screen and a better panel.
 

I’m Rich & Co.
