Building a Machine Learning machine

Well, if this summer you need some heavy iron to train models, here are some tips. The main issue is that both AMD and Intel are in a transition, so availability of the really cool hardware is delayed.


So you have a few choices. One is to stick with a last-generation CPU and pair it with the latest-generation GPUs, which is where the main compute lives. That means:

  • X99 motherboard. You want one with extra PCIe switches so that each graphics card sees a full 16 lanes on the way to RAM for loading images. The ASUS workstation board does this, and we’ve used it successfully.
  • Broadwell-E processor. If you get the Xeon E5-1650 v3, you can even overclock it to get about 30% more peak performance. However, if you need a lot more threads running, then you need more cores. For instance, the 22-core E5-2699 v4 runs at 2.2GHz vs. the 4.3GHz you get from overclocking the chip above, so in theory, if you have relatively few threads, you should go with the $600 chip rather than the $3k one.
  • Nvidia Pascal cards. This is the big change. Depending on how much VRAM you need, that’s either the Titan or the 1080 Ti.
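
To make the cores-versus-clock tradeoff above a bit more concrete, here is a rough back-of-envelope sketch in Python. It assumes throughput scales linearly with clock speed and with the number of busy cores, with one thread per core, and it ignores hyperthreading, turbo and memory bandwidth, so treat it as a sanity check rather than a benchmark. It uses the clock speeds quoted above and spec-sheet core counts (6 for the E5-1650 v3, 22 for the E5-2699 v4). The function names are made up for illustration.

```python
def aggregate_ghz(threads, cores, clock_ghz):
    """Rough throughput proxy: number of busy cores times clock speed.

    Assumes linear scaling and one thread per core, which real
    training pipelines will not hit exactly.
    """
    busy_cores = min(threads, cores)
    return busy_cores * clock_ghz


# (cores, clock in GHz); clocks are the ones quoted in the text above.
OVERCLOCKED_6_CORE = (6, 4.3)   # Xeon E5-1650 v3, overclocked
BIG_22_CORE = (22, 2.2)         # Xeon E5-2699 v4, stock


def crossover_threads():
    """Smallest thread count at which the 22-core part pulls ahead."""
    for threads in range(1, BIG_22_CORE[0] + 1):
        small = aggregate_ghz(threads, *OVERCLOCKED_6_CORE)
        big = aggregate_ghz(threads, *BIG_22_CORE)
        if big > small:
            return threads
    return None


if __name__ == "__main__":
    for threads in (4, 8, 12, 16, 22):
        small = aggregate_ghz(threads, *OVERCLOCKED_6_CORE)
        big = aggregate_ghz(threads, *BIG_22_CORE)
        print(f"{threads:2d} threads: 6-core {small:5.1f} GHz, 22-core {big:5.1f} GHz")
    print("22-core wins from", crossover_threads(), "threads")
```

Under these simplifying assumptions the cheaper overclocked chip stays ahead until you can keep about a dozen threads busy, which is the point where the extra cores start to pay for themselves.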

If you can hold off a little bit, then it makes sense to use the upcoming Skylake X chips, which will have many more lanes and cores:

  • X299 Motherboard (yes they skipped the X199 for some reason). This has Optane support for hybrid hard drives.
  • Skylake X. These are coming in the second half of the year and will be monsters with lots of cores (way past the 4 cores in the current Kaby Lake X). You can even overclock some of them. You will need to spend at least $1k to get 44 PCIe lanes, and the 10-core monster is $1k.

Finally, there is AMD with Ryzen, which has more cores at lower clock speeds and lower prices:

  • Ryzen 7 1800X. It has 8 cores for about $500, versus 4 cores on the Kaby Lake version. Also, all of them support ECC, and with this much memory you need ECC.
  • Threadripper. Ships later this month with 16 cores and 60 PCI Express lanes. Not sure of the price, but perhaps $1k.
  • EPYC. This is 32 cores and 128 PCIe lanes, so perfect for big multithreaded jobs.
  • X399 motherboard for Threadripper.
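
The lane counts quoted above (44, 60 and 128) translate fairly directly into how many GPUs can each get a full x16 link. Here is a small Python sketch of that arithmetic. It assumes 4 lanes are set aside for one NVMe drive, which is my assumption and not something from the parts lists, and it ignores PCIe switches on the motherboard, which can change the picture. The function name is hypothetical.

```python
LANES_PER_GPU = 16   # a full-width x16 slot
RESERVED_LANES = 4   # assumption: one x4 NVMe SSD for the dataset

# CPU lane counts as quoted in the text above.
PLATFORM_LANES = {
    "Skylake X ($1k tier)": 44,
    "Threadripper": 60,
    "EPYC": 128,
}


def full_speed_gpus(total_lanes, reserved=RESERVED_LANES, per_gpu=LANES_PER_GPU):
    """How many GPUs can each get a full x16 link from the CPU lanes."""
    usable = max(total_lanes - reserved, 0)
    return usable // per_gpu


if __name__ == "__main__":
    for name, lanes in PLATFORM_LANES.items():
        print(f"{name}: {lanes} lanes -> {full_speed_gpus(lanes)} GPUs at x16")
```

With these numbers the $1k Skylake X tier feeds two cards at full width, Threadripper three and EPYC seven. Dropping cards to x8 roughly doubles those counts, at the cost of slower transfers from RAM.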

Net net, some tough tradeoffs. The main advice is to hold off until July if you can, to see these new parts. Limp by on X99 if you must. But it will be interesting to see what works better: an EPYC vs. Kaby Lake X, or Haswell-EP vs. Ryzen 7.