Back on Christmas Eve I struggled to get this board to do much of anything. When it did boot, it’d bluescreen Windows with a useless ‘IRQL_NOT_LESS_OR_EQUAL’ error. I took it that the board was crap, and just shelved it.
Today, however, I’m working on another project and I need to emulate a ‘datacentre’ deployment, so I’m stuck looking for machines with RAM and, of course, cores. They can be slow, I don’t care, but I need to run a TONNE of VMs. I need egress, ingress, routers, policing, domains, email, servers, various databases, some container infrastructure to provision some other apps, and all kinds of crap. I’d planned on maxing this board out to 256GB or more, but it wasn’t playing nice. And now that it’s like the end of the world out here, getting more RAM from China really isn’t going to happen.
So this time I tried something different.
I have this Dell R710 server with a bunch of memory, but its CPUs are frankly lacking. I took the 32GB I had reserved for the Jingsha and swapped it with 64GB from the Dell, which went into this board. I also picked up a massive GAMEMAX GM-1650 power supply, which I figured would be more than enough for the dual processor board. The PSU clearly was built for a miner, and I had to use some really lame Y cable for the CPU power. I wasn’t sure what would happen, or if any of it would even work. My expectations were pretty low, and the first few times the board didn’t appear to do anything at all.
Then suddenly it beeped!
I shoved in a Windows install USB, and it actually booted this time!
I bought some more of those ALSEYE 120mm CPU radiators, as they couple with socket 2011 just fine.
Although finding an E-ATX case that isn’t some ‘bling surprise’ is kind of difficult. It’s annoying all these ‘glass window’ cases, and other nonsense. What was wrong with IBM AT BEIGE?!
The lack of a boot diagnostic LED display really hurts this board. Clearly there was something about the memory I had it doesn’t like. It ran fine in the Huananzhi dual x79 board, and it runs fine in the Dell r710.
So yeah. Now I have 3 machines with 64GB of RAM, and one with 96. It’d be easier to just order more, but here we are.
So 6 weeks later the Jingsha finally did something useful.
One thing to mention as well: as others have reported, it’s not uncommon for this board to freeze, reboot, and do other fun things while running Linux. Meanwhile Windows is fine. What is going on here?!
I enabled NUMA thinking if something was going wrong maybe it’d be isolated to a single NUMA node, and not take down the entire machine. I’m not convinced I was right, HOWEVER I did capture this error message!
Feb 19 07:31:27 rancher kernel: [ 5.820094] mce: [Hardware Error]: CPU 1: Machine Check: 0 Bank 7: cc00008000010093
Feb 19 07:31:27 rancher kernel: [ 5.821328] mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 7: cc00008000010093
Feb 19 07:31:27 rancher kernel: [ 6.812122] mce: [Hardware Error]: CPU 1: Machine Check: 0 Bank 7: cc0069c000010093
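For what it’s worth, those hex values can be picked apart by hand. Below is a rough sketch that assumes the value after ‘Bank 7:’ is the raw machine check status register, laid out the way Intel’s developer manual describes the architectural fields. The function name and dictionary keys are my own invention, the corrected-error count is only meaningful if the CPU actually reports it, and I’m not decoding the model-specific bits or guessing which physical slot a channel number maps to on this board.

```python
# Hypothetical helper: decode the architectural fields of an MCi_STATUS value.
# Field positions follow the generic Intel layout; model-specific bits are left raw.

MEM_TRANSACTIONS = {
    0: "generic undefined request",
    1: "memory read",
    2: "memory write",
    3: "address/command",
    4: "memory scrubbing",
}


def decode_mci_status(status: int) -> dict:
    mca_code = status & 0xFFFF
    info = {
        "valid": bool((status >> 63) & 1),
        "overflow": bool((status >> 62) & 1),       # more errors than were logged
        "uncorrected": bool((status >> 61) & 1),
        "misc_valid": bool((status >> 59) & 1),
        "addr_valid": bool((status >> 58) & 1),
        "context_corrupt": bool((status >> 57) & 1),
        "corrected_count": (status >> 38) & 0x7FFF,  # bits 52:38, if supported
        "model_specific_code": (status >> 16) & 0xFFFF,
        "mca_code": mca_code,
    }
    # Memory controller errors use the pattern 0000 0000 1MMM CCCC.
    if mca_code & 0xFF80 == 0x0080:
        mmm = (mca_code >> 4) & 0x7
        info["mem_transaction"] = MEM_TRANSACTIONS.get(mmm, "reserved")
        channel = mca_code & 0xF
        info["mem_channel"] = None if channel == 0xF else channel
    return info


if __name__ == "__main__":
    for raw in (0xCC00008000010093, 0xCC0069C000010093):
        print(hex(raw), decode_mci_status(raw))
```

Read this way, both values come out as corrected (not fatal) memory read errors with the overflow bit set, and the corrected-error counter climbs sharply between the first and last log line, which at least fits the theory that the RAM is being pushed harder than it likes.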
Well, that’s not good. What is more interesting is that I entered the BIOS, after hammering F2/DEL like a typical end user, and found this in the memory settings:
As you can see, the current memory speed is 1333 MHz.
However, as you can see, I’m using Hynix 8GB 2Rx4 PC3-10600R-9-11-E2 memory sticks, which means they should be clocked down to 1066 MHz. It’s probably a bit premature to write this, but I’m 30 minutes up, which is a record running Linux on the Jingsha.
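As an aside, the label on those sticks packs a fair bit of information. Here is a small sketch that pulls out the parts I’m sure of: the rank count, the chip width, the PC3 bandwidth figure, and the registered flag. The function name and the list of speed grades are my own choices for illustration, and it deliberately ignores the trailing ‘-9-11-E2’ part. Note it only gives the rated speed of the module; what the board should actually run it at depends on how the slots are populated.

```python
import re

# Standard DDR3 speed grades in MT/s (assumed list, used only for rounding).
DDR3_GRADES = [800, 1066, 1333, 1600, 1866, 2133]


def parse_ddr3_label(label: str) -> dict:
    """Hypothetical helper: extract the basic fields from a DDR3 DIMM label."""
    m = re.search(r"(\d)Rx(\d+)\s+PC3L?-(\d+)([A-Z])", label, re.IGNORECASE)
    if m is None:
        raise ValueError(f"unrecognised label: {label!r}")
    ranks, width, bandwidth, kind = m.groups()
    # A DIMM has a 64-bit (8 byte) data bus, so MB/s divided by 8 gives
    # MT/s, which is then snapped to the nearest standard grade.
    approx = int(bandwidth) / 8
    rated = min(DDR3_GRADES, key=lambda g: abs(g - approx))
    return {
        "ranks": int(ranks),
        "chip_width_bits": int(width),
        "bandwidth_mb_s": int(bandwidth),
        "rated_mt_s": rated,
        "registered": kind.upper() == "R",
    }


if __name__ == "__main__":
    print(parse_ddr3_label("Hynix 8GB 2Rx4 PC3-10600R-9-11-E2"))
```

So these are dual-rank, registered sticks rated for 1333, which is what the BIOS was running them at before I went poking.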
It feels like RAM is the new Lupus. It’s never Lupus, but it’s always Lupus.