Hi All,
We have a nice new server in pre-prod that we want to use in prod as a new Hyper-V host. The only problem is that it is not stable.
Server specs:
Dell PowerEdge R730XD
Firmware and drivers are 100% up to date. Dell support confirms there are no hardware problems after running offline hardware diagnostics.
The server is running Server 2016 Standard with Desktop Experience (10.0.14393, build 14393). Installed roles and features: Hyper-V, Windows Server Backup, SNMP. (Yes, I know we should be running Core for Hyper-V. However, the customer loathes Core, having had nothing but problems for 5 years trying to manage an old Server 2008 R2 Core install via PowerShell, so we are under specific instructions to run Desktop Experience so they can manage it more easily.)
The OS is running on the S130 BIOS RAID controller. The server has two additional PERC H730 RAID controllers for the data storage over a split backplane. The OS is on RAID1 across two SSDs in the rear drive bays of the server. The Windows S130 driver version is 4.2.0.12.
The issue is this:
The server runs fine for a few days, and then when we go to reboot it, it appears to start shutting down but never actually does. It remains pingable, but RDP doesn't work and the directly connected monitor is blank. The keyboard and mouse lights (such as Num Lock) go on and off, but there is no mouse pointer. The iDRAC virtual console says it cannot connect to the host. Eventually we resort to a cold boot by holding in the power button; at that point the graphics briefly pop up on the monitor and then the server shuts down.
The most recent incident occurred after the server had been running for 5 days and Server Manager was used to install Windows Server Backup. After the installation we went to reboot and it hung. It has also rebooted randomly in the past.
The Windows logs don't show much. There is a VSS error 12889: "Volume Shadow Copy Service error: Unexpected error while calling GetStorageDependencyInformation. hr = 0x80070002, The system cannot find the resource specified."
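For anyone less familiar with the VSS error code: 0x80070002 is an HRESULT that wraps a plain Win32 error. The Python sketch below is my own illustration (not part of any Windows tooling) that splits it using the standard HRESULT bit layout: severity in bit 31, facility in bits 16-26, code in bits 0-15. The two lookup tables are deliberately tiny and only cover the values relevant here.

```python
# Deliberately minimal lookups (assumption: only the values needed for this
# one error are listed; this is not a complete table).
FACILITY_NAMES = {7: "FACILITY_WIN32"}
WIN32_ERRORS = {2: "ERROR_FILE_NOT_FOUND"}


def decode_hresult(hr: int) -> dict:
    """Split a 32-bit HRESULT into its severity, facility and code fields."""
    hr &= 0xFFFFFFFF               # treat the value as an unsigned 32-bit integer
    severity = (hr >> 31) & 0x1    # bit 31: 1 means failure
    facility = (hr >> 16) & 0x7FF  # bits 16-26: facility
    code = hr & 0xFFFF             # bits 0-15: facility-specific code
    return {
        "severity": severity,
        "facility": facility,
        "facility_name": FACILITY_NAMES.get(facility, "unknown"),
        "code": code,
        # Only treat the code as a Win32 error when the facility says it is one.
        "code_name": WIN32_ERRORS.get(code, "unknown") if facility == 7 else "unknown",
    }


if __name__ == "__main__":
    print(decode_hresult(0x80070002))
```

For 0x80070002 this reports a failure with facility 7 (FACILITY_WIN32) and code 2 (ERROR_FILE_NOT_FOUND), i.e. whatever GetStorageDependencyInformation was asked to look at could not be found. That narrows down what VSS tripped over, but on its own it doesn't explain the hang.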
There are also a few timeout errors in the System log for services such as hidserv, NcbService, NlaSvc and ScDeviceEnum, all with event ID 7011.
There are only two other issues I can see, neither of which I believe is related to my problem. First, the Broadcom Gigabit Ethernet drivers are only available as version 17 for Server 2016, but as version 20 for Server 2012.
Second, the Intel SSDs that the OS runs on report a manufacture date of 2006, which I think is just a typo and should be 2016.
Finally, I note this update is available but has to be downloaded and installed manually. Its list of fixes says it fixes an issue with 'engaged reboot', whatever that is. I should probably just go and install it.
support.microsoft.com/.../windows-10-update-kb3216755
Does anyone have any suggestions, or a similar experience?
Thanks,
PS - I have a similar post on the Microsoft TechNet forums.