=== himcesjf_ is now known as him-cesjf
[15:40] Tried to use the 4.4.0-116 kernel on the DELL T3610 with the A16 BIOS and latest microcode installed, and it also hangs on the first attempt to log in to the GUI
[15:40] uname -r
[15:40] 4.4.0-116-generic
[15:40] dpkg -l | grep microcode
[15:40] ii  intel-microcode                             3.20180312.0~ubuntu16.04.1
[15:58] dijuremo: do these various PCs share a common GPU maker?
[15:59] dijuremo: or, put another way, is this freeze GUI specific or is there a way to reproduce the freeze without the GUI starting, via console or SSH for example?
[16:07] TJ: With the laptops, the GPU was the Intel one that is part of the CPU. The freeze is usually triggered on some occasions after X starts, on others after logging in (during the login itself) or after logging out.
[16:07] TJ: Most of the workstation machines have either a Geforce or Quadro card.
[16:08] TJ: Most of our Geforce series are eVGA branded, no idea who makes the Quadro cards for DELL.
[16:13] dijuremo: if we could reduce the proof to something simple without GUI it'd be much easier to track down the cause, over SSH would even discount it being console/video related
[16:14] TJ: So I do not know how to trigger the freeze from a console. Any suggestions on what to run? We started noticing it when running X-related things, i.e. lightdm; sometimes we can even survive a login, but the machine freezes after logging out when lightdm restarts.
[16:15] TJ: We first noticed the issues when we applied updates; the machines would boot up and freeze at the lightdm login screen
[16:16] # uptime
[16:16] 12:15:41 up 54 min,  1 user,  load average: 0.00, 0.00, 0.00
[16:16] So the machine has been up for that long at the login screen without freezing.
[16:16] I am logged in to it via SSH
[16:17] dijuremo: maybe get it to do a lot of I/O? "sudo find / >/dev/null"
[16:20] It completed without freezing
[16:22] dijuremo: trouble is we don't know if that is a reasonable way to provoke the issue or not :s
[16:22] So I am running prime95, started 12:22
[16:23] top - 12:23:11 up  1:02,  2 users,  load average: 9.58, 3.42, 1.25
[16:23] That should use lots of CPU and memory
[16:24] if those are OK, it's worth trying the same commands from a direct console. If there's an issue with the GPU side that might trigger it
[16:24] I gotta go to a meeting so will let prime95 torture the hardware for 1.5 hours... if it has not crashed, then we can confirm it is related to GPU/GUI
[16:25] dijuremo: good plan
[16:27] For the record, we are currently on the 4.4.0-116 kernel
[17:57] Still running...
[17:57] top - 12:23:11 up  1:02,  2 users,  load average: 9.58, 3.42, 1.25
[17:59] I guess I should try and get lightdm out of the equation, perhaps log in on one of the virtual terminals and run startx and see if that freezes the machine...
[18:03] Oh well, the machine froze. I had two ssh sessions going into it, one running prime95, the other top. I quit top, exited the ssh session and then the machine froze
[18:08] dijuremo: with no console activity?
[18:13] prime95 was running as I disconnected my ssh session and then it froze
[18:14] I had two separate ssh sessions
[18:14] I did not get to try logging in at the console and running startx yet.
[18:15] This was as I was going to do that: I disconnected one ssh session, was going to stop prime95 and then go directly to the machine to try logging in and running startx.
[18:17] Now I rebooted into 4.4.13, ssh'd in on shell 1, started prime95, then on shell 2 I tried to ssh in, saw the motd and now it is frozen.
[18:17] So it is more prone to freezing with 4.4.13 than it is with 4.4.0
[18:18] 4.4.13 or 4.13 ?
[18:38] Sorry, typo: more prone to crash with 4.13.0- than 4.4.0-x
[18:40] dijuremo: can you monitor temperatures?
I'm wondering if this freeze is actually an overheat
[18:41] I usually do that with gkrellm... not so sure how to do it from the command line... will have to look
[18:44] Package id 0:  +69.0°C  (high = +85.0°C, crit = +95.0°C)
[18:45] And it just hung again....
[18:45] Package id 0:  +69.0°C  (high = +85.0°C, crit = +95.0°C)
[18:45] Core 0:        +64.0°C  (high = +85.0°C, crit = +95.0°C)
[18:45] Core 1:        +67.0°C  (high = +85.0°C, crit = +95.0°C)
[18:45] Core 2:        +65.0°C  (high = +85.0°C, crit = +95.0°C)
[18:45] Core 3:        +67.0°C  (high = +85.0°C, crit = +95.0°C)
[18:45] Core 4:        +69.0°C  (high = +85.0°C, crit = +95.0°C)
[18:45] Core 5:        +67.0°C  (high = +85.0°C, crit = +95.0°C)
[18:45] I do not think it is overheating... well. This is it for today, gotta run to a meeting... :)
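
On the question of watching temperatures from the command line: a rough sketch of one way to log them so that the last reading before a hard freeze survives on disk. This is not something run in the session above. It assumes the CPU sensor driver exposes `temp*_input` files (in millidegrees Celsius) under `/sys/class/hwmon`, which is the usual layout on Linux but should be checked on the machine in question. The function names `read_temps` and `log_temps` and the log path are hypothetical.

```shell
#!/bin/sh
# Sketch only: assumes temp*_input files (millidegrees C) exist under
# /sys/class/hwmon; verify the layout on the machine being debugged.

HWMON_ROOT="${HWMON_ROOT:-/sys/class/hwmon}"   # overridable, e.g. for testing

# Print one "label: NN.N C" line per temperature sensor found under $1.
read_temps() {
    root="$1"
    for input in "$root"/hwmon*/temp*_input; do
        # Skip the unexpanded pattern when nothing matches.
        [ -r "$input" ] || continue
        label_file="${input%_input}_label"
        if [ -r "$label_file" ]; then
            label=$(cat "$label_file")
        else
            # No label file: fall back to the sensor name, e.g. "temp2".
            label=$(basename "$input" _input)
        fi
        milli=$(cat "$input")
        # Integer arithmetic only: 69000 -> 69.0 (fraction truncated to one digit).
        printf '%s: %d.%d C\n' "$label" $(( $milli / 1000 )) $(( ($milli % 1000) / 100 ))
    done
}

# Append a timestamped snapshot to $1 every $2 seconds (default 2),
# calling sync after each write so the data reaches disk before a freeze.
log_temps() {
    logfile="$1"
    interval="${2:-2}"
    while :; do
        { date '+%H:%M:%S'; read_temps "$HWMON_ROOT"; } >> "$logfile"
        sync
        sleep "$interval"
    done
}
```

One possible use: source the file in an SSH session, start `log_temps /var/tmp/temps.log 2 &` before launching prime95, and after the next freeze and reboot inspect the tail of the log. If the final readings are still well below the "high" threshold, as in the readings pasted above, overheating becomes a less likely explanation.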