=== himcesjf_ is now known as him-cesjf | ||
dijuremo | Tried to use 4.4.0-116 kernel on the DELL T3610 with A16 BIOS and latest microcode installed and it also hangs on first attempt to log in to GUI | 15:40 |
---|---|---|
dijuremo | uname -r | 15:40 |
dijuremo | 4.4.0-116-generic | 15:40 |
dijuremo | dpkg -l | grep microcode | 15:40 |
dijuremo | ii intel-microcode 3.20180312.0~ubuntu16.04.1 | 15:40 |
TJ- | dijuremo: do these various PCs share a common GPU maker? | 15:58 |
TJ- | dijuremo: or, put another way, is this freeze GUI specific or is there a way to reproduce the freeze without the GUI starting, via console or SSH for example? | 15:59 |
dijuremo | TJ: With the laptops, the GPU was the Intel one that is part of the CPU. The freeze is usually triggered on some occasions after X starts. On others, after logging in (during the login itself) or after logging out. | 16:07 |
dijuremo | TJ: Most of the workstation machines have either a Geforce or Quadro card. | 16:07 |
dijuremo | TJ: Most of our Geforce series are eVGA branded, no idea who makes the Quadro cards for DELL. | 16:08 |
TJ- | dijuremo: if we could reduce the proof to something simple without the GUI it'd be much easier to track down the cause; reproducing it over SSH would even rule out it being console/video related | 16:13 |
dijuremo | TJ: So I do not know how to trigger the freeze from a console. Any suggestions on what to run? We started noticing it when running X related things, i.e. lightdm, or sometimes we can even survive a login, but the machine freezes after logging out when lightdm restarts. | 16:14 |
dijuremo | TJ: We first noticed the issues when we applied updates: the machines would boot up and freeze at the lightdm login screen | 16:15 |
dijuremo | # uptime | 16:16 |
dijuremo | 12:15:41 up 54 min, 1 user, load average: 0.00, 0.00, 0.00 | 16:16 |
dijuremo | So the machine has been up for that long at the login screen without freezing. | 16:16 |
dijuremo | I am logged in to it via SSH | 16:16 |
TJ- | dijuremo: maybe get it to do a lot of I/O? "sudo find / >/dev/null" | 16:17 |
dijuremo | It completed without freezing | 16:20 |
TJ- | dijuremo: trouble is we don't know if that is a reasonable way to provoke the issue or not :s | 16:22 |
dijuremo | So I am running prime95, started 12:22 | 16:22 |
dijuremo | top - 12:23:11 up 1:02, 2 users, load average: 9.58, 3.42, 1.25 | 16:23 |
dijuremo | That should use lots of CPU and memory | 16:23 |
TJ- | if those are OK, it's worth trying the same commands from a direct console. If there's an issue with the GPU side that might trigger it | 16:24 |
dijuremo | I gotta go to a meeting so will let prime95 torture the hardware for 1.5 hours... if it has not crashed, then we can confirm it is related to GPU/GUI | 16:24 |
TJ- | dijuremo: good plan | 16:25 |
dijuremo | For the record, we are currently on 4.4.0-116 kernel | 16:27 |
dijuremo | Still running... | 17:57 |
dijuremo | top - 12:23:11 up 1:02, 2 users, load average: 9.58, 3.42, 1.25 | 17:57 |
dijuremo | I guess I should try to get lightdm out of the equation, perhaps log in on one of the virtual terminals, run startx and see if that freezes the machine... | 17:59 |
dijuremo | Oh well, machine froze. I had two ssh sessions going into it, one running prime95, the other top. I quit top, exited the ssh session and then the machine froze | 18:03 |
TJ- | dijuremo: with no console activity? | 18:08 |
dijuremo | prime95 was running as I disconnected my ssh session and then it froze | 18:13 |
dijuremo | I had two separate ssh sessions | 18:14 |
dijuremo | I did not get to try logging in on the console and running startx yet. | 18:14 |
dijuremo | This was as I was going to do that: I disconnected one ssh session, was going to stop prime95 and then go directly to the machine to try logging in and running startx. | 18:15 |
dijuremo | Now I rebooted into 4.4.13, sshed in on shell 1, started prime95, then on shell 2 I tried to ssh, saw the motd and now it is frozen. | 18:17 |
dijuremo | So it is more prone to freezing with 4.4.13 than it is with 4.4.0 | 18:17 |
TJ- | 4.4.13 or 4.13 ? | 18:18 |
dijuremo | Sorry, typo: more prone to crash with 4.13.0- than 4.4.0-x | 18:38 |
TJ- | dijuremo: can you monitor temperatures? I'm wondering if this freeze is actually an overheat | 18:40 |
dijuremo | I usually do that with gkrellm... not so sure how to do it from the command line... will have to look | 18:41 |
dijuremo | Package id 0: +69.0°C (high = +85.0°C, crit = +95.0°C) | 18:44 |
dijuremo | And it just hung again.... | 18:45 |
dijuremo | Package id 0: +69.0°C (high = +85.0°C, crit = +95.0°C) | 18:45 |
dijuremo | Core 0: +64.0°C (high = +85.0°C, crit = +95.0°C) | 18:45 |
dijuremo | Core 1: +67.0°C (high = +85.0°C, crit = +95.0°C) | 18:45 |
dijuremo | Core 2: +65.0°C (high = +85.0°C, crit = +95.0°C) | 18:45 |
dijuremo | Core 3: +67.0°C (high = +85.0°C, crit = +95.0°C) | 18:45 |
dijuremo | Core 4: +69.0°C (high = +85.0°C, crit = +95.0°C) | 18:45 |
dijuremo | Core 5: +67.0°C (high = +85.0°C, crit = +95.0°C) | 18:45 |
dijuremo | I do not think it is overheating... well. This is it for today, gotta run to a meeting... :) | 18:45 |
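
The temperature readings pasted at 18:44 and 18:45 can be checked against their thresholds mechanically rather than by eye. The following is a minimal sketch, assuming the text has the same shape as the lines pasted in the log; the names `parse_sensors` and `headroom` are hypothetical helpers, not part of any existing tool.

```python
import re

# Matches lines shaped like the ones pasted in the log, e.g.
# "Core 0: +64.0°C (high = +85.0°C, crit = +95.0°C)"
LINE_RE = re.compile(
    r"^(?P<label>[^:]+):\s*\+?(?P<temp>-?\d+(?:\.\d+)?)°C\s*"
    r"\(high = \+?(?P<high>-?\d+(?:\.\d+)?)°C, "
    r"crit = \+?(?P<crit>-?\d+(?:\.\d+)?)°C\)"
)

# The readings exactly as pasted in the log at 18:45.
SAMPLE = """\
Package id 0: +69.0°C (high = +85.0°C, crit = +95.0°C)
Core 0: +64.0°C (high = +85.0°C, crit = +95.0°C)
Core 1: +67.0°C (high = +85.0°C, crit = +95.0°C)
Core 2: +65.0°C (high = +85.0°C, crit = +95.0°C)
Core 3: +67.0°C (high = +85.0°C, crit = +95.0°C)
Core 4: +69.0°C (high = +85.0°C, crit = +95.0°C)
Core 5: +67.0°C (high = +85.0°C, crit = +95.0°C)
"""


def parse_sensors(text):
    """Return a list of (label, temp, high, crit) tuples, all in °C.

    Lines that do not match the expected shape are skipped.
    """
    readings = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            readings.append((
                m.group("label").strip(),
                float(m.group("temp")),
                float(m.group("high")),
                float(m.group("crit")),
            ))
    return readings


def headroom(readings):
    """Smallest gap in °C between any reading and its 'high' threshold."""
    return min(high - temp for _, temp, high, _ in readings)


if __name__ == "__main__":
    print(headroom(parse_sensors(SAMPLE)))
```

For the pasted sample the smallest gap to the `high` threshold is 16.0 °C, which is consistent with dijuremo's reading that the hang does not look like an overheat, although a single snapshot cannot rule out a spike between samples.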
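
Because the machine hard-freezes, the last temperature seen on screen may not be the last one before the hang. One way to monitor temperatures from the command line, sketched here under assumptions, is to append each sample to a file and force it to disk so the final pre-freeze reading should still be there after a reboot. The sketch assumes the kernel exposes sensors through the hwmon sysfs interface (`temp*_input` files in millidegrees Celsius under `/sys/class/hwmon`); the function names and the `temps.log` path are hypothetical.

```python
import glob
import os
import time


def read_hwmon_temps(base="/sys/class/hwmon"):
    """Return a list of (label, celsius) for every temp*_input under base."""
    readings = []
    pattern = os.path.join(base, "hwmon*", "temp*_input")
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path) as f:
                # hwmon reports temperatures in millidegrees Celsius.
                celsius = int(f.read().strip()) / 1000.0
        except (OSError, ValueError):
            continue  # sensor unreadable; skip it
        label_path = path[: -len("_input")] + "_label"
        try:
            with open(label_path) as f:
                label = f.read().strip()
        except OSError:
            # No label file: fall back to the input file name.
            label = os.path.basename(path)
        readings.append((label, celsius))
    return readings


def append_sample(log_path, readings, now=None):
    """Append one timestamped line and fsync it before returning."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    line = stamp + " " + ", ".join("%s=%.1f" % r for r in readings) + "\n"
    with open(log_path, "a") as f:
        f.write(line)
        f.flush()
        os.fsync(f.fileno())  # push the sample to disk before the next one
    return line


def monitor(log_path="temps.log", interval=1.0, samples=None):
    """Log samples every `interval` seconds; run forever if samples is None."""
    count = 0
    while samples is None or count < samples:
        append_sample(log_path, read_hwmon_temps())
        count += 1
        time.sleep(interval)
```

Running `monitor()` in one SSH session while prime95 runs in another would leave a per-second record in `temps.log`; the one-second interval is an arbitrary choice and can be shortened if a fast spike is suspected.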