/srv/irclogs.ubuntu.com/2015/05/12/#ubuntu-kernel.txt

genkgoHello. We have a huge problem with Ubuntu 14.04 VPS inside a Hyper V platform. Running Windows Server Backup (VSS) changes the filesystem into a read-only filesystem. It is not a specific VPS problem: all three Ubuntu machines have exactly the same problem. In the same cluster we have a CentOS machine that is not having any problem at all. The Ubuntu machines are all on 3.13.0-52-generic. Because the machines are in production, our08:34
apwgenkgo, is there a dmesg error at the time of the switch to read-only ?08:37
genkgoapw: no, there is no log at the time the machine switches to read-only. That is exactly what makes it so hard. The problems occur randomly (at least I am not able to see a pattern). During some backups we have these logs: http://pastebin.com/MvGuDyRL. But also during other backups we have these logs: http://pastebin.com/sExsZKhV.08:42
genkgoapw: but we never see any log before the machine goes into read-only filesystem. those logs only occur during backups that finish successfully and do not cause the filesystem switch. they may be (and I guess so) an indication of other problems.08:43
apwgenkgo, and then what happens, the filesystem reports a full error and moves the filesystem r/o ?08:43
genkgoapw: no errors, the filesystem is in read-only mode. so every service that tries to save files is down.08:45
genkgoapw: it is a simple webserver: nginx + apache + php-fpm. our http requests cannot be delivered.08:46
genkgoapw: we have tried multiple backup strategies: all fail. and the weird thing is: the centos inside the cluster is doing very fine.08:47
genkgoapw: i now see that there is a difference between centos and ubuntu. the ubuntu machine is using ext4 while centos uses xfs.08:49
apwgenkgo, could you file a bug against linux for me, and i will ask someone who has a hyper-v system to see if they can reproduce the behaviour08:52
apw"ubuntu-bug linux"08:52
genkgoapw: will do. is there anything you can advise me?08:54
genkgoapw: to fix the problem temporarily?08:55
apwgenkgo, when you say it goes read-only what makes you say it's read-only if we have no diagnostics saying that ?08:56
genkgoapw: ok, when I was logged in to the machine, certain commands fail due to read-only filesystem.08:57
apwgenkgo, and yet the end of dmesg output does not indicate it going read-only ?08:58
apwgenkgo, could the filesystem be frozen for the backup, i know some of the backup bits do that on hyper-v08:59
apwbecause /boot being ext2 and it not supporting freeze was an issue for a while09:00
genkgoapw: hmm, now I am having doubts. I believe I did not look at dmesg while the system was in read-only mode, but only after the reboot. And then there was nothing on read-only.09:00
apwgenkgo, right, if you just reboot it won't get flushed to a permanent file, if it had gone read-only09:01
genkgoapw: alright: so while being in the read-only I should immediately run dmesg09:01
genkgoapw: I will do that and see what the logs say.09:02
apwgenkgo, for sure, as the end of that might indicate something kernel side triggering read-only09:02
genkgoapw: I do not know Hyper V well enough to know whether it freezes the filesystem.09:03
genkgoapw: I will not file the bug until I have the dmesg output09:04
apwgenkgo, no, i think our best bet is that there is a kernel diagnostic at the end of dmesg, if you have a dev environment or something you can tickle this in09:04
genkgoapw: will try to do that. the problem is the randomness. it is hard to reproduce the issue. last time (last night) the system went into read-only after the last machine was finished with backup.09:06
apwgenkgo, odd indeed09:06
genkgoso there were 4 machines in the backup sequence, no problem at all during backup. when last one finished, another machine got into read-only filesystem.09:07
apwgenkgo, that sounds pretty odd doesn't it09:07
apwthe one doing the work is ok, and another is collateral damage09:07
genkgoapw: sounds extremely crazy to me.09:07
genkgoapw: I think the Hyper V host sends a signal to the guest machines after backup, or something like that.09:08
genkgoapw: thanks for the help anyway. lets see what dmesg has to say during read-only filesystem.09:12
apwgenkgo, yeah, that is one reason i am wondering about the freeze bits, but yes, lets get a dmesg and cat /proc/mounts as well09:15
apwgenkgo, also make sure we know what kernel version we are talking about in the report09:17
genkgoapw: yes, at the moment it is 3.13.0-52-generic09:17
apwgenkgo, i would also look at whether the backup is requesting fs freeze, as there is definitely a hyper-v interface to ask the kernel to freeze and unfreeze filesystems09:18
genkgoapw: what does /proc/mounts tell us09:23
genkgo?09:23
genkgohmm I see :)09:23
apwgenkgo, lots of things09:24
apwincluding whether we think it is read-only or not09:25
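The /proc/mounts check suggested above can be sketched in Python. This is a minimal illustration, not something posted in the channel: the function name `read_only_mounts` and the `SAMPLE` text are made up for the example, and it only assumes the usual whitespace-separated layout of /proc/mounts (device, mount point, filesystem type, comma-separated options).

```python
def read_only_mounts(mounts_text):
    """Return (device, mount_point, fstype) for every mount whose options include 'ro'."""
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mount_point, fstype, options = fields[:4]
        # Options are comma-separated; compare whole tokens so that
        # 'errors=remount-ro' is not mistaken for 'ro'.
        if "ro" in options.split(","):
            result.append((device, mount_point, fstype))
    return result


# Hypothetical sample, only for illustration; not taken from the affected machines.
SAMPLE = """\
/dev/sda1 / ext4 ro,relatime,errors=remount-ro,data=ordered 0 0
/dev/sda2 /boot ext2 rw,relatime 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
"""

if __name__ == "__main__":
    try:
        # On a live guest, read the real file while the problem is occurring.
        with open("/proc/mounts") as handle:
            text = handle.read()
    except OSError:
        text = SAMPLE
    for device, mount_point, fstype in read_only_mounts(text):
        print(f"{mount_point} ({fstype} on {device}) is mounted read-only")
```

Running this while the machine is misbehaving, alongside dmesg, would show whether the kernel itself considers the filesystem read-only.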
genkgoapw: regarding the freeze bit, we tried multiple backup strategies, all failed09:25
apwsadly i know next to nothing about VSS backup09:26
apwgenkgo, how long does the backup take btw, across all the nodes 09:26
genkgo1 hour and a few minutes09:26
apwand everything is working in parallel with that until the last second when the backup ends, and sometimes one of the members breaks09:27
apwwell indeed we need to see if there is anything in that dmesg when it occurs, as i suspect your ext4 filesystems are mounted to go r/o on any error09:28
apwwhy xfs wouldn't have the same hissy fit at a failed IO is an interesting question, assuming it sees them too09:28
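Whether an ext4 filesystem is set to go read-only on errors can be read from its mount options. The sketch below is an illustration under assumptions: the helper name `ext4_error_policy` and the sample text are hypothetical, and it only looks for an `errors=` token in /proc/mounts-style text. An option may be left out of that listing when it matches the filesystem's own default, so a missing token is not conclusive.

```python
def ext4_error_policy(mounts_text):
    """Map each ext4 mount point to its 'errors=' option value, or None if not listed."""
    policies = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[2] != "ext4":
            continue  # only ext4 mounts are of interest here
        policy = None
        for option in fields[3].split(","):
            if option.startswith("errors="):
                policy = option.split("=", 1)[1]
        policies[fields[1]] = policy
    return policies


# Hypothetical sample, only for illustration.
SAMPLE_MOUNTS = """\
/dev/sda1 / ext4 rw,relatime,errors=remount-ro,data=ordered 0 0
/dev/sdb1 /data ext4 rw,relatime,data=ordered 0 0
/dev/sdc1 /srv xfs rw,relatime,attr2,inode64,noquota 0 0
"""

if __name__ == "__main__":
    for mount_point, policy in ext4_error_policy(SAMPLE_MOUNTS).items():
        print(mount_point, "->", policy or "no errors= option listed")
```

A value of `remount-ro` on the affected mounts would fit the theory that a failed IO during the backup is what flips the filesystem to read-only.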
genkgoapw: the backups are not simultaneous, it takes 1 hour to back up all 4 machines in a row09:29
lifelessgenkgo: do you have a VSS agent running in Ubuntu ?09:32
genkgolifeless: yes, I believe so.09:32
genkgolifeless: /usr/lib/linux-tools/3.13.0-52-generic/hv_vss_daemon is running09:33
lifelessgenkgo: does it log anything?09:33
lifelessalso https://msdn.microsoft.com/en-us/library/aa384589(v=vs.85).aspx is a little terrifying09:34
genkgolifeless: let me check that, I have not found any logs of the vss daemon before09:37
genkgolifeless: that picture frightens me too!09:38
genkgolifeless: I do not see any hv daemon logs09:52
apwgenkgo, i do not believe that forms separate logs, it should log to syslog09:58
apwi also cannot see how this daemon guarantees it is able to run if for instance it gets paged out during the backup09:58
apwnor does it seem to report anything on thaw failures, hrumph, not helpful10:00
genkgoapw: we have planned to replicate one of the machines today (we do not want any more downtime) and back up that machine hourly until things go wrong10:08
genkgoapw: hopefully i can give additional information afterwards10:09
genkgoapw: anyway, thanks a lot so far!10:11
apwgenkgo, sounds perfect10:12
=== hugbot is now known as swordsmanz
genkgoapw: INFO: task rs:main Q:Reg:605 blocked for more than 120 seconds. What could that mean?13:30
apwgenkgo, that implies a task is unable to finish in the kernel13:46
genkgoapw: during boot I also see the following message: init: plymouth-upstart-bridge main process (298) terminated with status 113:57
genkgoand scsi scan: INQUIRY result too short (5), using 3613:57
genkgoapw: would it make sense to dump the complete output of one of the machines?14:05
genkgocomplete output of the boot sequence (/var/log/syslog)14:05
genkgoapw: http://pastebin.com/zGqiMkAc that's the complete syslog of one of the Ubuntu VPS machines since it booted this morning (07:45 local time).14:13
genkgooh, I did |grep kernel |grep -v UFW14:15
nessitajsalisbury, hello again. Regarding LP: #1201528, I reproduced the issue by playing a youtube video. Audio is completely lost, no way to recover unless I reboot. Added to the bug debug logs from pulseaudio. Anything else I can do to help?14:20
ubot5Launchpad bug 1201528 in linux (Ubuntu Saucy) "[INTEL DP55WG,Realtek ALC889] - Audio Playback Unavailable" [Medium,Won't fix] https://launchpad.net/bugs/120152814:20
=== bdmurray_ is now known as bdmurray
jsalisburynessita, thanks for the update.  I'll review the bug again14:50
nessitajsalisbury, thank you!14:53
jsalisburynessita, Just to confirm, you reproduced the bug on Vivid?14:54
jsalisbury**14:57
jsalisbury** Ubuntu Kernel Team Meeting - Today @ 17:00 UTC - #ubuntu-meeting14:57
jsalisbury**14:57
nessitajsalisbury, yes, vivid and kernel 3.19.0-16-generic14:57
jsalisburynessita, thanks 14:58
cristian_cjsalisbury, hello14:58
cristian_cjsalisbury, is there any news about the build of that kernel you told me about?15:05
jsalisburycristian_c, not as of yet, but should be soon15:21
hallynjjohansen: apw: danwest reports https://bugs.launchpad.net/apparmor/+bug/1408833 appears to be back in 14.04.215:24
ubot5Ubuntu bug 1408833 in AppArmor "broken postinst test for uvtool-libvirt on utopic" [Undecided,Confirmed]15:24
cristian_cjsalisbury, ok15:26
jsalisbury##16:55
jsalisbury## Kernel team meeting in 5 minutes16:55
jsalisbury##16:55
apwhallyn, how long has .2 been out there ?16:57
hallynnot a clue16:58
hallyndanwest: could you (or whoever ran into that bug) do an 'apport-collect 1408833' on the affected host?16:59
hallynthat should save apw/jjohansen some time (assuming it works)16:59
apwhallyn, danwest, the fix we applied still seems to be applied at least17:02
infinityapw: It was never fixed on 3.13, afaict, maybe danwest's seeing it on the trusty kernel, not the hwe-u kernel.17:08
infinityapw: At least, the bug log implies it was only fixed in 3.16 (and I hope carried forward), no indication that it was backported to older kernels.17:09
apwinfinity, but .2 had the utopic kernel on ?17:11
apwno ?17:11
smbapw, the server iso but who knows about cloud-image which maybe they use17:12
infinityapw: Well, that depends on how you define ".2", doesn't it?17:12
infinityapw: lsb_release on any up-to-date trusty host will tell you it's 14.04.217:13
apwinfinity, ahh good point17:13
infinityapw: What kernel you have installed is irrelevant.17:13
apwheh ... yay for useless monikers17:13
infinity14.04.2 isn't useless, it's just wrong for people to claim it relates to the HWE stack we happen to release at the same (ish) time.17:14
infinityBut I stopped fighting that battle a while ago.17:14
=== jsalisbury changed the topic of #ubuntu-kernel to: Home: https://wiki.ubuntu.com/Kernel/ || Ubuntu Kernel Team Meeting - Tues May 19th, 2015 - 17:00 UTC || If you have a question just ask, and do wait around for an answer! If the question is should I file a bug for something, likely you can assume yes. || Channel logs: http://irclogs.ubuntu.com/
infinitysmb: Cloud images typically don't use HWE kernels, until a cloud requires it for some reason (like, the Azure precise images moved to lts-s because they had to, and then lts-t)17:15
smbinfinity, Yeah. I was just thinking about it for the reasons you said for naming that 14.04.2 as well. But forgetting that upgrade also results in the same17:17
infinitysmb: Yeah, 14.04.2 is a point in time in the archive, the only relation it has to HWE stacks is the ISOs.17:19
infinityWhich does make it confusing when people talk about it, but the whole HWE stack thing is confusing in general.17:20
smbTrue. More reasons to insist on bug reports with proper data (or it never happened) ...17:21
=== adrian is now known as alvesadrian
hallyndanwest: ^  so looks like we need data;  thx17:37
danwestinfinity, HWE is confusing - I still don't truly get it - not obvious what it is (just a kernel??), where and how I get it, etc...17:57
danwesthallyn, what data? that apport-collect tries to open a browser which turns out to be something text based like lynx17:58
danwestapw, infinity, hallyn: 3.16.0-30-generic is my current kernel17:58
infinitydanwest: Knowing which kernel is a big help already, yes.  If you could at least include that in the bug report.18:02
danwestinfinity, will do 18:03
=== kees_ is now known as kees
=== eseifert is now known as seiferteric
