=== tmpRAOF is now known as RAOF
=== gerald is now known as Guest76602
=== ming is now known as Guest29940
[10:18] <tomti> What is the easiest way to find the latest official kernel version for a particular ubuntu release? I tried apt-cache show|madison|showpkg but none give what I want
[10:48] <genkgo> afw: We spoke yesterday about the read-only filesystem issue with ubuntu (3.13/ext4) vs centos (3.10/xfs) inside a HyperV platform. We created a new VPS and started to back up the machine hourly. This machine just went into read-only state and I grabbed info from dmesg: http://pastebin.com/48CK60Hi and /proc/mounts http://pastebin.com/WwLej7r7 as you told me.
[10:50] <genkgo> afw: the machine is still in read-only mode, so if you need more info, please let me know.
[11:16] <infinity> genkgo: Nothing earlier in dmesg before that?   Looks like the (fake) disk driver exploding.
[11:19] <genkgo> infinity: yes, http://pastebin.com/DTmvZgHS
[11:20] <genkgo> infinity: the lines I just added happen during every VSS backup
[11:22] <genkgo> infinity: I explained yesterday to afw that I have four machines, three ubuntu 3.13.0-52 with ext4 and one with centos 3.10.0-123 with xfs. The Ubuntu ones go into read-only mode randomly while the CentOS machine is doing fine.
[11:24] <genkgo> infinity: Sorry, they go randomly into read-only while HyperV is creating a backup of the four machines (in a row, not simultaneously). That could be during a backup of the specific is created but also at the end of the complete backup process.
[11:24] <genkgo> * of the specific machine
[11:26] <xnox> .... hyperv backup does freeze on to the filesystems and devices.
[11:26] <xnox> and expects the freeze & unfreeze to work...
[11:26] <genkgo> lifeless showed me a picture of the HyperV VSS process: https://msdn.microsoft.com/en-us/library/aa384589(v=vs.85).aspx confirming information is exchanged between the hyperv cluster and the guest machines at the end of a backup
[11:27] <genkgo> xnox: yes, that is confirmed on the link I just pasted
[11:27] <genkgo> but why do the ubuntu machines go into read-only mode while the centos is doing fine?
[11:28] <genkgo> xnox: and what does the dmesg http://pastebin.com/48CK60Hi output tell me?
[11:29] <genkgo> except for being in read-only: we knew that, I cannot find a cause in the message.
[11:30] <xnox> writing journal was aborted, then time jumps 200ms, journal is attempted to be read and is incomplete.
[11:30] <genkgo> ok, and this means filesystem inconsistency and therefore the ubuntu kernel switches the filesystem to read-only?
[11:30] <xnox> imho that smells like "sync" did not complete, yet "freeze" returned and backup kicked in, thus exploding at "thaw" and remounting ro
[11:32] <genkgo> xnox: do you have any advice what to do? I have machines in production that are affected by this, causing downtime.
[11:33] <genkgo> afw asked me yesterday to file a bug if I knew what was going on (dmesg). Now I have that information. Is it a bug? Is it Ubuntu Kernel related?
[11:33] <xnox> dunno, i would have asked cking to stress test freezing/unfreezing vms under I/O workload to figure out what's going on.
[11:33] <xnox> it should be reproducible outside hyperv
[11:34] <xnox> not sure who afw is, you mean apw?
[11:34] <genkgo> xnox: sorry, I mean apw indeed :)
[11:35] <apw> genkgo, well you have a pretty clear disk error there
[11:35] <apw> end_request: I/O error, dev sda, sector 65127256
[11:35] <apw> that IO failed so the filesystem went offline
[11:35] <genkgo> apw: ok, so you guess bad hardware?
[11:35] <apw> genkgo, i would like to see more of the dmesg before that
[11:36] <apw> genkgo, it is a VM so it is likely not actual h/w failure, it presumably is talking about a virtual disk
[11:36] <xnox> also it would be interesting to know how hyperv initiates vm freeze... given that we probably lack fsfreeze and xfs_freeze userspace tools in 14.04
[11:36] <genkgo> apw: http://pastebin.com/DTmvZgHS contains the other lines (I also showed you yesterday). Would you like me to include the boot sequence too? Because there is nothing more in between.
[11:37] <apw> genkgo, if you showed afw yesterday, then i'd have not noticed
[11:37] <xnox> genkgo: use pate.canonical.com and show everything =)
[11:37] <genkgo> hehe :)
[11:37] <xnox> genkgo: also paste.ubuntu.com works nicer with pastebinit utility ;-)
[11:39] <apw> genkgo, remind me of the kernel version again
[11:43] <apw> genkgo, and do the ones which do not fail also report those changed operating definition
[11:44] <genkgo> apw: yes, they do
[11:45] <apw> i see you are using 3.13 kernels on these hyper-v guests, we are mostly producing images with HWE kernels installed for hyper-v
[11:46] <apw> because the hypervisor interface is evolving so very fast at the moment
[11:46] <apw> genkgo, that one also shows an aborted journal
[11:46] <apw> [66392.076569] end_request: I/O error, dev sda, sector 65127256
[11:46] <apw> [66392.076610] Aborting journal on device sda5-8.
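The two dmesg lines apw quotes show the ordering that matters: an I/O error on the (virtual) disk, then the journal abort about 41 µs later. A small scanner can pull those events out of a saved dmesg so the logs of several machines can be compared. This is only a sketch: the regular expressions are written against exactly the two messages quoted in this log (3.13-era wording) and may not match other kernel versions, and `scan_dmesg` / `abort_follows_io_error` are hypothetical helper names.

```python
import re
from typing import Dict, List, Optional

# Patterns match the two message formats quoted in this log.
_TS = re.compile(r"^\[\s*(?P<ts>\d+\.\d+)\]")
_IO_ERROR = re.compile(r"end_request: I/O error, dev (?P<dev>\w+), sector (?P<sector>\d+)")
_JOURNAL_ABORT = re.compile(r"Aborting journal on device (?P<dev>[\w-]+)\.")


def scan_dmesg(text: str) -> List[Dict[str, object]]:
    """Return I/O errors and journal aborts found in dmesg text, in order."""
    events: List[Dict[str, object]] = []
    for line in text.splitlines():
        m = _TS.match(line)
        ts: Optional[float] = float(m.group("ts")) if m else None
        io = _IO_ERROR.search(line)
        if io:
            events.append({"ts": ts, "kind": "io_error", "dev": io.group("dev"),
                           "sector": int(io.group("sector"))})
            continue
        ja = _JOURNAL_ABORT.search(line)
        if ja:
            events.append({"ts": ts, "kind": "journal_abort", "dev": ja.group("dev")})
    return events


def abort_follows_io_error(events: List[Dict[str, object]], window: float = 1.0) -> bool:
    """True if some journal abort comes within `window` seconds after an I/O error."""
    last_io = None
    for ev in events:
        if ev["kind"] == "io_error" and ev["ts"] is not None:
            last_io = ev["ts"]
        elif ev["kind"] == "journal_abort" and last_io is not None and ev["ts"] is not None:
            if ev["ts"] - last_io <= window:
                return True
    return False
```

Feeding it the full dmesg from each guest and checking `abort_follows_io_error` would show whether every read-only episode starts with a failed request, as apw reads it.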
[11:47] <genkgo> apw: correct, this is the full output of dmesg
[11:47] <genkgo> of the same machine
[11:47] <apw> or is that a change for each backup, and the last output is the only one which failed
[11:47] <genkgo> yeah, we replicated a machine as test machine yesterday, started backup hourly until the system went into read-only, which just happened
[11:48] <genkgo> this is the full output from boot yesterday until now
[11:48] <genkgo> apw: we are using 3.13 kernels for all ubuntu machines (the centos one is using 3.10)
[11:49] <apw> genkgo, is the centos running the same workload as the ubuntu machines in the backup set ?
[11:50] <xnox> genkgo: ....  centos is xfs which always had freeze support, e.g. ext2 only gained freeze support in 3.19 kernel.
[11:50] <genkgo> yeah, every machine has other purposes and therefore other services, but yeah, I think there is no difference in load
[11:50] <xnox> genkgo: plus centos version numbers are a bit pointless, as 3.10 can have eons of cherrypicked patches.
[11:50] <apw> genkgo, i mean are they doing the exact same things?  i'd say the one which has failed had an IO in flight when the change request popped out and that has made it go pop
[11:50] <xnox> and we default to mounting ext2 filesystems with the ext4 driver. so logs are different.
[11:51] <genkgo> xnox: I noticed we are on version 3.10.0-123 so yeah I imagined the patches
[11:51] <xnox> imho you should _only_ be using hwe kernels on hyperv.
[11:51] <genkgo> apw: no, in that case they are doing really different things
[11:51] <xnox> apw: centos is using different filesystem type....
[11:51] <genkgo> centos is doing mail (imap and smtp)
[11:52] <xnox> as in no IO at all...
[11:52] <genkgo> while two ubuntu machines are handling http requests
[11:52] <xnox> which logs all the time to disk...
[11:53] <genkgo> the final ubuntu is a helper machine with all kinds of services (tomcat / libreoffice converter etc.)
[11:54] <genkgo> xnox: so you are saying we should switch filesystem?
[11:55] <xnox> genkgo: no.
[11:55] <xnox> genkgo: i am saying it's an uneven comparison with centos. oranges and apples.
[11:55] <genkgo> xnox: ok
[11:56] <xnox> genkgo: you should switch to our hwe kernels, and check if you can reproduce this with 3.19 - vivid's kernel.
[11:56] <xnox> genkgo: and azure people want ubuntu to use 3.19 kernel and better... to get the ext2 freeze support
[11:57] <xnox> cause default server config uses ext2 + lvm volume group and they can't freeze that for backup across the board.
[11:57] <xnox> on other clouds we default to hwe kernels. e.g. on ec2 and similar.
[11:59] <genkgo> xnox: ok, I will do that. switching to xfs makes no sense?
[11:59] <xnox> genkgo: we cannot do that, no.
[11:59] <xnox> genkgo: we are talking about all ubuntu vms launched in azure, not just your three vms.
[12:00] <genkgo> xnox: alright, I never meant to talk about all ubuntu vms
[12:00] <genkgo> xnox: so I leave the fs as ext4 and upgrade to HWE kernels
[12:01] <genkgo> being 3.19
[12:02] <genkgo> xnox: this page does not indicate there is a 3.19 https://wiki.ubuntu.com/Kernel/LTSEnablementStack
[12:04] <genkgo> xnox: is this the ppa ppa:canonical-kernel-team/ppa I should use?
[12:05] <xnox> it's in proposed
[12:06] <genkgo> xnox: thank you very much for helping me out
[12:06] <genkgo> I will install it and see what happens
[12:17] <apw> genkgo, if this is a test box, i would suggest that you run a test using the linux-lts-utopic
[12:17] <apw> as in theory that is what is being tested in majority in azure
[12:18] <genkgo> apw: I already installed 3.19 on the test machine, using sudo add-apt-repository ppa:canonical-kernel-team/ppa, sudo apt-get install linux-generic-lts-vivid
[12:26] <genkgo> hmm, now I am running into dependency trouble
[12:37] <genkgo> hmm, this dependency issue is harder than the ones I had before
[12:39] <genkgo> dpkg-deb: error: subprocess paste was killed by signal (Broken pipe)
[12:39] <genkgo> while trying to install tools and cloud tools
[12:44] <genkgo> trying to overwrite '/usr/bin/perf', which is also in package linux-tools-common 3.13.0-52.86
[12:44] <genkgo> I see this was a problem before
[12:44] <ubot5> Ubuntu bug 1410278 in linux (Ubuntu) "package linux-cloud-tools-common 3.16.0-29.39 failed to install/upgrade: subprocess installed post-installation script returned error exit status 1" [Medium,Confirmed]
[12:46] <genkgo> I cannot remove or reinstall 3.19
[13:15] <genkgo> xnox: how should I install hv-kvp-daemon-init in combination with vivid kernel?
[13:16] <genkgo> if I just do apt-get install it asks me to install the cloud tools of the older kernel
[13:16] <genkgo> I now have 3.19 + tools + cloud tools
[13:17] <genkgo> but no hv-kvp-daemon-init
[13:24] <apw> linux-cloud-tools-lts-vivid perhaps ?
[13:26] <genkgo> apw: that is already installed
[13:27] <genkgo> apw: http://paste.ubuntu.com/11113627/
[13:28] <genkgo> And I am currently on 3.19.0-17-generic.
[14:04] <genkgo> xnox: apw: There is no current release of this source package in The Vivid Vervet (hv-kvp-daemon-init).
[14:04] <apw> genkgo, hv-kvp-daemon-init should not be needed
[14:05] <apw> those are carried in the kernel now
[14:05] <genkgo> ah alright, perfect
[14:06] <apw> and /usr/sbin/hv_kvp_daemon should start it, and it should be being started automatically from upstart
[14:06] <genkgo> apw: there is a binary over there
[14:07] <apw> did it start correctly though
[14:08] <genkgo> apw: it is not in the list of processes, I only see hv_vmbus_con hv_vmbus_ctl
[14:09] <genkgo> apw: I do see some additional errors in dmesg when booting
[14:09] <genkgo> visorutil: module is from the staging directory, the quality is unknown, you have been warned
[14:10] <genkgo> and some visorchannel errors
[14:10] <apw> genkgo, what does "initctl status | grep hv" say
[14:11] <genkgo> initctl: missing job name
[14:11] <apw> sorry initctl list | grep hv
[14:11] <apw> this is trusty right?  so it is running upstart ?
[14:11] <genkgo> apw: this is vivid
[14:12] <apw> oh now we are getting confused, i thought it was trusty with lts-vivid installed ?
[14:12] <genkgo> this is 14.04 with vivid
[14:12] <apw> so trusty right
[14:12] <genkgo> :) yes
[14:12] <apw> with the hwe vivid kernel
[14:13] <apw> and "initctl list | head" has jobs listed
[14:13] <genkgo> apw: yes, there are jobs
[14:13] <apw> ls -l /etc/init/hv-*
[14:13] <genkgo> and I installed the kernel by sudo add-apt-repository ppa:canonical-kernel-team/ppa, sudo apt-get install linux-generic-lts-vivid
[14:14] <apw> and do you have the hv- init configuration ?
[14:14] <genkgo> ls: cannot access /etc/init/hv-*: No such file or directory
[14:15] <genkgo> apw: I guess not, before I just installed cloud tools and tools together with the hv daemon
[14:16] <genkgo> apw: should I add that file?
[14:16] <apw> well if you have linux-cloud-tools-lts-vivid installed you should have linux-cloud-tools-common installed as a dependency
[14:19] <genkgo> apw: I have linux-lts-vivid-cloud-tools-common installed
[14:19] <genkgo> not linux-cloud-tools-common
[14:19] <genkgo> if I do, it tries to install the 3.13.0 one
[14:20] <apw> i don't believe i expect there to _be_ a linux-lts-vivid-cloud-tools-common
[14:20] <apw> and yes i expect it to use the 3.13 common one, as it is common to _all_ versions
[14:20] <apw> it only carries the wrapper scripts which are common
[14:20] <apw> and the same between them all
[14:21] <apw> well that seems bust to me
[14:21] <genkgo> apw, so I should remove the linux-lts-vivid-cloud-tools-common
[14:22] <genkgo> and install the common one again
[14:22] <apw> if it will let you yes, as i think the vivid one is empty.  it should also not exist
[14:22] <apw> if it is a dependency of linux-cloud-tools-generic-lts-vivid or whatever you installed, then it is broken
[14:25] <genkgo> ok, so now I have common tools and cloud tools (3.13.0-52.86) and the vivid kernel
=== JanC_ is now known as JanC
[14:25] <genkgo> hv-kvp-daemon stop/waiting
[14:25] <apw> i think this kernel may have broken tools dependencies
[14:25] <genkgo> same for vss and fcopy daemons
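apw's `initctl list | grep hv` check, and the `stop/waiting` state genkgo reports for the kvp, vss and fcopy jobs, can be turned into a quick scripted check. A sketch, assuming upstart's `initctl list` line format `job [(instance)] goal/state[, process PID]`; the job names hv-vss-daemon and hv-fcopy-daemon are inferred from the `hv-kvp-daemon` naming and are assumptions, as are the helper names.

```python
import re
import subprocess
from typing import Dict, List, Optional, Tuple

# Assumed upstart `initctl list` line shape:
#   job [(instance)] goal/state[, process PID]
_LINE = re.compile(
    r"^(?P<job>\S+)(?: \((?P<instance>[^)]*)\))? "
    r"(?P<goal>\w+)/(?P<state>[\w-]+)(?:, process (?P<pid>\d+))?"
)

Job = Tuple[str, str, Optional[int]]  # (goal, state, pid)


def parse_initctl(text: str) -> Dict[str, Job]:
    """Map each job (plus instance, if any) to its goal, state and pid."""
    jobs: Dict[str, Job] = {}
    for raw in text.splitlines():
        m = _LINE.match(raw.strip())
        if not m:
            continue  # lines that do not fit the assumed shape are skipped
        name = m.group("job")
        if m.group("instance"):
            name += " (" + m.group("instance") + ")"
        pid = int(m.group("pid")) if m.group("pid") else None
        jobs[name] = (m.group("goal"), m.group("state"), pid)
    return jobs


def stopped_hv_jobs(text: str) -> List[str]:
    """Names of hv* jobs that are not start/running (e.g. 'stop/waiting')."""
    return sorted(
        name for name, (goal, state, _) in parse_initctl(text).items()
        if name.startswith("hv") and (goal, state) != ("start", "running")
    )


def live_initctl_list() -> str:
    """Run `initctl list` on an upstart system (e.g. trusty) and return its output."""
    return subprocess.check_output(["initctl", "list"], universal_newlines=True)
```

On the affected guest, `stopped_hv_jobs(live_initctl_list())` would be expected to list all three hv daemons, matching what genkgo saw by hand.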
[14:25] <apw> i am looking at it
=== txspud` is now known as txspud
[14:27] <genkgo> apw: I changed the linux-lts-vivid-cloud-tools to the common one
[14:27] <genkgo> but the hv daemons are not starting
[14:28] <apw> yep, and it has deinstalled the actual daemons
[14:28] <apw> i think this is just broken
[14:28] <apw> and i am not sure the utopic one is any better
[14:28] * apw checks properly
[14:29] <genkgo> apw: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1410278
[14:29] <ubot5> Ubuntu bug 1410278 in linux (Ubuntu) "package linux-cloud-tools-common 3.16.0-29.39 failed to install/upgrade: subprocess installed post-installation script returned error exit status 1" [Medium,Confirmed]
[14:59] <genkgo> apw: is it broken indeed?
=== zyga is now known as zyga-phone
=== zyga-phone is now known as zyga
=== kloeri_ is now known as kloeri
[18:00] <smoser> hey... wonder if someone could confirm my suspicion / conclusion in bug https://bugs.launchpad.net/ubuntu/+source/curtin/+bug/1443542
[18:00] <ubot5> Ubuntu bug 1443542 in curtin (Ubuntu) "curtin race on vivid when /dev/sda1 doesn't exist" [Undecided,Confirmed]
[18:04] <smoser> maybe, wonder if there is a way to achieve what i want there, without monitoring udev hooks myself or something to that effect.
[19:40] <apw> smoser, well i can say when you do the reread ioctl the udev message has been queued before we return to you
[19:41] <apw> whether udev would include pending ones it has not yet read in its idea of pending is still in the air
[19:42] <smoser>    udevadm settle [options]
[19:42] <smoser>        Watches the udev event queue, and exits if all current events are
[19:42] <smoser>        handled.
[19:42] <smoser> what else would be the point, apw ?
[19:47] <apw> smoser, i'd say it ought to see them, to my reading of that english, which is of course not the source code
[19:49] <apw> smoser, all i can really for sure say is if you did the reread ioctl, and that returned 0 then it will have completed the:
[19:49] <apw>         kobject_uevent(&disk_to_dev(disk)->kobj, KOBJ_CHANGE);
[19:50] <apw> that is that that has been queued to all listeners
[19:51] <smoser> apw, k. thanks.
[19:51] <smoser> now i'm back to not knowing what was wrong.
[19:52] <smoser> i think you shot my theory
[19:52] <smoser> rbasak, ^ just fyi.
[20:02] <apw> smoser, and from what i can see in udev that even if we got out of the kernel and into udevadm settle before udev is woken to read the event, we will read the event before checking if we are idle and responding
[20:02] <apw> to the settle
[20:03] <smoser> apw, so i think you're saying that it should work like i originally expected / coded for.
[20:04] <smoser>  a.) echo "2048," | sfdisk /dev/sda
[20:04] <smoser>  b.) blockdev --rereadpt
[20:04] <smoser>  c.) udevadm settle
[20:04] <smoser>  d.) expect /dev/sda1 to exist
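smoser's steps a.) to d.) translate directly into a script. The sketch below adds one thing the steps leave implicit: instead of assuming /dev/sda1 exists right after `udevadm settle`, it polls for the node with a timeout, which turns a silent race into a clear error. It is destructive (it rewrites the partition table of the disk you pass), so only run it against a scratch disk. The device argument to `blockdev --rereadpt`, the function names and the 30 s timeout are assumptions for illustration.

```python
import os
import subprocess
import time


def wait_for_path(path: str, timeout: float = 30.0, interval: float = 0.1) -> bool:
    """Poll until `path` exists; return False if `timeout` seconds pass first."""
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def partition_and_wait(disk: str = "/dev/sda", part: str = "/dev/sda1",
                       timeout: float = 30.0) -> None:
    """DESTRUCTIVE: repartition `disk` and wait for `part` to show up."""
    # a.) one partition starting at sector 2048 (same sfdisk input as in the log)
    subprocess.run(["sfdisk", disk], input="2048,\n",
                   universal_newlines=True, check=True)
    # b.) ask the kernel to re-read the partition table (device argument assumed)
    subprocess.run(["blockdev", "--rereadpt", disk], check=True)
    # c.) wait for udevd to drain its event queue
    subprocess.run(["udevadm", "settle"], check=True)
    # d.) do not just expect the node: poll for it and fail loudly
    if not wait_for_path(part, timeout):
        raise RuntimeError(part + " did not appear after udevadm settle")
```

If `wait_for_path` only succeeds after a noticeable delay, that points at the settle/event ordering under discussion; if `blockdev` itself fails, `check=True` raises before any waiting happens.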
[20:04] <apw> smoser, though i guess it depends if more than one is produced
[20:05] <apw> smoser, and whether you are waiting for the second one
[20:05] <smoser> more than one?
[20:05] <apw> yes, the event i listed was the "device has changed" for i assume sda in this case
[20:05] <apw> is it sda1 you are waiting for ?
[20:08] <smoser> so are you saying that the kernel would emit "device_has_changed(sda)", then return from blockdev, then subsequently emit "device_has_changed(sda1)" ?
[20:08] <smoser> that would seem unfortunate.
[20:08] <apw> smoser, oh ... but ... actually the interface for settle is a bit odd, it is actually using a file in /run
[20:08] <apw> smoser, no it queues them all i believe before returning 0
[20:09] <smoser> and then udevadm settle *should* wait until it has processed the entire queue
[20:09] <smoser> at least it says it will.
[20:09] <smoser> (or 120 seconds, but i don't think that's the issue here)
[20:12] <apw> so i think although it is using a file, it is interlocking with udevd by pinging it, so they at least think they are doing the right thing
[20:12] <apw> do you get the events in the end in your scenario ?
[20:12] <apw> smoser, ^
[20:13] <smoser> well, all i have to go on is the bug at this point.
[20:13] <smoser> and the code i pointed to
[20:20] <smoser> apw, thanks for your help.
[20:20] <rbasak> smoser: I think beisner said he can reliably reproduce it?
[20:20] <smoser> yeah, but i can't have access at the moment.
[20:21] <rbasak> I guess maybe the next step is to log udev events and compare the timing of those to the timing of the commands
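rbasak's next step, logging udev events and comparing their timing with the commands, could start from `udevadm monitor --kernel --udev` output saved to a file. A minimal parser is sketched here; the assumed line shape is `KERNEL[seconds] action devpath (subsystem)` (or `UDEV  [...]` for events after rule processing), and `first_seen` is a hypothetical helper. Comparing the printed timestamps with the times the script ran each command also assumes both use the monotonic clock, which should be checked before drawing conclusions.

```python
import re
from typing import List, NamedTuple, Optional


class UEvent(NamedTuple):
    source: str      # "KERNEL" (raw uevent) or "UDEV" (after rule processing)
    ts: float        # seconds, as printed by udevadm monitor
    action: str      # add, change, remove, ...
    devpath: str
    subsystem: str


# Assumed line shape: SOURCE[seconds] action devpath (subsystem)
_MONITOR = re.compile(
    r"^(?P<src>KERNEL|UDEV)\s*\[(?P<ts>\d+\.\d+)\]\s+(?P<action>\w+)\s+"
    r"(?P<devpath>\S+)\s+\((?P<subsys>[^)]+)\)"
)


def parse_monitor(text: str) -> List[UEvent]:
    """Parse saved `udevadm monitor --kernel --udev` output into events."""
    events = []
    for line in text.splitlines():
        m = _MONITOR.match(line)
        if m:
            events.append(UEvent(m.group("src"), float(m.group("ts")),
                                 m.group("action"), m.group("devpath"),
                                 m.group("subsys")))
    return events


def first_seen(events: List[UEvent], suffix: str,
               source: str = "UDEV") -> Optional[float]:
    """Timestamp of the first `source` event whose devpath ends with `suffix`."""
    for ev in events:
        if ev.source == source and ev.devpath.endswith(suffix):
            return ev.ts
    return None
```

Comparing `first_seen(events, "/sda1", "KERNEL")` and `first_seen(events, "/sda1")` with the moment `udevadm settle` returned would show whether the sda1 event was still being processed when settle exited.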
[20:22] <smoser> yeah... given apw's assessment, i think maybe we're in a different path than i originally thought.
[20:23] <rbasak> <apw> that is that that has been queued to all listeners
[20:23] <rbasak> apw: does that definitely mean that it's visible to udev in userspace by that point?
[20:24] <apw> rbasak, to my understanding of the netlink code yes
[20:24] <rbasak> I know that's what you're saying; just want to eliminate the possibility of there being some other queue in kernelspace in the way
[20:24] <rbasak> OK, thanks.
[20:24] <rbasak> Then I wonder if there's a race in udev between reading that and handling "settle".
[20:25] <apw> rbasak, there may be, but it is at least claiming to handle the proposed race
[20:25] <apw> rbasak, but i also don't think we have any proof the right thing was actually done yet ... i.e. that they do appear
[20:25] <apw> (the events)
[20:26] <rbasak> I am curious enough to dig into udev's source, but I'm busy this evening
[20:26] * rbasak should go
=== pgraner is now known as pgraner-afk
[20:54] <smoser> rbasak, apw http://paste.ubuntu.com/11119254/
[20:55] <smoser> i can get that to fail.
[20:55] <smoser> BLKRRPART: Device or resource busy
[20:55] <smoser> waitfor after partition2 failed
[20:55] <smoser> i think that the script is doing all sane things.
[20:56] <apw> what says BLKRRPART: De...
[20:58] <smoser> i think
[20:58] <smoser> but i can patch to make sure
[20:58] <apw> so the wait is bound to fail, as you didn't actually do the partition reload
[20:59] <apw> which might indicate something has one of the partitions open
[20:59] <apw> if the blockdev failed, then it didn't change anything
[20:59] <apw> and didn't emit anything to wait for
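apw's point is that a failed BLKRRPART leaves nothing for `udevadm settle` or a wait loop to find. A script can detect that case explicitly by issuing the ioctl itself and treating EBUSY as "partitions still in use" rather than going on to wait. Sketch only: BLKRRPART is `_IO(0x12, 95)` from linux/fs.h, and `reread_partitions` is a hypothetical helper that needs root.

```python
import errno
import fcntl
import os

BLKRRPART = (0x12 << 8) | 95  # _IO(0x12, 95) in linux/fs.h, i.e. 0x125F


def reread_partitions(disk: str) -> bool:
    """Ask the kernel to re-read `disk`'s partition table (needs root).

    Returns False when the kernel refuses with EBUSY, typically because a
    partition on the disk is still open (for example mounted). In that case
    the table was not re-read, no new uevents were emitted, and waiting for
    partition nodes cannot succeed. Other errors are re-raised.
    """
    fd = os.open(disk, os.O_RDONLY)
    try:
        fcntl.ioctl(fd, BLKRRPART)
        return True
    except OSError as e:
        if e.errno == errno.EBUSY:
            return False
        raise
    finally:
        os.close(fd)
```

A caller would skip the wait entirely when `reread_partitions("/dev/sda")` returns False and instead report what still holds the disk open, which is the case smoser's paste hit.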
[21:00] <smoser> sorry. i have to run. i'll look more tomorrow.
[21:08] <nessita> jsalisbury, hi, quick question, in the audio bug you mention kernel /v4.1-rc3-vivid/ but I only see v4.1-rc2-vivid in http://kernel.ubuntu.com/~kernel-ppa/mainline/
[21:08] <nessita> jsalisbury, shall I try v4.1-rc2-vivid or v4.1-rc3-unstable
=== pgraner-afk is now known as pgraner
[21:58] <jsalisbury> nessita, I would suggest v4.1-rc3
