[00:04] apw: thanks - that worked!
[00:08] anyone know if ports.ubuntu is having issues
[00:11] looks like some of the ports mirrors might be down... oh well
[00:17] apw: the patch is not quite right as it should grab another reference to the namespace, but it doesn't seem to blow up and does fix the bug, so i'll send a proper patch tonight - thanks again
[00:24] or does it not matter?
[00:25] are we guaranteed that the cgns will stick around for the duration of the mount due to task->mntns ?
[00:25] that actually may be the case but i need to think it through
[00:44] apw, shit!
[00:44] (initramfs) echo "81780AD9-068C-4A80-A795-56856973B8F9" | tr '[:upper:]' '[:lower:]'
[00:44] 81780AD9-068C-4A80-A795-56856973B8F9
[00:45] apw, so tr trick won't work in initramfs environment
[00:52] apw, (initramfs) echo "81780AD9-068C-4A80-A795-56856973B8F9" | awk '{print tolower($0)}'
[00:52] 81780ad9-068c-4a80-a795-56856973b8f9
[00:52] that works
[01:17] apw, revised patch attached to the bug with real system boot test results
[02:41] lamont, I posted the last bunch of test kernels in a directory tree. Location and how they are organized posted to the bug.
[02:45] jsalisbury: awesome
[02:45] I'll smash through the bisect tomorrow
[07:27] do you prefer an oldskool pull request or a merge request via lp? (for xenial)
=== shuduo is now known as shuduo-afk
[09:12] tjaalton, old skool pull request is prolly easier for people as that is what they are used to
[09:14] ok, I'll do that then
[11:19] apw: enjoy! :)
[11:22] manjo, i've uploaded an alternative fix (to avoid adding the first awk dependency) to ppa:apw/ubuntu/initramfs-tools-test, if you could test that for me then i can get it uploaded for real later today
[11:22] tjaalton, that is highly doubtful :/
[11:23] hehe
[11:28] tjaalton, pththththth
[12:18] hallyn, yo ... did you bottom out that cgroups fix? i don't see an email
[14:27] anyone install dist-upgrade 4.2.0-30 this morning and have trouble booting?
[14:27] i had to revert back to 4.2.0-27 in order to get it working
[14:28] bjf, kamal ^
[14:28] HerrAmeise, what were your symptoms
[14:28] you are certainly the first report I have seen of issues with that update
[14:29] Unity would not start up
[14:29] so it went through the normal boot sequence
[14:29] and then just hung forever
[14:30] I couldn't even open a terminal
[14:30] HerrAmeise, ugg, sounds bad. could you file a bug against the kernel "ubuntu-bug linux" from your working kernel
[14:30] so I had to go to grub and boot with a different kernel version
[14:30] and put whatever details you have in it, and tell us the bug# here
[14:30] yup no problem
[14:30] it is also worth looking in syslog to see if there was anything reported when it broke
[14:30] yep i can do that one sec
[14:31] i am assuming you received the kernel via a simple update (update manager or apt-get dist-upgrade sort of thing)
[14:32] yea i did apt-get dist-upgrade
[14:32] didn't build it myself or anything crazy
[14:32] and its hard to not get all the pieces you need as a complete set when its done that way
[14:33] (so that eliminates one possibility)
[14:33] once you have a bug, i'll paste some initial kernels to test to try and figure out which of the three updates is at fault
[14:33] apw, news to me. but -30 only has been in -updates for less than 24 hrs.
[14:34] yep, that indeed
[14:34] apw: ok, I am also running this on a VM at work if that is relevant
[14:34] VMware Workstation 10
[14:35] HerrAmeise, VMs would be a common test subject but mostly in KVM
[14:42] HerrAmeise, got a bug # yet ?
[15:06] apw: did our uploads fix your kernel adt problem?
[15:07] stgraber, yes ... final one just went green in the last 15m
[15:07] (for xenial)
[15:07] good!
[15:19] stgraber, and we're seeing ppc64el lxc ADT failures on trusty regardless of kernel by the looks of it, is this expected ?
[15:19] stgraber, and if not i'll file you another bug :)
[15:20] apw: is there any other way to report bugs other than through Apport?
[15:21] it's really a PITA
[15:21] in theory you can ask apport to file the info to a blob you can move to another machine and submit from there
[15:22] and its not just an application crash
[15:25] btw i upgraded the kernel to 4.2.0-30 on my 32-bit Ubuntu VM and the same thing happened
[15:25] so definitely able to replicate the error
[15:25] first one was 64-bit
[15:25] it is as likely a vmware related issue as anything
[15:27] true
[15:28] i'll try natively when i get home tonight
[15:29] HerrAmeise, the first debugging steps are to try -28 and -29 to see which of 28,29,30 is the first broken one
[15:29] https://launchpad.net/ubuntu/+source/linux/4.2.0-29.34
[15:30] https://launchpad.net/ubuntu/+source/linux/4.2.0-28.33
[15:30] binary packages for those are in the librarian ^
[15:37] 33674
[15:38] oops sorry wrong window
[15:41] apw: the latest ppc64el failure on trusty indicates a DC network failure
[15:41] unable to reach the gpg network and cloud-images.ubuntu.com
[15:42] stgraber, i'll ask for them again and see if it goes away then
[15:42] the rest of your results look good so I'd say your kernel is fine, it's just the test runner having some network difficulties
[15:43] I know that IS changed the squid proxy IP recently, could be that the ppc64el VMs don't have the right firewall rule or something
[15:43] if it fails again, we'll involve pitti
[15:43] stgraber, ack, will let you know
[15:55] apw: sorry, no, had some technical difficulties. will try to send it out this morning
[15:56] hallyn, the pending fixes which break you are time sensitive, so i would like to get to a place where i have a plan to add something or rip something and upload at the latest tomorrow
[15:57] apw: should i send a patch first upstream or first to ubuntu-kernel@?
[15:58] hallyn, either is fine, if you are confident in the fix i can apply it while upstream grinds on it
[15:58] we're still in a reasonably flexible period so we can rip it and replace it if upstream has a better idea
[15:59] as i assume my other option is to rip the other thing you applied which causes the issue
[15:59] the cgn i think it was
[15:59] apw: http://paste.ubuntu.com/15181046/ <- does that mean anything to you?
[16:00] (it kinda blocks me when the server hosting all my work keeps hanging with crap like that - from 3.13 to 3.16 and now on 4.2)
[16:00] if it might be hw then i'll just $%)(*$%)($ switch
[16:01] i wonder if that could be the fuse cve i was just reviewing
[16:01] oh
[16:01] hallyn: do you hit the WARN_ON at the end of the cgroup_mount after your change?
[16:02] which warn_on?
[16:02] hallyn: was also thinking it might make sense to make kernfs use sget_userns with init_user_ns always, haven't had time to really think it through yet though
[16:02] the one in the if (pinned_sb) block
[16:03] sforshee: i don't think so
[16:03] sforshee, you know fuse well, is fuse_fill_write_pages() used on the write or read path ?
[16:04] it ought to be write, but hey, nothing is clear in this world
[16:04] sforshee: we cannot have cgroup mount use init_user_ns always, if you end up doing the comparison for the sb.
[16:04] apw: write it would seem, invoked by fuse's write_iter callback
[16:05] sforshee, thanks, not that cve then hallyn
[16:05] sforshee, that stack trace might be something you grok better than i: http://paste.ubuntu.com/15181046/
[16:05] apw: trying to decide whether to halt my world to have the people hosting the server check the hw for 10 hours
[16:06] apw: looks to me like the kernel is hung waiting for userspace to respond
[16:06] hallyn, if it is always that, it looks to my shallow knowledge of fuse that it is waiting on a userspace provider, and isn't interruptible
[16:06] sforshee, ok you see the same as me
[16:06] ok, thx :)
[16:06] hallyn, so i'd not be keen to blame h/w but whatever crap is mounted on fuse
[16:06] is that lxcfs :)
[16:07] hm, could be.
[16:07] sforshee, do we really hand things to uspace and wait in an uninterruptible way for it to respond, that sounds mad to me
[16:07] in that case the only thing i can think is that the kernel builds make oom happen killing it (bc nothing else was exercising fuse),
[16:07] except i've got 42g ram
[16:08] hallyn, then you'd have an oom in there
[16:09] hallyn, and it looks to start right there, with something not it before
[16:10] hallyn: going back to cgfs ... prior to my changes it was going to reuse an existing superblock. Now it still wants to but sget refuses because it's a different userns. Is that right?
[16:15] apw: I think fuse does wait in an uninterruptible way to respond. But there is some kind of abort connection sysfs node to break those waits.
[16:32] hallyn: it does seem to me that it will be possible to hit that WARN_ON(new_sb) if using kernfs_mount_ns. Does that represent some real problem?
[16:33] hallyn: also I'm not sure what you were getting at wrt using init_user_ns always
[16:33] in effect that's what's happening before we have s_user_ns. But like I said I need to think it through some more.
[16:38] sforshee: just to make sure, you see why i need something like it right?
[16:39] looking at fs/sysfs/mount.c, i think i just need to grab/release the ns (i suppose sb release could be done lazily)
[16:39] hallyn: I think so. Previously you ended up reusing a superblock, but now sget returns EBUSY because you're in a different userns.
[16:39] right
[16:40] and by passing a namespace you force the kernfs test function to not match the old superblock
[16:40] but there seems to be something inherent to this code that expects to reuse the superblock in some cases
[16:40] is there? i was wondering that but didn't see it,
[16:40] just look at pinned_sb
[16:41] is there a simple way we could re-use it?
[16:41] without hardcoding cgroupfs in sget_userns
[16:42] well, that's where the though of doing sget_userns(..., &init_user_ns) in kernfs_mount_ns came from
[16:42] *thought
[16:42] or I was also considering whether maybe that check only makes sense for device-backed mounts
[16:44] yeah it doesnt make sense for i.e. sysfs or proc, right? you can't get another userns's mount there
[16:44] devpts?
[16:45] i wonder whether this will quietly break containers doing 'mount -t devpts' without -o newinstance
[16:46] well we pass file_system_type in, so i guess we *could* filter on cgroupfs; how would we tell whether it's blockdev-backed in sget?
[16:47] you can't mount devpts from !init_user_ns without newinstance it appears
[16:47] oh, good
[16:48] there's a flag in the fs type that says whether or not it's device backed
[16:48] FS_REQUIRES_DEV
[16:49] jsalisbury: hi
[16:49] so if (type->flags & FS_REQUIRES_DEV && user_ns != old->s_user_ns) ?
[16:49] jsalisbury: finally read what you set up for me. you have outdone expectations, thanks.
[16:50] hallyn: yeah, something like that. Still thinking though.
[16:50] I'll smash through those sometime before the end of lunch today, expect an update in something like 2-4 hours. I'm assuming that our final kernel is top-of-branch, with the identified commit reverted?
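The condition floated at 16:49 can be sketched as a small Python model. This is only an illustration, not kernel code: `FsType`, `SuperBlock` and `must_refuse_reuse` are hypothetical names, the flag values are placeholders for this model, and namespaces are represented by plain strings. It assumes the idea is that an existing superblock is refused (the EBUSY case mentioned at 16:39) only when the filesystem is device-backed and the mounter's userns differs from the one recorded on the superblock.

```python
from dataclasses import dataclass

# Flag names follow the chat; the numeric values are placeholders for this model.
FS_REQUIRES_DEV = 0x1
FS_USERNS_MOUNT = 0x8


@dataclass(frozen=True)
class FsType:
    """Hypothetical stand-in for a filesystem type with a flags bitmask."""
    name: str
    flags: int


@dataclass(frozen=True)
class SuperBlock:
    """Hypothetical stand-in for an existing superblock and its owning userns."""
    fs_type: FsType
    s_user_ns: str


def must_refuse_reuse(fs_type: FsType, user_ns: str, old: SuperBlock) -> bool:
    """Model of: type->flags & FS_REQUIRES_DEV && user_ns != old->s_user_ns.

    True means the existing superblock would not be reused.
    """
    device_backed = bool(fs_type.flags & FS_REQUIRES_DEV)
    return device_backed and user_ns != old.s_user_ns


# A pseudo filesystem (no backing device) versus a device-backed one.
pseudo_fs = FsType("pseudo_fs", FS_USERNS_MOUNT)
block_fs = FsType("block_fs", FS_REQUIRES_DEV)

old_pseudo = SuperBlock(pseudo_fs, "init_user_ns")
old_block = SuperBlock(block_fs, "init_user_ns")

# Mounting from a different userns: only the device-backed type is refused.
print(must_refuse_reuse(pseudo_fs, "container_ns", old_pseudo))
print(must_refuse_reuse(block_fs, "container_ns", old_block))
```

In this model a pseudo filesystem mounted from a container's userns can still share the superblock created in the initial namespace, which is the reuse the cgroup mount path appears to rely on, while device-backed mounts keep the stricter check.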
[16:53] hallyn: so essentially s_user_ns is used for 2 things. First is translating ids for the backing store, which doesn't apply to pseudo filesystems.
[16:54] the second is for privileges towards the superblock. For cgroups/sysfs do we really want root in the userns to have privileges towards the superblock?
[16:55] though if they aren't they can't remount
[16:58] sforshee, well in the case of cgfs cgroup_mount() guards remount, but i'm not sure about proc and sysfs
[16:58] i would assume so
[17:00] else that would have been an issue already since we allow mount on those
[17:01] sforshee: it seems we can assume it's safe if it's not dev-backed and it already has FS_USERNS_MOUNT
[17:02] hallyn: assume what's safe? Skipping the check in sget_userns?
[17:03] sforshee: yes
[17:05] hallyn: I think at minimum it's probably okay as an interim fix while we decide what the best fix is
[17:10] bjf, what's the minimal amount of maas functionality required to run kernel-maas testing for s390x?
[17:11] as far as i can tell that enablement is currently stagnating, and i'm wondering if that can be expedited somehow.
[17:13] xnox, right now i only deal with bare-metal via maas
[17:14] bjf, and it needs to boot to ssh right?
[17:14] xnox, yes
[17:15] bjf, and you do use the powercycle functions in maas right? e.g. poweroff/on/reboot?
[17:15] xnox, yup
[17:15] bjf, ok, cool.
[17:17] apw, I had trouble installing from the PPA due to dependency issues .. posted it to the bug
[17:17] apw, https://bugs.launchpad.net/ubuntu/+source/initramfs-tools/+bug/1548120
[17:17] Launchpad bug 1548120 in initramfs-tools (Ubuntu) "[xenial][initramfs-tools] support uppercase and lowercase uuids" [High,In progress]
[17:19] manjo, those deps are right
[17:19] manjo, they are internal deps making sure the bits are at the same version from the same source package
[17:21] apw, hmm I have xenial-proposed enabled ..
[17:22] apw, should I disable that 1st ?
[17:22] manjo, nope shouldn't matter
[17:22] manjo, did you add the ppa or download the .debs ?
[17:22] added your repo
[17:23] etc/apt/sources.list.d/apw-ubuntu-initramfs-tools-test-xenial.list
[17:23] using apt-add-repo
[17:23] initramfs-tools-bin_0.122ubuntu5~rc1_amd64.deb (81.7 KiB)
[17:23] initramfs-tools-core_0.122ubuntu5~rc1_all.deb (116.8 KiB)
[17:24] initramfs-tools_0.122ubuntu5~rc1_all.deb (84.0 KiB)
[17:24] well that PPA has all three of those in there
[17:24] so your deps should be found from the PPA
[17:24] initramfs-tools:
[17:24] Installed: (none)
[17:24] Candidate: 0.122ubuntu5~rc1
[17:24] Version table:
[17:24] 0.122ubuntu5~rc1 500
[17:24] 500 http://ppa.launchpad.net/apw/initramfs-tools-test/ubuntu xenial/main arm64 Packages
[17:24] Candidate: 0.122ubuntu5~rc1
[17:24] Version table:
[17:24] 0.122ubuntu5~rc1 500
[17:24] 500 http://ppa.launchpad.net/apw/initramfs-tools-test/ubuntu xenial/main arm64 Packages
[17:25] apw, ah -bin comes from ports
[17:25] OH its bloody arm64
[17:25] Candidate: 0.122ubuntu4
[17:25] Version table:
[17:25] *** 0.122ubuntu4 500
[17:25] 500 http://ports.ubuntu.com/ubuntu-ports xenial-proposed/main arm64 Packages
[17:25] hang on
[17:25] ok ☺
[17:25] apw, welcome to my world
[17:29] manjo, ok in about 10m there will be a ~rc2 in there built for arm64 as well
[17:30] cool
[17:30] will post results to the bug
=== zequence_ is now known as zequence
[17:47] hallyn: are you already preparing a patch for the cgfs issue or should I go ahead and make one?
[17:48] sforshee: I'm looking at the code, but i'm not sure i'm doing it right
[17:48] (adding an extra helper fn which tries to get the logic right of whether we need to check the user_ns)
[17:49] sounds more complicated than what I was thinking ...
[17:49] lemme finish this up and pastebin it and you can tell me i'm wrong :)
[17:49] sounds good
[17:51] sforshee: http://paste.ubuntu.com/15181923/ ?
[17:54] seems to be building anyway
[17:55] hallyn: I think that should work, but the FS_USERNS_MOUNT seems unnecessary
[17:56] sforshee: so the idea of that is that any virtual fs which we cannot currently mount in a userns will be restricted to your own userns...
[17:56] agreed not sure if it's needed, and it may be reckless... might break things...
[17:56] but i wasn't certain that it would be safe otherwise
[17:56] no shouldn't break things - it just allows what you had designed to work the way you meant it to for those filesystems :)
[17:56] but if you can't mount it in a user_ns then both namespaces are always init_user_ns anyway
[17:57] sforshee: oh, you don't relax the need for FS_USERNS_MOUNT, right
[17:57] so you're right, shouldn't be necessary so we don't need the helper really
[17:58] sforshee: so i'll just build and test http://paste.ubuntu.com/15181976/
[17:58] if you can do a cleaner version of that then even better
[17:58] hallyn: or even http://paste.ubuntu.com/15181975/
[18:00] right
[18:00] do you mind sending that in then? :)
[18:00] you want to test it first?
[18:01] is building my version enough or do you want me to do your verbatim version?
[18:03] hallyn: I'm probably going to build my version either way before I send it, so I can just give you that version to test
[18:03] I hate to send patches without build testing, typos can be easy to overlook
[18:10] builds taking forever
[18:22] HerrAmeise, apw, bjf: So far, I'm unable to reproduce Herr's issue. In a 15.10 amd64 qemu vm, I can successfully boot linux-image-4.2.0-{27,28,29,30}-generic with no apparent trouble.
[18:24] kamal, thanks for looking at that
[18:35] sforshee... and still building. if you get a build, so long as you can 'lxc launch cxenial x1' and x1 boots (gets >3 processes and gets an ip address) then it's good
[18:37] hallyn: mine's still building too
[18:45] jsalisbury: back to you.
[18:45] and thanks muchly for making it less painful
[18:45] lamont, thanks, I'll take a look
[18:46] it's also good to have my gut confirmed. :D
[18:47] lamont, I'll build a test kernel with that commit reverted, if it fixes the bug, I'll get in touch with the patch author
[18:49] ack. scream when
[18:52] hallyn: http://people.canonical.com/~sforshee/cgns/, updating my vm to test it now
[18:53] jsalisbury: fwiw, http://global.shuttle.com/products/productsDetail?productId=1480
[18:56] sforshee: success!
[18:57] i think
[18:57] yeah. dunno why there is a 'lxc' cgroup there, but ...
[19:00] hallyn: seems to work for me too
[19:00] I'll send it
[19:06] apw, hallyn: patch sent
[19:08] umm yeah disregard the pull request, I'll send a new one
[19:11] sforshee: awesome, thx
[19:30] jsalisbury: hi, again
=== philipballew is now known as Guest44851
=== RAOF_ is now known as RAOF