[00:12] <mwhudson> there's nothing in trusty-proposed currently that's going to eat my system, right? :)
[01:23] <Unit193> Hello, Xubuntu plans to upload a xfce4-power-manager to correctly build a plugin, and it's stuck/waiting on Bug #1361459.
[03:37] <jamin> xnox, there's a second breakage in oem-config
[03:39] <jamin> I've worked around it, looks like a change with python3 as the code looks the same as what was working in python2, but FDs are too aggressively closed, resulting in the handle to /dev/urandom being closed before a tempfile can be made
[03:52] <SpamapS> wow.. getting 107kB/s to archive.ubuntu.com from my 30Mbit connection here in Los Angeles..
[03:52] <SpamapS> where do the archive ops people hang out?
[04:19] <jamin> https://bugs.launchpad.net/ubuntu/+source/oem-config/+bug/1362920
[04:27] <jamin> xnox, bug 1362920 contains what appears to be the final blocker to getting oem-config to work on server installs again
[04:39] <pitti> Good morning
[04:40] <pitti> infinity, ogra_: no, I just need VERSION_ID and NAME in /etc/os-release for apport, nothing else; I changed the rest just for consistency
[04:41] <pitti> infinity, ogra_: we can change some bits back if necessary; but RELEASE certainly should not be 14.10, that'd be a lie?
[04:41] <pitti> and codename, too; 14.09 != utopic
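A minimal sketch of the lookup pitti describes, assuming a tool only needs NAME and VERSION_ID. The helper name `read_os_release` and the sample file are illustrative and not apport's actual code; the sketch relies on os-release being a list of shell-compatible variable assignments.

```shell
#!/bin/sh
# Hypothetical helper: print "NAME VERSION_ID" from an os-release style file.
# The path is a parameter so the sketch can be tried on a sample file.
read_os_release() {
    file="${1:-/etc/os-release}"
    # Source in a subshell so the variables do not leak into the caller.
    (
        . "$file" || exit 1
        printf '%s %s\n' "$NAME" "$VERSION_ID"
    )
}

# Sample data only; the values mirror the ones discussed in the channel.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
NAME="Ubuntu RTM"
VERSION_ID="14.09"
EOF
read_os_release "$sample"
```

Called without an argument, the helper reads /etc/os-release on the running system.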
[04:41] <pitti> sergiusens: what is a flo?
[04:43] <pitti> sergiusens: what do you mean with "ignored"? the rule isn't being applied, or udisks still treats it as a removable disk?
[04:44] <pitti> cjwatson: triggering autopkgtests from buildds is a bit early, as they need to be published and installable
[04:44] <pitti> cjwatson: running them *on* the buildds only works for tests which are simple enough to work in a chroot
[04:44] <pitti> cjwatson: while that's true for the majority, a sizable chunk need a container or even a full VM
[04:50] <pitti> lifeless: thomi is asking about an update of python-testtools to latest upstream; is it ok if I do that in Debian, or do you want to do that?
[04:50] <thomi> oh, I just asked in #subunit :D
[04:51] <thomi> parallel nagging :)
[05:25] <cjwatson> pitti: I don't mean on the buildds as in during normal package builds
[05:26] <cjwatson> pitti: I mean as a different job type that happens to be dispatched by launchpad-buildd
[05:27] <cjwatson> pitti: So it could do anything that can be jammed into launchpad-buildd, though if it needs starting a fresh VM as part of the tests then we'd have to think about whether nested virt is going to work ...
[05:28] <pitti> cjwatson: I tried nested KVM some six months ago, it was devastating :(
[05:28] <cjwatson> pitti: (Or run on the non-virtualised builders, but those don't scale well and ultimately we should be planning for them to go away)
[05:28] <pitti> maybe utopic's kernels and qemu are better, but for now we try to avoid it
[05:28] <cjwatson> pitti: Does it need to run a VM under control of some other part of the job, or would it be sufficient if the entire job were run in a fresh VM?
[05:28] <pitti> cjwatson: but actually the direction has been to distribute the jobs through rabbitmq and use a proper cloud (currently HP cloud)
[05:29] <cjwatson> pitti: Launchpad's virtualised builders are a proper cloud, FWIW, but OK ...
[05:29] <pitti> cjwatson: a fresh temporary VM is what we need; the control can even happen from a remote machine (although a local machine is more robust and avoids the assumption that ssh works all the time)
[05:30] <pitti> cjwatson: I mostly know that just running a gazillion jobs on the 4 machines that we currently have doesn't work
[05:30] <pitti> every gcc upload causes jamming if there's also upgrade and dkms jobs running in the background
[05:31] <cjwatson> We have three machines running all of Launcompute nodes running all Launchpad's dvirt builders right now, soon to be six
[05:31] <cjwatson> urgh, let's try that again
[05:31] <pitti> cjwatson: anyway, if we can/want to use the "LP builder cloud", that seems fine
[05:31] <cjwatson> We have three compute nodes running all Launchpad's virt builders right now, soon to be six
[05:32] <cjwatson> Well, if you have a solution already with HP cloud, then maybe there are other more useful things to focus on
[05:32] <cjwatson> I just thought it might be worth some consideration
[05:33] <pitti> cjwatson: ah, not yet "have"; the CI airline is using it, and vila has worked on running autopkgtests there
[05:33] <pitti> but it's not deployed yet
[05:33] <pitti> cjwatson: will the LP cloud sustain 50 job requests in parallel? it might quickly run into the same scaling problems?
[05:34] <pitti> cjwatson: I'm not emotionally attached to using one cloud or the other; I'm just eager to get rid of jenkins and manually administrating servers
[05:35] <cjwatson> pitti: Seems to sustain full parallelism for its number of builders right now; the thing that sucks is lots of parallel resets, but we mostly avoid storms there by cleaning at the end of builds rather than at the start
[05:35] <cjwatson> Which AIUI tends to spread things out a bit
[05:35] <pitti> *nod*; good to keep that option in mind
[05:36] <cjwatson> pitti: The new infra is certainly noticeably higher-performance than what we used to have, which took something like 20x the number of machines
[05:36] <pitti> cjwatson: ci.debian.net currently doesn't have any parallelism at all; it's a single machine which does everything and just runs amd64
[05:36] <pitti> cjwatson: but I suppose Debian's buildds couldn't take the load
[05:36] <pitti> cjwatson: I've been developing this in close coordination with terceiro to have something that works for both D/U
[05:37] <pitti> cjwatson: (if he's at debconf, say hello :) )
[05:37] <cjwatson> Debian's buildd network has rather less capacity than ours for the most part, and wanna-build isn't at all set up to be able to dispatch non-package-build jobs
[05:37] <cjwatson> I stopped into the last bit of his debci talk today, though missed most of it
[05:37] <cjwatson> Haven't actually spoken to him yet though
[05:38] <cjwatson> I actually do think doing it from the buildd network would be an even better idea for Debian than it is for Ubuntu, since Debian is often operating under tighter hardware constraints and that's exactly when you need to consolidate your resources
[05:38] <cjwatson> But I don't have time to implement it in Debian, so it's just talk :)
[05:39] <pitti> it'd certainly settle the "multiple architectures" aspect, although on most you can probably just get LXC; but that gives you 95% of what we need, and the remaining 5% can then just run on x86 (not a biggie)
[05:41] <cjwatson> Though for Ubuntu we'd need to have non-x86 working in scalingstack before we get that benefit.
[05:41] <cjwatson> (Which is on the roadmap, but I don't know when)
[05:42] <pitti> for armhf/ppc64el we can probably just continue to use the bunch of "static" machines that we have; we shouldn't lock ourselves into migrating them all at once
[05:44] <errekerre> hi there
[05:45] <errekerre> do you know about linux?
[05:46] <errekerre> first, do you know Spanish?
[05:46] <errekerre> do you?
[05:46] <pitti> errekerre: sorry, this is an English channel
[05:47] <errekerre> oh, for goodness' sake
[05:47] <errekerre> is there a channel where people speak Spanish?
[05:55] <infinity> pitti: It's not really a lie, it's a snapshot of 14.10
[05:55] <pitti> infinity: it's called 14.09 in Launchpad..
[05:56] <infinity> pitti: Sure, but whatever.
[05:56] <pitti> so by not calling it what it is in LP we can't match it to bugs/crashes/etc.
[05:56] <infinity> pitti: There are build systems that define their behaviour based on version and codename, you're opening a can of worms by suddenly defining a new one.
[05:56] <pitti> and "Ubuntu RTM" doesn't have a 14.10 or utopic release
[05:57] <pitti> infinity: well, I didn't start with this -- we defined a wholly new distribution for this (not just a new Ubuntu release)
[05:57] <pitti> and that wholly new distribution doesn't have "utopic" or "14.10" releases..
[05:57] <pitti> so it doesn't make sense to talk about Ubuntu RTM utopic
[05:57] <infinity> pitti: Yeah, I know, but it's a short-lived fork (or bloody well should be), not something we should be adjusting tooling and potentially a bunch of packages for.
[05:58] <infinity> GCC, for instance (to pick one at random) picks its defaults based on release codenames.
[05:58] <pitti> infinity: hm, we spent weeks on fixing all our tools (LP, buildds, ddebs, retracers, translations, langpacks, etc) for a second distro
[05:58] <infinity> Possibly bugs in packages that do so, but a lightweight fork for a pre-release freeze isn't the time to find those bugs.
[05:59] <infinity> pitti: We fixed infrastructure, but are you prepared to find all the packages that will break if rebuilt against a base-files with a weird unknown distro?
[05:59] <pitti> we already found bugs because it's wrong -- add-apt-ftparchive, apport, etc.
[06:00] <zyga> infinity: hi
[06:00] <pitti> infinity: well, perhaps we should have considered that before we put that giant pain of "maintain a parallel distro" upon us?
[06:00] <pitti> instead of at least just a new ubuntu release
[06:00] <zyga> infinity: we've finished automatic testing of 3.2 yesterday but we got a few issues that needed to be re-checked
[06:00] <pitti> infinity: now it's broken either way
[06:00] <zyga> infinity: I'm just checking if we have the results of that
[06:00] <infinity> pitti: Really, modulo a few small forks, it *is* utopic/14.10 (though a pre-release), could we not just figure out how to make apport add an extra key only in the rtm version?
[06:01] <infinity> pitti: ie: patch apport in ubuntu-rtm instead of changing base-files and trying to find all the package bugs?
[06:01] <pitti> infinity: well, we could -- but like you said, are you prepared to find out everything which is now broken due to that? :-)
[06:02] <infinity> pitti: apport/whoopsie/daisy seems like a much smaller set of things to twiddle than "every package that we don't know might be broken".
[06:02] <pitti> e. g. apt thinks its package origin is "Ubuntu RTM"
[06:02] <pitti> infinity: how do you know that in advance?
[06:02] <pitti> infinity: if there's lots of packages broken by correcting os-release, then there's at least as many packages potentially broken by not fixing it
[06:02] <pitti> we can't know
[06:02] <infinity> pitti: Err, what would be broken by not changing it?
[06:03] <pitti> infinity: yes, we can certainly hack apport to hardcode "Ubuntu RTM" "14.09" and ignoring /etc/os-release, and do that for other things like add-apt-ftparchive too
[06:03] <pitti> but we have to find all those
[06:03] <infinity> How does add-apt-ftparchive relate?
[06:03] <pitti> infinity: adding a PPA can't add ubuntu/utopic, it has to add ubuntu-rtm/14.09 apt sources
[06:03] <infinity> And I assume you mean add-apt-repository?
[06:04] <pitti> err, yes
[06:04] <pitti> not sure whether aptdaemon looks at this
[06:04] <infinity> Do we really expect people to build a PPA community around this release whose entire raison d'etre is to not exist very long?
[06:04] <pitti> infinity: we already do -- they are called Ubuntu RTM silos, and people test them all the time
[06:05] <infinity> AFAIK, the goal is to drop it like a hot potato and move to utopic as soon as we can.
[06:05] <pitti> infinity: I certainly hope we can drop it again
[06:05] <pitti> infinity: but then I don't understand why we spent so many manweeks of making sure that our infrastructure can get along with that separate distro
[06:05] <pitti> it seems like an awful lot of investment for something which just lasts for a few weeks
[06:05] <infinity> The silo argument is a developer workflow thing, that's a little less interesting, as I'd hope our developers know how to add apt sources without a magic tool.
[06:06] <infinity> pitti: It was the lesser of many potential evils.  *shrug*
[06:07] <pitti> so how do we tell if we are running on RTM/14.09 or Ubuntu/Utopic without os-release/lsb-release?
[06:07] <infinity> pitti: apport could know by being forked.
[06:07] <pitti> infinity: you could say the same about gcc..
[06:07] <infinity> pitti: Yes, but gcc doesn't need to know, does it?
[06:07] <pitti> or aptdaemon, or add-apt-repository, or what not
[06:08] <infinity> pitti: You're trying to solve a specific problem here (you want bucketing to be different, though I'm not sure I actually see much value in that).
[06:08] <pitti> the entire point of having a correct os-release is that all our tools can look at it and not hardcode stuff
[06:08] <pitti> infinity: you said above that changing os-release breaks it?
[06:08] <infinity> "it"?
[06:08] <pitti> gcc
[06:09] <pitti> infinity | GCC, for instance (to pick one at random) picks its defaults based on release codenames.
[06:09] <infinity> I said it could potentially do so.  I'd have to look at debian/rules to be sure.
[06:09] <infinity> It'll break llvm.
[06:10] <pitti> so we need to grep RTM for usage of /etc/os-release, /etc/lsb-release, and lsb_release, and fix/hardcode stuff to use RTM/14.09 instead where appropriate
[06:10] <infinity> It might even break llvm at runtime, actually, which is even more fun than breaking gcc at build time.
[06:13] <infinity> pitti: I'm not sure I have the energy to discuss this tonight, but fundamentally, it feels like a waste of time to patch in support for a release that isn't going to get any ongoing support.
[06:14] <pitti> infinity: I don't know about the support; it just feels like we have wasted the previous weeks if RTM isn't a longer-lived thing
[06:14] <infinity> If it's a longer-lived thing, we failed miserably to achieve our goals.
[06:15] <infinity> The point of the fork was just to give the rtm people a stable base to work against because their release target is a little over a month before Ubuntu's.
[06:15] <infinity> Once utopic is out, RTM shouldn't be a thing anymore.
[06:15] <infinity> And, indeed, it should upgrade to released utopic.
[06:15] <pitti> infinity: so, don't get me wrong -- I'm ok with hardcoding stuff in apport and ignoring os-release if that's the smaller amount of work; the thing that just makes me grumpy is that now RTM is going to go away in a few weeks?
[06:16] <infinity> pitti: Probably more than a few weeks, I imagine people will iterate on it for a little while, but it should go away shortly after utopic is released unless we're doing something wrong.
[06:16] <infinity> pitti: Maintaining a long-term fork is something we don't have the resources to do, and we'd be crazy to try.
[06:16] <pitti> it was so much pain and effort to get that working, and doing an entirely new release seems absolutely pointless then
[06:17] <pitti> s/release/distro/ I mean
[06:17] <pitti> a new release would have been right
[06:17] <infinity> pitti: The other option was snapshot the entire archive and maintain it without the help of our usual tools, either in an external repo or a giant PPA of doom.
[06:17] <infinity> pitti: The derived distro path was more elegant.
[06:17] <pitti> but that's what we call a distro release..
[06:17] <infinity> pitti: A new release causes some issues in that our release model is linear.
[06:18] <pitti> -updates?
[06:18] <infinity> Err, how would that help?
[06:18] <pitti> and 14.09 is before 14.10, so no problem there
[06:18] <pitti> and if people want to backport fixes from utopic to 14.09, there's 14.09-updates
[06:18] <pitti> anyway, this is all moot now
[06:18] <infinity> You mean everything targeted to 14.10 from now until release would go in updates, so as to not break the release pocket?
[06:18] <infinity> Cause, ew.
[06:18] <pitti> err, no?
[06:19] <pitti> utopic wouldn't change, but some fixes we might want in the 14.09 release, and then utopic is the next release after 14.09
[06:19] <infinity> That would be two open development releases.
[06:19] <pitti> I don't see how it is any different than trusty/utopic, except that it's not 6 months apart but 5, then 1
[06:19] <infinity> Which breaks the world.
[06:20] <infinity> Or opening, forking, and immediately "releasing" 14.09, and then doing everything as SRUs.
[06:20] <pitti> and 14.09 would be stable now, and can receive SRUs
[06:20] <infinity> That might have worked.
[06:20] <infinity> But we didn't do it.
[06:20] <infinity> So, whatever. :P
[06:20] <infinity> It also lends even more legitimacy to an illegitimate thing.
[06:21] <infinity> Like, we wouldn't want it on archive.u.c, etc.
[06:22] <infinity> pitti: Anyhow, I agree it's all a mess, but it's the best mess we could come up with when pressed for options.
[06:22] <pitti> so now we have two entirely different distros and releases which both claim to be Ubuntu 14.10
[06:22] <pitti> I guess an RTM archive grep is in order then
[06:23] <infinity> pitti: My only advice on the matter is to not put too much effort into trying to make it messier.  The RTM archive should focus mostly on polishing the user experience as much as it can, the rw/apt-ppa/etc experience might not need to be perfect.
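The archive grep pitti mentions could be sketched roughly as below, assuming the sources are already unpacked under one directory. The helper name `find_release_users` and the sample tree are hypothetical; the pattern only covers the three interfaces named in the channel, and every hit is a candidate to review rather than a confirmed bug.

```shell
#!/bin/sh
# Hypothetical helper: list files under a source tree that mention any of the
# release-identification interfaces named in the discussion.
find_release_users() {
    tree="$1"
    grep -rlE '/etc/(os|lsb)-release|lsb_release' "$tree" | sort
}

# Made-up sample tree for illustration.
tree="$(mktemp -d)"
mkdir -p "$tree/pkg-a/debian" "$tree/pkg-b" "$tree/pkg-c"
printf 'DISTRO := $(shell lsb_release -cs)\n' > "$tree/pkg-a/debian/rules"
printf '. /etc/os-release\n' > "$tree/pkg-b/setup.sh"
printf 'echo hello\n' > "$tree/pkg-c/hello.sh"
find_release_users "$tree"
```

The listing names the files in pkg-a and pkg-b and skips pkg-c, which references none of the interfaces.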
[06:44] <zyga> infinity: 3.2 is good to go
[06:55] <dholbach> good morning
[07:01] <infinity> zyga: Ta.
[07:53] <lifeless> pitti: DPMT ftw
[07:53] <pitti> lifeless: I'm not officially in that team
[07:53] <pitti> hence I'd rather ask before
[07:53] <lifeless> pitti: ah
[07:54] <lifeless> pitti: so I *think* we moved it into that team
[07:54] <pitti> lifeless: yes, it is
[07:54] <lifeless> pitti: so - long as you follow the protocol (commit metadata to svn etc) I'm cool with you doing occasional uploads; I can't speak for the whole team of course
[07:55] <pitti> lifeless: yes, of course I'd use the svn
[08:41] <pitti> lifeless: uploaded and committed to svn
[08:46] <lifeless> pitti: cool
[09:53] <tjaalton> hrm, sbuild post-build phase triggers some security warning on trusty, started recently..
[09:54] <tjaalton> "problem with defaults entries"
[09:54] <tjaalton> maybe messed things up myself, but no idea where
[09:55] <Riddell> any ppc64el experts able to take a look at sflphone?
[09:56] <Riddell> https://launchpad.net/ubuntu/+source/sflphone/1.3.0-1ubuntu3/+build/6263494/+files/buildlog_ubuntu-utopic-ppc64el.sflphone_1.3.0-1ubuntu3_FAILEDTOBUILD.txt.gz
[10:04] <flexiondotorg> cjwatson, May I DM you quickly?
[10:05] <flexiondotorg> Not a support request.
[11:02] <mardy> Laney: hi! Can I bug you for bug 1029289?
[12:18] <Riddell> who looks after usb-creator these days? there's a fix just been proposed
[12:20] <seb128> Riddell, nobody
[12:20] <Riddell> yes I suspected as much
[12:21] <Riddell> this seems like quite a failing of us all :(
[12:22] <seb128> Laney said he would be interested to look at some of the issues on it iirc (but I might be wrong), I don't think he's interested to become officially maintainer/reviewer though ... maybe check with slangasek if foundation has somebody who could take over it instead of xnox (might need to wait for them to hire somebody)
[12:25] <sergiusens_> pitti: flo is the codename for the "nexus 7 2013 wifi" device
[12:26] <sergiusens_> pitti: the hint is not being applied when looking at the object path for the blocks over udisks2
[12:52] <Saviq> sergiusens_, hey, ciborium is really moody about failing to add a storage device... that I don't have any in the first place
[12:52] <ogra_> moody ?
[12:52] <ogra_> for me it fails all the time on first attempt after reboot
[12:53] <ogra_> but then pisk up the SD just fine
[12:53] <ogra_> *picks
[12:53] <Saviq> ogra_, yeah, I don't have any SD
[12:54] <Saviq> ogra_, and mako doesn't even *do* SD
[12:54] <sergiusens_> Saviq: flo?
[12:54] <Saviq> sergiusens_, mako and krillin
[12:54] <Saviq> ogra_, it's messing with our autopilot tests
[12:54] <sergiusens_> Saviq: really? mako?
[12:54] <sergiusens_> Saviq: image?
[12:55] <Saviq> sergiusens_, https://docs.google.com/a/canonical.com/file/d/0B32jwBcbaPloRGo5OGd3MmhhQVk/edit
[12:55] <sergiusens_> Saviq: not supposed to display anything on mako
[12:55] <Saviq> sergiusens_, latest devel-proposed
[12:55] <sergiusens_> Saviq: rtm?
[12:55] <Saviq> sergiusens_, no, devel-proposed, but the same happens on krillin rtm
[12:56] <Saviq> sergiusens_, r213 mako
[12:56] <sergiusens_> Saviq: I've been running on mako for a week before this landed
[12:56] <Saviq> sergiusens_, rtm@r4 on krillin
[12:57] <Saviq> sergiusens_, run the unity8 ap test, maybe that triggers it (it restarts unity8 repeatedly)
[12:57] <sergiusens_> Saviq: you can "stop ciborium" to get it out of the way; I'm going to need some logs
[12:57] <sergiusens_> Saviq: is this on the ci dashboard?
[12:57] <Saviq> sergiusens_, not *yet*
[12:58] <Saviq> sergiusens_, local run
[12:58] <Saviq> sergiusens_, but I've seen it on our ci too
[12:58] <sergiusens_> Saviq: can you pass me the /home/phablet/.cache/upstart/*ciborium* files in there
[13:00] <Saviq> sergiusens_, https://drive.google.com/drive/#folders/0B32jwBcbaPloWGVVZmE3dkdlM0k
[13:01]  * Saviq abuses drive, wonder if it will allow downloading as an archive
[13:02] <Saviq> nope
[13:02] <Saviq> sergiusens_, http://people.canonical.com/~msawicz/ciborium/ as well
[13:03] <Saviq> sergiusens_, looks like it's trying *all* the unmounted mmc partitions
[13:04] <sergiusens_> Saviq: it's like the udev rule isn't hitting you
[13:04] <Saviq> sergiusens_, does ciborium ship a rule?
[13:04] <sergiusens_> Saviq: no, lxc-android-config does
[13:05] <sergiusens_> Saviq: gdbus introspect --system -p -d org.freedesktop.UDisks2 -o /org/freedesktop/UDisks2/block_devices/mmcblk0 /org/freedesktop/UDisks2/block_devices/mmcblk0p2 | grep System
[13:05] <sergiusens_> Saviq: that should return true
[13:05] <sergiusens_> well, not return, be
[13:05] <Saviq> sergiusens_, false
[13:05] <Saviq>       readonly b HintSystem = false;
[13:06] <sergiusens_> Saviq: grep UDISKS_SYSTEM /usr/lib/lxc-android-config/70-mako.rules
[13:06] <sergiusens_> on mako
[13:07] <Saviq> sergiusens_, ACTION=="add", KERNEL=="mmcblk0*", ENV{UDISKS_SYSTEM}="1"
[13:08] <sergiusens_> Saviq: bah; ogra_ is udev racy?
[13:09] <sergiusens_> Saviq: in all my reboots I haven't seen this problem on mako nor krillin
[13:09] <Saviq> sergiusens_, lemme reboot then
[13:10] <sergiusens_> Saviq: still, you shouldn't be getting that
[13:10] <Saviq> sergiusens_, now it's true
[13:11] <ogra_> sergiusens_, it shouldnt be ... though note that we shut it down while bringing up the container during boot
[13:11] <sergiusens_> hmm, so there is a race somewhere; I wonder where...
[13:11] <Saviq> sergiusens_, let me see if it becomes false during my test run or is it sometimes on reboot
[13:11] <ogra_> (but you know that)
[13:13] <Saviq> sergiusens_, it was my first boot after flashing I think, will try that hypothesis too in a moment
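When chasing a race like this it can help to log the property repeatedly rather than eyeball the full introspection dump. A small sketch, assuming the output format quoted above; the helper name `hint_system` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read introspection text on stdin and print the value
# of the HintSystem property ("true" or "false").
hint_system() {
    awk '/HintSystem/ { value = $NF; sub(/;$/, "", value); print value }'
}

# Sample line copied from the output quoted in the channel.
printf '      readonly b HintSystem = false;\n' | hint_system
```

On a device, the output of a `gdbus introspect --system -p -d org.freedesktop.UDisks2 -o <object path>` call would be piped into `hint_system` instead of the sample line.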
[13:27] <sergiusens_> ogra_: yeah; I might do something hackish and ignore mmcblk0 if this becomes a huge issue and can't figure out that race
[13:27] <sergiusens_> Saviq: first boot indeed takes longer; was it with a wipe?
[13:27] <sergiusens_> I'll give that a go
[13:28] <ogra_> sergiusens_, yeah, i guess it is safe to ignore the rootfs device ... i wouldnt blindly take mmcblk0 though but check where / lives
[13:28] <Saviq> sergiusens_, no wipe, no
[13:28] <sergiusens_> click hooks ran though?
[13:40] <Saviq> sergiusens_, everything else seemed fine
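ogra_'s suggestion to check where / lives, rather than hardcoding mmcblk0, might look something like the sketch below. It assumes a mounts table in /proc/mounts format; the helper name `root_source` and the sample device names are made up, and a system whose / is a loop or bind mount would need more work than this.

```shell
#!/bin/sh
# Hypothetical helper: print the device mounted on / according to a
# mounts table in /proc/mounts format (device, mount point, type, ...).
root_source() {
    table="${1:-/proc/self/mounts}"
    # Later entries shadow earlier ones, so keep the last match.
    awk '$2 == "/" { src = $1 } END { if (src != "") print src }' "$table"
}

# Sample table for illustration; device names are made up.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
rootfs / rootfs rw 0 0
/dev/mmcblk0p23 / ext4 rw,relatime 0 0
/dev/mmcblk1p1 /media/phablet/sdcard vfat rw 0 0
EOF
root_source "$sample"
```

A caller could then skip any block device whose name matches the result instead of skipping everything under mmcblk0.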
[13:49] <Saviq> soo... I've my home on SSD{ btrfs{ cryptfs } }, whenever I'm flashing the phone, or doing something with large files, I get iowaits, some apps would go grey for a few seconds... who can get me ideas how to debug this?
[13:49] <Saviq> sergiusens_, can't get it to be false any more of course
[14:00] <sergiusens_> Saviq: just when you want things to go wrong, they go right :-P
[14:02] <Saviq> sergiusens_, story of my life
[14:02] <Saviq> wait
[14:02] <Saviq> ;)
[14:06] <sergiusens_> I'll wait
[14:30] <sergiusens_> Saviq: even if you can't repro, can you ubuntu-bug ciborium and lxc-android-config with this info?
[14:35] <Saviq> sergiusens_, will do
[15:02] <tedg> jodh, Was going to try to build your cgroup fix for upstart, is there an easy way to get a deb for it?
[15:02]  * tedg is spoiled by bzr bd
[15:02] <tedg> :-)
[15:06] <jodh> tedg: autoreconf -fi && ./configure && make. Then tweak /usr/bin/ubuntu-touch-session to run 'exec /path/to/your/init/init --debug --user 1>&2'
[15:07] <jodh> tedg: note that we still need to fix systemd-shim though.
[15:07] <tedg> jodh, Ah, cool, I think I got the deb building.
[15:08] <tedg> jodh, We modified the cgmanager upstart job so that it'll start before the shim.
[15:09] <jodh> tedg: yeah I know, but see the last sentence in my comment: https://bugs.launchpad.net/ubuntu-app-launch/+bug/1357252/comments/34
[15:09] <jodh> tedg: hallyn tells me that desrt|pdx is the guy we need to poke with a stick
[15:11] <tedg> jodh, I think we're okay in that we clean up the cgroup in the application job today.
[15:11] <tedg> jodh, We make sure all the processes are gone in post-stop
[15:12] <jodh> tedg: my understanding from discussing with hallyn yesterday was that systemd-shim is cleaning up the cgroups (via cgmanager) before the overall job has finished with them.
[15:15] <tedg> jodh, Because they were set as remove-on-empty, but we're not setting that until the job is done now, right?
[15:16] <jodh> tedg: upstart isn't setting that until the job has completed, correct. However, systemd-shim sets remove-on-empty at the logind session level and the upstart session-init inherits that. It's not something upstart can deal with - it's a bug/limitation/feature of systemd-shim :)
[15:18] <tedg> jodh, I'm a bit confused, so wouldn't that only affect the overall session cgroup? Or does it mean that every cgroup in that session cgroup has the remove-on-empty property set?
[15:18] <jodh> tedg: I believe so. hallyn ?
[15:19] <SonikkuAmerica> Is there an instruction list for what ubiquity does when it installs and then removes itself from the target system? ubiquity installed everything, then crashed installing GRUB 2. I booted the partition by hand, and now I need to know what packages to clean up. (This is Utopic, by the way)
[15:21] <SonikkuAmerica> (Or is this a support question for #ubuntu+1 ?)
[15:24] <jodh> tedg: anyway, what you'll find is that my branch works sometimes, but you'll see the occasional "failed to start job" error and if you look in /var/log/upstart/cgmanager.log you'll see entries like: http://paste.ubuntu.com/8179550/
[15:25] <jodh> tedg: note the uid of the removal requestor - it's not the session init, it's systemd-shim.
[15:25] <jodh> tedg: well, actually it's cgmanager.
[15:26] <tedg> Hmm, okay.
[15:26] <tedg> Let's see how it does on the uitoolkit tests.
[15:31] <jodh> tedg: I've raised bug 1363134 (unclear if bug 1355966 is supposed to cover the same request?)
[15:32] <Laney> mardy: there's a new evo/e-d-s 3.12 point release that I want to get in soon-ish
[15:33] <jodh> tedg: an alternative may be to have cgmanager grow a unset-remove-on-empty verb, but that's just horribly fugly.
[16:00] <shadeslayer> could someone explain why I can't write /etc/sddm.conf like this : http://paste.kde.org/pmy4cqrdp
[16:04] <directhex> shadeslayer: because of what you are and aren't sudoing
[16:04] <shadeslayer> right
[16:05] <directhex> shadeslayer: you are only sudoing "sudo log-output -t user-setup chroot /mnt $ROOT cat <<EOF" - the "> /etc/sddm.conf" is done by the host shell
[16:05] <shadeslayer> mhm
[16:07] <frezix> hi, I'm doing a netinstall and I'm now at this stage - http://imgur.com/ba9qcCc - how do I remove that partitioning scheme and start fresh?
[16:08] <frezix> I've asked this in #ubuntu also but no one seemed to have an answer
[16:08] <shadeslayer> directhex: any recommendations on how to fix it
[16:09] <directhex> use "| sudo tee filename" as your output mechanism for things
[16:10] <shadeslayer> directhex: yeah tried that, didn't work
[16:11] <shadeslayer> writes to host machine
[16:11] <shadeslayer> so I used sh -c
[16:11] <shadeslayer> but now I'm unsure how to extract the username from the host
[16:11] <directhex> add speech marks around what you want to sh -c ?
[16:11] <shadeslayer> s/host/target/
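The redirection point directhex makes can be shown without root. In this sketch `cd "$root"` stands in for `sudo chroot "$root"`, and the username and file contents are illustrative assumptions, not what the paste contained. What matters is which shell performs the `>`: quoting it inside `sh -c` hands it to the inner shell.

```shell
#!/bin/sh
# Sketch: 'cd "$root"' stands in for 'sudo chroot "$root"' so this runs
# without root. The point is which shell performs the '>' redirection.
root="$(mktemp -d)"      # pretend target system
mkdir -p "$root/etc"
user="kubuntu"           # hypothetical username, for illustration only

# Unquoted EOF: $user is expanded by the outer shell before the inner
# shell runs. The quoted 'cat > ...' is executed by the inner shell,
# so the file is created relative to the target root.
sh -c 'cd "$1" && cat > etc/sddm.conf' sh "$root" <<EOF
[Autologin]
User=$user
EOF

cat "$root/etc/sddm.conf"
```

With a real chroot the same shape would be `sudo chroot /mnt sh -c 'cat > /etc/sddm.conf'` fed by the heredoc. Writing the delimiter as `'EOF'` would stop the outer shell from expanding variables in the body.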
[16:21] <hallyn> tedg: jodh: remove-on-empty is inherited on mkdir from the parent cgroup
[16:21] <tedg> hallyn, Can Upstart change that?
[16:21] <hallyn> tedg: not currently
[16:22] <hallyn> tedg: I'm not sure whether it's better to provide an API method for that in cgmanager, or to just fix systemd-shim
[16:22] <hallyn> i think the latter
[16:22] <jodh> hallyn: +1.
[16:22] <tedg> I guess it seems like a reasonable default to me for systemd to set.
[16:22] <hallyn> i just wish someone more competent would do it :)  but maybe if i can get rharper to look at the qcow2 corruption i can fast-track the systemd fix
[16:23] <hallyn> tedg: no, it shouldn't be needed, logind does ask for cleanup
[16:23] <hallyn> we just set remove-on-empty bc we ignore the cleanup requests
[16:23] <hallyn> it's purely a dbus callback issue, *should* be simple
[16:23] <rharper> hallyn: stop the presses! qcow2 has data corruption issue ?
[16:23] <tedg> hallyn, Oh, okay.
[16:23] <hallyn> rharper: holy cow.  yes, since 1.7
[16:24] <hallyn> rharper: it's killing me!
[16:24] <rharper> hallyn: that's why we never approved qcow2 use while I was at the LTC for IBM products
[16:24] <rharper> we pushed in QED as a format because qcow2 had so much trouble;  it's improved significantly, but with all of the features... hard to ensure you've found all of the issues
[16:24] <hallyn> rharper: sorry, another meeting, if you're willing to look at it i'd *love* to talk to you in a bit about it
[16:25] <hallyn> rharper: but no, this is regression.  it was fine in 1.5
[16:25] <hallyn> that's why security team (jdstrand and sarnold) always run the old 1.5 version!!!  crazy
[16:25] <rharper> if it's  regression I'd think we could lean on kwolf and stefanha in #qemu once we have a proper test case/reproduce
[16:26] <rharper> hallyn: going to head out in a bit, but post lunch we should chat...
[16:29] <doko> jibel, pitti: the python2.7 and python3.4 autopkg tests show regressions, versions which succeeded before. any changes in the test setup?
[17:39] <cjwatson_> flexiondotorg: If you're going to send a private message, just do so.  Don't ask first in a channel and then wait for timezones to match before I can respond.
[17:46] <hallyn> rharper: so there's a few open bugs on the same issue, but it's basically https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1292234
[17:47] <hallyn> rharper: i have NOT reproduced it with his testcase myself.  what i HAVE had is two (or more) regular uvt-kvm vms go corrupt as i was using them heavily (in one building-testing libvirt many times, in another lxc)
[17:47] <hallyn> in the latest one, i suddenly found that ~/build3 was actually pointing to the code in ~/build2 (!).  Then I tried to reboot to see if it would fix it, but it wouldn't reboot
[17:48] <hallyn> so if i could only reliably reproduce then i would bisect.  but i can't reliably reproduce.  only when i'm trying to get something done, after hours of work :)
[17:48] <hallyn> i'm going to try just almost-filling the disk right now to see if that is the trigger.
[17:48] <hallyn> rharper: when you get back we can either chat here or do a hangout.
[17:49] <sarnold> hallyn: good idea, I seem to recall at least one of my qcow2s was huge
[17:52] <hallyn> add 3G of /dev/zero, leaving 1.4G free, let's see how that works
[17:52] <hallyn> hopefully qcow2 doesn't compress it all away :)
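Filling a guest disk with zeros, as hallyn describes, is commonly done with dd. A minimal sketch with a tiny size for illustration; the helper name `fill_with_zeros` is hypothetical, and whether the zeros end up occupying real space in the image is exactly the open question raised above.

```shell
#!/bin/sh
# Hypothetical helper: write SIZE_MB mebibytes of zeros to a file.
fill_with_zeros() {
    out="$1"
    size_mb="$2"
    dd if=/dev/zero of="$out" bs=1M count="$size_mb" 2>/dev/null
}

# Tiny size for illustration; the test in the channel used about 3G in the guest.
fill="$(mktemp)"
fill_with_zeros "$fill" 4
wc -c < "$fill"
```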
[18:08] <cjwatson> Riddell: sflphone> That looks like it just needs a config.guess/config.sub update in that subdirectory.
[18:39] <jdstrand> hallyn: you are using snapshots, correct? eg:
[18:39] <jdstrand> $ qemu-img info ./sec-utopic-amd64.qcow2
[18:39] <jdstrand> image: ./sec-utopic-amd64.qcow2
[18:39] <jdstrand> file format: qcow2
[18:39] <jdstrand> virtual size: 8.0G (8589934592 bytes)
[18:39] <jdstrand> disk size: 11G
[18:39] <jdstrand> cluster_size: 65536
[18:39] <jdstrand> Snapshot list:
[18:39] <jdstrand> ID        TAG                 VM SIZE                DATE       VM CLOCK
[18:39] <jdstrand> 1         pristine                  0 2014-08-25 18:15:09   00:00:00.000
[18:40] <hallyn> jdstrand: i'm using your exact snapshotting recipe, with the forhallyn.img you provided, yes.  that hasn't worked.  what has resulted in corruption for me is simple vms using qcow2 with a backing file (but no snapshots)
[18:44] <hallyn> zul: hm, getting a bunch of failures and test skips, looks like virtinst problem.  i don't see a python-libvirt 1.2.7, could that be the problem?
[18:45] <zul> i dont have libvirt-python 1.2.7 yet
[18:45] <zul> gimme a sec
[18:47] <jdstrand> hallyn: interesting. I hope there aren't two bugs-- one in bs and one in snap...
[18:47] <rharper> hallyn: looking at the bug... once I'm up to speed, let's talk
[18:48] <rharper> jdstrand: hallyn one question on the images -- is it only reproducible using the images created back on 1.5 with newer qemu ?
[18:48] <hallyn> jdstrand: yeah, it's not impossible.  also i'm back running with ksm, so it's possible that my bs bug ends up being a ksm bug
[18:48] <hallyn> rharper: no i don't think that's the case.  pretty sure jdstrand can ruin any new image he creates
[18:49] <sarnold> :)
[18:49] <rharper> ok, that's useful to know;
[18:49] <rharper> and we're always using the qemu/libvirt defaults for cache mode on the disk image, and the host filesystem ?
[18:50] <rharper> no one's doing fancy dev stuff like cache=unsafe and disabling barriers on their fs ?
[18:51] <hallyn> rharper: pretty sure not.  sarnold: uvt doesn't do any of that right?
[18:51] <rharper> uvt can
[18:51] <hallyn> hell i've only had failures with the defaults!
[18:51] <hallyn> when i run kvm by hand i've never had a failure, and by hand i always do 'cache=unsafe' :)
[18:51] <rharper> using uvt or just libvirt ?
[18:51] <sarnold> rharper: the security team uses marc's uvt, not robie's uvt..
[18:51] <hallyn> using uvt
[18:51] <hallyn> no, uvt-kvm
[18:51] <hallyn> *I* use uvt-kvm, security team uses uvt
[18:52] <rharper> ok, then someone who knows uvt needs to help specify what cache mode is being selected (or if none at all, in which case we get cache=writethrough)
[18:52] <sarnold> I don't see any "unsafe" in /etc/libvirt/**
[18:52] <rharper> as well as file systems in use and any uptes
[18:52] <hallyn> sarnold: filling up a bunch of disk didn't help yet.  lemme fill it up more and upgrade from t->u
[18:52] <rharper> sarnold: any cache= values?
[18:52] <hallyn> sarnold: ^ can you provide that info?
[18:52] <hallyn> (else i can d/l the script and dig)
[18:53] <sarnold> rharper: no "cache" in /etc/libvirt/** either
[18:53] <rharper> if someone has a ps aux output from a vm run by uvt, that'd be best
[18:54] <sarnold> rharper: my host filesystem is ext3, rw,relatime,data=ordered
[18:55] <rharper> k
[18:55] <rharper> can I get the sec team uvt anywhere ?
[18:55] <sarnold> rharper: command line: http://paste.ubuntu.com/8181058/
[18:56] <sarnold> rharper: there's a fair amount of other setup work necessary, all documented here: https://wiki.ubuntu.com/SecurityTeam/TestingEnvironment#preview
[18:56] <sarnold> sigh, of course leaving off the #preview :)
[18:58] <rharper> sarnold: and where do you get your original/pristine images?
[18:58] <rharper> ah, cd images
[18:58] <sarnold> rharper: 'uvt new' makes them from downloaded ISOs
[18:58] <rharper> for the releases
[18:58] <rharper> right
[18:59] <rharper> does some sort of auto install ?
[18:59] <sarnold> yeah
[18:59] <rharper> k
[18:59] <rharper> so the forhallyn image is presumably one of those pristine images
[19:00] <sarnold> it started as one, though I don't know what, if anything, might have been done to it along the way
[19:00] <sarnold> the two images I gave hallyn started that way but had been apt-get dist-upgraded several times, snapshotted/restored several times..
[19:05] <tvoss> pitti, you around?
[19:05] <jdstrand> rharper: see the end of the description-- it doesn't seem related to machine type. I'm not 100% sure I tried on 2.0 to create a machine and try to reproduce, but I think I did
[19:07] <jdstrand> rharper: here is typical xml: http://paste.ubuntu.com/8181138/
[19:08] <jdstrand> rharper: so we aren't specifying cache mode. host fs is ext4 for me, along with whatever is the default in the guest (ie, ext4)
[19:09] <jdstrand> rharper: the forhallyn image was generated using the standard security process with uvt, yes
[19:13] <rharper> jdstrand: thanks for the info,
[19:13] <rharper> brb, need to grab the kids
[19:17] <jdstrand> rharper: oh I had to have tried to create a vm on 2.0, otherwise I wouldn't have been able to test pc-i440fx-1.7 and pc-i440fx-2.0 (duh)
[19:17] <hallyn> so, 'git log --pretty=oneline v1.5.0..v1.7.0 block/qcow2 | wc -l' says only 69 commits.  i guess i'll actually look at each :)
[19:17] <hallyn> right and what you gave me was pc-i440fx-2.0
[19:17]  * jdstrand nods
[19:18] <jdstrand> I'm highly motivated to have this fixed. the trick is that I rely on qemu so heavily I'm not always in a place where I can lose my VMs
[19:19] <jdstrand> (eg, if I am doing dev work or security work and using vms, I can't really test new versions on that day)
[19:30] <hallyn> maybe the key here is looking at the types of things which those 69 commits change, and trying to write testcases for edge cases there
[19:32] <jdstrand> hallyn: we can do bisects
[19:32] <hallyn> jdstrand: alternatively, perhaps i can promote the bisecting by pushing a set of binaries (say 10-20/day) with which you can start bisecting :)
[19:32] <hallyn> jdstrand: *I* can't :(
[19:32] <hallyn> we need to reliably reproduce.
[19:33] <jdstrand> well, if you produced binaries that included commit 35 of those 69, sarnold and I could try to reproduce
[19:34] <hallyn> kinda got my eye on the commit "qcow2: Batch discards"
[19:34] <jdstrand> it just might be slow going on some of the feedback
[19:34] <hallyn> jdstrand: yeah, it might be worth doing.  the only bitch of it is that it destroys the idea of 'bisecting', as we have to build a lot more than log_2(N) binaries :)
[19:35] <jdstrand> funny thing is, it occurred to me I should really be using precise's qemu instead of saucy, since it is actually security supported :)
[19:35] <hallyn> heh.
[19:35] <jdstrand> hallyn: well, could just do bisects relative to those 69 commits
[19:35] <hallyn> that's what i use on my big server
[19:35] <hallyn> yeah
[19:36] <hallyn> ok what the heck i'll build+publish them.  you'll just need a few tweaks to your apparmor policy, but i think you can handle those :)
[19:36] <hallyn> oh no you don't, you'll just install under /usr/bin.  duh
[19:37] <jdstrand> hallyn: if we get to the point where we know which of the 69 first had it, then we can bisect further on the other commits between the last good of the 69 and the first bad of the 69
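The plan jdstrand describes amounts to a binary search over the ordered list of suspect commits. A minimal sketch, assuming the commits are ordered oldest to newest and that every commit after the first bad one is also bad; `first_bad` and the `is_bad` callback are hypothetical names, with `is_bad` standing in for a (possibly slow, manual) attempt to reproduce the corruption with that commit's binary:

```python
from typing import Callable, Optional, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> Optional[str]:
    """Return the earliest commit for which is_bad() is true, or None.

    Assumes commits are ordered oldest -> newest and that once a commit is
    bad, every later commit is bad too (the usual bisection assumption).
    """
    lo, hi = 0, len(commits)      # the first bad commit, if any, is in commits[lo:hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid              # first bad commit is at mid or earlier
        else:
            lo = mid + 1          # first bad commit is after mid
    return commits[lo] if lo < len(commits) else None
```

With 69 candidates this needs at most 7 probes. Since the corruption does not reproduce reliably, a real run would probably have to repeat each probe several times before trusting a "good" verdict.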
[19:37] <hallyn> yeah.  let me script up the building of those binaries real quick
[19:38] <hallyn> (haha, what are the odds they'll build easily enough to script)
[19:38] <jdstrand> yeah, I was wondering how easy that would be
[19:38] <jdstrand> it is one thing having a plan, it is quite another implementing it :)
[19:45] <hallyn> jdstrand: trying http://paste.ubuntu.com/8181365/
[19:45] <hallyn> that *should* let me fix it if there is a build failure.
[19:46] <hallyn> guess i need a 'git reset --hard HEAD' at top of loop so the git checkout master will succeed
[19:46] <hallyn> (if it fails)
[19:46] <hallyn> first binary delivered
[19:46] <hallyn> think i'm gonna go get some coffee and check on this when i get back
[19:46] <sarnold> no libraries needed?
[19:46] <rharper> jdstrand: qcow2 corruption is rarely related to machine type, rather it's likely to be one of the many feature flags;  I've not yet found a tool that dumps the flags, but they matter w.r.t behavior, in particular, I'm looking at lazy_refcount which delays metadata updates of qcow2 (which is where 99.9999% of corruption comes from)
[19:47] <hallyn> sarnold: hm.  i don't think so.  i was testing without those yesterday
[19:47] <rharper> the compat mode determines what features are accessible based on which qemu version you're running with;  pre 1.0, 1.0, 1.1, and on up.
[19:47] <hallyn> rharper: that suggests that the commit "qcow2-refcount: Repair shared refcount blocks" might be to blame
[19:48] <rharper> have we tried qemu-img repair on these images?
[19:48] <rharper> ie the non-bootable ones?
[19:48] <rharper> next I was going to run the qemu-img check on each one to see if it detected metadata errors
[19:49] <hallyn> sarnold: you have a corrupted one sitting someplace rharper can get to it right?
[19:49]  * hallyn REALLY needs some coffee, biab
[19:49] <rharper> it will be interesting to see what format of level (qemu-img info file) these are, versus what gets created on new systems
[19:49] <jdstrand> rharper: right-- I was simply saying that I couldn't have created/run a 2.0 machine type on 1.5, that I must've used 2.0 for that, and since I am otherwise using defaults, that info might be helpful to you
[19:50] <frezix> hi, I'm doing a netinstall and I'm now at this stage - http://imgur.com/ba9qcCc - how do I remove that partitioning scheme and start fresh?
[19:50] <rharper> jdstrand: ok -- but machine type has nothing to do with the qcow2 image, at best the machine type could suggest what level of qemu you have
[19:50] <sarnold> hallyn,rharper, yeah, lillypilly.canonical.com:~sarnold/sec-precise-amd64.qcow2.bz2.sig (and the file without .sig) and sec-trusty-amd64.qcow2.gz.sig (and the file without .sig)
[19:50] <rharper> but they're independent
[19:50] <frezix> (Is this the right channel to ask this also?)
[19:50] <rharper> sarnold: can I get that from chinstrap ?
[19:50] <sarnold> rharper: I believe so
[19:51] <sarnold> frezix: try arrow-down to see if the individual partitions can be selected?
[19:52] <rharper> sarnold: fetching... thanks
[19:52] <jdstrand> rharper: right, that is all I was trying to suggest to you :)
[19:53] <rharper> the typical pattern is images created on older qemus tend to show corruption with newer qemus because bugs have been fixed in newer qemu, but their behavior is different.
[19:53] <rharper> which I think is exactly what was originally seen (older 1.5 images, upgrade to trusty, boom)
[19:54] <rharper> the more interesting one is if you can create new images on the latest qemu, and see the same corruption; which it sounded like, but not clear to me if that's confirmed.
[19:54] <jdstrand> rharper: that is confirmed
[19:54] <rharper> one possible answer is that the newer images are created in compat mode, versus the latest
[19:54] <jdstrand> I am not being clear
[19:54] <frezix> sarnold: these are the 3 options when I select individual partitions - http://imgur.com/mrObbpD - note that when selecting "Erase data on this partition" it only erases the data instead of converting that partition to unallocated space.
[19:55] <jdstrand> I cannot create a vm with a -2.0 machine type unless I am running qemu 2.0. since I stated in the description that I specifically tested machine of different machine types, I *must* have used a newer qemu to create/run/corrupt the image
[19:56] <sarnold> frezix: dang.
[19:56] <jdstrand> the forhallyn image used 2.0
[19:56] <frezix> when I'm at that partition choosing menu, choosing "Undo changes to partition" also doesn't seem to have any effect. when I went into the "Configure encrypted volumes" option, this is what I got - http://imgur.com/01GCq8Q - which pretty much explains much of what I'm facing.
[19:56] <jdstrand> I used whatever defaults virtinst uses
[19:56] <rharper> jdstrand: ok -- but the last part of your statement matters, do you know if you *created* the image with newer qemu, or only observe the failure on newer qemu.
[19:56] <sarnold> rharper, off to lunch...
[19:57] <rharper> sarnold: k
[19:57] <jdstrand> rharper: I had to *create* on newer qemu otherwise the xml would have a -2.0 machine type
[19:57] <frezix> I'm now wondering how I can still remove those partitions (I do know the passphrase)
[19:58] <jdstrand> would not* have
[19:59] <jdstrand> rharper: see the description
[19:59] <jdstrand> qemu-kvm 2.0~git-20140307.4c288ac-0ubuntu2
[19:59] <jdstrand> qemu-img info ./forhallyn-trusty-amd64.img
[19:59] <jdstrand> image: ./forhallyn-trusty-amd64.img
[19:59] <jdstrand> file format: qcow2
[19:59] <jdstrand> virtual size: 8.0G (8589934592 bytes)
[19:59] <jdstrand> disk size: 4.0G
[19:59] <jdstrand> cluster_size: 65536
[19:59] <jdstrand> Format specific information:
[19:59] <jdstrand>     compat: 0.10
[19:59] <jdstrand> rharper: then see 'Steps to reproduce'
[19:59] <rharper> jdstrand: ok -- sorry for being obtuse, just trying to understand the failure scenarios w.r.t  image creation;
[20:00] <rharper> jdstrand: fair enough;
[20:00] <jdstrand> the steps to reproduce say I created the forhallyn vm with 2.0
[20:01] <jdstrand> I forgot I wrote that myself, so it didn't help the conversation
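The `compat` value in jdstrand's paste is the format level rharper was asking about. A minimal sketch of pulling such fields out of the human-readable `qemu-img info` output, assuming the `key: value` layout shown in the paste. The function name `parse_qemu_img_info` is made up for illustration, the sample text is copied from the paste, and the snapshot table that `qemu-img info` can also print is not handled:

```python
def parse_qemu_img_info(text: str) -> dict:
    """Parse 'key: value' lines from human-readable qemu-img info output."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        value = value.strip()
        if sep and value:          # skip headers such as "Format specific information:"
            fields[key.strip()] = value
    return fields


SAMPLE = """\
image: ./forhallyn-trusty-amd64.img
file format: qcow2
virtual size: 8.0G (8589934592 bytes)
disk size: 4.0G
cluster_size: 65536
Format specific information:
    compat: 0.10
"""

info = parse_qemu_img_info(SAMPLE)
print(info["file format"], info["compat"])
```

Where the installed qemu-img supports `--output=json`, parsing that output is likely to be more robust than scraping the text form.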
[20:08] <frezix> damn wifi
[20:08] <frezix> did I miss anything?
[20:11] <frezix> so during a netinstall, if it's not possible to remove encrypted LVM partitions (even when passphrase is known), would this be considered a bug?
[20:30] <Laney> bah
[20:31] <desrt|pdx> Laney: hm?
[20:31] <Laney> offlineimap trashed my maildir
[20:32] <desrt|pdx> :(
[20:36] <rharper> sarnold: your corrupted image is odd;  none of the qcow2 metadata is corrupt, but the guest OS snapshot (and original) data is hosed;
[20:37] <rharper> jdstrand: in your original comment, when you say revert to 1.5, you mean for the whole sequence of operations;   (all of the create, snapshot, revert) etc... you can use newer libvirt but need to have older qemu ?
[20:38] <tedg> bdmurray, Do you know when the next whoopsie release is expected?
[20:39] <jdstrand> rharper: I just install the old qemu 1.5 packages. I have to recreate the VMs that are corrupted. for the others, I usually adjust the libvirt xml to use a 1.5 machine type
[20:39] <rharper> right, ok
[20:39] <jdstrand> but I don't usually have to recreate them
[20:39] <jdstrand> I think that all works ok cause we are on compat 0.10
[20:40] <rharper> jdstrand: and the corrupted images don't boot any better in 1.5, correct ?
[20:40] <jdstrand> I'm not 100% sure I tested that
[20:40] <rharper> ok, with this sort of corruption, I would expect them to fail on 1.5 or anywhere
[20:40] <jdstrand> I think I may have always just recreated them
[20:41] <jdstrand> cause, like hallyn pointed out, the corruption seems to happen at precisely the time it isn't convenient and you are rushing to get back to a usable state :)
[20:41] <rharper> jdstrand: your local system does this at will, this recreates with your steps elsewhere ?
[20:41] <jdstrand> what do you mean by elsewhere?
[20:41] <jdstrand> another host machine?
[20:42] <bdmurray> tedg: I uploaded one today. Are you looking for one after that?
[20:42] <rharper> jdstrand: yes
[20:42] <rharper> another host machine
[20:42] <jdstrand> rharper: I have not, but sarnold has the same corruption
[20:42] <rharper> k
[20:43] <tedg> bdmurray, Heh, no I don't think so. You're just ahead of me.
[20:43] <bdmurray> tedg: I try
[20:43] <tedg> bdmurray, Thanks! Excited for moar data!
[20:46] <hallyn> jdstrand: sarnold: if you wanna start, the most recent 7 suspect commits are available at http://people.canonical.com/binaries.[0-6]
[20:46] <hallyn> with suspect being defined as 'a commit which affected block/qcow*"
[20:50] <jdstrand> hallyn (and sarnold): I guess you mean http://people.canonical.com/~serge/binaries.[0-6]
[20:50] <hallyn> yeah
[20:50] <hallyn> sorry
[20:50] <jdstrand> np :)
[20:51] <hallyn> i can't believe how smoothly the builds are going, only had to correct 2 so far (after disabling spice at least)
[20:51] <jdstrand> hallyn: it is because you are awesome
[20:53] <hallyn> saving that for my i-need-an-uplift days :)
[20:58] <hallyn> -12 pushed
[21:01] <rharper> jdstrand: sarnold have you tried virsh resume on those images?
[21:01] <rharper> instead of virsh start as I see in the instructions ?
[21:01] <rharper> internal snapshots contain guest ram that's not yet flushed to disk
[21:01] <sarnold> rharper: no, I almost never interact with 'virsh' directly
[21:01] <rharper> but the corrupt image you handed me was an internal snapshot
[21:02] <rharper> which means it holds both guest ram and disk data
[21:02] <rharper> hrm, but the steps have the guest shutdown...
[21:03] <rharper> that should be fine, ok
[21:03]  * rharper keeps moving
[21:04] <bdmurray> tedg: oh, I was wondering do we need ProcMaps for RecoverableProblems?
[21:05] <tedg> bdmurray, No, I don't think so.
[21:05] <sarnold> rharper: we quite often use "uvt stop -rf" to just quickly "yank the power" and revert to the snapshot, so next time we need it, it is 'clean' and ready to go -- I believe that works out to virsh "destroy" followed by virsh snapshot-revert
[21:05] <sarnold> rharper: probably when those images failed to boot, I just used uvt stop -f on them to kill them quickly but not revert the disk image
[21:08] <tedg> bdmurray, Can't wait for that to hit, I've got this recoverable error for bad appid, but I don't know which one. Going to be exciting to find out.
[21:09] <bdmurray> tedg: is there a betting pool?
[21:10] <tedg> Heh, I don't think that we could make one now because some people have access to the database and could know the results before they trickle through :-)
[21:10] <bdmurray> Are you calling me a cheater?
[21:11] <tedg> No, no, I was more worried about ev.
[21:11] <bdmurray> lol
[21:30] <rharper> jdstrand: sarnold: in your use-case, do you ever save disk state after the upgrade and shutdown?  as in, do you want to "merge" the updates applied back into the original disk ?
[21:30] <rharper> or do you always want to throw the delta from pristine away ?
[21:31] <jdstrand> rharper: that is a use case
[21:31] <jdstrand> rharper: basically, we do this
[21:31] <jdstrand> we generate i386 and amd64 vms for all supported releases
[21:31] <jdstrand> we use uvt to do that and use cd images
[21:31] <jdstrand> then we create the pristine snapshot
[21:31] <rharper> right, your base images
[21:32] <rharper> yep
[21:32] <jdstrand> yep
[21:32] <jdstrand> then, we go along and test security updates, etc
[21:32] <jdstrand> then will revert back to pristine
[21:32] <jdstrand> however, they get out of date after not too long
[21:32] <jdstrand> so we have an 'uvt update' command
[21:33] <rharper> that corresponds to the "boot after pristine snapshot, dist-upgrade;  which includes applying latest security updates"  ?
[21:33] <jdstrand> that reverts to pristine, starts the vm, apt-get dist-upgrades it, etc
[21:33] <jdstrand> then shuts it down cleanly
[21:33] <jdstrand> then does a new snapshot of the same name
[21:33] <rharper> but each time you want to start at pristine and then apply dist-upgrade latest ?
[21:33] <jdstrand> rharper: re corresponds> yes
[21:34] <jdstrand> rharper: re each time> that is what our update command does, yes
[21:34] <rharper> are you ever interested in the state of the VM after dist-upgrade after you have shutdown the VM (but before reverting the image back to pristine) ?
[21:34] <jdstrand> rharper: but we are free to do vm configuration and then use 'uvt snapshot' to update the pristine image
[21:35] <jdstrand> rharper: we are interested in that state
[21:35] <rharper> if you don't actually want to inspect or boot the image after the updates are applied, I'd like to mention -snapshot which writes deltas to a temporary file and is removed when the guest process exits, which would avoid any need to run snapshot commands on the image and would avoid touching the original file for writes at all
[21:35] <rharper> ah, ok
[21:35] <jdstrand> rharper: it is that state where we may update the pristine snapshot
[21:35] <jdstrand> rharper: also, sometimes we like to leave a vm in a certain state so we can come back to it later
[21:35] <rharper> maybe the names confuse me, your update doesn't commit the changes to the base, it removes them
[21:35] <rharper> by my reading of your steps in the bug...
[21:36] <jdstrand> for example, now I have a vm that is off but has loads of stuff for our apparmor landing in it
[21:36] <jdstrand> rharper: ah, well the bug only describes how I was able to trigger this
[21:36] <rharper> ok, but typically you'd want to keep the update around and merge the updates with the previous pristine, aka commit
[21:36] <rharper> commit my changes to snapshot called pristine
[21:37] <jdstrand> (it also wouldn't happen every time iirc; the steps I gave happen to trigger it most often)
[21:37] <jdstrand> our 'update' command will do that
[21:37] <jdstrand> but we are free to 'uvt snapshot' whenever we want
[21:38] <jdstrand> let me get you the bzr branch
[21:38] <jdstrand> rharper: http://bazaar.launchpad.net/~ubuntu-bugcontrol/ubuntu-qa-tools/master/view/head:/vm-tools/uvt
[21:39] <jdstrand> rharper: you might also be interested in: https://wiki.ubuntu.com/SecurityTeam/TestingEnvironment#Snapshotted_virtual_machines
[21:39] <jdstrand> rharper: that more or less captures how we work
[21:39] <sarnold> rharper: (the convention for the uvt commands is a function defined like "cmd_snapshot" or "cmd_update")
[21:46] <rharper> jdstrand: so, update does:  revert to pristine (unless it doesn't exist);  boot vm do upgrade;  stop vm, delete snapshot named "pristine", then create a new one called "pristine" again...  which matches your bug commands
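The update cycle rharper summarizes can be written out as an ordered list of virsh invocations. This is a sketch under assumptions, not uvt's actual code: the mapping of each step to a virsh subcommand is inferred from the discussion, the function name `update_commands` is hypothetical, and the domain name is only an illustration. Nothing is executed; the function just returns the commands in order.

```python
from typing import List


def update_commands(domain: str, snapshot: str = "pristine") -> List[List[str]]:
    """Return the virsh invocations for the update cycle, in order.

    The in-guest dist-upgrade happens between 'start' and 'shutdown'
    and is not shown here.
    """
    return [
        ["virsh", "snapshot-revert", domain, snapshot],     # back to the last pristine state
        ["virsh", "start", domain],                         # boot, then dist-upgrade in the guest
        ["virsh", "shutdown", domain],                      # clean shutdown after the upgrade
        ["virsh", "snapshot-delete", domain, snapshot],     # drop the old snapshot
        ["virsh", "snapshot-create-as", domain, snapshot],  # re-create it under the same name
    ]


for cmd in update_commands("sec-trusty-amd64"):
    print(" ".join(cmd))
```

Note that `virsh shutdown` only requests a shutdown, so a real script would have to wait for the guest to be off before touching the snapshots.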
[21:46] <rharper> here's what I don't know;   you're deleting the current snapshot after applying changes to the image;  it's not clear to me exactly what that does w.r.t internal snapshots;
[21:47] <rharper> it seems strange to me to remove the previous snapshot before creating a new one and then just renaming.
[21:48] <jdstrand> I actually didn't right this bit of code, but it does work with 1.5
[21:48] <jdstrand> write*
[21:48] <sarnold> was it done that way to placate earlier libvirts?
[21:48] <xnox> infinity: where are you? =)
[21:48] <jdstrand> as in, I can't say why it is implemented the way it is. mdeslaur may have had a reason
[21:49] <jdstrand> sarnold: maybe? :)
[21:50] <rharper> in the external snapshot world (which IMO is far safer to use than internal snapshots since you never touch your original file)
[21:50] <infinity> xnox: Hiding.
[21:51] <rharper> your update would be:  remove the snapshot file; create a new one;  do the update;  shutdown; and then you can commit the snapshot into the base image if you wanted to save it, or just delete it if you want to go back to pristine
[21:51] <rharper> that would guarantee that you never corrupt your original image as long as you never committed the snapshot to the very base file that was created from the isos.
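The external-snapshot flow rharper outlines keeps all writes in a separate overlay file. A minimal sketch of the two qemu-img commands involved, assuming a qemu-img that accepts `-F` for the backing format; the helper names and file names are hypothetical, and the helpers only build the command lines rather than run them:

```python
from typing import List


def overlay_create(base: str, overlay: str) -> List[str]:
    """Command to create a qcow2 overlay whose backing file is `base`."""
    return ["qemu-img", "create", "-f", "qcow2", "-b", base, "-F", "qcow2", overlay]


def overlay_commit(overlay: str) -> List[str]:
    """Command to merge the overlay's changes into its backing file."""
    return ["qemu-img", "commit", overlay]


print(" ".join(overlay_create("pristine.qcow2", "work.qcow2")))
print(" ".join(overlay_commit("work.qcow2")))
```

Going back to pristine is then just a matter of deleting the overlay file and creating a fresh one; the commit step is only needed when the update should be kept, and would be run with the guest shut down.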
[21:52] <jdstrand> rharper: I do know mdeslaur wanted to use pure libvirt commands. I had a shell implementation that would use backing stores underneath libvirt before libvirt had reliable snapshot functionality
[21:53] <rharper> there's certainly a bug here,  as internal snapshots should be able to handle this;  the order of operations seems odd to me, but I need to look at the qemu implementation to see what happens on snapshot_blkdev_internal w.r.t qcow2 refcounts