[07:53] <cjwatson> tomwardill: lxc-attach seems to pass output straight through without modification, so I think you can just do lxc-attach -n "$container_name" -- env blah | subunit-2to1, much as you had above
[07:53] <tomwardill> ah, right :)
[07:53] <tomwardill> will give that a try once I've worked out why postgres is refusing to start
[07:54] <tomwardill> thanks :)
[07:54] <cjwatson> Though consider error handling
[07:54] <cjwatson> As in, what happens if lxc-attach exits non-zero
[07:54] <cjwatson> Pipes tend to lose that unless you take special care
[07:55] <cjwatson> You still want to stop the container, but should preserve the exit code
[07:56] <tomwardill> right
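A minimal sketch of the error-handling point above: keep the exit code of the first command in a pipeline while still running cleanup. The commands are stand-ins (a small Python child process in place of `lxc-attach ... -- env blah`, and a pass-through filter in place of `subunit-2to1`), and the names `run_pipeline`, `run_then_cleanup` and `cleanup` are hypothetical, not taken from lp-setup-lxc-test.

```python
import subprocess
import sys


def run_pipeline(producer_argv, filter_argv):
    """Run `producer | filter`; return (producer_rc, filter_rc, output)."""
    producer = subprocess.Popen(producer_argv, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(
        filter_argv, stdin=producer.stdout, stdout=subprocess.PIPE)
    # Close our copy of the pipe so the producer sees SIGPIPE if the
    # filter exits early.
    producer.stdout.close()
    output, _ = consumer.communicate()
    return producer.wait(), consumer.returncode, output


def run_then_cleanup(producer_argv, filter_argv, cleanup):
    """Always run cleanup(), then report the first non-zero exit code."""
    try:
        producer_rc, filter_rc, output = run_pipeline(
            producer_argv, filter_argv)
    finally:
        cleanup()  # e.g. stop the container, whatever happened above
    return (producer_rc or filter_rc), output


# Stand-ins (assumptions) for the real producer and filter commands.
FAILING_PRODUCER = [
    sys.executable, "-c", "print('test output'); raise SystemExit(3)"]
PASS_THROUGH = [
    sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read())"]
```

Calling `run_then_cleanup(FAILING_PRODUCER, PASS_THROUGH, stop_container)` with some `stop_container` callable returns the producer's exit code even though the filter itself succeeded, which is the part a plain shell pipe tends to lose.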
[08:51] <SpecialK|Canon> someone with more familiarity with traversal want to review https://code.launchpad.net/~cjwatson/launchpad/+git/launchpad/+merge/385489 ?
[08:51] <SpecialK|Canon> (because hi)
[09:32] <tomwardill> cjwatson: before I break out pdb again, any idea what might be causing: https://pastebin.canonical.com/p/djCTpmdCwF/
[09:33] <cjwatson> tomwardill: bad working directory maybe?
[09:33] <tomwardill> `pwd` agrees with my current directory
[09:33] <cjwatson> or permissions?
[09:33] <cjwatson> open_for_writing swallows any IOError that isn't ENOENT!
[09:34] <cjwatson> quality
[09:34] <cjwatson> if you add an else: raise into open_for_writing (which really ought to be there anyway), you might get a better error message
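A sketch of the pattern being described, assuming a helper that opens a file for writing and creates missing parent directories. This is not the actual Launchpad `open_for_writing`; the name `open_for_writing_sketch` and its signature are hypothetical. The point is the `else: raise` branch, without which any error other than ENOENT would be swallowed and the function would quietly return None.

```python
import errno
import os


def open_for_writing_sketch(filename, mode="w", dirmode=0o777):
    """Open filename for writing, creating parent directories if needed."""
    try:
        return open(filename, mode)
    except OSError as e:
        if e.errno == errno.ENOENT:
            # Parent directory is missing: create it and retry once.
            os.makedirs(os.path.dirname(filename), mode=dirmode)
            return open(filename, mode)
        else:
            # Anything else (EACCES, ENOTDIR, ...) is a real problem;
            # re-raise instead of silently returning None.
            raise
```

With the re-raise in place, a permissions problem such as root-owned leftovers should surface as an explicit error at the `open` call rather than as a confusing failure later on.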
[09:35] <tomwardill> hmm, I appear to have a bunch of files that are owned by root
[09:35] <tomwardill> I suspect that is not ideal
[09:35] <cjwatson> Oh, your lxc-attach arrangements don't seem to switch user
[09:36] <cjwatson> add '$PWD/utilities/run-as buildbot' just before 'env', maybe?
[09:36] <tomwardill> yeah, this is from pre-that step I think
[09:36] <tomwardill> trying to work out which step is doing it
[09:37] <cjwatson> Might be leftovers from a previous run?
[09:37] <tomwardill> yeah, think so
[09:37] <tomwardill> poking
[09:39] <cjwatson> And yeah, lxc-start-ephemeral -u ... meant "switch to this user", and part of the reason I added utilities/run-as was that at least at the time there was nothing else with quite exactly the right semantics
[09:42] <cjwatson> (lxc-attach -u and lxc exec --user both take uids rather than usernames; setpriv(1) didn't exist yet)
[09:46] <tomwardill> the good news is that I'm just about at the point where I can hack master.cfg and it will repeatedly get to the same state so I can debug it
[09:50] <tomwardill> ... I should pick a faster subsection of the test suite to test this
[09:51]  * tomwardill twiddles thumbs a bit more
[10:18] <tomwardill> run-as is giving me a permissions error on chdir to the build directory, but only when run via buildbot
[10:18] <tomwardill> wtf
[10:20] <cjwatson> User namespaces can cause much confusion sometimes, maybe ...
[10:47] <tomwardill> yeah, something weird going on
[10:48] <tomwardill> works fine run from a terminal
[10:48]  * tomwardill sighs at the amount of shell/environment/namespace learning I don't know
[10:57] <tomwardill> unsure how I'm getting permission denied changing to the directory that is cwd
[10:57] <StevenK> That is clearly perms
[10:57] <StevenK> You either aren't the owner, or in the group, or there's no +x
[10:59] <tomwardill> drwxr-xr-x 20 buildbot buildbot 4.0K Jun 11 10:55 build
[10:59] <StevenK> All the way up to / ?
[11:00] <tomwardill> hmm, no, buildbot ownership stops at /var/lib
[11:01] <StevenK> I'd expect that, but hopefully everything has +x
[11:02] <tomwardill> looks like it
[11:02] <tomwardill> I can be in that directory quite happily in a shell
[11:02] <tomwardill> oh, wait
[11:02] <tomwardill> maybe I can't in this situation
[11:02] <tomwardill> wtf
[11:02] <tomwardill> root@lptests-xenial_tfbWfo:/var/lib/buildbot/lp-devel-xenial/build# su buildbot
[11:02] <tomwardill> Cannot execute /bin/bash: Permission denied
[11:04] <cjwatson> buildbot outside and inside the container might not be the same thing
[11:05] <cjwatson> I would get the most minimal possible reproducer you can manage and strace it
[11:06] <cjwatson> and also make sure to be looking at permissions by id (ls -nl) inside the container
[11:08] <tomwardill> so, yeah
[11:08] <tomwardill> `/` was 0700
[11:08] <tomwardill> ... that's a thing
[11:10]  * tomwardill gets lunch, leaves it for future tom to worry about
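The "all the way up to /" check can be sketched as below: walk from a directory up to the filesystem root and report each level's mode with numeric uid and gid, similar in spirit to `ls -nl`. The function names are hypothetical, and the check is deliberately simplified: it only looks at the "other" execute bit and ignores owner and group membership, ACLs and user-namespace id mapping.

```python
import os
import stat


def describe_ancestors(path):
    """Yield (directory, mode, uid, gid) from path up to the filesystem root."""
    current = os.path.realpath(path)
    while True:
        st = os.stat(current)
        yield current, stat.S_IMODE(st.st_mode), st.st_uid, st.st_gid
        parent = os.path.dirname(current)
        if parent == current:  # reached the root
            break
        current = parent


def not_world_searchable(path):
    """Return path and any ancestors that lack the o+x bit.

    Simplification: only the "other" execute bit is checked; owner/group
    membership and ACLs are ignored.
    """
    return [d for d, mode, _uid, _gid in describe_ancestors(path)
            if not mode & stat.S_IXOTH]
```

Run as root inside the container against the build directory, a root directory with mode 0700 would show up in the returned list. Run as an unprivileged user, `os.stat` may itself fail with a permission error at the first unsearchable ancestor, which is also informative.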
[11:10] <ilasc> and in this context of twom dealing with complex issues, I come along and ask the rudimentary question: in LP how do we split a large MP into several smaller MPs? Do I just create separate git branches and open MPs for each new branch, or is there something I can do at the level of the large MP that I'm not yet aware of?
[11:11] <StevenK> Years and years ago one of my friends did 'chmod 644 .*' as root in a top-level directory and then wondered why no one could log in
[11:11] <ilasc> :)
[11:12] <cjwatson> ilasc: Separate branches and open MPs for each.  Are you familiar enough with the git-level operations here?
[11:15] <cjwatson> (Also, prerequisites in MPs may be useful, depending on how you lay out the branch structure)
[11:16] <ilasc> thanks cjwatson, hmmm good question :) just to make sure I start on the right path, I assume I start creating the smaller new git branches from master ?
[11:17] <ilasc> indeed figured prerequisites in MPs will be necessary
[11:17] <cjwatson> Normally from master, yes.
[11:17] <cjwatson> The "splitting commits" section of "man git rebase" may be useful.
[11:20] <cjwatson> If the bits you need to split up are separate enough in their respective files, you can often manage it with "git add -p" or whatever equivalent exists in your IDE.  Failing that I sometimes resort to just dumping out the overall patch and editing it down to the bits I want before applying it, but editing patches by hand certainly isn't for everyone
[11:24] <ilasc> great, ok, thanks Colin! it sounds like our approaches are similar in this case, I always go for editing patches by hand :)
[11:26] <cjwatson> Oh, I'm glad I'm not the only person who puts up with that
[11:26] <cjwatson> I do need it slightly less since I found tools to let me do line-by-line rather than hunk-by-hunk changes to the git index
[11:29] <ilasc>  :)
[11:33] <cjwatson> Also keep a git ref to the original thing around, then you can't lose it
[11:44] <ilasc> good idea :)
[11:49] <SpecialK|Canon> `git add --patch`'s editor option is <3
[11:54] <cjwatson> I prefer vimagit since I discovered it last year sometime, but same sort of idea
[13:28] <tomwardill> cjwatson: any idea where I need the umask change?
[13:31] <cjwatson> I'm not quite sure, it was just a hunch as to how you might end up with mysterious mode 700
[13:32] <cjwatson> Is the base container like this or just the ephemeral copy?
[13:33] <tomwardill> good question, sec
[13:33] <cjwatson> If the former, look in your build pipeline, if the latter, start from lp-setup-lxc-test and trace down
[13:33] <cjwatson> (probably)
[13:34] <tomwardill> yeah, it's the latter
[13:34] <cjwatson> Not actually what I expected
[13:34] <tomwardill> which makes sense, as lp-setup-lxc-test is the only bit I've actually changed
[13:34] <cjwatson> Though it probably should have been since you reported different behaviour when running from a terminal
[13:34] <tomwardill> yeah
[13:34] <tomwardill> a hack fix would be to just chmod / ;)
[13:35] <cjwatson> buildbot's buildslave runs with umask 077 by default unless you say --umask=022
[13:35] <cjwatson> Maybe relevant?
[13:35] <cjwatson> But you could also just umask 022 at the top of lp-setup-lxc-test ...
[13:36] <cjwatson> I suspect that'll do it
[13:37] <cjwatson> I could be wrong here, because I thought our buildbot worker config already did umask 022, but it's been some time since I looked at that and maybe it got lost somewhere along the way
[13:37] <tomwardill> I'll have a look and give that a try
[13:37] <cjwatson> puppet modules/launchpad/templates/buildbot.tac.erb has it
[13:38] <cjwatson> Hm.  Did you write buildbot.tac or whatever the modern equivalent is for the workers yourself?  Or where did you get it from?
[13:38] <tomwardill> I didn't write it
[13:38] <tomwardill> came from lpsetup I think
[13:38] <cjwatson> It might be a good idea to get sluagh:/srv/buildbot/lpbuildbot/buildbot.tac and compare
[13:39] <cjwatson> lpsetup's might be wrong
[13:39] <tomwardill> and it has an interesting thing:
[13:39] <tomwardill> `umask = None`
[13:39] <cjwatson> That might be from lpbuildbot demo/slave/buildbot.tac
[13:39] <cjwatson> Which I'm not certain is in sync
[13:39] <tomwardill> yeah, that makes sense
[13:39] <tomwardill> asked for the real one
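To illustrate the umask hunch discussed above: a directory created with the default requested mode ends up 0700 under umask 077 and 0755 under umask 022. This is a generic demonstration, not a reproduction of what lp-setup-lxc-test or the buildbot worker actually does; `mkdir_under_umask` and the directory name `rootfs` are hypothetical.

```python
import os
import stat
import tempfile


def mkdir_under_umask(umask_value):
    """Create a directory under the given umask and return its mode bits."""
    base = tempfile.mkdtemp()
    previous = os.umask(umask_value)
    try:
        target = os.path.join(base, "rootfs")
        os.mkdir(target)  # requested mode defaults to 0o777
        return stat.S_IMODE(os.stat(target).st_mode) & 0o777
    finally:
        os.umask(previous)  # restore the process-wide umask
```

The umask is inherited by child processes, so a worker started under 077 would pass that on to every script it launches unless one of them resets it, which is why setting `umask 022` at the top of the script is a plausible fix.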
[13:50] <tomwardill> well, it gets further, now to see if postgres works
[13:50] <tomwardill> it's running tests!
[13:50] <tomwardill> weeee
[13:51] <tomwardill> now just subunit to work out
[13:53] <ilasc> +!
[13:53] <ilasc> +1
[13:53] <ilasc> ... can't type :P
[13:56] <tomwardill> now, how do I make it stop
[13:56]  * tomwardill reboots the worker
[14:59] <tomwardill> okay, might need to teach the test step about subunit 2
[14:59] <cjwatson> --subunit-v2 | subunit-2to1 you mean?  or something else?
[15:00] <tomwardill> piping the lxc-attach output through subunit-2to1 just reproduces the same problem of testr not understanding the output
[15:00] <tomwardill> and trying to pipe the testr output through it still results in weird stdout in the logs and the step not understanding how many tests have run
[15:01] <cjwatson> Ah, hm
[15:01] <cjwatson> Maybe testr adds too much extra stuff
[15:01] <tomwardill> hmm, or maybe I've done something wrong somewhere
[15:02] <tomwardill> as `testr run --parallel --concurrency=2 --subunit --full-results '|' subunit-2to1` looks a bit weird, given the escaping around the pipe
[15:02] <cjwatson> Where did you put the subunit-2to1 in that case?
[15:03] <tomwardill> in the master.cfg
[15:03] <cjwatson> What's the diff?
[15:03] <tomwardill>             command=['testr', 'run', '--parallel', '--concurrency=2', '--subunit', '--full-results', '|', 'subunit-2to1']))
[15:03] <cjwatson> Oh
[15:03] <cjwatson> Well, yes
[15:04] <cjwatson> That's an argv
[15:04] <cjwatson> More or less
[15:04] <cjwatson> It's not passed to a shell, so doesn't understand |
[15:04] <tomwardill> which makes sense
[15:05]  * cjwatson looks at buildbot.steps.shell
[15:05] <cjwatson> So ... there's no fiddly quoting required for the arguments themselves there
[15:06] <cjwatson> You *could* just try:
[15:06] <cjwatson> command=['sh', '-c', 'testr run --parallel --concurrency=20 --subunit --full-results | subunit-2to1']
[15:06] <cjwatson> Definitely a workaround, but ought to help
[15:06] <tomwardill> the docs specify that you can give command as a single string
[15:06] <tomwardill> and it does basically that
[15:07] <tomwardill> (although that's in the latest docs)
[15:08]  * tomwardill tries
[15:08] <cjwatson> I am a bit suspicious 'cos I can't find what implements that, but maybe
[15:09] <tomwardill> running
[15:09] <cjwatson> But the sh -c trick should definitely work if that doesn't
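The argv-versus-shell distinction can be shown with Python's `subprocess`, assuming it behaves analogously to a build step that runs its command list without a shell. The commands `echo` and `tr` are stand-ins for `testr` and `subunit-2to1`, and `run_argv` is a hypothetical helper; a POSIX `sh` is assumed.

```python
import subprocess


def run_argv(argv):
    """Run argv directly (no shell) and return its stdout as text."""
    return subprocess.run(
        argv, stdout=subprocess.PIPE, universal_newlines=True, check=True
    ).stdout


# '|' is just another argument to echo here; no pipeline is created.
literal = run_argv(["echo", "hello", "|", "tr", "a-z", "A-Z"])

# Wrapping the line in `sh -c` hands the string to a shell, which parses '|'.
piped = run_argv(["sh", "-c", "echo hello | tr a-z A-Z"])
```

In the first call `echo` receives `|`, `tr`, `a-z` and `A-Z` as plain arguments and prints them; in the second the shell builds a real pipeline, so the output is the upper-cased text.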
[15:16] <tomwardill> https://usercontent.irccloud-cdn.com/file/7xneaob9/image.png
[15:16] <tomwardill> success!
[15:16] <cjwatson> Progress!
[15:17] <tomwardill> the stdout is good too
[15:17] <tomwardill> okay, so I think that's all the problems worked through
[15:18] <tomwardill> now I just need to document what they were, work out patches and file an RT to try this...
[15:21] <tomwardill> concurrency 5 is making my computer VERY LOUD
[15:24] <cjwatson> Nice
[15:25] <cjwatson> Out of interest, does this fix the "unknown worker (bug in our subunit output?)" thing that we currently get?  Looked like it might from your image ...
[15:25] <tomwardill> it seems to...
[15:25] <tomwardill> we have a list of workers too!
[15:25] <cjwatson> Ooh, does this let us download independent subunit streams from each worker?
[15:25] <tomwardill> ooh, which tells you which worker ran which tests
[15:26] <cjwatson> EXCELLENT
[15:26] <tomwardill> I think the only 'stream' we get is the list of tests
[15:26] <cjwatson> That will make debugging certain kinds of test isolation bugs so much easier
[15:26] <tomwardill> if we upgrade from precise to xenial, do we need to rebuild the xenial LXC that we already have?
[15:27] <cjwatson> Well, even separate lists of tests for each worker is a lot better than nothing
[15:27] <cjwatson> I have no idea
[15:27] <cjwatson> Hopefully not
[15:27] <tomwardill> indeed, as I don't really want to have to try and maintain this script :)
[15:27] <cjwatson> As long as you have something working locally, I think it's OK to debug it into existence a little bit on production if necessary
[15:28] <tomwardill> hmm, getting some 'App server startup timed out' failures, but that may well be due to the load on the VM/machine
[15:28] <cjwatson> Yeah, likely
[15:28] <tomwardill> it's at 350% cpu usage and has eaten all the ram allocated to it
[15:28] <cjwatson> om nom nom
[15:28] <tomwardill> they're not on the same tests as the ones I had in the last run, so points towards that at least
[15:30] <tomwardill> wish I'd left this machine in the basement now
[15:30] <cjwatson> Heh
[15:31] <cjwatson> This is great though, super-happy to see these improvements
[15:31] <tomwardill> getting this out and working, then transcribing over to LXD will be super nice
[15:31] <cjwatson> And hopefully LXD won't be too difficult after this
[15:31] <cjwatson> Yeah
[15:31] <tomwardill> and cleaning up/sorting lpsetup along the way
[16:32] <cjwatson> I think I've decided I don't have enough brain to review https://code.launchpad.net/~pappacena/turnip/+git/turnip/+merge/385158 today.  I've reviewed the Launchpad bits that need to precede that ...
[16:46] <tomwardill> fixed container cleanup and exit code return too
[16:48] <SpecialK|Canon> nice
[16:56] <tomwardill> okay, will work out a plan and extract / update the files required tomorrow morning
[16:56] <tomwardill> but it's looking good/feasible now
[20:21] <cjwatson> Non-lcy01 bionic image builders aren't working.  I've (belatedly) deployed staging equivalents to test this.  lgw01 is failing due to a glance API difference, bos02 is possibly something else but I haven't worked it out yet.
[20:36] <cjwatson> wgrant: ^- could I have a quick review of https://code.launchpad.net/~cjwatson/canonical-is-charms/gss-glance-v2-private/+merge/385608 ?
[20:46] <cjwatson> Looks like bos02 is probably the same thing after all.
[20:46]  * cjwatson cowboys on lgw01 bionic staging to test
[20:52] <cjwatson> Looks like that fixes it on lgw01, indeed
[21:41] <wgrant> cjwatson: Ah, fun