Nick | Message | Time |
---|---|---|
cjwatson | tomwardill: lxc-attach seems to pass output straight through without modification, so I think you can just do lxc-attach -n "$container_name" -- env blah \| subunit-2to1, much as you had above | 07:53 |
tomwardill | ah, right :) | 07:53 |
tomwardill | will give that a try once I've worked out why postgres is refusing to start | 07:53 |
tomwardill | thanks :) | 07:54 |
cjwatson | Though consider error handling | 07:54 |
cjwatson | As in, what happens if lxc-attach exits non-zero | 07:54 |
cjwatson | Pipes tend to lose that unless you take special care | 07:54 |
cjwatson | You still want to stop the container, but should preserve the exit code | 07:55 |
tomwardill | right | 07:56 |
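
The error-handling point above (a pipe hides the first command's exit status, and the container still has to be stopped) could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: `run_piped` and its `cleanup` callback are hypothetical names, not anything from lp-setup-lxc-test or buildbot.

```python
import subprocess


def run_piped(producer_argv, filter_argv, cleanup):
    """Run `producer | filter`, always call cleanup, and report failure.

    Hypothetical helper: returns the producer's exit code if it is
    non-zero, otherwise the filter's exit code.
    """
    try:
        producer = subprocess.Popen(producer_argv, stdout=subprocess.PIPE)
        consumer = subprocess.Popen(filter_argv, stdin=producer.stdout)
        # Close our copy of the pipe so the producer gets SIGPIPE
        # if the filter exits early.
        producer.stdout.close()
        consumer_rc = consumer.wait()
        producer_rc = producer.wait()
        return producer_rc or consumer_rc
    finally:
        # Runs whether the pipeline succeeded, failed, or raised,
        # e.g. to stop the container.
        cleanup()
```

In the scenario discussed here, `producer_argv` would be the lxc-attach command line, `filter_argv` would be `["subunit-2to1"]`, and `cleanup` would stop the container; the caller can then exit with the returned code.
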
SpecialK|Canon | someone with more familiarity with traversal want to review https://code.launchpad.net/~cjwatson/launchpad/+git/launchpad/+merge/385489 ? | 08:51 |
SpecialK|Canon | (because hi) | 08:51 |
tomwardill | cjwatson: before I break out pdb again, any idea what might be causing: https://pastebin.canonical.com/p/djCTpmdCwF/ | 09:32 |
cjwatson | tomwardill: bad working directory maybe? | 09:33 |
tomwardill | `pwd` agrees with my current directory | 09:33 |
cjwatson | or permissions? | 09:33 |
cjwatson | open_for_writing swallows any IOError that isn't ENOENT! | 09:33 |
cjwatson | quality | 09:34 |
cjwatson | if you add an else: raise into open_for_writing (which really ought to be there anyway), you might get a better error message | 09:34 |
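
For illustration, the `else: raise` suggestion might look like the sketch below. This is not the actual Launchpad `open_for_writing`; the name `open_for_writing_sketch` and its exact behaviour are assumptions based only on the description above (ENOENT is handled, everything else used to be swallowed).

```python
import errno
import os


def open_for_writing_sketch(filename, mode="w"):
    """Open filename for writing, creating missing parent directories.

    Hypothetical stand-in for the helper discussed above.
    """
    try:
        return open(filename, mode)
    except IOError as e:
        if e.errno == errno.ENOENT:
            # A parent directory is missing: create it and retry once.
            os.makedirs(os.path.dirname(filename))
            return open(filename, mode)
        else:
            # Anything else (EACCES, EISDIR, ...) is re-raised instead of
            # being silently dropped, so the caller sees the real cause.
            raise
```

Without the `else: raise`, a permissions failure would make the function fall through and return `None`, and the error would only surface later in a more confusing form.
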
tomwardill | hmm, I appear to have a bunch of files that are owned by root | 09:35 |
tomwardill | I suspect that is not ideal | 09:35 |
cjwatson | Oh, your lxc-attach arrangements don't seem to switch user | 09:35 |
cjwatson | add '$PWD/utilities/run-as buildbot' just before 'env', maybe? | 09:36 |
tomwardill | yeah, this is from pre-that step I think | 09:36 |
tomwardill | trying to work out which step is doing it | 09:36 |
cjwatson | Might be leftovers from a previous run? | 09:37 |
tomwardill | yeah, think so | 09:37 |
tomwardill | poking | 09:37 |
cjwatson | And yeah, lxc-start-ephemeral -u ... meant "switch to this user", and part of the reason I added utilities/run-as was that at least at the time there was nothing else with quite exactly the right semantics | 09:39 |
cjwatson | (lxc-attach -u and lxc exec --user both take uids rather than usernames; setpriv(1) didn't exist yet) | 09:42 |
tomwardill | the good news is that I'm just about at the point where I can hack master.cfg and it will repeatedly get to the same state so I can debug it | 09:46 |
tomwardill | ... I should pick a faster subsection of the test suite to test this | 09:50 |
* tomwardill | twiddles thumbs a bit more | 09:51 |
tomwardill | run-as is giving me a permissions error on chdir to the build directory, but only when run via buildbot | 10:18 |
tomwardill | wtf | 10:18 |
cjwatson | User namespaces can cause much confusion sometimes, maybe ... | 10:20 |
tomwardill | yeah, something weird going on | 10:47 |
tomwardill | works fine run from a terminal | 10:48 |
* tomwardill | sighs at the amount of shell/environment/namespace learning I don't know | 10:48 |
tomwardill | unsure how I'm getting permission denied changing to the directory that is cwd | 10:57 |
StevenK | That is clearly perms | 10:57 |
StevenK | You either aren't the owner, or in the group, or there's no +x | 10:57 |
tomwardill | drwxr-xr-x 20 buildbot buildbot 4.0K Jun 11 10:55 build | 10:59 |
StevenK | All the way up to / ? | 10:59 |
tomwardill | hmm, no, buildbot ownership stops at /var/lib | 11:00 |
StevenK | I'd expect that, but hopefully everything has +x | 11:01 |
tomwardill | looks like it | 11:02 |
tomwardill | I can be in that directory quite happily in a shell | 11:02 |
tomwardill | oh, wait | 11:02 |
tomwardill | maybe I can't in this situation | 11:02 |
tomwardill | wtf | 11:02 |
tomwardill | root@lptests-xenial_tfbWfo:/var/lib/buildbot/lp-devel-xenial/build# su buildbot | 11:02 |
tomwardill | Cannot execute /bin/bash: Permission denied | 11:02 |
cjwatson | buildbot outside and inside the container might not be the same thing | 11:04 |
cjwatson | I would get the most minimal possible reproducer you can manage and strace it | 11:05 |
cjwatson | and also make sure to be looking at permissions by id (ls -nl) inside the container | 11:06 |
tomwardill | so, yeah | 11:08 |
tomwardill | `/` was 0700 | 11:08 |
tomwardill | ... that's a thing | 11:08 |
* tomwardill | gets lunch, leaves it for future tom to worry about | 11:10 |
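
The "all the way up to /" check, combined with the advice to look at numeric ids (as `ls -nl` does), could be automated with something like this sketch. It is a simplification: it only flags directories that lack the execute bit for "other", so it will not model owner or group access, and `untraversable_ancestors` is a hypothetical name. It would need to be run inside the container to see the container's view of the ids.

```python
import stat
from pathlib import Path


def untraversable_ancestors(path):
    """List directories on the way to path that lack o+x.

    Returns (directory, uid, gid, mode string) tuples, with numeric ids
    so that a username mismatch between host and container cannot hide
    the problem.
    """
    target = Path(path).resolve()
    blocked = []
    for directory in [target, *target.parents]:
        st = directory.stat()
        if not st.st_mode & stat.S_IXOTH:
            blocked.append(
                (str(directory), st.st_uid, st.st_gid, stat.filemode(st.st_mode))
            )
    return blocked
```

In the situation above, calling it on the build directory would have been expected to report `/` with a mode of `drwx------`.
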
ilasc | and in this context of twom dealing with complex issues, I come along and ask the rudimentary question: in LP how do we split a large MP in several smaller MPs ? Do I just create separate git branches and open MPs for each new branch or is there something I can do at the level of the large MP that I'm not yet aware of? | 11:10 |
StevenK | Years and years ago one of my friends did 'chmod 644 .*' as root in a top-level directory and then wondered why no one could log in | 11:11 |
ilasc | :) | 11:11 |
cjwatson | ilasc: Separate branches and open MPs for each. Are you familiar enough with the git-level operations here? | 11:12 |
cjwatson | (Also, prerequisites in MPs may be useful, depending on how you lay out the branch structure) | 11:15 |
ilasc | thanks cjwatson, hmmm good question :) just to make sure I start on the right path, I assume I start creating the smaller new git branches from master ? | 11:16 |
ilasc | indeed figured prerequisites in MPs will be necessary | 11:17 |
cjwatson | Normally from master, yes. | 11:17 |
cjwatson | The "splitting commits" section of "man git rebase" may be useful. | 11:17 |
cjwatson | If the bits you need to split up are separate enough in their respective files, you can often manage it with "git add -p" or whatever equivalent exists in your IDE. Failing that I sometimes resort to just dumping out the overall patch and editing it down to the bits I want before applying it, but editing patches by hand certainly isn't for everyone | 11:20 |
ilasc | great, ok, thanks Colin! it sounds like our approaches are similar in this case, I always go for editing patches by hand :) | 11:24 |
cjwatson | Oh, I'm glad I'm not the only person who puts up with that | 11:26 |
cjwatson | I do need it slightly less since I found tools to let me do line-by-line rather than hunk-by-hunk changes to the git index | 11:26 |
ilasc | :) | 11:29 |
cjwatson | Also keep a git ref to the original thing around, then you can't lose it | 11:33 |
ilasc | good idea :) | 11:44 |
SpecialK|Canon | `git add --patch`'s editor option is <3 | 11:49 |
cjwatson | I prefer vimagit since I discovered it last year sometime, but same sort of idea | 11:54 |
tomwardill | cjwatson: any idea where I need the umask change? | 13:28 |
cjwatson | I'm not quite sure, it was just a hunch as to how you might end up with mysterious mode 700 | 13:31 |
cjwatson | Is the base container like this or just the ephemeral copy? | 13:32 |
tomwardill | good question, sec | 13:33 |
cjwatson | If the former, look in your build pipeline, if the latter, start from lp-setup-lxc-test and trace down | 13:33 |
cjwatson | (probably) | 13:33 |
tomwardill | yeah, it's the latter | 13:34 |
cjwatson | Not actually what I expected | 13:34 |
tomwardill | which makes sense, as lp-setup-lxc-test is the only bit I've actually changed | 13:34 |
cjwatson | Though it probably should have been since you reported different behaviour when running from a terminal | 13:34 |
tomwardill | yeah | 13:34 |
tomwardill | a hack fix would be to just chmod / ;) | 13:34 |
cjwatson | buildbot's buildslave runs with umask 077 by default unless you say --umask=022 | 13:35 |
cjwatson | Maybe relevant? | 13:35 |
cjwatson | But you could also just umask 022 at the top of lp-setup-lxc-test ... | 13:35 |
cjwatson | I suspect that'll do it | 13:36 |
cjwatson | I could be wrong here, because I thought our buildbot worker config already did umask 022, but it's been some time since I looked at that and maybe it got lost somewhere along the way | 13:37 |
tomwardill | I'll have a look and give that a try | 13:37 |
cjwatson | puppet modules/launchpad/templates/buildbot.tac.erb has it | 13:37 |
cjwatson | Hm. Did you write buildbot.tac or whatever the modern equivalent is for the workers yourself? Or where did you get it from? | 13:38 |
tomwardill | I didn't write it | 13:38 |
tomwardill | came from lpsetup I think | 13:38 |
cjwatson | It might be a good idea to get sluagh:/srv/buildbot/lpbuildbot/buildbot.tac and compare | 13:38 |
cjwatson | lpsetup's might be wrong | 13:39 |
tomwardill | and it has an interesting thing: | 13:39 |
tomwardill | `umask = None` | 13:39 |
cjwatson | That might be from lpbuildbot demo/slave/buildbot.tac | 13:39 |
cjwatson | Which I'm not certain is in sync | 13:39 |
tomwardill | yeah, that makes sense | 13:39 |
tomwardill | asked for the real one | 13:39 |
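
The mechanism behind the umask hunch can be shown in a few lines of Python. This sketch only demonstrates how a umask of 077 turns a freshly created directory into mode 0700, while 022 gives 0755; `mode_created_under` is a hypothetical helper, and the log does not show exactly which step created the container's `/`.

```python
import os
import tempfile


def mode_created_under(umask):
    """Create a directory under the given umask and return its permission bits."""
    parent = tempfile.mkdtemp()
    previous = os.umask(umask)
    try:
        path = os.path.join(parent, "child")
        # mkdir requests 0o777 by default; the kernel then clears the
        # bits that are set in the umask.
        os.mkdir(path)
        return os.stat(path).st_mode & 0o777
    finally:
        # Restore the caller's umask so the demonstration has no side effects.
        os.umask(previous)
```

`mode_created_under(0o077)` gives `0o700`, which matches the mysterious mode seen on `/`, and `mode_created_under(0o022)` gives `0o755`.
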
tomwardill | well, it gets further, now to see if postgres works | 13:50 |
tomwardill | it's running tests! | 13:50 |
tomwardill | weeee | 13:50 |
tomwardill | now just subunit to work out | 13:51 |
ilasc | +! | 13:53 |
ilasc | +1 | 13:53 |
ilasc | ... can't type :P | 13:53 |
tomwardill | now, how do I make it stop | 13:56 |
* tomwardill | reboots the worker | 13:56 |
tomwardill | okay, might need to teach the test step about subunit 2 | 14:59 |
cjwatson | --subunit-v2 \| subunit-2to1 you mean? or something else? | 14:59 |
tomwardill | piping the lxc-attach output through subunit-2to1 just reproduces the same problem of testr not understanding the output | 15:00 |
tomwardill | and trying to pipe the testr output through it still results in weird stdout in the logs and the step not understanding how many tests have run | 15:00 |
cjwatson | Ah, hm | 15:01 |
cjwatson | Maybe testr adds too much extra stuff | 15:01 |
tomwardill | hmm, or maybe I've done something wrong somewhere | 15:01 |
tomwardill | as `testr run --parallel --concurrency=2 --subunit --full-results '\|' subunit-2to1` looks a bit weird, given the escaping around the pipe | 15:02 |
cjwatson | Where did you put the subunit-2to1 in that case? | 15:02 |
tomwardill | in the master.cfg | 15:03 |
cjwatson | What's the diff? | 15:03 |
tomwardill | command=['testr', 'run', '--parallel', '--concurrency=2', '--subunit', '--full-results', '\|', 'subunit-2to1'])) | 15:03 |
cjwatson | Oh | 15:03 |
cjwatson | Well, yes | 15:03 |
cjwatson | That's an argv | 15:04 |
cjwatson | More or less | 15:04 |
cjwatson | It's not passed to a shell, so doesn't understand `\|` | 15:04 |
tomwardill | which makes sense | 15:04 |
* cjwatson | looks at buildbot.steps.shell | 15:05 |
cjwatson | So ... there's no fiddly quoting required for the arguments themselves there | 15:05 |
cjwatson | You *could* just try: | 15:06 |
cjwatson | command=['sh', '-c', 'testr run --parallel --concurrency=20 --subunit --full-results \| subunit-2to1'] | 15:06 |
cjwatson | Definitely a workaround, but ought to help | 15:06 |
tomwardill | the docs spec that you can give command as a single string | 15:06 |
tomwardill | and it does basically that | 15:06 |
tomwardill | (although that's in the latest docs) | 15:07 |
* tomwardill | tries | 15:08 |
cjwatson | I am a bit suspicious 'cos I can't find what implements that, but maybe | 15:08 |
tomwardill | running | 15:09 |
cjwatson | But the sh -c trick should definitely work if that doesn't | 15:09 |
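
The argv-versus-shell distinction above can be reproduced with Python's own `subprocess` module. This is an analogy rather than buildbot's implementation: `run_argv` is a hypothetical helper, and `echo`/`tr` stand in for testr and subunit-2to1.

```python
import subprocess


def run_argv(argv):
    """Run argv directly (no shell involved) and return its stdout as text."""
    return subprocess.run(
        argv, stdout=subprocess.PIPE, check=True, text=True
    ).stdout


# No shell: "|" and "tr" are just extra arguments passed to echo.
literal = run_argv(["echo", "a", "|", "tr", "a", "b"])  # "a | tr a b\n"

# Explicit shell: sh parses the string, so "|" builds a real pipeline.
piped = run_argv(["sh", "-c", "echo a | tr a b"])  # "b\n"
```

One caveat with the `sh -c` form: the exit status reported is that of the last command in the pipeline, so a failure in the first command can go unnoticed, which ties back to the earlier point about pipes losing exit codes.
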
tomwardill | https://usercontent.irccloud-cdn.com/file/7xneaob9/image.png | 15:16 |
tomwardill | success! | 15:16 |
cjwatson | Progress! | 15:16 |
tomwardill | the stdout is good too | 15:17 |
tomwardill | okay, so I think that's all the problems worked through | 15:17 |
tomwardill | now I just need to document what they were, work out patches and file an RT to try this... | 15:18 |
tomwardill | concurrency 5 is making my computer VERY LOUD | 15:21 |
cjwatson | Nice | 15:24 |
cjwatson | Out of interest, does this fix the "unknown worker (bug in our subunit output?)" thing that we currently get? Looked like it might from your image ... | 15:25 |
tomwardill | it seems to... | 15:25 |
tomwardill | we have a list of workers too! | 15:25 |
cjwatson | Ooh, does this let us download independent subunit streams from each worker? | 15:25 |
tomwardill | ooh, which tells you which worker ran which tests | 15:25 |
cjwatson | EXCELLENT | 15:26 |
tomwardill | I think the only 'stream' we get is the list of tests | 15:26 |
cjwatson | That will make debugging certain kinds of test isolation bugs so much easier | 15:26 |
tomwardill | if we upgrade from precise to xenial, do we need to rebuild the xenial LXC that we already have? | 15:26 |
cjwatson | Well, even separate lists of tests for each worker is a lot better than nothing | 15:27 |
cjwatson | I have no idea | 15:27 |
cjwatson | Hopefully not | 15:27 |
tomwardill | indeed, as I don't really want to have to try and maintain this script :) | 15:27 |
cjwatson | As long as you have something working locally, I think it's OK to debug it into existence a little bit on production if necessary | 15:27 |
tomwardill | hmm, getting some 'App server startup timed out' failures, but that may well be due to the load on the VM/machine | 15:28 |
cjwatson | Yeah, likely | 15:28 |
tomwardill | it's at 350% cpu usage and has eaten all the ram allocated to it | 15:28 |
cjwatson | om nom nom | 15:28 |
tomwardill | they're not on the same tests as the ones I had in the last run, so points towards that at least | 15:28 |
tomwardill | wish I'd left this machine in the basement now | 15:30 |
cjwatson | Heh | 15:30 |
cjwatson | This is great though, super-happy to see these improvements | 15:31 |
tomwardill | getting this out and working, then transcribing over to LXD will be super nice | 15:31 |
cjwatson | And hopefully LXD won't be too difficult after this | 15:31 |
cjwatson | Yeah | 15:31 |
tomwardill | and cleaning up/sorting lpsetup along the way | 15:31 |
cjwatson | I think I've decided I don't have enough brain to review https://code.launchpad.net/~pappacena/turnip/+git/turnip/+merge/385158 today. I've reviewed the Launchpad bits that need to precede that ... | 16:32 |
tomwardill | fixed container cleanup and exit code return too | 16:46 |
SpecialK|Canon | nice | 16:48 |
tomwardill | okay, will work out a plan and extract / update the files required tomorrow morning | 16:56 |
tomwardill | but it's looking good/feasible now | 16:56 |
cjwatson | Non-lcy01 bionic image builders aren't working. I've (belatedly) deployed staging equivalents to test this. lgw01 is failing due to a glance API difference, bos02 is possibly something else but I haven't worked it out yet. | 20:21 |
cjwatson | wgrant: ^- could I have a quick review of https://code.launchpad.net/~cjwatson/canonical-is-charms/gss-glance-v2-private/+merge/385608 ? | 20:36 |
cjwatson | Looks like bos02 is probably the same thing after all. | 20:46 |
* cjwatson | cowboys on lgw01 bionic staging to test | 20:46 |
cjwatson | Looks like that fixes it on lgw01, indeed | 20:52 |
wgrant | cjwatson: Ah, fun | 21:41 |