menn0 | grrrr... an interface that mimics bits of State using the same names but slightly different method signatures | 02:15 |
---|---|---|
menn0 | what a great idea :) | 02:15 |
menn0 | wallyworld: sorry, another question. how does a machine agent get into the "Started" state? where is that set? | 02:27 |
wallyworld | menn0: from memory, that happens when the agent comes up on the node and "phones home" via the presence api | 02:28 |
wallyworld | until then the status is pending/allocating | 02:28 |
menn0 | wallyworld: with this issue I'm looking at, it seems that the peergrouper isn't even noticing the new controller nodes so isn't adding them to the replicaset | 02:28 |
wallyworld | or it may be a direct status call, not sure | 02:28 |
menn0 | wallyworld: and the only reason that that can happen (really) is if the machine doesn't get to "started" | 02:29 |
wallyworld | you mean not noticing the status change? | 02:29 |
wallyworld | is there a watcher for status? | 02:29 |
menn0 | wallyworld: no it can't be that | 02:29 |
menn0 | wallyworld: the peergrouper polls so even if the watcher wasn't working any changes still get noticed | 02:30 |
menn0 | wallyworld: I can see the peergrouper running regularly in the logs, but the other machines aren't picked up | 02:30 |
wallyworld | when enable ha is run, the machine collection does get the new machine docs | 02:31 |
menn0 | wallyworld: those machines do connect to the API on machine-0 even when their own mongodb instance isn't part of the replicaset yet | 02:31 |
wallyworld | what is the peergrouper polling for? | 02:31 |
menn0 | I think it polls in case the replicaset changes underneath it (in mongodb) | 02:32 |
wallyworld | sorry, i meant what data is it polling? | 02:32 |
menn0 | there are watchers so that it reacts straight away when something changes in state | 02:32 |
menn0 | and the polling is there to catch any changes in mongodb | 02:32 |
menn0 | the only way the peergrouper can be not reacting to the new machines is if they don't get to "started" | 02:33 |
wallyworld | so it watches for machine status changes | 02:34 |
menn0 | not really | 02:34 |
menn0 | it watches for controller changes as well as waking up periodically | 02:34 |
wallyworld | that's what "started" is isn't it? a machine agent status | 02:34 |
menn0 | and when it wakes up due to either a poll or a controller change | 02:34 |
menn0 | it checks the controller machines against the mongodb replicaset config | 02:34 |
menn0 | and updates things if required | 02:35 |
menn0 | but controller machines are only considered if they're "started" | 02:35 |
menn0 | yes machine agent status | 02:35 |
menn0 | obtained via (state.)Machine.Status() | 02:35 |
wallyworld | so, can we see in the logs if that SetStatus() api call is being made | 02:35 |
* menn0 checks | 02:36 | |
wallyworld | by the new agent when it comes up via jujud | 02:36 |
menn0 | wallyworld: they are being made: | 02:36 |
menn0 | 2016-04-14 21:01:33 DEBUG juju.apiserver apiserver.go:291 <- [475] machine-1 {"RequestId":67,"Type":"Machiner","Version":1,"Request":"SetStatus","Params":"'params redacted'"} | 02:36 |
menn0 | 2016-04-14 21:01:33 DEBUG juju.apiserver apiserver.go:305 -> [475] machine-1 324.478971ms {"RequestId":67,"Response":"'body redacted'"} Machiner[""].SetStatus | 02:36 |
menn0 | don't know what the status is being set to but the calls are there | 02:37 |
* menn0 checks the machine-1 logs at that time | 02:37 | |
wallyworld | if you run with trace you'll see the params | 02:37 |
wallyworld | it sounds like maybe the watcher is suspect? | 02:37 |
menn0 | wallyworld: I can't repro the problem reliably, it's intermittent | 02:38 |
wallyworld | awesome | 02:38 |
menn0 | so I'm stuck with the DEBUG logs from the CI failures | 02:38 |
menn0 | I think the machine was set to started | 02:39 |
menn0 | 2016-04-14 21:01:33 INFO juju.worker.machiner machiner.go:105 "machine-1" started | 02:39 |
wallyworld | sounds like it | 02:39 |
menn0 | so why the hell isn't the peergrouper noticing the machine..... | 02:39 |
wallyworld | so we need to know why the watcher is not firing or why the peergrouper doesn't see that | 02:39 |
wallyworld | exactly | 02:39 |
menn0 | if the logs were at TRACE I'd see why in the peergrouper logs | 02:39 |
wallyworld | we can ask QA i guess | 02:40 |
menn0 | wallyworld: I've just simulated a machine not setting its status to started (by hacking the machiner to not set the status for a specific tag) and I see exactly the same log output in the machine-N.logs and the mongodb logs. | 03:45 |
menn0 | wallyworld: so that's the likely intermediate cause | 03:45 |
menn0 | wallyworld: now to figure out why the machine isn't "started" when the peergrouper looks | 03:45 |
wallyworld | menn0: hmmm, interesting. maybe there's a network/routing issue at play? | 03:45 |
menn0 | wallyworld: it can't be that because we see the new machines contact the API on the bootstrap node | 03:46 |
menn0 | we even see it make the SetStatus call | 03:46 |
menn0 | wallyworld: my guess is that something is setting the status to another value after the machiner sets it to started | 03:46 |
menn0 | wallyworld: is that possible to your knowledge? | 03:47 |
wallyworld | not ottomh | 03:47 |
wallyworld | but could be i guess | 03:47 |
wallyworld | the logs should show extra SetStatus calls | 03:47 |
menn0 | wallyworld: there aren't any | 03:48 |
wallyworld | menn0: could there be a legit issue with the watcher firing? | 03:48 |
wallyworld | it does get there eventually right? | 03:48 |
wallyworld | but after a long time | 03:49 |
menn0 | wallyworld: well it polls every minute | 03:49 |
wallyworld | that is true | 03:49 |
wallyworld | so wtf | 03:49 |
menn0 | wallyworld: so even if the watcher doesn't fire the peergrouper still sees any changes at least once a minute | 03:49 |
wallyworld | yes | 03:50 |
menn0 | wallyworld: the other way the peergrouper might not be seeing a controller machine is if it's not in the controllerinfo doc | 03:51 |
menn0 | wallyworld: but that seems less likely based on my read of the code | 03:51 |
wallyworld | yeah, i'm not overly familiar with the controllerInfo state code | 03:52 |
menn0 | wallyworld: the code that updates it looks solid... it's either going to work or the whole txn which adds to it and adds the machine docs fails | 03:53 |
menn0 | and the machine docs are clearly being added | 03:53 |
wallyworld | menn0: could it be one of those corner cases we've hit before with txn etc? | 03:54 |
menn0 | wallyworld: maybe... I don't think so | 03:55 |
menn0 | i've just noticed something else in the peergrouper code... digging some more | 03:55 |
wallyworld | ok | 03:55 |
menn0 | wallyworld: I just noticed that the list of controller machines and machine status is only updated/checked based on the watcher | 04:05 |
menn0 | wallyworld: and the code to do that is horrid | 04:05 |
wallyworld | menn0: you saying the poll every minute relies on the watcher firing? | 04:05 |
mup | Bug #1571476 opened: "juju register" stores password on disk <juju-core:Triaged> <https://launchpad.net/bugs/1571476> | 04:06 |
mup | Bug #1571477 opened: juju 1.25.3: juju-run symlink to tmpdir <landscape> <juju-core:New> <https://launchpad.net/bugs/1571477> | 04:06 |
wallyworld | to update the machine list to poll | 04:06 |
mup | Bug #1571478 opened: juju login/register should only ask for password once <juju-core:Triaged> <https://launchpad.net/bugs/1571478> | 04:06 |
menn0 | wallyworld: no the poll happens every minute regardless | 04:06 |
wallyworld | right | 04:06 |
wallyworld | but the list to poll | 04:06 |
wallyworld | comes from the watcher? | 04:06 |
menn0 | wallyworld: and during that poll the latest status of the mongo replicaset is updated | 04:06 |
menn0 | wallyworld: but the controller data from state is only refreshed based on the watchers | 04:07 |
menn0 | wallyworld: and the way it's done screams of data race to me | 04:07 |
wallyworld | seems like it :-( | 04:07 |
menn0 | there's a separate goroutine which passes a method on itself over a channel to the main peergrouper goroutine | 04:08 |
wallyworld | i wonder why we poll at all? | 04:08 |
menn0 | the peergrouper receives the method, calls it | 04:08 |
menn0 | which then goes and modifies stuff back on the main watcher again | 04:08 |
wallyworld | sounds like a good refactoring is in order | 04:08 |
menn0 | yep, I might try that | 04:09 |
menn0 | but first I'm going to land some simple logging changes so that we can get more info when it fails in CI | 04:09 |
wallyworld | sgtm | 04:10 |
menn0 | wallyworld: thanks for being a sounding board all day... it helps a lot having to explain what i'm seeing | 04:11 |
wallyworld | tis ok, i didn't do much | 04:11 |
wallyworld | or anything really :-) | 04:11 |
wallyworld | axw: any chance of a small review? https://github.com/juju/bundlechanges/pull/22 | 04:20 |
axw | wallyworld: sure | 04:21 |
axw | wallyworld: a little confused by the second paragraph in the description - I don't see any change relating to series override | 04:25 |
axw | wallyworld: is that a change to be made in juju/juju? | 04:26 |
wallyworld | axw: a test failed - the bundle said trusty but the charm in the bundle said precise. the test expected trusty, and i changed to precise | 04:26 |
wallyworld | actually, that may have been me adding trusty first up to the test | 04:27 |
wallyworld | and then having to change to precise | 04:27 |
wallyworld | so maybe ignore that | 04:27 |
axw | wallyworld: yeah, there are only additions in the code and tests | 04:27 |
wallyworld | yeah, i just checked too, so i think i just forgot what i changed/added | 04:28 |
axw | wallyworld: LGTM, please drop that para before merging to avoid confusing anyone else :p | 04:28 |
wallyworld | yep :-) | 04:29 |
wallyworld | ty | 04:29 |
menn0 | wallyworld: peergrouper logging changes: http://reviews.vapour.ws/r/4623/ | 05:09 |
menn0 | back soon | 05:09 |
wallyworld | awesome, will look in a sec | 05:09 |
wallyworld | axw: here's the other half of that fix, only a small change http://reviews.vapour.ws/r/4624/ | 05:50 |
axw | wallyworld: reviewed | 06:01 |
wallyworld | ty | 06:01 |
wallyworld | axw: yeah, i'll need to rework the common function. technically it is "user" specified (not charm specified) but i take the point about the message | 06:02 |
wallyworld | well, i guess not really user specified | 06:02 |
axw | wallyworld: no, I don't think so. in the original use of that function, it was user-specified because the user specifies with --series on the command line | 06:03 |
axw | here they're just doing "juju deploy some-bundle" | 06:03 |
wallyworld | yeah, i reconsidered my position :-) | 06:03 |
axw | cool | 06:03 |
frobware_ | jam: didn't get anywhere with tests (re RB). Was testing some stuff that dooferlad was proposing. | 06:30 |
=== frobware_ is now known as frobware | ||
dimitern | frobware: morning | 06:45 |
dimitern | frobware: I have 2 PRs up for review, more coming later | 06:45 |
dimitern | http://reviews.vapour.ws/r/4614/ and https://github.com/juju/gomaasapi/pull/42 | 06:45 |
dimitern | frobware: ping | 07:42 |
frobware | dimitern: morning | 07:51 |
dimitern | frobware: morning :) | 07:51 |
dimitern | frobware: have you seen the links I pasted above? | 07:51 |
dimitern | (still not quite sure ERC doesn't shit itself and appears connected but it isn't) | 07:52 |
frobware | just about to look, but have a 1:1 with jam in a minute | 07:52 |
dimitern | frobware: sure, np | 07:52 |
jam | frobware: didn't like what I had to say? | 08:08 |
dimitern | wallyworld: ping | 08:21 |
dimitern | or axw ? | 08:22 |
axw | dimitern: heya, what's up? | 08:22 |
dimitern | I'm wondering why juju list-machines (or status for that matter) does not display containers | 08:23 |
axw | dimitern: erm, it doesn't? no idea | 08:23 |
axw | I haven't used containers in ages | 08:23 |
dimitern | axw: ok, np :) | 08:23 |
dimitern | I think, unless it's on purpose, it should be a bug | 08:23 |
mup | Bug #1571545 opened: juju status with default tabular format or juju list-machines does not show containers <observability> <status> <juju-core:New> <https://launchpad.net/bugs/1571545> | 08:45 |
hoenir | why do some tests on windows, like the fork/exec ones, fail? I know that the fork syscall doesn't exist on windows but we should try to detect the specific platform and then run just the relevant tests | 09:17 |
hoenir | Am I right? | 09:18 |
fwereade_ | dimitern, frobware, voidspace: sorry, dropped accidentally, but won't have much to contribute to topic beyond "I endorse the removal of hacks" | 09:29 |
wallyworld | dimitern: hey, sorry was afk | 09:48 |
dimitern | wallyworld: np - it's rather late anyway | 09:48 |
frobware | dimitern: re ntpdate: https://bugs.launchpad.net/bugs/1564397 | 09:49 |
mup | Bug #1564397: MAAS provider bridge script deletes /etc/network/if-up.d/ntpdate during bootstrap <bootstrap> <network> <juju-core:Triaged> <https://launchpad.net/bugs/1564397> | 09:49 |
dimitern | wallyworld: check out https://launchpad.net/bugs/1571545 | 09:49 |
mup | Bug #1571545: juju status with default tabular format or juju list-machines does not show containers <observability> <status> <juju-core:New> <https://launchpad.net/bugs/1571545> | 09:49 |
wallyworld | dimitern: i do have a question for you - i started to look at removing the Network attr from deploy service and it was a very deep rabbit hole and many 100s of lines of code that i started to delete and then i noticed i started to overlap with one of your existing prs so backed off | 09:49 |
wallyworld | dimitern: i am hoping all that network stuff - including collections, params structs etc - can all be deleted for 2.0 rc1 | 09:50 |
=== babbageclunk` is now known as babbageclunk | ||
dimitern | wallyworld: ah, well - yeah, it's gnarly but we'll get there and drop it soonish I hope | 09:50 |
wallyworld | dimitern: yeah, it needs to be done before 2.0 final | 09:51 |
dimitern | it's no longer used and can't influence the code path anymore | 09:51 |
wallyworld | that status thing - it may not have ever included containers, not sure, but seems like a bug | 09:51 |
wallyworld | dimitern: but the api does expose it etc, so we need to drop that bit at least | 09:51 |
dimitern | it's fine to drop the CLI argument (if still there) | 09:52 |
dimitern | i.e. just hide it while it can be dropped | 09:52 |
wallyworld | dimitern: the model migration stuff also references the obsolete collection(s) | 09:52 |
wallyworld | it would be best just to drop the whole lot; we will delete a couple of 1000 lines of code i think | 09:53 |
dimitern | wallyworld: ah, those I *think* are safer to remove now | 09:53 |
dimitern | wallyworld: let me have a look today how much we can drop | 09:53 |
dimitern | wallyworld: networksC is only still referenced by the opened ports, but should be easy to move that to spaces instead | 09:54 |
wallyworld | dimitern: ok, let me know how you get on. i deleted a shit tonne of stuff from state, apis, params etc before i started to hit the address allocation feature flag stuff | 09:54 |
wallyworld | so i stopped | 09:54 |
dimitern | wallyworld: yeah, I really wanted to drop the whole thing, but it was suggested as safer to drop it on maas only for now | 09:55 |
wallyworld | and yeah, the ports thing i wasn't sure about | 09:55 |
wallyworld | dimitern: not sure 100%, but i think the status thing was by design - [Machines] just really does mean machines and not containers | 09:56 |
wallyworld | to keep it not too verbose and to fit on a screen | 09:56 |
dimitern | well containers *are* machines :) | 09:57 |
wallyworld | not sure if we should keep it like that and add a --with-containers arg | 09:57 |
wallyworld | i'm just guessing | 09:57 |
wallyworld | what the rationale may have been | 09:57 |
wallyworld | but yeah, i agree with you | 09:57 |
dimitern | it will be nice to not have to go through a pile of yaml just to get what addresses containers have | 09:57 |
wallyworld | agreed | 09:58 |
dimitern | frobware, voidspace, babbageclunk: friendly review poke :) https://github.com/juju/gomaasapi/pull/42 http://reviews.vapour.ws/r/4614/ http://reviews.vapour.ws/r/4626/ | 10:03 |
* dimitern steps out for a while | 10:03 | |
babbageclunk | dimitern: reviewed the first two - the third's getting pretty far away from anything I know about, so might take me a bit longer. | 10:24 |
jamespage | erm | 10:59 |
mup | Bug #1571593 opened: lxd bootstrap fails with unhelpful 'invalid config: no addresses match' <juju-core:New> <https://launchpad.net/bugs/1571593> | 11:03 |
=== JoseeAntonioR is now known as jose | ||
=== Ursinha_ is now known as Ursinha | ||
mup | Bug #1571593 changed: lxd bootstrap fails with unhelpful 'invalid config: no addresses match' <juju-core:New> <https://launchpad.net/bugs/1571593> | 11:10 |
=== cppforlife__ is now known as cppforlife_ | ||
mup | Bug #1571593 opened: lxd bootstrap fails with unhelpful 'invalid config: no addresses match' <juju-core:New> <https://launchpad.net/bugs/1571593> | 11:16 |
babbageclunk | frobware: I had to destroy that ZNC service - it was hogging my nick here! But I still couldn't work out if it was listening on any local ports. | 11:19 |
frobware | babbageclunk: shame | 11:22 |
dimitern | cheers babbageclunk! | 12:00 |
babbageclunk | dimitern: :) oops, forgot to ping you when I'd done them! | 12:01 |
dimitern | babbageclunk: np, I've just got back anyway | 12:01 |
babbageclunk | frobware: I'll give it another go later on. | 12:02 |
voidspace | dimitern: did you get your reviews done? | 12:17 |
dimitern | voidspace: yeah, most of them - I'd appreciate a look on the last one though: http://reviews.vapour.ws/r/4626/ | 12:18 |
voidspace | dimitern: I'll swap it: http://reviews.vapour.ws/r/4629/ | 12:18 |
dimitern | voidspace: sure thing | 12:18 |
voidspace | dimitern: I like the logging changes :-) | 12:20 |
voidspace | dimitern: I have no new issues to add to the reviews already there | 12:21 |
dimitern | :) I'm sure *everybody* does heh | 12:21 |
* voidspace lunches | 12:21 | |
dimitern | voidspace: ta | 12:21 |
voidspace | babbageclunk: I'll pick up something new after lunch | 12:21 |
voidspace | babbageclunk: we're nearly there! | 12:22 |
dimitern | voidspace: reviewed | 12:34 |
babbageclunk | voidspace - around? Want to pick your brains about how devicename and hardware id are set. | 12:51 |
dimitern | frobware, voidspace, babbageclunk: guys, I still need an approval on http://reviews.vapour.ws/r/4626/ - please, have a look | 13:01 |
dimitern | frobware: you know what? /etc/network/if-up.d/ntpdate is missing on xenial - it's only there on trusty | 13:16 |
dimitern | (well, maybe also in more recent non-LTS *releases) | 13:17 |
frobware | dimitern: is that because ntpdate is not in the base image on xenial? | 13:18 |
dimitern | frobware: I suspect so - it's not on my machine after upgrading to xenial, nor is it on freshly deployed xenial maas nodes with the most recent images | 13:19 |
frobware | dimitern: but technically it could come back (i.e., later versions add it (again)) | 13:20 |
mgz | charms can also install packages. | 13:21 |
mgz | pretty sure neither ntp or ntpdate have ever been part of the base server image | 13:21 |
dimitern | true, but I can confirm ntpdate is there on trusty and not there on xenial images | 13:22 |
dimitern | even without juju in the picture | 13:23 |
dimitern | frobware: it could, but the code handles that transparently | 13:23 |
fwereade_ | voidspace, state/machine.go:1262 seems like it might not be quite right -- surely the preferred address should be allowed to change if it's no longer one of the known addresses? | 13:23 |
dimitern | (chmod -f -x .. and later chmod -f +x ..) | 13:23 |
frobware | dimitern: do we fail on chmod or just ignore? | 13:24 |
dimitern | chmod -f does not fail when the file is missing | 13:24 |
frobware | dimitern: do we fail if chmod fails? | 13:24 |
frobware | dimitern: ty | 13:24 |
frobware | dimitern: and chmod -f is supported as an option in precise? | 13:24 |
dimitern | frobware: unfortunately I could see a bunch of ntpdate still hanging around with the chmod -x patch, as it seems ifup calls `/bin/sh /etc/network/if-up.d/ntpdate` | 13:25 |
frobware | dimitern: yay | 13:26 |
dimitern | frobware: we *could* try using ifup --no-scripts, but that seems more dangerous | 13:26 |
frobware | dimitern: ooh. interesting. | 13:26 |
dimitern | frobware: I'll do some experiments to see | 13:27 |
frobware | dimitern: we *should* try this. the scripts will have run once for curtin's ENI, and they will on every reboot. Just not whilst we're replacing stuff. | 13:27 |
frobware | dimitern: at face value that seems ok | 13:28 |
dimitern | frobware: that's an excellent point (which I keep forgetting about) | 13:28 |
frobware | dimitern: is --no-scripts supported in precise? :) | 13:28 |
dimitern | frobware: will try precise as well | 13:29 |
frobware | dimitern: fwiw, I don't think '--no-loopback' is supported in precise | 13:31 |
frobware | dimitern: nope, not supported. | 13:31 |
frobware | dimitern: http://pastebin.ubuntu.com/15912945/ | 13:33 |
dimitern | frobware: unfortunately --no-scripts does not work even on xenial | 13:36 |
dimitern | frobware: that is, it works, but since one of the scripts is `bridge`, the bridges are not configured ok | 13:36 |
frobware | dimitern: what does 'bridge' mean here? | 13:36 |
dimitern | frobware: /etc/network/if-pre-up.d/bridge -> /lib/bridge-utils/ifupdown.sh* | 13:37 |
frobware | dimitern: which doesn't exist on precise... let me look elsewhere | 13:38 |
dimitern | frobware: it should be there if bridge-utils is installed | 13:38 |
fwereade_ | voidspace, ignore me | 13:42 |
dimitern | babbageclunk, frobware: how about `verifySubnetAliveUnlessMissing(cidr) error` ? it will still return no error if cidr does not match an existing subnet | 13:56 |
dimitern | would that be easier to follow? | 13:59 |
natefinch | lol | 14:02 |
babbageclunk | dimitern: a bit, although it's still got too many clauses in the name | 14:02 |
natefinch | if you have to make a function into a sentence, you probably need more than one function | 14:02 |
dimitern | what's wrong with descriptive names? | 14:02 |
dimitern | or that's just not how real go devs roll :D | 14:03 |
babbageclunk | dimitern: hang on, I'm typing up what I mean | 14:03 |
babbageclunk | dimitern: nothing wrong with descriptive names, it's that the thing shouldn't be one function if its name has to be too descriptive. | 14:03 |
frobware | dimitern, natefinch: fwiw, that was my original concern in the PR | 14:04 |
natefinch | generally if you need a name that specific, it means you're tying the implementation of that function too tightly into what your consumer needs. Just split it into two functions that do two simple things | 14:04 |
dimitern | ok, a better option will be I think to return a concrete error in the case the subnet does not exist, so it can be verified where needed | 14:05 |
natefinch | granted, I don't know what that function does per se, but it seems something like if subnetExists(cidr) { return verifyAlive(cidr) } is probably clearer and the individual functions are more reusable | 14:05 |
voidspace | fwereade_: ok, I will ignore you | 14:07 |
voidspace | babbageclunk: you there? | 14:07 |
babbageclunk | voidspace: yup, just typing up something | 14:09 |
babbageclunk | dimitern: https://pastebin.canonical.com/154572/ | 14:11 |
babbageclunk | dimitern: Maybe? | 14:11 |
babbageclunk | voidspace: yup? | 14:14 |
voidspace | babbageclunk: do you still have questions? | 14:14 |
babbageclunk | voidspace: yes! | 14:14 |
babbageclunk | voidspace: hangout? | 14:14 |
voidspace | babbageclunk: why do you want to ask about new device name? | 14:14 |
natefinch | anyone up for a review of a 2.0 bug? http://reviews.vapour.ws/r/4616/diff/# | 14:14 |
voidspace | babbageclunk: sure | 14:14 |
dimitern | babbageclunk: I'm trying out a similar approach, will update the PR soon with it | 14:15 |
frobware | jam: I think the bits we need to expose to apply NICs to the container is: SetContainerConfig(container, key, value string) | 14:18 |
babbageclunk | dimitern: cool | 14:25 |
babbageclunk | dimitern: can I pick your brains about /list | 14:25 |
babbageclunk | dimitern: oops, that is not what I meant to type | 14:26 |
dimitern | babbageclunk: yeah? :) | 14:26 |
babbageclunk | dimitern: what I meant to type was: provider/maas/volumes.go | 14:26 |
dimitern | babbageclunk: I'm not *that* familiar with it, but I'll help with what I can | 14:27 |
dimitern | babbageclunk: HO? | 14:27 |
babbageclunk | dimitern: ok thanks - voidspace was not much help! | 14:27 |
babbageclunk | dimitern: yup yup | 14:28 |
dimitern | frobware, babbageclunk: updated http://reviews.vapour.ws/r/4626/diff/ can you have another look please? | 14:41 |
voidspace | dimitern: looking | 14:44 |
babbageclunk | Gah, X keeps crashing on me. :( | 14:45 |
voidspace | dimitern: I'm landing my branch - the only issue you opened was invalid and your other two comments I addressed | 14:45 |
voidspace | dimitern: (you suggested changing map[string]bool to set.Strings but the bool has significance, it isn't just a set) | 14:46 |
dimitern | voidspace: sure, sounds good | 14:46 |
voidspace | dimitern: the map tracks which subnets we actually found (true/false) | 14:46 |
voidspace | dimitern: cool | 14:46 |
dimitern | voidspace: we should bump deps.tsv for gomaasapi at some point as well | 14:47 |
voidspace | dimitern: it's been bumped whenever needed | 14:47 |
voidspace | dimitern: last time was on Friday | 14:47 |
dimitern | voidspace: ok | 14:48 |
voidspace | dimitern: I don't think anything has been done since then that needs updating | 14:48 |
dimitern | voidspace: my PR that fixes fetching VLANs with a null name | 14:48 |
dimitern | (landed earlier) | 14:48 |
voidspace | dimitern: ah, cool | 14:49 |
voidspace | dimitern: want me to do just that and propose it? | 14:49 |
voidspace | dimitern: I just knocked one more thing off the maas2 list and was about to tackle the next | 14:49 |
frobware | dimitern: looking | 14:49 |
dimitern | voidspace: well, as it's not a blocker for your maas I can do it later tonight or tomorrow | 14:49 |
voidspace | dimitern: ok | 14:50 |
dimitern | frobware: thanks! | 14:50 |
dimitern | babbageclunk: updated/simplified http://reviews.vapour.ws/r/4626/diff/ (you dropped and missed this I think) | 14:51 |
voidspace | dimitern: LGTM on your branch | 14:51 |
dimitern | voidspace: ta! | 14:51 |
mup | Bug #1571687 opened: Azure-arm leaves machine-0 from the admin model behind <azure-provider> <ci> <destroy-controller> <jujuqa> <repeatability> <juju-core:Triaged> <https://launchpad.net/bugs/1571687> | 14:52 |
voidspace | mgz: ping | 15:02 |
katco` | ericsnow: standup time | 15:02 |
mgz | voidspace: yo | 15:03 |
babbageclunk | dimitern: looking now | 15:03 |
dimitern | thanks babbageclunk | 15:03 |
voidspace | mgz: just emailed you | 15:05 |
voidspace | mgz: I thought it was better to do as an email anyway | 15:05 |
voidspace | mgz: we'd like all the MAAS CI tests duplicating for MAAS 2.0 please :-) | 15:05 |
mgz | voidspace: we have a card for it | 15:06 |
voidspace | mgz: ah, awesome | 15:06 |
voidspace | mgz: we're very near to needing it | 15:06 |
voidspace | mgz: anytime tomorrow will be fine ;-) | 15:06 |
voidspace | :-P | 15:06 |
mgz | what's not working on master at present? | 15:07 |
voidspace | mgz: we don't add machine tags in instance characteristics (in progress) | 15:08 |
voidspace | mgz: instance.volumes unimplemented (in progress) | 15:08 |
voidspace | mgz: all container support not done yet (a couple of days work probably) | 15:08 |
voidspace | mgz: a network interface function that is implemented but not wired in | 15:08 |
voidspace | mgz: (that's trivial) | 15:08 |
mgz | hm, so the basic deploy test should work, but the bundle ones probably won't quite yet | 15:09 |
voidspace | mgz: yep | 15:09 |
voidspace | mgz: but it will only be a handful of days which is why I'm pinging now | 15:09 |
mgz | thanks :) | 15:09 |
voidspace | mgz: and thanks to you sir | 15:09 |
natefinch | cherylj: I commented on https://bugs.launchpad.net/juju-core/+bug/1531444 ... Maybe I'm missing some context, but it seems like it's probably not super critical | 15:11 |
mup | Bug #1531444: azure: add public mapping of series->Publisher:Offering:SKU <juju-core:Triaged> <https://launchpad.net/bugs/1531444> | 15:11 |
cherylj | natefinch: it was marked as critical as it impacted our ability to publish centos / windows in streams for azure | 15:12 |
babbageclunk | Is there a protocol for asking for help in canonical #maas? | 15:12 |
babbageclunk | Someone particular I should ask? | 15:13 |
cherylj | babbageclunk: I usually ask roaksoax, or mpontillo | 15:13 |
natefinch | cherylj: yes, but if it's only when a new version of windows comes out... we need to update core for that anyway (which, admittedly is horrible and bad, but it is the state of the code AFAIK) | 15:13 |
voidspace | babbageclunk: roaksoax has promised to help us | 15:14 |
dimitern | babbageclunk: allenap, mpontillo, roaksoax, blake_r | 15:14 |
mgz | it's not a very lively channel | 15:14 |
mgz | but gavin is our timezone, and sometimes in hitting range of me (allenap) | 15:14 |
cherylj | natefinch: thanks for checking it out, I'll bring it up again today to better understand the blockage | 15:14 |
babbageclunk | voidspace: or maybe he *vowed* to help us? | 15:14 |
babbageclunk | Ok, thanks | 15:14 |
voidspace | babbageclunk: uhm, maybe I guess... | 15:15 |
babbageclunk | voidspace: It would be more dramatic. | 15:16 |
voidspace | babbageclunk: it certainly would be | 15:16 |
babbageclunk | voidspace: Never promise when you could vow | 15:16 |
voidspace | babbageclunk: heh, sound life advice there | 15:17 |
voidspace | babbageclunk: frobware: dimitern: a really difficult one http://reviews.vapour.ws/r/4630/ | 15:18 |
dimitern | voidspace: looking | 15:18 |
frobware | dimitern: does it need a test where there are no tags? | 15:19 |
dimitern | voidspace: :) LGTM | 15:19 |
voidspace | dimitern: thanks | 15:19 |
dimitern | frobware: that's up to gomaasapi I think - it should handle the lack of tags as an empty slice (or nil) | 15:20 |
dimitern | voidspace: ^^ | 15:20 |
voidspace | dimitern: frobware: yep, it will just be an empty slice | 15:20 |
frobware | ok | 15:20 |
voidspace | dimitern: frobware: calling gomaasapi gives us a struct with a Tags member - so either there's something there or there isn't. It doesn't matter. | 15:21 |
voidspace | or rather, an interface with a Tags method | 15:21 |
dimitern | voidspace: cool | 15:21 |
voidspace | frobware: thanks | 15:21 |
dimitern | voidspace: also I doubt you have any tags on your vmaas vms - otherwise that would've been noticed earlier :) | 15:22 |
dimitern | (I mean if it's a panic or something nasty like that) | 15:22 |
voidspace | indeed | 15:22 |
voidspace | babbageclunk: don't forget to update the status doc - ta! | 15:23 |
dimitern | voidspace, babbageclunk, frobware: http://reports.vapour.ws/releases/3899/job/run-unit-tests-race/attempt/1338#highlight it might be worth running provider/maas tests with '-race' a few times to find and fix that | 15:24 |
voidspace | dimitern: I've added it as a TODO on the status doc | 15:26 |
dimitern | voidspace: +1 | 15:27 |
frobware | voidspace, babbageclunk, dimitern: reminder - no rick call today | 15:36 |
dimitern | frobware: ok | 15:37 |
* dimitern bbl | 15:38 | |
babbageclunk | dimitern, voidspace: whoa, that data race is weird - can someone explain it to me a bit? | 15:41 |
mup | Bug #1519877 changed: 'juju help' Provider information is out of date <juju-core:Invalid> <juju-core 1.25:Triaged> <https://launchpad.net/bugs/1519877> | 16:01 |
=== tasdomas` is now known as tasdomas | ||
voidspace | babbageclunk: I looked at it, went "whoa" and stopped looking at it | 16:14 |
voidspace | babbageclunk: as you might guess, the race detector detects possible race conditions between goroutines | 16:14 |
voidspace | babbageclunk: so it shouldn't be *too* hard to work out | 16:14 |
babbageclunk | voidspace: Ah, I think I get it - it's the fact we store the filename in GetFile. | 16:14 |
babbageclunk | voidspace: In fakeController. | 16:15 |
voidspace | babbageclunk: if you think you can fix it then awesome | 16:15 |
voidspace | babbageclunk: ah, right | 16:15 |
voidspace | sounds likely | 16:15 |
babbageclunk | voidspace: do you think I should put locking on all of the fakeController methods that store state on the controller for later? | 16:17 |
babbageclunk | voidspace: may as well, right? | 16:17 |
voidspace | babbageclunk: if it's not too much work | 16:18 |
voidspace | babbageclunk: I don't really like "just in case" code | 16:19 |
voidspace | babbageclunk: but locking is perhaps an exception | 16:19 |
babbageclunk | It's only a few methods | 16:23 |
mup | Bug #1571737 opened: Race is mass provider storage <ci> <maas-provider> <race-condition> <regression> <juju-core:Triaged> <https://launchpad.net/bugs/1571737> | 16:31 |
cherylj | babbageclunk, voidspace, so who gets bug 1571737? :) | 16:39 |
mup | Bug #1571737: Race is mass provider storage <ci> <maas-provider> <race-condition> <regression> <juju-core:Triaged> <https://launchpad.net/bugs/1571737> | 16:39 |
babbageclunk | cherylj: me, me! | 16:39 |
cherylj | we have a winner! | 16:39 |
babbageclunk | cherylj: fixing it now | 16:39 |
cherylj | yay! | 16:39 |
cherylj | thank you, babbageclunk :) | 16:39 |
babbageclunk | cherylj: :) | 16:39 |
cherylj | babbageclunk: what's your lp ID? | 16:41 |
babbageclunk | cherylj: hmm, good question - checking now | 16:41 |
babbageclunk | cherylj: 2-xtian | 16:41 |
cherylj | yeah, I never would've guessed that | 16:41 |
cherylj | thanks, babbageclunk :) | 16:41 |
frobware | me neither | 16:42 |
frobware | and I'm sure you told me this a few weeks ago | 16:42 |
babbageclunk | voidspace, dimitern, frobware: review my data race fix please? http://reviews.vapour.ws/r/4631/ | 16:49 |
voidspace | babbageclunk: LGTM | 17:00 |
babbageclunk | voidspace: sweet. What's the protocol for closing bugs? Will anything update it automatically on merge if I put a tag on the PR, or do I just close it manually? | 17:01 |
voidspace | babbageclunk: assign it to yourself, mark it in progress | 17:02 |
voidspace | babbageclunk: then once the fix lands mark it fix committed | 17:02 |
voidspace | babbageclunk: QA are responsible for marking it fix released (effectively closing it) | 17:02 |
voidspace | babbageclunk: I don't *think* there's anything auto here | 17:03 |
mgz | our release process does auto-fix-released bugs targeted at the milestone | 17:03 |
voidspace | mgz: cool | 17:03 |
voidspace | babbageclunk: you should probably target the bug at the latest 2.0 beta/rc or whatever the latest is then | 17:03 |
mgz | yeah, rc1 | 17:04 |
mup | Bug #1570035 changed: Race in api/watcher/watcher.go <ci> <race-condition> <regression> <test-failure> <juju-core:Fix Released by natefinch> <https://launchpad.net/bugs/1570035> | 17:04 |
mup | Bug #1570994 changed: deploy fails to download updated local charm <juju-core:New> <https://launchpad.net/bugs/1570994> | 17:04 |
babbageclunk | When you say target it at 2.0-rc1 - is that the milestone? (It's already set to that.) | 17:06 |
voidspace | babbageclunk: yes | 17:07 |
babbageclunk | Ok, I've marked it as in-progress, and when the merge passes (as I'm sure it will with no flaky tests!) I'll change it to fix-committed. | 17:08 |
voidspace | babbageclunk: don't forget status doc | 17:09 |
babbageclunk | voidspace: haven't! | 17:10 |
voidspace | babbageclunk: :-p | 17:10 |
babbageclunk | voidspace: no, I mean I haven't updated it ;) | 17:10 |
voidspace | hah | 17:11 |
voidspace | I know you haven't | 17:11 |
babbageclunk | voidspace: but I will. | 17:12 |
voidspace | ok | 17:12 |
voidspace | you do that | 17:12 |
babbageclunk | voidspace: ding! | 17:30 |
mup | Bug # changed: 1556113, 1556146, 1556180, 1558901 | 17:31 |
natefinch | ericsnow, katco`: if you're looking to break up your day, you could review my bugfix from last week: http://reviews.vapour.ws/r/4616/ | 17:47 |
=== katco` is now known as katco | ||
ericsnow | natefinch: will take a look in a bit | 17:47 |
katco | natefinch: same | 17:47 |
mup | Bug #1571783 opened: Windows unit tests cannot setup under go 1.6 <ci> <go1.6> <jujuqa> <regression> <test-failure> <unit-tests> <windows> <juju-core:Triaged> <https://launchpad.net/bugs/1571783> | 17:52 |
mup | Bug #1571783 changed: Windows unit tests cannot setup under go 1.6 <ci> <go1.6> <jujuqa> <regression> <test-failure> <unit-tests> <windows> <juju-core:Triaged> <https://launchpad.net/bugs/1571783> | 17:55 |
mup | Bug #1571783 opened: Windows unit tests cannot setup under go 1.6 <ci> <go1.6> <jujuqa> <regression> <test-failure> <unit-tests> <windows> <juju-core:Triaged> <https://launchpad.net/bugs/1571783> | 18:01 |
natefinch | I really wish juju status would print out the controller name and model name I'm looking at | 18:12 |
perrito666 | natefinch: open a bug | 18:12 |
=== cherylj_ is now known as cherylj | ||
=== redir is now known as redir_lunch | ||
mup | Bug #1571792 opened: Juju status should show controller and model names <juju-core:New> <https://launchpad.net/bugs/1571792> | 18:32 |
=== redir_lunch is now known as redir | ||
natefinch | cmars: I'm looking at https://bugs.launchpad.net/juju-core/+bug/1566130 but I can't reproduce it with a trivial install hook that just does an exit 1... do you still have a good repro? | 19:25 |
mup | Bug #1566130: awaiting error resolution for "install" hook <juju-core:Triaged by natefinch> <https://launchpad.net/bugs/1566130> | 19:25 |
cmars | natefinch, try pulling cs:~cmars/gogs and introducing an install hook error in reactive/gogs.py | 19:27 |
redir | anyone have a minute to rubber duck something with me? | 19:27 |
cmars | natefinch, maybe raise Exception("foo") in there | 19:27 |
natefinch | cmars: ok, I'll give it a try, thanks | 19:28 |
cmars | natefinch, then, "fix" it, do `juju upgrade-charm gogs --force-units` | 19:28 |
cmars | natefinch, then possibly juju resolved --retry gogs/0 | 19:28 |
redir | katco: who would I ping about zseries information? | 19:32 |
mup | Bug #1556155 changed: worker/periodicworker data race <race-condition> <juju-core:Fix Released> <https://launchpad.net/bugs/1556155> | 19:32 |
mup | Bug #1570219 changed: juju2 openstack provider setting default network <canonical-bootstack> <network> <openstack-provider> <juju-core:Fix Released> <https://launchpad.net/bugs/1570219> | 19:32 |
katco | redir: sec | 19:32 |
redir | np | 19:32 |
katco | redir: i answered on internal network in case you missed it | 19:37 |
redir | I did | 19:42 |
natefinch | oh weird, we automatically retry failed hooks now? | 20:19 |
natefinch | wallyworld: you around? | 20:20 |
mgz | natefinch: yeah, see dev thread in jan, from message from bogdan in nov | 20:20 |
mgz | did his followup changes all get reviewed? | 20:20 |
mgz | I saw at least one hanging around for a while | 20:20 |
natefinch | Oh yeah, I remember that thread now | 20:21 |
natefinch | mgz: no idea about reviews | 20:21 |
natefinch | mgz: was wondering if maybe the retry code had something to do with the bug I'm looking at: https://bugs.launchpad.net/juju-core/+bug/1566130 | 20:24 |
mup | Bug #1566130: awaiting error resolution for "install" hook <juju-core:Triaged by natefinch> <https://launchpad.net/bugs/1566130> | 20:24 |
mgz | natefinch: seems possible at least - don't have a tighter revision window to check for you I'm afraid, as we don't have tests for actually borked charms. | 20:26 |
natefinch | *nod* | 20:26 |
mgz | should at least have something exercising resolved --retry and the like, wouldn't be that hard to add. | 20:27 |
wallyworld | natefinch: sorta | 20:34 |
natefinch | wallyworld: np, got an answer elsewhere | 20:34 |
=== jillr_ is now known as jillr | ||
=== redir is now known as redir_afk | ||
redir_afk | I'll be back later this eve, but not sure what time yet. | 20:47 |
katco | redir_afk: gl | 20:48 |
mup | Bug #1571831 opened: TxnPrunerSuite.TestPrunes intermittent test failure <juju-core:New> <https://launchpad.net/bugs/1571831> | 20:50 |
mup | Bug #1571832 opened: Respect the full tools list on InstanceConfig when building userdata config script. <tech-debt> <juju-core:Triaged> <https://launchpad.net/bugs/1571832> | 20:50 |
natefinch | cmars: that install bug... have you been able to reproduce it with latest master? I tried a few different ways of having install fail, to no avail. | 20:57 |
cmars | natefinch, a very recent 2.0rc1 master. i'll try in a few min & demonstrate | 20:58 |
natefinch | cmars: thanks | 20:58 |
=== alexlist` is now known as alexlist | ||
cmars | natefinch, updated the bug with really precise instructions. confirmed it's still there with latest master. the trick is to upgrade-charm --force-units after you get the hook error | 21:49 |
mup | Bug #1571855 opened: User lacking model write access confronted with unhelpful message <docteam> <juju-core:New> <https://launchpad.net/bugs/1571855> | 21:53 |
mup | Bug #1571861 opened: juju upgrade-charm requires --switch for local charms <juju-core:New> <https://launchpad.net/bugs/1571861> | 22:05 |
axw | wallyworld: I'm thinking of removing the --generate flag from change-user-password. it currently doesn't tell you what it generated, so not very helpful; and really, you should just use a password manager if you want that | 23:50 |
wallyworld | axw: sgtm i think | 23:51 |