[00:00] <perrito666> cherylj: back
[00:03] <menn0> anastasiamac: i'll address those review comments and get those merged into the feature branch separately. thanks.
[00:03] <anastasiamac> menn0: really? thnx :D
[00:06] <cherylj> hi perrito666, I'm looking at bug 1424069 where juju resolved fails with the error "ERROR unit "ubuntu/0" is not in an error state"
[00:06] <mup> Bug #1424069: juju resolve doesn't recognize error state <regression> <resolved> <juju-core:Triaged by cherylj> <https://launchpad.net/bugs/1424069>
[00:07] <cherylj> and it looks like what is happening, is that when the install hook fails, the Unit Agent status gets set to error
[00:07] <cherylj> but when we try to run juju resolved, it looks at the state of the Unit, not the UnitAgent
[00:07] <perrito666> cherylj: I f'ed that up, then.
[00:08] <perrito666> cherylj: it should in all cases use unitAgent
[00:08] <perrito666> unit should not be used yet
[00:08] <perrito666> cherylj: that particular behavior seems to be missing a test or it would have failed when I made the change
[00:10] <cherylj> perrito666: did you want to take this bug?  I can help out if you tell me what needs to happen, if you're running low on time :)
[00:12] <perrito666> cherylj: well it depends, is this already critical?
[00:12] <cherylj> yes
[00:12] <perrito666> how fun
[00:12] <perrito666> cherylj: what time is it for you?
[00:12] <wallyworld_> we have a couple of days
[00:12] <wallyworld_> 1.23 is not going to be released until thursday most likely
[00:12] <cherylj> perrito666:  7PM
[00:12] <perrito666> cherylj: well it's unfair that you are working so late because of me
[00:13] <perrito666> assign it to me
[00:13] <cherylj> perrito666: well it sounds like we have a little bit of time
[00:13] <perrito666> wallyworld_: yes, but this bug is most likely blocking CI
[00:13] <cherylj> perrito666: I do need to put my daughter to bed now, but I can still help out once she's down.
[00:13] <wallyworld_> oh, nothing is in the topic
[00:13] <perrito666> cherylj: could you point me to where you found the general error happening? I'll write a decent test and fix it
[00:14] <perrito666> wallyworld_: it's critical, I assume it is blocking
[00:14] <cherylj> there's a testcase in the bug
[00:14] <cherylj> which I was able to reproduce easily on my local system
[00:15] <wallyworld_> perrito666: yeah, marked as critical regression, but topic not updated
[00:15] <perrito666> wallyworld_: I believe that is done by hand
[00:15] <wallyworld_> perrito666: it's late for you too, i can pick this up
[00:15] <cherylj> perrito666: I added some logging and found that before the regression, upon the install hook failure, the state of the unit was set to StatusError, so juju resolved worked fine
[00:16] <cherylj> The current behavior is that upon the install hook failure, the UnitAgent status is updated to StatusError (and nothing ever changes with the status of the Unit)
[00:16] <perrito666> cherylj: that is expected behavior
[00:16] <cherylj> But the juju resolved command still gets the status of the Unit to determine whether or not the unit is in an error state
[00:16] <perrito666> the issue seems to be the code checking if unit is broken
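The mismatch cherylj and perrito666 describe above could be sketched roughly like this; it is a minimal illustration with made-up types, not the actual juju/juju state API:

```go
package main

import "fmt"

// Status values mirroring the constants discussed above (illustrative names).
type Status string

const (
	StatusError   Status = "error"
	StatusStarted Status = "started"
)

type Unit struct {
	status      Status // workload status; stays unchanged when a hook fails
	agentStatus Status // unit agent status; set to error on hook failure
}

// resolved reports whether a hypothetical "juju resolved" would accept the
// unit. The regression was checking u.status instead of u.agentStatus.
func resolved(u Unit) error {
	if u.agentStatus != StatusError { // fixed: consult the agent status
		return fmt.Errorf("unit is not in an error state")
	}
	return nil
}

func main() {
	u := Unit{status: StatusStarted, agentStatus: StatusError} // install hook failed
	fmt.Println(resolved(u)) // nil: the error state is now recognised
}
```

With the buggy version (checking `u.status`), the same unit would produce the "not in an error state" message from the bug report.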
[00:16] <cherylj> perrito666: ok
[00:17] <perrito666> cherylj: tx a lot
[00:17] <perrito666> I'll go have dinner and fix it afterwards
[00:17] <cherylj> perrito666: np, let me know if there's anything else I can help with
[00:17] <cherylj> I'll bbl too
[00:22] <wallyworld_> perrito666: i've taken the bug, fixing now
[00:32] <perrito666> wallyworld_: aren't you a nice person?
[00:33] <wallyworld_> perrito666: sometimes :-) it's fixed, including a test, but i'm just looking at any other tests that may be needed
[00:34] <perrito666> wallyworld_: well, that test is going to be very useful on the change that I am practicing now
[00:39] <perrito666> you know the worst part? I need to change that back to unit
[00:39] <wallyworld_> sigh
[01:00] <wallyworld_> axw: a small one to fix blocker http://reviews.vapour.ws/r/1109/
[01:00] <axw> looking
[01:03] <axw> wallyworld_: seems odd that we'd be checking the agent status, rather than the unit status... what's the distinction meant to be?
[01:03] <wallyworld_> axw: agent status is what we check now before any health related work
[01:04] <wallyworld_> moving forward that will change
[01:04] <axw> wallyworld_: I see. can you please add a comment to that effect?
[01:04] <wallyworld_> sure
[01:15] <ericsnow> could I get another pair of eyes on http://reviews.vapour.ws/r/1102/?
[02:34] <axw> anastasiamac wallyworld_: this one is ready for review now - http://reviews.vapour.ws/r/993/
[02:34] <wallyworld_> ok
[02:35] <axw> wallyworld_: I ended up taking out all of the implementation of the storage watching code in this branch, because it was ~2000 LOC all up :)
[02:35] <axw> now about 600
[02:35] <wallyworld_> sure, i may duck out to get lunch and look after
[02:35] <axw> nps
[04:03] <wallyworld_> thumper: why do we have the agents starting in debug mode prior to determining what logging level to use?
[04:05] <thumper> histerical raisins
[04:05] <thumper> wallyworld_: we didn't want to miss important bits at startup
[04:05] <wallyworld_> thumper: thought so. but that means we leak credentials :-(
[04:07] <wallyworld_> we can log apiserver request/response at trace level, but will also likely need to look at suppressing cloud init logging etc
[04:07] <thumper> wallyworld_: here's an idea
[04:07] <thumper> when we write out the service script, we look at the current log level for the juju module
[04:08] <thumper> if it is set to DEBUG or below, we start with that
[04:08] <thumper> otherwise start with INFO
[04:08] <thumper> what this means though
[04:08] <thumper> is that some agents may start with info and some with debug if changes are made during deployment
[04:08] <thumper> the original idea was to have a defined, known starting point that was the same everywhere
[04:08] <thumper> there is no technical reason why it can't be different
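thumper's idea could be sketched as follows; this is illustrative Go using a simple ordered level type, not the real loggo levels the agent code would consult:

```go
package main

import "fmt"

// Level models logging verbosity: lower values are more verbose.
type Level int

const (
	TRACE Level = iota
	DEBUG
	INFO
)

// pickStartupLevel implements the rule described above: when writing out the
// agent's service script, keep the current "juju" module level if it is
// DEBUG or below (i.e. more verbose), otherwise start at INFO.
func pickStartupLevel(current Level) Level {
	if current <= DEBUG { // DEBUG or below (e.g. TRACE): keep it
		return current
	}
	return INFO
}

func main() {
	fmt.Println(pickStartupLevel(TRACE) == TRACE) // keeps trace
	fmt.Println(pickStartupLevel(INFO) == INFO)   // defaults to info
}
```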
[04:09] <wallyworld_> i agree with that - i just wish we didn't include credentials
[04:09] <thumper> the original design went for consistency
[04:09] <thumper> we are outputting everything
[04:09] <wallyworld_> the issue is that even if we start with info and then change to debug, we'll still leak
[04:09] <thumper> yes
[04:10] <wallyworld_> so messing with logging levels is only a band-aid which is not that effective
[04:10] <thumper> correct
[04:10] <wallyworld_> sigh
[04:10] <wallyworld_> seems like it needs to be done properly
[04:10] <thumper> so... alternative options
[04:10] <thumper> don't write out credentials in the log
[04:11] <davecheney> 15:10 < thumper> don't write out credentials in the log
[04:11] <davecheney> ^ yes, that
[04:11] <thumper> however, since we log all api traffic at debug
[04:11] <thumper> we need to do one of two things:
[04:11] <wallyworld_> we need to be able to strip credentials but easier said than done
[04:11] <thumper> be able to identify the values that are creds
[04:11] <thumper> or not write out all the traffic
[04:12] <wallyworld_> agreed, latter is flawed because sometimes you need all traffic
[04:12] <wallyworld_> without leaking
[04:12] <thumper> for debugging yes, for normal users, no
[04:12] <thumper> I don't think it is feasible to say "we'll log everything except credentials"
[04:12] <wallyworld_> well, users need it even if they don't know it
[04:12] <thumper> it doesn't make sense
[04:13] <wallyworld_> since users will upload all-machines.log and then expect us to fix their problem
[04:13] <wallyworld_> or we need to turn on debug logging
[04:14] <wallyworld_> so we do need to allow for verbose logging minus secrets
[04:15] <thumper> yep, that's hard
[04:17] <wallyworld_> for now need a quick fix - can log api server requests at trace level
[04:17] <wallyworld_> and look at not dumping cloud init script at startup unless also at trace
[04:18] <wallyworld_> well, can log api server request metadata at debug
[04:18] <wallyworld_> but not contents
[04:21] <thumper> seems like a reasonable approach
[04:22] <wallyworld_> i'll do that
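wallyworld_'s quick fix (request metadata at debug, full contents only at trace) might look roughly like this; the names and level handling are illustrative, not the actual apiserver logging code:

```go
package main

import "fmt"

// Level models logging verbosity: lower values are more verbose.
type Level int

const (
	TRACE Level = iota
	DEBUG
)

// logRequest formats an API request log line. At DEBUG only metadata is
// emitted; the body (which may contain credentials) appears only at TRACE.
func logRequest(level Level, method, body string) string {
	if level <= TRACE {
		// full contents only at trace - may contain secrets
		return fmt.Sprintf("request %s: %s", method, body)
	}
	// metadata only at debug and above, so credentials are not leaked
	return fmt.Sprintf("request %s: [contents suppressed]", method)
}

func main() {
	fmt.Println(logRequest(DEBUG, "Login", `{"password":"sekrit"}`))
	fmt.Println(logRequest(TRACE, "Login", `{"password":"sekrit"}`))
}
```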
[05:00] <wallyworld_> axw: rb not updating; this one fixes a 1.23 release issue with leaking creds https://github.com/juju/juju/pull/1782
[05:03] <axw> looking
[05:06] <axw> wallyworld_: done
[05:06] <wallyworld_> ty
[06:00] <wallyworld_> axw: i'm off for a bit to go see Wicked, won't be able to look at your latest PRs till later tonight after I get back, or maybe tomorrow morning
[06:00] <axw> wallyworld_: no worries. enjoy :)
[06:01] <axw> no great rush, I've got things to carry on with
[06:01] <wallyworld_> axw: will do, if you are bored, there's also the storage provisioner one http://reviews.vapour.ws/r/1108/
[06:01] <axw> wallyworld_: sure, I'll finish off this branch then take a look
[06:01] <wallyworld_> no hurry of course :-)
[06:02] <anastasiamac> axw: wallyworld_: i might be able to look at some of these too :D
[06:02] <axw> anastasiamac: great, thanks
[06:02] <wallyworld_> great
[08:16] <mattyw> morning all
[08:39] <Muntaner> hello devs
[08:39] <Muntaner> I'm trying to use the load balancer charm HAProxy for my charm. I set up everything, but when I try to access to the public IP of the load balancer, I get a 503 Service Unavailable. How can I diagnose my situation?
[08:46] <davecheney> rogpeppe3: oh pooh, you left
[08:46] <rogpeppe3> davecheney: no i didn't
[08:47] <mup> Bug #1430205 was opened: lxc template needs refreshing every 24 hours <juju-core:New> <https://launchpad.net/bugs/1430205>
[08:58] <dooferlad> morning! o/
[09:00] <dimitern> dooferlad, o/
[09:05] <dimitern> dooferlad, I've been reading your kvm notes from yesterday
[09:06] <dimitern> dooferlad, how did those reject rules appear in iptables? from libvirt?
[09:07] <dooferlad> dimitern: I have no idea! I haven't tracked it down yet.
[09:07] <dimitern> dooferlad, also, have you checked the nat table? there should be an SNAT rule
[09:07] <TheMue> morning o/
[09:07] <dimitern> TheMue, o/
[09:07] <dooferlad> TheMue: o/
[09:11] <dooferlad> dimitern: bother, I didn't save those :-( I will get a duplicate environment up and get back to you.
[09:12] <dimitern> dooferlad, cheers
[09:21] <mup> Bug #1430220 was opened: Swift container (but not objects) deleted, bootstrap and destroy-environment fail <juju-core:New> <https://launchpad.net/bugs/1430220>
[09:25] <coreycb> TheMue, do you know the syntax for specifying an action-get key?
[09:31] <TheMue> coreycb: sorry, not directly. would have to look too
[09:32]  * TheMue is looking
[09:32] <coreycb> TheMue, ok np. figured I'd ask you since the other guys are not likely online.
[09:33] <coreycb> I'd tried a few ways based on https://jujucharms.com/docs/authors-charm-actions#action-get but wasn't getting anything back
[09:33] <TheMue> coreycb: yes, it's a bit early
[09:35] <TheMue> coreycb: the example shows a nested variable name. "outfile" and "name" are parts of it. if your "foo" is on the top level it's only "action-get foo".
[09:37] <coreycb> TheMue, ok thanks.  yeah that example's a bit odd since 'name' is never defined as far as I can see.  as an aside, I'm running with an experimental version of juju that has leadership elections so I'm going to try on 1.22-beta5.
[09:40] <TheMue> coreycb: indeed, it doesn't match the actions.yaml shown above. "compression.type" would be a better example.
[09:54] <mup> Bug #1430225 was opened: juju br0/juju-br0 does not observe dhcp mtu settings <juju-core:New> <https://launchpad.net/bugs/1430225>
[10:02] <dimitern> TheMue, standup?
[10:02] <TheMue> dimitern: yes, trying to connect. somehow my browser dislikes me today :(
[10:04] <TheMue> dimitern: aargh, not only the browser, all new connections. will reboot now
[10:11] <mattyw> anastasiamac, excellent questions on your review, thanks very much
[10:36] <mup> Bug #1430245 was opened: Transient error with juju-unit-agent on windows hosts <windows> <juju-core:New> <https://launchpad.net/bugs/1430245>
[10:58] <coreycb> TheMue, ok so I found the issue
[10:58] <TheMue> coreycb: what has it been?
[10:59] <jam> dimitern: ping
[10:59] <dimitern> jam, pong
[11:00] <jam> I'm running into an intermittent failure, that we've tracked down to maybe a network issue
[11:00] <jam> specifically in master
[11:00] <coreycb> I was using 'params:' instead of 'properties:' in actions.yaml.  params doesn't seem to work but properties does.  so maybe the doc just needs an update, not sure.
[11:00] <coreycb> TheMue, ^
[11:00] <jam> if you run: cd worker/uniter/filter
[11:00] <jam> $ time go test -c && for i in `seq 10`; do ./filter.test -gocheck.v -gocheck.f ConfigEvents & sleep 0.01; done
[11:01] <jam> that runs the one test 10 times in parallel
[11:01] <jam> it seems we are getting an extra AddressChanged event
[11:01] <jam> and when you touch the test to remove the line where it sets the address
[11:01] <jam> it starts passing consistently, when we shouldn't be getting *any* events since the address wouldn't have been set
[11:02] <dimitern> jam, I'll have a look
[11:02] <dimitern> jam, which test is failing?
[11:02] <jam> -gocheck.f ConfigEvent
[11:03] <jam> TestConfigEvents and TestInitialAddressEventIgnored both suffer from this
[11:03] <TheMue> coreycb: as I understand it "params" is for the definition parameters (top level), "properties" for their potential additional properties
[11:04] <dimitern> jam, ok, thanks, will let you know if I can reproduce it locally
[11:04] <coreycb> TheMue, that's what the doc seems to say too, but it doesn't appear to be the case
[11:04] <jam> dimitern: you might need to bump up "10" 1-in-10 fails for me here
[11:05] <TheMue> coreycb: oh, have to talk to bodie and jw4 about it. could you pastebin your actions.yaml?
[11:07] <dimitern> jam, so far 2-in-20 failed
[11:07] <dimitern> 3 even
[11:09] <dimitern> now 4
[11:09] <coreycb> TheMue, very basic, this works but if you swap params for properties it didn't work for me  --  http://pastebin.ubuntu.com/
[11:09] <jam> dimitern: right, so often enough that we have a clear problem :)
[11:09] <coreycb> sorry, http://pastebin.ubuntu.com/10573902/
[11:09] <dimitern> jam, it seems to fail twice as frequently here - always with github.com/juju/juju/testing/channel.go:63 - unexpected receive
[11:10] <dimitern> jam, is that the same issue you're seeing?
[11:10] <jam> dimitern: oh, I'm not saying the count, just that it fails regularly. 10 was enough that every attempt failed for me
[11:10] <jam> dimitern: but UnexpectedReceive is the failure
[11:11] <TheMue> coreycb: ok, thanks. will discuss it with bodie and jw4 and come back to you later
[11:11] <coreycb> thanks
[11:11] <dimitern> jam, ok, why do you think it's a network issue?
[11:12] <jam> dimitern: if you comment out the line in the test SetNetworkAddress, the test *passes*, when the test expects nothing to set the address
[11:12] <jam> since it didn't do it
[11:13] <jam> dimitern: the test is exercising the logic that just setting address or just setting charm url doesn't generate an event until the other one occurs
[11:14] <jam> dimitern: now this is a test running in JujuConnSuite, and the SetUpTest is using s.unit.AssignToNewMachine()
[11:14] <jam> so I don't know what that gets running in the background
[11:15] <jam> but apparently we're assigning addresses to the machine being tested
[11:15] <jam> and if it happens at exactly the right moment
[11:15] <dimitern> jam, well, commenting out SetAddresses before starting the watcher papers over the problem I think
[11:15] <jam> we end up with 2 changed events, and the test fails
[11:16] <jam> dimitern: the point is that something in JujuConnSuite is forcing addresses for machines, and the test assumes it isn't, I don't *directly* care about what the fix is, it is just an intermittent test that blocked me landing code because I thought I broke something here
[11:17] <perrito666> morning
[11:17] <TheMue> perrito666: o/
[11:17] <dimitern> jam, right, will dig deeper into it
[11:17] <jam> certainly some sort of "don't touch me" flag here would be nice
[11:17] <jam> I can understand that for most tests maybe we want to set an address because we have to have something to work with
[11:18] <jam> and this particular test was written when nothing was assigned
[11:18] <dimitern> jam, I guess removing setAddresses from there should do the trick
[11:18] <jam> dimitern: so just commenting out SetAddress in the test breaks what the test actually cares about testing
[11:18] <jam> (that we don't fire events until both address and charm url are set)
[11:18] <jam> dimitern: I guess ideally we'd be more decoupled, but it is written using JujuConnSuite, and I don't know we want to rework all of that
[11:19] <jam> dimitern: Just know that TestInitialAddressEventIgnored also runs into this problem
[11:19] <jam> dimitern: https://bugs.launchpad.net/juju-core/+bug/1426394
[11:19] <mup> Bug #1426394: TestConfigEvents random failure <ci> <intermittent-failure> <test-failure> <juju-core:Triaged> <https://launchpad.net/bugs/1426394>
[11:20] <jam> so mgz is seeing it in CI
[11:25] <dimitern> jam, I think the problem is s.machine in that suite is machine 0, which shouldn't be
[11:27] <anastasiamac> mattyw: :D np - I'm enjoying reading the code! yvm for ur PR :D
[11:28] <mattyw> anastasiamac, I just had one question, emailed you about it
[11:39] <jam> dimitern: I don't think the test cares, so if you want to have SetUpSuite create 2 machines, I'm ok with that. Though definitely comment why
[11:47] <anastasiamac> mattyw: is this the comment about returning a valid data when error occurs?
[11:51] <mattyw> anastasiamac, that's the one, I wasn't sure if you were just commenting on expecting action
[11:52] <anastasiamac> mattyw: i liked the 3 part return :D
[11:52] <anastasiamac> mattyw: but in every similar piece of code i've seen, when an error is returned
[11:53] <anastasiamac> mattyw: all other parts are either nil or the most basic "empty" default
[11:53] <anastasiamac> mattyw: whereas you were returning a MeterNotAvailable value...
[11:53] <anastasiamac> mattyw: all i was saying is that it jumped out at me :)
[11:55] <mattyw> anastasiamac, ah I see
[11:56] <anastasiamac> mattyw: :D
[12:41]  * TheMue steps out for a moment
[12:42] <mup> Bug #1430225 changed: juju br0/juju-br0 does not observe dhcp mtu settings <cts> <juju-core:New> <https://launchpad.net/bugs/1430225>
[12:45] <jam> anyone else upgraded to go 1.4?
[12:45] <jam> go fmt stopped working if you have a symlink in your PWD
[12:46] <jam> and it changed how imports are sorted
[12:46] <jam> :(
[13:16] <dimitern> jam, I found the issue
[13:16] <jam> dimitern: yay
[13:16] <dimitern> jam, as usual in these cases a 1 line change :)
[13:17] <jam> dimitern: all about finding that line and knowing it won't break other expectations
[13:17] <mgz> dimitern: oo, go on, what's the change?
[13:17] <dimitern> jam, can you try it to see if you can still reproduce it? just add seenConfigChange = false after f.outConfig = nil on line 470
[13:18] <jam> dimitern: that sounds like a very bad change
[13:18] <jam> it means that future address changes won't be notified until config changes again
[13:18] <dimitern> jam, I've successfully run both TestConfigEvents and TestInitialAddressEventIgnored multiple times with seq 50
[13:18] <dimitern> jam, well, the config watcher is getting restarted on setCharm
[13:19] <jam> dimitern: so the code intends to not send the *first* notification until it sees both Config and Address changes, but future changes to either should always trigger a change.
[13:19] <dimitern> jam, and that's not accounted for in the maybePrepareConfigEvent
[13:19] <dimitern> jam, not quite
[13:20] <dimitern> jam, if the config watcher was restarted, we *always* get the initial empty event
[13:22] <jam> dimitern: but if we just did setCharm is it actually wrong to say the config changed?
[13:23] <dimitern> jam, well another disturbing thing is the configChanges chan gets reset to the newly started configw.Changes() after it gets started
[13:24] <dimitern> jam, (again in setCharm event handling), so there's a possibility of a race reading from the old configChanges while configw is getting restarted
[13:26] <dimitern> jam, this might be the other missing piece, as running all tests 50 times in parallel I only got 3 failures - now trying with that additional fix - adding configChanges = nil; seenConfigChange = false just after "changing charm to %q"
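The gist of the fix dimitern describes might be modelled like this minimal sketch; the structure is illustrative, not the actual worker/uniter/filter code:

```go
package main

import "fmt"

// Every (re)started config watcher sends one guaranteed initial event; only
// events after the first should be forwarded as real config changes. So when
// setCharm restarts the watcher, seenConfigChange must be reset, or the new
// watcher's initial event leaks out as a spurious change.
type filter struct {
	seenConfigChange bool
	events           []string
}

// handleConfigEvent processes one event from the config watcher's channel.
func (f *filter) handleConfigEvent() {
	if !f.seenConfigChange {
		f.seenConfigChange = true // absorb the watcher's initial event
		return
	}
	f.events = append(f.events, "config-changed")
}

// restartWatcher models setCharm restarting the config watcher; this reset
// is the one-line fix discussed above.
func (f *filter) restartWatcher() {
	f.seenConfigChange = false
}

func main() {
	f := &filter{}
	f.handleConfigEvent() // initial event: absorbed
	f.handleConfigEvent() // real change: forwarded
	f.restartWatcher()    // setCharm restarts the watcher
	f.handleConfigEvent() // new initial event: absorbed, not spurious
	fmt.Println(f.events) // [config-changed]
}
```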
[13:28]  * dimitern 's machine is visibly lagging when running 50 instances of the filter tests in parallel
[13:28] <jam> dimitern: well I believe that is also running 50 mongod instances in parallel
[13:29] <dimitern> yeah :)
[13:30] <dimitern> jam, ok, so only 2 failures now - both in TestGetMeterStatus
[13:36] <dimitern> jam, correction - TestMeterStatusEvents
[13:38] <dimitern> jam, with seq 20 I can't reproduce it; my machine was under quite a load, and I can see in the log just before "timeout waiting for receive" that the meter status was indeed sent
[13:39] <dimitern> jam, I'd attribute these failures to heavy lagging under load
[13:40] <jam> dimitern: if it is just ShortWait sort of timeout, that would be ok
[13:40] <jam> depends on the failure
[13:41] <dimitern> jam, it is a LongWait I think, but I'm running the tests a few more times
[13:47] <dimitern> jam, ok, now 1 failure out of 50 for all tests - this time TestActionEvents and it *is* a ShortWait
[13:48] <jam> dimitern: so the change sounds good given the testing we have in place, but I wouldn't be surprised if our testing wasn't actually as complete as we would like. So this is something that we'd want to think through well. Possibly running past fwereade
[13:49] <dimitern> jam, ok, so far we've established I believe it's not a networking issue but a watcher/race sort of thing; I'll propose a fix with those few lines added and provide a way to test it using your snippet
[13:49] <dimitern> jam, I'd
[13:50] <dimitern> jam, (really bad lag!)
[13:50] <dimitern> jam, I'd ask you to retry reproducing it with the fix
[13:51] <jam> dimitern: I'm certainly willing to try it. My concern isn't that it doesn't fix what I saw, but if it breaks other expectations
[13:52] <dimitern> jam, agreed, I'll ask fwereade to have a look as well
[13:54] <alexisb> all, we have a critical bug blocking 1.22, I need some volunteers:
[13:54] <alexisb> https://bugs.launchpad.net/juju-core/+bug/1430049
[13:54] <mup> Bug #1430049: unit "ceph/0" is not assigned to a machine when deploying with juju 1.22-beta5 <oil> <juju-core:Triaged> <https://launchpad.net/bugs/1430049>
[13:55] <alexisb> wallyworld_, has done initial investigation but it will require some f2f time with jhobbs in US hours
[13:57] <dimitern> dooferlad, TheMue, voidspace, guys, I'd appreciate if some of you could join the maas+juju interlock in 2 minutes
[13:57] <mgz> gsamfira: bug 1430340
[13:57] <mup> Bug #1430340: Failing to create tempdir in tests on windows <test-failure> <windows> <juju-core:Triaged> <https://launchpad.net/bugs/1430340>
[13:57] <gsamfira> mgz: thanks :D
[13:58] <gsamfira> grabbing a coffee and looking
[13:58] <dooferlad> dimitern: on it
[13:58] <dimitern> dooferlad, cheers!
[13:58] <voidspace> dimitern: yep, just grabbing coffee
[13:59] <dimitern> voidspace, ta!
[14:00] <mup> Bug #1430340 was opened: Failing to create tempdir in tests on windows <test-failure> <windows> <juju-core:Triaged> <https://launchpad.net/bugs/1430340>
[14:00] <mgz> gsamfira: for interest, the fix I'd made previously was https://github.com/juju/testing/pull/52 for IsolationSuite usage in juju, making sure we actually passed all the windows variables through to subprocesses regardless of case
[14:02] <gsamfira> cool! Windows does not really care about case, but I remember doing a case insensitive match when I pushed this. Might be mistaken though
[14:02] <gsamfira> be right back
[14:06] <wwitzel3> perrito666: ping
[14:09] <perrito666> wwitzel3: pong
[14:09] <perrito666> wasn't me
[14:11] <perrito666> wwitzel3: ahh dst
[14:11] <wwitzel3> perrito666: standup
[14:12] <perrito666> wwitzel3: yup sorry, I keep thinking it's in one hour
[14:12] <mup> Bug #1430340 changed: Failing to create tempdir in tests on windows <test-failure> <windows> <juju-core:Triaged> <https://launchpad.net/bugs/1430340>
[14:18] <mup> Bug #1430340 was opened: Failing to create tempdir in tests on windows <test-failure> <windows> <juju-core:Triaged> <https://launchpad.net/bugs/1430340>
[14:20] <axw> fwereade: if you have any time today, I'd appreciate a review of the storage hook source: http://reviews.vapour.ws/r/1113/
[14:30] <gsamfira> mgz: running tests now
[14:30] <mgz> gsamfira: ace
[14:40] <gsamfira> mgz: http://paste.ubuntu.com/10574918/ :)
[14:41] <gsamfira> going to fix tests today. We need that CI up as soon as humanly possible :D
[14:44] <gsamfira> recloned master just to make sure
[14:45] <gsamfira> yup, my bad. unclean env on my machine. Apologies
[14:45] <mgz> yeah, wondered if it was a dep issue
[14:47] <gsamfira> testing again
[14:48] <axw> can someone please review https://github.com/juju/juju/pull/1789 - fixes the critical blocker in 1.22
[14:48] <dimitern> axw, LGTM
[14:49] <axw> dimitern: thanks
[14:49] <gsamfira> mgz: I mentioned this before, and I don't know if it's possible, but using WinRM to run the tests might avoid environment issues you get with OpenSSH. You can still use OpenSSH to scp files, just don't add it to the environment
[14:49] <gsamfira> mgz: there is a python script that can be used for this, and it can be run from any Linux box
[14:50] <gsamfira> mgz: https://github.com/cloudbase/pywinrm/blob/master/wsmancmd.py
[14:51] <gsamfira> mgz: we use this in the Hyper-V OpenStack CI to run the tempest tests
[14:55] <mgz> gsamfira: I think the ssh stuff is pretty clean - I was worried we were polluting with cygwin junk but turned out not to be the case
[14:56] <dimitern> TheMue, please update the status on bug 1428430
[14:56] <mup> Bug #1428430: AllWatcher does not remove last closed port for a unit, last removed service config <api> <juju-core:Triaged by themue> <https://launchpad.net/bugs/1428430>
[14:56] <TheMue> dimitern: will do
[15:00] <mup> Bug #1403955 was opened: DHCP's "Option interface-mtu 9000" is being ignored on bridge interface br0 <cts> <juju-core:New> <isc-dhcp (Ubuntu):Confirmed> <https://launchpad.net/bugs/1403955>
[15:04] <axw> dimitern: I'm off to bed, would you mind keeping an eye on that merge and restart it if needed?
[15:05] <dimitern> axw, sure, np
[15:05] <voidspace> dimitern: so only the State can run queries (like "find all IP addresses for this machine ID")
[15:05] <voidspace> dimitern: because the collection names are private
[15:05] <voidspace> dimitern: so a new state method to find them all
[15:06] <voidspace> dimitern: or I could just have a state method to remove them all and move all that code
[15:06] <dimitern> voidspace, yeah, that sounds good
[15:06] <voidspace> dimitern: (it only iterates over the returned collection and calls addr.Remove)
[15:06] <voidspace> dimitern: a state method to fetch them all, or a state method to remove them all - which do you think?
[15:06] <voidspace> dimitern: I think maybe just a method to find them all
[15:06] <dimitern> voidspace, I think the latter is better
[15:06] <voidspace> hah
[15:06] <voidspace> dimitern: ok, maybe
[15:06] <voidspace> it's what we specifically need I guess
[15:07] <voidspace> fairy 'nuff
[15:07] <dimitern> voidspace, if we need to find them we'll add another method
[15:07] <voidspace> let a thousand methods bloom
[15:07] <dimitern> voidspace, :)
[15:07] <dimitern> voidspace, however, let me check a thing first
[15:11] <voidspace> dimitern: this is the core logic
[15:11] <voidspace> dimitern: http://pastebin.ubuntu.com/10575094/
[15:11] <voidspace> dimitern: it needs some error handling around the addr.Remove() and it's in the wrong place (needs to be in State)
[15:11] <voidspace> but that's all it's doing
[15:12] <dimitern> voidspace, yeah - no existing "find all ips for a machine/subnet"
[15:12] <voidspace> dimitern: I was pretty sure there wasn't as I've worked with most of the IPAddress code
[15:13] <voidspace> it's new for the networking stuff
[15:13] <voidspace> we're only modelling IP addresses allocated for containers so far
[15:15] <dimitern> voidspace, yeah, so I guess it does sound better to have a machine.AllocatedIPAddresses() method
[15:15] <dimitern> voidspace, and then call ip.Remove() on each.
[15:18] <voidspace> dimitern: ok, I'll do that
[15:18] <voidspace> dimitern: thanks
[15:18] <voidspace> dimitern: for machine id I want container.InstanceId() right?
[15:18] <dimitern> voidspace, no, just machine id
[15:19] <voidspace> dimitern: I have a state.Machine called container
[15:19] <dimitern> voidspace, we only need instanceId when talking to the provider - i.e. ReleaseAddress
[15:19] <voidspace> ah
[15:19] <voidspace> so ID() instead of InstanceId()
[15:20] <voidspace> or at least Id()
[15:26] <dimitern> voidspace, yes
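The approach agreed above (a Machine method returning the allocated addresses, then Remove on each, keyed by machine id rather than instance id) could look roughly like this; all types and names here are hypothetical stand-ins for the juju state package:

```go
package main

import "fmt"

// IPAddress is a stand-in for a state IP address document.
type IPAddress struct{ Value string }

// Remove deletes the address document (no-op in this sketch).
func (a *IPAddress) Remove() error { return nil }

// Machine is a stand-in for state.Machine; addresses are looked up by
// machine id, not instance id (instance id is only needed when talking to
// the provider, e.g. for ReleaseAddress).
type Machine struct {
	id    string
	addrs []*IPAddress
}

// AllocatedIPAddresses returns the IP addresses allocated to this machine.
// Keeping the query here avoids exposing the private state collections.
func (m *Machine) AllocatedIPAddresses() ([]*IPAddress, error) {
	return m.addrs, nil
}

// removeMachineAddresses is the core logic from the pastebin: fetch all
// allocated addresses and remove each, with error handling around Remove.
func removeMachineAddresses(m *Machine) error {
	addrs, err := m.AllocatedIPAddresses()
	if err != nil {
		return err
	}
	for _, a := range addrs {
		if err := a.Remove(); err != nil {
			return fmt.Errorf("removing %s: %v", a.Value, err)
		}
	}
	return nil
}

func main() {
	m := &Machine{id: "0/lxc/1", addrs: []*IPAddress{{Value: "10.0.0.4"}}}
	fmt.Println(removeMachineAddresses(m)) // <nil>
}
```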
[15:27] <dimitern> fwereade, I'd appreciate if you can have a look at this http://reviews.vapour.ws/r/1118/ which should fix a uniter filter intermittent test failure
[15:30] <mup> Bug #1307728 changed: ensure-availability command should show actions performed <ha> <ui> <juju-core:Fix Released> <https://launchpad.net/bugs/1307728>
[15:30] <mup> Bug #1403955 changed: DHCP's "Option interface-mtu 9000" is being ignored on bridge interface br0 <cts> <juju-core:Invalid> <isc-dhcp (Ubuntu):Confirmed> <https://launchpad.net/bugs/1403955>
[15:37] <dimitern> jam, there's the fix btw ^^ if you can give it a try and see if you still reproduce the issue that'll be great
[16:32] <sinzui> dimitern, which milestone should bug 1428439 be on
[16:32] <mup> Bug #1428439: retry-provisioning launches instances for containers; cannot retry containers at all <juju-core:Triaged> <https://launchpad.net/bugs/1428439>
[16:35] <sinzui> ericsnow, do you think we need two vivid slaves with systemd and upstart to be certain juju works with both configurations?
[16:37] <mup> Bug #1403955 was opened: DHCP's "Option interface-mtu 9000" is being ignored on bridge interface br0 <cts> <kvm> <lxc> <network> <juju-core:Triaged> <isc-dhcp (Ubuntu):Confirmed> <https://launchpad.net/bugs/1403955>
[16:37] <ericsnow> sinzui: not really, considering that vivid operates under upstart only in "old" pre-release images and the official releases will all be systemd
[16:37] <dimitern> sinzui, 1.23-beta1 is ok I think
[16:38] <ericsnow> sinzui: then again, it also depends on whether upstart will be officially supported as the booted init system on vivid
[16:38] <sinzui> ericsnow, what about people upgrading from utopic. they don't switch unless Ubuntu make further packaging changes
[16:38] <ericsnow> sinzui: I doubt it will be, but if it is then 2 slaves may make sense
[16:38] <ericsnow> sinzui: good point
[16:39] <sinzui> ericsnow, I am still unsure. maybe I should just take the time and have two slaves. I can turn one off later
[16:39] <ericsnow> sinzui: so does that mean upstart will be supported up to the next LTS?
[16:39] <ericsnow> sinzui: sounds good
[16:39] <sinzui> ericsnow, It might be required as a legacy.
[16:39] <ericsnow> sinzui: it should continue to work under upstart regardless
[16:40] <sinzui> ericsnow, yeah, and the current setup proves that. I want to be sure it still works if Ubuntu about-face
[16:40] <ericsnow> sinzui: but it does mean having 2 slaves for each release starting with vivid, no?
[16:41] <ericsnow> sinzui: there *is* a contingency plan for dropping systemd in vivid, so good thinking
[16:42] <sinzui> ericsnow, maybe? it's about Ubuntu making systemd-sysv a requirement
[17:44] <ericsnow> sinzui: do we use feature flags in CI?
[17:44] <sinzui> ericsnow, no
[17:45] <sinzui> ericsnow, are they set in environments.yaml?
[17:47] <ericsnow> sinzui: I believe it's from environment variables
[17:48] <sinzui> ericsnow, if it works from shell env vars, I think we can reconfigure jobs as needed. If it comes from the juju env, we need to make config and/or code changes to use them
[17:48] <ericsnow> sinzui: if we run a vivid slave with upstart then it may make sense to use a feature flag to indicate upstart-on-vivid
[17:49] <sinzui> ericsnow, understood
[17:49] <ericsnow> sinzui: there's a fallback case for init system discovery that hard-codes vivid (and subsequent releases) to systemd
[17:49] <sinzui> ericsnow, I am now at the point of wondering what will happen when I try to provision a new slave. The charm/package uses upstart
[17:50] <sinzui> ericsnow, if there is no legacy upstart support, the slave wont run without me writing a systemd script
[17:50] <ericsnow> sinzui: yeah, we'll see :(
[17:51] <ericsnow> sinzui: there is some accommodation for that, but I don't know to what degree (they were still discussing it a month+ ago)
[17:52] <ericsnow> sinzui: I'll try to find out
[17:52] <ericsnow> sinzui: can you tell if the latest vivid image is booting systemd?
[17:53] <sinzui> ericsnow, I haven't looked
[17:53] <ericsnow> sinzui: k
[17:55] <sinzui> ericsnow, since clouds don't support ubuntu devel images, and the current juju doesn't support systemd, I need to build this by hand
[17:55] <ericsnow> sinzui: ah, makes sense
[17:55] <ericsnow> sinzui: it's like that for each new series, huh?
[17:56] <sinzui> I have done this before. it just takes longer for me to type and upload things than juju and charms do
[17:57] <sinzui> ericsnow, I usually wait for an official release of juju after a series opens. that gives me a juju and an agent. then I provision the previous series in a cloud, dist-upgrade, then manually provision. that is how we got the trusty, utopic, and current vivid
[18:13] <gsamfira> any chance I can get a review on this: http://reviews.vapour.ws/r/1119
[18:13] <gsamfira> so we can start testing on windows? :)
[18:17] <sinzui> ericsnow, I am having a misadventure on the current vivid-slave. lxc blew up. I am watching a case where we upgrade from stable juju to unstable, which might be upset because 1.21.3 doesn't know about systemd. I will report real details in a bit
[18:17] <ericsnow> sinzui: k
[18:22] <alexisb> http://www.opencompute.org/community/events/summit/ocp-us-summit-2015-live-streaming
[18:22] <alexisb> thanks kwmonroe for the pointer ^^
[18:22] <alexisb> perrito666, this is our OCP demo
[18:23]  * gsamfira is going through  second bowl of popcorn
[18:24] <alexisb> gsamfira, :)
[18:24] <sinzui> ericsnow, looks like lxc template creation is broken. 1.21.3 cannot create a template with the current vivid lxc images. I am going to clean, upgrade, then retest that unstable juju loves the new image. Then try the upgrade again
[18:24] <alexisb> shouldnt you be asleep
[18:24] <ericsnow> sinzui: right
[18:24] <gsamfira> alexisb: not yet. Its 20:24 here :D
[18:24] <gsamfira> and I actually grabbed a few hours last night :)
[18:25] <ericsnow> sinzui: we send upstart-specific commands to cloud-init and in the LXC clonetemplate code
[18:25] <sinzui> ericsnow, I believe we will not be counting the upgrade test until 1.23.0 is released
[18:25] <ericsnow> sinzui: so if vivid is booting off systemd now then you'll see that sort of failure
[18:25] <sinzui> ericsnow, thank you. that predicts my question
[18:26] <ericsnow> sinzui: can you check PID 1?
[18:26] <sinzui> ericsnow, I cannot even get into the container. it is a zombie
[18:27] <ericsnow> sinzui: yuck
[18:27] <sinzui> I am killing lxc procs because the controls are useless
[18:27] <ericsnow> sinzui: not even with lxc-console?
[18:27] <sinzui> it is stalled
[18:27] <ericsnow> sinzui: bummer
[18:27] <ericsnow> sinzui: oh
[18:28] <ericsnow> sinzui: there's a known issue here
[18:28] <ericsnow> sinzui: https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/1347020
[18:28] <mup> Bug #1347020: systemd does not boot in a container <systemd-boot> <lxc (Ubuntu):Fix Released by stgraber> <lxc (Ubuntu Trusty):Triaged by stgraber> <https://launchpad.net/bugs/1347020>
[18:28] <sinzui> ericsnow, that looks similar
[18:29] <ericsnow> sinzui: basically, vivid with systemd in a container on a non-vivid host won't work
[18:29] <alexisb> with natefinch out is there someone who can review this for cloudbase (aka gsamfira):
[18:29] <alexisb> https://github.com/juju/juju/pull/1791
[18:29] <alexisb> it is fixing our windows tests that will soon have the power to block trunk
[18:30] <ericsnow> sinzui: well, it works on trusty and utopic if you update a few things from e.g. "trusty-updates"
[18:31] <ericsnow> sinzui: I expect it won't work on precise
[18:32] <hazmat> anybody who wants to help a user debug while local provider is broken on vivid should join #juju and talk to lamont
[18:32] <hazmat> s/while/why
[18:40] <sinzui> ericsnow, the comedy gets better. The jenkins upstart job got removed. I am now committed to rebuilding a slave that meets your needs. I am going to disable all the vivid jobs until the replacement is ready
[18:44] <ericsnow> sinzui: okay
[18:46] <cmars> gsamfira, i'll take a look at it.
[18:46] <alexisb> cmars, thank you!
[18:52] <gsamfira> cmars: thank you! :)
[18:59] <ericsnow> sinzui: I'm going to land the patch that specifies that vivid uses systemd
[19:00] <voidspace> g'night all
[19:00] <ericsnow> sinzui: then write a follow up patch that adds a feature flag for the vivid-on-upstart case
[19:00] <sinzui> ericsnow, I suppose you should as it clearly doesn't want me using upstart any more
[19:01] <ericsnow> sinzui: also, the ubuntu folks verified that vivid did indeed switch yesterday
[19:21] <ericsnow> sinzui: I've landed the vivid-is-systemd patch so vivid should be able to work now (LXC host issues aside)
[19:22] <sinzui> ericsnow, thank you.
[19:58] <cmars> gsamfira, reviewed
[19:59] <thumper> morning
[20:00] <ericsnow> sinzui: upstart feature-flag patch: http://reviews.vapour.ws/r/1121/
[20:00] <sinzui> fab
[20:01] <ericsnow> sinzui: that should help in the case we have a bot running upstart
[20:02] <sinzui> ericsnow, I am writing a systemd script for jenkins. I wonder how many charms will need to change
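A systemd replacement for an upstart job like the jenkins slave sinzui mentions would be a unit file roughly like this minimal sketch; the paths, user, and jar location are assumptions, not the actual CI configuration:

```ini
# /etc/systemd/system/jenkins-slave.service -- illustrative only
[Unit]
Description=Jenkins build slave
After=network.target

[Service]
User=jenkins
ExecStart=/usr/bin/java -jar /var/lib/jenkins/slave.jar
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

Enabled with `systemctl enable --now jenkins-slave`, whereas the upstart equivalent would have lived in /etc/init/ and been started with `start jenkins-slave`.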
[20:04] <alexisb> morning thumper
[20:07] <ericsnow> sinzui: well, since charms are series-specific the impact is explicitly limited--vivid charms will have to make sure they work under systemd
[20:07] <ericsnow> sinzui: how many charms will have to adjust?  good question
[20:08] <sinzui> ericsnow, yeah. We mark our charms because we need them the day the series is born
[20:14] <thumper> cherylj: if you want to start our 1:1 early, I'm fine with that
[20:16] <cherylj> thumper: sure, I'll join the hangout again
[20:58] <perrito666> now, that disconnect button is too big
[21:09] <thumper> cmars: anything you want to catch up with today?
[21:10] <mup> Bug #1420049 was opened: ppc64el - jujud: Syntax error: word unexpected (expecting ")") <deploy> <openstack> <regression> <uosci> <juju-core:In Progress by axwalk> <juju-core 1.22:Fix Released by axwalk> <https://launchpad.net/bugs/1420049>
[21:52] <ericsnow> thumper: thanks for taking a look at that vivid-LXC issue
[22:18] <menn0> thumper, davecheney: https://github.com/juju/utils/pull/115
[22:23]  * thumper looks
[22:24] <ericsnow> menn0: FYI, the github-RB hooks work for the utils repo too
[22:25] <menn0> ericsnow: yep I saw that. good stuff.
[22:26] <ericsnow> menn0: I've been meaning to do it for the other repos but need round tuits
[22:33] <thumper> menn0: review done
[22:34]  * thumper hands ericsnow a round tuit
[22:35]  * jw4 learned something new today - round tuits - :)
[22:36] <mgz> not to be confused with square tuits
[22:36] <jw4> haha
[22:37] <perrito666> you both just made me google
[22:38] <perrito666> uff, finally all tests pass
[22:46] <jw4> perrito666, I blame it on ericsnow ... I was 50 years behind
[22:46] <ericsnow> jw4: :)
[22:47] <perrito666> ok, EOD
[22:47] <perrito666> cheers
[22:47] <jw4> bye perrito666
[23:07] <jjox> hazmat: hi there o/ - fyc -> https://code.launchpad.net/~jjo/juju-deployer/fix-juju-deployer-diff-for-multiple-relations-between-servicepair/+merge/252271
[23:24] <hazmat> jjox: solid
[23:25] <hazmat> jjox: did you see my comments on the bug... much of this can be simplified by simply mocking juju
[23:31] <jjox> hazmat: ah, hm ... hadn't - will peek tomorrow
[23:47] <wallyworld> axw: just started looking at the local state PR. is there a reason not to reuse stuff from uniter/operation/state.go?