[00:08] beisner: i haven't looked sorry, thumper is on it afaik [00:20] wwitzel3: yes [00:21] wallyworld, beisner: I'll look this afternoon, just need to talk with menn0 [00:21] and write an email or two [00:21] * thumper looks at bug now [00:21] to start the thinking process [00:21] thumper, awesome, much thanks. [00:22] thumper, i've got a repro underway, which basically bootstraps and destroys in a loop, 1.24.6 & openstack provider. --debug enabled, will capture and add to bug if i can catch it that way. [00:22] beisner: awesome [00:23] beisner: does it happen every time? [00:23] or just some times? [00:26] thumper, i've seen it > 5 times today in test automation. that test ran 38 cycles. [00:26] hmm... interesting [00:27] beisner: and every time it is deleted at the end even though the warning says it wasn't [00:27] ? [00:27] thumper, pure conjecture: it seems that the opportunity for that to race has always been present, and that something got better/faster. [00:27] exposing it more frequently [00:27] * thumper nods [00:27] sleeps for the win? [00:27] but yes, 100% of destroys result in the "couldn't delete that thing" msg [00:28] oh... [00:28] so the warning is always there, what was it that happened > 5 times? [00:29] i don't control timing of amulet. say it gets 10 jobs to run. it bootstraps, deploys, execs tests, destroys, bootstraps, deploys, rinse and repeat. [00:29] oh to clarify: failed to bootstrap 5. all 38 complain that they couldn't delete sec groups (but always have) [00:30] the failure to bootstrap is what? [00:30] is this the message you are trying to capture? [00:30] no it's what i already logged in the bug [00:30] what i'm trying to capture is the --debug output [00:32] this is something of a well-known issue with the destroy code [00:32] zomg how can be local provider so easy to break :( [00:32] mgz: well known by whom? [00:32] not me [00:32] mgz - yep. the failing to create sec group is new. [00:32] we have a bunch of mitigation in the form of post-destroy cleanup [00:32] thumper: the bug is from 2014-06 [00:33] mgz, until 1.24.6 it was just an annoyance. now, it fails to delete Foo, then tries to create Foo, and fails to bootstrap, saying it couldn't create Foo. [00:33] mgz: well that isn't particularly useful to us now... [00:33] and anyone who does destroy-env immediately followed by bootstrap the same env on openstack will have seen it [00:33] now we just look imcompetant [00:34] thumper: so, the only realy way to fix it is make destroy-environment take much longer [00:34] sure [00:34] which is the right thing surely [00:34] make sure the freaking thing is dead [00:34] cloud providers will frequently refuse to destroy resources that are associated with other resources in the process of being destroyed [00:35] * thumper grumbles [00:35] so, kill a machine, you have to wait for some ammount of time before it will let you delete the groups that were attatched to it [00:35] likewise block devices and so on [00:36] one thing that is possible with openstack, and I think the new ec2 vpc sec groups, is remove the groups from the machines before killing the machines [00:36] that way you can reliably wipe them straight away [00:36] is a bunch more api calls though [00:37] the other option is something more like what CI does to get juju reliable, which is before bootstrap, basically destroy-environment --force [00:37] that's less elegant [00:41] thumper: I guess we really want a different bug for beisner's issue, which is certainly a newer thing [00:44] oh neat. my bootstrap/destroy loop yielded something different: http://paste.ubuntu.com/12620988/ [00:44] beisner: I know this is going to be annoying as you need the destroy cleanup race error first, but any idea if this started in a particular 1.24 minor version? [00:45] mgz i believe 1.24.5 was solid [00:45] would have to do some log digging to prove/disprove that observation though [00:46] beisner: bug 1467331 bug 1500613 [00:46] Bug #1467331: configstore lock should use flock where possible

[00:46] Bug #1500613: configstore should break fslock if time > few seconds

[00:47] ok so that repro is simple. loop a deploy/bootstrap. took 8 iterations to hit that. [00:47] errr em. rather, a bootstrap/destroy loop [00:48] beisner: http://reports.vapour.ws/releases/rule/34 for us hitting that in ci [00:49] * beisner wanders off, to return in a bit [00:49] bug 1454323 is marked fixed but that was just to make the error less terrible and the followups are what I linked above [00:49] Bug #1454323: Mysterious env.lock held message

[00:51] thumper: so, I don't think the juju code around adopting existing security groups with the same name has actually changed, [00:52] see ensureGroup in provider/openstack/provider.go [00:55] however, I think we hit the bad case of trying to create a group which is in the process of being deleted much more often with our storage code, and changes in newer openstacks [01:34] mgz, thumper - added accidental findings to bug 1500613. after hitting that lock issue, my enviro is borked. how do i unlock? ;-) [01:34] Bug #1500613: configstore should break fslock if time > few seconds

[01:35] beisner: just delete the lock [01:35] am i supposed to know where it is? [01:36] oh look there. it tells me. ha [01:36] :P [01:37] so, not sure i can reliably catch the secgroup thing with this hopping out front so readily. [01:38] beisner: you can just rm -rf the lock location in between each run [01:38] not if i'm using another runner, such as bundletester or amulet [01:38] oh you mean in the repro, yes i can [01:39] yup [01:58] beisner: I'm kinda surprised at how often this lock file problem is occurring [01:59] it should just work and delete the file [01:59] really weird that it isn't [01:59] time to go make a coffee and look at this bug [02:03] mgz, thumper, thanks. i've got the repro looping for the secgroup race. must sleep now. [02:03] beisner: ack, thanks [02:28] mgz, thumper - successfully repro'd bug 1335885 with the same loop, added new comment. now i'm really closing my screen. thx again. [02:28] Bug #1335885: destroy-environment reports WARNING cannot delete security group

[02:28] beisner: thanks again [02:29] thumper, yw, happy to help chase it. [02:29] \o [02:29] o/ [02:39] Bug #1303787 changed: hook failures - nil pointer dereference

[02:56] * thumper afk for a family thing [02:56] will be back to finish bug [03:10] thumper: thanks for that email [04:04] axw: small review please https://github.com/juju/charm/pull/159 [04:21] wallyworld: looking [04:37] axw: thanks for review, any idea for name? i don't like it either [04:38] SeriesForCharm maybe [04:38] wallyworld: *shrug* SelectSeries? not much more informative [04:38] wallyworld: sounds fine [04:38] ok, ta [04:39] wallyworld: BTW, my point (regarding "any", "default", etc.) is that this function is not directly attached to the charm metadata. so the user has to ensure the order of supported series is maintained [04:40] wallyworld: which is why I'm saying not to use "any" when it's really "the first item" [04:40] (if that's true) [04:40] ok, i'll reword [04:40] it is the first [05:39] Bug #1501173 opened: apiserver/common/storagecommon: StorageAttachmentInfo returns without error even if block device doesn't exist

[05:51] Bug #1501173 changed: apiserver/common/storagecommon: StorageAttachmentInfo returns without error even if block device doesn't exist

[05:54] Bug #1501173 opened: apiserver/common/storagecommon: StorageAttachmentInfo returns without error even if block device doesn't exist

[05:57] wallyworld, axw, anastasiamac: http://reviews.vapour.ws/r/2789/ [05:58] I'd like to build a version to make available for these folks to try with [05:58] to see if it does actually help [06:01] thumper: reviewed [06:01] ta [06:02] axw: yes, I'm wanting to get it live tested first [06:02] though observing things, it appears that what happens is this: [06:02] try to terminate all the machines [06:02] emits warning saying security group in use [06:03] finishes destroy, deletion of group works [06:03] so the end result is that the user is warned that it couldn't be deleted, but it has gone [06:03] alternatively it warns again, and doesn't delete it, next bootstrap fails [06:03] but yes, I want to test it prior to landing [06:04] as I'm taking a wild stab at the numbers [06:04] thumper: sure, sounds fine [06:04] * thumper writes that on review board too :) [06:15] ok, I'm done [06:15] laters folks [07:35] wallyworld: http://www.theguardian.com/travel/2013/may/25/top-10-live-music-venues-seattle :) [07:36] :-) [07:58] Bug #1501203 opened: apiserver/storage/storagecommon: WatchStorageAttachment should filter block devices

=== akhavr1 is now known as akhavr [08:57] dimitern: ping [08:57] voidspace, pong [08:59] dimitern: in environments.yaml I have an environment called "amazon-eu" which is type "ec2" and region "eu-central-1" [08:59] dimitern: yet when I bootstrap that environment I get a bootstrap machine in us-east-1 [08:59] hmmm... it might be a yaml indentation issue [08:59] dammit [08:59] voidspace, check if you have EC2_REGION set in the env [09:00] dimitern: will do, thanks [09:01] voidspace, or EC2_URL [09:02] hmm, HO dislikes me [09:02] dimitern: that's set to: https://ec2-lcy01.canonistack.canonical.com:443/services/Cloud [09:02] :-) [09:03] jam, fwereade: joining standup today? [09:03] omw [09:27] dimitern: dooferlad: gah, Subnets bug is on 1.25 as well as master [09:27] better retarget the work I'm doing and fix it in both places [09:28] voidspace, the addressable containers instId thing? [09:32] dimitern: yeah [09:32] I assumed it was just master, should have checked... [09:33] hah [09:33] the bug even says both [09:33] so it's a reading comprehension failure too... :-) [09:34] :) [09:35] fwereade: can you please have a glance at https://github.com/juju/juju/compare/master...axw:lp1500769-gce-default-block-source, and let me know if you're ok with this before I go much further? [09:36] axw: o/ [09:36] axw: morning :-) [09:36] fwereade: basically, I'm sick of using Validate to upgrade config [09:36] voidspace: hiya, how's it? [09:36] axw: all is well, how's you? [09:36] voidspace: not too shabby. furious bug fixing before demo time at the sprint :) [09:38] axw: heh, right [09:38] axw: pretty much what our team is on as well... [09:42] axw, ack [09:46] axw, looks eminently sane to me [09:47] axw, thanks [10:06] fwereade: got a minute? [10:18] fwereade: thanks [10:21] juju bootstrap for amazon reports the following: https://ec2.us-east-1.amazonaws.com?Action=DescribeInstances&Filter.1.Name=instance-state-name&Filter.1.Value.1=pending&Filter.1.Value.2=running&Filter.2.Name=instance.group-id&Filter.2.Value.1=sg-05ae1a61&Timestamp=2015-09-30T10%3A18%3A37Z&Version=2014-10-01 [10:21] any ideas anyone? ^ [10:21] fwereade: gonna be out for a bit, but looking for tips on how to run workers during jujuconnsuite tests, since unit assignment is done in a worker now, a ton of tests fail due to units not getting assigned. === natefinch is now known as natefinch-afk [10:22] ashipika: is there an error missing from that line? [10:22] sorry.. copy&paste mistake… here's the error message: ERROR failed to bootstrap environment: cannot start bootstrap instance: Get https://ec2.us-east-1.amazonaws.com?Action=DescribeInstances&Filter.1.Name=instance-state-name&Filter.1.Value.1=pending&Filter.1.Value.2=running&Filter.2.Name=instance.group-id&Filter.2.Value.1=sg-05ae1a61&Timestamp=2015-09-30T10%3A18%3A37Z&Version=2014-10-01: dial tcp: lookup ec2.us-east-1.amazonaws.com on 1 [10:22] axw ^ [10:22] hrm [10:23] axw: latest master.. go 1.5.1 [10:24] ashipika: looks like it's due to tagging [10:24] ashipika: that command should be retried though ... [10:24] axw: tagging? [10:24] ashipika: we tag the instance and its root disk after it starts [10:24] (can't do it while starting, which seems a bit brain dead) [10:25] axw: https://pastebin.canonical.com/140850/ [10:26] axw: with —debug: https://pastebin.canonical.com/140853/ [10:26] ashipika: erm actually that just looks like a host resolution error. can't tell more than that [10:27] axw: rebooting.. who knows.. might help [10:32] axw: did not help [10:34] ashipika: don't really know. it's attempting to resolve through DNS on localhost, is that intentional? [10:34] "on 127.0.1.1:53" [10:34] axw: i know… i saw that.. but cannot explain it [10:36] ashipika: don't know, sorry [10:53] ashipika, ping ec2.us-east-1.amazonaws.com [10:57] tasdomas: yes, fails.. switched to eu-west-1 and it seems to be working [10:58] ashipika, but what does it resolve to? [10:58] tasdomas: something must have messed up my resolve.conf, or sth [11:06] this PR adds macaroon authorization to the charms endpoint, and continues with some cleanup of the apiserver package too. reviews much appreciated, thanks! http://reviews.vapour.ws/r/2794/ [12:15] wallyworld: i've reviewed https://github.com/juju/charmrepo/pull/32 [12:15] ty [12:20] rogpeppe: i'm tired now and want to keep hacking on the juju side of things for a bit, but will come back to the charmrepo stuff tomorrow, thanks for looking [12:20] wallyworld: ok, cool [12:21] rogpeppe: one thing - name in meta doesn't have to be same as directory [12:21] so i'm not sure about yuor comment [12:21] wallyworld: yeah, but it's very confusing if it's not [12:21] hmm, ok, i habe test charms i have written where it doesn't match, so i guess i'm used to it [12:33] dimitern, did you mention this morning that you got the spaces demo to work without having to have a public ip address? (Or perhaps I misheard you.) [12:34] frobware, yes, eventually - initially the machines in the subnets without auto-public-ip set were "pending", because they didn't manage to download some packages (no outbound access, just dns works) [12:35] dimitern, aha. that's what I see. [12:35] frobware, so I presume after apt-get retried 30 or so times it gave up and cloud-init finished OK [12:35] dimitern, so did you flick the switch for auto-public-ip on the subnets? [12:36] dimitern, I'm not seeing a timeout though. machine state still in allocating. [12:36] frobware, no, but even if I did the flag is only honored when starting instances - not after they're running [12:36] frobware, is the instance running in the EC2 UI? [12:37] dimitern, I was trying on my local account. HO? [12:37] dimitern, yes instance is running [12:38] frobware, it might take 30m or so for apt-get retry script to give up I guess - I waited at least 30m with no change, but in the morning all machines shouled up as started [12:39] frobware, and it "worked" I guess just because I was deploying the ubuntu charm (which was pre-fetched by the apiserver and then the isolated machine got it from there - as usual), which doesn't need anything from the internet - wordpress I suspect won't work [12:39] dimitern, so in the real world how is this supposed to / going to work? service in the "private" subnet will need access on provisioning, installing packages, et al [12:40] dimitern, so I was deploying the ubuntu charm, like we were doing yesterday. [12:41] frobware, in the real world we can do things like setting up squid-deb-proxy for apt + another proxy + nat + forwarding etc. on machine 0 (or another "public" machine) [12:42] frobware, the ubuntu charm is useful only for really simple tests - for more "real-world-like" tests, we need charms like in that bundle - scalable, with relations, config, etc. [12:42] * dimitern needs to eat something - bbiab [12:44] * frobware also needs to eat something too. [13:33] dimitern, when bootstrapping a node with two NICs is it possible to configure which NIC gets selected? [13:58] frobware: no [13:58] frobware: that's why we need spaces [13:58] voidspace, :) [13:58] seriously :-) [13:58] voidspace, ok ok ook okkkkk [13:59] voidspace, I'm sold! [13:59] frobware: hah :-) [13:59] voidspace, I manually provisioned a machine with two NICs [13:59] frobware: right [13:59] voidspace, sent 'bootstrap-host: 10.17.17.117' in my environment.yaml [14:00] voidspace, then bootstrapped. Which indicates that the dns-name=10.17.17.117 [14:00] frobware: so it's at least using the address you gave it [14:00] voidspace, however, both mongod and jujud are listening on all interfaces [14:00] right [14:00] voidspace, http://pastebin.ubuntu.com/12624701/ [14:01] voidspace, whereas I was trying to coerce it to listen on the single NIC only. [14:02] frobware: yep [14:02] voidspace, OK answers my questions. thanks [14:02] frobware: not possible at the moment with a vanilla install [14:02] frobware, you mean in maas? [14:02] dimitern, yes [14:02] AFAIK anyway... [14:02] dimitern, well, no just maas [14:05] frobware, yeah - voidspace is correct actually :) [14:05] it does happen sometimes [14:05] frobware, one of the many goals of the model is giving you this sort of flexibility, while hiding the gruesome details :) [14:05] dimitern: frobware: I'm just bootstrapping an EC2 environment with my fix in place (for the ec2 Subnets issue) to see if it actually works... [14:05] it should do... [14:06] dimitern: btw, just recognized it. we still document the networks constraint as we still support this constraint. but shall I already remove it from the constraints documentation? [14:09] wwitzel3: ping? [14:09] cherylj: heya, in standup [14:09] wwitzel3: kk [14:12] TheMue, I think so, it should be dropped from the docs (as we're on that stage) and later from the code as well (I'm not too worried about this now) [14:12] well, the error is no longer in the logs - but the container still has a 10.0 address [14:12] dimitern: yep, feels better to me so too, thx [14:13] voidspace, with the address-allocation feature flag set? [14:13] I thought so... [14:14] godammit [14:14] must be a different shell window [14:14] *sigh* [14:17] cherylj: ping [14:19] wwitzel3: I heard that you've got experience using virtual MAAS? [14:19] cherylj: yep [14:19] wwitzel3: is there documentation somewhere on how to set that up? What I've found seems to be out of date [14:20] cherylj: did you sacrafice a chicken? first step ;) [14:20] cherylj: yeah, one sec, I used the videos that Kirkland made, and they worked well for me [14:20] wwitzel3: no, no chicken. I've got some pigeons around here. Will that work? === natefinch-afk is now known as natefinch [14:21] cherylj: sorry, it was beisner who made them [14:21] cherylj: https://www.youtube.com/playlist?list=PLvn2jxYHUxFlxNmc1dAbw524aoPmHxNpC [14:22] wwitzel3: yay, thank you! [14:22] cherylj: I've refered to them a few times, I just follow his steps and it has always worked [14:22] cherylj: gl [14:22] wwitzel3: thank you :) [14:25] cherylj: just remember, wwitzel3 said it'll be really easy with absolutely no problems. [14:25] natefinch: so long as I remember the chicken [14:25] that's the key [14:26] cherylj: that must have been what I forgot when I was trying to do it at the sprint in Germany. [14:26] never did get it working [14:29] frobware, voidspace, I have a patched version of the gui which works and deploys the slightly modified bundle and respects spaces constraints! [14:30] (writing down all the steps and will send them later) [14:38] dimitern: awesome [14:45] dimitern: great, sounds cool [14:51] dimitern: yay, it worked this time [14:51] dimitern: it's done properly (subnetIds also honours as well as instId) - just needs some tests [14:55] voidspace, you're the man! :) great [14:57] How does one go about getting something backported to 1.24.x? i.e., this fixes juju with osx 10.11, which comes out today: https://github.com/juju/juju/pull/2969 [14:59] sinzui: heya, El Cap just went gold today, IMO we should probably send a mail to the list telling people they should be fine with 1.25.x [14:59] any issues you think I should bring up? [15:00] jcastro: 1.24.6 in homebrew is fine. I delievered the patch to them personally [15:00] I saw that, that's why I wanted to mention it [15:00] jcastro: 1.25 is a beta [15:01] sinzui: I was meaning more like "this is the last time you'll have to care about this, future juju versions won't break on your beta OS." [15:01] jcastro: I WONT say that unitl it is true [15:01] heh [15:01] ok, I can not say that then. [15:01] el capitan is hardoded in 1.25. I read the coew [15:11] Bug #1501381 opened: panic: cannot pass empty version to VersionSeries()

[15:17] mgz, ^^^ is this bug in master? or all branches [15:18] alexisb: master and feature branches off master [15:18] mgz, ok thanks [15:22] alexisb: clarified the bug [15:22] voidspace, dooferlad, TheMue, frobware, you should've all received demo prep instructions [15:23] dimitern: +1, great, thanks [15:23] alexisb: I'm not clear if it will only happen on maas, or if it's just our testing on maas that happens to hit this [15:27] dimitern, received, queued (and not quite read). :) [15:31] TheMue, frobware, cheers :) [15:31] * dimitern is outta here ;) [15:31] dimitern, thanks; great to see the demo coming along :) [15:32] frobware, yeah - I'm happy we won't be the only team not showing interesting stuff :D [15:35] katco: you had mentioned enabling worked for the lease feature tests.... where is that code? I can't find it [15:35] s/worked/workers/ [15:36] natefinch: let me tal [15:37] natefinch: err... looks like they were deleted? [15:37] natefinch: here: https://github.com/juju/juju/blob/1.22/featuretests/leadership_test.go [15:37] katco: lol, well, that explains why I couldn't find them :) [15:38] natefinch: don't forget to submit your sick leave [15:38] katco: oh yeah, I'll do that right now [15:50] Bug #1501398 opened: stateSuite setup fails on windows with WSARecv timeout

[15:58] voidspace, you still about? Regarding the multi-nic question from above: am I wrong in thinking that spaces should allow for: juju bootstrap --constraints mem,cpu,etc,spaces=my-network-with-nic-192.168.1.123 [16:01] hm, it's not possible to be in more than one hangout at once [16:02] mgz, ping [16:02] alexisb: omw [16:03] mgz, it's odd though - you would think computers should be good at multitasking. :) [16:04] apparently not :) [16:30] o/ hi mgz - fyi, i pulled thumper's binaries, re-ran loop, hit that bootstrap fail. updated @ bug 1335885 [16:30] Bug #1335885: destroy-environment reports WARNING cannot delete security group

[16:30]

[16:31] beisner, thanks for the update [16:31] beisner: thanks. [16:32] alexisb, mgz - yw. thx for the focus on this. [16:37] frobware: still around [16:37] frobware: that question would be better directed to dimiter I think, but I don't see why that shouldn't work [16:43] frobware: hmmm... although thinking about it [16:43] frobware: our implementation of spaces is at the "juju model" level - which requires the state server to be in place [16:44]