[00:08] <wallyworld> beisner: i haven't looked sorry, thumper is on it afaik
[00:20] <thumper> wwitzel3: yes
[00:21] <thumper> wallyworld, beisner: I'll look this afternoon, just need to talk with menn0
[00:21] <thumper> and write an email or two
[00:21]  * thumper looks at bug now
[00:21] <thumper> to start the thinking process
[00:21] <beisner> thumper, awesome, much thanks.
[00:22] <beisner> thumper, i've got a repro underway, which basically bootstraps and destroys in a loop, 1.24.6 & openstack provider.  --debug enabled, will capture and add to bug if i can catch it that way.
[00:22] <thumper> beisner: awesome
[00:23] <thumper> beisner: does it happen every time?
[00:23] <thumper> or just some times?
[00:26] <beisner> thumper, i've seen it > 5 times today in test automation.   that test ran 38 cycles.
[00:26] <thumper> hmm... interesting
[00:27] <thumper> beisner: and every time it is deleted at the end even though the warning says it wasn't
[00:27] <thumper> ?
[00:27] <beisner> thumper, pure conjecture:   it seems that the opportunity for that to race has always been present, and that something got better/faster.
[00:27] <beisner> exposing it more frequently
[00:27]  * thumper nods
[00:27] <thumper> sleeps for the win?
[00:27] <beisner> but yes, 100% of destroys result in the "couldn't delete that thing" msg
[00:28] <thumper> oh...
[00:28] <thumper> so the warning is always there, what was it that happened > 5 times?
[00:29] <beisner> i don't control timing of amulet.  say it gets 10 jobs to run.  it bootstraps, deploys, execs tests, destroys, bootstraps, deploys, rinse and repeat.
[00:29] <beisner> oh to clarify:  failed to bootstrap 5.   all 38 complain that they couldn't delete sec groups (but always have)
[00:30] <thumper> the failure to bootstrap is what?
[00:30] <thumper> is this the message you are trying to capture?
[00:30] <beisner> no it's what i already logged in the bug
[00:30] <beisner> what i'm trying to capture is the --debug output
[00:32] <mgz> this is something of a well-known issue with the destroy code
[00:32] <perrito666> zomg how can the local provider be so easy to break :(
[00:32] <thumper> mgz: well known by whom?
[00:32] <thumper> not me
[00:32] <beisner> mgz - yep.  the failing to create sec group is new.
[00:32] <mgz> we have a bunch of mitigation in the form of post-destroy cleanup
[00:32] <mgz> thumper: the bug is from 2014-06
[00:33] <beisner> mgz, until 1.24.6 it was just an annoyance.  now, it fails to delete Foo, then tries to create Foo, and fails to bootstrap, saying it couldn't create Foo.
[00:33] <thumper> mgz: well that isn't particularly useful to us now...
[00:33] <mgz> and anyone who does destroy-env immediately followed by bootstrap the same env on openstack will have seen it
[00:33] <thumper> now we just look incompetent
[00:34] <mgz> thumper: so, the only real way to fix it is make destroy-environment take much longer
[00:34] <thumper> sure
[00:34] <thumper> which is the right thing surely
[00:34] <thumper> make sure the freaking thing is dead
[00:34] <mgz> cloud providers will frequently refuse to destroy resources that are associated with other resources in the process of being destroyed
[00:35]  * thumper grumbles 
[00:35] <mgz> so, kill a machine, you have to wait for some amount of time before it will let you delete the groups that were attached to it
[00:35] <mgz> likewise block devices and so on
[00:36] <mgz> one thing that is possible with openstack, and I think the new ec2 vpc sec groups, is remove the groups from the machines before killing the machines
[00:36] <mgz> that way you can reliably wipe them straight away
[00:36] <mgz> is a bunch more api calls though
[00:37] <mgz> the other option is something more like what CI does to get juju reliable, which is before bootstrap, basically destroy-environment --force
[00:37] <mgz> that's less elegant
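mgz's first option (detach the groups before killing the machines, at the cost of extra API calls) amounts to reordering teardown. A sketch against a hypothetical provider surface; none of these types or methods are juju's real ones:

```go
package main

import "fmt"

// Hypothetical provider surface - the real OpenStack/EC2 calls differ.
type cloud interface {
	DetachGroups(machine string) error
	Terminate(machine string) error
	DeleteGroup(group string) error
}

// teardown removes the machine->group associations up front (one extra
// call per machine), so the groups can be wiped immediately after the
// machines die instead of waiting for the cloud to release them.
func teardown(c cloud, machines, groups []string) error {
	for _, m := range machines {
		if err := c.DetachGroups(m); err != nil {
			return fmt.Errorf("detach %s: %v", m, err)
		}
	}
	for _, m := range machines {
		if err := c.Terminate(m); err != nil {
			return fmt.Errorf("terminate %s: %v", m, err)
		}
	}
	for _, g := range groups {
		if err := c.DeleteGroup(g); err != nil {
			return fmt.Errorf("delete group %s: %v", g, err)
		}
	}
	return nil
}

// recorder is a fake cloud that just logs the call order.
type recorder struct{ calls []string }

func (r *recorder) DetachGroups(m string) error { r.calls = append(r.calls, "detach "+m); return nil }
func (r *recorder) Terminate(m string) error    { r.calls = append(r.calls, "kill "+m); return nil }
func (r *recorder) DeleteGroup(g string) error  { r.calls = append(r.calls, "delete "+g); return nil }
```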
[00:41] <mgz> thumper: I guess we really want a different bug for beisner's issue, which is certainly a newer thing
[00:44] <beisner> oh neat.  my bootstrap/destroy loop yielded something different:  http://paste.ubuntu.com/12620988/
[00:44] <mgz> beisner: I know this is going to be annoying as you need the destroy cleanup race error first, but any idea if this started in a particular 1.24 minor version?
[00:45] <beisner> mgz i believe 1.24.5 was solid
[00:45] <beisner> would have to do some log digging to prove/disprove that observation though
[00:46] <mgz> beisner: bug 1467331 bug 1500613
[00:46] <mup> Bug #1467331: configstore lock should use flock where possible <charmers> <ci> <reliability> <repeatability> <juju-core:Triaged> <https://launchpad.net/bugs/1467331>
[00:46] <mup> Bug #1500613: configstore should break fslock if time > few seconds <tech-debt> <juju-core:Triaged> <https://launchpad.net/bugs/1500613>
[00:47] <beisner> ok so that repro is simple.   loop a deploy/bootstrap.  took 8 iterations to hit that.
[00:47] <beisner> errr em.  rather, a bootstrap/destroy loop
[00:48] <mgz> beisner: http://reports.vapour.ws/releases/rule/34 for us hitting that in ci
[00:49]  * beisner wanders off, to return in a bit
[00:49] <mgz> bug 1454323 is marked fixed but that was just to make the error less terrible and the followups are what I linked above
[00:49] <mup> Bug #1454323: Mysterious env.lock held message <bootstrap> <ci> <destroy-environment> <repeatability> <ui> <juju-core:Fix Released by thumper> <juju-core 1.24:Fix Released by thumper> <https://launchpad.net/bugs/1454323>
[00:51] <mgz> thumper: so, I don't think the juju code around adopting existing security groups with the same name has actually changed,
[00:52] <mgz> see ensureGroup in provider/openstack/provider.go
[00:55] <mgz> however, I think we hit the bad case of trying to create a group which is in the process of being deleted much more often with our storage code, and changes in newer openstacks
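The adopt-or-retry behaviour mgz describes around ensureGroup (create the group, adopt an existing one with the same name, cope with one stuck in pending delete) might look roughly like this. The interface, error values, and group-id format are invented for illustration, not juju's real code:

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Sentinel errors standing in for whatever the cloud SDK returns.
var (
	errExists        = errors.New("group already exists")
	errPendingDelete = errors.New("group is pending delete")
)

type groupAPI interface {
	Create(name string) error
	Get(name string) (string, error) // returns the group id
}

// ensureGroup creates the group, adopts one that already exists under
// the same name, and retries for a bounded time when the cloud reports
// the previous environment's group is still being deleted.
func ensureGroup(api groupAPI, name string, attempts int, delay time.Duration) (string, error) {
	for i := 0; i < attempts; i++ {
		err := api.Create(name)
		switch {
		case err == nil, errors.Is(err, errExists):
			return api.Get(name) // created, or adopting the survivor
		case errors.Is(err, errPendingDelete):
			time.Sleep(delay) // old group not gone yet; retry
		default:
			return "", err
		}
	}
	return "", fmt.Errorf("group %q stuck pending delete", name)
}

// fakeAPI simulates a group that is pending delete for a few calls.
type fakeAPI struct{ failures int }

func (f *fakeAPI) Create(name string) error {
	if f.failures > 0 {
		f.failures--
		return errPendingDelete
	}
	return nil
}

func (f *fakeAPI) Get(name string) (string, error) { return "sg-" + name, nil }
```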
[01:34] <beisner> mgz, thumper - added accidental findings to bug 1500613.   after hitting that lock issue, my enviro is borked.  how do i unlock? ;-)
[01:34] <mup> Bug #1500613: configstore should break fslock if time > few seconds <amulet> <openstack-provider> <tech-debt> <uosci> <juju-core:Triaged> <https://launchpad.net/bugs/1500613>
[01:35] <mgz> beisner: just delete the lock
[01:35] <beisner> am i supposed to know where it is?
[01:36] <beisner> oh look there.  it tells me.  ha
[01:36] <mgz> :P
[01:37] <beisner> so, not sure i can reliably catch the secgroup thing with this hopping out front so readily.
[01:38] <mgz> beisner: you can just rm -rf the lock location in between each run
[01:38] <beisner> not if i'm using another runner, such as bundletester or amulet
[01:38] <beisner> oh you mean in the repro, yes i can
[01:39] <mgz> yup
[01:58] <thumper> beisner: I'm kinda surprised at how often this lock file problem is occurring
[01:59] <thumper> it should just work and delete the file
[01:59] <thumper> really weird that it isn't
[01:59] <thumper> time to go make a coffee and look at this bug
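The fix proposed in bug 1500613 (break the configstore fslock once it has been held longer than a few seconds) reduces to an age check on the lock. A minimal sketch; the path layout and threshold are illustrative, not juju's real configstore:

```go
package main

import (
	"os"
	"time"
)

// breakStaleLock removes a lock directory when it is older than maxAge,
// on the theory that a lock held for more than a few seconds was leaked
// by a dead process rather than still being in use.
func breakStaleLock(lockDir string, maxAge time.Duration) (bool, error) {
	info, err := os.Stat(lockDir)
	if os.IsNotExist(err) {
		return false, nil // nothing to break
	}
	if err != nil {
		return false, err
	}
	if time.Since(info.ModTime()) < maxAge {
		return false, nil // plausibly still held, leave it alone
	}
	return true, os.RemoveAll(lockDir)
}

// demo makes a throwaway "lock", backdates it a minute, and breaks it.
func demo() (bool, error) {
	dir, err := os.MkdirTemp("", "fslock")
	if err != nil {
		return false, err
	}
	old := time.Now().Add(-time.Minute)
	if err := os.Chtimes(dir, old, old); err != nil {
		return false, err
	}
	return breakStaleLock(dir, 5*time.Second)
}
```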
[02:03] <beisner> mgz, thumper, thanks.  i've got the repro looping for the secgroup race.  must sleep now.
[02:03] <thumper> beisner: ack, thanks
[02:28] <beisner> mgz, thumper - successfully repro'd bug 1335885 with the same loop, added new comment.  now i'm really closing my screen.  thx again.
[02:28] <mup> Bug #1335885: destroy-environment reports WARNING cannot delete security group <amulet> <cloud-installer> <destroy-environment> <landscape> <openstack-provider> <security> <uosci> <juju-core:Triaged> <juju-core 1.25:Triaged> <https://launchpad.net/bugs/1335885>
[02:28] <thumper> beisner: thanks again
[02:29] <beisner> thumper, yw, happy to help chase it.
[02:29] <beisner> \o
[02:29] <thumper> o/
[02:39] <mup> Bug #1303787 changed: hook failures - nil pointer dereference <hooks> <local-provider> <ppc64el> <juju-core:Fix Released by dave-cheney> <https://launchpad.net/bugs/1303787>
[02:56]  * thumper afk for a family thing
[02:56] <thumper> will be back to finish bug
[03:10] <wwitzel3> thumper: thanks for that email
[04:04] <wallyworld> axw: small review please https://github.com/juju/charm/pull/159
[04:21] <axw> wallyworld: looking
[04:37] <wallyworld> axw: thanks for review, any idea for name? i don't like it either
[04:38] <wallyworld> SeriesForCharm maybe
[04:38] <axw> wallyworld: *shrug*  SelectSeries? not much more informative
[04:38] <axw> wallyworld: sounds fine
[04:38] <wallyworld> ok, ta
[04:39] <axw> wallyworld: BTW, my point (regarding "any", "default", etc.) is that this function is not directly attached to the charm metadata. so the user has to ensure the order of supported series is maintained
[04:40] <axw> wallyworld: which is why I'm saying not to use "any" when it's really "the first item"
[04:40] <axw> (if that's true)
[04:40] <wallyworld> ok, i'll reword
[04:40] <wallyworld> it is the first
[05:39] <mup> Bug #1501173 opened: apiserver/common/storagecommon: StorageAttachmentInfo returns without error even if block device doesn't exist <juju-core:Triaged by axwalk> <juju-core 1.25:Triaged by axwalk> <https://launchpad.net/bugs/1501173>
[05:51] <mup> Bug #1501173 changed: apiserver/common/storagecommon: StorageAttachmentInfo returns without error even if block device doesn't exist <juju-core:Triaged by axwalk> <juju-core 1.25:Triaged by axwalk> <https://launchpad.net/bugs/1501173>
[05:54] <mup> Bug #1501173 opened: apiserver/common/storagecommon: StorageAttachmentInfo returns without error even if block device doesn't exist <juju-core:Triaged by axwalk> <juju-core 1.25:Triaged by axwalk> <https://launchpad.net/bugs/1501173>
[05:57] <thumper> wallyworld, axw, anastasiamac: http://reviews.vapour.ws/r/2789/
[05:58] <thumper> I'd like to build a version to make available for these folks to try with
[05:58] <thumper> to see if it does actually help
[06:01] <axw> thumper: reviewed
[06:01] <thumper> ta
[06:02] <thumper> axw: yes, I'm wanting to get it live tested first
[06:02] <thumper> though observing things, it appears that what happens is this:
[06:02] <thumper> try to terminate all the machines
[06:02] <thumper> emits warning saying security group in use
[06:03] <thumper> finishes destroy, deletion of group works
[06:03] <thumper> so the end result is that the user is warned that it couldn't be deleted, but it has gone
[06:03] <thumper> alternatively it warns again, and doesn't delete it, next bootstrap fails
[06:03] <thumper> but yes, I want to test it prior to landing
[06:04] <thumper> as I'm taking a wild stab at the numbers
[06:04] <axw> thumper: sure, sounds fine
[06:04]  * thumper writes that on review board too :)
[06:15] <thumper> ok, I'm done
[06:15] <thumper> laters folks
[07:35] <urulama> wallyworld: http://www.theguardian.com/travel/2013/may/25/top-10-live-music-venues-seattle :)
[07:36] <wallyworld> :-)
[07:58] <mup> Bug #1501203 opened: apiserver/storage/storagecommon: WatchStorageAttachment should filter block devices <tech-debt> <juju-core:Triaged> <https://launchpad.net/bugs/1501203>
[08:57] <voidspace> dimitern: ping
[08:57] <dimitern> voidspace, pong
[08:59] <voidspace> dimitern: in environments.yaml I have an environment called "amazon-eu" which is type "ec2" and region "eu-central-1"
[08:59] <voidspace> dimitern: yet when I bootstrap that environment I get a bootstrap machine in us-east-1
[08:59] <voidspace> hmmm... it might be a yaml indentation issue
[08:59] <voidspace> dammit
[08:59] <dimitern> voidspace, check if you have EC2_REGION set in the env
[09:00] <voidspace> dimitern: will do, thanks
[09:01] <dimitern> voidspace, or EC2_URL
[09:02] <TheMue> hmm, HO dislikes me
[09:02] <voidspace> dimitern: that's set to: https://ec2-lcy01.canonistack.canonical.com:443/services/Cloud
[09:02] <voidspace> :-)
[09:03] <frobware> jam, fwereade: joining standup today?
[09:03] <jam> omw
[09:27] <voidspace> dimitern: dooferlad: gah, Subnets bug is on 1.25 as well as master
[09:27] <voidspace> better retarget the work I'm doing and fix it in both places
[09:28] <dimitern> voidspace, the addressable containers instId thing?
[09:32] <voidspace> dimitern: yeah
[09:32] <voidspace> I assumed it was just master, should have checked...
[09:33] <voidspace> hah
[09:33] <voidspace> the bug even says both
[09:33] <voidspace> so it's a reading comprehension failure too... :-)
[09:34] <dimitern> :)
[09:35] <axw> fwereade: can you please have a glance at https://github.com/juju/juju/compare/master...axw:lp1500769-gce-default-block-source, and let me know if you're ok with this before I go much further?
[09:36] <voidspace> axw: o/
[09:36] <voidspace> axw: morning :-)
[09:36] <axw> fwereade: basically, I'm sick of using Validate to upgrade config
[09:36] <axw> voidspace: hiya, how's it?
[09:36] <voidspace> axw: all is well, how's you?
[09:36] <axw> voidspace: not too shabby. furious bug fixing before demo time at the sprint :)
[09:38] <voidspace> axw: heh, right
[09:38] <voidspace> axw: pretty much what our team is on as well...
[09:42] <fwereade> axw, ack
[09:46] <fwereade> axw, looks eminently sane to me
[09:47] <fwereade> axw, thanks
[10:06] <natefinch> fwereade: got a minute?
[10:18] <axw> fwereade: thanks
[10:21] <ashipika> juju bootstrap for amazon reports the following: https://ec2.us-east-1.amazonaws.com?Action=DescribeInstances&Filter.1.Name=instance-state-name&Filter.1.Value.1=pending&Filter.1.Value.2=running&Filter.2.Name=instance.group-id&Filter.2.Value.1=sg-05ae1a61&Timestamp=2015-09-30T10%3A18%3A37Z&Version=2014-10-01
[10:21] <ashipika> any ideas anyone? ^
[10:21] <natefinch> fwereade: gonna be out for a bit, but looking for tips on how to run workers during jujuconnsuite tests, since unit assignment is done in a worker now, a ton of tests fail due to units not getting assigned.
[10:22] <axw> ashipika: is there an error missing from that line?
[10:22] <ashipika> sorry.. copy&paste mistake… here's the error message: ERROR failed to bootstrap environment: cannot start bootstrap instance: Get https://ec2.us-east-1.amazonaws.com?Action=DescribeInstances&Filter.1.Name=instance-state-name&Filter.1.Value.1=pending&Filter.1.Value.2=running&Filter.2.Name=instance.group-id&Filter.2.Value.1=sg-05ae1a61&Timestamp=2015-09-30T10%3A18%3A37Z&Version=2014-10-01: dial tcp: lookup ec2.us-east-1.amazonaws.com on 1
[10:22] <ashipika> axw ^
[10:22] <axw> hrm
[10:23] <ashipika> axw: latest master.. go 1.5.1
[10:24] <axw> ashipika: looks like it's due to tagging
[10:24] <axw> ashipika: that command should be retried though ...
[10:24] <ashipika> axw: tagging?
[10:24] <axw> ashipika: we tag the instance and its root disk after it starts
[10:24] <axw> (can't do it while starting, which seems a bit brain dead)
[10:25] <ashipika> axw: https://pastebin.canonical.com/140850/
[10:26] <ashipika> axw: with —debug: https://pastebin.canonical.com/140853/
[10:26] <axw> ashipika: erm actually that just looks like a host resolution error. can't tell more than that
[10:27] <ashipika> axw: rebooting.. who knows.. might help
[10:32] <ashipika> axw: did not help
[10:34] <axw> ashipika: don't really know. it's attempting to resolve through DNS on localhost, is that intentional?
[10:34] <axw> "on 127.0.1.1:53"
[10:34] <ashipika> axw: i know… i saw that.. but cannot explain it
[10:36] <axw> ashipika: don't know, sorry
[10:53] <tasdomas> ashipika, ping ec2.us-east-1.amazonaws.com
[10:57] <ashipika> tasdomas: yes, fails.. switched to eu-west-1 and it seems to be working
[10:58] <tasdomas> ashipika, but what does it resolve to?
[10:58] <ashipika> tasdomas: something must have messed up my resolv.conf, or sth
[11:06] <rogpeppe> this PR adds macaroon authorization to the charms endpoint, and continues with some cleanup of the apiserver package too. reviews much appreciated, thanks! http://reviews.vapour.ws/r/2794/
[12:15] <rogpeppe> wallyworld: i've reviewed https://github.com/juju/charmrepo/pull/32
[12:15] <wallyworld> ty
[12:20] <wallyworld> rogpeppe: i'm tired now and want to keep hacking on the juju side of things for a bit, but will come back to the charmrepo stuff tomorrow, thanks for looking
[12:20] <rogpeppe> wallyworld: ok, cool
[12:21] <wallyworld> rogpeppe: one thing - name in meta doesn't have to be same as directory
[12:21] <wallyworld> so i'm not sure about your comment
[12:21] <rogpeppe> wallyworld: yeah, but it's very confusing if it's not
[12:21] <wallyworld> hmm, ok, i have test charms i have written where it doesn't match, so i guess i'm used to it
[12:33] <frobware> dimitern, did you mention this morning that you got the spaces demo to work without having to have a public ip address? (Or perhaps I misheard you.)
[12:34] <dimitern> frobware, yes, eventually - initially the machines in the subnets without auto-public-ip set were "pending", because they didn't manage to download some packages (no outbound access, just dns works)
[12:35] <frobware> dimitern, aha. that's what I see.
[12:35] <dimitern> frobware, so I presume after apt-get retried 30 or so times it gave up and cloud-init finished OK
[12:35] <frobware> dimitern, so did you flick the switch for auto-public-ip on the subnets?
[12:36] <frobware> dimitern, I'm not seeing a timeout though. machine state still in allocating.
[12:36] <dimitern> frobware, no, but even if I did the flag is only honored when starting instances - not after they're running
[12:36] <dimitern> frobware, is the instance running in the EC2 UI?
[12:37] <frobware> dimitern, I was trying on my local account. HO?
[12:37] <frobware> dimitern, yes instance is running
[12:38] <dimitern> frobware, it might take 30m or so for apt-get retry script to give up I guess - I waited at least 30m with no change, but in the morning all machines showed up as started
[12:39] <dimitern> frobware, and it "worked" I guess just because I was deploying the ubuntu charm (which was pre-fetched by the apiserver and then the isolated machine got it from there - as usual), which doesn't need anything from the internet - wordpress I suspect won't work
[12:39] <frobware> dimitern, so in the real world how is this supposed to / going to work?  service in the "private" subnet will need access on provisioning, installing packages, et al
[12:40] <frobware> dimitern, so I was deploying the ubuntu charm, like we were doing yesterday.
[12:41] <dimitern> frobware, in the real world we can do things like setting up squid-deb-proxy for apt + another proxy + nat + forwarding etc. on machine 0 (or another "public" machine)
[12:42] <dimitern> frobware, the ubuntu charm is useful only for really simple tests - for more "real-world-like" tests, we need charms like in that bundle - scalable, with relations, config, etc.
[12:42]  * dimitern needs to eat something - bbiab
[12:44]  * frobware also needs to eat something too.
[13:33] <frobware> dimitern, when bootstrapping a node with two NICs is it possible to configure which NIC gets selected?
[13:58] <voidspace> frobware: no
[13:58] <voidspace> frobware: that's why we need spaces
[13:58] <frobware> voidspace, :)
[13:58] <voidspace> seriously :-)
[13:58] <frobware> voidspace, ok ok ook okkkkk
[13:59] <frobware> voidspace, I'm sold!
[13:59] <voidspace> frobware: hah :-)
[13:59] <frobware> voidspace, I manually provisioned a machine with two NICs
[13:59] <voidspace> frobware: right
[13:59] <frobware> voidspace, sent 'bootstrap-host: 10.17.17.117' in my environments.yaml
[14:00] <frobware> voidspace, then bootstrapped. Which indicates that the dns-name=10.17.17.117
[14:00] <voidspace> frobware: so it's at least using the address you gave it
[14:00] <frobware> voidspace, however, both mongod and jujud are listening on all interfaces
[14:00] <voidspace> right
[14:00] <frobware> voidspace, http://pastebin.ubuntu.com/12624701/
[14:01] <frobware> voidspace, whereas I was trying to coerce it to listen on the single NIC only.
[14:02] <voidspace> frobware: yep
[14:02] <frobware> voidspace, OK answers my questions. thanks
[14:02] <voidspace> frobware: not possible at the moment with a vanilla install
[14:02] <dimitern> frobware, you mean in maas?
[14:02] <frobware> dimitern, yes
[14:02] <voidspace> AFAIK anyway...
[14:02] <frobware> dimitern, well, no just maas
[14:05] <dimitern> frobware, yeah - voidspace is correct actually :)
[14:05] <voidspace> it does happen sometimes
[14:05] <dimitern> frobware, one of the many goals of the model is giving you this sort of flexibility, while hiding the gruesome details :)
[14:05] <voidspace> dimitern: frobware: I'm just bootstrapping an EC2 environment with my fix in place (for the ec2 Subnets issue) to see if it actually works...
[14:05] <voidspace> it should do...
[14:06] <TheMue> dimitern: btw, just recognized it. we still document the networks constraint as we still support this constraint. but shall I already remove it from the constraints documentation?
[14:09] <cherylj> wwitzel3: ping?
[14:09] <wwitzel3> cherylj: heya, in standup
[14:09] <cherylj> wwitzel3: kk
[14:12] <dimitern> TheMue, I think so, it should be dropped from the docs (as we're on that stage) and later from the code as well (I'm not too worried about this now)
[14:12] <voidspace> well, the error is no longer in the logs - but the container still has a 10.0 address
[14:12] <TheMue> dimitern: yep, feels better to me so too, thx
[14:13] <dimitern> voidspace, with the address-allocation feature flag set?
[14:13] <voidspace> I thought so...
[14:14] <voidspace> godammit
[14:14] <voidspace> must be a different shell window
[14:14] <voidspace> *sigh*
[14:17] <wwitzel3> cherylj: ping
[14:19] <cherylj> wwitzel3: I heard that you've got experience using virtual MAAS?
[14:19] <wwitzel3> cherylj: yep
[14:19] <cherylj> wwitzel3: is there documentation somewhere on how to set that up?  What I've found seems to be out of date
[14:20] <wwitzel3> cherylj: did you sacrifice a chicken? first step ;)
[14:20] <wwitzel3> cherylj: yeah, one sec, I used the videos that Kirkland made, and they worked well for me
[14:20] <cherylj> wwitzel3: no, no chicken.  I've got some pigeons around here.  Will that work?
[14:21] <wwitzel3> cherylj: sorry, it was beisner who made them
[14:21] <wwitzel3> cherylj: https://www.youtube.com/playlist?list=PLvn2jxYHUxFlxNmc1dAbw524aoPmHxNpC
[14:22] <cherylj> wwitzel3: yay, thank you!
[14:22] <wwitzel3> cherylj: I've referred to them a few times, I just follow his steps and it has always worked
[14:22] <wwitzel3> cherylj: gl
[14:22] <cherylj> wwitzel3: thank you :)
[14:25] <natefinch> cherylj: just remember, wwitzel3 said it'll be really easy with absolutely no problems.
[14:25] <cherylj> natefinch: so long as I remember the chicken
[14:25] <wwitzel3> that's the key
[14:26] <natefinch> cherylj: that must have been what I forgot when I was trying to do it at the sprint in Germany.
[14:26] <natefinch> never did get it working
[14:29] <dimitern> frobware, voidspace, I have a patched version of the gui which works and deploys the slightly modified bundle and respects spaces constraints!
[14:30] <dimitern> (writing down all the steps and will send them later)
[14:38] <voidspace> dimitern: awesome
[14:45] <TheMue> dimitern: great, sounds cool
[14:51] <voidspace> dimitern: yay, it worked this time
[14:51] <voidspace> dimitern: it's done properly (subnetIds honoured as well as instId) - just needs some tests
[14:55] <dimitern> voidspace, you're the man! :) great
[14:57] <aisrael> How does one go about getting something backported to 1.24.x? i.e., this fixes juju with osx 10.11, which comes out today: https://github.com/juju/juju/pull/2969
[14:59] <jcastro> sinzui: heya, El Cap just went gold today, IMO we should probably send a mail to the list telling people they should be fine with 1.25.x
[14:59] <jcastro> any issues you think I should bring up?
[15:00] <sinzui> jcastro: 1.24.6 in homebrew is fine. I delivered the patch to them personally
[15:00] <jcastro> I saw that, that's why I wanted to mention it
[15:00] <sinzui> jcastro: 1.25 is a beta
[15:01] <jcastro> sinzui: I was meaning more like "this is the last time you'll have to care about this, future juju versions won't break on your beta OS."
[15:01] <sinzui> jcastro: I WON'T say that until it is true
[15:01] <jcastro> heh
[15:01] <jcastro> ok, I can not say that then.
[15:01] <sinzui> el capitan is hardcoded in 1.25. I read the code
[15:11] <mup> Bug #1501381 opened: panic: cannot pass empty version to VersionSeries() <blocker> <ci> <intermittent-failure> <juju-core:Triaged> <https://launchpad.net/bugs/1501381>
[15:17] <alexisb> mgz, ^^^ is this bug in master? or all branches
[15:18] <mgz> alexisb: master and feature branches off master
[15:18] <alexisb> mgz, ok thanks
[15:22] <mgz> alexisb: clarified the bug
[15:22] <dimitern> voidspace, dooferlad, TheMue, frobware, you should've all received demo prep instructions
[15:23] <TheMue> dimitern: +1, great, thanks
[15:23] <mgz> alexisb: I'm not clear if it will only happen on maas, or if it's just our testing on maas that happens to hit this
[15:27] <frobware> dimitern, received, queued (and not quite read). :)
[15:31] <dimitern> TheMue, frobware, cheers :)
[15:31]  * dimitern is outta here ;)
[15:31] <frobware> dimitern, thanks; great to see the demo coming along :)
[15:32] <dimitern> frobware, yeah - I'm happy we won't be the only team not showing interesting stuff :D
[15:35] <natefinch> katco: you had mentioned enabling worked for the lease feature tests.... where is that code?  I can't find it
[15:35] <natefinch> s/worked/workers/
[15:36] <katco> natefinch: let me tal
[15:37] <katco> natefinch: err... looks like they were deleted?
[15:37] <katco> natefinch: here: https://github.com/juju/juju/blob/1.22/featuretests/leadership_test.go
[15:37] <natefinch> katco: lol, well, that explains why I couldn't find them :)
[15:38] <katco> natefinch: don't forget to submit your sick leave
[15:38] <natefinch> katco: oh yeah, I'll do that right now
[15:50] <mup> Bug #1501398 opened: stateSuite setup fails on windows with WSARecv timeout <blocker> <ci> <test-failure> <windows> <juju-core:Triaged> <https://launchpad.net/bugs/1501398>
[15:58] <frobware> voidspace, you still about? Regarding the multi-nic question from above: am I wrong in thinking that spaces should allow for: juju bootstrap --constraints mem,cpu,etc,spaces=my-network-with-nic-192.168.1.123
[16:01] <mgz> hm, it's not possible to be in more than one hangout at once
[16:02] <alexisb> mgz, ping
[16:02] <mgz> alexisb: omw
[16:03] <frobware> mgz, it's odd though - you would think computers should be good at multitasking. :)
[16:04] <mgz> apparently not :)
[16:30] <beisner> o/ hi mgz -  fyi, i pulled thumper's binaries, re-ran loop, hit that bootstrap fail.  updated @ bug 1335885
[16:30] <mup> Bug #1335885: destroy-environment reports WARNING cannot delete security group <amulet> <cloud-installer> <destroy-environment> <landscape> <openstack-provider> <uosci> <juju-core:Triaged> <juju-core 1.24:In Progress by thumper> <juju-core 1.25:Triaged> <https://launchpad.net/bugs/1335885>
[16:31] <alexisb> beisner, thanks for the update
[16:31] <mgz> beisner: thanks.
[16:32] <beisner> alexisb, mgz - yw.  thx for the focus on this.
[16:37] <voidspace> frobware: still around
[16:37] <voidspace> frobware: that question would be better directed to dimiter I think, but I don't see why that shouldn't work
[16:43] <voidspace> frobware: hmmm... although thinking about it
[16:43] <voidspace> frobware: our implementation of spaces is at the "juju model" level - which requires the state server to be in place
[16:44] <voidspace> frobware: so making it work at bootstrap time will require making the client "spaces aware" (i.e. able to discover spaces and resolve constraints)
[16:44] <voidspace> frobware: so it isn't going to work initially, would require specific work
[17:35] <mup> Bug #1500843 changed: Windows ftb due to unused import is diskmanager <blocker> <ci> <regression> <windows> <juju-core:Fix Released by gz> <https://launchpad.net/bugs/1500843>
[17:35] <mup> Bug #1501432 opened: BootstrapSuite tests fail on non-ubuntu platforms with no matching tools <blocker> <centos> <ci> <test-failure> <windows> <juju-core:Triaged> <https://launchpad.net/bugs/1501432>
[18:01] <cherylj> thanks for the quick review, cmars
[18:10] <cmars> cherylj, thanks for the bug fix
[18:33] <mgz> cherylj: where will that error propagate to exactly?
[18:33] <mgz> cherylj: I'm wondering if we're still not logging enough information to work out what the bad data actually is
[18:34] <mgz> cherylj: (code change looks sensible regardless)
[18:35] <cherylj> mgz: it will cause the image to be ignored when we update stored image metadata
[18:35] <cherylj> mgz: I was thinking I should update the logging to indicate the ID of the ignored image
[18:36] <mgz> cherylj: sounds good to me - can be a separate branch
[18:36] <cherylj> mgz: I'm going to include it in the branch that updates dependencies.tsv
[18:39] <mgz> cherylj: one thing that comes to mind from what you've found so far,
[18:39] <mgz> our maas has a windows image which will obviously not have an ubuntu series
[18:39] <mgz> how that would cause panics some of the times but not others though I have no idea, so may be unrelated
[18:40] <cherylj> mgz: it shouldn't.  This panic was because we were trying to determine the version from a series of "" (empty string)
[18:42] <cherylj> mgz: if there's some data in simple streams that just doesn't make sense, (like having nothing for the version), we should ignore it
[18:43] <cherylj> erm, my previous comment should have been that we were trying to determine the series from an empty version
[18:43] <cherylj> I had it backwards
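The guard cherylj describes (ignore a simplestreams entry with an empty version instead of panicking when deriving the series) can be sketched like this; the version table is a tiny stand-in for the real lookup:

```go
package main

import (
	"errors"
	"fmt"
)

// Stand-in for the real version->series table.
var seriesByVersion = map[string]string{
	"12.04": "precise",
	"14.04": "trusty",
}

var errEmptyVersion = errors.New("cannot determine series for empty version")

// seriesFromVersion returns an error instead of panicking on bad
// metadata, so callers can log and skip the entry rather than crash.
func seriesFromVersion(version string) (string, error) {
	if version == "" {
		return "", errEmptyVersion
	}
	s, ok := seriesByVersion[version]
	if !ok {
		return "", fmt.Errorf("unknown version %q", version)
	}
	return s, nil
}
```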
[18:50] <mup> Bug #918386 opened: config.yaml should have enum type  <charmers> <pyjuju:Triaged> <juju-core:New> <https://launchpad.net/bugs/918386>
[19:00] <mup> Bug #918386 changed: config.yaml should have enum type  <charmers> <pyjuju:Triaged> <juju-core:New> <https://launchpad.net/bugs/918386>
[19:02] <natefinch> arg.... I have a feeling the jujuconn tests are somehow mucking with the database in just the right way to break my worker
[19:03] <mup> Bug #918386 opened: config.yaml should have enum type  <charmers> <pyjuju:Triaged> <juju-core:New> <https://launchpad.net/bugs/918386>
[19:06] <mup> Bug #918386 changed: config.yaml should have enum type  <charmers> <pyjuju:Triaged> <juju-core:New> <https://launchpad.net/bugs/918386>
[19:09] <mup> Bug #918386 opened: config.yaml should have enum type  <charmers> <pyjuju:Triaged> <juju-core:New> <https://launchpad.net/bugs/918386>
[19:12] <natefinch> my country for SOME of our code to have unit tests... you know, so they don't break when totally f'ing unrelated code is changed.
[19:12] <marcoceppi> wow, mup, calm down, the bug isn't that important
[19:18] <natefinch> ahh, hmm I think I got it.  Interesting difference between a real environment and the test environment
[19:30] <mup> Bug #1501475 opened: Status presents unnecessary MAAS API info for machines <juju-core:New> <https://launchpad.net/bugs/1501475>
[19:42] <marcoceppi> why can't I bootstrap local as root? ERROR failed to bootstrap environment: bootstrapping a local environment must not be done as root
[19:45] <natefinch> marcoceppi: I forget, but it messes up permissions of certain things, and probably puts things in the wrong directories.  Why would you want to, anyway?
[19:45] <perrito666> natefinch: its not like local wont do that for you anyway
[19:45] <marcoceppi> natefinch: because I'm in an LXC container as root and I want to bootstrap as the root user
[19:46] <perrito666> marcoceppi: can you bootstrap local inside an lxc container?
[19:46] <marcoceppi> perrito666: well, I was going to find out (it's a LXD container, so should work)
[19:47] <perrito666> famous last words
[19:47] <marcoceppi> worst case scenario it doesn't work
[19:47] <marcoceppi> but stopping me because I'm root makes me sad
[19:47] <natefinch> marcoceppi: looks like it doesn't ;)
[19:47] <perrito666> marcoceppi: just adduser
[19:48] <marcoceppi> I get that
[19:48] <marcoceppi> but because of the way these mounts outside the system work I need to be root to access them anyways
[19:48] <perrito666> marcoceppi: sudo?
[19:49] <perrito666> but as a rule of thumb, any question matching with: "why .* local .*?" is answered with: because local provider sucks
[19:50] <natefinch> +1
[19:50] <marcoceppi> perrito666: I know how to work around this, I'm saying it's silly that juju would stop me as the root user, it should try to detect sudo vs root to discourage old local provider behaviour
[19:50] <marcoceppi> also "lol its local so deal with it" isn't really a great answer
[19:50] <marcoceppi> furthermore, local bootstraps in a LXD container
[19:51] <marcoceppi> can we get a LXD provider now plz
[19:51] <perrito666> marcoceppi: it was more in the tone of an apology than a mockery
[19:51] <marcoceppi> I see
[19:51] <perrito666> local provider is the number one cause of my screwing my work computer during the past year or so
[19:53] <natefinch> marcoceppi: funny you should ask
[19:53] <natefinch> marcoceppi: moonstone started work on an LXD provider as of today
[19:53] <marcoceppi> perrito666: which is why I'm running it in a LXD container
[19:54] <marcoceppi> natefinch: yes, 1000 times yes, I will happily test anything you throw at me
[19:54]  * natefinch screenshots for later
[19:54] <marcoceppi> I stand by my assertion! ;)
[19:55] <perrito666> marcoceppi: I could totally use a brief howto for what you are doing
[19:56] <marcoceppi> perrito666: well, if I could juju bootstrap local as root the howto would be way easier :P
[19:58] <marcoceppi> perrito666: I'll write a blog
[19:58] <perrito666> marcoceppi: tx
[20:00] <thumper> beisner: that binary I created for you was me taking wild guesses at times, I'd like to tweak and get you to try again, keen?
[20:00] <mup> Bug #1501490 opened: juju-local can't bootstrap as root user <juju-core:New> <https://launchpad.net/bugs/1501490>
[20:01] <beisner> thumper, indeed
[20:01] <beisner> thumper, i suspect 2s may not be enough, just based on observing nova-compute et al after nova deletes an instance.
[20:01] <thumper> beisner: how long do you think we need?
[20:02] <beisner> thumper, i think it's variable, depending on the hardware, and load on that cloud
[20:02] <beisner> thumper, how do we handle similar needs with other providers?
[20:03] <beisner> ie. is there an existing max_wait / retry_interval approach in any other provider?
[20:07] <beisner> thumper, i'll do a little ditty on serverstack to see if i can measure timing
[20:07] <thumper> beisner: awesome
[20:07]  * thumper otp
[20:12] <thumper> beisner: we handle similar things in other clouds terribly IMO
[20:13] <thumper> we should be treating many other cloud calls as retryable calls, but in most cases we don't
[20:17] <beisner> thumper, ah i see.  so i think a max_wait and retry_sleep would work well.  it's a matter of how long you're comfortable blocking on destroy.
[20:18] <thumper> beisner: you think having them configurable by config?
[20:20] <beisner> thumper, i'd aim for a resilient default.  ie.  say ...  max_wait 30s, recheck every 1s or 2s.   but hold the line, i'm about to have data.
[20:20] <beisner> ;-)
[20:27] <beisner> bootstrap: http://paste.ubuntu.com/12626772/
[20:27] <beisner> destroy: http://paste.ubuntu.com/12626773/
[20:27] <beisner> nova instance: http://paste.ubuntu.com/12626774/
[20:27] <beisner> nova secgroup: http://paste.ubuntu.com/12626775/
[20:28] <beisner> thumper, ^ checking and timestamping nova secgroups and nova instances as fast as apis will allow, while bootstrapping and destroying
[20:28]  * thumper looks
[20:28]  * beisner too
[20:30] <thumper> ok, so 2s is nowhere near enough
[20:30] <thumper> beisner: let me build you one with 30s max :)
[20:30] <beisner> thumper, a-ok.  i'll put together a timeline from those ^
[20:31] <thumper> copying files now
[20:32] <thumper> beisner: it appears to be as small as instant, but as large as 4s
[20:32] <thumper> I'm doing 30s max with 1s retry
[20:32] <thumper> *should* be solid enough
[20:33] <thumper> getting about  702.1KB/s up to chinstrap
[20:34] <thumper> beisner: the binaries are up, in the same place as before
[20:37] <beisner> thumper, timeline @ https://bugs.launchpad.net/juju-core/+bug/1335885/comments/17
[20:37] <mup> Bug #1335885: destroy-environment reports WARNING cannot delete security group  <amulet> <cloud-installer> <destroy-environment> <landscape> <openstack-provider> <uosci> <juju-core:Triaged> <juju-core 1.24:In Progress by thumper> <juju-core 1.25:Triaged> <https://launchpad.net/bugs/1335885>
[20:37] <beisner> thumper, ack, will pull bins
[20:59] <beisner> thumper, fyi 3 iterations in.  seeing 3s, 11s, 5s between 'terminating instances' and 'command finished'  ... going to let that run.  i'm eod, but will prob check back in late evening.
[21:00] <thumper> beisner: ok, cool
[21:00] <beisner> thumper, thanks again!
[22:34] <thumper> wallyworld: before I merge this openstack retry branch
[22:35] <thumper> wallyworld: perhaps we should chat about exponential backoff?
[22:37] <wallyworld> thumper: ok, give me a minute
[22:38] <thumper> wallyworld: although, I'm tempted to land this and discuss the exponential backoff as part of a bigger picture provider retry system
[22:38] <thumper> as I'm starting with the 1.24 branch
[22:38] <wallyworld> sgtm
[22:38] <thumper> k
[22:38]  * thumper does that
[22:38] <wallyworld> thumper: storageprovision/schedule.go
[22:38] <wallyworld> is the storage solution
[22:39] <wallyworld> that we can discuss moving to utils
[22:39] <wallyworld> storageprovisioner/schedule.go i mean
[22:41] <thumper> ack
[23:07] <axw> fuuuuuuuuuuuuuuuuuu. sick of blocked master