[01:00] wallyworld: https://github.com/juju/juju/pull/5462 doesn't actually have a bug, should I JFDI or leave it?
[01:01] axw: um, create a bug i think
[01:01] ok
[01:05] Bug #1586217 opened: azure: bootstrapping prints out scary, spurious ERROR messages
[01:06] axw: when you are free, a small one, then i can propose the charmrepo one https://github.com/juju/charm/pull/211
[01:18] is there prior art for parsing cmd line flags? e.g. timestamps?
[01:22] axw: i tried a high level test but couldn't get the bson marshalling behaviour to work. i couldn't figure out the magic to get a nil result from unmarshal instead of an empty struct (or a marshalling error) so figured a test is better than nothing
[01:26] i'll take another look
[01:28] wallyworld: marshal a struct with a field of type BundleData, with omitempty on the field?
[01:28] yeah, that may work
[01:36] Bug #1581748 changed: [2.0b7] add-credential no longer supports maas as a cloud type
[01:38] davechen1y: please see my reply on http://reviews.vapour.ws/r/4911/
[01:38] * davechen1y looks
[01:42] davechen1y: the race was between multiple writers, not writers&readers
[01:42] the lock was added to stop the multiple writers from stepping on each other
[01:42] axw: that doesn't matter, all readers and writers have to agree on the same mutex, so there is a happens-before relationship
[01:43] it's the same as if you use atomic.StoreUint32 you must always use atomic.LoadUint32
[01:43] once one path is covered by a lock, all the other paths have to be covered by the same lock
[01:43] davechen1y: https://golang.org/ref/mem -- "Channel communication" disagrees with you
[01:44] you're not using a channel
[01:44] you're using a mutex
[01:44] this is what I meant by "something else must be enforcing the happens before"
[01:45] ie, all the goroutines are going through another lock (or channel send/receive) and that's what's enforcing the memory barrier
[01:45] davechen1y: and like I said, we always write to the requests from the same goroutine that made the API call. there are two possibilities: they're in the same goroutine as the test, so natural happens-before. or they're made in a goroutine that we wait for with a wait group
[01:46] if the former was happening, then there wouldn't be a race
[01:46] so it's probably not always happening
[01:47] davechen1y: in the particular case of the race that was reported, they were goroutines spawned and then waited for with a wait group. so the readers would wait for writers, but the writers were racing with each other
[01:48] davechen1y: the writers racing with each other is fixed by the mutex. the readers and writers were never racing.
[01:49] maybe you should get someone else to review this
[01:49] i'm probably wrong
[01:50] davechen1y: you're one of the few people who would know about memory orderings, but I'll get a second opinion :)
[02:12] is github.com/juju/juju/audit used anywhere?
[02:27] redir: I think katco just started working on that, and so it may well not be used anywhere
[02:28] natefinch: code from 2014, mostly looks like davechen1y's work
[02:28] unrelated to the other audit stuff
[02:29] oh huh
[02:29] no clue
[02:29] nothing imports it
[02:31] redir: then it probably should be deleted.
[02:33] I did, but then realized I should ask first, so I undeleted it.
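An aside on the omitempty trick axw suggests at [01:28]: a minimal, hedged sketch with a stand-in BundleData type, using a pointer field so a missing value round-trips back from unmarshal as nil rather than as an empty struct.

```go
package main

import (
	"fmt"

	"gopkg.in/mgo.v2/bson"
)

// BundleData here is a stand-in for the real charm.BundleData type.
type BundleData struct {
	Series string `bson:"series,omitempty"`
}

type doc struct {
	// omitempty skips a nil pointer on marshal, so unmarshalling the
	// result gives back nil rather than an empty struct.
	Bundle *BundleData `bson:"bundle,omitempty"`
}

func main() {
	data, err := bson.Marshal(doc{})
	if err != nil {
		panic(err)
	}
	var out doc
	if err := bson.Unmarshal(data, &out); err != nil {
		panic(err)
	}
	fmt.Println(out.Bundle == nil) // true: nil, not an empty struct
}
```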
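And a toy model of the race argument at [01:42]-[01:50], assuming a hypothetical recorder standing in for the fake API client: the mutex orders the writers with respect to each other, while the WaitGroup already orders the reader after every writer, so the readers and writers never race.

```go
package main

import (
	"fmt"
	"sync"
)

type recorder struct {
	mu       sync.Mutex
	requests []string
}

func (r *recorder) record(req string) {
	// Without the mutex the concurrent appends race with each other.
	r.mu.Lock()
	defer r.mu.Unlock()
	r.requests = append(r.requests, req)
}

func main() {
	var r recorder
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			r.record(fmt.Sprintf("call-%d", i))
		}(i)
	}
	// wg.Wait establishes happens-before: every record call completes
	// before the read below, with no extra locking needed here.
	wg.Wait()
	fmt.Println(len(r.requests)) // always 10
}
```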
[02:33] redir: kill it with fire, and if anyone complains, that's what VCS is for :)
[02:35] * redir gets out his big pink eraser
[02:38] gone
[02:38] and so am I
=== redir is now known as redir_afk
[02:58] fix for one of the criticals: http://reviews.vapour.ws/r/4916/
[02:58] wallyworld or axw: ^?
[02:59] menn0: looking
[03:00] axw: thanks
[03:00] Bug #1577988 changed: Revert destroy service when machine is off
[03:00] Bug #1583109 changed: error: private-address/public-address not set (1.25.5)
[03:00] Bug #1586007 changed: cannot ssh to LXD container on non MAAS systems using juju ssh --proxy 0/lxd/0
[03:01] menn0: looks good, just doing QA
[03:01] tho pretty trivial, probably doesn't need it
[03:02] axw: you'd just be doing exactly what I just did
[03:02] axw: I just added a step by the way (you need to set a feature flag)
[03:03] menn0: yeah ok, I'll skip. LGTM
[03:03] axw: ta
[03:24] so, who wants to know what the second top cpu consumer in the agent tests is?
[03:24] you'll never guess
[03:25] (the first is the gc, but that isn't a surprise)
[03:25] + 6.34% agent.test agent.test [.] runtime.scanobject
[03:25] - 4.11% agent.test agent.test [.] math/big.addMulVVW
[03:25] - math/big.addMulVVW
[03:25] - 97.78% math/big.nat.expNNMontgomery
[03:25] - 99.91% math/big.nat.expNN
[03:25] - 86.43% math/big.nat.probablyPrime
[03:25] - 99.98% math/big.(*Int).ProbablyPrime
[03:25] - 99.62% crypto/rand.Prime
[03:25] - crypto/rsa.GenerateMultiPrimeKey
[03:25] - 96.14% crypto/rsa.GenerateKey
[03:25] - 99.77% github.com/juju/juju/cert.newLeaf
[03:26] - 88.83% github.com/juju/juju/cert.NewDefaultServer
[03:26] - 53.01% github.com/juju/juju/environs/config.(*Config).GenerateControllerCertAndKey
[03:26] + github.com/juju/juju/worker/certupdater.(*CertificateUpdater).updateCertificate
[03:26] + 46.99% github.com/juju/juju/cmd/jujud/agent.(*MachineAgent).upgradeCertificateDNSNames
[03:26] + 11.00% runtime.stackBarrier
[03:26] that's right, TLS negotiation :)
[03:26] oh, no
[03:26] it's generate key
[03:26] it's generating all those faux tls certificates
[03:31] // XXX Do bound checking against totalLen.
[03:31] ^ shit i read
[03:36] well that's a waste
[03:42] % go test -v -race -timeout=9001s
[03:42] this timeout is over 9000!
[03:52] https://bugs.launchpad.net/juju-core/+bug/1586244
[03:52] Bug #1586244: state: DATA RACE in watcher <2.0-count>
[03:52] ^ cherylj this one just crept in, the state tests were passing less than 48 hours ago when I re-enabled them
[04:01] wallyworld: you said in the other channel that there aren't any checks in the buildtxn loop that match the assert... is that something we require? I've never done that. I assumed the asserts were there specifically to ensure the state we expect. Doing a check before just seems like it's adding a race condition.
[04:02] natefinch: if there's a build txn loop with logic to compose txn slices and retry, unless the logic matches things will blow up
[04:02] it's not a race because the txn asserts will reject changes where someone else got in first - that's why tests need to use the txn hooks to check that stuff
[04:03] but people don't always write the necessary tests
[04:04] wallyworld: I guess I don't understand why it'll blow up. I mean, yes, the transaction will fail... but how is that different than if the initial checks fail?
[04:05] because the loop that generates the txn slice will never generate a slice that will succeed
[04:05] and so it will give up with failing asserts
[04:05] wallyworld: oh, ok, I see. so you're saying the logic can be wrong and so the asserts could be effectively contradictory
[04:06] yes
[04:06] it may not be the case here - i haven't grokked the code
[04:06] but it's a likely cause
[04:06] there's a loop, txns, and that error
[04:07] the asserts may not match the code which created them
[04:10] so, I checked on nil matching nonexistent fields, and it generally does (if you have a sparse collection, it won't, but we don't use those). All the asserts for this particular transaction are against different fields entirely, so they can't really be contradictory, as far as I can tell. The only thing that seems suspicious is that the assert I pointed you to assumes that the address being set either was never set before, or is not being changed. So I assume this is a case where the address actually is being changed. But I'm not sure why we're asserting that setAddress isn't changing the address.
[04:12] Bug #1586244 opened: state: DATA RACE in watcher <2.0-count>
[04:16] wallyworld: and actually... this code explicitly only calls setaddress if the address *has* changed: https://github.com/juju/juju/blob/master/state/machine.go#L1261
[04:18] natefinch: it's not the asserts that are contradictory, it's the asserts and the logic
[04:18] ie it might be the asserts and the logic, that's what causes the changing too quickly error
[04:19] wallyworld: that's my point... we're adding an assert that says "assert that the address hasn't changed" but we're only adding that assert if the address *has* changed. Granted, if it changes from nothing to something, then the assert will pass. but all the rest of the code seems to be assuming that it's ok to change from something to something else... except the pesky assert
[04:20] perhaps, i haven't deeply read the code. i think martin may have done this logic. the issue if i recall was that we were setting the address all the time even if it hadn't changed and that was causing watchers to fire etc etc
[04:20] i think this code is an attempt to only hit the db if we really need to
[04:21] it would be useful to know what the inputs are, write a failing test, and fix the test
[04:24] wallyworld: yeah, I'll see if I can do that.
[04:25] natefinch: also ask martin (if git says so) to confirm the intent of the logic
[04:30] axw: got a sec ?
[04:30] https://bugs.launchpad.net/juju-core/+bug/1586244/comments/1
[04:30] Bug #1586244: state: DATA RACE in watcher <2.0-count>
[04:30] i'm not sure which way to go
[04:30] i could try to make multiwatcherstore.Get return a copy, but that might be tricky
[04:31] the alternative might be to make the test not screw with the data being returned, or try to do the copy then
[04:31] davechen1y: looking
[04:33] wallyworld: looks like it was michael foord. I'll ping him in the morning... but it seems like this logic is just backwards. The whole point of the PR is "unit address sometimes changes" and this assert is asserting that we're only setting the value if it *hasn't* changed.
[04:33] wallyworld: https://github.com/juju/juju/pull/3215/files#diff-287434ac7f7449f3a0c18912a8650f6aR1108
[04:34] natefinch: that does seem to be the case doesn't it
[04:38] davechen1y: it would be nice for the info thingies to be copied on the way out of allwatcher, but I think that's too much for right now. if it's possible without too much trouble, it'd be good to mock out time so we have predictable timestamps to compare against. otherwise, copy the info objects in the test
[04:59] wallyworld: oh, I think I get it. I'm reading it wrong. It's asserting that the previous address that we think we're going to change it *from* hasn't changed
[05:00] that makes sense
[05:01] wallyworld: so if we think we're changing it from A->B and we run the transaction, and it's actually C, we bail
[05:01] sounds right
[05:02] and when we retry the transaction, we get a new copy of the machine from state, which should now have C
[05:02] and if that changes we try again. maybe there's something else setting the value
[05:10] wallyworld: I'll talk to michael in the morning, maybe he'll have some insight. I gotta get to bed, it's late.
[05:10] np, ty
[05:11] natefinch: recording input will be good
[05:11] so we can see what's coming in to trigger it
[05:12] wallyworld: yep... I wish we had a simpler repro case, but I can make a binary with increased logging and see if we can trigger it and look at the values and/or stack traces. Wonder if it's competing workers somehow.
[05:13] yeah, log what's in db, what we receive etc
[05:13] if they can repro then great
[05:14] yeah, landscape seems to be able to repro by running a big deploy script.
[05:15] ok, I'm out.
[05:59] axw: thanks for the advice
[05:59] i'll see what I can do
[06:01] the multiwatcher is such a garbage fire
[07:12] cmars: ping
[07:15] rogpeppe1: ping
[07:15] anastasiamac_: hiya
[07:22] dimitern: ping
[07:22] frobware: pong
[07:23] dimitern: I played with your patch a bit - no IPv6 addresses turned up. Also see on #juju - stokachu tried it too
[07:23] frobware: yeah, but my fix was apparently too restrictive
[07:24] dimitern: "frobware: so I did a new deploy with the default ipv6 enabled, and juju pulled the ipv4 like it was supposed to"
[07:24] dimitern: how so?
[07:24] frobware: I still need to allow IPv6-only tests and setups to work
[07:24] frobware: that's why I initially wanted to store preferred addrs per type
[07:25] frobware: e.g. there's a single test in cmd/juju/commands/scp_unix_test.go that fails for a good reason "test 11: scp works with IPv6 addresses"
[07:30] frobware: it feels wrong to pretend there's a single private/public address of any machine anyway
[07:31] frobware: rather than fixing the charms like rabbitmq-server..
=== frankban|afk is now known as frankban
[07:49] Bug #1482226 changed: juju status with 'prefer-ipv6' shows address, not DNS name.
[07:54] kwmonroe cory_fu kjackal admcleod stub, guys could anybody please point me to the docs on using spot instances with juju on AWS?
[07:55] frobware: I think I got it
[07:55] Hi Alex_____ , I guess I am the only one here at this time
[07:56] frobware: so if we still return the preferred public|private by default when asked, but when they're not set we try to select a new one by scope from all the available addrs, it seems to work ok
[07:56] dimitern: why would selecting by scope return IPv4 in favour of IPv6?
[07:57] kjackal: great! Could you help by pointing me to the way of using spot instances plz?
[07:57] (ssh/scp should really try to use all addresses, rather than rely on a single public or private)
[07:57] frobware: it won't
[07:57] Alex_____: Let me do a quick search
[07:57] frobware: but if you have any IPv4 addrs already, one will be preferred anyway
[07:58] kjackal: thanks, I appreciate it! can move to #juju
[07:58] dimitern: is this "desired" for a mixed IPv4/6 setup?
[07:58] frobware: and that extra lookup will allow ipv6-only setups to still work (and woe to whoever tries to deploy rabbitmq-server there :D)
[07:59] frobware: our "desired" state should be "don't discriminate addresses by type/scope/etc. just try all in parallel"
[08:01] well for legacy / badly written charms we need to pretend there is still 1 private and 1 public addr only..
[08:01] but well written charms can ask for all addrs (opt. filtering them by type etc.) via network-get
[08:02] * dimitern dreams about the day when 'unit-get private-address' will be a 'charm proof' *ERROR*
[08:06] jamespage, gnuoy: do you know why rabbitmq-server charm does not work well with IPv6 addresses?
[08:07] dimitern, it should do
[08:07] jamespage, gnuoy: it seems the upstream rabbit supports it fine.. or is it due to magic hacluster VIP behavior?
[08:08] dimitern, you don't need to use the hacluster charm with rabbitmq
[08:08] in fact I'm going to take out that support
[08:08] jamespage: well, I've been struggling to fix juju to let rabbit work - see bug 1574844
[08:08] Bug #1574844: juju2 gives ipv6 address for one lxd, rabbit doesn't appreciate it.
[08:09] dimitern, ok you might want to try my inflight fixes for rmq
[08:09] jamespage: simply hiding ipv6 addresses from charms (unit-get private|public-address) sounds like the wrong kind of fix
[08:10] jamespage: sure, I'd like to try
[08:11] dimitern, try with cs:~james-page/xenial/rabbitmq-server-bug1584902
[08:11] I've simplified how units cluster together a lot and removed the requirement for DNS forward/reverse lookup altogether
[08:12] jamespage: ta! trying now
[08:12] jamespage: awesome! so this might actually solve both that bug ^^ and the one about rmq failing to work when the node's hostname is not resolvable..
[08:12] dimitern, yes the fix is horrible but it should work
[08:13] dimitern, the charm already writes into /etc/hosts for all peers in the cluster
[08:13] so even if that's an IPv6 one it should just dtrt
[08:14] jamespage: sweet! tyvm
[08:14] that should let my os-lxd on maas 2.0 setup with the nucs to work without manual steps / tweaks!
[08:15] dimitern, if you could give feedback on your testing on the associated bug that would be great
[08:15] dimitern, I want to backport that to the stable charm, so the more testing the better...
[08:16] babbageclunk: morning
[08:16] jamespage: will do
[08:16] voidspace: o/
[08:26] jamespage: so just deploying rmq with your fix, then adding 1 or 2 units should cover the simplest test?
[08:27] (also looking for errors, etc. ofc)
[08:27] yah
[08:28] ok
[08:30] wallyworld: I'm thinking the cloud/credential stuff in mongo would probably fit well with other controller config, like ca-cert. so maybe "two birds"
[08:39] jamespage: your fix works like a charm! :) - successfully tested deploying 3 units on maas 2 and 8 units on lxd (one of them with IPv6 address); eventually all reported "Unit is ready and clustered" and no hook errors in sight!
[08:39] dimitern, love it when ripping out code results in better solutions...
[08:40] jamespage: tell me about it.. :D
[08:40] Bug #1586298 opened: Cannot run upgrade-juju as upload-tools refers to "admin" model post the s/admin/controller/ change
[08:44] jamespage: ah, actually the lxd machine with IPv6 address had a hook error: hook failed: "cluster-relation-joined"
[08:44] jamespage: I'll add logs to the bug comment
[08:50] jamespage: bug 1584902 updated; it looks like the hook error was due to getting NXDOMAIN
[08:50] Bug #1584902: Setting RabbitMQ NODENAME to non-FQDN breaks on MaaS 2.0
[08:50] jamespage: do you think I can provide anything else useful before I tear it down?
[08:51] dimitern, content of /etc/hosts on all units please
[08:51] jamespage: sure, that's all?
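A rough sketch of the by-scope fallback dimitern describes at [07:56], with a simplified stand-in Address type rather than juju's real network package: pick an address matching the wanted scope, preferring IPv4 so mixed setups keep their old behaviour, but still falling back to IPv6 so an IPv6-only setup gets an address at all.

```go
package main

import "fmt"

// Address is a simplified stand-in for juju's network address type.
type Address struct {
	Value string
	Type  string // "ipv4" or "ipv6"
	Scope string // e.g. "public" or "local-cloud"
}

// selectByScope picks an address with the wanted scope, preferring an
// IPv4 match, but still falling back to an IPv6 one when that is all
// there is.
func selectByScope(addrs []Address, scope string) (Address, bool) {
	var fallback *Address
	for i := range addrs {
		if addrs[i].Scope != scope {
			continue
		}
		if addrs[i].Type == "ipv4" {
			return addrs[i], true
		}
		if fallback == nil {
			fallback = &addrs[i]
		}
	}
	if fallback != nil {
		return *fallback, true
	}
	return Address{}, false
}

func main() {
	addrs := []Address{
		{Value: "fd00::1", Type: "ipv6", Scope: "local-cloud"},
		{Value: "10.0.0.2", Type: "ipv4", Scope: "local-cloud"},
	}
	a, ok := selectByScope(addrs, "local-cloud")
	fmt.Println(a.Value, ok) // 10.0.0.2 true
}
```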
[08:55] dimitern, OK i think I see the problem
[08:55] I missed a get_host_ip call
[08:55] jamespage: yeah - there's something fishy about that machine-7 - /e/hosts does not look like it does on the other machines: http://paste.ubuntu.com/16727753/
[08:56] added to the bug
[08:56] dimitern, ta
[08:57] dimitern, can you output unit-get private-address on all units as well pls
[08:57] jamespage: it's already part of the status output paste
[08:57] dimitern: ping
[08:58] oh yes of course..
[08:58] sorry
[08:58] np
[08:58] voidspace: pong
[08:58] dimitern: core/description/interfaces.go:IPAddress has a ConfigMethod - is it ok for that to return a state.AddressConfigMethod do you think
[08:58] dimitern: or should it return a string
[08:59] dimitern: nothing else in core/description/interfaces.go returns state types
[08:59] voidspace: yeah, we need to apply the same pattern as for api params and state docs
[08:59] voidspace: i.e. duplicate the type in core/desc/ (however unfortunate)
[09:00] voidspace: there's actually a test somewhere there that verifies description (or core?) does not import extra pkgs
[09:00] dimitern: right
[09:00] dimitern: I can see elsewhere places using string
[09:01] dimitern: e.g. Address.Scope (which is a legacy IPAddress)
[09:02] voidspace: we should not migrate the legacyipaddressesC stuff
[09:03] dimitern: actually probably a network address
[09:04] frobware, dooferlad, fwereade, jam: standup?
[09:05] dimitern: I am out of the office today getting a new passport.
[09:05] dooferlad: what are you doing here then? :D
[09:05] dimitern: irccloud + laptop <--> phone. Welcome to the future :-)
[09:08] dimitern: OoO too
[09:11] frobware: yeah, I forgot, sorry
[09:13] dimitern, ok so I see the issue
[09:13] the is_ip function in charm-helpers fails to support IPv6 addresses
[09:13] fixing that now
[09:14] jamespage: awesome! \o/
[09:25] dimitern: why does IPAddress have DNSServers/DNSSearchDomains/GatewayAddress?
[09:25] dimitern: surely they belong to Subnet
[09:26] voidspace: they can apply per device as well
[09:27] dimitern: ok, but not *per address*, surely?
[09:27] voidspace: well, dooferlad insisted on keeping layers (2 and 3) separate, which is good
[09:27] dimitern, https://code.launchpad.net/~james-page/charm-helpers/is-ip-ipv6/+merge/295920
[09:27] voidspace: so they do apply per address
[09:27] dimitern: hmm
[09:27] voidspace: as they mirror what you can do on NICs
[09:28] jamespage: cheers, looking
[09:28] Bug #1574844 changed: juju2 gives ipv6 address for one lxd, rabbit doesn't appreciate it.
[09:29] jamespage: nice and short fix :) could I test it with rmq?
[09:31] dimitern, cs:~james-page/xenial/rabbitmq-server-bug1574844
[09:31] jamespage: ta! testing now on lxd
[09:31] dimitern: I agree that we should keep physical separate from IP, separate from higher layer protocols. We should have collections per layer of the TCP/IP stack. That would give us NetPhysical and NetIP collections.
[09:32] dimitern I agree with voidspace that IPAddress is a confusing name for something with DNS and routing information in it
[09:33] dooferlad: well I suggested LinkLayerDeviceAttachment..
[09:33] :)
[09:33] or assignment
[09:33] dimitern: too long :-)
[09:33] dooferlad: indeed
[09:34] * dimitern wishes we had a good way to namespace stuff in state
[09:35] then we could've used e.g. statenetwork.Device, statenetwork.Attachment, ...
[09:36] fwereade: how about that? ^^
[09:38] dimitern, perhaps: +1 on namespacing, indeed
[09:38] fwereade: cheers, I'll think about it
[09:40] dimitern, but, well, I feel like namespacing of the implementation details is a more pressing issue?
[09:40] Bug #1574844 opened: juju2 gives ipv6 address for one lxd, rabbit doesn't appreciate it.
[09:42] fwereade: example?
[09:43] jamespage: it works! \o/ http://paste.ubuntu.com/16728135/
[09:44] dooferlad: :-)
[09:44] cool
[09:45] dimitern, can I see the /etc/hosts again pls?
[09:46] jamespage: all of them look consistent now: http://paste.ubuntu.com/16728155/
[09:47] dimitern, good-oh
[09:47] dimitern, change is up for review
[09:48] dimitern, well, let me reorient myself: is your problem that state exports too many types?
[09:48] jamespage: great, thanks for the quick fix!
[09:48] dimitern, do we understand why one unit gets an IPv6 address?
[09:49] that does not seem particularly deterministic to me
[09:49] dimitern, my perspective is that the private state namespace is utterly stuffed with bits that we plausibly *could* carve into a few /internal/ packages
[09:50] dimitern, but that it will be much harder to separate useful exported types because import cycles
[09:50] jamespage: yes, it's not - it just so happens the LXD provider reports the IPv6 first (and that's the only one at that moment), but then it's taken as the preferred private address and sticks
[09:51] fwereade: well, not quite - the problem I have is naming
[09:51] fwereade: and SoR principle
[09:51] dimitern, naming objecty things with methods, right?
[09:51] dimitern, (rather than in/out data types?)
[09:51] fwereade: yeah, but some names are much too overloaded to make sense in a global namespace; so we need to use longer, more descriptive, eventually awkward at times..
[09:52] Bug #1574844 changed: juju2 gives ipv6 address for one lxd, rabbit doesn't appreciate it.
[09:53] fwereade: e.g. a machine can have a bunch of things in different 'namespaces', which are separate concepts and can use the same term (disk device vs network device)
[10:52] voidspace, dimitern - If I want to try juju with mgo.v2-unstable to see if it fixes the txn problem I'm seeing with 3.2, is there any way to do it without changing all of the 286 files that import gopkg.in/mgo.v2 to have -unstable?
[10:52] babbageclunk: there is - govers
[10:53] Ooh
[10:54] babbageclunk: go get -u -v github.com/rogpeppe/govers/...
[10:54] dimitern: thanks. Ok, but I still need to rewrite all those files.
[10:56] babbageclunk: don't you need it only for juju/juju/ though?
[10:57] babbageclunk: Maybe? I guess - it depends if juju/juju calls anything in a library that then uses mgo.
[10:57] dimitern: Duh, replied to myself. ^
[10:59] babbageclunk: well, you can try godeps -n -t github.com/juju/juju/...
[10:59] dimitern: ?
[10:59] babbageclunk: and see if you end up with more than one mgo import after running govers to update imports?
[11:01] dimitern: Oh, I see - ok, I'll commit my current stuff and try that. Thanks!
[11:20] dimitern: It turns out govers does that check itself, yay! Unfortunately: http://pastebin.ubuntu.com/16729162/
[11:22] babbageclunk: hmm.. well, try with e.g. gopkg.in/mgo.v2-unstable/bson ?
[11:22] dimitern: I just did it at the top level, seems to have worked! Building now.
[11:23] babbageclunk: \o/
[11:26] can anyone look at this PR? http://reviews.vapour.ws/r/4924/
[11:28] hoenir: I already did! Good stuff, thanks for revisiting it.
[11:29] babbageclunk, thanks again!
[11:30] dimitern: Ah well, it was worth checking. Same problem still.
[11:30] dimitern: Now to revert all those changes!
[11:30] babbageclunk: so you managed to build with the unstable mgo?
[11:31] babbageclunk: but it didn't help?
[11:31] dimitern: yup
[11:31] dimitern: nope
[11:47] babbageclunk: I see - well, as you said it was worth at least checking :)
[11:48] dimitern: :) Yeah, better that than report a bug that's already fixed in the unstable branch.
[11:48] babbageclunk: +1, indeed
[11:49] * dimitern finally managed to add & commission a kvm node to the hw maas 2.0.. after breaking *everything* for a short while
[11:51] so babbageclunk, no +1 and $$merge$$, or did you forget?
[11:54] hoenir: Oh, I'm not a fully-fledged reviewer yet (new myself) - you should get someone else to look at it as well. Sorry!
[11:54] hoenir: I'll have a look
[11:55] dimitern, thanks
[11:56] hoenir: how could I test this?
[11:56] hoenir: e.g. trying to add a windows machine to a controller should trigger it I guess?
[11:58] dimitern, you should bootstrap + deploy a windows machine and it should be fine, also when deploying the windows machine try sudo su on the machine where you have maas installed
[11:59] and execute these commands
[11:59] maas-region-admin shell
[11:59] from metadataserver.models.nodeuserdata import NodeUserData
[11:59] nodeobj = NodeUserData.objects.get(node__hostname="<hostname>")
[11:59] nodeobj.data
[12:00] hoenir: awesome, will try that then
[12:00] and you should check that nodeobj.data doesn't contain "\n\n" at the beginning..
[12:00] and also when the windows machine was deployed you could log into it and check the cloudbase-init logs. if the log tails show it installed jujud and all the other stuff it should be fine
[12:01] hoenir: remind me how was I supposed to import a windows image into maas 2?
[12:01] if you see "[WARNING] unsupported format, ..." then the nodeobj.data var that's holding the string is bad
[12:03] I have one maas windows image win2k12r2 from cloudbase, i think in order to make one yourself you must exec some custom python code. There is a script in the cloudbase git repo that will do so, but from what I know it will take 6-7 hours to complete it.
[12:04] https://github.com/cloudbase/windows-openstack-imaging-tools/tree/experimental
[12:05] powershell*, excuse me for saying it's python.
[12:09] dimitern, I tested it myself before submitting the patch to upstream so I think this will not be a problem, but feel free to test it yourself if you want to.
[12:15] hoenir: it's not that I don't trust the patch, but so close to the release it was decided to verify all fixes more carefully
[12:20] Bug #945862 opened: Support for AWS "spot" instances
[12:32] I was disconnected, so dimitern, you said something more?
[12:33] hoenir: it's not that I don't trust the patch, but so close to the release it was decided to verify all fixes more carefully
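hoenir's verification steps above boil down to a prefix check on the rendered userdata; a trivial sketch of that condition (the helper name here is invented):

```go
package main

import (
	"fmt"
	"strings"
)

// userDataLooksValid mirrors the manual check above: cloudbase-init
// logs "[WARNING] unsupported format" when the payload starts with
// blank lines, so nodeobj.data must not begin with "\n\n".
func userDataLooksValid(data string) bool {
	return !strings.HasPrefix(data, "\n\n")
}

func main() {
	fmt.Println(userDataLooksValid("#ps1_sysnative\n..."))  // true
	fmt.Println(userDataLooksValid("\n\n#ps1_sysnative\n")) // false
}
```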
[13:39] voidspace: you around?
[13:43] natefinch: yep
[13:45] voidspace: I'm looking at this: https://bugs.launchpad.net/juju-core/+bug/1537585
[13:45] Bug #1537585: machine agent failed to register IP addresses, borks agent <2.0-count>
[13:46] voidspace: which I think is failing because this assert is failing: https://github.com/juju/juju/blob/master/state/machine.go#L1238
[13:47] voidspace: which git seems to say was something you worked on like 9 months ago :)
[13:47] voidspace: so I'm sure it's still fresh in your mind
[13:47] natefinch: :-)
[13:47] dimitern: ^^^^
[13:48] natefinch: preferred address logic has changed recently
[13:48] voidspace: my first question is - if we're just changing the address anyway, why do we care if the old address is the same as the one we expected it to be?
[13:48] natefinch: the point of the assert is that we shouldn't change the preferred address once set
[13:49] natefinch: we're not changing it - we should either be setting it for the first time (nil) or just setting it to the same
[13:49] natefinch: that was the point of the assert
[13:49] natefinch: a better way would be to just assert it's nil
[13:49] voidspace: but the function one step up the stack has already checked if we're changing it: https://github.com/juju/juju/blob/master/state/machine.go#L1262
[13:50] natefinch: race condition - two concurrent operations changing it at the same time
[13:50] (set on first access - very likely to get concurrent ops here)
[13:50] that's why we have the assert as well as the check
[13:50] voidspace: right, so last one in wins... that's what is going to happen anyway, since we're just going to retry the transaction until it works
[13:51] voidspace: or in this case, it doesn't ever work and the machine gets borked... still not sure why that is
[13:51] natefinch: but that doesn't matter as it will have to be the same one - it will always fail if it tries to set a different one
[13:51] voidspace: but then what is that code doing that is checking if it's changed?
[13:52] voidspace: on line 1261... if changing is bad, shouldn't we just bail there?
[13:52] ah, no - I'm slightly mistaken
[13:52] we can change if we have a better match on scope
[13:52] let me look again at that assert
[13:55] natefinch: hmmm... so actually - if two operations try to concurrently set a new address then the first one will succeed and the second will fail because the address is now not "current"
[13:56] natefinch: that would better be done as a buildTxn function
[13:56] refreshing the doc on each attempt
[13:56] voidspace: well, so, this is called from inside a buildtxn and we do refresh the doc each time
[13:57] ah yes, I see that now in setAddresses
[13:58] the intent of the assert is that the doc hasn't changed (i.e. current address is unset or as we expect)
[13:59] if that fails the first time - the address has changed, why does it fail *again*
[13:59] yeah... that's the question. I think I may need to toss in a bunch of extra logging so we can see what it's being called with. That might make it more obvious what's going on. If it's ping ponging between two or if one is just oddly failing over and over
[14:01] voidspace: it's hard to repro... the landscape guys need to deploy a semi-large environment from a script to trigger it, and even then it doesn't always happen, so it's probably timing related
[14:01] but it shouldn't ping pong as the address seen shouldn't actually change until the transaction is applied
[14:01] so from my reading of the code even if two concurrent changes come in, one will work, the second will fail once and then either come back with no ops or succeed
[14:02] yeah, that's what I would expect from reading the code
[14:02] so I don't understand I'm afraid
[14:02] voidspace: it's ok. I'll see if I can get some better logs
[14:02] natefinch: :-/ good luck
[14:02] voidspace: thanks for looking at it with me though.
[14:12] natefinch, voidspace: I might have some insight about that issue with setting addresses
[14:13] natefinch, voidspace: I've been changing that code while trying to fix a related bug avoiding IPv6 preferred addresses
[14:14] natefinch, voidspace: and I suspect a few possible causes, mainly the way the buildTxn in setAddresses is handling subsequent attempts
[14:15] dimitern: by the way - I have thoughts about your change to preferred addresses for ipv6
[14:15] dimitern: I'm back to thinking we don't need to track separate ipv4 and ipv6 preferred addresses
[14:15] natefinch, voidspace: there's also maybeGetNewAddress overriding the origin and not checking whether it was selected or not (i.e. it can try setting an empty address)
[14:16] dimitern: you said that the intent of that field on the doc was not to change - so we needed separate ipv4 & ipv6 addresses
[14:16] voidspace: told you so :)
[14:16] dimitern: actually the intent was that the public api (PreferredAddress) doesn't change
[14:16] dimitern: and your change *does* change that
[14:16] dimitern: so under the hood changing preferred address from ipv6 to ipv4 is *fine*
[14:16] dimitern: and better than adding new fields and methods
[14:17] voidspace: but fortunately, it's unnecessary now as jamespage fixed the real issue with the rabbitmq charm and charm-tools (helpers?) not handling IPv6 addrs properly
[14:17] dimitern: hah
[14:17] dimitern: ok
[14:20] dimitern: I can get more logs to get a better idea of what we're actually setting... but it sounds like you've found some problems in the logic anyway.
[14:21] natefinch: yeah, I also tried *really* hard to repro it.. no luck.. well, except for poking into mongodb
[14:22] dimitern: yeah, the only reliable repro I can find is landscape's deployment script that makes like 15 machines/containers at once
[14:22] natefinch: e.g. if you manage to get to the mongo client shell of a running model, updating an existing machineDoc's preferredprivateaddress field to a non-nil, serialized version of state.address{}
[14:23] and I suspect that's what actually happens only occasionally, under heavy load like with the landscape scenario
[14:23] yep
[14:28] natefinch: also, the upgrade step (see state.AddPreferredAddressesToMachines or whatsit) might be causing it (didn't we start doing auto-upgrading across patch-versions of the same minor version at some point?)
[14:30] dimitern: hmm, good question
[14:33] frobware: ping?
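For reference, a compact, hypothetical sketch of the optimistic-concurrency shape debated above, in mgo/txn terms; the collection and field names are stand-ins for the real code in state/machine.go. The assert pins the value that was read, so a concurrent writer makes the attempt fail; the surrounding buildTxn loop then re-reads the doc and rebuilds the ops, and a failure that persists across retries means the ops-building logic and the assert disagree, the "changing too quickly" symptom wallyworld describes.

```go
// A hypothetical sketch only: collection and field names are stand-ins.
package state

import (
	"gopkg.in/mgo.v2/bson"
	"gopkg.in/mgo.v2/txn"
)

type address struct {
	Value string `bson:"value"`
}

// setPreferredAddressOps asserts that the stored preferred address is
// still the one we read (or is still unset), then sets the new one.
// If the assert fails, the enclosing buildTxn loop is expected to
// re-read the machine doc and build fresh ops.
func setPreferredAddressOps(machineID string, read *address, next address) []txn.Op {
	assert := bson.D{{"preferredprivateaddress", bson.D{{"$exists", false}}}}
	if read != nil {
		assert = bson.D{{"preferredprivateaddress", *read}}
	}
	return []txn.Op{{
		C:      "machines",
		Id:     machineID,
		Assert: assert,
		Update: bson.D{{"$set", bson.D{{"preferredprivateaddress", next}}}},
	}}
}
```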
[14:45] katco, natefinch: do you know where the surprise introduced at payload/context/register.go:83 is corrected?
[14:45] fwereade: tal
[14:46] katco, ty
[14:46] fwereade: yikes
[14:51] katco, natefinch: fwiw, all I can see is payload/state/unit.go:58, which is commented out and may just be a casualty of the recent changes
[14:52] katco, natefinch: it certainly seems to get all the way to state without anything checking it makes sense
[14:55] fwereade: natefinch: payload/api/helpers.go:72 looks like it overwrites it
[14:56] katco, that's just the id parsed from whatever unit was sent
[14:56] fwereade: hm you are correct
[14:59] fwereade: we have standup and then we'll continue investigating
[14:59] katco, thanks
[15:18] is there a feature branch where service is being renamed to application?
[15:19] cmars, yes
[15:19] alexisb, service-to-application, that's gotta be it :)
[15:19] thanks
[15:19] :)
[15:19] cmars, Ian sent me a summary of happenings as well
[15:20] if it is relevant to you I can forward it along
[15:20] alexisb, sure, that'd be great. i'm fixing up the romulus change atm, just wanted to test against juju before proposing
[15:20] cmars, awesome
[15:24] dooferlad, you still around?
[15:24] fwereade: natefinch did a test, and it's definitely getting set somewhere. i think it should be passed in here (config.UnitName) and then utilized instead of a placeholder
[15:26] fwereade: oops, there was intended to be a link there: https://github.com/juju/juju/blob/master/component/all/payload.go#L112-L113
[15:30] is the standard way to log into mongo different in 2.0? I'm using mongo --ssl -u admin -p localhost:37017/admin
[15:30] In juju 2.0 mass is not listed in list-clouds, any ideas why?
[15:30] massIVE: you have to add-cloud
[15:35] massIVE: it's a little cumbersome right now. You have to make a .yaml file with the maas cloud defined in it, thusly: http://pastebin.ubuntu.com/16733874/
[15:35] massIVE: then juju add-cloud mymaas myclouds.yaml
[15:36] alexisb: sorry, yes
[15:36] alexisb: for some reason my PC thought USB was a silly idea for a few minutes *confused*. I need that for typing!
[15:37] massIVE: then you can juju bootstrap controllername mymaas .. I think it'll prompt for the oauth cred at that point
[15:38] natefinch: i did that, but must have done something wrong last time, works now ;)
[15:38] massIVE: cool.. yeah, it's not the most user friendly experience right now. I think we're working on simplifying it soon
[15:39] * dooferlad reboots again. For the fun.
[15:42] natefinch: um, ERROR no registered provider for "mass"
[15:42] massIVE: maas or mass?
[15:42] lol :) k
[15:42] :D
[15:44] * dooferlad has discovered that a USB extension cable not plugged into anything can be a bad thing
[15:47] dooferlad, heya, was just looking for an update on: https://bugs.launchpad.net/juju-core/+bug/1577945
[15:47] Bug #1577945: Bootstrap failed: DNS/routing misconfigured on maas
[15:48] alexisb: I am waiting for a +1 on http://reviews.vapour.ws/r/4902
[15:48] dooferlad, cool, can you put the PR in the bug with an update please
[15:48] alexisb: sure
[15:49] voidspace, if you are around looks like dooferlad could use a review :)
[15:53] voidspace, frobware, dimitern: http://reviews.vapour.ws/r/4902/ *poke*
[15:53] fwereade: i found it... this is a bug that was introduced very recently (~10 days). trying to provide you with a walkthrough
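The pastebin natefinch links at [15:35] has since expired; a clouds.yaml of the kind he describes would look roughly like this (the endpoint value is a placeholder, point it at your own MAAS API):

```yaml
clouds:
  mymaas:
    type: maas
    auth-types: [oauth1]
    endpoint: http://192.168.1.100:5240/MAAS
```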
[15:54] dooferlad: looking
[16:09] fwereade: nate is checking whether tip works correctly, but to me it looks like a bug was introduced here: https://github.com/juju/juju/commit/b3fb5cbc9c31#diff-c04416a1afc4fb32911bcaafcfdb48a1L29
[16:10] fwereade: (in that last block)
[16:11] could anyone check this PR and $$merge$$ it? http://reviews.vapour.ws/r/4924/
[16:12] katco, you don't consider "a-service/0" making it safely through... context, api client, api server, state... 4 layers, before being silently overwritten in the persistence layer, to be a bug?
[16:12] katco, that is pretty much just working by accident
[16:13] fwereade: I believe eric's intent was to remove unit from the payload struct entirely. This may have been a step in that direction, that just never got completed
[16:13] fwereade: tip is broken btw
[16:13] natefinch, I don't really see how introducing nonsense data at runtime is anything other than... introducing nonsense data at runtime
[16:14] fwereade: well, certainly, I don't think this should have ever made it into master as-is
[16:14] fwereade: just trying to help you figure out where it stopped working. looks like that's it.
[16:14] natefinch, my branch fixes it in state, fwiw, if I finish it today it'll be late I'm afraid
[16:15] dooferlad: lgtm; however, I'd appreciate steps on how to verify this locally
[16:16] fwereade: seems like a refactor that was not completed, and was accidentally committed to master.
[16:16] katco, ok, sure, but... the *persistence* layer silently overwriting one of the fields specified by state? I thought that was where you were putting the business rules -- surely that *should* be the layer that decides *what* gets persisted?
[16:16] dimitern: can you +1 this? http://reviews.vapour.ws/r/4924/
[16:17] dimitern: just start a node that uses DHCP as its address allocation method. dhclient will be running against the bridged interface rather than the parent. Very simple.
[16:18] dooferlad: and you mean DHCP not Auto Assign?
[16:18] fwereade: yes i agree. finding the commit where it broke does not imply complicit acceptance of an approach.
[16:18] dimitern: yes
[16:23] katco, well, this situation does seem to be a direct consequence of the loose/flexible style you were advocating earlier
[16:25] fwereade: what? no, not at all
[16:25] katco, heh, it struck me as well-timed and instructive
[16:25] fwereade: those two concepts aren't even connected
[16:26] katco, bad data making it through 4 layers *not* connected with trusting your clients because it lets you go "fast"?
[16:27] fwereade: you are misconstruing my comment. it's not at all about not doing data validation at your layer boundaries.
[16:27] fwereade: going to go do some work, enjoy your evening.
[16:29] fwereade: FWIW, the only way to verify the unit info would be to hit the DB.... should we really do that at every boundary? IMO, the only real problem is that we let it get into the DB that way. That's the only time you really need to ensure the unit is valid. There should have been an assert that the unit exists.
=== frankban is now known as frankban|afk
[16:31] natefinch, uh, no: in the apiserver layer, you are supplied with an authoriser that tells you what entity you're connecting on behalf of. if you don't check that the connected entity is allowed to make the changes it asks for, you are just broken
[16:31] fwereade: yes, also a bug
[16:31] fwereade: sorry, gotta run to pick up my daughter from preschool... back in an hour
[16:34] katco, sorry, but: (1) you don't validate, and (2) bad data gets through; ISTM that (1) => (2). do you have some alternative explanation for (2)?
[16:42] katco, or: you are disagreeing with some aspect of "don't trust your clients" but not actually advocating the approach taken in payloads, despite "trusting its clients" apparently being a design principle and a source of evident problems?
[16:47] dooferlad: still there?
[17:06] dimitern: not really
[18:12] haven't done this for a while, friday fun: here's a stab at a complete auto-generated representation of the Juju API (RPC calls only) http://paste.ubuntu.com/16739140/
[18:13] rogpeppe1: call it FacadeName so it prints out above Methods :)
[18:13] rogpeppe1: very cool
[18:14] natefinch: i should change rjson so that it preserves ordering
[18:14] rogpeppe1: even better
[18:16] rogpeppe1: also, man, that's huge.
[18:16] rogpeppe1: I guess it includes definitions of even stdlib types, though
[18:16] natefinch: yeah
[18:16] natefinch: here's the code i used to generate it: http://paste.ubuntu.com/16739827/
[18:18] natefinch: anyway, early days yet. at some point, i'll add doc comments too, and produce linked HTML output :)
[18:18] natefinch: then we might actually have something like an API doc...
[18:18] rogpeppe1: that would be amazing
[18:19] natefinch: i don't think it would be more than half a day's work
[18:19] natefinch: anyway, gotta go and frolic
[18:19] natefinch: cheerio, and thanks for being interested :)
[18:19] rogpeppe1: definitely. happy weekend!
[18:26] rogpeppe1, this is awesome
[18:26] rogpeppe1, you would be my hero!
[18:45] Bug #1584815 changed: SSHSuite.TestSSHCommand fails on windows
[18:46] Bug #1585388 changed: Container networking cannot ssh after machine is ready
[18:49] Bug #1584815 opened: SSHSuite.TestSSHCommand fails on windows
[18:49] Bug #1585388 opened: Container networking cannot ssh after machine is ready
[18:49] So did this build not fail? https://github.com/juju/juju/pull/5449
[18:55] hoenir, it looks like that did merge, there were some issues with the bot yesterday, so it is a bit unclear to me what happened there
[18:56] aha, so everything is ok then..
[19:13] Bug #1584815 changed: SSHSuite.TestSSHCommand fails on windows
[19:13] Bug #1585388 changed: Container networking cannot ssh after machine is ready
[19:16] Bug #1586512 opened: juju2 websocket api response consistency APIHostPorts versus Login response
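rogpeppe1's generator paste at [18:16] is likewise gone, but the heart of such a tool is ordinary reflection over the facade types. A self-contained toy, with an invented facade purely for illustration:

```go
package main

import (
	"fmt"
	"reflect"
)

// ClientFacade and its param types are invented stand-ins for a real
// RPC facade; the point is only the reflection walk below.
type ClientFacade struct{}

type StatusArgs struct{ Patterns []string }
type StatusResults struct{ Model string }

func (ClientFacade) FullStatus(args StatusArgs) (StatusResults, error) {
	return StatusResults{}, nil
}

func main() {
	t := reflect.TypeOf(ClientFacade{})
	fmt.Printf("FacadeName: %s\nMethods:\n", t.Name())
	for i := 0; i < t.NumMethod(); i++ {
		m := t.Method(i)
		// m.Type includes the receiver as In(0); skip it.
		var in, out []string
		for j := 1; j < m.Type.NumIn(); j++ {
			in = append(in, m.Type.In(j).String())
		}
		for j := 0; j < m.Type.NumOut(); j++ {
			out = append(out, m.Type.Out(j).String())
		}
		fmt.Printf("  %s(%v) (%v)\n", m.Name, in, out)
	}
}
```

The real generator would presumably walk juju's registered facades rather than a hand-made type, with the rjson and linked-HTML output mentioned above layered on top.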