/srv/irclogs.ubuntu.com/2016/05/27/#juju-dev.txt

axwwallyworld: https://github.com/juju/juju/pull/5462 doesn't actually have a bug, should I JFDI or leave it?01:00
wallyworldaxw: um, create a bug i think01:01
axwok01:01
mupBug #1586217 opened: azure: bootstrapping prints out scary, spurious ERROR messages <blocker> <juju-core:In Progress by axwalk> <https://launchpad.net/bugs/1586217>01:05
wallyworldaxw: when you are free, a small one, then i can propose the charmrepo one https://github.com/juju/charm/pull/21101:06
rediris there prior art for parsing cmd line flags? e.g. timestamps?01:18
wallyworldaxw: i tried a high level test but couldn't get the bson marshalling behaviour to work. i couldn't figure out the magic to get a nil result from unmarshal instead of an empty struct (or a marshalling error) so figured a test is better than nothing01:22
wallyworldi'll take another look01:26
axwwallyworld: marshal a struct with a field of type BundleData, with omitempty on the field?01:28
wallyworldyeah, that may work01:28
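
A minimal sketch of the omitempty idea suggested above. It uses the standard library's encoding/json in place of bson, purely so it runs without extra dependencies. The charmDoc and bundleData types are hypothetical, and whether the bson marshaller gives the same nil result is an assumption, not something this log confirms.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// bundleData is a hypothetical stand-in for the real BundleData type.
type bundleData struct {
	Services []string `json:"services"`
}

// charmDoc wraps the bundle in a pointer field tagged omitempty, so a nil
// bundle is left out of the encoded document entirely.
type charmDoc struct {
	URL    string      `json:"url"`
	Bundle *bundleData `json:"bundle,omitempty"`
}

// roundTrip marshals a document and unmarshals it again, returning the
// decoded value and the encoded form.
func roundTrip(in charmDoc) (charmDoc, string, error) {
	raw, err := json.Marshal(in)
	if err != nil {
		return charmDoc{}, "", err
	}
	var out charmDoc
	if err := json.Unmarshal(raw, &out); err != nil {
		return charmDoc{}, "", err
	}
	return out, string(raw), nil
}

func main() {
	out, raw, err := roundTrip(charmDoc{URL: "cs:foo"})
	if err != nil {
		panic(err)
	}
	fmt.Println(raw)               // the bundle key is absent
	fmt.Println(out.Bundle == nil) // decoding leaves the pointer nil
}
```

Because the key never appears in the encoded form, decoding leaves the field as nil rather than an empty struct.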
mupBug #1581748 changed: [2.0b7] add-credential no longer supports maas as a cloud type <cdo-qa> <credentials> <maas> <regression> <juju-core:Triaged> <https://launchpad.net/bugs/1581748>01:36
axwdavechen1y: please see my reply on http://reviews.vapour.ws/r/4911/01:38
* davechen1y looks01:38
axwdavechen1y: the race was between multiple writers, not writers&readers01:42
axwthe lock was added to stop the multiple writers from stepping on each other01:42
davechen1yaxw: that doesn't matter, all readers and writers have to agree on the same mutex, so there is a happens before relationship01:42
davechen1yit's the same as if you use atomic.StoreUint32 you must always use atomic.LoadUint3201:43
davechen1yonce one path is covered by a lock, all the other paths have to be covered by the same lock01:43
axwdavechen1y: https://golang.org/ref/mem -- "Channel communication" disagrees with you01:43
davechen1yyou're not using a channel01:44
davechen1yyou're using a mutex01:44
davechen1ythis is what I meant by "something else must be enforcing the happens before"01:44
davechen1yie, all the goroutines are going through another lock (or channel send/receive) and that's what's enforcing the memory barrier01:45
axwdavechen1y: and like I said, we always write to the requests from the same goroutine that made the API call. there's two possibilities: they're in the same goroutine as the test, so natural happens-before. or they're made in a goroutine that we wait for with a wait group01:45
davechen1yif the former was happening, then there wouldn't be a race01:46
davechen1yso it's probably not always happening01:46
axwdavechen1y: in the particular case of the race that was reported, they were goroutines spawned and then waited for with a wait group. so the readers would wait for writers, but the writers were racing with each other01:47
axwdavechen1y: the writers racing with each other is fixed by the mutex. the readers and writers were never racing.01:48
davechen1ymaybe you should get someone else to review this01:49
davechen1yi'm probably wrong01:49
axwdavechen1y: you're one of the few people who would know about memory orderings, but I'll get a second opinion :)01:50
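
A minimal sketch of the arrangement axw describes: several writer goroutines guarded by a mutex, and a reader that only looks at the data after waiting on a wait group. The recorder type and its names are hypothetical and do not come from the code under review. It relies on the documented guarantee that a call to Done synchronizes before the return of the Wait call it unblocks.

```go
package main

import (
	"fmt"
	"sync"
)

// recorder is a hypothetical stand-in for a test double that collects
// requests written by several goroutines.
type recorder struct {
	mu       sync.Mutex
	requests []string
}

// record is called by the writers. The mutex stops concurrent appends
// from stepping on each other.
func (r *recorder) record(req string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.requests = append(r.requests, req)
}

// recordAll starts one writer goroutine per request and waits for all of
// them. Every Done happens before Wait returns, so reading r.requests
// afterwards does not race with the writers.
func recordAll(r *recorder, reqs []string) int {
	var wg sync.WaitGroup
	for _, req := range reqs {
		wg.Add(1)
		go func(req string) {
			defer wg.Done()
			r.record(req)
		}(req)
	}
	wg.Wait()
	return len(r.requests)
}

func main() {
	r := &recorder{}
	fmt.Println(recordAll(r, []string{"a", "b", "c"}))
}
```

If a reader could run while writers were still active, it would need to take the same mutex, which is the situation davechen1y is warning about.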
rediris github.com/juju/juju/audit used anywhere?02:12
natefinchredir: I think katco just started working on that, and so it may well not be used anywhere02:27
redirnatefinch: code from 2014, mostly looks like davechen1y's work02:28
redirunrelated to the other audit stuff02:28
natefinchoh huh02:29
natefinchno clue02:29
redirnothing imports it02:29
natefinchredir: then it probably should be deleted.02:31
redirI did, but then realize I should ask first, so I undeleted it.02:33
natefinchredir: kill it with fire, and if anyone complains, that's what VCS is for :)02:33
* redir gets out his big pink eraser02:35
redirgone02:38
redirand so am I02:38
=== redir is now known as redir_afk
menn0fix for one of the criticals: http://reviews.vapour.ws/r/4916/02:58
menn0wallyworld or axw: ^?02:58
axwmenn0: looking02:59
menn0axw: thanks03:00
mupBug #1577988 changed: Revert destroy service when machine is off <juju-core:Invalid> <https://launchpad.net/bugs/1577988>03:00
mupBug #1583109 changed: error: private-address/public-address not set (1.25.5) <sts> <juju-core:New> <https://launchpad.net/bugs/1583109>03:00
mupBug #1586007 changed: cannot ssh to LXD container on non MAAS systems using juju ssh --proxy 0/lxd/0 <lxc> <lxd> <ssh> <juju-core:In Progress by menno.smits> <https://launchpad.net/bugs/1586007>03:00
axwmenn0: looks good, just doing QA03:01
axwtho pretty trivial, probably doesn't need it03:01
menn0axw: you'd just be doing exactly what I just did03:02
menn0axw: I just added a step by the way (you need to set a feature flag)03:02
axwmenn0: yeah ok, I'll skip. LGTM03:03
menn0axw: ta03:03
davechen1yso, who wants to know what the second top cpu consumer in the agent tests is?03:24
davechen1yyou'll never guess03:24
davechen1y(the first is the gc, but that isn't a surprise)03:25
davechen1y+   6.34%  agent.test  agent.test  [.] runtime.scanobject03:25
davechen1y-   4.11%  agent.test  agent.test  [.] math/big.addMulVVW03:25
davechen1y   - math/big.addMulVVW03:25
davechen1y      - 97.78% math/big.nat.expNNMontgomery03:25
davechen1y         - 99.91% math/big.nat.expNN03:25
davechen1y            - 86.43% math/big.nat.probablyPrime03:25
davechen1y               - 99.98% math/big.(*Int).ProbablyPrime03:25
davechen1y                  - 99.62% crypto/rand.Prime03:25
davechen1y                     - crypto/rsa.GenerateMultiPrimeKey03:25
davechen1y                        - 96.14% crypto/rsa.GenerateKey03:25
davechen1y                           - 99.77% github.com/juju/juju/cert.newLeaf03:25
davechen1y                              - 88.83% github.com/juju/juju/cert.NewDefaultServer03:26
davechen1y                                 - 53.01% github.com/juju/juju/environs/config.(*Config).GenerateControllerCertAndKey03:26
davechen1y                                    + github.com/juju/juju/worker/certupdater.(*CertificateUpdater).updateCertificate03:26
davechen1y                                 + 46.99% github.com/juju/juju/cmd/jujud/agent.(*MachineAgent).upgradeCertificateDNSNames03:26
davechen1y                              + 11.00% runtime.stackBarrier03:26
davechen1ythat's right, TLS negotiation :)03:26
davechen1yoh, no03:26
davechen1yit's generate key03:26
davechen1yit's generating all those faux tls certificates03:26
davechen1y  // XXX Do bound checking against totalLen.03:31
davechen1y^ shit i read03:31
natefinchwell that's a waste03:36
davechen1y% go test -v -race -timeout=9001s03:42
davechen1ythis timeout is over 9000!03:42
davechen1yhttps://bugs.launchpad.net/juju-core/+bug/158624403:52
mupBug #1586244: state: DATA RACE in watcher <2.0-count> <blocker> <race-condition> <juju-core:New> <https://launchpad.net/bugs/1586244>03:52
davechen1y^ cherylj this one just crept in, the state tests were passing less than 48 hours ago when I re-enabled them03:52
natefinchwallyworld: you said in the other channel that there aren't any checks in the buildtxn loop that match the assert... is that something we require?  I've never done that.  I assumed the asserts were there specifically to ensure the state we expect.  Doing a check before just seems like it's adding a race condition.04:01
wallyworldnatefinch: if there's a build txn loop with logic to compose txn slices and retry, unless the logic matches things will blow up04:02
wallyworldit's not a race because the txn asserts will reject changes where someone else got in first - that's why tests need to use the txn hooks to check that stuff04:02
wallyworldbut people don't always write the necessary tests04:03
natefinchwallyworld: I guess I don't understand why it'll blow up.  I mean, yes, the transaction will fail... but how is that different than if the initial checks fail?04:04
wallyworldbecause the loop that generates the txn slice will never generate a slice that will succeed04:05
wallyworldand so it will give up with failing asserts04:05
natefinchwallyworld: oh, ok, I see.  so you're saying the logic can be wrong and so the asserts could be effectively contradictory04:05
wallyworldyes04:06
wallyworldit may not be the case here - i haven't groked the code04:06
wallyworldbut it's a likely cause04:06
wallyworldthere's a loop, txns, and that error04:06
wallyworldthe asserts may not match the code which created them04:07
natefinchso, I checked on nil matching nonexistent fields, and it generally does (if you have a sparse collection, it won't, but we don't use those).  All the asserts for this particular transaction are against different fields entirely, so they can't really be contradictory, as far as I can tell. The only thing that seems suspicious is that the assert I pointed you to assumes that the address being set either was never set before, or is not being04:10
natefinchchanged.  So I assume this is a case where the address actually is being changed.   But I'm not sure why we're asserting that setAddress isn't changing the address.04:10
mupBug #1586244 opened: state: DATA RACE in watcher <2.0-count> <blocker> <race-condition> <juju-core:New> <https://launchpad.net/bugs/1586244>04:12
natefinchwallyworld: and actually... this code explicitly only calls setaddress if the address *has* changed: https://github.com/juju/juju/blob/master/state/machine.go#L126104:16
wallyworldnatefinch: it's not the asserts that are contradictory, it's the asserts and the logic04:18
wallyworldie it might be the asserts and the logic, that's what causes the changing too quickly error04:18
natefinchwallyworld: that's my point... we're adding an assert that says "assert that the address hasn't changed" but we're only adding that assert if the address *has* changed.  Granted, if it changes from nothing to something, then the assert will pass.  but all the rest of the code seems to be assuming that it's ok to change from something to something else... except the pesky assert04:19
wallyworldperhaps, i haven't deeply read the code. i think martin may have done this logic. the issue if i recall was that we were setting the address all the time even if it hadn't changed and that was causing watchers to fire etc etc04:20
wallyworldi think this code is an attempt to only hit the db if we really need to04:20
wallyworldit would be useful to know what the inputs are, write a failing test, and fix the test04:21
natefinchwallyworld: yeah, I'll see if I can do that.04:24
wallyworldnatefinch: also ask martin (if git says so) to confirm the intent of the logic04:25
davechen1yaxw: got a sec ?04:30
davechen1yhttps://bugs.launchpad.net/juju-core/+bug/1586244/comments/104:30
mupBug #1586244: state: DATA RACE in watcher <2.0-count> <race-condition> <juju-core:New for dave-cheney> <https://launchpad.net/bugs/1586244>04:30
davechen1yi'm not sure which way to go04:30
davechen1yi could try to make multiwatcherstore.Get return a copy, but that might be tricky04:30
davechen1ythe alternative might be to make the test not screw with the data being returned, or try to do the copy then04:31
axwdavechen1y: looking04:31
natefinchwallyworld: looks like it was michael foord.  I'll ping him in the morning... but it seems like this logic is just backwards.  The whole point of the PR is "unit address sometimes changes" and this assert is asserting that we're only setting the value if it *hasn't* changed.04:33
natefinchwallyworld: https://github.com/juju/juju/pull/3215/files#diff-287434ac7f7449f3a0c18912a8650f6aR110804:33
wallyworldnatefinch: that does seem to be the case doesn't it04:34
axwdavechen1y: it would be nice for the info thingies to be copied on the way out of allwatcher, but I think that's too much for right now. if it's possible without too much trouble, it'd be good to mock out time so we have predictable timestamps to compare against. otherwise, copy the info objects in the test04:38
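
A minimal sketch of the "return a copy from Get" option discussed above. The machineInfo and infoStore types are hypothetical, simplified stand-ins and not the real multiwatcher store. The point is only that a caller mutating the returned value cannot touch what the store holds.

```go
package main

import "fmt"

// machineInfo is a hypothetical, simplified info value held by the store.
type machineInfo struct {
	ID        string
	Addresses []string
}

// copyInfo returns a deep copy, so the slice backing array is not shared
// with the original.
func copyInfo(in *machineInfo) *machineInfo {
	if in == nil {
		return nil
	}
	out := *in
	out.Addresses = append([]string(nil), in.Addresses...)
	return &out
}

// infoStore is a hypothetical store keyed by entity id.
type infoStore struct {
	entities map[string]*machineInfo
}

// Get hands out a copy rather than the stored pointer. A missing id
// yields nil.
func (s *infoStore) Get(id string) *machineInfo {
	return copyInfo(s.entities[id])
}

func main() {
	s := &infoStore{entities: map[string]*machineInfo{
		"0": {ID: "0", Addresses: []string{"10.0.0.1"}},
	}}
	got := s.Get("0")
	got.Addresses[0] = "changed by the test"
	fmt.Println(s.entities["0"].Addresses[0]) // the stored value is untouched
}
```

The same copyInfo helper could instead be called from the test, which is the cheaper alternative axw suggests.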
natefinchwallyworld: oh, I think I get it.  I'm reading it wrong.  It's asserting that the previous address that we think we're going to change it *from* hasn't changed04:59
wallyworldthat makes sense05:00
natefinchwallyworld: so if we think we're changing it from A->B and we run the transaction, and it's actually C, we bail05:01
wallyworldsounds right05:01
natefinchand when we retry the transaction, we get a new copy of the machine from state, which should now have C05:02
wallyworldand if that changes we try again. maybe there's something else setting the value05:02
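
A minimal sketch of the retry behaviour just described, using a hypothetical in-memory store instead of mgo/txn. Each attempt re-reads the current address, asserts that it is still the value read, and gives up after a fixed number of aborted attempts. The beforeApply hook plays the role of the txn test hooks mentioned earlier. All names and the error text are illustrative.

```go
package main

import (
	"errors"
	"fmt"
)

// store is a hypothetical in-memory stand-in for the machine document.
type store struct{ addr string }

var errAborted = errors.New("transaction aborted")

// applyIfUnchanged mimics an op with an assert: the write only succeeds
// if the stored address still equals the value the caller read earlier.
func (s *store) applyIfUnchanged(assertOld, newAddr string) error {
	if s.addr != assertOld {
		return errAborted
	}
	s.addr = newAddr
	return nil
}

// setAddress rebuilds the operation from a fresh read on every attempt.
// beforeApply is a test hook that runs between the read and the apply.
func setAddress(s *store, newAddr string, attempts int, beforeApply func(attempt int)) error {
	for attempt := 0; attempt < attempts; attempt++ {
		old := s.addr // fresh copy of the current state
		if old == newAddr {
			return nil // nothing to change, so do not touch the store
		}
		if beforeApply != nil {
			beforeApply(attempt)
		}
		err := s.applyIfUnchanged(old, newAddr)
		if err == nil {
			return nil
		}
		if !errors.Is(err, errAborted) {
			return err
		}
	}
	return errors.New("state changing too quickly")
}

func main() {
	s := &store{addr: "A"}
	// Something else sets the address to C during the first attempt only.
	err := setAddress(s, "B", 3, func(attempt int) {
		if attempt == 0 {
			s.addr = "C"
		}
	})
	fmt.Println(s.addr, err)
}
```

If the address keeps changing between the read and the apply on every attempt, the loop runs out of attempts and returns the error, which matches the symptom being investigated.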
natefinchwallyworld: I'll talk to michael in the morning, maybe he'll have some insight.  I gotta get to bed, it's late.05:10
wallyworldnp, ty05:10
wallyworldnatefinch: recording input will be good05:11
wallyworldso we can see what's coming in to trigger it05:11
natefinchwallyworld: yep... I wish we had a simpler repro case, but I can make a binary with increased logging and see if we can trigger it and look at the values and/or stack traces.  Wonder if it's competing workers somehow.05:12
wallyworldyeah, log what's in db, what we receive etc05:13
wallyworldif they can repro then great05:13
natefinchyeah, landscape seems to be able to repro by running a big deploy script.05:14
natefinchok, I'm out.05:15
davechen1yaxw: thanks for the advice05:59
davechen1yi'll see what I can do05:59
davechen1ythe multiwatcher is such a garbage fire06:01
anastasiamac_cmars: ping07:12
anastasiamac_rogpeppe1: ping07:15
rogpeppe1anastasiamac_: hiya07:15
frobwaredimitern: ping07:22
dimiternfrobware: pong07:22
frobwaredimitern: I played with your patch a bit - no IPv6 addresses turned up. Also see on #juju - stokachu tried it too07:23
dimiternfrobware: yeah, but my fix was apparently too restrictive07:23
frobwaredimitern: "frobware: so I did a new deploy with the default ipv6 enabled, and juju pulled the ipv4 like it was supposed to"07:24
frobwaredimitern: how so?07:24
dimiternfrobware: I still need to allow IPv6-only tests and setups to work07:24
dimiternfrobware: that's why I initially wanted to store preferred addrs per type07:24
dimiternfrobware: e.g. there's single test in cmd/juju/commands/scp_unix_test.go that fails for a good reason "test 11: scp works with IPv6 addresses"07:25
dimiternfrobware: it feels wrong to pretend there's a single private/public address of any machine anyway07:30
dimiternfrobware: rather than fixing the charms like rabbitmq-server..07:31
=== frankban|afk is now known as frankban
mupBug #1482226 changed: juju status with 'prefer-ipv6' shows address, not DNS name. <amd64> <apport-bug> <ipv6> <network> <status> <trusty> <uec-images> <juju-core:Won't Fix> <https://launchpad.net/bugs/1482226>07:49
Alex_____kwmonroe cory_fu kjackal admcleod stub, guys could anybody please point me to the docs on using spot instances with juju on AWS?07:54
dimiternfrobware: I think I got it07:55
kjackalHi Alex_____ , I guess I am the only one here at this time07:55
dimiternfrobware: so if we still return the preferred public|private by default when asked, but when they're not set we try to select a new one by scope from all the available addrs seems to work ok07:56
frobwaredimitern: why would by scope return IPv4 in favour of IPv6?07:56
Alex_____kjackal: great! Could you help me pointing to the way of using spot instances plz?07:57
kjackal<Alex_____ I am not using AWS but let me search on this. It seems to be a question for Juju in general07:57
dimitern(ssh/scp should really try to use all addresses, rather rely on a single public or private)07:57
dimiternfrobware: it won't07:57
kjackalAlex_____: Let me do a quick search07:57
dimiternfrobware: but if you have any IPv4 addrs already, one will be preferred anyway07:57
Alex_____kjackal: thanks, I appreciate! can move to #juju07:58
frobwaredimitern: is this "desired" for a mixed IPv4/6 setup?07:58
dimiternfrobware: and that extra lookup will allow ipv6-only setups to still work (and woe to whoever tries to deploy rabbitmq-server there :D)07:58
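
A minimal sketch of the selection rule dimitern outlines: return the stored preferred address when there is one, otherwise pick by scope from all available addresses, favouring IPv4 but still falling back to IPv6 so that IPv6-only setups work. The address type, the scope labels and selectAddress are hypothetical and are not the juju implementation.

```go
package main

import (
	"fmt"
	"net"
)

// address is a hypothetical, simplified machine address.
type address struct {
	Value string
	Scope string // e.g. "public" or "local-cloud" (illustrative labels)
}

// isIPv4 reports whether value parses as an IPv4 address.
func isIPv4(value string) bool {
	ip := net.ParseIP(value)
	return ip != nil && ip.To4() != nil
}

// selectAddress returns the preferred address if it is set. Otherwise it
// scans all addresses with the requested scope, returning the first IPv4
// match and falling back to the first non-IPv4 match.
func selectAddress(preferred string, all []address, scope string) (string, bool) {
	if preferred != "" {
		return preferred, true
	}
	fallback := ""
	for _, a := range all {
		if a.Scope != scope {
			continue
		}
		if isIPv4(a.Value) {
			return a.Value, true
		}
		if fallback == "" {
			fallback = a.Value
		}
	}
	return fallback, fallback != ""
}

func main() {
	all := []address{
		{Value: "fd00::1", Scope: "local-cloud"},
		{Value: "10.0.0.5", Scope: "local-cloud"},
	}
	addr, ok := selectAddress("", all, "local-cloud")
	fmt.Println(addr, ok)
}
```

With only the IPv6 entry present, the same call returns that address, which is the case the extra lookup is meant to keep working.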
dimiternfrobware: our "desired" state should be "don't discriminate addresses by type/scope/etc. just try all in parallel"07:59
dimiternwell for legacy / badly written charms we need to pretend still there is 1 private and 1 public addr only..08:01
dimiternbut well written charms can ask for all addrs (opt. filtering them by type etc.) via network-get08:01
* dimitern dreams about the day when 'unit-get private-address' will be a 'charm proof' *ERROR*08:02
dimiternjamespage, gnuoy: do you know why rabbitmq-server charm does not work well with IPv6 addresses?08:06
jamespagedimitern, it should do08:07
dimiternjamespage, gnuoy: it seems the upstream rabbit supports it fine.. or is it due to magic hacluster VIP behavior?08:07
jamespagedimitern, you don't need to use the hacluster charm with rabbitmq08:08
jamespagein fact I'm going to take out that support08:08
dimiternjamespage: well, I've been struggling to fix juju to let rabbit work - see bug 157484408:08
mupBug #1574844: juju2 gives ipv6 address for one lxd, rabbit doesn't appreciate it. <conjure> <juju-release-support> <landscape> <lxd-provider> <juju-core:In Progress by dimitern> <rabbitmq-server (Juju Charms Collection):New> <https://launchpad.net/bugs/1574844>08:08
jamespagedimitern, ok you might want to try my inflight fixes for rmq08:09
dimiternjamespage: simply hiding ipv6 addresses from charms (unit-get private|public-address) sounds like the wrong kind of fix08:09
dimiternjamespage: sure, I'd like to try08:10
jamespagedimitern, try with cs:~james-page/xenial/rabbitmq-server-bug158490208:11
jamespageI've simplified how units cluster together a lot and removed the requirement for DNS forward/reverse lookup altogether08:11
dimiternjamespage: ta! trying now08:12
dimiternjamespage: awesome! so this might actually solve both that bug ^^ and the one about rmq failing to work when the node's hostname is not resolvable..08:12
jamespagedimitern, yes the fix is horrible but it should work08:12
jamespagedimitern, the charm already writes <ip> <hostname> into /etc/hosts for all peers in the cluster08:13
jamespageso even if that's an IPv6 one it should just dtrt08:13
dimiternjamespage: sweet! tyvm08:14
dimiternthat should let my os-lxd on maas 2.0 setup with the nucs to work without manual steps / tweaks!08:14
jamespagedimitern, if you could feedback on your testing on the associated bug that would be great08:15
jamespagedimitern, I want to backport that to the stable charm, so the more testing the better...08:15
voidspacebabbageclunk: morning08:16
dimiternjamespage: will do08:16
babbageclunkvoidspace: o/08:16
dimiternjamespage: so just deploying rmq with your fix, then adding 1 or 2 units should cover the simplest test?08:26
dimitern(also looking for errors, etc. ofc)08:27
jamespageyah08:27
dimiternok08:28
axwwallyworld: I'm thinking the cloud/credential stuff in mongo would probably fit well with other controller config, like ca-cert. so maybe "two birds"08:30
dimiternjamespage: your fix works like a charm! :) - successfully tested deploying 3 units on maas 2 and 8 units on lxd (one of them with IPv6 address); eventually all reported "Unit is ready and clustered" and no hook errors in sight!08:39
jamespagedimitern, love it when ripping out code results in better solutions...08:39
dimiternjamespage: tell me about it.. :D08:40
mupBug #1586298 opened: Cannot run upgrade-juju as upload-tools refers to "admin" model post the s/admin/controller/ change <juju-core:New> <https://launchpad.net/bugs/1586298>08:40
dimiternjamespage: ah, actually the lxd machine with IPv6 address had a hook error hook failed: "cluster-relation-joined"08:44
dimiternjamespage: I'll add logs to the bug comment08:44
dimiternjamespage: bug 1584902 updated; it looks like the hook error was due getting NXDOMAIN08:50
mupBug #1584902: Setting RabbitMQ NODENAME to non-FQDN breaks on MaaS 2.0 <canonical-bootstack> <cpec> <juju2> <maas2> <sts> <rabbitmq-server (Juju Charms Collection):Fix Committed by james-page> <https://launchpad.net/bugs/1584902>08:50
dimiternjamespage: do you think I can provide anything else useful before I tear it down?08:50
jamespagedimitern, content of /etc/hosts on all units please08:51
dimiternjamespage: sure, that's all?08:51
jamespagedimitern, OK i think I see the problem08:55
jamespageI missed a get_host_ip call08:55
dimiternjamespage: yeah - there's something fishy about that machine-7 - /e/hosts does not look like on the other machines: http://paste.ubuntu.com/16727753/08:55
dimiternadded to the bug08:56
jamespagedimitern, ta08:56
jamespagedimitern, can you output unit-get private-address on all units as well pls08:57
dimiternjamespage: it's already part of the status output paste08:57
voidspacedimitern: ping08:57
jamespageoh yes of course..08:58
jamespagesorry08:58
dimiternnp08:58
dimiternvoidspace: pong08:58
voidspacedimitern: core/description/interfaces.go:IPAddress has a ConfigMethod - is it ok for that return a state.AddressConfigMethod do you think08:58
voidspacedimitern: or should it return a string08:58
voidspacedimitern: nothing else in core/description/interfaces.go returns state types08:59
dimiternvoidspace: yeah, we need to apply the same pattern as for api params and state docs08:59
dimiternvoidspace: i.e. duplicate the type in core/desc/ (however unfortunate)08:59
dimiternvoidspace: there's actually a test somewhere there that verifies description (or core?) does not import extra pkgs09:00
voidspacedimitern: right09:00
voidspacedimitern: I can see elsewhere places using string09:00
voidspacedimitern: e.g. Address.Scope (which is a legacy IPAddress)09:01
dimiternvoidspace: we should not migrate the legacyipaddressesC stuff09:02
voidspacedimitern: actually probably a network address09:03
dimiternfrobware, dooferlad, fwereade, jam: standup?09:04
dooferladdimitern: I am out of the office today getting a new passport.09:05
dimiterndooferlad: what are you doing here then? :D09:05
dooferladdimitern: irccloud + laptop <--> phone. Welcome to the future :-)09:05
frobwaredimitern: OoO too09:08
dimiternfrobware: yeah, I forgot, sorry09:11
jamespagedimitern, ok so I see the issue09:13
jamespagethe is_ip function in charm-helpers fails to support IPv6 addresses09:13
jamespagefixing that now09:13
dimiternjamespage: awesome! \o/09:14
voidspacedimitern: why does IPAddress have DNSServers/DNSSearchDomains/GatewayAddress?09:25
voidspacedimitern: surely they belong to Subnet09:25
dimiternvoidspace: they can apply per device as well09:26
voidspacedimitern: ok, but not *per address*, surely?09:27
dimiternvoidspace: well, dooferlad insisted to keep layers (2 and 3) separate, which is good09:27
jamespagedimitern, https://code.launchpad.net/~james-page/charm-helpers/is-ip-ipv6/+merge/29592009:27
dimiternvoidspace: so they do apply per address09:27
voidspacedimitern: hmm09:27
dimiternvoidspace: as they mirror what you can do on NICs09:27
dimiternjamespage: cheers, looking09:28
mupBug #1574844 changed: juju2 gives ipv6 address for one lxd, rabbit doesn't appreciate it. <conjure> <juju-release-support> <landscape> <lxd-provider> <juju-core:Won't Fix> <rabbitmq-server (Juju Charms Collection):In Progress by james-page> <https://launchpad.net/bugs/1574844>09:28
dimiternjamespage: nice and short fix :) could I test it with rmq?09:29
jamespagedimitern, cs:~james-page/xenial/rabbitmq-server-bug157484409:31
dimiternjamespage: ta! testing now on lxd09:31
dooferladdimitern: I agree that we should keep physical separate from IP, separate from higher layer protocols. We should have collections per layer of the TCP/IP stack. That would give us NetPhysical and NetIP collections.09:31
dooferladdimitern I agree with voidspace that IPAddress is a confusing name for something with DNS and routing information in it09:32
dimiterndooferlad: well I suggested LinkLayerDeviceAttachment..09:33
dimitern:)09:33
dimiternor assignment09:33
dooferladdimitern: too long :-)09:33
dimiterndooferlad: indeed09:33
* dimitern wishes we had a good way to namespace stuff in state 09:34
dimiternthen we could've used e.g. statenetwork.Device, statenetwork.Attachment, ...09:35
dimiternfwereade: how about that? ^^09:36
fwereadedimitern, perhaps: +1 on namespacing, indeed09:38
dimiternfwereade: cheers, I'll think about it09:38
fwereadedimitern, but, well, I feel like namespacing of the implementation details is a more pressing issue?09:40
mupBug #1574844 opened: juju2 gives ipv6 address for one lxd, rabbit doesn't appreciate it. <conjure> <juju-release-support> <landscape> <lxd-provider> <juju-core:Won't Fix> <rabbitmq-server (Juju Charms Collection):In Progress by james-page> <https://launchpad.net/bugs/1574844>09:40
dimiternfwereade: example?09:42
dimiternjamespage: it works! \o/ http://paste.ubuntu.com/16728135/09:43
voidspacedooferlad: :-)09:44
voidspacecool09:44
jamespagedimitern, can I see the /etc/hosts again pls?09:45
dimiternjamespage: all of them look consistent now: http://paste.ubuntu.com/16728155/09:46
jamespagedimitern, good-oh09:47
jamespagedimitern, change is up for review09:47
fwereadedimitern, well, let me reorient myself: is your problem that state exports too many types?09:48
dimiternjamespage: great, thanks for the quick fix!09:48
jamespagedimitern, do we understand why one unit gets an IPv6 address?09:48
jamespagethat does not seem particularly deterministic to me09:49
fwereadedimitern, my perspective is that the private state namespace is utterly stuffed with bits that we plausibly *could* carve into a few /internal/ packages09:49
fwereadedimitern, but that it will be much harder to separate useful exported types because import cycles09:50
dimiternjamespage: yes, it's not - just so happens LXD provider reports the IPv6 first (and that's the only one at that moment), but then it's taken as preferred private address and sticks09:50
dimiternfwereade: well, not quite - the problem I have is naming09:50
dimiternfwereade: and SoR principle09:51
fwereadedimitern, naming objecty things with methods, right?09:51
fwereadedimitern, (rather than in/out data types?)09:51
dimiternfwereade: yeah, but some names are much too overloaded to make sense in a global namespace; so we need to use longer, more descriptive, eventually awkward at times..09:51
mupBug #1574844 changed: juju2 gives ipv6 address for one lxd, rabbit doesn't appreciate it. <conjure> <juju-release-support> <landscape> <lxd-provider> <juju-core:Won't Fix> <rabbitmq-server (Juju Charms Collection):In Progress by james-page> <https://launchpad.net/bugs/1574844>09:52
dimiternfwereade: e.g. a machine can have a bunch of things in different 'namespaces', which are separate concepts and can use the same term (disk device vs network device)09:53
babbageclunkvoidspace, dimitern - If I want to try juju with mgo.v2-unstable to see if it fixes the txn problem I'm seeing with 3.2, is there any way to do it without changing all of the 286 files that import gopkg.in/mgo.v2 to have -unstable?10:52
dimiternbabbageclunk: there is - govers10:52
babbageclunkOoh10:53
dimiternbabbageclunk: go get -u -v github.com/rogpeppe/govers/...10:54
babbageclunkdimitern: thanks. Ok, but I still need to rewrite all those files.10:54
dimiternbabbageclunk: don't you need it only for juju/juju/ though?10:56
babbageclunkbabbageclunk: Maybe? I guess - it depends if juju/juju calls anything in a library that then uses mgo.10:57
babbageclunkdimitern: Duh, replied to myself. ^10:57
dimiternbabbageclunk: well, you can try godeps -n -t github.com/juju/juju/...10:59
babbageclunkdimitern: ?10:59
dimiternbabbageclunk: and see if you end up with more than one mgo import after running govers to update imports?10:59
babbageclunkdimitern: Oh, I see - ok, I'll commit my current stuff and try that. Thanks!11:01
babbageclunkdimitern: It turns out govers does that check itself, yay! Unfortunately: http://pastebin.ubuntu.com/16729162/11:20
dimiternbabbageclunk: hmm.. well, try with e.g. gopkg.in/mgo.v2-unstable/bson ?11:22
babbageclunkdimitern: I just did it at the top level, seems to have worked! Building now.11:22
dimiternbabbageclunk: \o/11:23
hoenircan anyone look in this pr? http://reviews.vapour.ws/r/4924/11:26
babbageclunkhoenir: I already did! Good stuff, thanks for revisiting it.11:28
hoenirbabbageclunk, thanks again !11:29
babbageclunkdimitern: Ah well, it was worth checking. Same problem still.11:30
babbageclunkdimitern: Now to revert all those changes!11:30
dimiternbabbageclunk: so you managed to build with the unstable mgo?11:30
dimiternbabbageclunk: but it didn't help?11:31
babbageclunkdimitern: yup11:31
babbageclunkdimitern: nope11:31
dimiternbabbageclunk: I see - well, as you said it was worth at least checking :)11:47
babbageclunkdimitern: :) Yeah, better that than report a bug that's already fixed in the unstable branch.11:48
dimiternbabbageclunk: +1, indeed11:48
* dimitern finally managed to add & commission a kvm node to the hw maas 2.0.. after breaking *everything* for a short while11:49
hoenirso babbageclunk no +1 and $$merge$$ or you forgot ?11:51
babbageclunkhoenir: Oh, I'm not a fully-fledged reviewer yet (new myself) - you should get someone else to look at it as well. Sorry!11:54
dimiternhoenir: I'll have a look11:54
hoenirdimitern, thanks11:55
dimiternhoenir: how could I test this?11:56
dimiternhoenir: e.g. trying to add a windows machine to a controller should trigger it I guess?11:56
hoenirdimitern, you should bootstrap + deploy a windows machine and it should be fine, also when deploying the windows machine try sudo su in the machine where you have the maas installed11:58
hoenirand execute these commands11:59
hoenirmaas-region-admin shell11:59
hoenirfrom metadataserver.models.nodeuserdata import NodeUserData11:59
hoenirnodeobj = NodeUserData.objects.get(node__hostname="<name_of_the_windows_machine_in_maas>")11:59
hoenirnodeobj.data11:59
dimiternhoenir: awesome, will try that then12:00
hoenirand you should check that nodeobj.data doesn't contain "\n\n" at the beginning..12:00
hoenirand also, once the windows machine is deployed, you could log into it and check the cloudbase-init logs. if the tail of the logs shows it installed jujud and all the other stuff, it should be fine12:00
dimiternhoenir: remind me how was I supposed to import a windows image into maas 2?12:01
hoenirif you see "[WARNING] unsupported format, blablabla", then the string held in nodeobj.data is bad12:01
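[Editor's sketch] The check hoenir describes, that the stored user data must not begin with "\n\n", can be written as a small helper to paste into the same MAAS shell session. This is a minimal sketch: the function name is hypothetical and not part of MAAS, and it only assumes that nodeobj.data is a str or bytes value.

```python
def has_leading_blank_lines(user_data):
    """Return True if the user data starts with two newlines.

    Hypothetical helper, not a MAAS API. Accepts str or bytes, since
    the stored value may be either depending on how it is read back.
    """
    if isinstance(user_data, bytes):
        user_data = user_data.decode("utf-8", errors="replace")
    return user_data.startswith("\n\n")
```

In the shell session above it would be called as has_leading_blank_lines(nodeobj.data); a True result corresponds to the bad case that leads to the "unsupported format" warning.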
hoenirI have one maas windows image win2kr12r2 from cloudbase , i think in order to make one yourself you must exec some custom python code. There is a script in the cloudbase git repo that will do so, but from what I know it will take 6-7 hours to complete it.12:03
hoenirhttps://github.com/cloudbase/windows-openstack-imaging-tools/tree/experimental12:04
hoenirpowershell* , excuse me for saying it's python.12:05
hoenirdimitern, I tested myself before submitting the patch to upstream so I think this will not be a problem, but feel free to test it yourself if you want to.12:09
dimiternhoenir: it's not that I don't trust the patch, but so close to the release it was decided to verify all fixes more carefully12:15
mupBug #945862 opened: Support for AWS "spot" instances <adoption> <juju-core:New> <https://launchpad.net/bugs/945862>12:20
hoenirI was disconnected, so dimitern , you said something more?12:32
dimitern<dimitern> hoenir: it's not that I don't trust the patch, but so close to the12:33
dimitern           release it was decided to verify all fixes more carefully  [15:15]12:33
natefinchvoidspace: you around?13:39
voidspacenatefinch: yep13:43
natefinchvoidspace: I'm looking at this: https://bugs.launchpad.net/juju-core/+bug/153758513:45
mupBug #1537585: machine agent failed to register IP addresses, borks agent <2.0-count> <blocker> <landscape> <network> <juju-core:Triaged by natefinch> <juju-core 1.25:Triaged by natefinch> <https://launchpad.net/bugs/1537585>13:45
natefinchvoidspace: which I think is failing because this assert is failing: https://github.com/juju/juju/blob/master/state/machine.go#L123813:46
natefinchvoidspace: which git seems to say was something you worked on like 9 months ago :)13:47
natefinchvoidspace: so I'm sure it's still fresh in your mind13:47
voidspacenatefinch: :-)13:47
voidspacedimitern: ^^^^13:47
voidspacenatefinch: preferred address logic has changed recently13:48
natefinchvoidspace: my first question is - if we're just changing the address anyway, why do we care if the old address is the same as the one we expected it to be?13:48
voidspacenatefinch: the point of the assert is that we shouldn't change the preferred address once set13:48
voidspacenatefinch: we're not changing it - we should either be setting it for the first time (nil) or just setting it to the same13:49
voidspacenatefinch: that was the point of the assert13:49
voidspacenatefinch: a better way would be to just assert it's nil13:49
natefinchvoidspace: but the function one step up the stack has already checked if we're changing it: https://github.com/juju/juju/blob/master/state/machine.go#L126213:49
voidspacenatefinch: race condition - two concurrent operations changing it at the same time13:50
voidspace(set on first access - very likely to get concurrent ops here)13:50
voidspacethat's why we have the assert as well as the check13:50
natefinchvoidspace: right, so last one in wins... that's what is going to happen anyway, since we're just going to retry the transaction until it works13:50
natefinchvoidspace: or in this case, it doesn't ever work and the machine gets borked... still not sure why that is13:51
voidspacenatefinch: but that doesn't matter as it will have to be the same one - it will always fail if it tries to set a different one13:51
natefinchvoidspace: but then what is that code doing that is checking if it's changed?13:51
natefinchvoidspace: on line 1261... if changing is bad, shouldn't we just bail there?13:52
voidspaceah, no - I'm slightly mistaken13:52
voidspacewe can change if we have a better match on scope13:52
voidspacelet me look again at that assert13:52
voidspacenatefinch: hmmm... so actually - if two operations try to concurrently set a new address then the first one will succeed and the second will fail because the address is now not "current"13:55
voidspacenatefinch: that would better be done as a buildTxn function13:56
voidspacerefreshing the doc on each attempt13:56
natefinchvoidspace: well, so, this is called from inside a buildtxn and we do refresh the doc each time13:56
voidspaceah yes, I see that now in setAddresses13:57
voidspacethe intent of the assert is that the doc hasn't changed (i.e. current address is unset or as we expect)13:58
voidspaceif that fails the first time - the address has changed, why does it fail *again*13:59
natefinchyeah... that's the question.  I think I may need to toss in a bunch of extra logging so we can see what it's being called with. That might make it more obvious what's going on.  If it's ping ponging between two or if one is just oddly failing over and over13:59
natefinchvoidspace: it's hard to repro... the landscape guys need to deploy a semi-large environment from a script to trigger it, and even then it doesn't always happen, so it's probably timing related14:01
voidspacebut it shouldn't ping pong as the address seen shouldn't actually change until the transaction is applied14:01
voidspaceso from my reading of the code even if two concurrent changes come in, one will work, the second will fail once and then either come back with no ops or succeed14:01
natefinchyeah, that's what I would expect from reading the code14:02
voidspaceso I don't understand I'm afraid14:02
natefinchvoidspace: it's ok. I'll see if I can get some better logs14:02
voidspacenatefinch: :-/ good luck14:02
natefinchvoidspace: thanks for looking at it with me though.14:02
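[Editor's sketch] The behaviour voidspace and natefinch expect from reading the code (one concurrent writer succeeds, the other fails its assert once and then either succeeds or has nothing left to do) can be modelled with a toy in-memory store. This is not the Juju or mgo/txn code: Store, Aborted and set_preferred are hypothetical names, and the assert is simulated with a plain equality check.

```python
class Aborted(Exception):
    """Raised when a transaction's assertion no longer holds."""


class Store:
    """A toy single-document store with assert-then-update semantics."""

    def __init__(self):
        self.preferred = None  # None means "not set yet"

    def apply(self, expected, new):
        # The assert: the stored value must still be what the caller saw.
        if self.preferred != expected:
            raise Aborted()
        self.preferred = new


def set_preferred(store, candidate, before_apply=None, max_attempts=3):
    """Retry loop that re-reads the document on every attempt.

    Returns the number of attempts used. before_apply, if given, runs
    once on the first attempt to simulate a concurrent writer.
    """
    for attempt in range(1, max_attempts + 1):
        current = store.preferred  # refresh the doc
        if current == candidate:
            return attempt  # already set to this value: no ops needed
        if before_apply is not None and attempt == 1:
            before_apply()  # another writer sneaks in here
        try:
            store.apply(expected=current, new=candidate)
            return attempt
        except Aborted:
            continue  # assertion failed: rebuild from fresh state
    raise RuntimeError("gave up after %d attempts" % max_attempts)
```

In this model the losing writer always finishes on its second attempt, which is why a transaction that keeps failing, as in the bug, suggests something the simple model does not capture.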
dimiternnatefinch, voidspace: I might have some insight about that issue with setting addresses14:12
dimiternnatefinch, voidspace: I've been changing that code while trying to fix a related bug avoiding IPv6 preferred addresses14:13
dimiternnatefinch, voidspace: and I suspect a few possible causes, mainly the way the buildTxn in setAddresses is handling subsequent attempts14:14
voidspacedimitern: by the way - I have thoughts about your change to preferred addresses for ipv614:15
voidspacedimitern: I'm back to thinking we don't need to track separate ipv4 and ipv6 preferred addresses14:15
dimiternnatefinch, voidspace: there's also maybeGetNewAddress overriding the origin and not checking whether it was selected or not (i.e. it can try setting an empty address)14:15
voidspacedimitern: you said that the intent of that field on the doc was not to change - so we needed separate ipv4 & ipv6 addresses14:16
dimiternvoidspace: told you so :)14:16
voidspacedimitern: actually the intent was that the public api (PreferredAddress) doesn't change14:16
voidspacedimitern: and your change *does* change that14:16
voidspacedimitern: so under the hood changing preferred address from ipv6 to ipv4 is *fine*14:16
voidspacedimitern: and better than adding new fields and methods14:16
dimiternvoidspace: but fortunately, it's unnecessary now as jamespage fixed the real issue with the rabbitmq charm and charm-tools (helpers?) not handling IPv6 addrs properly14:17
voidspacedimitern: hah14:17
voidspacedimitern: ok14:17
natefinchdimitern: I can get more logs to get a better idea of what we're actually setting... but it sounds like you've found some problems in the logic anyway.14:20
dimiternnatefinch: yeah, I also tried *really* hard to repro it.. no luck.. well, except for poking into mongodb14:21
natefinchdimitern: yeah, the only reliable repro I can find is landscape's deployment script that makes like 15 machines/containers at once14:22
dimiternnatefinch: e.g. if you manage to get to the mongo client shell of a running model, updating an existing machineDoc's preferredprivateaddress field to a non-nil, serialized version of state.address{}14:22
dimiternand I suspect that's what actually happens only occasionally, under heavy load like with the landscape scenario14:23
natefinchyep14:23
dimiternnatefinch: also, the upgrade step (see state.AddPreferredAddressesToMachines or whatsit) might be causing it (didn't we start doing auto-upgrading across patch-versions of the same minor version at some point?)14:28
natefinchdimitern: hmm, good question14:30
dooferladfrobware: ping?14:33
fwereadekatco, natefinch: do you know where the surprise introduced at payload/context/register.go:83 is corrected?14:45
katcofwereade: tal14:45
fwereadekatco, ty14:46
natefinchfwereade: yikes14:46
fwereadekatco, natefinch: fwiw, all I can see is payload/state/unit.go:58, which is commented out and may just be a casualty of the recent changes14:51
fwereadekatco, natefinch: it certainly seems to get all the way to state without anything checking it makes sense14:52
katcofwereade: natefinch: payload/api/helpers.go:72 looks like it overwrites it14:55
fwereadekatco, that's just the id parsed from whatever unit was sent14:56
katcofwereade: hm you are correct14:56
katcofwereade: we have standup and then we'll continue investigating14:59
fwereadekatco, thanks14:59
cmarsis there a feature branch where service is being renamed to application?15:18
alexisbcmars, yes15:19
cmarsalexisb, service-to-application, that's gotta be it :)15:19
cmarsthanks15:19
alexisb:)15:19
alexisbcmars, Ian sent me a summary of happenings as well15:19
alexisbif it is relevant to you I can forward it along15:20
cmarsalexisb, sure, that'd be great. i'm fixing up the romulus change atm, just wanted to test against juju before proposing15:20
alexisbcmars, awesome15:20
alexisbdooferlad, you still around?15:24
katcofwereade: natefinch did a test, and it's definitely getting set somewhere. i think it should be passed in here (config.UnitName) and then utilized instead of a placeholder15:24
katcofwereade: oops, there was intended to be a link there: https://github.com/juju/juju/blob/master/component/all/payload.go#L112-L11315:26
natefinchis the standard way to log into mongo different in 2.0?  I'm using mongo --ssl -u admin -p <oldpassword from agent.conf> localhost:37017/admin15:30
massIVEIn juju 2.0 mass is not listed in list-clouds, any idea why?15:30
natefinchmassIVE: you have to add-cloud15:30
natefinchmassIVE: it's a little cumbersome right now.  You have to make a .yaml file with the maas cloud defined in it, thusly: http://pastebin.ubuntu.com/16733874/15:35
natefinchmassIVE: then juju add-cloud mymaas myclouds.yaml15:35
dooferladalexisb: sorry, yes15:36
dooferladalexisb: for some reason my PC thought USB was a silly idea for a few minutes *confused*. I need that for typing!15:36
natefinchmassIVE: then you can juju bootstrap controllername mymaas   .. I think it'll prompt for the oauth cred at that point15:37
massIVEnatefinch: i did that, but must have done something wrong last time, works now ;)15:38
natefinchmassIVE: cool.. yeah, it's not the most user friendly experience right now.  I think we're working on simplifying it soon15:38
* dooferlad reboots again. For the fun.15:39
massIVEnatefinch: um, ERROR no registered provider for "mass"15:42
natefinchmassIVE: maas or mass?15:42
massIVElol  :) k15:42
natefinch:D15:42
* dooferlad has discovered that a USB extension cable not plugged into anything can be a bad thing15:44
alexisbdooferlad, heya, was just looking for an update on: https://bugs.launchpad.net/juju-core/+bug/157794515:47
mupBug #1577945: Bootstrap failed: DNS/routing misconfigured on maas <blocker> <bootstrap> <ci> <maas-provider> <network> <regression> <juju-core:In Progress by dooferlad> <juju-core 1.25:Triaged by dooferlad> <https://launchpad.net/bugs/1577945>15:47
dooferladalexisb: I am waiting for a +1 on http://reviews.vapour.ws/r/490215:48
alexisbdooferlad, cool, can you put the PR in the bug with an update please15:48
dooferladalexisb: sure15:48
alexisbvoidspace, if you are around looks like dooferlad could use a review :)15:49
dooferladvoidspace, frobware, dimitern: http://reviews.vapour.ws/r/4902/ *poke*15:53
katcofwereade: i found it... this is a bug that was introduced very recently (~10 days). trying to provide you with a walkthrough15:53
dimiterndooferlad: looking15:54
katcofwereade: nate is checking whether tip works correctly, but to me it looks like a bug was introduced here: https://github.com/juju/juju/commit/b3fb5cbc9c31#diff-c04416a1afc4fb32911bcaafcfdb48a1L2916:09
katcofwereade: (in that last block)16:10
hoenircould anyone check this PR and $$merge$$ it? http://reviews.vapour.ws/r/4924/16:11
fwereadekatco, you don't consider "a-service/0" making it safely through... context, api client, api server, state... 4 layers, before being silently overwritten in the persistence layer, to be a bug?16:12
fwereadekatco, that is pretty much just working by accident16:12
natefinchfwereade: I believe eric's intent was to remove unit from the payload struct entirely. This may have been a step in that direction, that just never got completed16:13
natefinchfwereade: tip is broken btw16:13
fwereadenatefinch, I don't really see how introducing nonsense data at runtime is anything other than... introducing nonsense data at runtime16:13
natefinchfwereade: well, certainly, I don't think this should have ever made it into master as-is16:14
katcofwereade: just trying to help you figure out where it stopped working. looks like that's it.16:14
fwereadenatefinch, my branch fixes it in state, fwiw, if I finish it today it'll be late I'm afraid16:14
dimiterndooferlad: lgtm; however, I'd appreciate steps how to verify this locally16:15
natefinchfwereade: seems like a refactor that was not completed, and was accidentally committed to master.16:16
fwereadekatco, ok, sure, but... the *persistence* layer silently overwriting one of the fields specified by state? I thought that was where you were putting the business rules -- surely that *should* be the layer that decides *what* gets persisted?16:16
katcodimitern: can you +1 this? http://reviews.vapour.ws/r/4924/16:16
dooferladdimitern: just start a node that uses DHCP as its address allocation method. dhclient will be running against the bridged interface rather than the parent. Very simple.16:17
dimiterndooferlad: and you mean DHCP not Auto Assign?16:18
katcofwereade: yes i agree. finding the commit where it broke does not imply complicit acceptance of an approach.16:18
dooferladdimitern: yes16:18
fwereadekatco, well, this situation does seem to be a direct consequence of the loose/flexible style you were advocating earlier16:23
katcofwereade: what? no, not at all16:25
fwereadekatco, heh, it struck me as well-timed and instructive16:25
katcofwereade: those two concepts aren't even connected16:25
fwereadekatco, bad data making it through 4 layers *not* connected with trusting your clients because it lets you go "fast"?16:26
katcofwereade: you are misconstruing my comment. it's not at all about not doing data validation at your layer boundaries.16:27
katcofwereade: going to go do some work, enjoy your evening.16:27
natefinchfwereade: FWIW, the only way to verify the unit info would be to hit the DB.... should we really do that at every boundary?  IMO, the only real problem is that we let it get into the DB that way.  That's the only time you really need to ensure the unit is valid.  There should have been an assert that the unit exists.16:29
=== frankban is now known as frankban|afk
fwereadenatefinch, uh, no: in the apiserver layer, you are supplied with an authoriser that tells you what entity you're connecting on behalf of. if you don't check that the connected entity is allowed to make the changes it asks for, you are just broken16:31
natefinchfwereade: yes, also a bug16:31
natefinchfwereade: sorry, gotta run to pick up my daughter from preschool... back in an hour16:31
fwereadekatco, sorry, but: (1) you don't validate, and (2) bad data gets through; ISTM that (1) => (2). do you have some alternative explanation for (2)?16:34
fwereadekatco, or: you are disagreeing with some aspect of "don't trust your clients" but not actually advocating the approach taken in payloads, despite "trusting its clients" apparently being a design principle and a source of evident problems?16:42
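[Editor's sketch] fwereade's point about the apiserver layer is that the connected entity should be compared with the entity the request claims to act for before anything reaches state. A minimal illustration follows. The function and exception names are hypothetical, not the Juju authoriser API, and the "unit-<service>-<number>" tag form is an assumption about how a unit name such as "mysql/0" maps to its tag.

```python
class PermissionDenied(Exception):
    """Raised when a caller names a unit other than its own."""


def check_unit_matches_caller(auth_tag, requested_unit):
    """Reject a request that names a unit other than the authenticated one.

    auth_tag is the tag of the connected entity, for example
    "unit-mysql-0"; requested_unit is the unit name sent in the
    request, for example "mysql/0". Both formats are assumptions.
    """
    expected_tag = "unit-" + requested_unit.replace("/", "-")
    if auth_tag != expected_tag:
        raise PermissionDenied(
            "%s may not act for %s" % (auth_tag, requested_unit)
        )
    return requested_unit
```

With a check like this at the boundary, a placeholder such as "a-service/0" sent by an agent connected as a different unit is rejected in the API server instead of being overwritten later in the persistence layer.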
dimiterndooferlad: still there?16:47
dooferladdimitern: not really17:06
rogpeppe1haven't done this for a while, friday fun: here's a stab at a complete auto-generated representation of the Juju API (RPC calls only) http://paste.ubuntu.com/16739140/18:12
natefinchrogpeppe1: call it FacadeName so it prints out above Methods :)18:13
natefinchrogpeppe1: very cool18:13
rogpeppe1natefinch: i should change rjson so that it preserves ordering18:14
natefinchrogpeppe1: even better18:14
natefinchrogpeppe1: also, man, that's huge.18:16
natefinchrogpeppe1: I guess it includes definitions of even stdlib types, though18:16
rogpeppe1natefinch: yeah18:16
rogpeppe1natefinch: here's the code i used to generate it: http://paste.ubuntu.com/16739827/18:16
rogpeppe1natefinch: anyway, early days yet. at some point, i'll add doc comments too, and produce linked HTML output :)18:18
rogpeppe1natefinch: then we might actually have something like an API doc...18:18
natefinchrogpeppe1: that would be amazing18:18
rogpeppe1natefinch: i don't think it would be more than half a day's work18:19
rogpeppe1natefinch: anyway, gotta go and frolic18:19
rogpeppe1natefinch: cheerio, and thanks for being interested :)18:19
natefinchrogpeppe1: definitely. happy weekend!18:19
alexisbrogpeppe1, this is awesome18:26
alexisbrogpeppe1, you would be my hero!18:26
mupBug #1584815 changed: SSHSuite.TestSSHCommand fails on windows <blocker> <ci> <regression> <test-failure> <unit-tests> <windows> <juju-core:Fix Released by menno.smits> <https://launchpad.net/bugs/1584815>18:45
mupBug #1585388 changed: Container networking cannot ssh after machine is ready <blocker> <ci> <lxc> <lxd> <network> <regression> <juju-core:Fix Released by menno.smits> <https://launchpad.net/bugs/1585388>18:46
mupBug #1584815 opened: SSHSuite.TestSSHCommand fails on windows <blocker> <ci> <regression> <test-failure> <unit-tests> <windows> <juju-core:Fix Released by menno.smits> <https://launchpad.net/bugs/1584815>18:49
mupBug #1585388 opened: Container networking cannot ssh after machine is ready <blocker> <ci> <lxc> <lxd> <network> <regression> <juju-core:Fix Released by menno.smits> <https://launchpad.net/bugs/1585388>18:49
hoenirSo did this build not fail? https://github.com/juju/juju/pull/544918:49
alexisbhoenir, it looks like that did merge; there were some issues with the bot yesterday, so it is a bit unclear to me what happened there18:55
hoeniraha, so everything is ok then..18:56
mupBug #1584815 changed: SSHSuite.TestSSHCommand fails on windows <blocker> <ci> <regression> <test-failure> <unit-tests> <windows> <juju-core:Fix Released by menno.smits> <https://launchpad.net/bugs/1584815>19:13
mupBug #1585388 changed: Container networking cannot ssh after machine is ready <blocker> <ci> <lxc> <lxd> <network> <regression> <juju-core:Fix Released by menno.smits> <https://launchpad.net/bugs/1585388>19:13
mupBug #1586512 opened: juju2 websocket api response consistency APIHostPorts versus Login response <landscape> <usability> <juju-core:New> <https://launchpad.net/bugs/1586512>19:16