wallyworld | axw: ah, i think i understand what you might be saying - snap lxd daemon picks up the request from juju rather than deb lxd daemon and hence the wrong lxd gets used. i thought setting LXD_DIR was supposed to cause the right lxd to get used (at least that's what nicholas thought). | 00:16 |
---|---|---|
menn0 | wallyworld: I just realised that I hit the same thing with the conjure-up snap last week. it ships with its own version of juju and has LXD issues. | 00:22 |
wallyworld | menn0: yeah. adam and nick know about it. not sure exactly what their solution is. my system is still somewhat screwed even after removing lxd snap | 00:22 |
wallyworld | need to look at it again today | 00:23 |
menn0 | wallyworld: do you need to run sudo lxd init again? | 00:23 |
wallyworld | that's my last resort, yeah | 00:23 |
=== ben__ is now known as benk01 | ||
axw | wallyworld: PR to update azure regions: https://github.com/juju/juju/pull/6969. seeing as we need to get the public-clouds.yaml updated anyway, figured we may as well try and get this one in too? | 01:25 |
wallyworld | sure | 01:25 |
wallyworld | axw: sorry about delay. lgtm but there's a block on landing it due to current QA policy. there's a meeting tomorrow where that will be fixed | 01:45 |
axw | wallyworld: thanks | 01:46 |
menn0 | wallyworld: were you running into this at bootstrap? 2017-02-13 03:05:33 ERROR cmd supercommand.go:458 new environ: Get https://10.0.8.1:8443/1.0: x509: certificate has expired or is not yet valid | 03:13 |
menn0 | anastasiamac: or is this what you saw? ^^ | 03:13 |
wallyworld | menn0: similar, yeah, can't recall exact wording | 03:13 |
menn0 | it's happening for me | 03:14 |
menn0 | too | 03:14 |
menn0 | axw: could ^^^ be related to the lxd cert caching change? | 03:14 |
anastasiamac | menn0: i did not see this. wallyworld may have... just fixed my lxd to work over the weekend. my failures were related to lxc profile misconfiguration | 03:14 |
wallyworld | menn0: i've had to totally purge lxd, manually remove networks and ip links, lxd init again etc. still not there though - juju can't talk to container | 03:15 |
axw | menn0: maybe. or the snap. are you using the snap? | 03:15 |
menn0 | no snaps involved | 03:15 |
axw | menn0: you haven't installed the juju or lxd snaps? | 03:16 |
menn0 | not on this machine | 03:16 |
axw | menn0: ok. seems most likely related to my changes then. it doesn't happen to me though. can you try and isolate it? | 03:17 |
menn0 | axw: sure. where's the cache file? | 03:17 |
axw | menn0: juju will pull certs out of either ~/.config/lxc or ~/.local/share/juju/lxd | 03:18 |
menn0 | axw: I don't have ~/.local/share/juju/lxd | 03:19 |
axw | menn0: either/or. if that one doesn't exist, it'll look in ~/.config/lxc | 03:19 |
axw | (I don't have ~/.local/share/juju/lxd either) | 03:19 |
menn0 | axw: lxc is working fine by itself | 03:20 |
menn0 | axw: I just launched a container with "lxc launch" | 03:20 |
axw | menn0: lxc will use the unix socket locally | 03:21 |
axw | menn0: so HTTPS won't factor at all | 03:21 |
menn0 | axw: ok right | 03:21 |
menn0 | axw: moving the .config/lxc directory out of the way doesn't fix things | 03:31 |
jam | thumper: menn0: coming by to say hi? | 03:31 |
thumper | jam: yeah | 03:32 |
menn0 | jam: coming | 03:32 |
axw | menn0: can you try adding some logging to finalizeLocalCertificateCredential in provider/lxd/credentials.go? we should be generating a new cert, uploading it to lxd, and then using that | 03:35 |
axw | wallyworld: can you pastebin the output of "lxc config trust list" for me? | 03:41 |
axw | wallyworld menn0: one possibility is that the certs in ~/.config/lxc are expired. mine expire in 2026... | 03:43 |
menn0 | axw: but I moved them out of the way and the symptoms stayed the same? | 03:43 |
axw | menn0: ok, weird | 03:43 |
axw | menn0: even that doesn't repro for me | 03:45 |
axw | menn0: which version of ubuntu, lxd? | 03:45 |
menn0 | axw: well those certs *have* just expired | 03:47 |
menn0 | Validity | 03:47 |
menn0 | Not Before: Feb 10 02:21:23 2016 GMT | 03:47 |
menn0 | Not After : Feb 9 02:21:23 2017 GMT | 03:47 |
menn0 | lxd 2.0.8, xenial | 03:47 |
axw | menn0: did you add a lxd cert credential to credentials.yaml? by autoload-credentials perhaps? | 03:49 |
menn0 | axw: no lxd creds in credentials.yaml | 03:50 |
axw | menn0: well if you moved the certs, I can't see how their expiry matters :/ | 03:51 |
menn0 | axw: unless there's something inside the lxd daemon? | 03:53 |
axw | menn0: maybe the server cert is the thing that has expired? | 03:54 |
axw | menn0: the client credential changes could be a coincidence. seeing as the client certs you had have expired, possibly the server cert has too | 03:55 |
menn0 | axw: I think you're right. the problem happens with juju 2.0.3 too | 03:56 |
axw | menn0: ok, cool | 03:57 |
* axw wonders how to regen | 03:57 | |
axw | menn0: looks like if you delete /var/lib/lxd/server.{crt,key}, lxd will recreate them on startup | 03:59 |
axw | wallyworld: ^^ | 04:00 |
menn0 | axw: they've expired too. that's got to be the problem. | 04:00 |
axw | menn0: terribly confusing coincidence :) | 04:00 |
wallyworld | axw: yeah, i did that earlier, full lxd reinstall and init was also needed for me | 04:10 |
wallyworld | due to weird network issues | 04:10 |
=== frankban|afk is now known as frankban | ||
jam | wallyworld: ping if you're around | 12:29 |
wallyworld | hey | 12:29 |
perrito666 | morning you two | 12:30 |
wallyworld | evening here :-) | 12:30 |
perrito666 | wallyworld: any urgent bug that was left last night? otherwise ill just pick from the pile | 12:33 |
wallyworld | perrito666: this one is important https://bugs.launchpad.net/juju/+bug/1623217 | 12:35 |
mup | Bug #1623217: juju bundles should be able to reference local resources <juju:Triaged> <https://launchpad.net/bugs/1623217> | 12:35 |
wallyworld | not sure if it can be done in time | 12:36 |
wallyworld | ie not sure of the scope of any change, haven't looked into it | 12:36 |
wallyworld | jam: can i help with something? | 12:40 |
jam | on bug #1577556 they just mentioned that they saw a 'statuses' doc that didn't have a txn-revno. I was a bit confused but it looks like statuseshistory is where we don't use TXNs but 'statuses' we *should* be using txns, right? | 12:41 |
mup | Bug #1577556: unit failing to get unit-get private-address in the install hook <intermittent-failure> <network> <uosci> <OpenStack Charm Test Infra:Confirmed> <juju:Triaged> | 12:41 |
mup | <juju-core:Triaged> <juju-core 1.25:Triaged> <ubuntu-openstack-ci:Triaged> <mysql (Juju Charms Collection):Fix Released> <https://launchpad.net/bugs/1577556> | 12:41 |
jam | wallyworld: sorry, wrong bug. actual is bug #1484105 | 12:43 |
mup | Bug #1484105: juju upgrade-charm returns ERROR state changing too quickly; try again soon <bug-squad> <canonical-is> <upgrade-charm> <upgrade-juju> <juju-core:Fix Released> <https://launchpad.net/bugs/1484105> | 12:43 |
perrito666 | wallyworld: tx | 12:44 |
wallyworld | jam: yeah, status history avoids txns. but the status doc itself in the status collection should use txn | 12:44 |
perrito666 | afaik the regular status collection does use txns | 12:45 |
wallyworld | jam: all usages of statusDoc should be in the context of a txn.Op slice | 12:46 |
jam | perrito666: right, its supposed to, I was worried we had a case where sometimes we were and sometimes we weren't. | 12:47 |
jam | but they are reliably (?) hitting a case where the statuses docs are missing the txn-revno, which is bad mojo for the TXN logic | 12:47 |
perrito666 | jam: odd, I wonder if someone confused status with status history | 12:49 |
perrito666 | jam: the latest status added was model status iirc | 12:49 |
wallyworld | jam: i just did a (quick) code search and can't see anywhere where we are writing to the status collection not using txns | 12:49 |
jam | perrito666: wallyworld: yeah, I grepped the code as well. It does mention an assert that "txn-revno == 0" which would indicate we tried to read it, got back 0 and then said "well, it's gotta stay 0 for this update" | 12:50 |
wallyworld | jam: you talking about assert := bson.D{{"txn-revno", txnRevno}} in statusSetOps() I assume | 12:51 |
jam | wallyworld: comment #16 on the bug | 12:51 |
jam | there is an entry in txn queue that says: "a": { "txn-revno": NumberLong(0) } | 12:52 |
jam | my guess is that it read the doc, saw there was no txn-queue so got the 'zero' value and then put that back into the assert. | 12:52 |
jam | So I don't think that is what *caused* the problem in the first place, just those txns all fall over because the data is bad. | 12:52 |
jam | we don't really know what caused it to be bad in the beginning. | 12:53 |
wallyworld | jam: juju uses the txn-revno assert in a set status function (this pattern is used elsewhere too, eg updating ports). but i think that relies on a create being done first to insert the original doc and set the txn-revno field. and we do have a createStatusOp. i can't see how we would attempt to set a status value on a doc with a given doc id without having done a create first | 12:56 |
wallyworld | and that create should insert the txn-revno field | 12:56 |
jam | wallyworld: (a) race? (b) if we're creating the doc with an upsert, we're still using the txn logic, and the mgo/txns is the thing that keeps txn-revno correct | 12:57 |
jam | we're just using an assert to say "if this doc is changed underneath us, ignore this change" | 12:57 |
wallyworld | that matches my understanding | 12:58 |
perrito666 | jam: just a wild idea, do you think this could be caused by mgopurge? | 12:58 |
wallyworld | we create the doc with an insert and a doc not exists assert | 12:59 |
perrito666 | wallyworld: https://bugs.launchpad.net/juju/+bug/1623217 seems like it will need some spec-ing and agreement from stakeholders | 12:59 |
mup | Bug #1623217: juju bundles should be able to reference local resources <juju:Triaged> <https://launchpad.net/bugs/1623217> | 12:59 |
wallyworld | probably, i just saw it and it looked important | 13:00 |
wallyworld | should move to 2.2 | 13:00 |
perrito666 | wallyworld: it has been like that since december, I dont think it is that urgent that we can skip proper procedure :) | 13:00 |
perrito666 | we should get someone started on that spec though | 13:01 |
wallyworld | fair enough, i just skimmed it | 13:01 |
perrito666 | true, was just a comment not a rant | 13:01 |
rick_h | perrito666: the goal there is just to allow local file paths like local charms | 13:01 |
rick_h | perrito666: so hopefully a short spec | 13:01 |
perrito666 | rick_h: yes | 13:02 |
jam | rick_h: perrito666: not sure how much of a spec it should have, vs you already have a syntax for 'use this local charm' ./path, we just support that for a resource blob as well | 13:09 |
rick_h | jam: +1 | 13:11 |
jam | perrito666: short enough spec for you ? :) | 13:11 |
perrito666 | jam: sure it is, if anyone comes asking ill post that line :p | 13:13 |
=== plars-away is now known as plars | ||
=== perrito667 is now known as perrito666 | ||
jam | perrito666: I didn't think that was actually going into 2.1, and doing it post rc feels a bit late, but if it isn't terribly hard, it is a nice quality-of-life for a bunch of people | 13:38 |
perrito666 | jam: I dont think ill be able to fit this into 2.1 most likely tonight ill move that to 2.2 | 13:48 |
=== balloons26 is now known as balloons | ||
=== frankban is now known as frankban|afk | ||
Dmitrii-Sh | https://github.com/cloud-green/juju-relation-mongodb/pull/6 cmars, mattyw - JFYI: sent out a couple of patches for the mongodb interface | 18:49 |
thumper | morning folks | 19:19 |
* perrito666 touches tip of hat | 19:25 | |
mattyw | Dmitrii-Sh, hey there, thanks very much, will take a look | 19:25 |
=== elmo_ is now known as elmo | ||
tasdomas | could there be a reason why juju 1.25.6 fails to bootstrap an aws model with authentication failure, while juju 2.0.0 works fine (with the same credentials) | 19:45 |
thumper | tasdomas: no idea sorry | 19:48 |
mup | Bug #1664359 opened: Authentication fails for juju 1.25.6 on aws <juju-core:New> <https://launchpad.net/bugs/1664359> | 20:27 |
wallyworld | perrito666: thumper: IMO the proposed test fix for that status history test is too lenient - it throws away the notion of the order of the expected status messages | 23:09 |
wallyworld | the fix should have been to be lenient with the expected count of messages, not the order | 23:10 |
perrito666 | wallyworld: I can very well send a follow up, since the test below does test the order I thought it was not really that important | 23:15 |
wallyworld | it does but the two tests are ostensibly similar - one uses a filter, the other doesn't. they should essentially test the same things the same way | 23:17 |
wallyworld | just with and without filter | 23:17 |
perrito666 | yes, but there is a race there, that infrastructure was built by william but he did not notice that by inserting the records with identical dates the ordering is non deterministic | 23:19 |
perrito666 | two items with the same date can be returned in any given order, and that is ok | 23:20 |
perrito666 | because in reality it will never happen that two items have the same date to the nanosecond | 23:20 |
wallyworld | ah, i see, that helps explain it a bit | 23:24 |