/srv/irclogs.ubuntu.com/2013/11/26/#juju-dev.txt

wallyworld_davecheney: question for you01:33
davecheneywallyworld_: shoot01:33
wallyworld_on trunk, i try this: juju set-env some-bool-field=true01:34
wallyworld_it fails01:34
wallyworld_expected bool, got string("true")01:34
davecheneyo_o01:34
wallyworld_have you seen that?01:34
davecheneyi haven't used that command01:35
davecheneycertainly never with bool fields01:35
davecheneydo we even support them ?01:35
davecheneywhich charm ?01:35
wallyworld_there's code there to parse a string to bool, but it appears to not be called at the right place01:35
wallyworld_this is setting an env config value01:35
davecheneyahh01:35
davecheneyi bet nobody ever tried01:35
davecheneycf. the horror show that is the environment config01:35
davecheneyand updating it after the fact01:35
wallyworld_yeah, appears so :-(01:36
davecheneytime for a bug report01:36
wallyworld_or it could be fallout from moving to api01:36
davecheneycould be01:36
davecheneythe only bool env field I know of is01:36
davecheneyuse-ssl01:36
wallyworld_thanks, just wanted to check before raising a bug01:36
davecheneyor the use-insecure-ssl01:36
davecheneyi think you've got a live one01:36
wallyworld_there's also development01:36
wallyworld_and a new one i am doing01:36
wallyworld_provisioner-safe-mode01:36
wallyworld_which will tell provisioner not to kill unknown instances01:37
davecheneywallyworld_: i think nobody has ever tried to change a boolean env field after deployment01:37
wallyworld_:-(01:37
davecheneywe've only ever had that use insecure ssl one and you need that to be set for bootstrapping your openstack env01:38
wallyworld_ok, ta. bug time then01:38
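The bug discussed above comes down to a missing coercion step: CLI arguments arrive as strings, so a value like "true" has to be parsed into a bool before it is validated against the config schema. A minimal Go sketch of that step (an illustration only, not juju-core's actual schema code):

```go
package main

import (
	"fmt"
	"strconv"
)

// coerceBool shows the kind of string-to-bool coercion set-env needs
// before writing a value into environment config. Without it, the
// raw string "true" fails validation with exactly the error seen
// above: expected bool, got string("true").
func coerceBool(raw string) (bool, error) {
	v, err := strconv.ParseBool(raw) // accepts "true", "false", "1", "0", ...
	if err != nil {
		return false, fmt.Errorf("expected bool, got string(%q)", raw)
	}
	return v, nil
}

func main() {
	v, err := coerceBool("true")
	fmt.Println(v, err) // prints: true <nil>
}
```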
davecheneysinzui: any word on 1.16.5 / 1.17.0 ?02:26
jamwallyworld_, axw: if you're unable to get into garage MaaS, you could probably ask bigjools nicely if you can use his equipment.02:47
wallyworld_jam: he's been busy supporting site02:48
wallyworld_and only has small micro servers02:48
jamaxw: from what I inferred when Nate got access, it was essentially smoser just "ssh-import-id nate.finch" as the shared user on that machine.02:48
jamwallyworld_: sure, but I don't think we're testing scaling, just that the backup restore we've put together works02:49
jamw/ MaaS02:49
wallyworld_sure, but we need at least 2 virtual instances, not sure how well that will be handled02:49
jamwallyworld_: well, I wasn't suggesting using VMs on top of his MaaS, just using the MaaS02:50
bigjoolswallyworld_ you already have access02:51
wallyworld_yes i do. i was waiting for your on site support efforts to wind down02:51
bigjoolsconsider it down02:51
wallyworld_you seemed stressed enough, didn't want to add to it02:51
jamhi bigjools02:51
bigjoolswhen the guy you're helping f*cks off mid-help, I consider it done.02:51
jamouch02:51
axw:/02:52
bigjoolswallyworld_: you could come round as well if you want direct access02:52
wallyworld_so i'm currently working on one of the critical bugs02:52
jambigjools: so do you know someone who already has Garage MaaS access to the shared user? From what I can tell the actual way you get added is by adding your ssh key to the "shared" account02:52
wallyworld_was hoping to get that done before i looked at the restore doc02:52
jam"needing to be in the group" seems like a red herring02:53
bigjoolsjam: I have access, want me to add anyone?02:53
wallyworld_me and axw :-)02:53
axwme please02:53
jambigjools: axw, wallyworld_, and ?02:53
bigjoolsheh02:53
bigjoolslp ids please02:53
wallyworld_wallyworld02:53
axw~axwalk02:53
jambigjools: I haven't done the other steps, but ~jameinel is probably good for my long term health02:53
bigjoolsok you're all in02:55
bigjoolsI am having a lunch break, if you need me wallyworld_ can just call me02:56
axwhooray. thanks bigjools02:56
bigjoolsnp02:56
jamaxw: the other bit that I've seen, is that you might have a *.mallards.com line in your .ssh/config with your normal user, but you still need to use the shared User for those hosts02:57
jamif the *.mallards line comes first, it overrides the individual stanza02:58
axwjam: I explicitly tried logging in as shared@02:58
wallyworld_i can ssh in now02:58
axwit works for me now too02:58
jamaxw: so I think you have to be in the iom-maas to get into loquat.canonical.com, but to get into maas.mallards you just get added to the shared account02:59
axwthat would seem to be the case03:00
jamaxw: as in, I'm trying and can't get to loquat03:00
jamaxw: can you update the wiki?03:00
axwjam: right, you need to get IS to do that03:00
jamI would get rid of the "host maas.mallards" line in favor of the *.mallards line03:00
axwjam: sure - "step 3: ask bigjools to add you to the shared account"? ;)03:00
jamaxw: ask someone who has access to run "ssh-import-id $LPUSERNAME"03:01
jamas the shared user03:01
jamaxw: Hopefully we can make it a big enough warning for IS people to realize they aren't managing that account03:01
jamthanks for setting them up bigjools03:03
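The stanza-ordering gotcha jam describes works because ssh uses the first value obtained for each option, so a wildcard stanza placed first shadows a more specific one below it. A sketch of a config that keeps the shared account working (hostnames are the ones from the discussion; the usernames are illustrative):

```
# ~/.ssh/config -- ssh takes the FIRST value found for each option,
# so the specific stanza must come before the wildcard.
Host maas.mallards
    User shared

Host *.mallards
    User youruser
```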
* jam is off to take my son to school03:04
davecheneyping -> https://code.launchpad.net/~dave-cheney/goose/001-move-gccgo-specific-code-to-individual/+merge/19664303:50
davecheneyaxw: thanks for the review03:55
davecheneynow I can close this issue03:55
axwnps03:55
jamdavecheney: you forgot to set a commit message on lp:~dave-cheney/goose/goose04:04
jamhttps://code.launchpad.net/~dave-cheney/goose/goose/+merge/19647104:04
jamI'll put one in there04:04
jamit should get picked up in 1 min04:05
jamand then you can approve your above branch04:05
davecheneyah04:06
davecheneythanks04:06
davecheneyi was wondering what was going on04:06
davecheneyi didn't realise you added the commit message for me04:06
* axw froths at the mouth a little bit04:15
axwwtf is going on with garage maas04:15
=== philipballew is now known as philip
jamaxw: isn't maas server supposed to be localhost given Nate's instructions?05:30
jamYou're generally supposed to be running a KVM (virtual MaaS) system just on one of the nodes05:30
jamin Garage Maas05:31
jamthe main reason we use g-MaaS is because the nodes there have KVM extensions and are set up for it05:31
jamin theory you could do it on your personal machine05:31
axwjam: maas-server gets inherited by the nodes05:31
axwthey'll just try to contact whatever you put in there05:32
axw(e.g. localhost)05:32
axwyou need to put in an absolute address05:32
jamaxw: ah, sure. So 10.* whatever, but not 'localhost'05:33
axwyup05:33
jamaxw: I can imagine that maybe bootstrap works, or some small set of things, but then it doesn't actually work together05:33
axwseems like the provider should be able to figure it out itself, but I dunno the specifics05:33
axwjam: bootstrap doesn't even work - the node comes up, but the cloud-init script tries to grab tools from localhost05:34
jamaxw: well "juju bootstrap" pre-synchronous works, right? Just nothing else does :)05:35
jam"the command runs and exits cleanly"05:35
axwyes :)05:35
jamwallyworld_: how's bug #1254729 coming?06:02
_mup_Bug #1254729: Update Juju to make a "safe mode" for the provisioner <juju-core:In Progress by wallyworld> <https://launchpad.net/bugs/1254729>06:02
davecheneyjam: we hit a small bug where juju set-env something-boolean={true,false}06:03
davecheneydidn't work as expected06:03
jamI saw that part, didn't know you were working on it with him06:03
davecheneyi think wallyworld_ is in that rabbit hole atm06:03
wallyworld_jam: been stuck on some stuff inside the provisioner task. i think i've got a handle on it. issues with knowing about dead vs missing machines06:03
davecheneywhen I saw,06:03
jamYou could cheat and make it an int06:03
davecheneyi mean wallyworld_06:03
davecheneyand when i say we, i mean ian06:04
wallyworld_yeah me06:04
jam:)06:04
* davecheney ceases to 'help'06:04
jamwallyworld_: so you mean we "should kill machines that are marked dead" but not "machines which are missing" ?06:04
jamdavecheney: thanks for being supportive06:04
wallyworld_yeah06:04
wallyworld_sort of06:05
wallyworld_we have a list of instance ids06:05
jamwallyworld_: I'm guessing that's "we asked to shutdown a machine, wait for the agent to indicate it is dead, and then Terminate it"06:05
wallyworld_and knowing which of those are dead vs missing is the issue, due to how the code is constructed06:05
jambut we were detecting that via a mechanism that wasn't distinguishing an instance-id we don't know about from one that we asked to die06:05
jamwallyworld_: I don't think you mean "missing", I think you mean "extraneous"06:06
wallyworld_yeah06:06
wallyworld_the code was destroying the known instance id too soon06:06
jamagent for $INSTANCE-ID is now Dead => kill machine, unknown INSTANCEID => do nothing.06:06
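The split jam summarizes (dead agents get their instances killed, unknown instance ids are left alone in safe mode) can be sketched as a simple partition. Types and names here are hypothetical stand-ins, not the real provisioner task's API:

```go
package main

import (
	"fmt"
	"sort"
)

type life int

const (
	alive life = iota
	dead
)

// partition takes the instance ids the provider reports as running
// and the provisioner's map of known machines, and splits them into
// instances to stop (known machines whose agents are Dead) and
// unknown ("extraneous") instances, which safe mode would not kill.
// As discussed above, dead machines must stay in the known map until
// this comparison runs, or they get misclassified as unknown.
func partition(running []string, known map[string]life) (toStop, unknown []string) {
	for _, id := range running {
		l, ok := known[id]
		switch {
		case !ok:
			unknown = append(unknown, id)
		case l == dead:
			toStop = append(toStop, id)
		}
	}
	sort.Strings(toStop)
	sort.Strings(unknown)
	return toStop, unknown
}

func main() {
	known := map[string]life{"i-0": alive, "i-1": dead}
	stop, unk := partition([]string{"i-0", "i-1", "i-2"}, known)
	fmt.Println(stop, unk) // prints: [i-1] [i-2]
}
```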
axwjam: I've just started a new instance in MAAS manually - shouldn't machine-0 be killing it?06:26
axwit's been there for a little while now, still living06:26
jamaxw: you're using 1.16.2+ ?06:26
axwjam: 1.16.306:26
jamaxw: did you start it manually using the same "agent_name" ?06:26
axwjam: yeah, I used my juju-provision plugin06:27
axwjam: do you know how I can confirm that it's got the same agent_name?06:27
jamaxw: some form of maascli node list06:27
jamaxw: it has been a while for me, might want to ask in #maas06:28
jamjtv and bigjools should be up around now06:28
axwnodes list doesn't seem to show it06:28
axwok06:28
jamaxw: if nodes list doesn't list it, it sure sounds like it isn't running06:28
axwjam: no I mean it doesn't show agent_name06:29
axwthe node is there in the list06:29
jamaxw: try "maascli node list agent_name=XXXXX"06:30
jamit looks like it isn't rendered, but if supplied it will be used as a filter06:30
axwthat worked06:32
axwjam: the new one does have the same agent_name06:33
jamaxw: so my understanding is that we only run the Provisioner loop when we try to start a new unit. You might try add-unit or something and see if it tries to kill off the one you added06:33
axwah ok06:33
axwthanks06:33
jamaxw: did it work?06:38
axwjam: not exactly; I tried to deploy to an existing machine. it only triggers if a machine is added or removed06:38
axwmakes sense06:38
axwanyway, it was removed06:39
axwso I'll go through the rest of the steps now06:39
jamaxw: so I've heard talk about us polling and noticing these things earlier, but with what ian mentioned it actually makes sense06:39
jamthe code exists there to kill machines that were in the environment but whose machine agents were terminated06:39
axwyup06:40
jamand it had the side effect of killing machines it never knew about06:40
jamwhich we decided to go with06:40
* axw watches paint dry06:50
jamaxw: ?06:54
axwprovisioning nodes does not seem to be the quickest thing06:54
jamaxw: provisioning in vmaas I would think would be reasonably quick, no?06:55
axwjam: it's likely the apt-get bit that's slow, but *shrug*06:55
axwit's definitely not quick06:56
axwI will investigate later06:56
davecheneyaxw: the fix for that is to smuggle the apt-cache details into your environment06:58
davecheneyhowever when you're on one side of the world06:58
davecheneyand the env is on the other06:58
davecheneyit's unlikely that there is a good proxy value that will work for both you and your environment06:58
jamdavecheney: garage maas is in Mark S's garage, so I think it would be both reasonably close and have decent bandwidth to the datacenter (I could be completely wrong on that)07:19
* jam heads to the grocery store for a bit07:31
fwereademgz, rogpeppe: any updates re agent-fixing scripts?07:58
rogpeppefwereade: i've got a script that works, but i don't know whether mgz wanted to use it or not08:23
rogpeppefwereade: i phrased it as a standalone program rather than a plugin, but that wouldn't be too hard to change08:23
fwereaderogpeppe, I don't see updates to the procedure doc explaining exactly how to fix the agent and rsyslog configs08:23
fwereaderogpeppe, documenting exactly how to fix is the most important thing08:24
fwereaderogpeppe, scripting comes afterwards08:24
fwereaderogpeppe, sorry if that wasn't clear08:24
rogpeppefwereade: ah, ok, i'll paste the shell scripty bits into the doc08:24
fwereaderogpeppe, <308:24
axwfwereade: I just finished running the process (manually) on garage MAAS08:25
axwI keep writing garaage08:25
axwanyway08:25
axwall seems to be fine08:25
axwI missed rsyslog, now that I think of it08:25
fwereadeaxw, ok, great08:25
axwfwereade: sent out an email with the steps I took08:26
fwereadeaxw, if you can be around for a little bit, would you follow rog's instructions for fixing those please, just for independent verification?08:26
axwfwereade: sure thing08:26
fwereadeaxw, so did the addressupdater code not work?08:27
axwfwereade: the what?08:27
fwereadeaxw, you said you fixed addresses in mongo08:27
axwah, maybe I didn't need to do that bit?08:27
fwereadeaxw, rogpeppe: addresses should update automatically once we're running08:27
axwok08:27
fwereaderogpeppe, can you confirm?08:28
rogpeppefwereade, axw: it seemed to work for me08:28
axwrogpeppe: no worries, I was just poking in the database and thought I'd have to update - I'll put a comment in the doc that it was unnecessary08:29
rogpeppefwereade: hmm, i realised i fixed up the rsyslog file, but didn't do anything about restarting rsyslog...08:30
fwereadeaxw, well, technically, we don't know it was unnecessary08:31
fwereadeaxw, rogpeppe: I am a little bit baffled that the "one approach" notes seem to have been used instead of the main doc08:32
rogpeppefwereade: i didn't suggest that08:32
axwfwereade: my mistake, I just picked up the wrong thing08:32
fwereaderogpeppe, I know you didn't suggest that bit08:32
rogpeppefwereade: i thought dimitern had some notes somewhere, but i haven't seen them08:33
fwereaderogpeppe, they're linked in the main document08:33
fwereaderogpeppe, axw, dimitern: fwiw I have no objection to writing your own notes for things, this is good08:33
axwfwereade: just trying to fill in the hand wavy "do X in MAAS" bits :)08:34
fwereaderogpeppe, axw, dimitern: but if they don't filter back into updates to the main doc -- and if they're left lying around without a big link to the canonical one -- we end up with contradictory information smeared around everywhere08:34
axwsure08:34
fwereaderogpeppe, axw, dimitern: eg axw trying to use rogpeppe's incorrect mongo syntax08:35
rogpeppefwereade: tbh dimitern's isn't quite right either, currently08:36
rogpeppedimitern: shall i update it to use $set ?08:36
fwereaderogpeppe, dimitern: fixing your notes is fine if you want08:36
rogpeppefwereade: my notes were fixed when you mentioned the problem FWIW08:36
axwfwereade: I'll run through the main doc and see if I can spot any problems08:37
rogpeppefwereade: it was just a copy/paste failure08:37
fwereaderogpeppe, dimitern, axw: but the artifact we're meant to have *perfect* by now is the main one08:37
fwereaderogpeppe, I don't mind what notes you make, so long as it's 100% clear that they're not meant to be used by anyone else, and they link to the canonical document08:37
fwereaderogpeppe, and I'm pretty sure mramm and I were explicit about using something that understands yaml to read/write yaml files08:38
fwereaderogpeppe, sed, for all its joys, is not aware of the structure of the document;)08:39
rogpeppefwereade: does it actually matter in this case? we know what they look like and how they're marshalled, and the procedure leaves everything else unaffected - it's pretty much what you'd do using a text editor08:39
rogpeppefwereade: i wanted to use something that didn't need anything new installed on the nodes08:40
dimiternfwereade, sorry, just catching up on emails08:40
dimiternfwereade, yes, the $set syntax should work08:40
rogpeppefwereade: and i'm not sure that there's anything yaml-savvy there by default08:40
fwereaderogpeppe, crikey08:41
fwereaderogpeppe, well if that's the case I withdraw my objections08:41
axwrogpeppe: pyyaml is required by cloud-init, so it's on there08:42
fwereaderogpeppe, objections back in force08:42
axwbut... IMHO sed is fine here08:42
* axw makes everyone hate him at the same time08:43
* rogpeppe leaves it to someone with less rusty py skills to do the requisite yaml juggling08:43
* fwereade flings stones indiscriminately08:43
jamaxw: did you check with anyone in #maas if maas-cli still doesn't support uploading? The post from allenap was from June (could be true, and you could have experienced it first hand)08:43
fwereaderogpeppe, did you hear from mgz at all yesterday?08:43
axwjam: the bug is still open, so I didn't08:43
axwbut08:43
axwI couldn't get it to work08:44
rogpeppefwereade: briefly - he'd been offline, but i didn't see his stuff08:44
jamfwereade: mgz posted his plugin to the review queue08:45
axwfwereade: I'll just update the address in mongo back to something crap and make sure the addressupdater does its job; so far the main doc is fine, tho I had to add the quotes into the mongo _id value filters08:46
jamaxw: as long as its "I tried and couldn't, then I found the bug" I'm happy. vs if it was "I found the bug, so I didn't try"08:46
axwjam: definitely the former :)08:47
fwereadeaxw, thanks for fixing the main doc :)08:48
axwnp08:48
fwereadeaxw, and let me know if the address-updating works as expected08:48
axwwill do08:48
bigjoolsjam, axw: it doesn't support uploading still08:49
axwbigjools: thanks for confirming08:50
jamthanks bigjools08:50
axwfwereade: confirmed, addressupdater does its job08:51
axwsorry for the confusion08:51
jamaxw: did you have to set LC or LC_ALL when doing mongodump ?08:52
jamaxw: or is it (possibly) set when you ssh into things08:53
axwjam: I did not, but I didn't check if it was there already; I'll check now08:53
jamthx08:54
axwnot set to anything08:54
axwdunno why it didn't affect me08:54
jamaxw: one thought is that you only have to set it if you don't have the current lang pack installed (which a cloud install may not have) ? not really sure08:55
fwereadejam, rogpeppe: hey, I just thought of something08:58
rogpeppefwereade: oh yes?08:58
jam?08:58
fwereadejam, rogpeppe: we should probably be setting *all* the unit-local settings revnos to 008:59
rogpeppefwereade: i thought of something similar yesterday actually, but not so nice08:59
rogpeppefwereade: that would be a good thing to do08:59
fwereaderogpeppe, yeah, it was inspired by your comments yesterday, it just took a day for it to filter through09:00
jamfwereade: I don't actually know what revnos you are talking about. Mongo txn ids?09:00
rogpeppefwereade: that gets you unit settings, but what about join/leave?09:00
rogpeppejam: the unit agent stores some state locally09:00
rogpeppejam: so that it can be sure to execute the right hooks, even after a restart09:00
fwereaderogpeppe, join/leave should be good, the hook queues reconcile local state against remote09:01
rogpeppefwereade: great09:01
rogpeppefwereade: do config settings need anything special?09:01
fwereaderogpeppe, config settings should also be fine thanks to the somewhat annoying always-run-config-changed behaviour09:01
fwereaderogpeppe, we have a bug for that09:01
rogpeppefwereade: currently we can treat it as a useful feature :-)09:02
fwereaderogpeppe, indeed :)09:02
jamaxw: when you did your testing, did you start machine-0 before updating the agent address in the various units?09:03
rogpeppefwereade: it would be interesting to try to characterise the system behaviour when restoring at various intervals after a backup09:03
axwjam: no, I started it last09:03
rogpeppefwereade: e.g. when the unit/service was created but is not restored09:04
axwjam: sorry, I'll add that step in :)09:04
axwjam: actually09:04
axwI lie09:04
axwI did start it first09:04
rogpeppefwereade: i suspect that's another case where we really don't want to randomly kill unknown instances09:04
fwereadeaxw, dimitern, rogpeppe, mgz, *everyone* -- *please* be *doubly* sure that you test the canonical procedure09:04
jamaxw: actually we *wanted* to do it last09:05
fwereaderogpeppe, well, there's no way to restore those things at the moment anyway09:05
fwereadejam, why?09:05
jamaxw: so update machine-0 config, start it, then go around and fix the agent.conf09:05
dimiternfwereade, ok, i'm starting a fresh test with the canonical procedure now09:05
jamfwereade: didn't you want to split "fixing up mongo + machine-0" from "fixing up all other agents" ?09:05
rogpeppefwereade: agreed, but the user might have important data on those nodes09:05
axwjam: yeah that's what I did, sorry09:06
jamaxw: sorry, "when I say do it last" it was confusing what thing "it" is09:06
jamaxw: start jujud-machine-0  should come before updating agent.conf09:06
fwereadejam, I think we suffered a communication failure -- you seemed to be suggesting he should fix agent confs before starting the machine 0 agent09:06
jamthanks09:06
jamfwereade: yes. I think we all agree on what should be done :)09:07
axwjam: I fixed mongo, started machine-0, fixed provider-state, fixed agent.conf09:07
jamaxw: I'm copying some of your maas specific steps into the doc09:07
axwcool09:07
fwereaderogpeppe, this is true, hence https://codereview.appspot.com/32710043/ -- would you cast your eyes over that please?09:07
rogpeppefwereade: looking09:08
fwereaderogpeppe, there's not much opportunity to fix them, it's true09:08
axwfwereade: I used my plugin to provision the new node; how are people expected to do it without it (and get a valid agent_name)?09:08
fwereaderogpeppe, and the rest of the system should anneal so as to effectively freeze them out09:08
jamrogpeppe: fwereade: are we actually suggesting run "initctl stop" rather than just "stop foo" ?09:09
fwereadejam, I don't think so09:09
jamwe do it differently at different points in the file09:09
rogpeppefwereade: yeah09:09
fwereadejam, where did initctl come from?09:09
jam"sudo start jujud-machine-0" but09:09
jam"for agent in *; do initctl stop juju-$agent"09:09
jamfwereade: in the main doc, I think rogpeppe put it09:09
jamI'll switch it09:09
rogpeppejam: i generally prefer "initctl stop" rather than "stop" as i think it's more obvious, but that's probably just me09:09
rogpeppejam: the two forms are exactly equivalent i believe09:10
dimiternrogpeppe, it's just you :)09:10
dimiternrogpeppe, i preferred service stop xyz before, but now i find stop xyz or start xyz pretty useful09:11
dimiternrogpeppe, and i don't think they are quite equivalent09:11
* rogpeppe thinks it was rather unnecessary for upstart to take control of all those useful verbs09:11
jamrogpeppe: honestly, I think they are at least roughly equivalent, but we should be consistent in the doc09:11
rogpeppedimitern: no?09:11
rogpeppedimitern:09:12
rogpeppe% file /sbin/stop09:12
rogpeppe/sbin/stop: symbolic link to `initctl'09:12
jammain problem *I* had with "service stop" is I always wanted to type it wrong "service stop mysql" vs "service mysql stop"09:12
jamI still am not sure which is correct :)09:12
dimiternrogpeppe, initctl is the same as calling the script in /etc/init.d/xyz {start|stop|etc..}09:12
rogpeppejam: initctl stop mysql09:12
dimiternrogpeppe, whereas start/stop and service are provided by upstart09:12
jamdimitern: yeah, I confirmed rogpeppe is right that stop is a symlink to initctl09:13
jamat least on Precise09:13
rogpeppedimitern: i believe that stop is *exactly* equivalent to initctl stop09:13
rogpeppedimitern: try man 8 stop09:14
dimiternrogpeppe, hmm.. seems right09:14
rogpeppedimitern: (it doesn't even mention the aliases)09:14
rogpeppedimitern: that's why i like using initctl, as it's in some sense the canonical form09:14
jamrogpeppe: can you double check the main doc again. I reformatted the text, and reformatting regexes is scary :)09:14
dimiternrogpeppe, but again, I usually am too lazy to type more, if I can type less :)09:14
jamhttps://docs.google.com/a/canonical.com/document/d/1c1XpjIoj9ob_06fvvGJz7Jm4qS127Wtwd5vw_Jeyebo/edit#09:14
rogpeppedimitern: this is a script :-)09:15
rogpeppejam: looking09:15
jamrogpeppe: actually, it is a document describing what we want other people to type09:15
jamagain, it doesn't matter terribly, but we should be consistent09:15
rogpeppejam: i don't expect anyone to actually type that09:15
jamrogpeppe: that is what this doc *is about* actually09:16
jamrogpeppe: write down what the manual steps are to get things working09:16
jamand then maybe we'll script it later09:16
rogpeppejam: i realise that, but surely anyone that's doing it will copy/paste?09:16
rogpeppejam: rather than manually (and probably wrongly) type it all out by hand09:16
jamrogpeppe: well, C&P except they have to edit bits, and it's actually small, so they'll just type it, and ...09:16
rogpeppejam: i wouldn't trust anyone (including myself) to type out that script by hand09:16
jamlike "8.3 ADDR=<...>"09:17
jamthey *can't* just C&P09:17
rogpeppejam: i deliberately changed it so that the only bit to edit was that bit09:17
dimiternrogpeppe, btw for the copy/paste to work we need to use the correct arguments, like --ssl instead of -ssl ;)09:18
rogpeppedimitern: good catch, done09:18
jamso... have we stopped the "stay on the hangout" bit of the day ?09:23
axwfwereade: I used my plugin to provision the new node; how are people expected to do it without it (and get a valid agent_name)?09:25
dimiternjam, i for one find it a bit distracting tbh09:25
axw(just wondering if I should proceed to fix it or not)09:25
jamaxw: maascli acquire agent_name=XXXXX09:26
axwjam: ah :)09:27
axwthen I shall just let that code sit there for now09:27
axwjam: do you think it's worth putting that in the doc?09:28
jamaxw: well, if you don't mind testing it and finding the exact right syntax, then I'd like it in the doc09:34
axwjam: I'll see what I can do before the family gets home09:35
rogpeppejam, wallyworld_: reviewed https://codereview.appspot.com/32710043/09:36
rogpeppefwereade: ^09:36
wallyworld_rogpeppe: i'll read your comments in detail - the changes i made were what i found i had to do to make the tests pass09:38
rogpeppewallyworld_: what was failing?09:39
wallyworld_otherwise it had issues distinguishing between dead vs extra instances09:39
wallyworld_a number of provisioner tests09:39
wallyworld_concerning seeing which instances were stopped09:39
wallyworld_your proposed code may well work also09:40
rogpeppewallyworld_: so that original variable "unknown" didn't actually contain the unknown instances?09:40
axwjam: there are other things that StartInstance does for MAAS too, like creating the bridge interface09:40
rogpeppewallyworld_: i would very much prefer to change as little logic as possible here09:41
wallyworld_rogpeppe: it also contained dead ones i think from memory09:41
wallyworld_cause the dead ones were removed early from machines map09:41
axwjam: tho I guess this is moot if they're just doing a bare-metal backup/restore09:41
jamaxw: so... we should know if this stuff works by going through the checklist we've created. If we really do need something like juju-provision, then we should document it as such.09:43
axwjam: the problem is that step 1 is vague as to how to achieve the goal09:44
jamaxw: so 1.1 in the main doc is about "provision an instance matching the existing as much as possible"09:45
axwjam: yeah, how? maybe it's obvious to people seasoned in maas, I don't know09:45
jamaxw: as in *we need to put it in there* to help people09:46
jamit may be your juju-provision09:46
jamit may be "maascli do stuff"09:46
jamit may be ?09:46
jambut we shouldn't have ? in that doc :)09:46
axwjam: ok, we're on the same page now: that is what my question was before09:46
axwi.e. is there some other way to do this, or do we still need juju-provision09:46
jamaxw: so we are focused on "manual steps you can do today" in that document, though referencing "there is a script over here you can use"09:48
axwjam: ok. well, fwiw that plugin works fine now, so if we can't figure out something better, there's that09:50
wallyworld_rogpeppe: so i needed to leave the dead machines in the machine map until the allinstances had been checked, so that the difference between machine map and allinstances really represented unknown machines. after that the dead ones could be processed09:51
axwjam: didn't get anywhere with maas-cli; I need to head off now, I'll check in later10:07
jamaxw: np10:07
jamaxw: have a good afternoo10:07
jamafternoon10:07
rogpeppewallyworld_: ok - i'd assumed that unknown really was unknown. i will have a better look at your CL in that light now10:20
rogpeppefwereade: i've added a script to change the relation revnos10:20
fwereaderogpeppe, cool, thanks10:21
dimiternfwereade, the procedure as described checks out10:22
fwereadedimitern, awesomesauce10:24
dimiternfwereade, for ec2 ofc, haven't tried the maas parts10:24
fwereaderogpeppe, great, thanks10:27
fwereadedimitern, would you run rog's new change-version tweak against your env too please?10:27
dimiternfwereade, what's that tweak?10:28
fwereadedimitern, in the doc: if [[ $agent = unit-* ]]10:28
fwereadethen10:28
fwereadesed -i -r 's/change-version: [0-9]+$/change-version: 0/' $agent/state/relations/*/*10:28
fwereadefi10:28
fwereadedimitern, to be run while the unit agent's stopped10:28
fwereadedimitern, it'll trigger a whole round of relation-changed hooks10:28
fwereadedimitern, should be sufficient to bring the environment back into sync with itself even if it was backed up while not in a steady state10:29
dimiternfwereade, i'll try that10:29
dimiternfwereade, wait, which doc? machine doc?10:30
fwereadedimitern, in the canonical source-of-truth doc, in section 8, with the scripts rog wrote10:31
dimiternfwereade, ah, ok10:31
dimiternfwereade, i can see the hooks, seems fine10:34
fwereadedimitern, sweet10:34
dimiternfwereade, rogpeppe, mgz, jam, TheMue, natefinch, standup time10:46
dimiternmgz, jam, TheMue: https://plus.google.com/hangouts/_/calendar/am9obi5tZWluZWxAY2Fub25pY2FsLmNvbQ.mf0d8r5pfb44m16v9b2n5i29ig10:50
jamTheMue: ^^ ? if you want to join10:53
jammgz: ^^10:53
wallyworld_fwereade: pushed some changes. wrt the question - can we call processMachines when setting safe mode - what machine ids would i use in that case?11:48
wallyworld_cause normally the ids come from the changes pushed out by the watcher11:48
rogpeppewallyworld_: i think you could probably get all environ machines and use their ids11:59
wallyworld_rogpeppe: i considered that but in a large environment the performance could be an issue12:00
rogpeppewallyworld_: no worse than the provisioner bouncing12:00
rogpeppewallyworld_: and this is something that won't happen very often at all, i'd hope12:00
wallyworld_hmmm ok12:01
rogpeppewallyworld_: um, actually...12:01
wallyworld_i'll look into it12:01
rogpeppewallyworld_: perhaps you could pass in an empty slice12:01
wallyworld_then it wouldn't pick up any dead machines, but may not matter12:02
rogpeppewallyworld_: i don't think we'll do anything differently with dead machines between safe and unsafe mode12:02
rogpeppewallyworld_: the thing that changes is how we treat instances that aren't in state at all, i think12:03
wallyworld_i thought about using a nil slice and thought it may be an issue but i can't recall why now. i'll look again12:03
rogpeppewallyworld_: BTW you probably only need to call processMachines when provisioner-safe-mode has been turned off12:04
wallyworld_yep, figured that :-)12:05
* TheMue => lunch12:15
rogpeppewallyworld_: reviewed. sorry for the length of time it took.12:28
wallyworld_np, thanks. i'll take a look12:28
wallyworld_rogpeppe: with the life == Dead check - if i remove it, won't we encounter this line  else if !params.IsCodeNotProvisioned(err) {12:32
wallyworld_and exit with an error12:32
rogpeppewallyworld_: i don't *think* it's an error to call InstanceId on a dead machine12:33
wallyworld_well, it will try and find an instance record in the db and fail12:33
wallyworld_or maybe not12:33
wallyworld_i think it will only fail once the machine is removed12:34
wallyworld_i just don't see the point of a rpc round trip12:34
rogpeppewallyworld_: i think it will probably work even then12:34
wallyworld_when it is not needed12:34
rogpeppewallyworld_: it is strictly speaking not necessary, yes, but your comment is only necessary because the context that makes the code correct as written is not inside that function12:35
rogpeppewallyworld_: it only works if we *know* that stopping contains all dead machines12:36
wallyworld_yeah it sorta is - the population of stopping and the processing of that12:36
wallyworld_ok,i see your point12:36
wallyworld_but12:36
wallyworld_the comment clears up any confusion12:37
rogpeppewallyworld_: i'd prefer robust code to a comment, tbh12:37
wallyworld_and i hate invoking rpc unless necessary, and we are trusting that we either get an instance id or that specific error always and we are not sure12:37
wallyworld_calling rpc  unnecessarily can be unrobust also12:38
rogpeppewallyworld_: i believe it's premature optimisation12:38
rogpeppewallyworld_: correctness is much more important here12:38
wallyworld_eliminating rpc is never premature optimisation12:39
wallyworld_especially when we can have 1000s of machines12:39
rogpeppewallyworld_: *any* optimisation is premature optimisation unless you've measured it12:39
wallyworld_except for networking calls12:39
rogpeppewallyworld_: none of this is on a critical time scale12:39
wallyworld_they can be indeterminately long12:39
rogpeppewallyworld_: it's all happening at leisure12:39
wallyworld_but, it is a closed system and errors/delays add up12:40
rogpeppewallyworld_: look, we're getting the instance ids of every single machine in the environment12:40
rogpeppewallyworld_: saving calls for just the dead ones seems like it won't save much at all12:40
rogpeppewallyworld_: if we wanted to save time there, we should issue those rpc's concurrently12:41
jamfwereade: for the fix for "destroy machines". I'd like to warn if you supply --force but it won't be supported, should that go via logger.Warning or is there something in command.Context we would use?12:41
jamthere is a Context.Stderr12:41
wallyworld_rogpeppe: can we absolutely guarantee that for all dead/removed machines, instanceid() will return a value or a not provisioned error?12:42
fwereadejam, I'd write it to context.Stderr, yeah12:42
rogpeppewallyworld_: assuming the api server is up, yes12:42
fwereaderogpeppe, wallyworld_: NotFound?12:43
jamrogpeppe: we don't have any way to make our RPC server pretend an API doesn't actually exist, right?12:43
rogpeppefwereade: can't happen12:43
jamit would be nice for testing backwards compat12:43
rogpeppefwereade: look at the InstanceId implementation12:43
rogpeppefwereade: i wouldn't mind an explicit IsNotFound check too though, for extra resilience12:44
fwereaderogpeppe, looks possible to me12:44
rogpeppefwereade: if (err == nil && instData.InstanceId == "") || (err != nil && errors.IsNotFoundError(err)) {12:44
rogpeppefwereade: err = NotProvisionedError(m.Id())12:44
fwereaderogpeppe, I'm looking at apiserver12:44
wallyworld_looks like it will return not found12:44
wallyworld_looking at api server12:45
rogpeppefwereade: ah, it'll fetch the machine first12:45
wallyworld_that's my issue12:45
rogpeppewallyworld_: in which case, check for notfound too12:45
wallyworld_hence the == dead check12:45
wallyworld_seems rather fragile12:45
rogpeppewallyworld_: will the == dead check help you?12:45
wallyworld_yes, because that short circuits the need for getting instance if12:45
wallyworld_id12:46
wallyworld_so we don't need to guess error codes12:46
jamfwereade: I can give a warning, or I can make it an error, thoughts? (juju destroy-machine --force when not supported should try just plain destroy-machine, or just abort ?)12:46
rogpeppewallyworld_: can't the machine be removed anyway, even if the machine is not dead? it could become dead and then be removed12:46
fwereadejam, I'd be inclined to error, myself, tbh12:46
wallyworld_rogpeppe: if it is not dead, there is also processing for that elsewhere12:47
rogpeppewallyworld_: i think this code is preventing you from calling processMachines with a nil slice12:48
wallyworld_which code specifically?12:49
rogpeppewallyworld_: the "if m.Life() == params.Dead {" code12:49
wallyworld_save me looking, how?12:50
rogpeppewallyworld_: because stopping doesn't contain *all* stopping machines (your comment there is wrong, i think)12:50
rogpeppewallyworld_: it (i *think*) contains all dead machines that we've just been told had their lifecycle change12:51
wallyworld_yes12:51
rogpeppewallyworld_: and this is what makes me think that the code is not robust12:51
wallyworld_but that's what the current processing does12:51
wallyworld_looks at changed machines12:51
rogpeppewallyworld_: no12:51
rogpeppewallyworld_: task.machines contains every machine, i think, doesn't it?12:51
wallyworld_yes, i meant the ids12:52
wallyworld_stopping is populated from the ids12:52
rogpeppewallyworld_: so, if there's a dead machine that's not in the ids passed to processMachines, its instance id will be processed as unknown, right?12:53
wallyworld_i think so12:53
wallyworld_but it would have previously triggered12:53
rogpeppewallyworld_: so this code will be wrong if you pass an empty slice to processMachines, yes?12:53
wallyworld_i'd have to trace it through12:54
rogpeppewallyworld_: (which is something that would be good to do)12:54
rogpeppewallyworld_: please write the code in such a way that it's obviously correct12:54
rogpeppewallyworld_: (which the current code is not, IMHO)12:54
wallyworld_obviously is subjective12:54
rogpeppewallyworld_: ok, *more* obviously :-)12:55
rogpeppewallyworld_: "If a machine is dead, it is already in stopping" is an incorrect statement, I believe. Or only coincidentally correct. And thus it seems wrong to me to base the logic around it.12:55
wallyworld_if a changing machine is dead it is in stopping12:56
wallyworld_that assumption still needs to be true12:56
wallyworld_regardless of if i take out the == dead check12:56
rogpeppewallyworld_: thanks12:57
rogpeppewallyworld_: "if a changing machine is dead it is in stopping" is not the invariant you asserted12:57
wallyworld_what for?12:57
rogpeppewallyworld_: making the change12:57
wallyworld_i haven't yet12:57
rogpeppewallyworld_: oh, sorry, i misread12:57
wallyworld_still trying to see if i can rework it12:57
rogpeppewallyworld_: i think this code should be robust even in the case that there are dead machines that were not in the latest change event12:58
rogpeppeoh dammit12:59
wallyworld_yes. i wonder what the code used to do, i'll look at the old code12:59
rogpeppehmm, this code is the only code that removes machines, right?13:00
wallyworld_i think so13:00
wallyworld_at first glance, i'm not sure if the old code was immune to the issue of ids not containing all dead machines13:01
wallyworld_the old code looks like it used to rely on dead machines being notified via incoming ids13:02
rogpeppewallyworld_: i *think* it was13:02
rogpeppewallyworld_: it certainly relied on that13:02
wallyworld_so i'm doing something similar here then13:03
rogpeppewallyworld_: but the unknown-machine logic didn't rely on the fact that all dead machines were in stopping13:03
rogpeppewallyworld_: which your code does13:03
wallyworld_hmmm.13:03
fwereadewallyworld_, rogpeppe: I'd really prefer to avoid further dependencies on machine status, the pending/error stuff is bad enough as it is13:03
rogpeppewallyworld_, fwereade: BTW i can't see any way that a machine that has not been removed could return a not-found error from the api InstanceId call, can you?13:03
rogpeppefwereade: i'm not quite sure what you mean there13:04
fwereaderogpeppe, wallyworld_: the "stopping" sounded like a reference to the status -- as in SetStatus13:04
rogpeppefwereade: nope13:04
fwereaderogpeppe, wallyworld_: ok sorry :)13:05
rogpeppefwereade: i'm talking about the stopping slice in provisioner_task.go13:05
rogpeppefwereade: and in particular to the comment at line 288 of the proposal:13:05
wallyworld_what he said13:05
rogpeppe// If a machine is dead, it is already in stopping and13:05
rogpeppe// will be deleted from instances below. There's no need to13:05
rogpeppe// look at instance id.13:05
fwereaderogpeppe, wallyworld_: wrt machine removal: destroy-machine --force *will* remove from state, but I'd be fine just dropping that last line in the cleanup method and leaving the provisioner to finally remove it13:06
rogpeppefwereade: this discussion is stemming from my remark on that comment13:06
rogpeppefwereade: that would be much better13:06
wallyworld_one place to remove is best13:06
rogpeppefwereade: otherwise we can leak that machine's instance id13:06
rogpeppefwereade: if we're in safe mode13:06
=== gary_poster|away is now known as gary_poster
fwereaderogpeppe, wallyworld_: I saw I'd done that the other day and thought "you idiot", for I think exactly the same reasons, consider a fix for that pre-blessed13:07
wallyworld_rogpeppe: i think i can see your point13:07
rogpeppewallyworld_: phew :-)13:08
wallyworld_that stopping won't contain all dead machines13:08
wallyworld_sorry, it's late here, i'm tired, that's my excuse :-)13:08
rogpeppewallyworld_: np13:08
rogpeppewallyworld_: thing is, it *probably* does, but i don't think it's an invariant we want to rely on implicitly13:08
wallyworld_i was originally worried about the error fragility13:09
rogpeppewallyworld_: especially because we can usefully break that invariant to good effect (by passing an empty slice to processMachines)13:09
wallyworld_i'm still quite concerned about all the rpc calls we make (in general)13:09
rogpeppewallyworld_: an extra piece of code explicitly ignoring a not-found error too would probably be a good thing to add13:10
wallyworld_ok13:10
rogpeppewallyworld_: well, me too, but you'll only be saving a tiny fraction of them here13:10
wallyworld_yeah, we really need a bulk instance id call - i thought all our apis were supposed to be bulk13:10
wallyworld_putting remote interfaces on domain objects eg machine is also wrong, but thats another discussion13:11
wallyworld_imagine a telco with 10000 or more machines13:12
rogpeppewallyworld_: they are, kinda, but a) we don't make them available to the client and b) we don't implement any server-side optimisation that would make it significantly more efficient13:12
wallyworld_well, here the provisioner is a client13:12
wallyworld_task13:12
rogpeppewallyworld_: if we had 10000 or more machines, we would not want to process them all in a single bulk api call anyway13:12
rogpeppewallyworld_: indeed13:13
wallyworld_sure, but that optimisation can be done under the covers13:13
wallyworld_the bulk api can batch13:13
wallyworld_so bottom line - we can't claim to scale well just yet13:13
wallyworld_more work to do13:13
rogpeppewallyworld_: to be honest, just making concurrent API calls here would yield a perfectly sufficient amount of speedup, even in the 10000 machine case, i think13:14
rogpeppewallyworld_: without any need for more mechanism13:14
wallyworld_you mean using go routines?13:14
rogpeppewallyworld_: yeah13:15
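The concurrent-call approach rogpeppe suggests can be sketched roughly as below. This is an illustration only, not juju-core code: `fetchInstanceIds`, the `lookup` callback, and the concurrency bound are all hypothetical stand-ins for the real API round trips.

```go
package main

import (
	"fmt"
	"sync"
)

// fetchInstanceIds issues one lookup per machine id concurrently,
// bounded by maxConcurrent, and collects results in input order.
// lookup stands in for a per-machine RPC such as Machine.InstanceId.
func fetchInstanceIds(ids []string, maxConcurrent int, lookup func(string) (string, error)) ([]string, error) {
	results := make([]string, len(ids))
	errs := make([]error, len(ids))
	sem := make(chan struct{}, maxConcurrent) // limits in-flight calls
	var wg sync.WaitGroup
	for i, id := range ids {
		wg.Add(1)
		go func(i int, id string) {
			defer wg.Done()
			sem <- struct{}{}        // acquire a slot
			defer func() { <-sem }() // release it
			results[i], errs[i] = lookup(id)
		}(i, id)
	}
	wg.Wait()
	for _, err := range errs {
		if err != nil {
			return nil, err
		}
	}
	return results, nil
}

func main() {
	lookup := func(id string) (string, error) {
		return "i-" + id, nil // stand-in for a network call
	}
	out, err := fetchInstanceIds([]string{"0", "1", "2"}, 2, lookup)
	fmt.Println(out, err) // [i-0 i-1 i-2] <nil>
}
```

With a bounded semaphore like this, total wall-clock time approaches O(n/maxConcurrent) rather than O(n) sequential round trips, which is the speedup being discussed.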
wallyworld_well, that could happen under the covers13:15
wallyworld_but we need to expose a bulk api to callers13:15
rogpeppewallyworld_: i'm not entirely convinced.13:15
wallyworld_and then the implementation can decide how best to do it13:15
rogpeppewallyworld_: the caller may well want to do many kinds of operation at the same time. bulk calls are like vector ops - they only allow a single kind of op to be processed many times13:16
rogpeppewallyworld_: that may not map well to the caller's requirements13:16
wallyworld_yes, which is why remote apis need to be designed to match the workflow13:16
rogpeppewallyworld_: agreed13:16
wallyworld_ours are just a remoting layer on top of server methods13:16
wallyworld_which is kinda sad13:17
rogpeppewallyworld_: which is why i think that one-size-fits all is not a good fit for bulk methods13:17
rogpeppewallyworld_: actually, it's perfectly sufficient, even for implementing bulk calls13:17
wallyworld_all remote methods should be bulk, but how stuff is accumulated up for the call is workflow dependent13:17
rogpeppewallyworld_: it's just a name space mechanism13:18
wallyworld_any time a remote workflow needs O(N) method calls it's bad13:18
rogpeppewallyworld_: there are many calls where a bulk version of the call is inevitably O(n)13:18
wallyworld_it shouldn't be if designed right13:18
wallyworld_to match the workflow13:19
rogpeppewallyworld_: if i'm adding n services, how can that not be O(n) ?13:19
wallyworld_what i mean is - if you have N objects, you don't make N remote calls to get info on each one13:19
wallyworld_i don't mean the size of the api13:19
wallyworld_but the call frequency13:19
wallyworld_to get stuff done13:20
rogpeppewallyworld_: if calls can be made concurrently (which they can), then the overall time can still be O(1)13:20
wallyworld_the client should not have to manually do that boilerplate13:20
rogpeppewallyworld_: assuming perfect concurrency at the server side of course :-)13:20
rogpeppewallyworld_: now that's a different argument, one of convenience13:20
wallyworld_so imagine if you downloaded a file and the networking stack made you as a client figure out how to chunk it13:21
rogpeppewallyworld_: personally, i think it's reasonable that API calls are exactly as easy to make concurrent as calling any other function in Go13:21
wallyworld_no - rpc calls should never be treated like normal calls13:21
rogpeppewallyworld_: it does13:21
wallyworld_networked calls are always different13:22
rogpeppewallyworld_: i disagree totally13:22
wallyworld_so, you've never read the 7 fallacies of networked code or whatever that paper is called?13:22
rogpeppewallyworld_: any time you call http.Get, it looks like a normal call but is networking under the hood.13:22
rogpeppewallyworld_: we should not assume that it cannot fail, of course13:23
rogpeppewallyworld_: and that's probably one of the central fallacies13:23
wallyworld_people know http get is networked and do tend to program around it accordingly13:23
rogpeppewallyworld_: but a function works well to encapsulate arbitrary network logic13:23
rogpeppewallyworld_: sure, you should probably *know* that it's interacting with the network, but that doesn't mean that calling a function that interacts with the network in some way is totally different from calling any other function that interacts in some way with global state13:24
rogpeppewallyworld_: in a way that can potentially fail13:24
wallyworld_it is different - networks can disappear, have arbitrary lag, different failure modes etc etc13:25
wallyworld_the programming model is different13:25
rogpeppewallyworld_: not really - the function returns an error - you deal with that error13:25
wallyworld_it is different at a higher level than that13:26
rogpeppewallyworld_: i don't believe that any network interaction breaks all encapsulation13:26
wallyworld_see http://www.rgoarchitects.com/files/fallacies.pdf13:26
rogpeppewallyworld_: which is what i think you're saying13:26
rogpeppewallyworld_: i have seen that13:27
rogpeppewallyworld_: i'm not sure how encapsulating a networking operation in a function that returns an error goes against any of that13:27
wallyworld_the apis design, error handling and all sorts of other things are different when dealing with networked apis13:27
wallyworld_the encapsulation isn't the issue13:28
wallyworld_it's the whole api design13:28
wallyworld_and underlying assumptions about how such apis can be called13:28
rogpeppewallyworld_: i don't understand13:28
wallyworld_case in point - it might make sense to call instanceId() once per machine for 10000 machines when inside a service where a machine domain object is colocated, but it is madness to do that over a network13:29
wallyworld_the whole api decomposition, assumptions about errors, retries etc needs to be different for networked apis13:30
rogpeppewallyworld_: so, there's no reason that where we need it, we couldn't have State.InstanceIds(machineIds ...string) as well as Machine.InstanceId13:30
wallyworld_we should never have machine.InstanceId() - networked calls do not belong on domain objects but services13:31
rogpeppewallyworld_: well, it's certainly true that some designs can make that necessary; eventual consistency for one breaks a lot of encapsulation13:31
wallyworld_that's the big mistake java made with EJB 1.013:31
wallyworld_and it took a decade to recover13:31
rogpeppewallyworld_: what's the difference between machine.InstanceId() and InstanceId(machine) ?13:32
wallyworld_domain objects encapsulate state; they shouldn't call out to services13:32
jamdimitern: trivial review of backporting your rpc.IsNoSuchRPC to 1.16: https://codereview.appspot.com/3285004313:33
wallyworld_the first example above promotes single api calls13:33
wallyworld_which is bad13:33
rogpeppewallyworld_: and the second one doesn't?13:33
dimiternwallyworld_, looking13:33
wallyworld_the second should be a bulk call on a service13:33
rogpeppewallyworld_: even if it doesn't make sense to be a bulk call?13:34
dimiternwallyworld_, the diff is messy13:34
rogpeppewallyworld_: anyway, i think this is somewhat of a religious argument :-)13:34
jamdimitern: did you mean jam ?13:34
rogpeppewallyworld_: we should continue at some future point, over a beer.13:34
wallyworld_rogpeppe: it always makes sense to provide bulk calls, and if there happens to be only one, just pass that in as a single element array13:35
wallyworld_yes13:35
dimiternjam, oops yes13:35
rogpeppewallyworld_: i'm distracting you :-)13:35
wallyworld_yes13:35
wallyworld_:-)13:35
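The bulk-call shape wallyworld_ argues for might look like the sketch below. All the names here (`MachineService`, `InstanceInfo`, `fakeService`) are hypothetical, chosen only to illustrate the one-round-trip-for-many-machines design; the single-machine case is just a one-element slice.

```go
package main

import "fmt"

// InstanceInfo carries one per-machine result; a bulk call returns one
// entry per requested machine id, in order, so individual failures don't
// fail the whole request.
type InstanceInfo struct {
	InstanceId string
	Err        error
}

// MachineService is the bulk-style facade: many machines, one round trip.
type MachineService interface {
	InstanceIds(machineIds []string) []InstanceInfo
}

// fakeService is an in-memory stand-in for the real remote facade.
type fakeService map[string]string

func (s fakeService) InstanceIds(machineIds []string) []InstanceInfo {
	out := make([]InstanceInfo, len(machineIds))
	for i, id := range machineIds {
		if inst, ok := s[id]; ok {
			out[i] = InstanceInfo{InstanceId: inst}
		} else {
			out[i] = InstanceInfo{Err: fmt.Errorf("machine %s not provisioned", id)}
		}
	}
	return out
}

func main() {
	svc := fakeService{"0": "i-abc", "1": "i-def"}
	// Bulk: many machines, one call.
	fmt.Println(svc.InstanceIds([]string{"0", "1", "2"}))
	// Single machine: same API, one-element slice.
	fmt.Println(svc.InstanceIds([]string{"1"}))
}
```

The per-entry error field is what lets a caller with 10000 machines make one request and still see exactly which machines were unprovisioned.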
jamdimitern: the diff looks clean here, is it because of unified vs side-by-side?13:35
wallyworld_i've seen too many systems fall over due to the issues i am highlighting13:35
jamI have "old chunk mismatch" in side-by-side but it looks good in unified, I think13:36
jamugh, it is targetting trunk13:36
dimiternjam, yeah, the s-by-s diff is missing13:36
jamI thought I stopped it in time13:36
jamdimitern: so I'll repropose, lbox broke stuff13:36
jamyou can look at the unified diff, and that will tell you what you'll see in a minute or so13:36
dimiternjam, cheers13:36
jamdimitern: https://codereview.appspot.com/32860043/ updated13:40
dimiternjam, lgtm, thanks13:41
jamdimitern, fwereade: if you want to give it a review, this is the "compat with 1.16.3" for 1.16.4 destroy-machines, on the plus side, we *don't* have to fix DestroyUnit because that API *did* exist. (GUI didn't think about Machine or Environment, but it *did* think about Units)13:47
jamhttps://codereview.appspot.com/3288004313:47
dimiternjam, looking13:47
dimiternjam, lgtm13:51
jamfwereade: do you want to give an eyeball if that seems to be a reasonable way to do compat code? We'll be using it as a template for future compat13:51
fwereadejam,will do, we have that meeting in a sec13:52
jamfwereade: sure, but it is 1hr past my EOD, and my son needs me to take him to McDonalds :)13:52
fwereadejam, ok then, I will look as soon as I can, thanks13:53
jamfwereade: no rush on your end13:53
jamI think it is ~ok, though I'd *love* to actually have tests that compat is working13:53
wallyworld_rogpeppe: more changes pushed. but calling processMachines(nil) hangs the tests so that bit is not there yet13:53
jamsinzui: maybe we could do cross version compat testing in CI for stuff we know changed?13:54
jamI could help write those tests13:54
fwereadewallyworld_, might processMachines(nil) be a problem if the machines map is empty?13:54
rogpeppewallyworld_: looking13:54
rogpeppewallyworld_: could you propose again? i'm getting chunk mismatch13:55
wallyworld_fwereade: could be, i haven't traced through the issue yet fully. not sure how much further i'll get tonight, it's almost midnight and i'm having trouble staying awake13:55
fwereadewallyworld_, ok, stop now :)13:55
fwereadewallyworld_, tired code sucks13:55
fwereadewallyworld_, landing it now will not make the world of difference13:56
wallyworld_yep. i don't have to be tired to write sucky code :-)13:56
rogpeppewallyworld_, fwereade: i could try to take it forward. mgz is now online so can probably take the bootstrap-update stuff forward13:56
fwereadewallyworld_, ;p13:56
fwereaderogpeppe, wallyworld_, mgz: if that works for you all, go for it13:56
rogpeppeor, it probably doesn't make much difference, as fwereade says13:56
wallyworld_rogpeppe: i pushed again13:56
rogpeppewallyworld_: thanks13:56
rogpeppewallyworld_: you need to lbox propose again.13:58
rogpeppewallyworld_: oh, hold on!13:58
wallyworld_a third time?13:58
rogpeppewallyworld_: page reload doesn't work, i now remember13:58
* wallyworld_ hates Rietveld13:58
rogpeppewallyworld_: ah, it works, thanks!13:59
rogpeppewallyworld_: that bit is really shite, it's true13:59
rogpeppewallyworld_: i saw a proposal recently to fix the upload logic13:59
wallyworld_hope they land it soon13:59
rogpeppewallyworld_: it would be nice if the whole thing was a little more web 2.0, so you didn't have to roundtrip to the server all the time.14:00
wallyworld_yeah14:00
wallyworld_that also messes up browser history14:01
sinzuijam, I had the same idea. I added it to my proposal of what we want to see about a commit in CI https://docs.google.com/a/canonical.com/spreadsheet/ccc?key=0AoY1kjOB7rrcdEl3dWl0NUM3RzE2dXFxcGxwbVZtUFE&usp=drive_web#gid=014:03
rogpeppewallyworld_: i think i know why your processMachines(nil) call might be failing14:06
wallyworld_ok14:06
rogpeppewallyworld_: were you calling it from inside SetSafeMode?14:07
wallyworld_yeah14:07
rogpeppewallyworld_: thought so. that's not good - it needs to be called within the main provisioner task look14:07
rogpeppes/look/loop/14:07
wallyworld_ok14:08
rogpeppewallyworld_: so i think the best way to do that is with a channel rather than using a mutex14:08
wallyworld_rogpeppe: but setsafemode is called from the loop14:08
rogpeppewallyworld_: it is?14:08
wallyworld_ah, provisioner loop14:09
rogpeppewallyworld_: yup14:09
wallyworld_not provisioner task14:09
rogpeppewallyworld_: indeed14:09
wallyworld_save me tracing through the code, why does it matter?14:09
rogpeppewallyworld_: because there is lots of logic in the provisioner task that relies on single-threaded access (all the state variables in environProvisioner)14:10
rogpeppewallyworld_: that's why we didn't need a mutex there14:10
wallyworld_makes sense14:11
rogpeppewallyworld_: you'll have to be a bit careful with the channel (you probably don't want the provisioner main loop to block if the provisioner task isn't ready to receive)14:13
wallyworld_yeah, channels can be tricky like that14:14
hazmatif anyone has a moment, i would appreciate a review of this trivial that resolves two issues with manual provider, https://code.launchpad.net/~hazmat/juju-core/manual-provider-fixes14:14
rogpeppewallyworld_: this kind of idiom can be helpful: http://paste.ubuntu.com/6479150/14:15
wallyworld_rogpeppe: thanks, i'll look to use something like that14:16
rogpeppewallyworld_: it works well when there's a single producer and consumer14:16
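The paste itself has since expired, but a common form of the idiom rogpeppe describes (single producer, single consumer, non-blocking signal) looks roughly like this. The function names are illustrative, not from juju-core:

```go
package main

import "fmt"

// requestWork asks the consumer to run once. The send is non-blocking:
// if a signal is already pending on the one-slot buffered channel, the
// new request coalesces with it, so the producer (e.g. the provisioner
// main loop noticing a safe-mode change) never blocks waiting on the
// consumer (e.g. the provisioner task loop).
func requestWork(kick chan struct{}) {
	select {
	case kick <- struct{}{}:
	default: // a signal is already pending; coalesce
	}
}

// drain consumes all pending signals and returns how many were delivered;
// in the real task loop this is where the work (e.g. processMachines)
// would run.
func drain(kick chan struct{}) int {
	n := 0
	for {
		select {
		case <-kick:
			n++
		default:
			return n
		}
	}
}

func main() {
	kick := make(chan struct{}, 1) // capacity 1: at most one pending signal
	requestWork(kick)
	requestWork(kick) // coalesced with the first
	fmt.Println("signals delivered:", drain(kick)) // prints 1, not 2
}
```

The buffer size of 1 is the whole trick: it guarantees the producer never blocks while still guaranteeing the consumer wakes at least once after any request.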
rogpeppehazmat: i'll look when the diffs are available. codereview would be more conventional.14:17
hazmatrogpeppe, doh.14:18
hazmatrogpeppe, its a 6 line diff fwiw14:18
rogpeppehazmat: lp says "An updated diff will be available in a few minutes. Reload to see the changes."14:18
hazmathttp://bazaar.launchpad.net/~hazmat/juju-core/manual-provider-fixes/revision/209514:19
* hazmat lboxes14:19
hazmatrogpeppe, https://codereview.appspot.com/3289004314:22
rogpeppehazmat: axw_ might have some comments on the LookupAddr change.14:24
hazmatrogpeppe, what it was doing previously was broken14:24
rogpeppehazmat: it looks like it was done like that deliberately.14:24
rogpeppehazmat: agreed.14:24
hazmatrogpeppe, yes deliberately broken, i've already discussed with axw14:25
rogpeppehazmat: it should at the least fall back to the original address14:25
hazmatrogpeppe, it hangs indefinitely14:25
rogpeppehazmat: ok, if you've already discussed, that's fune14:25
rogpeppefine14:25
hazmatrogpeppe, and there's no reason for requiring dns name14:25
rogpeppehazmat: hmm, hangs indefinitely?14:25
rogpeppehazmat: ah, if it doesn't resolve, then WaitDNSName will loop14:26
rogpeppehazmat: yeah, i think that's fair enough. the only thing i was wondering was if something in the manual provider used the address to name the instance14:26
rogpeppehazmat: but even then, a numeric address should be fine14:26
hazmatyes.. slavish adherence to name-is-name, when the name is actually an address, and the api should get renamed.14:27
hazmattying address to name is the issue14:27
rogpeppehazmat: yeah.14:27
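The fallback rogpeppe suggests can be sketched as below; `hostName` is a hypothetical helper, not a juju-core function, showing a reverse lookup that falls back to the address itself instead of failing (or letting WaitDNSName loop forever) when there is no PTR record.

```go
package main

import (
	"fmt"
	"net"
)

// hostName attempts a reverse DNS lookup of addr and falls back to the
// address itself when the lookup fails or returns nothing, so a numeric
// address is always an acceptable "name".
func hostName(addr string) string {
	names, err := net.LookupAddr(addr)
	if err != nil || len(names) == 0 {
		return addr // no reverse DNS: use the address as the name
	}
	return names[0]
}

func main() {
	// 192.0.2.1 is TEST-NET-1 (documentation range), which has no PTR
	// record, so the input comes back unchanged.
	fmt.Println(hostName("192.0.2.1"))
}
```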
* hazmat grabs a cup of coffee14:28
rogpeppehazmat: i think the api was originally named after the ec2 name14:28
jamsinzui: * vs x is ?14:37
jamstuff that is done, vs proposed ?14:37
jamor stuff that is done but failing tests14:37
jamsinzui: if you can give me a template or some sort of process to write tests for you, I can do a couple14:39
sinzuijam, in15 minutes I can14:40
hazmatrogpeppe, thanks for the review, replied and pushed.14:52
rogpeppehazmat: looking14:53
rogpeppehazmat: LGTM14:54
jamsinzui: no rush on my end. I'm EOD and just stopping by IRC from time to time15:06
sinzuijam, okay, I will send an email to the juju-dev list so that the knowledge is documented somewhere15:07
natefinchis there a way to move a window that's off the screen back onto the screen?  I know windows tricks to do it, but not linux. (and I know about workspaces, I'm not using them)15:10
rogpeppenatefinch: i enabled workspaces for that reason only15:11
natefinchrogpeppe: heh, well, maybe I should turn them back on15:11
TheMuenatefinch: if you click on the workspace icon in the bar you'll get all four and can move windows15:15
natefinchTheMue: I had workspaces off.... I think Ubuntu just gets confused when I go from one monitor to multiple monitors and back again15:16
=== teknico_ is now known as teknico
TheMuenatefinch: computers don't have to have more than one monitor *tryingToSoundPowerful*15:17
TheMue;)15:17
natefinchhaha15:17
natefinchAnd I turned off workspaces because the keyboard shortcuts don't work :/15:18
* rogpeppe goes for a bite to eat15:52
hazmatare we doing 2 LGTM for branches or one?16:03
natefinchone16:04
hazmatnatefinch, thanks16:05
hazmatis there a known failing test in trunk?16:50
hazmatie cd juju-core/juju && go test -> http://pastebin.ubuntu.com/6479834/16:50
dimiternhazmat, which one?16:50
natefinchhazmat: thats a pretty common sporadic failure, yes.16:51
dimiternhazmat, yeah, that's known16:51
dimiternhazmat, it's pretty random to reproduce16:51
rogpeppehazmat: if you have a way of reliably reproducing it, i want to know16:51
hazmatk, it seems to happen fairly regularly for me16:51
hazmatrogpeppe, atm on my local laptop i can reproduce every time.. generating verbose logs atm16:52
rogpeppehazmat: do you get it when running the juju package tests on their own?16:53
hazmatrogpeppe, here's verbose logs on the same http://paste.ubuntu.com/6479841/16:53
hazmatrogpeppe, yes i do16:53
rogpeppehazmat: and this is on trunk?16:53
dimiternhazmat, can you check your /tmp folder to see and suspicious things - like too many mongo dirs or gocheck dirs?16:53
hazmatrogpeppe, if i just run -gocheck.f "DeployTest*" i don't get failure16:53
hazmatdimitern, not much in /tmp  three go-build* dirs16:54
hazmatrogpeppe, yes on trunk16:54
dimiternhazmat, ok, so it's not related then16:54
dimiternhazmat, running a netstat dump of open/closing/pending sockets to mongo might help16:55
rogpeppehazmat: is it always TestDeployForceMachineIdWithContainer that fails?16:55
hazmatrogpeppe, checking.. it's failed a few times on that one.. every time.. not sure16:56
hazmatrogpeppe, yeah.. it does seem to happen primarily on that one16:56
rogpeppehazmat: how about: go test -gocheck.f DeploySuite ?16:56
hazmatrogpeppe, i think that works fine.. it's just testing the whole package that fails16:56
hazmatyeah. that works fine16:56
hazmathmm16:57
rogpeppehazmat: i'd quite like to try bisecting to see which other tests cause it to fail16:57
hazmatrogpeppe, hold on  a sec.. your cli for gocheck.f  results in zero tests16:57
rogpeppehazmat: oops, sorry, DeployLocalSuite16:57
rogpeppehazmat: go test -gocheck.list will give you a list of all the tests it's running16:58
hazmatyeah.. all tests pass16:58
hazmatif running just that suite16:58
rogpeppehazmat: ok...16:58
rogpeppehazmat: how about go test -gocheck.f 'DeploySuite|ConnSuite' ?16:58
hazmatrogpeppe, thanks for the tip re -gocheck.list16:59
hazmatrogpeppe, that fails running both different test failure DeployLocalSuite.TestDeploySettingsError16:59
hazmatsame error16:59
rogpeppehazmat: good17:00
rogpeppehazmat: now how about go test -gocheck.f 'DeploySuite|^ConnSuite' ?17:00
hazmatrogpeppe, fwiw re Deploy|Conn -> http://paste.ubuntu.com/6479877/17:00
rogpeppehazmat: oops, that doesn't match what i thought it would17:01
hazmatrogpeppe, yeah.. it runs both still17:02
hazmatrogpeppe,  you meant this ? go test -v -gocheck.vv -gocheck.f 'DeployLocalSuite|!NewConnSuite'17:02
rogpeppehazmat: ok, instead of juggling regexps, how about putting c.Skip("something") in the SetUpSuite of all the suites except NewConnSuite, ConnSuite and DeployLocalSuite?17:02
rogpeppehazmat: no, i was trying to specifically exclude ConnSuite17:03
hazmatrogpeppe, that's what it does17:03
hazmatrogpeppe, that cli only runs deploy local suite tests17:03
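The regexp juggling above has a simple explanation: the `-gocheck.f` flag is a plain Go regexp matched against test names, and Go's RE2 syntax has no negation operator, so alternation can only ever *add* suites, never exclude them (and a branch like `!NewConnSuite` just looks for a literal `!`, matching nothing). A quick check, with illustrative names rather than the real juju-core ones:

```go
package main

import (
	"fmt"
	"regexp"
)

// filter mimics passing -gocheck.f "DeployLocalSuite|^ConnSuite":
// the second branch was intended to exclude ConnSuite, but as a plain
// RE2 alternation it includes it instead.
var filter = regexp.MustCompile("DeployLocalSuite|^ConnSuite")

// selected reports whether a "Suite.TestMethod" name matches the filter.
func selected(name string) bool {
	return filter.MatchString(name)
}

func main() {
	for _, name := range []string{
		"ConnSuite.TestNewConn",       // matched by ^ConnSuite
		"NewConnSuite.TestBadURL",     // matched by neither branch
		"DeployLocalSuite.TestDeploy", // matched by the first branch
	} {
		fmt.Printf("%s: %v\n", name, selected(name))
	}
}
```

To actually narrow the run, you have to enumerate what you want (or skip suites in SetUpSuite, as rogpeppe suggests next); there is no "everything except X" pattern.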
rogpeppehazmat: hopefully you can then run go test and it'll still fail17:07
hazmatrogpeppe, so it passes with 'NewConnSuite|ConnSuite' and fails if i add |DeployLocalSuite17:08
rogpeppehazmat: then we can try skipping NewConnSuite17:08
hazmatk17:08
hazmatrogpeppe, fails with ConnSuite|DeployLocalSuite17:08
rogpeppehazmat: woo17:09
rogpeppehazmat: does anything change if you comment out the "if s.conn == nil { return }" line in ConnSuite.TearDownTest ?17:10
hazmatrogpeppe, no.. still fails with ConnSuite|DeployLocalSuite and that part commented out17:13
rogpeppehazmat: ok, that was a long shot :-)17:13
rogpeppehazmat: could you skip all the tests in connsuite, then gradually reenable and see when things start failing again?17:14
hazmatrogpeppe, sure17:15
rogpeppehazmat: hold on, i might see it17:15
rogpeppehazmat: try skipping just TestNewConnFromState first17:15
rogpeppehazmat: oh, no, that's rubbish17:16
rogpeppehazmat: ignore17:16
rogpeppehazmat: but ConnSuite does seem to be an enabler for the DeployLocalSuite failure, so i'd like to know what it is that's the trigger17:16
hazmatrogpeppe, lunch break, back in 2017:17
rogpeppehazmat: k17:17
hazmatback, and walking through the tests17:20
hazmatrogpeppe, interesting.. i added a skip to the top of every test method in ConnSuite, and it still fails when doing ConnSuite|DeployLocalSuite17:23
rogpeppehazmat: ah ha! i wondered if that might happen17:24
rogpeppehazmat: what happens if you actually comment out (or rename as something not starting with "Test") the test methods in ConnSuite?17:25
dimiternrogpeppe, what i'm seeing when it happens on my machine, is that the SetUpTest (or SetUpSuite - can't remember exactly) is the thing that fails17:27
rogpeppedimitern: which SetUpTest?17:27
dimiternwhich causes one of a few tests to fail17:27
hazmatrogpeppe, odd.. that gets a failure (deploymachineforceid), but effectively renaming all the tests negates the suite so... it should be equivalent to running DeployLocalSuite by itself.. which still works for me.17:27
dimiternrogpeppe, DeployLocalSuite - always17:27
hazmathmm.. rerunning gets failure on DeployLocalSuite.TestDeployWithForceMachineRejectsTooManyUnits17:27
rogpeppedimitern: i'm very surprised it's SetUpTest, because i don't think that checks for state connection closing17:27
rogpeppehazmat: that's which which tests commented out?17:28
hazmatTearDownTest that fails for me17:28
rogpeppes/which/with/17:28
rogpeppedimitern: i think it's usually TearDownTest because that calls MgoSuite.TearDownSuite17:28
hazmatrogpeppe, yes that's with tests prefixed with XTest, the suite doesn't show up at all in -gocheck.list17:28
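The renaming trick works because gocheck-style runners discover tests by reflecting over a suite's exported methods and keeping those whose names start with "Test"; prefixing a method with X removes it from that set without deleting the code. A minimal stdlib sketch of that discovery mechanism (the ConnSuite type here is a stand-in for illustration, not the real juju-core suite, and gocheck's actual implementation differs in detail):

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// ConnSuite stands in for a gocheck test suite. Renaming a method
// from TestFoo to XTestFoo hides it from prefix-based discovery.
type ConnSuite struct{}

func (s *ConnSuite) TestNewConnFromState() {} // discovered
func (s *ConnSuite) XTestDeploy()          {} // effectively skipped
func (s *ConnSuite) SetUpSuite()           {} // fixture, not a test

// testMethods lists the methods of v whose names begin with "Test",
// mimicking how gocheck-style runners find test methods.
func testMethods(v interface{}) []string {
	t := reflect.TypeOf(v)
	var names []string
	for i := 0; i < t.NumMethod(); i++ {
		if name := t.Method(i).Name; strings.HasPrefix(name, "Test") {
			names = append(names, name)
		}
	}
	return names
}

func main() {
	fmt.Println(testMethods(&ConnSuite{})) // only TestNewConnFromState
}
```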
rogpeppeTearDownTest, of course17:28
dimiternhazmat, rogpeppe, ha, yes - it was TearDownTest in fact with me as well17:29
rogpeppehazmat: interesting17:29
rogpeppehazmat: so just to sanity check, you still see failures if you comment out or delete all except SetUpSuite and TearDownSuite in ConnSuite?17:29
hazmatk17:30
dimiternbut can't reproduce it consistently - maybe one in 10 runs, but maybe not, and only when I run all the tests from the root dir17:30
rogpeppedimitern: i can't reproduce it even that reliably17:30
rogpeppedimitern: which why i get excited when someone can :-)17:30
rogpeppewhich is why...17:30
hazmatrogpeppe, yeah.. still fails17:31
hazmatrogpeppe, even with everything commented but the suite setup/teardown17:31
rogpeppehazmat: now we're starting to get suitably weird17:31
hazmatrogpeppe, and still passes if i run DeployLocalSuite in isolation17:31
dimiternhazmat, version of go?17:31
hazmat1.1.217:31
dimiternmaybe it's something related to parallelizing tests gocheck does?17:32
rogpeppehazmat: again to sanity check, does it pass if you comment out the MgoSuite.(SetUp|TearDown)Suite calls in ConnSuite?17:32
hazmati can switch versions of go if that helps.. i was running trunk of go for a little while, but its pretty broken with juju (and go trunk)17:32
dimiternhazmat, no, i'm on 1.1.2 as well17:32
rogpeppehazmat: please don't switch now!17:32
hazmat:-) ok17:32
dimitern:)17:33
* dimitern brb17:33
rogpeppehazmat: (though FWIW i'm using go 1.2rc2)17:33
hazmatrogpeppe, i had lots of issues with ec2/s3 and trunk.. (roughly close to 1.2rc2)  couldn't even bootstrap17:33
hazmatwhich is why i walked back to 1.1.217:34
rogpeppehazmat: weird. i've had no probs.17:34
rogpeppehazmat: i hope you filed bug reports17:34
hazmatrogpeppe, something for another time.. no i didn't.. i've fallen out of the habit of filing bug reports.. i should get back into it17:34
hazmatrogpeppe, so that still fails with mgoSuite teardown/setup calls commented in ConnSuite17:34
rogpeppehazmat: oh damn17:35
rogpeppehazmat: now that's even weirder17:35
rogpeppehazmat: what if you comment out the LoggingSuite calls?17:35
rogpeppehazmat: (leaving ConnSuite as a do-nothing-at-all test suite)17:36
hazmatrogpeppe, sorry i think i missed something on the mgo teardown, revisiting17:36
hazmati had commented it out in setup/teardown on test not suite17:36
hazmatcommenting out setup/teardown on suite first17:37
hazmater.. on test17:37
hazmatsinzui, re this bug, its reproducable for me with JUJU_ENV set.. currently marked incomplete https://bugs.launchpad.net/juju-core/+bug/125028517:38
_mup_Bug #1250285: juju switch -l does not return list of env names <docs> <switch> <ui> <juju-core:Incomplete> <https://launchpad.net/bugs/1250285>17:38
hazmatokay.. still fails with test tear/setup commented.. moving on to mgo comments in suite tear/setup17:38
hazmatand still fails with mgo commented in connsuite tear/setup17:39
rogpeppehazmat: given that there are no tests in that suite, i wouldn't expect test setup/teardown to make a difference17:39
rogpeppehazmat: in connsuite suite setup/teardown?17:39
hazmatrogpeppe, yeah.. i suspect it's actually an issue in DeployLocalSuite, and running it alongside any additional suite catches it.17:39
sinzuihazmat, I will test that bug again, oh and I think you and rogpeppe are looking at the mgo test teardown that affects me17:40
rogpeppehazmat: i think so too, but i can't see how running LoggingSuite.SetUpTest and TearDownTest could affect anything17:40
hazmatrogpeppe, for ref here's my current connsuite http://paste.ubuntu.com/6480040/17:40
hazmatConnSuite is basically empty with only suite tear/setup methods that do nothing17:41
rogpeppehazmat: oh, i thought you were skipping NewConnSuite (and the other suites)17:41
hazmatrogpeppe, i'm only running go test -v -gocheck.vv -gocheck.f 'ConnSuite|DeployLocalSuite'17:41
rogpeppehazmat: that will still run NewConnSuite17:41
rogpeppehazmat: could you comment out or delete or skip NewConnSuite?17:42
rogpeppehazmat: or just comment out line 4617:42
hazmatoh..17:42
hazmatrogpeppe, sorry for the confusion then.. okay back tracking17:42
rogpeppehazmat: np, it's so easy to do when trying to search for bugs blindly like this.17:43
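The confusion above stems from -gocheck.f treating its argument as an unanchored regular expression, so the pattern `ConnSuite|DeployLocalSuite` also selects NewConnSuite by substring match. A small stdlib sketch of the difference (the anchored pattern is the fix one would apply by hand; this is standard Go regexp semantics, not anything gocheck adds):

```go
package main

import (
	"fmt"
	"regexp"
)

// matches reports whether name matches the filter pattern pat, using
// the same unanchored semantics as gocheck's -gocheck.f flag.
func matches(pat, name string) bool {
	return regexp.MustCompile(pat).MatchString(name)
}

func main() {
	for _, name := range []string{"ConnSuite", "NewConnSuite", "DeployLocalSuite"} {
		fmt.Printf("%-17s unanchored=%-5v anchored=%v\n",
			name,
			matches(`ConnSuite|DeployLocalSuite`, name),          // substring match
			matches(`^(ConnSuite|DeployLocalSuite)$`, name))      // exact match only
	}
}
```

The unanchored pattern reports true for all three names, including NewConnSuite; anchoring with `^` and `$` restricts the run to exactly the two intended suites.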
hazmatrogpeppe, so correctly running just ConnTestSuite and DeployLocalSuite works17:46
rogpeppehazmat: ok, so... you know what to do :-)17:47
hazmatindeed17:47
rogpeppehazmat: thanks a lot for going at this BTW17:47
rogpeppehazmat: it's much appreciated17:47
hazmatrogpeppe, np.. it's annoying having intermittent test failures, esp with async ci merges17:48
rogpeppehazmat: absolutely17:48
mgznatefinch, fwereade: I have pushed juju tagged 1.16.2 plus the juju-update-bootstrap command to lp:~juju/juju-core/1.16.2+update17:51
fwereademgz, great, thanks -- I've got to be off, I'm afraid, would you please reply to the mail so ian knows where to go? and nate, please test when you get a mo17:53
fwereadenatefinch, I'll try to be back on to hand over to ian at least17:53
natefinchfwereade: no problem17:53
mgzfwereade: replying to your hotfix branch email now17:54
hazmatrogpeppe, so it's not one exact test failure, it's some subset of the newconnsuite tests.. still playing with it, but this is the current minimal set of tests that fails http://pastebin.ubuntu.com/6480107/...17:56
rogpeppehazmat: if you could get to a stage where you can't remove any more tests without it passing, that would be great17:57
rogpeppehazmat: actually, i have a glimmer of suspicion. each time you run the tests, could you pipe the output through timestamp (go get code.google.com/p/rog-go/cmd/timestamp). i'm wondering if there's something time related going on in the background.18:00
rogpeppehazmat: it's probably nothing though18:01
hazmatthere's a certain amount of randomness to it.. so it's quite possible18:02
hazmatrogpeppe, so i think i have some progress. i can get both suites running reliably minus one test..  TestConnStateSecretsSideEffect18:21
rogpeppehazmat: cool18:22
rogpeppehazmat: so if you skip that test and revert everything else, everything passes reliably for you?18:23
hazmatjust leaving that one test commented out the entire package test suite succeeds (running everything 5 times to account for intermittent)18:25
hazmatyeah.. reliably passes minus that test18:25
rogpeppehazmat: great18:25
* hazmat files a bug to capture18:26
rogpeppehazmat: out of interest, what happens if you comment out the SetAdminMongoPassword line?18:26
hazmatfwiw filed as https://bugs.launchpad.net/juju-core/+bug/125520718:27
_mup_Bug #1255207: intermittent test failures on package juju-core/juju <juju-core:New> <https://launchpad.net/bugs/1255207>18:27
hazmatrogpeppe, that seems to do the trick.. still verifying.. found a random panic.. on Panic: local error: bad record MAC (PC=0x414311) but unrelated i think18:29
rogpeppehazmat: i *think* that's unrelated, but i have also seen that.18:29
hazmatrogpeppe, yeah.. passed 20 runs with that one liner fix18:31
rogpeppehazmat: could you paste the output of go test -gocheck.vv with that fix please?18:31
rogpeppepwd18:31
hazmatalso verified i can still get the error with the line back in.. output coming up18:32
hazmatrogpeppe, http://paste.ubuntu.com/6480306/18:33
rogpeppehazmat: ok, line 667 is what i was expecting18:34
rogpeppehazmat: there's something odd going on with the mongo password logic18:35
rogpeppehazmat: what version of mongod are you using, BTW?18:35
hazmat2.4.618:35
rogpeppehazmat: ahhh, maybe that's the difference18:35
rogpeppehazmat: where did you get it from?18:36
rogpeppehazmat: i'm using 2.2.4 BTW18:36
hazmatrogpeppe, 2.4.6 is everywhere i think..18:36
hazmatrogpeppe, its the package in saucy and its in cloud-archive18:36
hazmattools pocket18:36
rogpeppehazmat: ah, i'm still on raring18:36
hazmatcloud-archive tools pocket means that's what we use in prod setups on precise..18:37
fwereaderogpeppe, driveby: it's what we install everywhere and should be using ourselves18:37
rogpeppefwereade: i know, but i had an awful time upgrading to raring (took me weeks to recover) and i've heard that saucy has terrible battery life probs18:38
rogpeppefwereade: and i really rely on my battery a lot18:38
hazmatnot really noticed anything bad18:38
hazmatthe btrfs improvements are very nice18:38
hazmatwith the new kernel18:38
hazmatbattery life impact seems pretty minimal but maybe a few percent18:39
hazmatrogpeppe, alternatively you can just install latest mongodb18:39
rogpeppehazmat: for the moment, i'd like to do that.18:40
rogpeppehazmat: i can't quite bring myself to jump off the high board into the usual world of partially completed and broken OS installs18:40
natefinchrogpeppe: for one data point - my battery life isn't terrible.... it's hard for me to judge on the new laptop, but it seems within range of what is expected.  perhaps slightly lower than what people were seeing on windows for my laptop, but not drastically so.18:42
rogpeppenatefinch: that's useful to know. i currently get about 10 hours, and a little more usage can end up as a drastic reduction in life18:42
rogpeppenatefinch: and certainly at one point in the past (in quantal, i think) i only got about 2 hours, and i really wouldn't like to go back there18:43
rogpeppenatefinch: still, my machine has been horribly flaky recently18:43
hazmatrogpeppe, understood, i used to feel that way.. atm. i tend to jump onto the new version during the beta cycle.. the qa process around the distro has gotten *much* better, things are generally pretty stable during the beta/rc cycles.... i don't generally tolerate losing work due to desktop flakiness.18:43
rogpeppenatefinch: perhaps saucy might improve that18:43
hazmatrogpeppe, what's your battery info like?18:43
rogpeppehazmat: battery info?18:43
hazmatrogpeppe, upower -d18:45
hazmatit will show design capacity vs current capacity on your battery if your battery reports it through acpi18:45
rogpeppehazmat: cool, didn't know about that18:46
rogpeppehazmat: http://paste.ubuntu.com/6480352/18:46
hazmatummm.. you should be getting way more than 2hrs18:46
rogpeppehazmat: i do, currently18:47
hazmatrogpeppe, i use powertop to get a gauge of where my battery usage is going18:47
rogpeppehazmat: but some time in the past i didn't18:47
rogpeppehazmat: currently i get about 10h18:47
hazmatand i have some script i use when i unplug to get extra battery life by shutting down extraneous things.18:47
rogpeppehazmat: which means i can hack across the atlantic, for example18:47
rogpeppehazmat: usually i shut down everything and dim the screen, which gets me a couple more hours18:48
hazmatyeah.. getting off topic.. but switching out to saucy really shouldn't do much harm to battery life, i havent really noticed anything significant (intel graphics / x220)18:50
rogpeppehazmat: do you use a second monitor?18:54
natefinchrogpeppe: multi monitor support is not ubuntu's strong suit.  I just had to put my laptop to sleep and then open it back up after unplugging two monitors, otherwise my laptop screen was blank :/18:56
natefinchrogpeppe: or at least, it's not a strong suit on the two recent laptops I've had18:56
rogpeppenatefinch: it works ok for me usually, except the graphics driver acceleration goes kaput about once a day18:57
natefinchrogpeppe: it's only really a problem for me when I add or remove monitors.  Steady state works fine for me.18:57
rogpeppenatefinch: adding and removing works ok usually. i was really interested to see if hazmat had the same issue as me, 'cos his hardware is pretty similar18:58
natefinchrogpeppe: ahh18:59
natefinchrogpeppe: what laptop do you have, anyway?  10 hours is impressive18:59
rogpeppenatefinch: lenovo x22018:59
natefinchvery nice.  I get about 4-5 hours on battery... I probably should have gone for the bigger battery in this thing that would have given me 6-8.19:01
rogpeppenatefinch: you've got a much bigger display, i think19:01
hazmatrogpeppe, i do use  a second monitor19:01
natefinchrogpeppe: yeah, mine's 15.6" and hi res19:02
hazmatrogpeppe, i typically only use one external screen and turn off internal.. i used to do two internal screens (with docking station)19:02
hazmater.. two external19:02
hazmatworks pretty well for me19:02
natefinchone screen, wow, I wouldn't be able to do it :)19:02
rogpeppehazmat: hmm, i think i'm the only person that ever sees the issue19:02
hazmatnatefinch, one .. 24 inch screen works well enough for me.19:03
rogpeppehazmat: i reported the bug ages ago,  but i probably reported it to the wrong place. never saw any feedback.19:03
hazmatnatefinch, i've had that issue, the screen is still there, though.. i just enter a password to get past the unrendered screen saver password, and i'm back to the desktop.. its basically a wake from monitor shutdown..19:04
hazmater.. monitor power saving mode19:04
natefinchhazmat: yeah, if I close the laptop lid and reopen it, it seems to sort itself out.  Just kind of annoying.19:04
hazmatnot very common anymore, but still annoying.. and led to me accidentally typing my password into the active window (irc) a few weeks ago.19:04
hazmatrogpeppe, the x220 tricks out quite nicely.. i added an msata card for lxc containers and 16gb of ram as upgrades this year.. also picked up the slice battery, but not clear that was as useful.. but with it roughly 16hrs of battery life (mine is a bit more degraded than yours on capacity)19:06
natefinchhazmat: haha, I did the same thing, into an IT-specific facebook group, no less19:06
hazmata bit annoyed they're moving to a max of 12gb of ram on the x240 and x44019:06
hazmatnatefinch, the m3800 / xps looks pretty nice, just not sure about that screen res issue on the os level. i assume you're just playing around with the scaling to make things usable?19:07
natefinchhazmat: yeah, I set the OS font 150%, set the cursor to be like triple normal size, and zoom in on web pages.... it's actually not terrible19:08
natefinchhazmat: and it is a really really sharp display19:09
natefinchhazmat: and the build quality overall is exceedingly nice. It feels really sturdy, but surprisingly thin and light for being a pretty beefy machine19:10
natefinchbtw, is there a way to get ubuntu to turn off the touchpad while I'm typing?  I palm-click constantly19:10
rogpeppenatefinch: msata is just a solid state drive, right?19:10
natefinchrogpeppe: msata is just the interface type and size, but yes, there's no spinning msatas that I know of.19:11
hazmatyeah.. too small for spinning rust19:11
natefinchrogpeppe: electrically, it's just a different shaped plug from regular sata.... exact same specs etc, you can mount an msata in a regular sata drive by just hooking up the wires correctly19:12
rogpeppehazmat: so there's room in an x220 for one of those in addition to the usual drive?19:12
hazmatrogpeppe, yes19:12
natefinchahh, cool, yeah, my xps15 has that too19:12
natefinchthough at the expense of the larger battery19:12
hazmatrogpeppe, i dropped a 128gb plextor m5 in.. needs a keyboard removal though, but its pretty straightforward, youtube videos cover it19:12
natefincher rather, the 2.5" drive is at the expense19:12
rogpeppehazmat: cool. i'm a little surprised there's space in there!19:13
hazmatrogpeppe, there's some additional battery draw, in terms of finding a perf compromise.. the msatas are super tiny19:13
hazmatrogpeppe, http://www.google.com/imgres?imgurl=http://www9.pcmag.com/media/images/357982-will-ngff-replace-msata.jpg%3Fthumb%3Dy&imgrefurl=http://www.pcmag.com/article2/0,2817,2409710,00.asp&h=275&w=275&sz=64&tbnid=D6nAHdfDO9YioM:&tbnh=127&tbnw=127&zoom=1&usg=__fRuk3l4RfCrNCEY6gQ32RZaHaA8=&docid=uliVfmMKZbEonM&sa=X&ei=3fKUUrXUDaiusASxiYCYDw&ved=0CDwQ9QEwAw19:13
hazmatugh.. google links19:13
natefinchheh19:14
sinzuiI reported Bug #1255242 about a CI failure that relates to an old revision. Upgrading juju on hp cloud consistently breaks mysql19:24
_mup_Bug #1255242: upgrade-juju on HP cloud broken in devel <ci> <hp-cloud> <regression> <upgrade-juju> <juju-core:Triaged> <https://launchpad.net/bugs/1255242>19:24
natefinchdammit, my mouse cursor disappeared.19:43
jamsinzui: a comment posted to bug #125524219:47
_mup_Bug #1255242: upgrade-juju on HP cloud broken in devel <ci> <hp-cloud> <regression> <upgrade-juju> <juju-core:Triaged> <https://launchpad.net/bugs/1255242>19:47
jamI need to go to bed now19:47
jamsinzui: I don't doubt we have a problem, but from all indications this isn't an *upgrade* bug, because Upgrade is never triggered in that log file19:48
sinzuijam, yes, the issue is confusing, which is why we spent so long looking into it ourselves19:49
jamLine 50 is: 50:juju-test-release-hp-machine-0:2013-11-26 15:06:39 DEBUG juju.state.apiserver apiserver.go:102 <- [1] machine-0 {"RequestId":6,"Type":"Upgrader","Request":"SetTools","Params":{"AgentTools":[{"Tag":"machine-0","Tools":{"Version":"1.17.0-precise-amd64"}}]}}19:50
jamwhich is machine-0 telling itself that its version is 1.17.019:50
jamsinzui: ERROR juju runner.go:220 worker: exited "environ-provisioner": no state server machines with addresses found19:51
jamis probably a red herring19:52
jamI think it is the environ-provisioner waking up before the addresser19:52
sinzuijam, thank you for the comment. I think I see a clue. The bucket has a date str in it and we increment it because I think it can contain cruft. That date is not  even close to now. So our HP tests might be dirty. It also relates to our concern that we want juju clients to bootstrap matching servers.19:52
jamso it tries to see what API servers to connect to, but the addresser hasn't set up the IP address yet19:52
* sinzui arranges for a test with a new bucket19:52
jamsinzui: 2013-10-10 does look a bit old19:53
* jam goes to bed19:53
jamsinzui: ok, I thought I was going.... I'm all for being able to specify what version you want to bootstrap "juju bootstrap --agent-version=1.16.3" or something like that. I don't think users benefit from it over getting the latest patch (1.16.4) when their client is out of date.19:55
sinzuijam, fab. I will arrange another play of the test with a clean bucket19:56
rogpeppewallyworld_, fwereade: i've sent an email containing a branch and some comment on my progress19:59
* rogpeppe is done for the day20:06
rogpeppeg'night all20:06
hazmatwoot just got 666666 otp 2fa20:36
rick_h_sinzui: abentley do you guys have a good jenkins backup/restore config setup in place?20:49
rick_h_hazmat: lol, now if only it was fri-13th20:49
abentleyrick_h_: No.20:50
rick_h_abentley: ok so much for cribbing :P20:50
abentleyjam: We never released 1.16.4 because it would have introduced an API incompatibility.  It's not safe to assume that agent 1.16.4 is compatible with client 1.16.3.  This is not a theoretical risk.  It very nearly happened.21:02
natefinchmgz: you around?21:48

Generated by irclog2html.py 2.7 by Marius Gedminas - find it at mg.pov.lt!