[01:19] menn0: https://github.com/juju/gomaasapi/pull/53
[01:19] thumper: will look shortly
[01:19] kk
[01:42] Hi all, have an issue deploying mediawiki-single as per the 'getting started' guide (that I'm going through). The mysql unit (lxd) fails to start, looks like it cannot set up the storage (Fatal error: cannot allocate memory for the buffer pool). Appears there is no swap available etc. too. Any thoughts on how to debug this?
[01:46] ugh, I was wrong about swap it seems. plenty there and free memory too
[01:54] thumper: LGTM
[01:56] veebers: so the machine for the mediawiki unit has come up but mediawiki isn't starting?
[01:57] menn0: correct
[01:57] menn0: status was "Hook start failed", I ssh-ed in to check error logs, mysql failed when trying to start the storage backend
[01:58] menn0: I've, uh, since blown it away and I'm trying to just deploy a mysql unit now
[01:58] veebers: i'm no mysql expert I'm afraid. it would be useful to see more of the logs around the failure though.
[02:00] veebers: maybe one of the mysql options set by the mediawiki-single bundle isn't right? (https://api.jujucharms.com/charmstore/v5/bundle/mediawiki-single-9/archive/bundle.yaml)
[02:02] menn0: Hmm, my initial thought on the error may be a red herring. The single mysql deploy failed too, similar logs. It appears something bad happens at the start and it keeps trying, then at some point the storage backend can no longer even be instantiated.
[02:03] menn0: logs: http://paste.ubuntu.com/16184547/ (line 52 there seems like the first bad thing, aborts on the next line)
[02:03] veebers: weird... all I can suggest is to look at the mysql logs for the machine hosting the unit and the logs for the juju unit itself.
[02:03] menn0: aye, posted is the mysql error log.
[02:11] veebers: I just did some digging. it seems like mysql is wanting to allocate 12.5GB for its buffer pool. This comes from the "dataset-size: 80%" (I'm guessing your machine has 16GB of RAM)
[02:13] veebers: this is fine on a completely isolated machine but with lxd, they're all seeing the same available memory so there's probably not enough memory left after mediawiki is installed (and whatever else is running on your machine)
[02:16] veebers: it would be interesting to see what happens when you deploy mysql with "dataset-size: 20%" or something
[02:22] veebers: actually, better idea. before deploying into the model do this: lxc profile set juju- limits.memory 1GB
[02:22] that will cause all lxd containers for the model to be limited to 1GB of RAM
[02:22] then mysql will only attempt to grab 80% of 1GB
[02:22] veebers: I bet that would do the trick
[02:23] tweak the limit as you like of course
[02:29] menn0: interesting, I've learned something new. I'll have a crack at trying that. Thanks :-)
[02:30] veebers: I learned a few things too :)
[02:31] menn0: is that 'lxc profile' or 'lxd profile'
[02:31] veebers: lxc
[02:31] the command to interact with lxd is lxc (confusingly)
[02:32] the lxd binary is the daemon itself
[02:32] hah awesome, thanks for clarifying.
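The buffer pool arithmetic discussed above can be sketched in a few lines of Python. This is only an illustration of the percentages mentioned in the conversation, not how the mysql charm actually computes its settings; the function name and the 16GB figure (menn0's guess) are assumptions.

```python
def buffer_pool_gb(visible_memory_gb, dataset_size_percent):
    """Memory a 'dataset-size: N%' style setting would ask for,
    given the memory the unit believes it has (hypothetical helper)."""
    return visible_memory_gb * dataset_size_percent / 100.0


if __name__ == "__main__":
    # Container sees all of the host's (guessed) 16GB of RAM.
    print(buffer_pool_gb(16, 80))
    # Container limited to 1GB via the lxd profile.
    print(buffer_pool_gb(1, 80))
    # The lower dataset-size suggested as an alternative.
    print(buffer_pool_gb(16, 20))
```

With the full host memory visible, 80% comes to roughly 12.8GB, in the same range as the 12.5GB quoted above if the machine reports a little under 16GB; with a 1GB limit it drops to 0.8GB.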
[04:35] Bug #1551141 changed: Juju bootstrap local - cannot get replset config: not authorized for query on local.system.replset
[04:40] thumper: review please https://github.com/juju/juju/pull/5322
[04:41] * thumper looks
[04:45] menn0: done
[04:45] * thumper afk for a bit
[04:45] off to the storage unit
[05:12] thumper: cheers
[05:22] thumper: interestingly this was filed very recently: https://github.com/juju/juju/pull/5322
[05:22] thumper: this even: https://bugs.launchpad.net/juju-core/+bug/1576851
[05:22] Bug #1576851: juju debug-log -i unit-rabbitmq-server-0 is unfriendly
[05:23] * thumper looks
[05:23] yeah
[13:30] katco, when you are in please ping me
[13:33] now that is an existential request
[13:34] oh, missed the in :p
[13:35] perrito666, :)
=== cmars` is now known as cmars
[14:14] mgz: looking at the curl windows/centos SSL bug
[14:20] mgz: what version of curl are we using on centos?
[14:20] mgz: a quick google says older versions didn't have tls 1.1 or 1.2 enabled by default
[14:21] sinzui: ^ alternatively, can you give me access to an example centos machine?
[14:21] RHEL-7 (lib)curl does not enable TLS > 1.0 by default. Please use the --tlsv1 option of curl to negotiate the highest TLS version supported by client/server.
[14:22] natefinch: I think I can give you one based on a snapshot of the current host that runs unit tests. It will take a while because I need to make a snapshot of the current one.
[14:23] natefinch: in direct answer to your question
[14:23] curl --version
[14:23] curl 7.29.0 (x86_64-redhat-linux-gnu) libcurl/7.29.0 NSS/3.19.1 Basic ECC zlib/1.2.7 libidn/1.28 libssh2/1.4.3
[14:25] natefinch: one moment I got confused. you're working on bug 1576873 in our maases
[14:25] ?
[14:25] Bug #1576873: Juju2 cannot deploy centos or windows workloads on maas 1.9
[14:26] sinzui: yes
[14:26] natefinch: the curl version might be different. Probably not because I think yum is used to install curl
[14:27] sinzui: it probably doesn't matter. The error is fairly specific.
[14:29] sinzui: pretty sure if we just add --tlsv1, it'll work. But having a real centos machine to poke at would help ensure the fix is correct without having to go through a full commit & CI run
[14:30] I guess I could always just fire up my own VM
[14:30] Bug #1577415 opened: resource-get hangs when trying to deploy a charm with resource from the store
[14:42] natefinch: Our centos are stock centos7 with some yum packages installed.
[14:42] sinzui: hmm, ok. from starting the stock centos image on GCE, it looks like it is using a new enough version of curl to support TLS 1.2
=== thedac is now known as dames
[14:46] natefinch: I am deploying a centos7 on the maas 1.9. I don't expect it to be different, but I want to make sure.
[14:51] sinzui: thanks
[14:51] sinzui: I'd like to take a look when it's ready
[14:52] natefinch: okay, this will be an adventure. have you got ssh rules to get into munna?
[14:52] sinzui: probably not, since I don't even know what munna is
[14:53] natefinch: okay. once I am in, I will send you several ssh stanzas that will allow you to hop through all the intermediate hosts
[14:53] sinzui: good times
[15:02] ericsnow: standup time
[15:18] alexisb: i think cherylj is out, so pinging you :) can we make this a blocker for 2.0 overall? 1577415
[15:18] alexisb: bug 1577415
[15:18] Bug #1577415: resource-get hangs when trying to deploy a charm with resource from the store
[15:20] yes we can add it to the list of blockers
[15:21] alexisb: also, we're trying to get 1-pagers complete for a review tomorrow. can we work on that, or do we still need to be working on blocker bugs?
=== marlinc_ is now known as marlinc
=== cargonza_ is now known as cargonza
=== arosales_ is now known as arosales
=== hazmat_ is now known as hazmat
[15:32] katco: we all just bailed. basically done anyway
[15:39] Does anyone know how to get to the controller machine using Juju commands? One machine didn't provision for me, and I am trying to get the logs from the controller, but I don't know how to refer to it using juju commands.
[15:45] ericsnow: natefinch: redir: hilarious timing. right after you asked that, my power went out
=== katco` is now known as katco
[15:52] mbruzek: juju switch controller:admin && juju ssh 0
[15:53] thanks natefinch I am going to add that to the developer documentation
[15:53] mbruzek: (where controller is the name of the controller, obviously)
[15:53] Yes.
[16:19] ericsnow: natefinch: ok plan for specs vs. bugs
[16:19] ericsnow: please time-box your work on the spec to lunch, and then switch over to bugs
[16:19] natefinch: please just keep working on bugs
[16:19] perrito666, when you have a second I would like to chat with you
[16:19] katco: cool thanks
[16:19] katco: k
[16:19] ericsnow: natefinch: i'll send out another email to ian letting him know the priority call
[16:19] alexisb: is now ok?
[16:20] ericsnow: natefinch: ta!
[16:20] perrito666, of course
[16:20] hangout or irc?
[16:26] ericsnow: when you do pick up another bug, bug 1576913 looked related to what you were working on friday.
[16:26] Bug #1576913: StatusHistorySuite.TestPruneStatusHistory
[16:26] katco: k, thanks
[16:28] natefinch: and this looked related to the area you're in: bug 1577415
[16:28] Bug #1577415: resource-get hangs when trying to deploy a charm with resource from the store
[16:28] Bug #1576705 changed: cloudImageMetadataSuite.TestSaveDiffMetadataConcurrentlyAndOrderByDateCreated wrong order
[16:28] natefinch: err... wrong bug: 1576695
[16:28] natefinch: bug 1576695
[16:28] Bug #1576695: Deployer cannot talk to Juju2 (on maas2) because :tlsv1 alert protocol version
[16:28] * katco spams the channel
[16:29] * natefinch breaks everything
[16:29] honestly, I'm pretty happy breaking people who aren't supporting the most secure connection possible... especially when it's not exactly a bleeding edge configuration
[16:31] I wish we could have aliases for clouds, so when I type juju bootstrap gce gce or ec2 ec2, it actually worked
[16:31] also, defaulting to the name of the cloud would be nice
[16:34] katco: yeah, that's definitely due to the TLS change. Oddly enough, the error message is ssl.SSLError: [Errno 1] _ssl.c:510: error:1409442E:SSL routines:SSL3_READ_BYTES:tlsv1 alert protocol version .... SSL3??? definitely not secure. We weren't even supposed to be supporting that previously. If it worked with SSL3 before, that was a bug
[16:35] natefinch: glad it is in capable hands :)
[16:36] katco: do you know who controls the deployer code? I don't even know where it lives or who to talk to about it
[16:36] natefinch: that is ecosystems, i.e. marcoceppi
[16:46] sinzui: FWIW, I can run that curl using the same flags etc from a generic GCE Centos7 VM to a server deployed from master's Juju... so not sure what's different about the CentOS that I'm running vs. what CI is deploying.
[16:51] natefinch: yeah. I don't know either. the last two I tried to deploy just failed to come up. I need to look into the health of the maas.
[16:52] natefinch: I could just re-run the failing job with --keep-env so that we can get to the actual machine that failed
[16:52] sinzui: that would be useful
[16:53] * sinzui starts job
[16:58] Bug #1576728 changed: ConnectSuite.TestLocalConnectError: windows cannot connect to local lxd server
[17:08] natefinch: I am in the centos instance on the maas. I see
[17:08] curl --version
[17:08] curl 7.29.0 (x86_64-redhat-linux-gnu) libcurl/7.29.0 NSS/3.15.4 zlib/1.2.7 libidn/1.28 libssh2/1.4.3
[17:08] * sinzui prepares connection info
[17:10] sinzui: that is a slightly different version, at least of NSS
[17:10] natefinch: yeah. almost got you the info
[17:11] natefinch: maas centos images have a different origin. They officially come from maas
[17:11] by way of something
[17:16] natefinch: check your email for connection info
[17:19] sinzui: thanks
[17:19] natefinch: ha ha. since munna can only access ubuntu/canonical machines, it cannot get updates. I wonder if curl has an update, but we cannot get it
[17:22] natefinch: damn, I can see the centos images in the maas are current from http://images.maas.io/ephemeral-v2/daily/
[17:26] gah, is there a way to get the model UUID from juju?
[17:27] usually I just peel it off the instanceID, but I guess maas doesn't do that
[17:29] doesn't matter... I get error even with a bad URL, makes sense.
[17:29] sinzui: gotta run for lunch, back in an hour
[17:30] sinzui: I'm logged in, and would like to continue after lunch, but feel free to kick me off if you need to
=== natefinch is now known as natefinch-lunch
[17:30] natefinch: The machines are yours for now. I don't think CI will miss them today
[17:48] natefinch-lunch: what you need for deployer?
[17:55] ericsnow: got a second to explain your last comment on that review?
[17:55] redir: sure
[17:56] k. I'll be in moonstone when you get to a stopping point.
=== natefinch-lunch is now known as natefinch
[18:35] marcoceppi: detailed in #juju, but essentially core disabled everything but TLS1.2 and the python deployer chokes on that for some reason
[18:36] marcoceppi: https://bugs.launchpad.net/juju-core/+bug/1576695
[18:36] Bug #1576695: Deployer cannot talk to Juju2 (on maas2) because :tlsv1 alert protocol version
[18:43] sinzui: it's definitely the NSS version... the version on the CI machine is from April 2014. The one on my GCE instance is from June 2015. In between there, TLS 1.1 and 1.2 got enabled by default... they weren't before that
[18:47] natefinch: do we need to report a bug against maas? the images come from them
[18:48] wow, maas centos images are 2 years stale?
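For the Python deployer side of the TLS problem (the ssl.SSLError quoted above), a quick way to see what a given interpreter can negotiate is to ask its ssl module. The sketch below is an assumption-laden illustration, not part of juju-deployer: the helper name is made up, and it relies on the `ssl.PROTOCOL_TLSv1_2` constant being absent on Python 2.7 builds older than 2.7.9, which as far as I know is when TLS 1.1/1.2 support was backported.

```python
import ssl
import sys


def tls_report():
    """Summarise the interpreter's TLS capabilities (hypothetical helper)."""
    return {
        "python": sys.version.split()[0],
        # OpenSSL build the ssl module is linked against.
        "openssl": ssl.OPENSSL_VERSION,
        # The constant only exists when the ssl module knows TLS 1.2.
        "tls12": hasattr(ssl, "PROTOCOL_TLSv1_2"),
    }


if __name__ == "__main__":
    for key, value in sorted(tls_report().items()):
        print("%s: %s" % (key, value))
```

Running this on the host that executes the deployer tests would show both the Python version and whether TLS 1.2 is available at all, without needing to reproduce the failing connection.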
[18:48] sinzui: I have no idea if their image is "incorrect"
[18:48] sinzui: this one library seems significantly out of date in a way that happens to screw us
[18:49] natefinch: I think an old image is being adopted. I am going to have a chat with some parties
[18:54] sinzui: if we add --ciphers ecdhe_rsa_aes_256_sha --tlsv1 to the curl command, it works
[18:54] (in theory we could add all the ciphers that the server supports, but I know it supports that one so I just chose one)
[18:55] natefinch: oh, nice, I am still hoping for a talk with others about fresh images. Surely something else will fail this year
[18:59] sinzui: defaulting to having tls 1.1 and 1.2 disabled is kind of crazy.
[19:00] natefinch: agreed
[19:07] Bug #1577524 opened: Error calling ''lxd forkstart juju-machine-2-lxd-0 /var/lib/lxd/containers /var/log/lxd/juju-machine-2-lxd-0/lxc.conf'': err=''exit status 1''
[19:13] Bug #1577524 changed: Error calling ''lxd forkstart juju-machine-2-lxd-0 /var/lib/lxd/containers /var/log/lxd/juju-machine-2-lxd-0/lxc.conf'': err=''exit status 1''
[19:28] Bug #1577524 opened: Error calling ''lxd forkstart juju-machine-2-lxd-0 /var/lib/lxd/containers /var/log/lxd/juju-machine-2-lxd-0/lxc.conf'': err=''exit status 1''
[19:33] natefinch: lp:juju-deployer is the code
[19:48] fwereade_: hey, I'm around early if you want to jump on the hangout
[19:50] sinzui: do you know what version of python is running for the deployer tests? Do we log that anywhere? I see 2.7... but 2.7.what?
[19:51] sinzui: sounds like some older versions of python 2.7 don't have TLS 1.2 support
[19:52] (yay for runtime dependencies :/)
[19:52] natefinch: as always, that depends on the ubuntu version of the host. We test deployer with xenial, wily, trusty
[19:53] sinzui: np, I'll check.... I'm guessing trusty comes with 2.7.6
[19:54] sinzui: yep, that's it
[19:54] natefinch: trusty is Python 2.7.6. xenial is 2.7.11+
[19:54] fantastic
[19:55] Python 2.7.6 was released on November 10, 2013
[19:55] sinzui: is there any way the deployer can be made to require a newer version of python?
[19:56] natefinch: That is unlikely. deployer in trusty needs to work with trusty
[19:57] sinzui: and trusty can't be updated with a version of python newer than 2.5 years old?
[19:57] natefinch: Security updates are made from time to time. this is the reality that users have https://launchpad.net/ubuntu/+source/python2.7
[19:58] sinzui: I'd call "not having support for tls 1.2" a security issue :/
[19:58] natefinch, if there is an issue with deployer we should open a bug against it
[19:58] sinzui: sorry, don't mean to be grumpy
[19:58] alexisb: yep
[19:58] we have plenty of our own bugs to work
[19:58] marcoceppi and team are more than capable :)
[19:58] alexisb: really, it's just the version of python on trusty that is the problem
[19:59] natefinch: https://launchpad.net/ubuntu/+source/python3.4 is in trusty. If that is suitable, then deployer needs to require it
[19:59] alexisb: maybe there's a code workaround, I dunno
[20:00] I am just reading backscroll but it looks to me that deployer needs to learn about the right version of python in trusty?
[20:00] alexisb: only if the right version is in trusty
[20:00] alexisb: well, it sounds like their choices are 2.7.6, which is flawed, or 3.x ... which may be a non-trivial change
[20:01] natefinch, either way looks like a bug against deployer needs to be opened so discussion can start there and we can get the right eyes on the problem
[20:02] Bug #1577550 opened: juju fails to provision machine and will not retry
[20:02] alexisb: absolutely
[20:02] sinzui: should we move the current bug to deployer? https://bugs.launchpad.net/juju-core/+bug/1576695
[20:02] Bug #1576695: Deployer cannot talk to Juju2 (on maas2) because :tlsv1 alert protocol version
[20:02] sinzui: btw, pretty sure the fact it's on Maas is a red herring
[20:03] natefinch: I think we will add them so that we can track the issue too.
[20:03] sinzui: sounds good
[20:03] what
[20:05] sinzui: just saying, the bug title mentions maas2, but I think that's not an interesting data point, other than it's trusty
[20:05] natefinch: something is amiss. The bug is about deployer 2.x, not maas 2.
[20:05] * sinzui fixes bug and issue
[20:05] there are two bugs
[20:05] one is maas centos images
[20:06] one is python 2.7.6
[20:06] ...and one is windows that I'm just starting to look at, and I don't really understand what the failure is say: http://reports.vapour.ws/releases/3935/job/maas-1_9-deploy-centos-amd64/attempt/364#highlight
[20:08] mgz: those two bugs you mentioned are filed separately
[20:08] natefinch: I know, but they were the two I was just looking at
[20:09] mgz: ahh ok
[20:09] natefinch: windows is likely similar issue to centos, but for some reason we don't have the cloud-init logs from the machine
[20:09] katco: you were right that it looked like the same thing: http://reviews.vapour.ws/r/4752/
[20:10] the winrm log collection times out trying to get in
[20:10] it worked when the test was passing
[20:11] mgz: maybe also a python 2.7 problem?
[20:12] ericsnow: awesome!
[20:13] natefinch: yeah, could be, not sure what version the image includes
[20:13] katco: quick review?
[20:17] ericsnow: sure sec
[20:17] the last successful run cloudbase-init log has rather a lot of non-confidence inspiring tracebacks
[20:18] ericsnow: is the diff reversed? you removed your time.Sleep?
[20:19] we probably want to regenerate the windows image in our maas anyway, but I'm not sure if tls stuff is fixed in newer cloudbase bits
[20:19] katco: moved it over to the common helper
[20:19] ericsnow: ahh
[20:21] our image has 0.9.8.dev74, there's a 0.9.9 at least
[20:22] ericsnow: ship it
[20:22] katco: thanks
[20:26] mgz: any thoughts on how I can debug the windows problem, or are you willing to look into that?
[20:29] natefinch: with the machine setup failing hard enough to break winrm log collection it's hard
=== mpontillo_ is now known as mpontillo
[20:30] we could boot one and leave it up, see if it's possible to get in manually
[20:31] mgz: sounds good
[20:33] sinzui: ^do you remember if it's possible to rdp into a maas-booted windows image somehow?
=== Tribaal_ is now known as Tribaal
[20:49] mgz: I think it is possible but the path is mad. I think we need to sshuttle through ci-gateway => munna like we do to see maas via https. I expect the maas to be on at least one of the networks we would try to access the windows instance
[20:50] mgz: oh, that assumes we know the Administrator password or ubuntu if we create an ubuntu user
[20:50] sinzui: yeah, I'm not sure I've actually tried it before
[20:51] sinzui: it seems regardless we want updated images, which is a whole process
[20:51] but we did write that down
[20:54] natefinch: see addDownloadToolsCmds in cloudconfig/userdatacfg_win.go for how the tools are being fetched
[20:58] mgz: thanks
[20:59] which boils down to System.Net.Http.HttpClient
[20:59] mgz: I think http://wiki.cloudbase.it/maas has been used
[20:59] mgz: I recall this was also tried https://maas.ubuntu.com/docs/os-support.html
[21:01] mgz: looks like it's probably an easy enough fix.... I think it's just a matter of setting it to try TLS 1.2 first
[21:02] natefinch: that might mean a patch to cloudbase code outside of juju though
[21:03] going by some stackoverflow bits that suggest Tls11 and Tls12 are not in the default list
[21:03] hm, maybe that can be done by juju passed cloudinit steps too
[21:04] mgz: I think so
[21:04] mgz: seems like we can add it right into our code, just one line
[21:04] probably another "do it in both places for now" thing
[21:04] yeah
[21:04] gotta go, dinner time
=== natefinch is now known as natefinch-afk
[21:05] natefinch-afk: if you come up with a speculative branch, we can run the test with a binary and see
[21:06] mgz: I think we also need to ask what do azure win images have?
[21:09] sinzui: I guess, but we don't have an azure windows deploy test
[21:10] this is also going to be windows version dependent as the default behaviour changes with .NET releases
[21:11] Bug #1577567 opened: relation output in juju status is ambiguous
[21:11] Bug #1577568 opened: juju 1.25.5 problems with bonded nics
[21:11] Bug #1577569 opened: 1.25.5: failed to retrieve the template to clone - error executing "lxc-start"
[21:12] mgz: We don't have any windows streams for azure to test. Could we contrive to use the azure-arm's inbuilt support for windows? It might take 30 minutes to set up such a job
[21:13] sinzui: probably, worth poking axw about it
=== jillr_ is now known as jillr
[22:53] Bug #1577587 opened: Status public members should not be preceded by Status
[22:53] Bug #1577589 opened: Valid in status package needs signature change.
[22:53] Bug #1577590 opened: Status History Logs Squasher needs extra testing.
[23:02] Bug #1577593 opened: status sitory pruner needs to remove only once
[23:11] Bug #1577593 changed: status sitory pruner needs to remove only once
[23:20] Bug #1577593 opened: status sitory pruner needs to remove only once
[23:32] Bug #1577594 opened: params.Status->status.Status should happen in status history api layer not command
[23:53] Bug #1577594 changed: params.Status->status.Status should happen in status history api layer not command
[23:53] Bug #1577598 opened: Use testing.FakeHomeSuite instead of utils.SetHome().