[02:51] <pmatulis> ping me please if anyone knows why i cannot get past this.  ssh key added via web ui.
[02:51] <pmatulis> $ juju -v status
[02:51] <pmatulis> 2012-06-06 22:49:17,153 DEBUG Initializing juju status runtime
[02:51] <pmatulis> 2012-06-06 22:49:17,162 INFO Connecting to environment...
[02:51] <pmatulis> 2012-06-06 22:49:17,227 DEBUG Connecting to environment using node-e4115b138176.local...
[02:51] <pmatulis> 2012-06-06 22:49:17,227 DEBUG Spawning SSH process with remote_user="ubuntu" remote_host="node-e4115b138176.local" remote_port="2181" local_port="37322".
[02:51] <pmatulis> 2012-06-06 22:49:17,312 ERROR Invalid SSH key
[02:59] <pmatulis> .
[02:59] <pmatulis> disregard.  i think i know what's wrong
[03:00] <bigjools> pmatulis: do tell, for future reference
[03:01] <pmatulis> bigjools: well, i've been having a lot of WOL trouble on my nodes.  i re-read the docs (wiki) and it does say that after 'juju bootstrap' WOL is supposed to kick in so ubuntu gets installed, *then* i do 'juju status'
[03:02] <bigjools> yeah, it takes a while to install
[03:02] <pmatulis> but i don't even get that far.  WOL doesn't work for me
[03:03] <bigjools> that's a bizarre error from juju status then
[03:04] <pmatulis> unless i'm not reading the docs right, does WOL kick in after 'juju bootstrap' or does the OS get installed right away?
[03:04] <pmatulis> and then WOL
[03:04] <bigjools> bootstrap will power up a node and then install ZK on it
[03:05] <pmatulis> so by power up you mean WOL?
[03:05] <pmatulis> and what is ZK?
[03:05] <bigjools> WoL if you are using it yes.  ZK is zookeeper
[03:06] <pmatulis> k
[03:06] <bigjools> well, the OS is installed first
[03:06] <bigjools> can take an hour or so :(
[03:06] <bigjools> I got it down to 10m with an SSD
[03:06] <pmatulis> an hour??
[03:07] <pmatulis> you have SSD on your nodes?  what h/w is that?
[03:07] <bigjools> the dev environment on my laptop :)
[03:07] <pmatulis> ah
[03:07] <bigjools> I use VMs to test
[03:08] <pmatulis> i didn't realize you could do that
[03:08] <bigjools> yeah it needs some trickery to set up
[03:08] <pmatulis> feel like sharing?
[03:09] <bigjools> there's a vdenv directory in the source tree and it contains a readme
[03:09] <bigjools> it pokes stuff directly into cobbler
[03:09] <bigjools> you can use that or use it as inspiration
[03:11] <pmatulis> thanks.  i'll consider that
[13:01] <pmatulis> so the server team wiki says 'juju bootstrap' will install ubuntu on a node yet after following the doc very closely all my nodes in 'Ready' state (prior to bootstrap) already have ubuntu installed
[13:03] <pmatulis> going ahead anyway, 'juju bootstrap' does not 'boot the nodes' (why plural?) and says if WoL is not working i should reset the power manually, which i did but to no avail.  nothing seems to happen and 'juju status' still gives errors:
[13:04] <pmatulis> 2012-06-07 08:40:55,867 DEBUG Initializing juju status runtime
[13:04] <pmatulis> 2012-06-07 08:40:55,875 INFO Connecting to environment...
[13:04] <pmatulis> 2012-06-07 08:40:55,992 DEBUG Connecting to environment using node-e4115b138176.local...
[13:04] <pmatulis> 2012-06-07 08:40:55,993 DEBUG Spawning SSH process with remote_user="ubuntu" remote_host="node-e4115b138176.local" remote_port="2181" local_port="47339".
[13:04] <pmatulis> 2012-06-07 08:40:56,690 ERROR Invalid SSH key
[13:05] <roaksoax> pmatulis: is this the same cluster alexmoldovan was using yesterday?
[13:05] <pmatulis> roaksoax: yup
[13:05] <roaksoax> pmatulis: so I believe he got them installed without using juju
[13:05] <roaksoax> to try to fix the wol issue
[13:06] <pmatulis> roaksoax: yes, like i said above, ubuntu gets installed on all nodes
[13:06] <roaksoax> pmatulis: right, but juju bootstrap only selects 1 node and sends a message to only 1 node and that's it
[13:07] <roaksoax> pmatulis: it doesn't WoL all the nodes
[13:08] <pmatulis> roaksoax: that's fine, it still doesn't work
[13:09] <roaksoax> pmatulis: is cloud init correctly importing the keys on the node?
[13:09] <pmatulis> roaksoax: no idea
[13:09] <roaksoax> pmatulis: well that might be the problem then
[13:09] <pmatulis> roaksoax: how to check?
[13:10] <roaksoax> pmatulis: ssh into the node and check cloud-init logs in /var/log
[13:10] <pmatulis> roaksoax: credentials?
[13:11] <roaksoax> pmatulis: did you guys set up an SSH key in MAAS?
[13:11] <pmatulis> roaksoax: yes
[13:11] <roaksoax> smoser: MAAS still has the bug for which it doesn't update cloud-init meta-server right?
[13:12] <roaksoax> pmatulis: so if you added a public ssh key in MAAS, then you should be able to ssh using that (from the node the private key is of course)
[13:13] <smoser> roaksoax, i dont know. we saw that in the ephemeral (commissioning) environment
[13:13] <smoser> but surely it does it on deployment
[13:13] <smoser> because up until the deployment request, the user to do that wasn't yet known
[13:14] <roaksoax> smoser: right, but remember that during the demo, if we added the ssh public key in maas *after* enlisting a node, once it got deployed, it didn't obtain the ssh key
[13:14] <pmatulis> roaksoax: after grepping syslog for dhcp assignment (ip address), dns is broke, and trying ssh i am prompted for a p/w.  is the node user 'ubuntu'?
[13:14] <roaksoax> so we had to add the public key to maas before enlistment
[13:14] <smoser> roaksoax, that was the commissioning environment i think.
[13:15] <roaksoax> pmatulis: yes
[13:15] <smoser> otherwise, that's completely broken
[13:15] <roaksoax> smoser: nope, that was maas not updating the meta server
[13:15] <pmatulis> roaksoax: add the ssh key before enlistment?
[13:16] <smoser> well, roaksoax yes, but the meta-data for the commissioning environment. not for the install.
[13:16] <smoser> otherwise it is 100% completely and stupidly broken
[13:16] <smoser> as how could it possibly guess which user is going to deploy?
[13:16] <smoser> or i'm just missing something
[13:16] <roaksoax> smoser: ubuntu?
[13:17] <smoser> no. the MAAS user
[13:17] <smoser> a maas user has ssh keys
[13:17] <smoser> ssh keys are in the metadata provided to the instance after it installs
[13:17] <smoser> so maas has to present the ssh keys of the correct maas user
[13:18] <roaksoax> smoser: right, so that's why each node is assigned to a maas user right?
[13:20] <smoser> roaksoax, right.
[13:20] <smoser> but when does that assignment happen?
[13:20] <smoser> and how does it happen?
[13:21] <pmatulis> alexmoldovan: hi
[13:21] <roaksoax> smoser: idk if this was removed, but you can manually do that in the WebUI, and if no maas user was selected, defaults to the admin user
[13:22] <roaksoax> pmatulis: so we found an issue in which the meta-data server was not being updated correctly. I just can't recall whether this affected the ssh keys, but we had to make changes in the meta-data server first, and then enlist, so that these get to the machines when commissioned
[13:25] <alexmoldovan> pmatulis: hi
[13:26] <smoser> pmatulis, if you have the ability, you could re-install setting the password as i suggested a couple days ago
[13:26] <smoser> so it is 'ubuntu' and then ssh in that way
[13:26] <roaksoax> pmatulis: yeah you can also do that
[13:31] <smoser> pmatulis, i'm suggesting that not as a fix.
[13:31] <smoser> but as a way to figure out what is going wrong.
[13:32] <smoser> because i think this is a serious issue and want it to be fixed.
[13:32] <pmatulis> smoser: acknowledged
[14:20] <cheez0r> question: where does the ssh key that is embedded into the node originate? I am getting 'Invalid SSH Key' on juju status even though I have ssh keys generated; when I get into the node I see no entries in ubuntu user's ~/.ssh/authorized_keys file.
[14:37] <cheez0r> hello
[14:37] <m_3> hey.. look at that!
[14:45] <hazmat> cheez0r, hi
[14:46] <hazmat> cheez0r, if you have a moment, i'd like to debug that with you
[14:46] <cheez0r> hazmat: I do.
[14:46] <hazmat> cheez0r, if you're on the machine, do you see any files in /var/log/cloud
[14:46] <hazmat> cheez0r, cloudinit is what actually puts the keys into place
[14:46] <cheez0r> What I'm doing now is that I've populated an SSH key in the MAAS gui for my user, and I'm about to go and re-commission all of my nodes in an attempt to make it embed that key in them.
[14:47] <hazmat> cheez0r, it would be nice to look at those files to understand why it's not able to put the keys into place
[14:47] <hazmat> cheez0r, if you could pastebin those files that would be helpful
[14:47] <cheez0r> on the MAAS node, there is no /var/log/cloud
[14:47] <cheez0r> the bootstrap node is rebooting.
[14:47] <hazmat> sudo apt-get install pastebinit && pastebinit /var/log/cloud-init.log
[14:48] <hazmat> and pastebinit /var/log/cloud-init-output.log
[14:48] <hazmat> sorry, it's not /var/log/cloud .. they're just files in /var/log
[14:48] <cheez0r> neither of those files exist on the MAAS node, I'll look on the bootstrap node when it's done rebooting
[14:48] <hazmat> yeah.. it won't be on the maas node, but the nodes maas boots as a result of juju commands
[14:49] <hazmat> cheez0r, thanks
[14:49] <cheez0r> no, thank you ;)
[14:53] <m_3> hazmat's da man
[14:54] <hazmat> i have to step out for 15m, my dog desperately needs a walk, back soon
[14:54] <cheez0r> have fun with that ;)
[14:58] <roaksoax> smoser: the commissioning image now cleanly shuts down the servers, right?
[14:59] <roaksoax> smoser: or was that a hack?
[14:59] <cheez0r> http://pastebin.ubuntu.com/1028750 is the cloud-init.log
[14:59] <smoser> roaksoax, it does a shutdown.
[15:00] <roaksoax> pmatulis: so after commissioning, it should WoL just fine ^^
[15:00] <roaksoax> pmatulis: since it does a shutdown
[15:00] <roaksoax> as stated above
[15:01] <smoser> hazmat, well, its not getting the key.
[15:01] <smoser> and we can't see that (unfortunately) in this output
[15:01] <smoser> because i suspect its just not there, and cloud-init is allowing that.
[15:01] <cheez0r> where does cloud-init pull the key from?
[15:01] <smoser> now.... juju would insert the keys itself, so that would work around the issue.
[15:01] <smoser> cheez0r, it pulls it from the metadata service.
[15:01] <cheez0r> smoser: juju bootstrap is not inserting my keys at all
[15:01] <cheez0r> smoser: how do they get populated there?
[15:02] <smoser> and you can get that data from somewhere other than the node, but you have to get the node's credentials
[15:02] <smoser> which are in the db
[15:02] <smoser> its less than simple
[15:02] <cheez0r> yeah I can dig that
[15:02] <smoser> ok.
[15:02] <cheez0r> I'm just trying to figure out in the standard maas build-up process where that ssh key originates from
[15:02] <smoser> well, the node has some Oauth creds associated with it.
[15:02] <smoser> those creds are given to it, and then it uses it to get userdata and metadata (including ssh keys)
[15:02] <cheez0r> the step-by-step just says run ssh-keygen and then juju bootstrap and that's the sum total of work to embed the ssh key
[15:03] <smoser> well, if you used juju, then it doesn't need ssh keys from maas
[15:03] <cheez0r> how does juju embed them, then?
[15:03] <smoser> it shoves them into user-data
[15:03] <cheez0r> my authorized_keys file on the target node remains empty
[15:03] <smoser> in which case, if the node you showed cloud-init log of was deployed by juju i would have thought it would get your ssh keys.
[15:04] <cheez0r> that's what the pastebin is- a juju bootstrapped node
[15:04] <smoser> cheez0r, /home/ubuntu/.ssh/authorized_keys is empty?
[15:04] <cheez0r> yes
[15:04] <smoser> can you pastebin the /var/log/cloud-init-output.log ?
[15:04] <cheez0r> there isn't one.
[15:04] <cheez0r> only cloud-init.log exists
[15:06] <smoser> if there isnt one, then it did not get user-data from maas.
[15:06] <smoser> so the debugging path here then is to get the oauth creds for that node from the maas server
[15:06] <smoser> and then use cloud-init's MAAS datasource as a client to see what it is seeing
[15:06] <cheez0r> ok so I just rebooted the bootstrapped node
[15:06] <smoser> and it seems to me that it is not seeing anything :)
[15:06] <cheez0r> and cloud-init-output.log exists
[15:07] <smoser> roaksoax, ^
[15:07] <smoser> was it you, cheez0r that saw this "reboot fixes it" issue?
[15:07] <smoser> before?
[15:07] <smoser> someone did. maybe it was pmatulis
[15:07] <cheez0r> the total content of the file, however, is "cloud-init boot finished at Thu, 07 Jun 2012 15:06:34 +0000. Up 15.03 seconds"
[15:07] <cheez0r> yeah it was me, but it only did once
[15:08] <cheez0r> authorized_keys remains empty
[15:08] <roaksoax> cheez0r: pastebin apache error and access log
[15:08] <cheez0r> from maas node?
[15:08] <roaksoax> yep
[15:08] <cheez0r> working
[15:10] <cheez0r> access.log is 45k lines
[15:10] <cheez0r> want all of that?
[15:10] <hazmat> back
[15:10] <hazmat> so we should look at the output of the metadata service then
[15:10] <hazmat> to verify what cloud-init is seeing
[15:10] <hazmat> or poke at the instance user-data on disk
[15:11] <hazmat> smoser, does the maas data source cache that to /var/lib/cloud/instance/user-data?
[15:11] <smoser> hazmat, it should, but that will be empty.
[15:11] <smoser> oh, cheez0r i forgot that you can get to the instance (duh, you were pastebining from it)
[15:11] <hazmat> yeah.. its at http://192.168.1.1/MAAS/metadata//2012-03-01/meta-data/instance-id
[15:12] <cheez0r> we still after the apache2 logs then?
[15:12] <smoser> well, its not really at that.
[15:13] <smoser> cheez0r, can you.
[15:13] <hazmat> smoser, well there's a user-data and public-keys at the same address
[15:13] <hazmat> smoser, that's how cloudinit gets the data from my read of maasdatasource
[15:13] <smoser> hazmat, they're oauth protected
[15:13] <smoser>  * open up the file in /etc/cloud/cloud.cfg.d/91*.cfg
[15:13] <smoser>  * there is a 'url' setting there
[15:13] <cheez0r> http://paste.ubuntu.com/1028772 and http://paste.ubuntu.com/1028774 for the apache2 access.log and error.log
[15:14] <smoser>  * run: python /usr/share/pyshared/cloudinit/DataSourceMAAS.py --config /etc/cloud/cloud.cfg.d/91*.cfg crawl <that-url>
[15:14] <smoser> or even just "get" on that url (it should show metadata and userdata)
[15:14] <smoser> and then you can 'get' the same url with those appended
[15:14] <smoser> but i'm pretty sure you're not getting anything there.
[15:17] <cheez0r> so from the bootstrapped node, I open that file, then run that python command against the URL?
[15:17] <cheez0r> there is no 91*.cfg in that directory
[15:17] <smoser> what files are in that directory?
[15:18] <smoser> this is weird, btw :)
[15:18] <cheez0r> 05_logging.cfg; 90_dpkg.cfg; 90_dpkg_local_cloud_config.cfg; 90_dpkg_maas.cfg
[15:18] <cheez0r> README
[15:18] <cheez0r> that's it
[15:18] <smoser> one of the other 90_ files. probably 90_dpkg_maas
[15:18] <smoser> sorry
[15:18] <cheez0r> yes, I agree, since I'm following the most basic process possible to build up this config
[15:18] <smoser> i was just going from memory
[15:18] <smoser> look at those, one of them will have yaml with oauth creds
[15:19] <cheez0r> there's a consumer_key metadata_url token_key and token_secret in the file
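The file hunt above (a guessed 91*.cfg name, then "look at those, one of them will have yaml with oauth creds") could be done mechanically. A minimal sketch, assuming only the four key names cheez0r reports and a plain substring match rather than real YAML parsing; the function name is hypothetical and not part of cloud-init:

```python
import glob
import os

# Key names as reported from the node's config file in the log above.
CRED_KEYS = ("consumer_key", "metadata_url", "token_key", "token_secret")


def find_credential_files(cfg_dir):
    """Return the *.cfg files in cfg_dir whose text mentions every key in CRED_KEYS.

    This is a plain substring scan (an assumption made to stay in the
    standard library); it does not validate the YAML structure.
    """
    matches = []
    for path in sorted(glob.glob(os.path.join(cfg_dir, "*.cfg"))):
        with open(path, encoding="utf-8", errors="replace") as fh:
            text = fh.read()
        if all(key in text for key in CRED_KEYS):
            matches.append(path)
    return matches
```

On a node this would be called as find_credential_files("/etc/cloud/cloud.cfg.d"), most likely under sudo since the file holds secrets.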
[15:20] <cheez0r> so I ran "python /usr/share/pyshared/cloudinit/DataSourceMAAS.py --config /etc/cloud/cloud.cfg.d/90_dpkg_maas.cfg crawl http://192.168.1.1/MAAS/metadata/"
[15:21] <cheez0r> response was "== http://192.168.1.1/MAAS/metadata/2012-03-01 ==
[15:21] <cheez0r> 2012-03-01
[15:21] <cheez0r> latest
[15:21] <cheez0r>  
[15:21] <cheez0r> == http://192.168.1.1/MAAS/metadata/latest ==
[15:21] <cheez0r> 2012-03-01
[15:21] <cheez0r> latest
[15:21] <cheez0r>  
[15:21] <cheez0r> "
[15:25] <hazmat> that's a serious bug in maas then
[15:25] <hazmat> smoser, what's the difference between crawl and check-seed?
[15:26] <cheez0r> should the ssh key be in the cobbler system variables?
[15:26] <cheez0r> I don't see anything regarding SSH in it.
[15:28] <hazmat> cheez0r, could you run the same command with check-seed instead of crawl
[15:28] <hazmat> ie. python /usr/share/pyshared/cloudinit/DataSourceMAAS.py --config /etc/cloud/cloud.cfg.d/90_dpkg_maas.cfg check-seed http://192.168.1.1/MAAS/metadata/
[15:28] <cheez0r> k
[15:28] <cheez0r> let me pastebin output, more significant
[15:29] <smoser> crawl would potentially do more.
[15:29] <smoser> check-seed would only verify that the data it wants is there.
[15:29] <hazmat> smoser, well thats sort of what i want to see
[15:29] <smoser> right. i just hadn't remembered the correct usage of that tool
[15:30] <cheez0r> I see the public-keys in the output
[15:30] <hazmat> cheez0r, can you pastebin the output of that command
[15:30] <smoser> hm..
[15:30] <hazmat> the check-seed one
[15:30] <cheez0r> http://pastebin.ubuntu.com/1028800
[15:30] <cheez0r> yeah, that's it
[15:31] <cheez0r> that's my correct ssh key at the bottom
[15:31] <hazmat> hmm
[15:31] <hazmat> okay.. so the data is there
[15:32] <hazmat> that's good
[15:32] <cheez0r> so if the data is there, what would make it not be utilized by cloud-init?
[15:32] <hazmat> cheez0r, a bug in cloud-init
[15:33] <cheez0r> hrm, ok
[15:34] <hazmat> smoser, can we get anymore verbose out of cloud-init for this
[15:34] <smoser> well...
[15:34] <smoser> kind of no
[15:34] <smoser> because i'm suspecting right now
[15:34] <smoser> that if he reboots
[15:34] <smoser> if cheez0r does:
[15:34] <smoser>  * rm -Rf /var/lib/cloud
[15:34] <smoser>  * sudo reboot
[15:34] <hazmat> smoser, but before he reboots can we determine what went wrong
[15:35] <smoser> then it will "just work"
[15:35] <hazmat> smoser, sure, that will make it work..
[15:35] <smoser> what i think is wrong is that nothing was there.
[15:35] <smoser> and now it is.
[15:35] <hazmat> smoser, but how do we figure out what's wrong
[15:35] <hazmat> smoser, yeah.. that would do it
[15:35] <hazmat> smoser, it would be nice if cloud init captured that data that its using to disk
[15:35] <hazmat> smoser, i thought it did that into /var/lib/cloud/instance
[15:35] <smoser> well, it does
[15:35] <smoser> :)
[15:36] <hazmat> smoser, doh
[15:36] <smoser> and it captured a nice empty cloud-config, right?
[15:36] <hazmat> smoser, we haven't verified that
[15:36] <smoser> cheez0r, do you have a /var/lib/cloud/instance dir ?
[15:36] <hazmat> cheez0r, could you pastebin  /var/lib/cloud/instance/user-data.txt
[15:36] <hazmat> you'll probably need a sudo on that
[15:37] <smoser> actually. this is really weird.
[15:37] <smoser> just for fun, can you also tell me what is in 'date -R' on the node?
[15:37] <hazmat> cheez0r, ^
[15:37] <smoser> (and compared against that on the maas server)
[15:37] <cheez0r> give me a sec
[15:38] <cheez0r> shows current date in EDT
[15:38] <smoser> the issue i'm thinking of (bug 978127) doesn't usually expose itself here.
[15:38] <smoser> but rather in commissioning.
[15:38] <hazmat> cheez0r, could you pastebin user-data.txt
[15:38] <smoser> cheez0r, i'm interested in seeing if they're off
[15:39] <smoser> so 'date -R' on one and 'date -R' on the other.
[15:39] <hazmat> smoser, let's see the data first
[15:39] <smoser> to see if they're reasonably in sync
[15:41] <hazmat> smoser, date out of sync would have hit us running the datasource by hand as well
[15:41] <cheez0r> yes, the bootstrapped node is in EDT and the MAAS node is in EDT
[15:41] <cheez0r> err.
[15:41] <cheez0r> CDT.
[15:42] <cheez0r> bootstrapped node == EDT; MAAS node == CDT
[15:42] <smoser> nah. i'm not worried about timezone
[15:42] <smoser> thats why i said date -R
[15:42] <hazmat> cheez0r, please pastebin  /var/lib/cloud/instance/user-data.txt
[15:42] <hazmat> smoser, it got some of the data, it picked up hostname for example and set that
[15:42] <hazmat> smoser, per the cloud-init output
[15:42] <cheez0r> that is output of date -R
[15:42] <smoser> $ date -R
[15:42] <smoser> Thu, 07 Jun 2012 11:42:53 -0400
[15:43] <smoser> thats what my date -R shows.
[15:43] <cheez0r> there is one hour difference, MAAS node is accurate to CDT, bootstrapped node is accurate to EDT
[15:43] <smoser> but yes, please go ahead and pastebin what hazmat is asking for.
[15:43] <cheez0r> MAAS node == -0500; bootstrapped node == -0400
[15:43] <smoser> cheez0r, ok. thats good enough then, that they're reasonably accurate. (they need to be within 5 minutes i think)
[15:44] <cheez0r> they're an hour different but the same UTC
[15:44] <smoser> ok. then, yeah, get what hazmat asked for.
[15:44] <cheez0r> date -u shows same on both
[15:44] <smoser> thanks.
[15:45] <cheez0r> accurate to within 10 sec
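The timezone confusion above (EDT vs CDT, but the same UTC) goes away if the two `date -R` strings are compared as instants rather than as wall-clock times. A minimal sketch using only the standard library; the 5-minute tolerance is smoser's "i think" figure from the log, not a verified limit, and the second sample string is made up for illustration:

```python
from email.utils import parsedate_to_datetime

# Assumed tolerance, taken from the "within 5 minutes i think" remark above.
MAX_SKEW_SECONDS = 5 * 60


def clock_skew_seconds(date_r_a, date_r_b):
    """Return the absolute difference in seconds between two `date -R` strings.

    `date -R` prints RFC 2822 dates that carry their UTC offset, so the
    parsed values are timezone-aware and subtract correctly across zones.
    """
    a = parsedate_to_datetime(date_r_a)
    b = parsedate_to_datetime(date_r_b)
    return abs((a - b).total_seconds())


# Illustrative values: the same instant printed in EDT and in CDT.
node_date = "Thu, 07 Jun 2012 11:42:53 -0400"
maas_date = "Thu, 07 Jun 2012 10:42:53 -0500"
skew = clock_skew_seconds(node_date, maas_date)
print(skew, skew <= MAX_SKEW_SECONDS)
```

With these two strings the skew is zero even though the printed hours differ, which matches what `date -u` showed on both machines.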
[15:45] <cheez0r> http://pastebin.ubuntu.com/1028816
[15:45] <hazmat> yeah.. instead of playing smoke and mirrors games theorizing about time, we already know the dates are in sync because we can fetch the data in the first place. so please see what its actually running first..
[15:45] <hazmat> so its a different problem
[15:46] <smoser> hazmat, well, not really.
[15:46] <hazmat> smoser, so that data is correct
[15:46] <smoser> something could have fixed the time since
[15:46] <hazmat> smoser, not if the user-data.txt is in place
[15:46] <smoser> correct.
[15:46] <smoser> then i'm really confused.
[15:47] <hazmat> smoser, did running the datasource overwrite the instance cache?
[15:47] <smoser> no. its just an oauth client
[15:47] <cheez0r> well, the preseed does set an NTP server
[15:47] <smoser> but, sure, cheez0r can you verify the timestamps on that file?
[15:47] <cheez0r> and it is accessible from the bootstrapped node
[15:47] <smoser> actually..
[15:47] <cheez0r> today at 11:06am
[15:48] <cheez0r> that's 41 minutes ago
[15:48] <hazmat> cheez0r, that's roughly when you commissioned it?
[15:48] <smoser> could you :
[15:48] <smoser>  sudo ls -lR /var/lib/cloud/ | pastebinit
[15:48] <cheez0r> yeah, roughly
[15:48] <smoser> cheez0r, but http://pastebin.ubuntu.com/1028750/
[15:48] <smoser> says that stuff happened on Jun 6
[15:48] <smoser> (i think that was your pastebin of /var/log/cloud-init.log)
[15:49] <cheez0r> http://pastebin.ubuntu.com/1028826
[15:49] <hazmat> smoser, ah..
[15:49] <smoser> the reboot already happened.
[15:49] <hazmat> it's a recommissioned node maybe?
[15:49] <smoser> -rw-r--r-- 1 root root 13 Jun  7 11:06 config-scripts-per-boot.always
[15:49] <smoser> -rw-r--r-- 1 root root 14 Jun  6 16:09 config-scripts-per-once.once
[15:50] <cheez0r> I can destroy-environment and rebootstrap if you like
[15:50] <hazmat> and it never cleared out the old data when it recommissions?
[15:50] <hazmat> i thought maas would do a fresh install on a recommissioned node
[15:50] <cheez0r> it was never recommissioned.
[15:50] <cheez0r> it was bootstrapped and then rebooted.
[15:50] <cheez0r> I think, at least.
[15:51] <hazmat> ic
[15:54] <smoser> ok
[15:54] <smoser> so here is my hypothesis:
[15:55] <smoser>  * on initial first boot (yesterday) instance got no user-data from maas server, so it came up happily and did nothing.
[15:56] <smoser>  * in order to remove the dependency on the maas MD server on every boot, maas configures cloud-init to only ever look on the first boot
[15:56] <smoser> hm..
[15:56] <smoser> nah
[15:56] <smoser> my hypothesis is flawed.
[15:56] <smoser> i'm confused.
[16:05] <hazmat> smoser, it seems to me that the issue is basically what you surmise.. effectively cloud-init is being rerun with existing contents of /var/lib/cloud and only parts are executing because of the pre-existing state
[16:05] <hazmat> but specifically parts like putting keys into place and user scripts are not being run
[16:06] <smoser> oh.
[16:06] <smoser> crap
[16:06] <smoser> yeah.
[16:06] <smoser> you're right.
[16:06] <smoser> so the first time it came up
[16:06] <smoser> it got its instance ID
[16:06] <smoser> because that is constant for the node (it is not per "instance")
[16:06] <smoser> then, second boot, it came up and got different user data
[16:06] <smoser> but it had already run that stuff once per the given instance-id
[16:06] <smoser> so it did not run again
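The once-per-instance behaviour smoser describes can be sketched as a marker file keyed by instance-id. This is an illustration of the idea only, not cloud-init's actual code; the directory layout, the function names and the "node-example" id are assumptions:

```python
import os
import tempfile


def run_once_per_instance(state_dir, instance_id, name, action):
    """Run action() only if no marker exists yet for (instance_id, name).

    Returns True if the action ran, False if it was skipped.
    """
    marker_dir = os.path.join(state_dir, "instances", instance_id, "sem")
    os.makedirs(marker_dir, exist_ok=True)
    marker = os.path.join(marker_dir, name)
    if os.path.exists(marker):
        return False  # already ran for this instance-id, so skip
    action()
    with open(marker, "w") as fh:
        fh.write("done\n")
    return True


def simulate_two_boots():
    """Replay the hypothesis: empty data on boot 1, real keys on boot 2."""
    state_dir = tempfile.mkdtemp()
    authorized_keys = []
    instance_id = "node-example"  # hypothetical; constant for the node
    # Boot 1: the metadata service returned no keys, so nothing is installed.
    ran_first = run_once_per_instance(
        state_dir, instance_id, "ssh-keys",
        lambda: authorized_keys.extend([]))
    # Boot 2: keys are available now, but the marker already exists.
    ran_second = run_once_per_instance(
        state_dir, instance_id, "ssh-keys",
        lambda: authorized_keys.extend(["ssh-rsa AAAA... user@host"]))
    return ran_first, ran_second, authorized_keys
```

Because the instance-id does not change between boots, the second call is skipped and authorized_keys stays empty, which is the symptom reported above. Removing the state directory (the `rm -Rf /var/lib/cloud` suggestion) clears the marker.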
[16:07] <smoser> i'm still kind of confused as to why we dont see evidence of cloud-init running *at all* on second boot in http://pastebin.ubuntu.com/1028750/
[16:13] <hazmat> smoser, that is curious
[16:13] <hazmat> it should still have output there, just from running through the handlers
[16:13] <hazmat> smoser, hmm.. well
[16:13] <hazmat> smoser, what if it never actually rebooted the machine
[16:14] <smoser> and actually, we have evidence that it *did*
[16:14] <smoser> in the timestamps
[16:14] <smoser> http://pastebin.ubuntu.com/1028826/
[16:14] <hazmat> hmm.. yeah
[16:14] <smoser> see '.always' timestamps
[16:14] <smoser> so i'm just really confused
[16:16] <smoser> cheez0r, one more thing...
[16:17] <smoser> actually never mind.
[16:17] <hazmat> cheez0r, is there additional content at bottom of /var/log/cloud-init.log
[16:17] <smoser> i'm almost certain maas gave cloud-init empty user data on first boot
[16:17] <hazmat> that didn't get to the pastebin?
[16:18] <hazmat> smoser, so what's the maas usage here.. does it clean out the disk/reinstall when we destroy/bootstrap again?
[16:18] <hazmat> destroy-environment that is
[16:18] <smoser> hazmat, yes, its a full install.
[16:18] <hazmat> but it doesn't do that when doing the first boot dance
[16:18] <smoser> its slow as molasses
[16:19] <hazmat> so the first time you setup, the nodes have pre-existing state
[16:19] <hazmat> and never fully initialize with cloud-init
[16:19] <hazmat> but subsequent to destroy-environment, bootstrap it gets a clean disk
[16:20] <hazmat> adam_g, ping
[16:20] <smoser> no. there is never any pre-existing state.
[16:21] <hazmat> smoser, so when/why is it doing a reboot?
[16:21] <smoser> that was human
[16:21] <smoser> actually, nothing in maas has the ability to reboot :)
[16:22] <hazmat> ah, ic
[16:25] <hazmat> smoser, so our working hypothesis atm then is that when it first comes up it doesn't get any init metadata
[16:25] <hazmat> but subsequently it does
[16:25] <hazmat> but its too late, since cloud-init already ran with a mostly empty metadata
[16:26] <hazmat> cheez0r, smoser okay.. i think we've exhausted the debug info we can get from that system
[16:27] <hazmat> cheez0r, you should be able to destroy-environment/bootstrap and have things work
[16:27] <smoser> hazmat, yeah.
[16:27] <smoser> thats my hypothesis.
[16:28] <smoser> cloud-init isn't very friendly to maas MD either.
[16:28] <smoser> it doesn't even retry.
[16:28] <smoser> but it sure seems that it must have gotten a 200 response with empty data to me
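A retry that treats an empty body as "not ready yet" would cover the case smoser suspects, a 200 response with no data. A minimal sketch, assuming a caller-supplied fetch function; nothing here is cloud-init's or MAAS's real API, and the attempt count and delay are arbitrary:

```python
import time


def fetch_with_retry(fetch, attempts=5, delay=2.0, sleep=time.sleep):
    """Call fetch() until it returns non-empty data or attempts run out.

    fetch is any zero-argument callable (hypothetical here) that returns the
    response body; an empty body or an OSError counts as a failed attempt.
    """
    for attempt in range(1, attempts + 1):
        try:
            data = fetch()
        except OSError:
            data = None
        if data:
            return data
        if attempt < attempts:
            sleep(delay)
    return None


# Simulated metadata service: empty twice, then real user-data.
responses = iter(["", "", "#cloud-config\n"])
result = fetch_with_retry(lambda: next(responses), delay=0)
print(repr(result))
```

Whether an empty response should ever be accepted as final is a design choice; a node deployed without user-data would legitimately see one, so a real implementation would need a bounded wait like the one above.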
[16:28] <hazmat> smoser, for juju's purposes, it would be fine for it to re run the user data
[16:28] <hazmat> scripts
[16:29] <smoser> well, thats neither here nor there.
[16:29] <smoser> runcmd by design is run only once per instance.
[16:29] <hazmat> we'd have to make one minor adjustment to the initialize stuff, but then it would be idempotent
[16:35] <cheez0r> sorry guys I had to step away
[16:36] <cheez0r> will read scroll and see what's doing in a bit
[16:40] <smoser> hazmat, even if it was idempotent, you wouldn't want to do it.
[16:40] <smoser> you dont want to apt-get update && apt-get install on every boot
[16:41] <smoser> thats just a silly waste of time
[16:52] <hazmat> smoser, i sent out a reply
[16:52] <hazmat> smoser, i'm pretty sure that the root of this is a timing issue getting metadata to the api within maas
[16:53] <hazmat> and i wonder if it works subsequently because its returning old metadata
[16:53] <hazmat> i'd have to dig into the code of maas to verify though
[16:56] <smoser> ah. you're right
[16:56] <smoser> it's a race
[16:57] <hazmat> smoser, going through maas code its not clear why
[16:57] <hazmat> it updates the db before it starts the nodes
[16:57] <hazmat> so the userdata is in place
[16:57] <hazmat> are there any maas developers up?
[16:57] <smoser>  "before it starts the nodes"
[16:58] <hazmat> allenap, ping
[16:58] <smoser> flacoste
[16:58] <hazmat> roaksoax, ping
[16:58] <hazmat> the more the merrier
[16:59] <roaksoax> hazmat: pong
[17:02] <hazmat> egads i hate brighttalk for videos.. why should folks have to signup
[17:02] <hazmat> roaksoax, trying to understand why there might be a delay for userdata showing up for a node
[17:02] <hazmat> roaksoax, we're trying to debug the invalid ssh key thing
[17:02] <hazmat> roaksoax, i just sent email to the maas-dev list which outlines our hypothesis
[17:03] <roaksoax> ok
[17:03] <hazmat> roaksoax, looking at the code i'm not sure how that could be the case
[17:04] <hazmat> it looks like the api will update the userdata before launching the node
[17:04] <hazmat> is there some celery task that will try to overwrite that? or i'm misunderstanding this bit..
[17:04] <roaksoax> hazmat: in precise maas we are not using celery
[17:05] <hazmat> roaksoax, gotcha..
[17:05] <hazmat> roaksoax, but does delay availability of metadata sound familiar or possible?
[17:05] <adam_g> hazmat: pong
[17:06] <roaksoax> hazmat: sounds familiar. I've been in a situation in which a node is enlisted, and then a ssh public key gets added and the node should have had the meta-data updated so that on deployment it would have imported that ssh key. however, it didn't happen, so I had to enlist the nodes after the key was added.
[17:06] <hazmat> adam_g, there's a question up on stack overflow/askubuntu.. which i think you might be the only one who can respond effectively..
[17:06] <hazmat> adam_g, http://askubuntu.com/questions/141552/creating-volume-group-in-nova-volume-juju-charm
[17:06] <roaksoax> hazmat: while it is not the same issue (and this is a MAAS SSH key), it might be related
[17:07] <adam_g> hazmat: ill take a look, thanks
[17:12] <Daviey> hazmat: metadata issues?  check the server apache logs.
[17:13] <hazmat> Daviey, does it print out the full response?
[17:13] <Daviey> The common hunch that has hit most of us so far, is bad localtime offset.. meaning that the oauth token was expired
[17:13] <Daviey> hazmat: you can ask apache nicely to do that.
[17:16] <smoser> Daviey, that might be useful to help prove this.
[17:16] <smoser> adam_g, i'm interested in seeing your answer to that question
[17:18] <hazmat> smoser, Daviey so how does the ntp get setup?
[17:18] <smoser> it does not.
[17:18] <smoser> nothing sets up ntp
[17:18] <smoser> this is not a time related issue.
[17:18] <smoser> i'm virtually certain of that.
[17:19] <smoser> virtually
[17:19] <adam_g> smoser: yeah, me too. :) i need to setup a local provider to test this. you added the loopback stuff to that charm, wasn't there some namespacing / insmod'ing that needed to be done on the host before LVM would work?
[17:20] <smoser> adam_g, you can't do it.
[17:20] <smoser> there is no device name space
[17:20] <smoser> so you have to grant the container access to loopback devices
[17:20] <smoser> its a rathole
[17:20] <adam_g> smoser: right, the loopback as the PV worked IIRC, but i thought it required something to be done on the host too
[17:21] <smoser> http://bazaar.launchpad.net/~smoser/+junk/jstack/view/head:/jstack.txt
[17:22] <adam_g> thanks
[17:22] <smoser> you have to allow the lxc  container to write to /dev/loop-control
[17:22] <smoser> and thats fine... maybe that even makes nova-volume charm work
[17:22] <smoser> but the issue just keeps popping its head up.
[17:23] <smoser> you have to basically turn lxc into chroot if you want it to work. which defeats most of the purpose of lxc
[17:23] <adam_g> right
[17:53] <pmatulis> i just juju-deployed the mysql/wordpress stuff and pulling up wordpress site yields an error.  it's looking for file /etc/wordpress/config-localhost.php but i see that i have /etc/wordpress/config-node-e4115b137b36.local.php .
[18:47] <cheez0r> smoser/hazmat/roaksoax: I'm re-commissioning my nodes- I went ahead and populated a SSH public key for my user in MAAS and I want to see if maybe that's what I'm missing.
[18:58] <adam_g> smoser: jeez, i can't even get loop-control allowed into the container anymore
[18:59] <smoser> you might be fighting AppArmor now?
[19:00] <adam_g> probably