[02:40] <niemeyer> Night all
[11:53] <_mup_> txzookeeper/session-event-handling r40 committed by kapil.foss@gmail.com
[11:53] <_mup_> allow connection using existing session, test session expiration, additional symbol name translation for exceptions.
[13:06] <_mup_> txzookeeper/session-event-handling r41 committed by kapil.foss@gmail.com
[13:06] <_mup_> pep8isms
[13:17] <kim0> do I really need to type "yes" to the ssh authenticity question?
[13:21] <niemeyer> kim0: How do you mean?
[13:22] <kim0> ensemble status
[13:22] <kim0> I get the ssh yes/no prompt
[13:22] <kim0> same for debug-hooks
[13:23] <kim0> niemeyer: that is normal right ?
[13:24] <niemeyer> kim0: You mean the prompt asking you if the fingerprint for the server is valid?
[13:24] <kim0> yes
[13:24] <niemeyer> kim0: Yeah, that's usual when connecting to a new server
[13:24] <kim0> Would be great if Ensemble would get the machine log .. and verify it for me 
[13:24]  * kim0 grabs his wish list bag
[13:25] <kim0> Also if ensemble had a persistent connection to the bootstrap node :) and perhaps ran locally under a screen session with "watch status" and debug-log ..etc all running
[13:27] <kim0> niemeyer: I think I'm seeing strange behaviour which I hope someone can help me with, since I really want to write this "write a formula" doc. I just launched a mysql SU, and a drupal SU (based on an almost empty new formula)
[13:28] <kim0> fired debug-hooks drupal/0 .. works
[13:28] <kim0> add-relation drupal mysql
[13:28] <kim0> I am not getting any new windows in the debug-hooks screen
[13:29] <niemeyer> Hmm
[13:30] <niemeyer> kim0: Thinking
[13:30] <kim0> sure
[13:30] <kim0> I think I saw this yesterday
[13:30] <kim0> when I closed the debug-hooks screen ..
[13:30] <kim0> hooks suddenly started firing
[13:30] <kim0> it's like it was stuck 
[13:30] <niemeyer> kim0: That's normal
[13:30] <kim0> but I always blame myself :)
[13:30] <niemeyer> kim0: Hooks are serially executed
[13:31] <kim0> well, they were not opening new windows in the screen session
[13:31] <kim0> they should right ?
[13:31] <niemeyer> kim0: You won't get another hook window until you stop the existing one
[13:31] <kim0> there was no existing one .. I was waiting for it
[13:31] <kim0> just like now .. there's only window 0 in screen
[13:32] <niemeyer> kim0: and what's 0?
[13:32] <hazmat> kim0, it was disabled (the ssh fingerprint confirm prompt)
[13:32] <kim0> niemeyer: just a shell
[13:32] <hazmat> but it leaves things open to man in the middle
[13:32] <hazmat> we should pull it down though automatically 
[13:32] <niemeyer> hazmat: Do you have any ideas of what might be going on for kim0?
[13:32] <kim0> hazmat: yeah .. check my wish list :) we could ec2-get-console-output and verify it :)
[13:33] <niemeyer> kim0: We can do better than that
[13:33] <niemeyer> kim0: We should inject the host key
[13:33] <niemeyer> kim0: That's in our wishlist already :)
[13:33] <hazmat> niemeyer, man in the middle was the primary reason fingerprint checking was re-enabled, yes?
[13:33]  * hazmat reads through log
[13:33] <niemeyer> hazmat: That's right
[13:33] <kim0> niemeyer: cool !
[13:34] <kim0> niemeyer: cloud-init can inject host key already indeed .. that's even better
[13:34] <hazmat> indeed, we should probably make use of that, but we need to store in zk for multi-client access
[13:35] <hazmat> kim0, okay.. so you've got a debug hook session on drupal or mysql?
[13:35] <hazmat> when doing the add relation
[13:35] <kim0> drupal
[13:35] <kim0> hazmat: debug-hooks drupal/0
[13:35] <kim0> hazmat: add-relation drupal mysql
[13:35] <kim0> that's it .. no new window in screen
[13:35] <hazmat> kim0, okay.. so you do debugs for install & start?
[13:36] <hazmat> or are you debugging after start?
[13:36] <kim0> the sequence was
[13:36] <kim0> deploy mysql
[13:36] <kim0> deploy drupal
[13:36] <kim0> debug-hooks drupal/0
[13:36] <kim0> debug-log
[13:36] <kim0> add-relation mysql drupal
[13:36] <hazmat> kim0, could you paste your debug-log
[13:36] <kim0> I got a "install" or "start" hook here can't remember .. which I closed
[13:37] <kim0> I expected to get the db-relation-changed one after it .. but didn't
[13:37] <kim0> sure
[13:37] <hazmat> kim0, by closing are you exiting the shell or just closing the window?
[13:37] <hazmat> hmm
[13:37] <hazmat> i don't think i've tested closing the window instead of exiting the shell
[13:37] <kim0> hazmat: ctrl + d
[13:37] <kim0> exit shell
[13:37] <kim0> hazmat: log http://paste.ubuntu.com/616692/
[13:37] <hazmat> hmm that should be fine
[13:38] <niemeyer> hazmat: Should be equivalent
[13:38] <kim0> hazmat: status → http://paste.ubuntu.com/616694/
[13:38] <hazmat> niemeyer, yeah.. but on the close window case, there is still a callback to screen to close the window after the process exit
[13:38] <hazmat> but ctrl +d vs. exit is equiv
[13:39] <niemeyer> hazmat: I don't understand that distinction
[13:39] <hazmat> kim0, odd it seems like the unit hasn't picked up the relation
[13:39] <kim0> :s
[13:39] <kim0> can u connect to the env ?
[13:39] <niemeyer> hazmat: The callback will execute after the shell process exits, right?  Either option would kill it
[13:40] <niemeyer> hazmat: IOW, closing the window also terminates the shell
[13:40] <niemeyer> hazmat: This would happen if the hook wasn't executed
[13:40] <hazmat> niemeyer, right, but we have another process checking on the shell and then instructing screen to kill the window, which is probably just a noop at that point
[13:40] <hazmat> niemeyer, its unrelated to what kim0 is seeing
[13:41] <niemeyer> hazmat: I mean, the relation not showing up
[13:41] <niemeyer> kim0: Can you please paste ps auxw for that machine?
[13:41] <niemeyer> kim0: The drupal one
[13:41] <kim0> niemeyer: from the debug-hooks screen is ok right ?
[13:41] <niemeyer> hazmat: Hmm.. unless we're running the shell script with -e, and screen exits with 1 because the window wasn't there?
[13:42]  * niemeyer doing guess work
[13:42] <niemeyer> kim0: Yeah
[13:42] <kim0> http://paste.ubuntu.com/616696/
[13:43] <niemeyer> hazmat: "install".. there's an old hook running still
[13:44] <kim0> I hope I didn't do something stupid at the end :)
[13:44] <niemeyer> kim0: I suspect your window 0 has the install hook running
[13:44] <niemeyer> kim0: Can you please paste "env" from that window
[13:45] <kim0> http://paste.ubuntu.com/616698/
[13:45] <hazmat> niemeyer, window 0 is never used for hooks.. it's always a shell
[13:45] <kim0> window 0 is always there 
[13:45] <kim0> yeah 
[13:45] <niemeyer> hazmat: It's trivial to shift windows around
[13:46] <hazmat> niemeyer, but the names are distinct on the windows
[13:46] <kim0> http://paste.ubuntu.com/616699/ is the install hook itself
[13:46] <niemeyer> Ok, but that's not the case either way
[13:46] <niemeyer> Still, we have a hook running
[13:46] <niemeyer> hazmat: Ok
[13:46] <hazmat> the debug stuff names the windows by hook , except window 0 which is named 'shell' afaicr
[13:47]  * kim0 nods
[13:47] <niemeyer> kim0: What's in /tmp/tmpLjxVDG-install
[13:47] <kim0> niemeyer: http://paste.ubuntu.com/616700/
[13:47] <kim0> scary script 
[13:47] <hazmat> so it seems somehow the debug window was ended but the underlying debug process is still alive.
[13:48] <niemeyer> hazmat: Yeah, it's still in the sleep loop
[13:48] <niemeyer> hazmat: Which confirms your initial theory
[13:49] <kim0> I probably closed the window too fast, if you think it needs time to do anything
[13:49] <hazmat> it might be a different signal gets sent besides HUP that needs to be caught here
[13:49] <hazmat> kim0, it shouldn't matter
[13:50] <hazmat> we should never rely on user timing
[13:50] <kim0> yeah I know 
[13:50] <niemeyer> hazmat: TERM, KILL
[13:51] <niemeyer> hazmat: Wait.. the HUP is catching the outside signal
[13:51] <niemeyer> hazmat: That's not the problem.. that script is still running
[13:51] <hazmat> yeah.. it's not in the screen process
[13:52] <niemeyer> kim0: One more: /proc/1585/environ
[13:53] <kim0> http://paste.ubuntu.com/616708/
[13:53] <kim0> niemeyer: not sure why it has no newlines
[13:53] <kim0> doh
[13:53] <niemeyer> hazmat: We should monitor it from outside instead of expecting it to do stuff before it dies
[13:53] <kim0> sorry .. pastebinit error
[13:53] <niemeyer> kim0: That's the file format indeed
[13:54] <kim0> niemeyer: http://paste.ubuntu.com/616709/
[13:54] <kim0> this is complete
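Why the first paste had no newlines: on Linux, /proc/PID/environ holds the environment a process was started with as NUL-separated `NAME=value` entries. A minimal sketch for reading it; the `show_environ` helper name is hypothetical and not part of Ensemble.

```shell
#!/bin/sh
# Print a process's initial environment, one NAME=value entry per line.
# Linux-specific: /proc/<pid>/environ separates entries with NUL bytes,
# so translate each NUL into a newline.
show_environ() {
    tr '\0' '\n' < "/proc/$1/environ"
}

# Example (PID taken from the discussion): show_environ 1585
```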
[13:54] <niemeyer> hazmat: e.g. writing to hook.pid when the process starts
[13:55]  * kim0 probably just uncovered a pastebinit bug
[13:55] <hazmat> niemeyer, yeah.. and then just doing something like kill -0 `cat hook.pid`  for the sleep condition
[13:55] <niemeyer> hazmat: Right
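A minimal sketch of the pid-file idea discussed above, assuming a POSIX shell. The hook's PID is recorded when it starts, and the watcher polls from outside with `kill -0`, which delivers no signal and only reports whether the process still exists. The helper names, file location and poll limit are hypothetical; this is not how Ensemble implements it.

```shell
#!/bin/sh
# Hypothetical helpers illustrating the hook.pid approach; not Ensemble code.

# Launch the hook command in the background and record its PID.
start_hook() {
    pidfile=$1
    shift
    "$@" &
    echo "$!" > "$pidfile"
}

# Poll from outside until the recorded process is gone. kill -0 sends no
# signal; it only succeeds while the PID exists and can be signalled.
# Gives up (returns 1) after max_polls one-second polls.
wait_for_hook() {
    pidfile=$1
    max_polls=${2:-60}
    while kill -0 "$(cat "$pidfile")" 2>/dev/null; do
        max_polls=$((max_polls - 1))
        [ "$max_polls" -gt 0 ] || return 1
        sleep 1
    done
    return 0
}
```

Because the check runs outside the hook, it still notices the hook going away even if the hook dies without running any cleanup of its own, which is the failure mode seen earlier.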
[13:56]  * hazmat files a bug
[13:56] <niemeyer> hazmat: Another handy issue for a brain breaker.. will paste that conversation in a bug.
[13:56] <niemeyer> hazmat: Oh, ok :)
[13:56] <niemeyer> hazmat: Please paste the log for context
[13:56] <niemeyer> hazmat: Thanks
[13:56] <niemeyer> kim0: Alright.. we know what's wrong
[13:56] <kim0> great :)
[13:56] <niemeyer> kim0: For fixing your problem right now,
[13:57] <niemeyer> kim0: kill 1585
[13:57] <kim0> got it 
[13:57] <kim0> thanks
[13:57] <niemeyer> kim0: np
[13:57] <kim0> wonder why no one else is hitting this
[13:57] <niemeyer> kim0: Thanks a lot for your help uncovering the bug
[13:57] <niemeyer> kim0: It's the way the debug-hook window was closed
[13:58] <kim0> ah so you close it like ctrl-a c
[13:58] <kim0> ok
[13:58] <kim0> not c .. whatever closes windows :)
[14:00] <kim0> ew
[14:00] <kim0> ok probably hitting a new one
[14:01] <kim0> I killed the process .. got the window for db-relation-changed
[14:01] <kim0> relation-get inside it says →  No ENSEMBLE_AGENT_SOCKET/-s option found
[14:01] <kim0> env dump http://paste.ubuntu.com/616713/
[14:02] <kim0> hazmat: could you please take a look as well
[14:03]  * hazmat looks
[14:03] <hazmat> kim0, that's the env from window 0 ?
[14:03] <hazmat> or the debug window?
[14:03] <kim0> hazmat: no win 1
[14:03] <kim0> the db-relation-changed window
[14:03] <kim0> db-relation-joined actually
[14:04] <hazmat> it doesn't look like it has the debug environment variables sourced
[14:04] <niemeyer> That's the shell isn't it?
[14:05] <niemeyer> SUDO_COMMAND=/usr/bin/byobu -xRS drupal-0-hook-debug -t shell
[14:06] <niemeyer> kim0: I think the paste is bogus
[14:07] <niemeyer> kim0: Or maybe I just misunderstand what the variables mean
[14:07] <kim0> I can repaste manually
[14:07] <kim0> anything to look for ?
[14:08] <niemeyer> kim0: Just thinking how to get the pid for the parent shell
[14:08] <kim0> ps -elf | grep $$ ?
[14:08] <kim0> it'd be listed in ppid field
[14:08] <niemeyer> kim0: echo $BASHPID
[14:09] <kim0> 9229
[14:09] <niemeyer> kim0: echo $PPID
[14:09] <kim0> 935
[14:09]  * kim0 feels like a shell
[14:09] <niemeyer> Ok, cool
[14:09] <niemeyer> kim0: Hehe :-)
[14:09] <kim0> :)
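For reference, the bash variables used above: `$$` is the PID of the top-level shell and does not change inside subshells, `$BASHPID` (bash 4.0 and later) is the PID of the bash process actually running, and `$PPID` is the PID of the shell's parent, which is what identified the process that launched the window's shell. A small bash sketch; `report_pids` is a hypothetical name.

```shell
#!/bin/bash
# Requires bash >= 4.0 for $BASHPID.
report_pids() {
    echo "\$\$=$$ BASHPID=$BASHPID PPID=$PPID"
}

report_pids          # in the top-level shell
( report_pids )      # in a subshell: $$ is unchanged, $BASHPID differs
```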
[14:10] <niemeyer> kim0: Yeah, looks like a failure in sourcing it indeed
[14:10] <kim0> we don't log those steps somewhere ?
[14:10] <niemeyer> Can't imagine how that could happen, though, even if we killed the process
[14:11] <niemeyer> kim0: Nope, this is the bootstrapping of debugging itself.. we might indeed have to log it in the future
[14:11] <niemeyer> kim0: Can you please paste the new process list so we can reach the new hook
[14:12] <kim0> http://paste.ubuntu.com/616722/
[14:13] <kim0> guess no one uses debug-hooks really :)
[14:13]  * kim0 afk for 5 mins
[14:15] <niemeyer> kim0: We do, but we don't generally kill processes in the middle
[14:15] <hazmat> kim0, just me ;-)
[14:15] <hazmat> need to run a quick errand, back in a bit
[14:23] <niemeyer> kim0: Let me know when you're back.. we can follow on a bit if you're interested
[14:24] <kim0> niemeyer: back
[14:25] <niemeyer> kim0: Ok, let's see /tmp/tmpR1UhMY-db-relation-joined then
[14:25] <kim0> niemeyer: http://paste.ubuntu.com/616737/
[14:26] <niemeyer> kim0: Ok, please figure ENSEMBLE_DEBUG from /proc/9187/environ, and list $ENSEMBLE_DEBUG/env.sh
[14:28] <kim0> niemeyer: can't see ENSEMBLE_DEBUG .. paste http://paste.ubuntu.com/616742/
[14:30] <niemeyer> kim0: Hmm.. I guess it wasn't exported
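Whether a variable appears in /proc/PID/environ, or in any child process at all, depends on whether it was exported: a plain shell assignment stays inside the shell that made it. A minimal sketch; `ENSEMBLE_DEBUG` is the variable from the discussion, but its value and the `child_sees` helper are made up for illustration.

```shell
#!/bin/sh
# Ask a fresh child process what it sees for ENSEMBLE_DEBUG.
child_sees() {
    sh -c 'printf "%s\n" "${ENSEMBLE_DEBUG:-unset}"'
}

unset ENSEMBLE_DEBUG                         # start from a clean state
ENSEMBLE_DEBUG=/tmp/hypothetical-debug-dir   # set, but not exported
child_sees                                   # prints "unset"
export ENSEMBLE_DEBUG                        # now part of the environment
child_sees                                   # prints "/tmp/hypothetical-debug-dir"
```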
[14:31]  * niemeyer thinks
[14:34] <niemeyer> kim0: Ok, let's try to find by force: cd /tmp && find -name env.sh
[14:34] <niemeyer> kim0: Will probably see more than one
[14:35] <kim0> niemeyer: 3 of em
[14:35] <niemeyer> kim0: Ok, let's find the one with db-relation-joined
[14:36] <kim0> http://paste.ubuntu.com/616746/
[14:36] <kim0> http://paste.ubuntu.com/616747/
[14:36] <kim0> http://paste.ubuntu.com/616748/
[14:37] <kim0> niemeyer: I tried grep'ing .. doesn't have joined in them
[14:37] <niemeyer> kim0: Well, that's likely the issue then.. let me check
[14:39] <niemeyer> kim0: That's weird..
[14:39] <niemeyer> kim0: All of them have ENSEMBLE_AGENT_SOCKET
[14:39] <kim0> maybe none of them is sourced
[14:40] <niemeyer> kim0: This is the right one for db-relation-joined: http://paste.ubuntu.com/616747/
[14:41] <niemeyer> kim0: Can you please paste the hook.sh file living in the same directory?
[14:41] <niemeyer> kim0: Well, that's the thing
[14:41] <niemeyer> kim0: There's no easy way for bash to be executed without this being sourced
[14:41] <kim0> niemeyer: http://paste.ubuntu.com/616754/
[14:41] <niemeyer> kim0: As you can see..
[14:42] <niemeyer> kim0: That prior paste is from /tmp/tmp.Sj2ilkd53B/env.sh, right?
[14:42] <kim0> double checking 
[14:43] <kim0> should be yes
[14:43] <niemeyer> kim0: Ok, so there's really no way for bash to be executed without it being sourced, which is awkward..
[14:44] <niemeyer> kim0: You got a bash, without the env variables, but the only way for that bash to have come up, was through the sourcing line
[14:44] <niemeyer> Hmmm
[14:44] <kim0> thinking as well
[14:46] <kim0> niemeyer: parent process for the Window1 shell, is byobu, not the hook.sh ?
[14:47] <niemeyer> kim0: Yeah, that's strange
[14:47] <kim0> hook.sh should still be running if it fired us right
[14:47] <niemeyer> kim0: Indeed
[14:47] <niemeyer> kim0: This would also justify the previous issue as well, interestingly
[14:48] <niemeyer> kim0: Hmm
[14:48] <niemeyer> kim0: Let me do a local test, hold on
[14:49] <kim0> niemeyer: pstree -p .. if that's helpful http://paste.ubuntu.com/616757/
[14:51] <hazmat>   nice.. half way to my walking desk finished
[14:51] <kim0> hazmat: walking desk ?
[14:52] <hazmat> kim0, treadmill with keyboard tray and monitor stand
[14:52] <kim0> oh that's new to me ... sounds cool indeed :)
[14:52] <hazmat> kim0,  http://opinionator.blogs.nytimes.com/2010/02/23/stand-up-while-you-read-this/ ... http://www.nytimes.com/2008/09/18/health/nutrition/18fitness.html
[14:53] <hazmat> kim0, lots of good evidence for the benefits vs. sitting in a chair all day
[14:53] <kim0> yeah that's intuitive
[14:53] <hazmat> the only problem is that the treadmill weighs 250 pounds.. just carried it up the stairs.. so most of the way done on the setup
[14:54]  * hazmat catches up the irc log to get up to speed on debug-hooks
[14:54] <kim0> hazmat: congrats :) send pics to warthogs :)
[14:56] <kim0> niemeyer: I'm going for a late lunch .. I have inserted your ssh key into ec2-67-202-22-46.compute-1.amazonaws.com (drupal/0) should you want to login to it
[14:56] <niemeyer> kim0: Cheers
[14:56] <niemeyer> kim0: Will check it out
[14:56] <kim0> cool
[15:16] <niemeyer> hazmat: I suspect both issues likely boil down to the way the shell is being executed
[15:16] <niemeyer> hazmat: It's doing a two-step execution, and it's not entirely clear why
[15:16] <hazmat> niemeyer, what's strange is that it works sometimes
[15:16] <hazmat> writing up a reply to tom for his questions on list
[15:16] <niemeyer> hazmat: It first creates a window, which spawns an outside shell by screen itself
[15:17] <niemeyer> hazmat: then overwrites a shell onto it
[15:17] <niemeyer> hazmat: I suspect we may be hitting some race within screen itself
[15:17] <niemeyer> hazmat: Is there a reason why you coded it like that, or is it safe to change?
[15:18] <hazmat> niemeyer, it's safe to change, i thought that creation was per your suggestion
[15:19] <niemeyer> hazmat: It's unrelated to my suggestion
[15:19] <hazmat> the openstack nova screen setup does a similar setup
[15:19] <niemeyer> hazmat: It's executing two shells for no reason
[15:19] <niemeyer> hazmat: Rather than only hook.sh
[15:19] <hazmat> right
[15:19] <hazmat> so instead of creating the window it should just exec in the named window?
[15:20] <niemeyer> hazmat: The "screen" command of screen takes an executable as an argument
[15:20] <niemeyer> hazmat: -X screen -t .. hook.sh
[15:20] <niemeyer> hazmat: The shell I'm seeing in kim0's is the shell from the first screen command, not the one from the exec
[15:21] <hazmat> niemeyer, that sounds good to me 
[16:01] <kim0> hope that bug got caught
[16:03] <niemeyer> kim0: Sounds like so..
[16:03] <niemeyer> kim0: Will try something this afternoon
[16:03] <kim0> awesome
[16:03] <niemeyer> kim0: Thanks for all your help
[16:03] <kim0> All thanks to you :)
[16:15]  * hazmat lunches
[16:20]  * niemeyer too
[17:42] <kim0> hmm
[17:42] <kim0>         state: install_error
[17:43] <kim0> if service is having install_error .. any facility to figure out what went wrong
[17:46] <kim0> ok I could figure it out
[17:51] <niemeyer> kim0: I was kind of expecting that..
[17:51] <niemeyer> kim0: Is that the service we were debugging?
[17:51] <kim0> niemeyer: I shut down the env and started afresh
[17:51] <niemeyer> kim0: Oh, ok
[17:51] <niemeyer> kim0: ensemble log, ensemble debug-hook, etc
[17:52] <kim0> niemeyer: does the log not provide hooks stdout any more ?
[17:52] <kim0> is it suppressed by default?
[17:52] <niemeyer> kim0: It does, but you have to turn it on earlier
[17:52] <niemeyer> We should really have a feature where it logs by default
[17:52] <niemeyer> and rotates them out after a while
[17:52] <niemeyer> kim0: Otherwise, the best bet is logging in the machine and checking logs
[17:53] <niemeyer> kim0: You should be able to retry, though
[17:53] <niemeyer> kim0: Run debug-hook
[17:53] <niemeyer> kim0: and then run ensemble resolved with the --retry argument
[17:59] <kim0> niemeyer: hmm .. service unit is stuck somehow
[17:59] <kim0> here is status http://paste.ubuntu.com/616891/
[17:59] <kim0> I hope you don't mind all the questions 
[17:59] <niemeyer> kim0: Not at all.. I'm actually going to fix some of the issues you found today
[18:00] <kim0> yeah .. the basic workflow should be smoother ..
[18:01] <kim0> so, I had an error in a hook, now I have no idea how to nudge things and get them back
[18:01] <niemeyer> kim0: resolved, as I mentioned above
[18:02] <kim0> bin/ensemble resolved --retry drupal/1
[18:02] <kim0> tried this
[18:02] <niemeyer> kim0: Ok, what happened next?
[18:02] <kim0> debug-log only shows mysql related messages
[18:02] <kim0> and status is the same
[18:03] <niemeyer> kim0: Have you actually fixed the original reason why your hook failed?
[18:03] <kim0> niemeyer: yes I did
[18:04] <niemeyer> kim0: Why was it failing before?
[18:04] <kim0> niemeyer: ssh'ed into the machine .. and ran it 
[18:04] <kim0> niemeyer: some cd to a non-existent directory
[18:04] <kim0> niemeyer: I ran the script inside the instance .. it is fine now
[18:04] <niemeyer> kim0: Is it returning successfully now? (exit status 0)
[18:04] <kim0> checking again
[18:06] <kim0> can't really check again accurately
[18:06] <kim0> ensemble-log giving errors because it's running outside the environment
[18:06] <kim0> apt-get saying packages already installed ..etc
[18:06] <kim0> but yeah it seems correct
[18:07] <niemeyer> kim0: So how did you run it before?
[18:07] <kim0> niemeyer: in a debug-hooks session
[18:07] <kim0> /var/lib/ensemble/units/drupal-1/formula/hooks/install
[18:07] <niemeyer> kim0: If apt-get install is failing, it won't work as a hook either
[18:08] <kim0> it's just saying .. the package is already installed
[18:08] <kim0> the script is fine trust me :)
[18:08] <niemeyer> kim0: :)
[18:09] <niemeyer> kim0: If you have already executed it by hand, you can just say "resolved"
[18:09] <niemeyer> kim0: Without --retry
[18:09] <niemeyer> kim0: But I suspect that fuzzing may have triggered something else.. that "state: null" isn't really great 
[18:09] <kim0> yeah
[18:09] <niemeyer> kim0: Try the resolved trick
[18:10] <kim0> did that
[18:10] <niemeyer> kim0: This just states to Ensemble "I have resolved the problem"
[18:10] <kim0> still null
[18:11] <niemeyer> kim0: Ok, try redeploying the fixed formula then
[18:11] <niemeyer> kim0: We'll have to investigate a bit that scenario
[18:11] <niemeyer> kim0: (--retry with a broken script, etc)
[18:11] <kim0> how do I redeploy
[18:12] <niemeyer> kim0: But first, I'll fix the debug-hook stuff we debugged this morning
[18:12] <kim0> ok np ..
[18:12] <niemeyer> kim0: Same thing you did earlier?
[18:12] <kim0> I'll pick this up later
[18:12] <kim0> fresh environment .. ok
[18:13] <kim0> In the tutorial .. I'm assuming the formula is going to have errors
[18:13] <niemeyer> kim0: Nope
[18:13] <niemeyer> kim0: Just remove the unit
[18:13] <niemeyer> kim0: and add it again
[18:13] <kim0> ok
[18:13] <niemeyer> kim0: Yeah, that's a good thing
[18:13] <niemeyer> kim0: You can also upgrade the formula in general
[18:13] <kim0> says, error state cannot be upgraded
[18:13] <niemeyer> kim0: This is what we're preparing Ensemble to be able to do
[18:14] <kim0> or so
[18:14] <niemeyer> kim0: Error?  Change, upgrade..
[18:14] <kim0> says like, formula is in error state .. so it cannot be upgraded
[18:14] <kim0> I lost the exact message though
[18:15] <niemeyer> kim0: Yeah, you have to resolve it first..
[18:15] <kim0> Ok .. I'll need to try this again (recovering from a formula with errors )
[18:15] <kim0> and will discuss again with you
[18:15] <niemeyer> kim0: Because otherwise we can't assume a known state
[18:15] <niemeyer> kim0: Imagine an install hook failed in the middle
[18:16] <kim0> that's what it was :D
[18:16] <kim0> so I just need to know the recommended recovery steps
[18:16] <niemeyer> kim0: Right.. simply upgrading won't really yield a working system necessarily 
[18:16] <niemeyer> kim0: Because half of it executed
[18:16] <kim0> how to know what went wrong .. release a fix .. start recovering
[18:17] <niemeyer> kim0: What we want is this:
[18:17] <niemeyer> kim0: install failed: check logs, fix it or retry, upgrade if wanted
[18:18] <niemeyer> kim0: With debug-hook if desired, to understand what's going on
[18:18] <kim0> then use "resolved" right ?
[18:18] <niemeyer> kim0: Right, after the "fix it or retry"
[18:18] <niemeyer> kim0: Or during it actually.. retry is done with resolved
[18:19] <kim0> is there a way to upload the fixed new hook ?
[18:20] <niemeyer> kim0: upgrade-formula
[18:20] <kim0> which refuses to work in error state ?
[18:20] <niemeyer> kim0: Yes, which is the right thing to do
[18:21] <kim0> ok .. so I'd still need to upload my fixed hook
[18:21] <niemeyer> kim0: The error state must be acknowledged by the administrator
[18:21] <niemeyer> kim0: and if an install hook blows up in the middle, upgrading a new install hook with the error fixed won't necessarily make it work
[18:22] <niemeyer> kim0: mkdir foo, run twice, breaks
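niemeyer's `mkdir foo` example, spelled out as a sketch: a plain `mkdir` fails on the second run, so a hook that died halfway cannot simply be re-run, while `mkdir -p` succeeds either way. Writing every step like the second function is what makes a hook idempotent. The function names are hypothetical.

```shell
#!/bin/sh
# Non-idempotent: fails if "$1/foo" already exists, e.g. when a
# half-finished install hook is run a second time.
install_step_fragile() {
    mkdir "$1/foo"
}

# Idempotent: mkdir -p succeeds whether or not the directory exists,
# so re-running the step (for instance via "resolved --retry") is safe.
install_step_idempotent() {
    mkdir -p "$1/foo"
}
```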
[18:22] <kim0> so our recommended approach is kill the instance, and start a fresh machine ?
[18:22] <kim0> the "fixing and trying" cycle is what I'm trying to grasp
[18:23] <niemeyer> kim0: Fixing it means fixing the actual problem within the formula..
[18:23] <kim0> and what about the trying
[18:23] <niemeyer> kim0: Sorry, within the service unit
[18:23] <kim0> on a new instance ?
[18:23] <niemeyer> kim0: If there's nothing to do.. you just run "resolved"
[18:23] <kim0> ah
[18:24] <kim0> so I fix the problem manually .. then run resolved
[18:24] <niemeyer> kim0: Yes, that's one way to do it
[18:24] <niemeyer> kim0: The other way to do it is to code an idempotent hook
[18:24] <niemeyer> kim0: This enables you to run resolved and upgrade a new formula
[18:24] <kim0> and use, resolved --retry
[18:24] <kim0> right ?
[18:25] <niemeyer> kim0: If that's what you want to do
[18:25] <niemeyer> kim0: The problem is really quite simple
[18:25] <niemeyer> kim0: When a hook fails, Ensemble will stop running hooks until the admin acknowledges it
[18:26] <niemeyer> kim0: If you run ensemble resolved, it forgets about the old hook, and continues execution
[18:26] <niemeyer> kim0: If you want to run the old hook again before continuing, you run resolved --retry
[18:26] <niemeyer> kim0: and that's it
[18:26] <niemeyer> kim0: Whether you upgrade the formula, change the hook in place to try things out, run debug-hook, run ensemble log, etc, is really up to you
[18:27] <niemeyer> kim0: We're coding tools to give you everything you need to understand how things are behaving
[18:27] <niemeyer> kim0: and fixing them
[18:27] <kim0> I guess the workflow I had in mind is .. install hook blows up .. I ssh into machine .. figure out why it blew up .. then I fix the hook *locally* .. then somehow ensemble would upload the new version and run that
[18:27] <kim0> but the workflow you explained is perfectly fine
[18:27] <kim0> thanks
[18:28] <niemeyer> kim0: Well, maybe there are further options we can develop around this
[18:29] <kim0> Yeah, recovering from broken formulas should be smooth .. since people are going to make all sorts of mistakes :D
[18:29] <kim0> niemeyer: thanks for all the explanation and patience :)
[18:29] <niemeyer> kim0: No worries
[18:29] <niemeyer> kim0: Good to talk about that stuff.. it's important to learn how other people feel about the system too
[18:30]  * kim0 nods
[18:45]  * SpamapS just discovered resolved yesterday btw
[18:45] <SpamapS> would have saved me quite a few remove-relation/add-relation cycles ;)
[18:47] <niemeyer> SpamapS: Sorry about that :-)
[18:49] <SpamapS> Yes, shame on you guys for making the thing work well enough to survive hundreds and hundreds of remove/adds.
[18:50] <_mup_> ensemble/close-zk-port r240 committed by gustavo@niemeyer.net
[18:50] <_mup_> Do not open zk port on AWS firewall.
[18:51] <niemeyer> SpamapS: Yeah, good thing we are in a polishing cycle.. :)
[18:58] <_mup_> Bug #791973 was filed: Ensemble shouldn't open the EC2 firewall for zk access <Ensemble:Confirmed> < https://launchpad.net/bugs/791973 >
[19:11] <niemeyer> hazmat:   sudo -u ubuntu screen -dmS $SESSION_NAME
[19:11] <niemeyer> hazmat: -u ubuntu?  Shouldn't this be root?
[19:12] <niemeyer> Also, this command seems to create a new session, irrespective of whether there's an existing one with the same name
[19:14] <hazmat> it's connecting to the ubuntu user's screen session from the login shell
[19:15] <niemeyer> hazmat: Yes, but isn't the ubuntu user connecting to root's screen using sudo?
[19:15] <hazmat> niemeyer, no it's not, that requires a setuid screen binary
[19:15] <niemeyer> hazmat: Using sudo?
[19:16] <hazmat> niemeyer, that would be fine
[19:16] <hazmat> i thought you were referring to multi-user screen
[19:16] <niemeyer> hazmat: ssh -t ubuntu@%s sudo byobu -xRS %s-hook-debug -t shell
[19:16] <hazmat> sounds good
[19:16] <niemeyer> hazmat: The debug-hook command connects a session from root
[19:17] <niemeyer> hazmat: Which means doing -u ubuntu would put the session in another place 
[19:17] <niemeyer> Ok, I'll fix that as well
[19:24] <niemeyer> So there's apparently no way to start a screen session unless it doesn't yet exist.. :(
[19:33] <niemeyer> Ok, I'll give tmux a try..
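One way to get "create the session only if it doesn't already exist" with tmux, as a hedged sketch: the tmux server refuses to create a second session with an existing name, so the creation attempt can simply be made, its "duplicate session" failure ignored, and the session's existence confirmed with `has-session`. The session name, the private socket passed via `-L`, and the `ensure_session` helper are hypothetical; this is not necessarily what the debug-hook-fixes branch does.

```shell
#!/bin/sh
# Ensure a detached tmux session named $1 exists on the private socket $2.
# If the session already exists, new-session fails with a "duplicate
# session" error (silenced here) and has-session confirms it is present.
ensure_session() {
    tmux -L "$2" new-session -d -s "$1" 2>/dev/null
    tmux -L "$2" has-session -t "$1"
}
```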
[19:38] <niemeyer> smoser!
[19:38] <niemeyer> smoser: Was just reminded of you while hacking a shell script
[19:39] <niemeyer> smoser: exec &> foo.. that's an awesome trick I learned with you recently :)
[19:39] <smoser> that is bash only
[19:39] <smoser> exec > foo 2>&1
[19:40] <smoser> would be the posix shell equivalent
[19:40] <niemeyer> smoser: Nice
[19:40] <niemeyer> smoser: Will use that
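The two forms from this exchange, side by side. `exec &> file` is a bash extension; `exec > file 2>&1` is the POSIX spelling, and the order matters: stdout is pointed at the file first, then stderr is duplicated from it. A sketch that confines the redirection to a subshell; `run_logged` is a hypothetical name.

```shell
#!/bin/sh
# bash only:  exec &> "$logfile"
# POSIX:      exec > "$logfile" 2>&1
# (Writing "2>&1 > file" instead would leave stderr on the old stdout.)

# The parentheses make the function body a subshell, so the exec
# redirection affects only that subshell, not the caller.
run_logged() (
    logfile=$1
    shift
    exec > "$logfile" 2>&1
    "$@"
)
```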
[19:53] <niemeyer> hazmat: If a sigspec is EXIT (0) the command arg is executed on exit from the shell.
[19:53] <niemeyer> hazmat: Looks handy
[19:53] <niemeyer> hazmat: Sorry, that was a quote
[19:53] <niemeyer> Wonder if that will *always* execute
[19:54]  * niemeyer tests
[19:55] <niemeyer> Yeah, works
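A small sketch of the `trap ... EXIT` behaviour tested above. A trap set inside a subshell runs when that subshell exits, whether the command succeeded, failed, or called `exit` itself. No trap can run if the shell is killed with SIGKILL. The helper name and log file are hypothetical.

```shell
#!/bin/sh
# Run "$@" in a subshell; the EXIT trap appends "cleaned" to the file
# named by $1 however the subshell finishes (except on SIGKILL).
run_with_cleanup() (
    log=$1
    shift
    trap 'echo cleaned >> "$log"' EXIT
    "$@"
)
```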
[20:29] <_mup_> ensemble/expose-hook-commands r242 committed by jim.baker@canonical.com
[20:29] <_mup_> Implemented open-port, close-port hook commands
[20:29] <jimbaker>  biab
[20:42] <_mup_> ensemble/debug-hook-fixes r240 committed by gustavo@niemeyer.net
[20:42] <_mup_> Fixed several issues in the debug-hook shell payload, and replaced
[20:42] <_mup_> screen with tmux to handle concurrent session creation without races.
[20:42] <_mup_> Let's see if this works in real test cases now.
[20:45] <niemeyer> What's the proper format to put ensemble-branch in the environment's file again?  I recall we had a weird issue with one of the url forms, and I killed my commented option by mistake.
[20:46] <niemeyer> jimbaker, hazmat, bcsaller: ?
[20:46] <bcsaller> niemeyer: ensemble-branch: lp:~bcsaller/ensemble/config-set-lifecycle
[20:47] <niemeyer> bcsaller: Does that work?  Nice, I think that was one of the formats which was not working
[20:47] <bcsaller> its been working for me
[20:47] <niemeyer> bcsaller: Super, thanks
[20:48] <niemeyer> bcsaller: It was a crazy issue we had in the sprint..  Launchpad was just barfing on https or http or lp, can't recall which one..
[20:50] <SpamapS> niemeyer: I'm thinking of uploading principia-tools to the ensemble ppa... thoughts before I do that?
[20:50] <niemeyer> SpamapS: No, it sounds good
[20:51] <niemeyer> 2011-06-02 16:50:32,617 ERROR ProviderError: Interaction with machine provider failed: ConnectionTimeoutException('could not connect before timeout after 1 retries',)
[20:51] <niemeyer> Feels like a regression on the waiting behavior
[21:23] <niemeyer> kim0: You were right.. debug-hook needed some good debugging by itself
[21:40] <_mup_> ensemble/debug-hook-fixes r241 committed by gustavo@niemeyer.net
[21:40] <_mup_> - Add additional hook names to valid list on debug-hook.
[21:40] <_mup_> - Fix debug-hook shell template.
[22:06] <_mup_> Bug #792071 was filed: relation-get blowing up badly during install hook <Ensemble:New> < https://launchpad.net/bugs/792071 >
[22:16] <niemeyer> Observing debug-hooks actually working is beautiful!
[22:17] <niemeyer> Do changes on one side, exit.. boom! The other side pops up!
[22:23]  * niemeyer ponders about how to execute a script by piping it on stdin
[22:24] <niemeyer> Hah, /bin/bash -
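For the stdin question: with no script argument, bash reads its commands from standard input, and `-` only marks the end of options. To pass positional arguments to a piped script as well, `bash -s` is the documented form. A sketch, where `run_piped_script` is a hypothetical helper:

```shell
#!/bin/sh
# Run a script read from stdin with bash. "-s" makes bash read commands
# from stdin; "--" ends option parsing so the remaining words become
# $1, $2, ... inside the piped script.
run_piped_script() {
    bash -s -- "$@"
}

# Equivalent when no arguments are needed:  ... | /bin/bash -
printf 'echo "hello $1"\n' | run_piped_script world   # prints "hello world"
```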
[22:58] <kim0> niemeyer: great news!
[22:59] <kim0> so it's working as it should now .. woohoo
[22:59] <kim0> niemeyer: Would you think it'd be better for debug-hooks to open the hook code in vim in the new screen window, instead of dropping me in a blank shell ?
[23:03] <niemeyer> kim0: Hmm
[23:03] <niemeyer> kim0: No, probably not.. this would likely give the wrong idea about what you can do within the debug hooks session
[23:04] <niemeyer> kim0: It's useful to look at the script, but you're free to do pretty much anything
[23:04] <kim0> hmm
[23:04] <kim0> to debug the hook .. I had to figure out where it was
[23:04] <kim0> and for that I used "find /"
[23:04] <kim0> which sux ofc
[23:05] <niemeyer> kim0: Yeah, I noticed this as well
[23:05] <bcsaller> it should cd you into the formula directory I think
[23:05] <niemeyer> kim0: That's a failure in the hook execution logic
[23:05] <bcsaller> I think we have a bug for that 
[23:05] <niemeyer> kim0: Hooks should always be executed within the formula directory
[23:05] <niemeyer> kim0: When debugging or not
[23:05] <niemeyer> bcsaller: Hey!
[23:06] <bcsaller> :)
[23:06] <kim0> bcsaller: the cd into hooks dir, sounds like a good compromise .. is it planned
[23:06] <bcsaller> kapil and I agreed it was a good idea when we talked about it 
[23:07] <niemeyer> Agreed, that's important even for plain hooks
[23:07] <niemeyer> I mean, when executing the real ones rather than debugging
[23:11] <_mup_> ensemble/expose-hook-commands r243 committed by jim.baker@canonical.com
[23:11] <_mup_> Testing on args and logging for port commands
[23:21] <niemeyer> Woohay
[23:31] <kim0> expose merged ?
[23:42] <niemeyer> kim0: Not yet
[23:42] <niemeyer> jimbaker: Is hard at work on it
[23:46] <_mup_> ensemble/debug-hook-fixes r242 committed by gustavo@niemeyer.net
[23:46] <_mup_> - Use the real unit name as the session, since tmux is happy with that.
[23:46] <_mup_> - Send an initialization script with a simple tmux.conf when firing
[23:46] <_mup_>   ssh through the debug-hooks command.  Use screen shortcuts since
[23:46] <_mup_>   people will be happier with that.
[23:46] <_mup_> [WIP]
[23:47] <niemeyer> Okay, time to step out and do something else..
[23:47] <niemeyer> See y'all tomorrow!
[23:54] <jimbaker> kim0, the expose work is getting close - hook commands are almost ready for review, i have most of the remaining provisioning work already done from a spike branch, and the ec2 group authorization model maps readily against what is necessary for a provider
[23:55] <kim0> jimbaker: sounds like great news .. rock on :)
[23:56] <SpamapS> Hmm.. I've been thinking about proposing a specification for 'machine-info-get' .. Teyo from Puppet suggested that they'd be interested in collaborating on a library to collect info about the current machine.. and it would be really useful to have this...
[23:57] <SpamapS> So I'm thinking the machine agent should have some of this information available.. some from the machine provider, some from this library
[23:58] <SpamapS> That would solve the 'What's my private IP? What's my public IP?' case.
[23:58] <SpamapS> How would I propose such a spec?