=== _mup__ is now known as _mup_ | ||
niemeyer | Night all | 02:40 |
---|---|---|
_mup_ | txzookeeper/session-event-handling r40 committed by kapil.foss@gmail.com | 11:53 |
_mup_ | allow connection using existing session, test session expiration, additional symbol name translation for exceptions. | 11:53 |
_mup_ | txzookeeper/session-event-handling r41 committed by kapil.foss@gmail.com | 13:06 |
_mup_ | pep8isms | 13:06 |
kim0 | do I really need to type "yes" to the ssh authenticity question | 13:17 |
niemeyer | kim0: How do you mean? | 13:21 |
kim0 | ensemble status | 13:22 |
kim0 | I get the ssh yes/no prompt | 13:22 |
kim0 | same for debug-hooks | 13:22 |
kim0 | niemeyer: that is normal right ? | 13:23 |
niemeyer | kim0: You mean the prompt asking you if the fingerprint for the server is valid? | 13:24 |
kim0 | yes | 13:24 |
niemeyer | kim0: Yeah, that's usual when connecting to a new server | 13:24 |
kim0 | Would be great if Ensemble would get the machine log .. and verify it for me | 13:24 |
* kim0 grabs his wish list bag | 13:24 | |
kim0 | Also if ensemble had a persistent connection to bootstrap node :) and perhaps run locally under a screen session with "watch status" and debug-log ..etc all running | 13:25 |
kim0 | niemeyer: I think I am seeing strange behaviour which I hope someone can help me with, since I really want to write this "write a formula" doc. I just launched a mysql SU, and a drupal SU (based on an almost empty new formula) | 13:27 |
kim0 | fired debug-hooks drupal/0 .. works | 13:28 |
kim0 | add-relation drupal mysql | 13:28 |
kim0 | I am not getting any new windows in the debug-hooks screen | 13:28 |
niemeyer | Hmm | 13:29 |
niemeyer | kim0: Thinking | 13:30 |
kim0 | sure | 13:30 |
kim0 | I think I saw this yesterday | 13:30 |
kim0 | when I closed the debug-hooks screen .. | 13:30 |
kim0 | hooks suddenly started firing | 13:30 |
kim0 | it's like it was stuck | 13:30 |
niemeyer | kim0: That's normal | 13:30 |
kim0 | but I always blame myself :) | 13:30 |
niemeyer | kim0: Hooks are serially executed | 13:30 |
kim0 | well they were not opening new windows in screen session | 13:31 |
kim0 | they should right ? | 13:31 |
niemeyer | kim0: You won't get another hook window until you stop the existing one | 13:31 |
kim0 | there was no existing one .. I was waiting for it | 13:31 |
kim0 | just like now .. there's only window 0 in screen | 13:31 |
niemeyer | kim0: and what's 0? | 13:32 |
hazmat | kim0, it was disabled (the ssh fingerprint confirm prompt) | 13:32 |
kim0 | niemeyer: just a shell | 13:32 |
hazmat | but it leaves things open to man in the middle | 13:32 |
hazmat | we should pull it down though automatically | 13:32 |
niemeyer | hazmat: Do you have any ideas of what might be going on for kim0? | 13:32 |
kim0 | hazmat: yeah .. check my wish list :) we could ec2-get-console-output and verify it :) | 13:32 |
niemeyer | kim0: We can do better than that | 13:33 |
niemeyer | kim0: We should inject the host key | 13:33 |
niemeyer | kim0: That's in our wishlist already :) | 13:33 |
hazmat | niemeyer, man in the middle was the primary reason fingerprint checking was re-enabled, yes? | 13:33 |
* hazmat reads through log | 13:33 | |
niemeyer | hazmat: That's right | 13:33 |
kim0 | niemeyer: cool ! | 13:33 |
kim0 | niemeyer: cloud-init can inject host key already indeed .. that's even better | 13:34 |
hazmat | indeed, we should probably make use of that, but we need to store in zk for multi-client access | 13:34 |
hazmat | kim0, okay.. so you've got a debug hook session on drupal or mysql? | 13:35 |
hazmat | when doing the add relation | 13:35 |
kim0 | drupal | 13:35 |
kim0 | hazmat: debug-hooks drupal/0 | 13:35 |
kim0 | hazmat: add-relation drupal mysql | 13:35 |
kim0 | that's it .. no new window in screen | 13:35 |
hazmat | kim0, okay.. so you do debugs for install & start? | 13:35 |
hazmat | or are you debugging after start? | 13:36 |
kim0 | the sequence was | 13:36 |
kim0 | deploy mysql | 13:36 |
kim0 | deploy drupal | 13:36 |
kim0 | debug-hooks drupal/0 | 13:36 |
kim0 | debug-log | 13:36 |
kim0 | add-relation mysql drupal | 13:36 |
hazmat | kim0, could you paste your debug-log | 13:36 |
kim0 | I got a "install" or "start" hook here can't remember .. which I closed | 13:36 |
kim0 | I expected to get the db-relation-changed one after it .. but didn't | 13:37 |
kim0 | sure | 13:37 |
hazmat | kim0, by closing are you exiting the shell or just closing the window? | 13:37 |
hazmat | hmm | 13:37 |
hazmat | i don't think i've tested closing the window instead of exiting the shell | 13:37 |
kim0 | hazmat: ctrl + d | 13:37 |
kim0 | exit shell | 13:37 |
kim0 | hazmat: log http://paste.ubuntu.com/616692/ | 13:37 |
hazmat | hmm that should be fine | 13:37 |
niemeyer | hazmat: Should be equivalent | 13:38 |
kim0 | hazmat: status → http://paste.ubuntu.com/616694/ | 13:38 |
hazmat | niemeyer, yeah.. but on the close window case, there is still a callback to screen to close the window after the process exit | 13:38 |
hazmat | but ctrl +d vs. exit is equiv | 13:38 |
niemeyer | hazmat: I don't understand that distinction | 13:39 |
hazmat | kim0, odd it seems like the unit hasn't picked up the relation | 13:39 |
kim0 | :s | 13:39 |
kim0 | can u connect to the env ? | 13:39 |
niemeyer | hazmat: The callback will execute after the shell process exits, right? Either option would kill it | 13:39 |
niemeyer | hazmat: IOW, closing the window also terminates the shell | 13:40 |
niemeyer | hazmat: This would happen if the hook wasn't executed | 13:40 |
hazmat | niemeyer, right, but we have another process checking on the shell and then instructing screen to kill the window, which is probably just a noop at that point | 13:40 |
hazmat | niemeyer, its unrelated to what kim0 is seeing | 13:40 |
niemeyer | hazmat: I mean, the relation not showing up | 13:41 |
niemeyer | kim0: Can you please paste ps auxw for that machine? | 13:41 |
niemeyer | kim0: The drupal one | 13:41 |
kim0 | niemeyer: from the debug-hooks screen is ok right ? | 13:41 |
niemeyer | hazmat: Hmm.. unless we're running the shell script with -e, and screen exits with 1 because the window wasn't there? | 13:41 |
* niemeyer doing guess work | 13:42 | |
niemeyer | kim0: Yeah | 13:42 |
kim0 | http://paste.ubuntu.com/616696/ | 13:42 |
niemeyer | hazmat: "install".. there's an old hook running still | 13:43 |
kim0 | I hope I didn't do something stupid at the end :) | 13:44 |
niemeyer | kim0: I suspect your window 0 has the install hook running | 13:44 |
niemeyer | kim0: Can you please paste "env" from that window | 13:44 |
kim0 | http://paste.ubuntu.com/616698/ | 13:45 |
hazmat | niemeyer, window 0 is never used for hooks .. it's always a shell | 13:45 |
kim0 | window 0 is always there | 13:45 |
kim0 | yeah | 13:45 |
niemeyer | hazmat: It's trivial to shift windows around | 13:45 |
hazmat | niemeyer, but the names are distinct on the windows | 13:46 |
kim0 | http://paste.ubuntu.com/616699/ is the install hook itself | 13:46 |
niemeyer | Ok, but that's not the case either way | 13:46 |
niemeyer | Still, we have a hook running | 13:46 |
niemeyer | hazmat: Ok | 13:46 |
hazmat | the debug stuff names the windows by hook , except window 0 which is named 'shell' afaicr | 13:46 |
* kim0 nods | 13:47 | |
niemeyer | kim0: What's in /tmp/tmpLjxVDG-install | 13:47 |
kim0 | niemeyer: http://paste.ubuntu.com/616700/ | 13:47 |
kim0 | scary script | 13:47 |
hazmat | so it seems somehow the debug window was ended but the underlying debug process is still alive. | 13:47 |
niemeyer | hazmat: Yeah, it's still in the sleep loop | 13:48 |
niemeyer | hazmat: Which confirms your initial theory | 13:48 |
kim0 | I probably closed the window too fast, if you think it needs time to do anything | 13:49 |
hazmat | it might be a different signal gets sent besides HUP that needs to be caught here | 13:49 |
hazmat | kim0, it shouldn't matter | 13:49 |
hazmat | we should never rely on user timing | 13:50 |
kim0 | yeah I know | 13:50 |
niemeyer | hazmat: TERM, KILL | 13:50 |
niemeyer | hazmat: Wait.. the HUP is catching the outside signal | 13:51 |
niemeyer | hazmat: That's not the problem.. that script is still running | 13:51 |
hazmat | yeah.. its not in the screen process | 13:51 |
niemeyer | kim0: One more: /proc/1585/environ | 13:52 |
kim0 | http://paste.ubuntu.com/616708/ | 13:53 |
kim0 | niemeyer: not sure why it has no newlines | 13:53 |
kim0 | doh | 13:53 |
niemeyer | hazmat: We should monitor it from outside instead of expecting it to do stuff before it dies | 13:53 |
kim0 | sorry .. pastebinit error | 13:53 |
niemeyer | kim0: That's the file format indeed | 13:53 |
kim0 | niemeyer: http://paste.ubuntu.com/616709/ | 13:54 |
kim0 | this is complete | 13:54 |
niemeyer | hazmat: e.g. writing to hook.pid when the process starts | 13:54 |
* kim0 probably just uncovered a pastebinit bug | 13:55 | |
hazmat | niemeyer, yeah.. and then just doing something like kill -0 `cat hook.pid` for the sleep condition | 13:55 |
niemeyer | hazmat: Right | 13:55 |
* hazmat files a bug | 13:56 | |
niemeyer | hazmat: Another handy issue for a brain breaker.. will paste that conversation in a bug. | 13:56 |
niemeyer | hazmat: Oh, ok :) | 13:56 |
niemeyer | hazmat: Please paste the log for context | 13:56 |
niemeyer | hazmat: Thanks | 13:56 |
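
The hook.pid idea niemeyer and hazmat sketch above (record the pid when the hook shell starts, then poll it with `kill -0` from outside) might look roughly like the following. This is a minimal sketch, not Ensemble's actual debug-hook payload: the temporary directory, the `sleep 1` stand-in for the hook shell, and the poll cap are all assumptions made for illustration.

```shell
#!/bin/bash
# Sketch only: hook.pid and the polling loop are assumptions drawn from the
# discussion, not Ensemble's actual debug-hook payload.
workdir=$(mktemp -d)
pidfile="$workdir/hook.pid"

# Stand-in for the hook shell: a short-lived background process whose pid
# is recorded as soon as it starts.
sleep 1 &
echo "$!" > "$pidfile"

# kill -0 delivers no signal; it only reports whether the pid still exists.
# The poll cap keeps the sketch from looping forever if something goes wrong.
polls=0
while kill -0 "$(cat "$pidfile")" 2>/dev/null && [ "$polls" -lt 50 ]; do
    polls=$((polls + 1))
    sleep 0.2
done

echo "stopped waiting after $polls polls"
rm -rf "$workdir"
```

The point of the design is that the watcher no longer depends on the watched shell doing anything before it dies, which is the failure mode seen with pid 1585.
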
niemeyer | kim0: Alright.. we know what's wrong | 13:56 |
kim0 | great :) | 13:56 |
niemeyer | kim0: For fixing your problem right now, | 13:56 |
niemeyer | kim0: kill 1585 | 13:57 |
kim0 | got it | 13:57 |
kim0 | thanks | 13:57 |
niemeyer | kim0: np | 13:57 |
kim0 | wonder why no one else is hitting this | 13:57 |
niemeyer | kim0: Thanks a lot for your help uncovering the bug | 13:57 |
niemeyer | kim0: It's the way the debug-hook window was closed | 13:57 |
kim0 | ah so you close it like ctrl-a c | 13:58 |
kim0 | ok | 13:58 |
kim0 | not c .. whatever closes windows :) | 13:58 |
kim0 | ew | 14:00 |
kim0 | ok probably hitting a new one | 14:00 |
kim0 | I killed the process .. got the window for db-relation-changed | 14:01 |
kim0 | relation-get inside it says → No ENSEMBLE_AGENT_SOCKET/-s option found | 14:01 |
kim0 | env dump http://paste.ubuntu.com/616713/ | 14:01 |
kim0 | hazmat: could you please as well | 14:02 |
* hazmat looks | 14:03 | |
hazmat | kim0, that's the env from window 0 ? | 14:03 |
hazmat | or the debug window? | 14:03 |
kim0 | hazmat: no win 1 | 14:03 |
kim0 | the db-relation-changed window | 14:03 |
kim0 | db-relation-joined actually | 14:03 |
hazmat | it doesn't look like it has the debug environment variables sourced | 14:04 |
niemeyer | That's the shell isn't it? | 14:04 |
niemeyer | SUDO_COMMAND=/usr/bin/byobu -xRS drupal-0-hook-debug -t shell | 14:05 |
niemeyer | kim0: I think the paste is bogus | 14:06 |
niemeyer | kim0: Or maybe I just misunderstand what the variables mean | 14:07 |
kim0 | I can repaste manually | 14:07 |
kim0 | anything to look for ? | 14:07 |
niemeyer | kim0: Just thinking how to get the pid for the parent shell | 14:08 |
kim0 | ps -elf | grep $$ ? | 14:08 |
kim0 | it'd be listed in ppid field | 14:08 |
niemeyer | kim0: echo $BASHPID | 14:08 |
kim0 | 9229 | 14:09 |
niemeyer | kim0: echo $PPID | 14:09 |
kim0 | 935 | 14:09 |
* kim0 feels like a shell | 14:09 | |
niemeyer | Ok, cool | 14:09 |
niemeyer | kim0: Hehe :-) | 14:09 |
kim0 | :) | 14:09 |
niemeyer | kim0: Yeah, looks like a failure in sourcing it indeed | 14:10 |
kim0 | we don't log those steps somewhere ? | 14:10 |
niemeyer | Can't imagine how that could happen, though, even if we killed the process | 14:10 |
niemeyer | kim0: Nope, this is the bootstrapping of debugging itself.. we might indeed have to log it in the future | 14:11 |
niemeyer | kim0: Can you please paste the new process list so we can reach the new hook | 14:11 |
kim0 | http://paste.ubuntu.com/616722/ | 14:12 |
kim0 | guess no one uses debug-hooks really :) | 14:13 |
* kim0 afk for 5 mins | 14:13 | |
niemeyer | kim0: We do, but we don't generally kill processes in the middle | 14:15 |
hazmat | kim0, just me ;-) | 14:15 |
hazmat | need to run a quick errand, back in a bit | 14:15 |
niemeyer | kim0: Let me know when you're back.. we can follow on a bit if you're interested | 14:23 |
kim0 | niemeyer: back | 14:24 |
niemeyer | kim0: Ok, let's see /tmp/tmpR1UhMY-db-relation-joined then | 14:25 |
kim0 | niemeyer: http://paste.ubuntu.com/616737/ | 14:25 |
niemeyer | kim0: Ok, please figure ENSEMBLE_DEBUG from /proc/9187/environ, and list $ENSEMBLE_DEBUG/env.sh | 14:26 |
kim0 | niemeyer: can't see ENSEMBLE_DEBUG .. paste http://paste.ubuntu.com/616742/ | 14:28 |
niemeyer | kim0: Hmm.. I guess it wasn't exported | 14:30 |
* niemeyer thinks | 14:31 | |
niemeyer | kim0: Ok, let's try to find by force: cd /tmp && find -name env.sh | 14:34 |
niemeyer | kim0: Will probably see more than one | 14:34 |
kim0 | niemeyer: 3 of em | 14:35 |
niemeyer | kim0: Ok, let's find the one with db-relation-joined | 14:35 |
kim0 | http://paste.ubuntu.com/616746/ | 14:36 |
kim0 | http://paste.ubuntu.com/616747/ | 14:36 |
kim0 | http://paste.ubuntu.com/616748/ | 14:36 |
kim0 | niemeyer: I tried grep'ing .. doesn't have joined in them | 14:37 |
niemeyer | kim0: Well, that's likely the issue then.. let me check | 14:37 |
niemeyer | kim0: That's weird.. | 14:39 |
niemeyer | kim0: All of them have ENSEMBLE_AGENT_SOCKET | 14:39 |
kim0 | maybe none of them is sourced | 14:39 |
niemeyer | kim0: This is the right one for db-relation-joined: http://paste.ubuntu.com/616747/ | 14:40 |
niemeyer | kim0: Can you please paste the hook.sh file living in the same directory? | 14:41 |
niemeyer | kim0: Well, that's the thing | 14:41 |
niemeyer | kim0: There's no easy way for bash to be executed without this being sourced | 14:41 |
kim0 | niemeyer: http://paste.ubuntu.com/616754/ | 14:41 |
niemeyer | kim0: As you can see.. | 14:41 |
niemeyer | kim0: That prior paste is from /tmp/tmp.Sj2ilkd53B/env.sh, right? | 14:42 |
kim0 | double checking | 14:42 |
kim0 | should be yes | 14:43 |
niemeyer | kim0: Ok, so there's really no way for bash to be executed without it being sourced, which is awkward.. | 14:43 |
niemeyer | kim0: You got a bash, without the env variables, but the only way for that bash to have come up, was through the sourcing line | 14:44 |
niemeyer | Hmmm | 14:44 |
kim0 | thinking as well | 14:44 |
kim0 | niemeyer: parent process for the Window1 shell, is byobu, not the hook.sh ? | 14:46 |
niemeyer | kim0: Yeah, that's strange | 14:47 |
kim0 | hook.sh should still be running if it fired us right | 14:47 |
niemeyer | kim0: Indeed | 14:47 |
niemeyer | kim0: This would also justify the previous issue as well, interestingly | 14:47 |
niemeyer | kim0: Hmm | 14:48 |
niemeyer | kim0: Let me do a local test, hold on | 14:48 |
kim0 | niemeyer: pstree -p .. if that's helpful http://paste.ubuntu.com/616757/ | 14:49 |
hazmat | nice.. half way to my walking desk finished | 14:51 |
kim0 | hazmat: walking desk ? | 14:51 |
hazmat | kim0, treadmill with keyboard tray and monitor stand | 14:52 |
kim0 | oh that's new to me ... sounds cool indeed :) | 14:52 |
hazmat | kim0, http://opinionator.blogs.nytimes.com/2010/02/23/stand-up-while-you-read-this/ ... http://www.nytimes.com/2008/09/18/health/nutrition/18fitness.html | 14:52 |
hazmat | kim0, lots of good evidence for the benefits vs sitting in a chair all day | 14:53 |
kim0 | yeah that's intuitive | 14:53 |
hazmat | the only problem is that the treadmill weighs 250 pounds.. just carried it up the stairs.. so most of the way done on the setup | 14:53 |
* hazmat catches up the irc log to get up to speed on debug-hooks | 14:54 | |
kim0 | hazmat: congrats :) send pics to warthogs :) | 14:54 |
kim0 | niemeyer: I'm going for a late lunch .. I have inserted your ssh key into ec2-67-202-22-46.compute-1.amazonaws.com (drupal/0) should you want to login to it | 14:56 |
niemeyer | kim0: Cheers | 14:56 |
niemeyer | kim0: Will check it out | 14:56 |
kim0 | cool | 14:56 |
niemeyer | hazmat: I suspect both issues likely boil down to the way the shell is being executed | 15:16 |
niemeyer | hazmat: It's doing a two-step execution, and it's not entirely clear why | 15:16 |
hazmat | niemeyer, what's strange is that it works sometimes | 15:16 |
hazmat | writing up a reply to tom for his questions on list | 15:16 |
niemeyer | hazmat: It first creates a window, which spawns an outside shell by screen itself | 15:16 |
niemeyer | hazmat: then overwrites a shell onto it | 15:17 |
niemeyer | hazmat: I suspect we may be hitting some race within screen itself | 15:17 |
niemeyer | hazmat: Is there a reason why you coded it like that, or is it safe to change? | 15:17 |
hazmat | niemeyer, its safe to change, i thought that creation was per your suggestion | 15:18 |
niemeyer | hazmat: It's unrelated to my suggestion | 15:19 |
hazmat | the openstack nova screen setup does a similar setup | 15:19 |
niemeyer | hazmat: It's executing two shells for no reason | 15:19 |
niemeyer | hazmat: Rather than only hook.sh | 15:19 |
hazmat | right | 15:19 |
hazmat | so instead of creating the window it should just exec in the named window? | 15:19 |
niemeyer | hazmat: The "screen" command of screen takes an executable as an argument | 15:20 |
niemeyer | hazmat: -X screen -t .. hook.sh | 15:20 |
niemeyer | hazmat: The shell I'm seeing in kim0's is the shell from the first screen command, not the one from the exec | 15:20 |
hazmat | niemeyer, that sounds good to me | 15:21 |
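
The one-step form niemeyer describes (letting screen's own `screen` command create the window and run the program in it) can be spelled out as below. This is a dry-run sketch that only builds and prints the command line. The session name comes from the paste earlier in the log, while the window title and the hook.sh path are hypothetical placeholders, not values from the Ensemble source.

```shell
#!/bin/sh
# Dry run: compose the command instead of executing it, so no screen
# session is needed to try this out.
session="drupal-0-hook-debug"        # session name as seen earlier in the log
hook_name="db-relation-joined"       # hypothetical window title
hook_script="/tmp/example/hook.sh"   # hypothetical path to the hook payload

# -S selects the running session, -X sends it a command, and the "screen"
# command opens a new window titled by -t that runs the given program.
one_step="screen -S $session -X screen -t $hook_name $hook_script"

echo "$one_step"
```

With this form only hook.sh runs in the new window, so there is no first shell for a later exec to race against.
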
kim0 | hope that bug got caught | 16:01 |
niemeyer | kim0: Sounds like so.. | 16:03 |
niemeyer | kim0: Will try something this afternoon | 16:03 |
kim0 | awesome | 16:03 |
niemeyer | kim0: Thanks for all your help | 16:03 |
kim0 | All thanks to you :) | 16:03 |
* hazmat lunches | 16:15 | |
* niemeyer too | 16:20 | |
kim0 | hmm | 17:42 |
kim0 | state: install_error | 17:42 |
kim0 | if service is having install_error .. any facility to figure out what went wrong | 17:43 |
kim0 | ok I could figure it out | 17:46 |
niemeyer | kim0: I was kind of expecting that.. | 17:51 |
niemeyer | kim0: Is that the service we were debugging? | 17:51 |
kim0 | niemeyer: I shutdown the env and started a fresh | 17:51 |
niemeyer | kim0: Oh, ok | 17:51 |
niemeyer | kim0: ensemble log, ensemble debug-hook, etc | 17:51 |
kim0 | niemeyer: does the log not provide hooks stdout any more ? | 17:52 |
kim0 | is it suppressed by default | 17:52 |
niemeyer | kim0: It does, but you have to turn it on earlier | 17:52 |
niemeyer | We should really have a feature where it logs by default | 17:52 |
niemeyer | and rotates them out after a while | 17:52 |
niemeyer | kim0: Otherwise, the best bet is logging in the machine and checking logs | 17:52 |
niemeyer | kim0: You should be able to retry, though | 17:53 |
niemeyer | kim0: Run debug-hook | 17:53 |
niemeyer | kim0: and then run ensemble resolved with the --retry argument | 17:53 |
kim0 | niemeyer: hmm .. service unit is stuck somehow | 17:59 |
kim0 | here is status http://paste.ubuntu.com/616891/ | 17:59 |
kim0 | I hope you don't mind all the questions | 17:59 |
niemeyer | kim0: Not at all.. I'm actually going to fix some of the issues you found today | 17:59 |
kim0 | yeah .. the basic workflow should be smoother .. | 18:00 |
kim0 | so, I had an error in a hook, now I have no idea how to nudge things and get them back | 18:01 |
niemeyer | kim0: resolved, as I mentioned above | 18:01 |
kim0 | bin/ensemble resolved --retry drupal/1 | 18:02 |
kim0 | tried this | 18:02 |
niemeyer | kim0: Ok, what happened next? | 18:02 |
kim0 | debug-log only shows mysql related messages | 18:02 |
kim0 | and status is the same | 18:02 |
niemeyer | kim0: Have you actually fixed the original reason why your hook failed? | 18:03 |
kim0 | niemeyer: yes I did | 18:03 |
niemeyer | kim0: Why was it failing before? | 18:04 |
kim0 | niemeyer: ssh'ed into the machine .. and ran it | 18:04 |
kim0 | niemeyer: some cd to a non-existent directory | 18:04 |
kim0 | niemeyer: I ran the script inside the instance .. it is fine now | 18:04 |
niemeyer | kim0: Is it returning successfully now? (exit status 0) | 18:04 |
kim0 | checking again | 18:04 |
kim0 | can't really check again accurately | 18:06 |
kim0 | ensemble-log giving errors because it's running outside the environment | 18:06 |
kim0 | apt-get saying packages already installed ..etc | 18:06 |
kim0 | but yeah it seems correct | 18:06 |
niemeyer | kim0: So how did you run it before? | 18:07 |
kim0 | niemeyer: in a debug-hooks session | 18:07 |
kim0 | /var/lib/ensemble/units/drupal-1/formula/hooks/install | 18:07 |
niemeyer | kim0: If apt-get install is failing, it won't work as a hook either | 18:07 |
kim0 | it's just saying .. the package is already installed | 18:08 |
kim0 | the script is fine trust me :) | 18:08 |
niemeyer | kim0: :) | 18:08 |
niemeyer | kim0: If you have already executed it by hand, you can just say "resolved" | 18:09 |
niemeyer | kim0: Without --retry | 18:09 |
niemeyer | kim0: But I suspect that fuzzing may have triggered something else.. that "state: null" isn't really great | 18:09 |
kim0 | yeah | 18:09 |
niemeyer | kim0: Try the resolved trick | 18:09 |
kim0 | did that | 18:10 |
niemeyer | kim0: This just states to Ensemble "I have resolved the problem" | 18:10 |
kim0 | still null | 18:10 |
niemeyer | kim0: Ok, try redeploying the fixed formula then | 18:11 |
niemeyer | kim0: We'll have to investigate a bit that scenario | 18:11 |
niemeyer | kim0: (--retry with a broken script, etc) | 18:11 |
kim0 | how do I redeploy | 18:11 |
niemeyer | kim0: But first, I'll fix the debug-hook stuff we debugged this morning | 18:12 |
kim0 | ok np .. | 18:12 |
niemeyer | kim0: Same thing you did earlier? | 18:12 |
kim0 | I'll pick this up later | 18:12 |
kim0 | fresh environment .. ok | 18:12 |
kim0 | In the tutorial .. I'm assuming the formula is going to have errors | 18:13 |
niemeyer | kim0: Nope | 18:13 |
niemeyer | kim0: Just remove the unit | 18:13 |
niemeyer | kim0: and add it again | 18:13 |
kim0 | ok | 18:13 |
niemeyer | kim0: Yeah, that's a good thing | 18:13 |
niemeyer | kim0: You can also upgrade the formula in general | 18:13 |
kim0 | says, error state cannot be upgraded | 18:13 |
niemeyer | kim0: This is what we're preparing Ensemble to be able to do | 18:13 |
kim0 | or so | 18:14 |
niemeyer | kim0: Error? Change, upgrade.. | 18:14 |
kim0 | says like, formula is in error state .. so it cannot be upgraded | 18:14 |
kim0 | I lost the exact message though | 18:14 |
niemeyer | kim0: Yeah, you have to resolve it first.. | 18:15 |
kim0 | Ok .. I'll need to try this again (recovering from a formula with errors ) | 18:15 |
kim0 | and will discuss again with you | 18:15 |
niemeyer | kim0: Because otherwise we can't assume a known state | 18:15 |
niemeyer | kim0: Imagine an install hook failed in the middle | 18:15 |
kim0 | that's what it was :D | 18:16 |
kim0 | so I just need to know the recommended recovery steps | 18:16 |
niemeyer | kim0: Right.. simply upgrading won't really yield a working system necessarily | 18:16 |
niemeyer | kim0: Because half of it executed | 18:16 |
kim0 | how to know what went wrong .. release a fix .. start recovering | 18:16 |
niemeyer | kim0: What we want is this: | 18:17 |
niemeyer | kim0: install failed: check logs, fix it or retry, upgrade if wanted | 18:17 |
niemeyer | kim0: With debug-hook if desired, to understand what's going on | 18:18 |
kim0 | then use "resolved" right ? | 18:18 |
niemeyer | kim0: Right, after the "fix it or retry" | 18:18 |
niemeyer | kim0: Or during it actually.. retry is done with resolved | 18:18 |
kim0 | is there a way to upload the fixed new hook ? | 18:19 |
niemeyer | kim0: upgrade-formula | 18:20 |
kim0 | which refuses to work in error state ? | 18:20 |
niemeyer | kim0: Yes, which is the right thing to do | 18:20 |
kim0 | ok .. so I'd still need to upload my fixed hook | 18:21 |
niemeyer | kim0: The error state must be acknowledged by the administrator | 18:21 |
niemeyer | kim0: and if an install hook blows up in the middle, upgrading a new install hook with the error fixed won't necessarily make it work | 18:21 |
niemeyer | kim0: mkdir foo, run twice, breaks | 18:22 |
kim0 | so our recommended approach is kill the instance, and start a fresh machine ? | 18:22 |
niemeyer | <niemeyer> kim0: What we want is this: | 18:22 |
niemeyer | <niemeyer> kim0: install failed: check logs, fix it or retry, upgrade if wanted | 18:22 |
niemeyer | <niemeyer> kim0: With debug-hook if desired, to understand what's going on | 18:22 |
kim0 | the "fixing and trying" cycle is what I'm trying to grasp | 18:22 |
niemeyer | kim0: Fixing it means fixing the actual problem within the formula.. | 18:23 |
kim0 | and what about the trying | 18:23 |
niemeyer | kim0: Sorry, within the service unit | 18:23 |
kim0 | on a new instance ? | 18:23 |
niemeyer | kim0: If there's nothing to do.. you just run "resolved" | 18:23 |
kim0 | ah | 18:23 |
kim0 | so I fix the problem manually .. then run resolved | 18:24 |
niemeyer | kim0: Yes, that's one way to do it | 18:24 |
niemeyer | kim0: The other way to do it is to code an idempotent hook | 18:24 |
niemeyer | kim0: This enables you to run resolved and upgrade a new formula | 18:24 |
kim0 | and use, resolved --retry | 18:24 |
kim0 | right ? | 18:24 |
niemeyer | kim0: If that's what you want to do | 18:25 |
niemeyer | kim0: The problem is really quite simple | 18:25 |
niemeyer | kim0: When a hook fails, Ensemble will stop running hooks until the admin acknowledges it | 18:25 |
niemeyer | kim0: If you run ensemble resolved, it forgets about the old hook, and continues execution | 18:26 |
niemeyer | kim0: If you want to run the old hook again before continuing, you run resolved --retry | 18:26 |
niemeyer | kim0: and that's it | 18:26 |
niemeyer | kim0: Whether you upgrade the formula, change the hook in place to try things out, run debug-hook, run ensemble log, etc, is really up to you | 18:26 |
niemeyer | kim0: We're coding tools to give you everything you need to understand how things are behaving | 18:27 |
niemeyer | kim0: and fixing them | 18:27 |
kim0 | I guess the workflow I had in mind is .. install hook blows up .. I ssh into machine .. figure out why it blew up .. then I fix the hook *locally* .. then somehow ensemble would upload the new version and run that | 18:27 |
kim0 | but the workflow you explained is perfectly fine | 18:27 |
kim0 | thanks | 18:27 |
niemeyer | kim0: Well, maybe there are further options we can develop around this | 18:28 |
kim0 | Yeah, recovering from broken formulas should be smooth .. since people are going to make all sorts of mistakes :D | 18:29 |
kim0 | niemeyer: thanks for all the explanation and patience :) | 18:29 |
niemeyer | kim0: No worries | 18:29 |
niemeyer | kim0: Good to talk about that stuff.. it's important to learn how other people feel about the system too | 18:29 |
* kim0 nods | 18:30 | |
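
niemeyer's "mkdir foo, run twice, breaks" remark and his suggestion to code an idempotent hook can be illustrated with a small sketch. The two functions below are hypothetical stand-ins for an install hook step, not code from any real formula. The first fails on a rerun, the second does not.

```shell
#!/bin/sh
# Sketch: contrast a step that breaks on rerun with an idempotent one.
workdir=$(mktemp -d)

# Fragile step: plain mkdir fails when the directory already exists.
fragile_install() {
    mkdir "$workdir/foo"
}

# Idempotent step: mkdir -p succeeds whether or not the directory exists.
idempotent_install() {
    mkdir -p "$workdir/foo"
}

if fragile_install; then first=0; else first=1; fi
if fragile_install 2>/dev/null; then second=0; else second=1; fi
if idempotent_install; then third=0; else third=1; fi

echo "fragile: first=$first second=$second, idempotent rerun=$third"
rm -rf "$workdir"
```

A hook built only from steps like the second one can be retried with `resolved --retry` after a partial run without tripping over its own leftovers.
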
* SpamapS just discovered resolved yesterday btw | 18:45 | |
SpamapS | would have saved me quite a few remove-relation/add-relation cycles ;) | 18:45 |
niemeyer | SpamapS: Sorry about that :-) | 18:47 |
SpamapS | Yes shame on you guys for making the thing work well enough to survive hundreds and hundreds of remove/adds. | 18:49 |
_mup_ | ensemble/close-zk-port r240 committed by gustavo@niemeyer.net | 18:50 |
_mup_ | Do not open zk port on AWS firewall. | 18:50 |
niemeyer | SpamapS: Yeah, good thing we are in a polishing cycle.. :) | 18:51 |
_mup_ | Bug #791973 was filed: Ensemble shouldn't open the EC2 firewall for zk access <Ensemble:Confirmed> < https://launchpad.net/bugs/791973 > | 18:58 |
=== deryck is now known as deryck[lunch] | ||
niemeyer | hazmat: sudo -u ubuntu screen -dmS $SESSION_NAME | 19:11 |
niemeyer | hazmat: -u ubuntu? Shouldn't this be root? | 19:11 |
niemeyer | Also, this command seems to create a new session, irrespective of whether there's an existing one with the same name | 19:12 |
hazmat | its connecting to the ubuntu user's screen session from the login shell | 19:14 |
niemeyer | hazmat: Yes, but isn't the ubuntu user connecting to root's screen using sudo? | 19:15 |
hazmat | niemeyer, no, it's not; that requires a setuid screen binary | 19:15 |
niemeyer | hazmat: Using sudo? | 19:15 |
hazmat | niemeyer, that would be fine | 19:16 |
hazmat | i thought you were referring to multi-user screen | 19:16 |
niemeyer | hazmat: ssh -t ubuntu@%s sudo byobu -xRS %s-hook-debug -t shell | 19:16 |
hazmat | sounds good | 19:16 |
niemeyer | hazmat: The debug-hook command connects a session from root | 19:16 |
niemeyer | hazmat: Which means doing -u ubuntu would put the session in another place | 19:17 |
niemeyer | Ok, I'll fix that as well | 19:17 |
niemeyer | So there's apparently no way to start a screen session only if it doesn't yet exist.. :( | 19:24 |
niemeyer | Ok, I'll give tmux a try.. | 19:33 |
niemeyer | smoser! | 19:38 |
niemeyer | smoser: Was just reminded of you while hacking a shell script | 19:38 |
niemeyer | smoser: exec &> foo.. that's an awesome trick I learned with you recently :) | 19:39 |
smoser | that is bash only | 19:39 |
smoser | exec > foo 2>&1 | 19:39 |
smoser | would be the posix shell equivalent | 19:40 |
niemeyer | smoser: Nice | 19:40 |
niemeyer | smoser: Will use that | 19:40 |
niemeyer | hazmat: If a sigspec is EXIT (0) the command arg is executed on exit from the shell. | 19:53 |
niemeyer | hazmat: Looks handy | 19:53 |
niemeyer | hazmat: Sorry, that was a quote | 19:53 |
niemeyer | Wonder if that will *always* execute | 19:53 |
* niemeyer tests | 19:54 | |
=== deryck[lunch] is now known as deryck | ||
niemeyer | Yeah, works | 19:55 |
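
The two shell tricks from this exchange, smoser's POSIX redirect `exec > foo 2>&1` and the `trap ... EXIT` behaviour niemeyer quotes, can be tried together in a small sketch. The payload script below is a made-up example, not the debug-hook template. It redirects its output, registers a cleanup message, and then exits with a failure status.

```shell
#!/bin/sh
# Sketch: combine the POSIX output redirect with an EXIT trap.
logfile=$(mktemp)
script=$(mktemp)

# Hypothetical payload: redirect first, then register cleanup, then fail.
cat > "$script" <<'EOF'
exec > "$1" 2>&1            # POSIX form; bash also accepts: exec &> "$1"
trap 'echo "cleanup ran"' EXIT
echo "to stdout"
echo "to stderr" >&2
exit 3
EOF

status=0
sh "$script" "$logfile" || status=$?
log_contents=$(cat "$logfile")
rm -f "$logfile" "$script"

echo "exit status: $status"
echo "$log_contents"
```

The trap runs on an ordinary `exit`, including a non-zero one, and the exit status is preserved. A shell killed with SIGKILL gets no chance to run it, which is one more reason to watch the process from outside as well.
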
_mup_ | ensemble/expose-hook-commands r242 committed by jim.baker@canonical.com | 20:29 |
_mup_ | Implemented open-port, close-port hook commands | 20:29 |
jimbaker | biab | 20:29 |
_mup_ | ensemble/debug-hook-fixes r240 committed by gustavo@niemeyer.net | 20:42 |
_mup_ | Fixed several issues in the debug-hook shell payload, and replaced | 20:42 |
_mup_ | screen with tmux to handle concurrent session creation without races. | 20:42 |
_mup_ | Let's see if this works in real test cases now. | 20:42 |
niemeyer | What's the proper format to put ensemble-branch in the environment's file again? I recall we had a weird issue with one of the url forms, and I killed my commented option by mistake. | 20:45 |
niemeyer | jimbaker, hazmat, bcsaller: ? | 20:46 |
bcsaller | niemeyer: ensemble-branch: lp:~bcsaller/ensemble/config-set-lifecycle | 20:46 |
niemeyer | bcsaller: Does that work? Nice, I think that was one of the formats which was not working | 20:47 |
bcsaller | its been working for me | 20:47 |
niemeyer | bcsaller: Super, thanks | 20:47 |
niemeyer | bcsaller: It was a crazy issue we had in the sprint.. Launchpad was just barfing on https or http or lp, can't recall which one.. | 20:48 |
SpamapS | niemeyer: I'm thinking of uploading principia-tools to the ensemble ppa... thoughts before I do that? | 20:50 |
niemeyer | SpamapS: No, it sounds good | 20:50 |
niemeyer | 2011-06-02 16:50:32,617 ERROR ProviderError: Interaction with machine provider failed: ConnectionTimeoutException('could not connect before timeout after 1 retries',) | 20:51 |
niemeyer | Feels like a regression on the waiting behavior | 20:51 |
niemeyer | kim0: You were right.. debug-hook needed some good debugging by itself | 21:23 |
_mup_ | ensemble/debug-hook-fixes r241 committed by gustavo@niemeyer.net | 21:40 |
_mup_ | - Add additional hook names to valid list on debug-hook. | 21:40 |
_mup_ | - Fix debug-hook shell template. | 21:40 |
=== negronjl_ is now known as negronjl | ||
_mup_ | Bug #792071 was filed: relation-get blowing up badly during install hook <Ensemble:New> < https://launchpad.net/bugs/792071 > | 22:06 |
niemeyer | Observing debug-hooks actually working is beautiful! | 22:16 |
niemeyer | Do changes on one side, exit.. boom! The other side pops up! | 22:17 |
* niemeyer ponders about how to execute a script by piping it on stdin | 22:23 | |
niemeyer | Hah, /bin/bash - | 22:24 |
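
The `/bin/bash -` answer niemeyer lands on can be tried directly. The sketch below pipes a one-line script to bash on stdin. The second form, using `-s` to pass positional parameters, is an addition for illustration and not something mentioned in the log.

```shell
#!/bin/sh
# "-" ends option processing, so bash falls back to reading the script
# from standard input.
out_dash=$(printf 'echo "hello from stdin"\n' | bash -)

# -s also reads the script from stdin and lets positional parameters follow.
out_args=$(printf 'echo "hello from $1"\n' | bash -s -- stdin)

echo "$out_dash"
echo "$out_args"
```

This is handy over ssh, where the script can be sent down the connection without first copying a file to the remote machine.
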
kim0 | niemeyer: great news! | 22:58 |
kim0 | so it's working as it should now .. woohoo | 22:59 |
kim0 | niemeyer: Would you think it'd be better for debug-hooks to open the hook code in vim in the new screen window, instead of dropping me in a blank shell ? | 22:59 |
niemeyer | kim0: Hmm | 23:03 |
niemeyer | kim0: No, probably not.. this would likely pass the wrong idea about what you can do within the debug hooks session | 23:03 |
niemeyer | kim0: It's useful to look at the script, but you're free to do pretty much anything | 23:04 |
kim0 | hmm | 23:04 |
kim0 | to debug the hook .. I had to figure out where it was | 23:04 |
kim0 | and for that I used "find /" | 23:04 |
kim0 | which sux ofc | 23:04 |
niemeyer | kim0: Yeah, I noticed this as well | 23:05 |
bcsaller | it should cd you into the formula directory I think | 23:05 |
niemeyer | kim0: That's a failure in the hook execution logic | 23:05 |
bcsaller | I think we have a bug for that | 23:05 |
niemeyer | kim0: Hooks should always be executed within the formula directory | 23:05 |
niemeyer | kim0: When debugging or not | 23:05 |
niemeyer | bcsaller: Hey! | 23:05 |
bcsaller | :) | 23:06 |
kim0 | bcsaller: the cd into hooks dir, sounds like a good compromise .. is it planned | 23:06 |
bcsaller | kapil and I agreed it was a good idea when we talked about it | 23:06 |
niemeyer | Agreed, that's important even for plain hooks | 23:07 |
niemeyer | I mean, when executing the real ones rather than debugging | 23:07 |
_mup_ | ensemble/expose-hook-commands r243 committed by jim.baker@canonical.com | 23:11 |
_mup_ | Testing on args and logging for port commands | 23:11 |
niemeyer | Woohay | 23:21 |
kim0 | expose merged ? | 23:31 |
niemeyer | kim0: Not yet | 23:42 |
niemeyer | jimbaker is hard at work on it | 23:42 |
_mup_ | ensemble/debug-hook-fixes r242 committed by gustavo@niemeyer.net | 23:46 |
_mup_ | - Use the real unit name as the session, since tmux is happy with that. | 23:46 |
_mup_ | - Send an initialization script with a simple tmux.conf when firing | 23:46 |
_mup_ | ssh through the debug-hooks command. Use screen shortcuts since | 23:46 |
_mup_ | people will be happier with that. | 23:46 |
_mup_ | [WIP] | 23:46 |
niemeyer | Okay, time to step out and do something else.. | 23:47 |
niemeyer | See y'all tomorrow! | 23:47 |
jimbaker | kim0, the expose work is getting close - hook commands are almost ready for review, i have most of the remaining provisioning work already done from a spike branch, and the ec2 group authorization model maps readily against what is necessary for a provider | 23:54 |
kim0 | jimbaker: sounds like great news .. rock on :) | 23:55 |
SpamapS | Hmm.. I've been thinking about proposing a specification for 'machine-info-get' .. Teyo from Puppet suggested that they'd be interested in collaborating on a library to collect info about the current machine.. and it would be really useful to have this... | 23:56 |
SpamapS | So I'm thinking the machine agent should have some of this information available.. some from the machine provider, some from this library | 23:57 |
SpamapS | That would solve the 'Whats my private IP? Whats my public IP?' case. | 23:58 |
SpamapS | How would I propose such a spec? | 23:58 |