[10:01] <AlexExtreme> trying to code shortly after you wake up never works too well
[10:09] <Keybuk> morning
[10:09] <Keybuk> yeah I find that
[10:10] <AlexExtreme> i'm trying to work on this profiles code and my brain just says "uhh?" :p
[10:12] <Keybuk> have a bananananana :p
[10:12] <AlexExtreme> :D
[10:19] <AlexExtreme> oh well, bbl
[12:08] <Keybuk> too many structs :-/
[12:08] <Keybuk> conf_sources -(hash)-> ConfSource -(hash)-> ConfFile -(list)-> ConfItem -> job/state/etc.
[12:08] <Keybuk> to track names, I'd need
[12:08] <Keybuk> namespace -(hash)-> Name -(list)-> job/state/etc.
[12:35] <Keybuk> http://people.ubuntu.com/~scott/conf.jpg
[12:35] <Keybuk> :-(
[12:39] <Keybuk> e.g. should states and jobs share a namespace?
[04:10] <wasabi> states and jobs?
[04:10] <wasabi> states are jobs, no?
[04:11] <Keybuk> no
[04:11] <wasabi> you mean something other than what I was thinking you did, then
[04:12] <Keybuk> an example of a state is the period between tty-added and tty-removed for the same $TTY
[04:12] <Keybuk> this state can have multiple concurrent instances, since you can have multiple $TTYs
[04:12] <wasabi> Hmm. This is all getting very confusing. I liked it when you needed one file per tty. :0
[04:13] <wasabi> It made the idea of a state easy: while job-name
[04:20] <Keybuk> heh
[04:20] <Keybuk> I like the idea of one file for all ttys
[04:20] <Keybuk> since they're identical
[05:43] <wasabi> Just being identical doesn't make it automatically a case to combine.
[05:43] <wasabi> What was wrong with our initial idea of jobs themselves defining named states?
[05:44] <Keybuk> doesn't work for the tty case
[05:44] <wasabi> Actually, I guess I don't really even know what I'm talking about anymore. Ya'll have probably done a lot of work since I was last in on it.
[05:44] <Keybuk> or the network interface case
[05:45] <Keybuk> it works on paper, but not for the use cases it's actually needed for
[05:45] <wasabi> Explain the tty case?
[05:45] <Keybuk> network interface is less controversial, so let's use that as an example
[05:45] <wasabi> Okay, that. ;0
[05:45] <Keybuk> we have a pair of events with a common variable
[05:45] <Keybuk> interface-up eth0
[05:45] <Keybuk> interface-down eth0
[05:45] <Keybuk> so we can define the pairing and name that, say, interface-is-up
[05:46] <Keybuk>   interface-up ... interface-down $IFACE
[05:46] <Keybuk> so when any interface comes up, the state is true
[05:46] <Keybuk> and when that same interface goes down, the state becomes false
[05:46] <Keybuk> ok?
[05:46] <wasabi> ok.
[05:46] <Keybuk> computers have multiple interfaces
[05:47] <Keybuk> we don't just want to track the first one that we see, we want to track them all
[05:47] <Keybuk> so when we see "interface-up lo", the state is true "for lo"
[05:47] <Keybuk> we might next see "interface-up eth0", now the state is true "for lo" and "for eth0"
[05:47] <Keybuk> next we might see "interface-down eth0"
[05:47] <Keybuk> this only matches the second half of the "for eth0" true state, so that state becomes false
[05:47] <Keybuk> now the state is only true "for lo"
[05:48] <Keybuk> by thinking in this way, we can answer the questions
[05:48] <Keybuk> is the state true for any interface (any network interface is up!)
[05:48] <wasabi> I'd start with a network job, which only started when any interface was up, and stopped itself when the last interface went down. You can then depend on the started/stopped state of that job to define a state where any interface is up.
[05:48] <Keybuk> is the state true for a specific interface (or any non-lo interface)
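[editor's note: the multi-instance state Keybuk walks through above can be sketched in a few lines of illustrative Python — this is not Upstart code, and the PairState name and API are invented:]

```python
# Illustrative model (not Upstart internals) of a state defined by an
# event pair, tracked per value of a common variable such as $IFACE.

class PairState:
    def __init__(self, start_event, stop_event):
        self.start_event = start_event
        self.stop_event = stop_event
        self.instances = set()  # variable values for which the state is true

    def handle(self, event, arg):
        if event == self.start_event:
            self.instances.add(arg)       # state becomes true "for arg"
        elif event == self.stop_event:
            self.instances.discard(arg)   # only that instance becomes false

    def true_for(self, arg):
        return arg in self.instances

    def true_for_any(self, exclude=()):
        return any(i not in exclude for i in self.instances)

iface_up = PairState("interface-up", "interface-down")
iface_up.handle("interface-up", "lo")      # true "for lo"
iface_up.handle("interface-up", "eth0")    # true "for lo" and "for eth0"
iface_up.handle("interface-down", "eth0")  # "for eth0" becomes false again

print(iface_up.true_for("lo"))                # True
print(iface_up.true_for_any())                # True: some interface is up
print(iface_up.true_for_any(exclude={"lo"}))  # False: only lo is up
```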
[05:49] <Keybuk> wasabi: but that involves defining a job that tracks the up/down events it receives, no?
[05:49] <wasabi> Yes, it does.
[05:49] <Keybuk> how would you define a job that was running while any interface, apart from lo, was up?
[05:49] <wasabi> It would start on any interface up, and stop on any interface down. And the pre-stop handler would check if ALL interfaces were down, if not, it wouldn't really stop.
[05:50] <wasabi> In code.
[05:50] <Keybuk> the nice things about having first-class states is that you don't need to do that
[05:50] <Keybuk> upstart can track that for you
[05:50] <Keybuk> in fact, we then get to do things like
[05:50] <Keybuk> "is the set of paths that are mounted a subset of the following list?"
[05:50] <Keybuk> so upstart itself can track the problem of the writable filesystem
[05:51] <wasabi> interesting.
[05:51] <wasabi> Not sure if that's completely beneficial to implement that way or not. The script thing, as far as I can tell, works.
[05:52] <Keybuk> jobs implicitly have a state coupled to them
[05:52] <Keybuk> so by defining a job, you are also defining the state which defines when they are running
[05:52] <Keybuk> but I figured that since the machine is sufficiently useful, one should be able to define states in their own right, for use in the definitions of other jobs
[05:53] <Keybuk> the /etc/init/conf.d/udev example holds here; where udev ships a rule that calls initctl for appropriate events, and ships upstart states for each of the event pairs
[05:53] <Keybuk> so a job doesn't have to worry about module-loaded ... module-removed
[05:54] <Keybuk> you could do this simply by defining jobs for them all
[05:54] <Keybuk> but they would have to be all instance jobs
[05:54] <wasabi> All the modules? Not worth it. They are very situation specific.
[05:54] <Keybuk> (an alternative viewpoint is to make all jobs instance jobs by default :p)
[05:55] <wasabi> Why is from module-loaded foo to module-removed foo so bad?
[05:56] <wasabi> Ya know, another thing I'm concerned about in all of this is starting a job midstream.  What if you install the job file while the foo module is loaded, does it sit there waiting for foo to show up?
[05:56] <wasabi> Or are you going to always model all system state in upstart at all times?
[05:56] <wasabi> I don't think so. I think you'd install the job, and ask it to start, right now. And it would check if the module was loaded, in pre-start.
[05:57] <wasabi> And this is all weird anyways because it's inherently racey. You can never guarantee that the module didn't remove itself after the job was started.
[05:57] <wasabi> So every job still has to verify that the system is proper, it has to check if the module is loaded.
[05:58] <wasabi> So inevitably each script will have sanity checks in pre-start. Regardless how much help upstart gives it.
[06:03] <wasabi> All of this together makes me wonder if it's not getting just too complicated.
[06:06] <wasabi> Maybe I'm just being negative today.
[06:08] <wasabi> You could do network interfaces like that today.
[06:08] <wasabi> interface-up job which fires when udev tells it an interface is up. It can itself maintain some state files in /var/run or something.
[06:08] <Keybuk> believe it or not, this way is simpler to implement than having raw jobs as states
[06:09] <wasabi> And it can keep track itself what interfaces are or are not up... and emit events for specific interfaces.
[06:09] <Keybuk> I think that services will almost always describe the states in which they should be running, rather than being directly event based
[06:09] <wasabi> hmm
[06:09] <Keybuk> ie. "while there is a network interface up, and the filesystem is writable, and dbus is running"
[06:10] <Keybuk> if true, the state graph can be evaluated when the job is created, so yes, it would start automatically
[06:10] <Keybuk> it is inherently racey, so the service should fail normally if the resources it expects are not available
[06:10] <wasabi> It can only be evaluated if the state which is defined by events is installed before those events happen.
[06:10] <Keybuk> (this is not unreasonable)
[06:10] <wasabi> No?
[06:11] <Keybuk> right, that is true in the current upstart model
[06:11] <Keybuk> upstart would need to record all events to avoid that
[06:11] <wasabi> Yeah, and that's probably unreasonable.
[06:11] <wasabi> So, somebody is still going to have to, after installing a new job with a new state, give it a push.
[06:11] <wasabi> Write some code which checks if the state is true by evaluating the system.
[06:12] <Keybuk> I think it will be rare that this is true, no?
[06:12] <wasabi> I'm not sure.
[06:12] <Keybuk> jobs that should be started in postinst will rarely need to define a state based on events
[06:12] <Keybuk> but yes, that is a concern
[06:12] <Keybuk> there's more interesting examples
[06:12] <wasabi> But if those states were maintained by stateless jobs, it's not a concern.
[06:13] <Keybuk> ?
[06:13] <wasabi> So you have a network job, which fires anytime a network interface, any interface, comes up or down. It keeps a count in /var someplace about the total number of active interfaces, by actually checking the interfaces, not reading the events.
[06:14] <Keybuk> ok
[06:14] <mbiebl> Keybuk: hi
[06:14] <wasabi> And a job which cares about the network would need to start on any network event also, and check that file. Or something.
[06:14] <wasabi> I don't know.
[06:14] <wasabi> Which is still racey.
[06:14] <Keybuk> the same network-monitoring job could register the states in upstart's memory directly
[06:14] <wasabi> It would have to check itself.
[06:14] <Keybuk> avoiding the use of filesystems
[06:14] <wasabi> But then we're just talking about reusable scripts.
[06:15] <Keybuk> fiddly scripts :)
[06:15] <wasabi> Explain how states in upstart are maintained?
[06:16] <wasabi> It makes me wonder if what is being built doesn't actually solve any issue, is all.
[06:16] <wasabi> Since the issue is still there.
[06:16] <Keybuk> what issue do you think we're attempting to solve?
[06:16] <wasabi> Every job that cares about network has to actually check the network and exit gracefully in pre-start.
[06:16] <Keybuk> why pre-start?
[06:17] <wasabi> Or start.
[06:17] <Keybuk> it can exit ungracefully in main
[06:17] <wasabi> True, you are correct, but still, every job has to do that.
[06:17] <Keybuk> and log in syslog that it was unable to bind to the interface
[06:17] <Keybuk> *shrug* every job does that already if it's checking the return codes of its syscalls like a good daemon
[06:17] <wasabi> Yup.
[06:17] <mbiebl> Keybuk: I've got two questions. 
[06:17] <wasabi> So, by recording states, at all, in upstart or otherwise, what are you solving? They may be recorded in upstart, but jobs still have to check on their own properly.
[06:18] <wasabi> So why not just let jobs do that?
[06:18] <Keybuk> wasabi: solving the reattempt to start the job issue
[06:18] <mbiebl> First: Is the Ubuntu udev patched to create /dev/console,null and the std* symlinks?
[06:19] <Keybuk> the job doesn't just have to check it, it has to accept all possible states can fail intermittently, and fallback to some kind of "waiting for appropriate state" inner loop
[06:19] <Keybuk> if it cannot bind() to the interface, it has to loop until it can
[06:19] <wasabi> But doesn't the job have to re-evaluate that in that loop on EVERY event that might contribute to the state?
[06:19] <Keybuk> perhaps with some kind of asynchronous notification from an interface daemon that a new interface is up, to reattempt the bind
[06:19] <Keybuk> exactly
[06:20] <Keybuk> this is the launchd model, btw
[06:20] <wasabi> Well, no. If it can't bind, it dies. And starts again next time something that might make it work appears.
[06:20] <wasabi> Yeah, I know.
[06:20] <Keybuk> no
[06:20] <Keybuk> it doesn't start again
[06:20] <Keybuk> because nothing will restart it
[06:20] <Keybuk> it failed, bad bad job
[06:20] <Keybuk> what upstart provides is that loop
[06:20] <Keybuk> the job defines what state it likes
[06:20] <wasabi> Eh? If it's waiting for both the file system and network, if any of those happen, it will restart.
[06:20] <Keybuk> and upstart guarantees that it will attempt to start the job every time the system is in that state
[06:20] <Keybuk> and that it will kill the job when the system goes out of that state
[06:21] <wasabi> And check to see if both conditions are acceptable. So, when the file system comes up, it will start and look for the network. No network? Okay die. Network comes up a few minutes later and it starts again. Checks again and starts successfully.
[06:21] <Keybuk> it cannot guarantee that the state will remain true for any period after the initial "this is true"
[06:21] <Keybuk> but it can guarantee that the job will be killed again if it hasn't noticed
[06:21] <Keybuk> *and* it guarantees that the job will be restarted if the state should become true again
[06:21] <Keybuk> wasabi: sorry, I appear to have confused you
[06:21] <wasabi> Perhaps.
[06:21] <wasabi> I confuse easily these days.
[06:21] <Keybuk> wasabi: my initial description of the looping application is the upstart-less world
[06:21] <wasabi> I think my brain has been leaking a lot lately.
[06:22] <Keybuk> the launchd model:
[06:22] <Keybuk>  - all jobs are started immediately
[06:22] <Keybuk>  - if a resource the job needs is not available, the job should sleep until it is available
[06:22] <Keybuk>  - if a resource the job needs becomes unavailable, the job should sleep until it is available again
[06:23] <ion_> As far as ive understood what Keybuk has been describing, it sounds good.
[06:23] <Keybuk> ie. monitor your syscalls, if any fail due to an error (bind fails, write fails, etc.) you should fall into a kind of slumber loop
[06:23] <Keybuk> how you wake yourself up from that slumber loop is anyone's guess
[06:23] <Keybuk> asynchronous notification of the potential availability of resources?
[06:23] <Keybuk> or maybe you just use blocking writes and blocking binds? :p
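[editor's note: the launchd-style slumber loop Keybuk describes can be sketched as follows — hypothetical Python, where run_daemon, try_bind, and the slumber hook are invented names; a real daemon would block on a notification rather than a callback:]

```python
# Sketch of the launchd model: the daemon itself retries when a resource
# is unavailable, slumbering between attempts.

def run_daemon(acquire, slumber, max_attempts=10):
    """Try to acquire a resource (e.g. bind() a socket); on failure, slumber
    until something suggests the resource may be back, then retry."""
    for _ in range(max_attempts):
        try:
            return acquire()
        except OSError:
            slumber()  # stand-in for "wake me when a new interface is up"
    raise RuntimeError("resource never became available")

attempts = {"n": 0}

def try_bind():  # fails twice, then succeeds, like an interface coming up late
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise OSError("cannot assign requested address")
    return "bound socket"

sock = run_daemon(try_bind, slumber=lambda: None)
print(sock, "after", attempts["n"], "attempts")
```

Under the upstart model discussed below, the loop disappears from the daemon: it just exit(1)s on failure and upstart restarts it next time the state becomes true.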
[06:23] <wasabi> Ya know, I don't really mind that model, except for the wake up part.
[06:23] <wasabi> But upstart has the wakeup part, in events.
[06:23] <Keybuk> right
[06:24] <wasabi> So slumber, but get poked when an event happens.
[06:24] <Keybuk> so what upstart provides is the acceptance of the reality that states come and go
[06:24] <wasabi> Where slumber == "just exit and let upstart start you again later"
[06:24] <Keybuk> it guarantees that you'll be started when the state is true, and stopped when it becomes false
[06:24] <Keybuk> so you can assume that any failure to obtain resources is bad, and just exit(1)
[06:24] <Keybuk> because you'll get restarted again next time you can have a go
[06:24] <wasabi> Except that those states are inherently hard to monitor, and can't be relied upon anyways.
[06:25] <Keybuk> states are easy to monitor
[06:25] <wasabi> Since they might be untrue by the time you get around to actually running.
[06:25] <mbiebl> Keybuk: What about resources that upstart can't easily monitor?
[06:25] <Keybuk> and can be relied on to be true
[06:25] <Keybuk> sure, they can become false again
[06:25] <Keybuk> upstart says it will stop you if that happens
[06:25] <Keybuk> ok, you might hit the failure first, but the worst thing there is a syslog entry
[06:25] <mbiebl> E.g. remote services that are required, e.g. tomcat requiring a remote sql service.
[06:26] <Keybuk> but you *will* get restarted next time the state is true for a while
[06:26] <wasabi> mbiebl: I really don't think upstart offers anything there. You'd want tomcat and the database to be started independently of each other... as there might be tomcat services that don't need the database.
[06:26] <Keybuk> apps become "just assume that syscalls should work, check the return value, and bail out if they don't"
[06:26] <Keybuk> upstart takes care of restarting you when the state is true again
[06:26] <wasabi> Well, okay. Yeah. I like that... but that's how it is without upstart monitoring states too.
[06:27] <mbiebl> Well, how is upstart supposed to know the state "remote sql service available"
[06:27] <mbiebl> My point is, there are states, that upstart can't provide.
[06:27] <wasabi> mbiebl: It's not. You'd have to implement something which feeds that state to upstart.
[06:28] <wasabi> Keybuk: apps can function in the manner you describe whether upstart watches states or not. Since upstart *will* start it again when any event happens that might make it runnable.
[06:28] <Keybuk> ah
[06:28] <Keybuk> upstart doesn't *monitor* states
[06:28] <Keybuk> upstart just provides a state whiteboard for everything else
[06:29] <Keybuk> e.g. heartbeat could do it for the remote case
[06:29] <Keybuk> or monit
[06:29] <Keybuk> or whatever
[06:29] <Keybuk> they just emit events which upstart can combine into states
[06:29] <Keybuk> or can set states true/false directly
[06:29] <Keybuk> (registered through the usual initctl/libupstart layer)
[06:29] <wasabi> Hmm. You can set states.
[06:29] <Keybuk> I don't see why not
[06:29] <wasabi> Okay, so a postinst script should set appropriate states.
[06:29] <wasabi> That solves that.
[06:30] <wasabi> A new state, for instance.
[06:30] <Keybuk> postinsts for new things might end up having some kind of udevtrigger-a-like
[06:30] <Keybuk> in fact, since many deviceish states will come with udev and HAL, udevtrigger is all you'd run :p
[06:30] <wasabi> postinst: "hi upstart, I know this state is valid from x to y and n to m, but I just checked x and n, and it's good. So set it right now."
[06:30] <Keybuk> right
[06:31] <Keybuk> initctl set wibble true
[06:31] <wasabi> So actually, a postinst for a new job might in fact set all states for that job to true without checking.
[06:31] <Keybuk> mbiebl: remote services should be easy with heartbeat or monit
[06:31] <wasabi> And the job might fail. But that's okay.
[06:31] <Keybuk> wasabi: or it could just "start" the job *shrug*
[06:32] <wasabi> Well, if it starts the job, and the job exits, the job might want to be started again properly.
[06:32] <wasabi> Even though events that contribute to the state the job cares about aren't set.
[06:32] <Keybuk> mbiebl: A to question 1 -- ubuntu's udev copies /lib/udev/devices into /dev before starting, that directory contains the usual console, null, etc. devices
[06:32] <wasabi> I am jumbling all my words. I have no idea why I do that.
[06:32] <wasabi> Let me try again.
[06:32] <mbiebl> Keybuk: Well, you'd still have to patch heartbeat to emit upstart events.
[06:34] <wasabi> job: from X to Y and N to M.   The postinst runs. Currently X is true and N is false, but upstart doesn't yet know. So the postinst starts the job. The job exits because N is false (syscall fails). N becomes true, but since X isn't yet known, the job doesn't start again.
[06:34] <wasabi> Hence the postinst has to set X and N to the proper values at the time of being installed, and let new events from that point on alter them.
[06:35] <Keybuk> yeah
[06:36] <Keybuk> postinsts for packages registering new states, or jobs that use unique states, should make an effort to check whether the requisites are true and set the state accordingly
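[editor's note: wasabi's seeding scenario can be modelled in a few lines — illustrative Python, where the states dict and initctl_set stand in for upstart's state whiteboard and `initctl set`, and X/N are the invented state names from his example:]

```python
# The whiteboard only reflects what upstart has been told, so a postinst
# must seed the initial values by checking the real system.

states = {}  # state name -> bool

def initctl_set(name, value):  # stands in for "initctl set <name> true/false"
    states[name] = value

def job_should_run():
    # job: from X to Y and N to M  =>  runs while both X and N are true
    return states.get("X", False) and states.get("N", False)

# Without seeding, upstart knows nothing, so the job never starts:
assert not job_should_run()

# The postinst checks the requisites and seeds the whiteboard:
initctl_set("X", True)   # postinst verified X holds right now
initctl_set("N", False)  # N does not hold yet

# Later, the event that makes N true arrives; now the job can start:
initctl_set("N", True)
print(job_should_run())  # True
```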
[06:36] <wasabi> And what sets the state?
[06:36] <Keybuk> e.g. a package installing a state that says whether users are logged in should perhaps look at utmp
[06:36] <wasabi> In the normal case?
[06:36] <Keybuk> in the normal case, the state would be set by the daemon or by events
[06:37] <wasabi> Where are the events for the state defined?
[06:37] <wasabi> And are states binary?
[06:37] <Keybuk> by some daemon or other?
[06:37] <wasabi> Hmm. I mean for, like the network interface case.
[06:37] <wasabi> Who watches the interfaces and sets the states?
[06:37] <Keybuk> not quite following
[06:38] <Keybuk> whoever installs the postinst
[06:38] <Keybuk> upstart does no watching
[06:38] <Keybuk> network interface case => udev
[06:38] <Keybuk> or maybe Network Manager
[06:38] <Keybuk> I can't remember whether that one comes via udev, HAL or NM
[06:38] <Keybuk> but it does come from one of them :p
[06:38] <wasabi> So udev essentially runs initctl set network-up $IFACE true/false?
[06:39] <wasabi> Or are the states defined in a file which contains `from X to Y`?
[06:39] <Keybuk> either is valid
[06:39] <Keybuk> in the udev case, I would have it emit events
[06:40] <Keybuk> (in fact, I think events are generally preferred)
[06:40] <wasabi> So oddly enough, states are back to being exactly what a job is: started or stopped, true or false. :0
[06:40] <wasabi> Just without any executable.
[06:41] <Keybuk> yes
[06:41] <Keybuk> the implementation is very closely coupled internally
[06:41] <Keybuk> in fact, all jobs have a state associated with them
[06:41] <Keybuk> since it's that state that causes them to be started or stopped
[06:41] <Keybuk> the difference in definition is simply that one has more options than the other
[06:41] <wasabi> Hmm. I see.
[06:42] <wasabi> So, you might in fact have a state and a job file both in /etc/event.d. Both files would look about the same, except the state one wouldn't have any exec lines.
[06:42] <wasabi> And why the syntax difference between 'set' and 'start/stop' in initctl?
[06:42] <Keybuk> that's the bit I'm trying to work out now :)
[06:43] <Keybuk> whether it is worth exposing the internal difference externally
[06:43] <ion_> Are states going to go to a separate directory than jobs according to the current plan?
[06:43] <wasabi> Well, if jobs are internally states...
[06:43] <wasabi> Then they belong in the same dir.
[06:43] <Keybuk> and if the difference isn't exposed, how do we avoid the bloat of every state carrying the entire Job structure with it?
[06:43] <wasabi> Are we talking about /etc/state.d? :0
[06:43] <Keybuk> /etc/init
[06:43] <ion_> /etc/init/{job.d,state.d}?
[06:44] <wasabi> Keybuk: Well, you have a set of states, and a set of jobs.
[06:44] <wasabi> Jobs depend on states. There is no external visibility of a job, except for the various running executables.
[06:44] <Keybuk> jobs show up in initctl list
[06:44] <wasabi> initctl stop foo actually means "set state `foo` to false", which internally results in the job structure going through the lifecycle for termination.
[06:44] <Keybuk> should states?
[06:44] <wasabi> Perhaps.
[06:45] <ion_> Im in favour of putting jobs and states to separate directories. When there are going to be a lot of files in the directories, it will be helpful.
[06:45] <Keybuk> ion_: then you have namespace collision issues
[06:45] <wasabi> I'd not put them in separate directories because it will introduce some confusion. If a job is a state... then a job can depend on another job.
[06:46] <wasabi> But also on a state.
[06:46] <wasabi> Because there is no difference.
[06:46] <wasabi> So if a job depends on `foo`, go find foo.
[06:46] <Keybuk> there is a difference at the moment
[06:46] <Keybuk> states are instantiable by default
[06:46] <Keybuk> jobs aren't
[06:47] <wasabi> Anyways, we've come full circle again. Back to jobs being exactly the same as states. There being no real internal difference except one has a structure for process lifecycle management.
[06:47] <Keybuk> which is where it becomes interesting
[06:47] <Keybuk> because if the lifecycle management can be separated, then we get to interesting ideas
[06:48] <Keybuk> for example, imagine you have a state for the existence of a particular file
[06:48] <Keybuk> tied into inotify maybe
[06:48] <wasabi> ECOMPREHEND
[06:48] <Keybuk> jobs could be run while and for /etc/site/*/apache.conf
[06:48] <Keybuk> in other words, one copy of the job is run for each of the files that exists
[06:49] <wasabi> That is interesting.
[06:49] <wasabi> state---<job
[06:49] <wasabi> Or, not really.
[06:49] <wasabi> Is that one state or many states? heh
[06:49] <wasabi> Well, it's one state... affected by many files.
[06:49] <wasabi> where each file defines a job structure hanging off the state.
[06:50] <wasabi> When using the inotify/glob thing you just said, what is the value of the state at any time?
[06:51] <Keybuk> true for a given filename
[06:51] <wasabi> But when does it become false?
[06:51] <Keybuk> states can have one false value, or one or more truths
[06:51] <Keybuk> when there are no trues
[06:51] <wasabi> Well, inotify is an event that says the file was altered. There is no corresponding point in time that the file "is not altered".
[06:52] <Keybuk> not 100% sure about this bit yet
[06:52] <wasabi> So it doesn't form a timeline of any sort.
[06:52] <Keybuk> with inotify, we know when the file exists, when it is deleted and when it is modified
[06:52] <Keybuk> so you put create on the left of the state
[06:52] <Keybuk> delete on the right of the state
[06:52] <Keybuk> and put modify on both sides, so the job is restarted
[06:52] <wasabi> So while the file exists?
[06:53] <wasabi> Ahh.
[06:53] <Keybuk> the part of upstart that supplies this notification would set the states on startup using stat()
[06:53] <wasabi> And now you have a true/false state that actually means file existence, but it toggles when it's modified.
[06:53] <wasabi> But you can't detect the toggle, you can just see the fall out from it.
[06:53] <Keybuk> right
[06:53] <Keybuk> you may be more discreet, and define a file-exists state, and a file-same state or something
[06:53] <Keybuk> so the job can choose
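[editor's note: the create/delete/modify mapping Keybuk describes can be written down as a tiny transition function — illustrative Python, not Upstart internals:]

```python
# File-exists state: "create" sits on the left of the state, "delete" on
# the right, and "modify" on both sides so the attached job is restarted.

def apply(event, state):
    """Return (new_state, restart) for a state tracking one file."""
    if event == "create":
        return True, False
    if event == "delete":
        return False, False
    if event == "modify":
        # modify ends the old truth and begins a new one: the state stays
        # true, but a job tied to it is stopped and started again
        return True, state
    return state, False

print(apply("create", False))  # (True, False): file now exists, job starts
print(apply("modify", True))   # (True, True): still exists, but restart
print(apply("delete", True))   # (False, False): gone, job stops
```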
[06:55] <wasabi> This could be implemented outside of upstart as well, in a way.
[06:55] <wasabi> Which might be ... better?
[06:55] <wasabi> Upstart only knows of the state. A separate runnable daemon watches the files and toggles the states.
[06:56] <wasabi> An extremely simple daemon.
[07:00] <Keybuk> right
[07:00] <Keybuk> my theory is that upstart would be able to supply the answer to a request of "what arguments to the file-created event are states expecting?"
[07:01] <Keybuk> so the daemon would register that it supplies that event
[07:01] <Keybuk> and upstart would respond with the arguments that it knows, and with new ones as jobs are created
[07:01] <Keybuk> so the daemon knows what to watch
[07:10] <Keybuk> this is all my next thing to tackle, anyway
[07:10] <Keybuk> now that the config code is better
[08:20] <AlexExtreme> porting the profile code to the new config stuff is harder than I would have liked ;)
[08:20] <wasabi> Now, what do profiles do?
[08:20] <wasabi> I think I missed that conversation.
[08:21] <AlexExtreme> http://upstart.ubuntu.com/wiki/Profiles
[08:21] <AlexExtreme> (this code isn't in main, it's in my branch)
[08:21] <wasabi> ahh
[08:23] <wasabi> Seems simple, reasonable, and sane.
[08:23] <AlexExtreme> cool
[08:23] <wasabi> ie not a huge subsystem, just a simple filter of state names.
[08:23] <wasabi> at uds-mtv we talked about some sort of flag thing which you could set, and instead of using profiles (something outside of the job) to determine whether the job would run, the job itself would check the flag.
[08:24] <AlexExtreme> yeah
[08:24] <wasabi> Which could interestingly enough be done with states.
[08:24] <AlexExtreme> i didn't particularly like that idea for some reason which i can't remember
[08:24] <wasabi> where states were settable from the loader prompt.
[08:28] <AlexExtreme> bbl
[09:21] <Keybuk> an interesting thought has occurred
[09:21] <Keybuk> the state in which the apache job can be run
[09:21] <Keybuk> is not the same as
[09:23] <Keybuk> the state in which the apache job is running
[09:44] <wasabi> Hmm. I barely understand that.