=== rangerpb is now known as rangerpbzzzz
[10:47] thanks rharper for your mail
[10:47] rharper: I was creating wrappers accordingly to show our user-stories
[10:47] so far working fine
[10:48] I also made some time analysis with cloud-init-analyze on case 2 I wanted to discuss with you
[10:48] Also I'm not sure if we want/need to "show" user story #4, I'd more expect that to be the story that leads over to the discussion on how to stage it into Xenial
[10:52] rharper: now lunch, then creating case 3, but that is essentially 2 on OpenStack, which should be fine
[10:52] rharper: please ping me once you are around for the discussion on the #2 timings
[11:24] smoser: you would have enjoyed this lunch http://johorkaki.blogspot.com/2015/10/samyang-extremely-spicy-chicken-flavor.html
=== blaisebool is now known as Guest58715
[12:21] rharper: comparing times on the OpenStack-based execution is just as non-helpful
[12:22] rharper: I need to discuss if/where I should see something stable enough for the demo other than the huge ec2 conf timeout
[12:22] that one I have, and I also like how we can show off the "snap config and install via cloud-init"
[12:23] but everything I threw at cloud-init analyze so far didn't give me a good showcase for the timing at least
[12:23] on top, it seems that the older ci-enabled image doesn't yet have the improvements to the output for cloud-init analyze to be able to differentiate the stages more easily
[12:23] rharper: ping me, I can send you data or let you onto my bastion, whatever we need
[12:24] other than that the tech part is ready; creating a few raw slides to guide along
[13:31] rharper: smoser: jgrimm: I've made a draft for the ds-identify show and fully automated the demo part of it based on ryan's work
[13:31] rharper: thanks
[13:31] I'll share the draft slide deck so you can review/modify
[13:32] if nothing is fatally broken, jgrimm and I will adapt depending on what comes up on Monday
[13:32] shared
[13:33] rharper: the only blind spot that is left is my lack of a good case to show the timing - waiting for you to show up
[13:33] maybe I even have all I need but just don't find it impressive enough :-)
[13:46] cpaelzer: not quite in yet, but I think this should capture the delta we need (the first being no searching, the latter how cloud-init on OpenStack looks without identify) http://paste.ubuntu.com/23746042/
[14:03] rharper: forgot to reply - I think I see what you mean - gathering that on my demo env now and seeing if it shows something nice
[15:14] hrm
[15:15] rharper: smoser: I found every now and then (about 1/8 of the cases) that the final stage hangs
[15:15] thought it was part of my ssh setup, but now I got in and I find it hanging
[15:15] oh I see
[15:15] not "us" but snap it seems
[15:16] cpaelzer: ok, here now
[15:16] yes
[15:16] if you use my user-data
[15:16] it does snap installs
[15:16] which may take some time
[15:16] which I want for some of the cases
[15:16] but, what I mean is that this sometimes hangs
[15:16] like really for minutes
[15:16] snaps do that
[15:17] how unfriendly
[15:17] download, verify, unpack, squash mount, etc
[15:17] is there anything cloud-init should do to unlock that?
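[Editor's note: the timing comparison above leans on cloud-init's analyzer. A minimal sketch of how such a comparison can be driven, assuming a cloud-init new enough to ship the `analyze` subcommand (the older images in this log used a standalone cloudinit-analyze script); the file names are illustrative:]

    # Show per-boot-stage timings on the instance under test:
    cloud-init analyze show

    # Rank events by time consumed, to spot expensive steps like datasource probing:
    cloud-init analyze blame

    # To compare an old and a new image, capture each run's output and diff:
    cloud-init analyze show > timings-new.txt
    diff -u timings-old.txt timings-new.txt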
[15:17] the process is sleeping
[15:17] apt isn't fast either
[15:17] I really think it is dead in some way
[15:18] possible; in general, we've discussed the idea of doing some config modules in parallel, but it's a non-trivial problem if one has dependencies between them
[15:18] http://paste.ubuntu.com/23746560/
[15:18] what I want would be a timeout and retry
[15:18] this is now crossing the 10 minute mark - it won't ever succeed
[15:18] (likely)
[15:18] no
[15:18] see the mount
[15:18] systemctl start snap-part\x2dcython-3.mount
[15:18] that's not us
[15:19] the x2dcython-3.mount one
[15:19] right
[15:19] ok let me rephrase
[15:19] I understand what you suggest;
[15:19] I think it is a snap issue, but should the snap module of cloud-init take care to detect and recover in those cases?
[15:19] config modules can certainly have timeouts
[15:19] oh ok
[15:19] sorry for feeling misunderstood then
[15:19] we do in other places (like searching for data sources)
[15:19] no
[15:19] no worries
[15:22] but when scripts start a bunch of them, 1/8 hits often enough to feel bad :-)
[15:24] yeah, maybe use different snaps
[15:27] one instead of three should already help to mitigate most, I hope
[15:27] yeah
[15:27] hello is going to be fast and easy
[15:27] it's the #1 snap in the store =)
[15:41] another issue seems to be that if I check /var/lib/cloud/instance/boot-finished too early, that seems to conflict with the final stage (re)starting ssh - that makes the final stage hang as well it seems
[15:41] which is still part of config-snappy, I just see
[15:41] |`->running config-snappy with frequency once-per-instance @02.00000s +912.00000s
[15:41] that is in the analyze after I hard-restarted the ssh on the UC16 OpenStack guest
[15:42] strange
[15:43] that is in the case without user-data even
[15:44] not following - final stage restarting ssh?
[15:44] the cloud-init-analyze output is linear as well, right?
[15:44] sure, it's just sorted by event timestamp
[15:45] rharper: http://paste.ubuntu.com/23746681/ line 212
[15:45] the pastebin has three cases: with-data, no-data-ds-identify, no-data-old
[15:46] hrm
[15:46] I'm not seeing that huge time
[15:46] which image? recent or the old one?
[15:47] old one
[15:47] you think it might just be an old issue?
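[Editor's note: for readers hitting the same hang, a sketch of how the stuck mount could be inspected with standard systemd/snapd tooling. The unit name is the one from the log; yours will differ:]

    # List units stuck mid-activation (a hung .mount shows up here):
    systemctl list-units --state=activating

    # State and recent journal entries for the suspect unit:
    systemctl status 'snap-part\x2dcython-3.mount'
    journalctl -u 'snap-part\x2dcython-3.mount' --no-pager | tail -n 50

    # What snapd itself is doing (download/verify/mount steps):
    snap changes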
[15:47] possibly
[15:47] it certainly has older packages
[15:47] which may get refreshed/updated
[15:47] which could block time
[15:47] let me upload a newer image without the updated cloud-init
[15:47] so it's more apples to apples w.r.t boot time
[15:52] no hurry; ping me or write mail once you rebuilt one - so I can exchange the image I use
[15:54] thanks for the "policy can be overridden" statement - I missed adding that
[15:57] rharper: FYI - I have all the timing data if discussion comes up, but I think there is nothing with enough "bang" in it to get a slide
[15:57] I can pull them out whenever needed - or not if not
[15:57] too much data without the need could lead to deep dives on unimportant numbers
[15:57] well, I think a 10x factor in time to run cloud-init-local is nice enough
[15:58] the biggest win will be in the ec2 image (we don't have those times), simply because ec2 runs *last*
[15:58] of course
[15:58] just at least on OpenStack, old and new image both are rather fast
[15:59] or my timing granularity is bad
[15:59] it's basically 00.0000 for init-local and init-network on OLD
[15:59] well, you need to use the journalctl method to get subsecond resolution
[15:59] I had both
[15:59] ah yeah
[15:59] it is so small
[15:59] fine
[16:00] I suggest trying again with the journalctl -o short-precise -u cloud-init-local | cloudinit-analyze show -i -
[16:00] am I missing something - init-network is not in the journal method?
[16:00] I already have the journal data rharper
[16:00] I just did one unit
[16:00] 00.56115 -> 00.08126
[16:00] you can string them all
[16:00] in total time it's small
[16:00] I'll do so
[16:01] but it's a rather huge reduction
[16:01] -u cloud-init-local -u cloud-init -u cloud-config -u cloud-final
[16:01] are the unit names to append to the journalctl command
[16:01] now that I spent the automation effort for all showcases, it is a minor change to get it with that on top :-)
[16:01] that'll get you all stages
[16:01] k
[16:02] for the images using config-drive, the change will be small; the ones using the OpenStack datasource will be a larger win, as it probes local + other clouds, then the OpenStack datasource
[16:03] so the further from the "head" of the list, the greater the win w.r.t time reduction; and you're right, it's not huge in terms of wallclock time
[16:03] but the improvement scales with the speed of the system
[16:04] ok, then I'm good
[16:04] a follow-up is that the POC only prevents cloud-init from probing other sources
[16:04] it means I read the data correctly and it isn't impressive until underlining the sweet spots :-)
[16:04] we will further have cloud-init skip the specific datasource probe as well
[16:04] yes, that's true
[16:05] but - as we said when you showed cloud-init analyze - no matter if big or small, having the data is the important point for the discussions
[16:05] yep
[17:47] magicalChicken: https://paste.ubuntu.com/23747530/
[17:54] powersj: needs --upgrade
[17:55] I had meant to do that yesterday, I'll get that done now
[17:55] I added a config flag to automatically do --upgrade on linuxcontainers.org images
[17:55] since they don't ship with cloud-init
[17:55] ok
[17:55] let me know when I can pull again (no rush) and I can continue playing with it :)
[17:56] powersj: sure, should have that fixed in just a bit
[18:23] o/
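[Editor's note: putting rharper's two suggestions together into one runnable pipeline. Note the journalctl output format is spelled short-precise; the analyzer invocation is as given in the conversation, while newer cloud-init exposes the same thing as `cloud-init analyze show`:]

    journalctl -o short-precise \
        -u cloud-init-local -u cloud-init -u cloud-config -u cloud-final \
      | cloudinit-analyze show -i -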
[18:25] before I get started again - a git question: how do I update 'my' copy - aka aixtools (aixtools ssh://aixtools@git.launchpad.net/~aixtools/cloud-init (push)) - from origin (https://git.launchpad.net/cloud-init (fetch))?
[18:27] aixtools: do you have changes in your branch relative to what you currently have from cloud-init origin?
[18:27] aixtools: you can add the origin as a remote and pull
[18:28] and then you'd rebase your branch(es) onto the updated origin/master, normally
[18:28] it depends on your workflow, and what, if any, changes you have locally
[18:35] i have a separate branch I am working on (an AIX port I hope); I have 'aixtools' that is my clone of master.
[18:36] what I would like to do is: 1) get my launchpad "clone" up to date; 2) use that to update (i think fetch?) my 'local' copy of 'master';
[18:36] aixtools: sorry, can you pastebin the output of `git remote; git branch; git branch -r` ?
[18:36] finally, 3)
[18:37] update/merge the changes of the current status into my 'changes' for aix-port
[18:37] moment
[18:43] http://pastebin.com/8wfzSqTq
[18:43] aixtools: ok, so this is how *I* would do it, you can choose to take/leave what you want :)
[18:43] i am a noob - I shall live and learn :)
[18:43] aixtools: what i would first do (for your own sanity/helpfulness) is make sure that your ~/.gitconfig has:
[18:44] [log]
[18:44]     decorate = short
[18:44] that will put in `git log` output things like tags and head names if they are in the history
[18:44] if you do that, in your current branch (aix-port), `git log` should indicate that an ancestor of HEAD is origin/master (I'm guessing, presuming you have not fetched yet)
[18:45] so the way to update your tree would be then:
[18:45] 1) git fetch origin
[18:45] 2) git rebase -i origin/master
[18:46] iiuc (it does depend on how many commits you have relative to the old upstream you were using), this will present you with an $EDITOR window, which allows you to specify how to treat the commits that are to be carried forward
[18:46] did that also update my copy, i.e., ssh://aixtools@git.launchpad.net/~aixtools/cloud-init , or is that an update of my local disks?
[18:47] (step 1, that is)
[18:47] just local
[18:47] and step 2 - starting now...
[18:47] all step 1) did was to fetch from the 'origin' remote any branches and commits
[18:48] so then 3) would be `git push aixtools aix-port` (presuming your remote branch is also called aix-port; if it's master there, you would say aix-port:master)
[18:48] step 2 - what is that 'trying' to do? looks like my changes coming into 'master', which I do not want.
[18:49] aixtools: look at the pictures in `man git-rebase`
[18:49] i want to move 'master' into aix-port and/or see conflicts, so I can review them
[18:49] basically you have a topic branch (aix-port)
[18:49] oh
[18:49] wait, what?
[18:50] maybe my thought process is 'wrong',
[18:50] yes - topic branch aka aix-port
[18:50] so git is just storing a DAG, right?
[18:50] directed acyclic graph
[18:50] DAG was shorter, still do not know the fancy verb
[18:51] adjective i should say
[18:51] ok, let's gloss it for now
[18:51] hence, i wanted my 'local' master to be equal to the project master.
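[Editor's note: for readers following along, the two-remote layout being discussed can be inspected like this. Remote names are from the conversation; the config one-liner is equivalent to the ~/.gitconfig edit suggested above:]

    git remote -v      # expect 'origin' (upstream) and 'aixtools' (personal fork)
    git branch -a      # local branches (master, aix-port) plus remote-tracking refs

    # Same effect as adding "[log] decorate = short" to ~/.gitconfig:
    git config --global log.decorate short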
[18:51] if you look at the first example in `man git-rebase` (around line 68)
[18:51] aixtools: right, but you don't *need* that at all for git
[18:52] git is turning into the new trick this old dog cannot learn
[18:53] i read somewhere git is keeping three copies: the 'master', a local copy of the master, and then the local changes
[18:53] aixtools: i mean, yes your repository's master branch can track origin/master, but you also already have origin/master :)
[18:54] i feel like this would be way faster to explain on the phone :)
[18:54] but let me keep going
[18:54] my thought is to keep my topic-branch as close to master as I can.
[18:54] yep
[18:54] that's smart
[18:54] you do that with regular rebases
[18:55] ok, so all i have done now is step 1, step 2 I aborted.
[18:55] aixtools: do you have hangouts? i can explain this quickly if you have the time?
[18:55] for the rebase - would I go back to my branch and then execute 'rebase'?
[18:56] (imho - git has a lot of features - I will someday see the benefit(s) - but for now they just confuse.)
[18:56] right, i didn't realize you had left your branch
[18:56] as your output before was that you were on the aix-port branch
[18:56] and `git fetch` doesn't move your checked-out state at all
[18:57] well, I did git checkout master before starting irc
[18:57] ah
[18:57] don't do that :)
[18:57] wasn't in my list of steps :)
[18:57] no, it was in someone else's list (long live google)
[18:58] seemed to be the way to prepare for a merge
[18:58] which is what I thought I needed to do
[18:58] so, here's my opinion
[18:58] you have no need for a local master branch
[18:58] nods
[18:58] it is of no use to you, as you're always doing topic branches forked from upstream's master
[18:58] so let's just ignore the local master :)
[18:58] :)
[18:59] you can delete it, but git will complain sometimes, so it's easier to leave it around, but ignore it
[18:59] so we're going to only work on your aix-port branch
[18:59] (git checkout aix-port)
[18:59] we're going to run the rebase step here, which is basically telling git (long typing to follow)
[18:59] git rebase -i origin/master
[18:59] (implicitly the commit to rebase is HEAD)
[19:00] I was based off something in the history of origin/master
[19:00] but now origin/master has moved on without me
[19:00] I want git to 'remember' all the stuff that i've done from that historical fork-point (called the merge-base) and save it
[19:00] then I want to fast-forward the branch I'm on (which HEAD is on) to the updated state of origin/master
[19:01] and then I want to replay the 'remembered' stuff, as new commits
[19:01] so, just save the file with all the 'picks' in it.
[19:01] aixtools: in your case, yeah
[19:01] aixtools: as you don't want to drop anything
[19:01] not yet :)
[19:01] Successfully rebased and updated refs/heads/aix-port.
[19:02] aixtools: you can, in the future, probably drop the -i. I like to always see what git-rebase is going to do, but your case should be a quick rebase each time, esp. if you do it often
[19:02] On branch aix-port
[19:02] Your branch and 'aixtools/aix-port' have diverged,
[19:02] and have 9 and 10 different commits each, respectively.
[19:02] right
[19:02] so, rather than the pull suggested, I would do a push?
[19:03] to put the local copy on the server?
[19:03] yep, i said that earlier, `git push aixtools aix-port`
[19:03] now, if i had to guess, that will complain saying it's not a fast forward
[19:03] forgot that... my apologies
[19:03] np!
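[Editor's note: the whole flow being walked through, condensed into a sketch. Branch and remote names are from the conversation; the force push is justified just below:]

    git checkout aix-port           # stay on the topic branch; ignore local master
    git fetch origin                # update remote-tracking refs only, moves nothing local
    git rebase -i origin/master     # replay local commits onto the new upstream tip
    git push -f aixtools aix-port   # rewritten history needs a forced push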
[19:05] a few lines too many... http://pastebin.com/289U3mCR
[19:05] non-fast-forward is the important bit
[19:05] so the reasoning here is
[19:05] imagine someone was using your branch as the basis for their work
[19:06] they, just like you did, want to be able to do a `git fetch origin` (except their origin is your repository)
[19:06] and have it make sense and be a linear history
[19:06] but you just 'moved' your history by rebasing it
[19:07] so for your topic branch, it's relatively likely you'll need to give `git-push` the '-f' flag (to force), *if* you know your local branch is correct
[19:07] so, just add -f
[19:07] after verifying you want your local aix-port branch to be what is on the server
[19:08] (by looking at git-log, diffing against origin/master, etc)
[19:08] well, as I am working solo - it is either a mess and I get to start over again, or it is okay.
[19:09] I'll vote (read: hope) for the latter.
[19:09] ack, it's not a big deal for topic branches that are one-offs
[19:09] Total 56 (delta 40), reused 0 (delta 0)
[19:09] To ssh://git.launchpad.net/~aixtools/cloud-init
[19:09]  + f1dee34...be633b8 aix-port -> aix-port (forced update)
[19:09] it's a bigger deal for origins, masters, etc
[19:10] So, maybe even a good way to learn the ropes.
[19:10] note that until you do an MR, there's not really even a reason to push
[19:10] except if you develop in multiple places, or if you are worried about your system dying
[19:10] there's also not a reason *not* to push, admittedly
[19:10] thanks very much - the boss (wife) called. time to go...
[19:10] well, I am also trying to learn git.
[19:11] later, or tomorrow. thx.
[19:11] aixtools: np! i'll be around
[19:11] afk
=== rangerpbzzzz is now known as rangerpb
[19:33] powersj: I got the setup_overrides working so lxd images can force --upgrade even if it isn't specified
[19:33] powersj: so 'run -n xenial' should work now without upgrade
[19:42] magicalChicken: tests appear to be running, thank you!
[19:46] magicalChicken: I just got the too many open files on my laptop
[19:46] powersj: what image was it on?
[19:47] python3 -m tests.cloud_tests run -v -n xenial
[19:47] didn't think you mentioned that with the ubuntu image
[19:47] powersj: i've never seen it with xenial
[19:48] https://paste.ubuntu.com/23748172/
[19:48] i only get it on centos 7/6.6, debian wheezy, and ubuntu precise
[19:49] powersj: you're getting it in a different place too
[19:49] i've always seen the stacktrace come from inside pylxd
[19:50] not sure what's going on yet, must be resources leaking somehow
[19:51] something is leaking file descriptors, likely the exec bits in pylxd?
[19:51] yeah, it must be something like that
[19:51] this only started after the switch to pylxd 2.2 i think
[19:52] i'm going to make a branch with centos/debian support but using pylxd 2.1, see if it happens there
[19:52] https://bugs.launchpad.net/juju/+bug/1602192
[19:53] Apparently the relevant limit is /proc/sys/fs/inotify/max_user_instances. This is "128" by default.
[19:53] When increasing it with
[19:53] sudo sysctl fs.inotify.max_user_instances=256
[19:53] looks to be the tl;dr
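[Editor's note: a sketch of the limit check and bump being discussed, with the value from the linked bug; persisting it via sysctl.d is an assumption about how one would carry this into a Jenkins setup, and the file name is illustrative:]

    # Current limit (128 by default) and how to raise it for the running system:
    cat /proc/sys/fs/inotify/max_user_instances
    sudo sysctl fs.inotify.max_user_instances=256

    # Persist across reboots:
    echo 'fs.inotify.max_user_instances = 256' | sudo tee /etc/sysctl.d/90-inotify.conf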
[19:54] https://github.com/lxc/lxd/blob/master/doc/production-setup.md
[19:54] that looks helpful here
[19:54] that definitely looks related
[19:55] from the last comment on there, it sounds like the init system being used by the container has an effect on whether or not the bug occurs
[19:55] which could explain why i'm only seeing it with some distros
[19:56] for sure
[19:56] systemd new enough spawns cgroups per unit
[19:56] including inotify watches on each one
[19:56] aah
[19:56] that makes sense
[19:56] so, pre-systemd, like 2XX or something like that, isn't affected
[19:57] I'm hesitant about just bumping the limit though
[19:57] because it may still hit the limit eventually
[19:57] but there's nothing to do about that other than wait
[19:57] right?
[19:57] Maybe minimizing the calls to execute() would help, as right now it polls for the system being up using execute()
[19:57] it's a global limit
[19:57] well, sounds like systemd execution in the guest is consuming them
[19:58] not exec (I was thinking of leaking fds, we saw that like more than a year ago)
[19:58] oh, yeah, if it's systemd itself that's using all the fds then there's not much to help
[19:58] right, other than watching the global limit and raising it before running
[19:58] we can raise it up, watch the count during a run
[19:58] and see how close we get
[19:59] and then in jenkins (at least) ensure we run with a limit high enough to handle things with some head room
[19:59] yeah, that should work
[19:59] hopefully there'll be either systemd or kernel changes to fix this eventually though
[19:59] there's no fix
[19:59] it's just a global resource that's being used
[20:00] sorta like file descriptors; if you make that many opens, it has to be tracked
[20:00] magicalChicken: also https://github.com/lxc/pylxd/issues/209
[20:00] systemd is a heavy inotify user
[20:00] and 211 from the link at the bottom
[20:00] ha!
[20:00] double whammy
[20:00] yay :)
[20:01] time to poke rockstar in #lxd
[20:01] well, that one is fixable
[20:01] haha yeah
[20:01] temp fix in the test suite is to go back to polling inside the instance with a single call to execute() though
[20:02] we could even test it
[20:02] either way, for sure
[20:02] funny enough, I changed to doing it in python right after the pylxd 2.2 switch
[20:02] hehe
[20:02] yeah, doing that switch should show which bug broke us
[20:02] or it could be both too :)
[20:03] well, I suspect the exec fd is the big one, but it was hard to find without systemd consuming a bunch of stuff too
[20:03] the systemd limit is still an issue without exec though too
[20:03] because ideally I had wanted to get all of this going in parallel
[20:03] yeah, but I think raising the limit on a jenkins instance is reasonable, and we can add that to testsuite docs
[20:03] yeah =)
[20:03] yeah, that makes sense
[22:13] hi
[22:13] anyone here?
[22:13] or everyone idling?
[22:14] best to just ask, folks will answer when they can
[22:14] oh nice
[22:14] just making sure there are ppl
[22:14] i go to other rooms and they are full, yet it's like there is nobody
[22:14] happens here too
[22:14] =)
[22:14] oh
[22:14] hehe
[22:15] question
[22:15] i am trying to expire the root password after launch
[22:15] is it possible to do it within the vm?
[22:15] in cloud.cfg?
[22:15] although this doesn't work:
          # System and/or distro specific settings
          # (not accessible to handlers/transforms)
          system_info:
            # This will affect which distro class gets used
            distro: ubuntu
            # Default user name + that default user's groups (if added/used)
            default_user:
              name: root
              lock_passwd: false
          chpasswd:
            expire: True
[22:15] let's look at the user config
[22:15] something like that?
[22:16] like the cloud.cfg?
[22:17] http://cloudinit.readthedocs.io/en/latest/topics/modules.html#users-and-groups shows you can set expire to a date;
[22:17] lemme look at the code to see what gets passed around
[22:17] another option is to use the chage command as a run_cmd
[22:18] sorry rharper, I am not familiar with the terms
[22:19] by user config you mean cloud.cfg in /etc/cloud?
[22:19] there's a Linux command called 'chage'
[22:19] /etc/cloud is the default config; typically one passes user-data into the instance in addition to the default
[22:19] at least on debian/ubuntu, there's no root password set, so nothing to expire
[22:19] problem is i would like to do it within the vm
[22:20] instead of passing it through user-data
[22:20] i did set a password
[22:20] and your image already has a password set for root, right
[22:20] but upon launching i want it to expire so customers have to reset it
[22:20] if you're authoring the image
[22:21] then when you set the root password, you can use the chage command to expire it immediately
[22:21] rather than doing it in cloud-init (which is just going to run the 'chage' command anyhow)
[22:22] i tried both
[22:22] passwd --expire root and chage -d 0 root
[22:22] before powering off and running cloud-init
[22:22] but it doesn't work
[22:23] you want to look at the 'mount-image-callback' command; this lets you run commands inside the filesystem of the vm
[22:23] it applies whatever password you set, but upon login it is not expired
[22:23] you can use chage --list root to see what got set
[22:25] ok gonna try that again
[22:31] ummm
[22:31] rharper it worked
[22:32] but
[22:32] i get this https://i.imgur.com/xPgaxcO.png
[22:32] until i change the expired password i can't ssh
[22:32] are you trying to ssh in as root?
[22:33] yes
[22:34] i had to log in through the vm console and set the expired password
[22:35] yeah; I've never set a root password or forced it to expire; I suspect there's something at play with sshd config with root
[22:37] sorry I'm not more help here
[22:39] thanks for trying
[22:40] http://askubuntu.com/questions/427153/change-expired-password-via-ssh
[22:40] maybe not; -1 disables expiration
[22:56] quick question
[22:56] does the ubuntu user need to exist for cloud-init to run?
[22:58] no
[22:58] but many modules expect there to be a default non-root user
[22:58] so, things like 'add ssh keys' only work if you use the default config (which supplies a non-root user for your distro type)
[23:02] is it safe to delete the ubuntu directory in /home?
[23:02] ubuntu user home directory
[23:03] only affects the ubuntu user
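[Editor's note: a sketch of the image-authoring approach rharper outlines above - set and immediately expire the root password from outside the image via mount-image-callback (shipped in cloud-image-utils on Ubuntu). The image path and password are placeholders; the _MOUNTPOINT_ substitution is per that tool's documented behavior, so verify against your version:]

    sudo mount-image-callback disk.img -- chroot _MOUNTPOINT_ \
        sh -c 'echo root:changeme | chpasswd && chage -d 0 root'

    # Verify what got recorded before booting the image:
    sudo mount-image-callback disk.img -- chroot _MOUNTPOINT_ chage --list root

[As the conversation notes, sshd will still refuse an expired root password by default, so the first reset needs console access or sshd/PAM changes.]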