[07:00] <ara> good morning!
[11:53] <ara> schwuk: ping
[15:23] <cr3> hey folks, I would appreciate your feedback to improve the concept of attachments in Checkbox!
[15:24] <cr3> currently, attachments can be expressed either as a filename or a command. the problem is that a command makes for a horrible download filename
[15:25] <cr3> so, we need to find a way to still make it possible to express commands but also relate a name to that command somehow
[15:25] <fader> cr3: There are a couple of things that immediately come to mind...
[15:26] <fader> cr3: 1. Add an (optional?) parameter to attachments to provide a filename
[15:26] <cr3> fader: attachments are currently expressed this way in a test definition file: attachments: cat /foo
[15:26] <fader> cr3: 2. Generate the filename by some simple method, e.g. "$SYSTEM_ID-$DATE-$TESTNAME"
[15:27] <cr3> fader: also, a single test definition can define multiple attachments
[15:28] <cr3> aha! what if we had attachment definition files!?
[15:28] <fader> cr3: So could you say: attachments: filename cat /foo ?
[15:29] <cr3> fader: what if filename contains a space?
[15:29] <fader> cr3: "Don't do that" :P
[15:29] <fader> cr3: Seriously, didn't you just write some escaping code?
[15:30] <cr3> fader: currently, the values for definitions do not support the concept of a dictionary, ie key/value pairs such as filename/command
[15:31] <cr3> what if we introduce another type of definition, like: type: attachment; filename: foo; command: cat /foo
[15:31] <fader> Hey, I like that... it leaves flexibility for future expansion
[15:32] <cr3> fader: only type and filename would be required, command would be optional
[15:32] <cr3> furthermore, in order to link an attachment to one or more tests: tests: foo bar baz
[15:32] <fader> If you don't provide a command it just attaches the filename specified?
[15:32] <cr3> fader: right
[15:33] <fader> cr3: Hmm, I'm not sure if that's a good idea.  You have the same keyword performing two fairly different functions
[15:33] <fader> 1. grabbing a file, 2. specifying a filename for output of a command
[15:34] <cr3> fader: ok, so you'd have this then, where parts in square brackets are optional: type: attachment; name: foo; [filename: /tmp/foo;] [command: cat /tmp/foo;] [tests: foo bar baz;]
[15:34] <cr3> fader: filename and command would be mutually exclusive
[15:35] <fader> cr3: I like it.
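A minimal sketch of how the attachment definition proposed above might be validated, assuming it has already been parsed into a dict. The function name validate_attachment and the dict representation are hypothetical and not part of Checkbox; only the field names (type, name, filename, command, tests) come from the discussion. The sketch enforces only what was agreed: name is required, and filename and command are mutually exclusive.

```python
def validate_attachment(definition):
    """Check a parsed attachment definition (hypothetical dict form)."""
    if definition.get("type") != "attachment":
        raise ValueError("type must be 'attachment'")
    if not definition.get("name"):
        raise ValueError("name is required")
    # filename (grab a file) and command (capture output) must not both appear.
    if "filename" in definition and "command" in definition:
        raise ValueError("filename and command are mutually exclusive")
    # tests is an optional whitespace-separated list of test names.
    tests = definition.get("tests", "").split()
    return {**definition, "tests": tests}


attachment = validate_attachment({
    "type": "attachment",
    "name": "foo",
    "command": "cat /tmp/foo",
    "tests": "foo bar baz",
})
print(attachment["tests"])
```

Whether a definition with neither filename nor command should be rejected was not settled in the discussion, so the sketch accepts it.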
[15:36] <cr3> I think I'm comfortable with that too. there is one caveat though: if tests are specified, this means that filename or command needs to be evaluated immediately when the test is finished, not at the end of running all tests
[15:37] <fader> Why is that?
[15:37] <cr3> so, it is possible that the same attachment definition will produce different attachment instances
[15:37] <cr3> fader: this was a problem raised by eeejay where mago overwrites the same log file for every test, or somesuch
[15:37] <fader> cr3: Oh.  That's ugly :(
[15:38] <cr3> fader: this is a valid concern though, an attachment should indeed attempt to capture the state of the system immediately after the test was run
[15:38] <fader> I won't argue with you... you're the one who has to code it ;)
[15:39] <cr3> fader: heh, it does indeed put additional burden on the coding, the result should be transparent to everyone else
[15:40] <fader> cr3: See, this is why everybody puts up with you working through weekends and national holidays.
[15:40] <fader> Wait, hang on, that doesn't make sense...
[15:48] <cr3> fader: I think I finally understood the purpose of national holidays: it's not so that I can have a holiday, it's so that other people can have a holiday from me.
[15:48] <fader> Hehe
[17:04] <fader> Is there a way to see what image an installed system was built from?  Something more specific than /etc/lsb-release?
[18:29] <cr3> eeejay: yo, got a minute to bounce ideas?
[19:42] <cr3> fader: what would you think of changing the requires field in a test definition to explicit: packages and devices?
[19:43] <cr3> these would still take registry expressions so that I can do something like: packages: package.name == 'firefox' and int(package.version) > 2
[19:43] <fader> cr3: Only those two specifically?  What if I want to require e.g. processor scaling support to do a test?
[19:43] <fader> Or does that fall under 'devices'?
[19:44] <cr3> fader: is that information available in lshal or only in cpuinfo?
[19:44] <fader> cr3: Good question.  Let me check.
[19:45] <fader> Looks like it's in lshal:   info.capabilities = {'cpufreq_control'} (string list)
[19:45] <cr3> all the use cases I have so far relate to either packages or devices. so, that's why I'm thinking that from our experience it might make sense to be explicit about both
[19:45] <cr3> but requires is pretty powerful though, maybe I should keep it
[19:45] <fader> cr3: What about existence of a file?
[19:46] <cr3> the existence of a file is not necessarily contained in the registry
[19:47] <fader> cr3: Ah, so one could still express requirements for things that are not specifically contained within the registry then, right?
[19:47] <cr3> no, the other way around: one could not express requirements for things...
[19:48] <cr3> or, to avoid the double negative: one can only express requirements for things specifically contained within the registry
[19:48]  * fader tries to think of a use case where this would be a problem.
[19:48] <cr3> typically, when this has posed a problem, a new registry was created
[19:48] <fader> Heh, y'know, that's obviously the right way to handle it :)
[19:50] <fader> I'm just concerned that there might potentially be a case where you'd want some information that's not from the package manager or lshal et al.  But I can't think of a specific case.
[19:50] <cr3> coming back to my grid testing idea to take over the world, I just remembered that some job description languages support the concept of expressing requirements as a form of boolean query. so, I'm going to keep "requires"
[19:50] <cr3> fader: right, just because our limited experience so far has not come up with a valid use case doesn't mean there are none
[19:51] <fader> cr3: Yeah, that.
[19:51] <cr3> fader: however, I am tempted to remove "architectures" and "categories" which are basically just shorthands for requirements. those seem to just add noise to the test definition format
[19:51] <cr3> and I don't even recall architectures ever being used
[19:52] <fader> cr3: So you'd just formulate them like e.g. "requires: category=server"?
[19:52] <fader> I can imagine cases where architecture would be useful, especially with e.g. LPIA devices
[19:52] <cr3> fader: something that correlates to the registry in a boolean expression
[19:52] <cr3> so that would use "==" rather than "=" :)
[19:52] <fader> Hehe
[19:53] <cr3> dpkg.architecture == "i386"
[19:54] <cr3> there is a difference between the architecture of the system and the architecture of the packages installed on the system, dpkg.architecture is already provided in the registry to return the latter information
[19:54] <cr3> so for the rare times a test might care about the architecture, having to specify that boolean expression should not cause anyone carpal tunnel syndrome
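A rough sketch of how a "requires" boolean expression such as the ones quoted above could be evaluated against registry data. This is an illustration only, not how Checkbox implements its registry: the Namespace class, the evaluate_requires function and the nested-dict registry are all assumptions, and the sample values are made up. It uses eval with builtins stripped, which is not safe for untrusted input.

```python
class Namespace:
    """Expose dict keys as attributes, so dpkg.architecture works."""

    def __init__(self, values):
        self.__dict__.update(values)


def evaluate_requires(expression, registry):
    """Evaluate a requires expression against a dict-based registry."""
    scope = {name: Namespace(values) for name, values in registry.items()}
    scope["int"] = int  # the package example in the discussion calls int()
    # Builtins are removed so only registry names and int are visible.
    return bool(eval(expression, {"__builtins__": {}}, scope))


# Hypothetical registry contents, for illustration only.
registry = {
    "dpkg": {"architecture": "i386"},
    "package": {"name": "firefox", "version": "3"},
}

print(evaluate_requires('dpkg.architecture == "i386"', registry))
print(evaluate_requires(
    "package.name == 'firefox' and int(package.version) > 2", registry))
```

Both expressions evaluate to True for this sample registry. Replacing "architectures" with an expression like this keeps the definition format smaller at the cost of slightly more typing per test.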
[19:59] <fader> cr3: Cool, I like it.
[20:45] <cr3> fader: another question for you: do you think that timeout should be part of the test definition format or part of the command: timeout 10 some_command
[20:46] <fader> cr3: Hmm... I'm assuming it would be optional?
[20:49] <cr3> fader: it's optional, but does it belong as an attribute of the test definition or as the command itself
[20:49] <fader> cr3: Would Checkbox terminate the test if it runs over the timeout value?
[20:50] <cr3> fader: yes, but the timeout command could do the same
[20:50] <fader> cr3: It seems to make more sense to me at the test level.  If you have multiple commands that should each have a timeout, then you should either split them into separate tests or handle that inside your test itself IMO
[20:51] <cr3> the only potential reason to make it an attribute of the test definition is whether we care to formalize the difference between a failure because the test failed or because it timed out
[20:52] <cr3> fader: what do you mean by "handle that inside the test itself"?
[20:53] <fader> cr3: I mean that if the conditions of your test are complex enough that multiple different timeout values are required but you still can't split the test into multiple tests, you can't expect Checkbox to do it all for you :)
[20:53] <fader> cr3: Yeah, the difference between 'failed' and 'timed out' seems important to me
[20:54] <cr3> fader: when you say "the test", do you mean the command being called?
[20:54] <fader> cr3: Yes, the script that is being called
[20:54] <cr3> fader: timeout 10 the_script
[20:55] <cr3> fader: the "timeout" script can essentially do the same: first argument is the timeout in seconds, the rest is the command to run and its arguments
[20:56] <fader> cr3: Right, either way works for simple cases.  It's only cases where you might want to say "this test will run 3 commands.  Let the first one run for 10s, the second for 30s, the third for 10s" that I am saying it should be handled inside the test script
[20:56] <fader> Which if the timeout is defined at the test definition level and is optional, everything is fine
[20:57] <cr3> fader: or that timeout command could actually relieve the script of that responsibility: timeout 10 first_one && timeout 30 second_one && timeout 10 third_one
[20:57] <cr3> but then, in that complex situation, you lose that granularity of 'failure' vs 'timed out'
[20:57] <fader> cr3: Good point.  But it doesn't seem to give the ability to track the difference between 'timed out' and 'failed'
[20:57] <fader> Heh
[20:58] <fader> It seems cleaner to me to do it at the test definition level, but that's just an aesthetic distinction
[20:59] <cr3> referring to my test-result-codes blog post, I'm starting to agree with the importance of the distinction between 'failure' and 'timed out'. I think the latter might fall under the code UNRESOLVED or INCOMPLETE
[21:00] <cr3> I'm really glad I took the time to enumerate all those darn test result codes, good reference for myself :)
[21:00] <cr3> ok, I'm convinced, timeout stays
[21:01] <cr3> fader: for your example of three tests, they should be expressed with dependencies between each other so that if one times out, the others aren't run
[21:01] <fader> cr3: Ooh, slick
[21:01] <cr3> and they should use the timeout feature of the test definition to distinguish 'failure' from 'timed out'
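A small sketch of the outcome of this discussion: a timeout handled by the runner lets 'timed out' be reported separately from 'failed', and later tests in a chain are skipped once an earlier one does not pass. The names run_test and run_chain and the result strings are hypothetical, not Checkbox's API, and the linear chain is a simplification of real test dependencies. The child processes are spawned with the current Python interpreter so the example does not depend on any external script.

```python
import subprocess
import sys


def run_test(command, timeout=None):
    """Run a command list and return 'pass', 'fail' or 'timed out'."""
    try:
        completed = subprocess.run(command, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising this exception.
        return "timed out"
    return "pass" if completed.returncode == 0 else "fail"


def run_chain(tests):
    """Run (name, command, timeout) tuples in order, stopping at the
    first test that does not pass, so dependent tests are not run."""
    results = {}
    for name, command, timeout in tests:
        results[name] = run_test(command, timeout)
        if results[name] != "pass":
            break
    return results


quick = [sys.executable, "-c", "pass"]
slow = [sys.executable, "-c", "import time; time.sleep(5)"]

print(run_chain([
    ("first_one", quick, 10),
    ("second_one", slow, 0.5),  # exceeds its timeout
    ("third_one", quick, 10),   # never run
]))
```

With the timeout wrapped into the command itself (timeout 10 the_script), the runner would only see a non-zero exit status and could not tell the two outcomes apart.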
[21:01] <fader> Man, you've thought about this stuff :)
[21:02] <cr3> I did some things right, but there's plenty I did wrong too :)
[21:02] <fader> And your blog posts are really in-depth.  It's good for me to read them... maybe I can learn something.
[21:02] <fader> :)
[21:03] <cr3> I've learned a lot myself in the process :)
[21:03] <cr3> yesterday, I was googling for some test related problem and my blog actually came up as the third result :)
[21:06] <fader> Hehe
[21:07] <fader> You'll end up writing a book.  Just wait.
[21:09] <cr3> fader: I already have a few people lined up for a book about my little humiliating stories :)
[21:09] <fader> cr3: Not *quite* what I had in mind, but that works too.
[21:09] <fader> You need to start including those on the blog as well!
[21:10] <cr3> fader: maybe I can combine both: Testing in underwear
[21:10] <fader> You know, you can tag the entries so only the testing related ones get syndicated on planet.u.c... :)
[21:11] <cr3> fader: I've made it a point that my blog and micro-blog posts will be strictly testing related, I didn't want to fall into inanities like where I'm scratching myself right now
[21:12] <cr3> fader: besides, you already know where I'm scratching most of the time...
[21:13] <fader> cr3: You don't know how often I lay awake at night wondering what you had for breakfast or how often you went to the bathroom today.  Enquiring twitterers want to know!
[21:14] <cr3> I think that crosses the fine line between twitterers and just twits
[21:23] <fader> cr3: I know we've been over this but I've confused myself.  Checkbox executes tests as root whether run interactively (e.g. checkbox-gtk) or not (e.g. kicked off after boot from certify-web) right?
[21:26] <fader> (I ask as some of the security qa-regression-test tests refuse to run as root and explicitly call sudo in the scripts, which will have to change if everything is run as root)
[21:26] <cr3> fader: checkbox-(gtk|cli) runs tests as the user unless overridden by a specific user in the test definition. checkbox-(compatibility|certification)* runs everything as root
[21:26] <cr3> fader: calling sudo in scripts run as the normal user won't work either
[21:27] <cr3> fader: if the script prompts, we're screwed. I need to create a bug to disable all interactivity possibly assumed by scripts
[21:27] <fader> cr3: It seems like the best way to handle this is to remove any prompts in the script, which means we may end up maintaining our own version of some tests :(
[21:27] <cr3> fader: or maybe they weren't written well in the first place...
[21:28] <fader> cr3: Heh, I'll let you have that fight with kees :)
[21:29] <cr3> tests simply shoved in a directory are probably not written properly in the first place. if the same tests were written within any test suite like checkbox, subunit, or whatever, these problems would've been caught early on
[21:29] <cr3> tests in a directory are scripts, not tests
[21:30] <cr3> it's not really a fight that I'm looking for, it's just that the sru team was probably under pressure to just get something done which implies some eventual migration process
[21:30] <cr3> if we can help with this migration process, I'm sure it will be much appreciated
[21:34] <fader> cr3: Right, I'm trying to migrate some of those myself and wanted to make sure I was on the right track
[21:35] <fader> It'll just be a question of if we can make the tests in checkbox the authoritative source for those tests
[21:39] <cr3> fader: it doesn't have to be though, any test suite could be integrated. however, checkbox is one of the rare ones which supports interactive testing in addition to automated testing, so that might be the deciding factor
[21:39] <cr3> the only problem is that the author is a real pain to deal with
[21:40] <fader> cr3: The author of checkbox you mean? :)
[21:40] <cr3> yeah, avoid him if possible
[21:40] <fader> He's not so bad... you just have to rough him up a little.
[21:41] <fader> But anyway, it seems like anything that refuses to run as root but explicitly calls sudo will need to be modified.  It's just a question of whether we can push those modifications upstream or if we have to maintain them
[21:42] <fader> And/or make our version the authoritative version and accept patches for other changes
[21:43] <sbeattie> fader: I don't have a problem merging them with upstream.
[21:43] <sbeattie> fader: tricky bit will be the tests for sudo itself.
[21:43] <fader> sbeattie: Yeah, there are some that are obviously ill-suited for this treatment.  I'm taking baby steps right now :)
[21:43] <sbeattie> fader: excellent. How can I help?
[21:44] <sbeattie> do you have work-in-progress stuffed anywhere?
[21:44] <fader> I'm also starting right now by purposefully ignoring any destructive tests (those which overwrite config files, etc.)
[21:45] <fader> sbeattie: Just my laptop right now.  I don't have anything usable yet, just my prototype bits that suck and need thrown out.  I'll try to have something to point you at next week though.
[21:46] <fader> = was aiming for this week but that didn't happen :(
[21:46] <fader> s/=/I/
[21:46] <sbeattie> fader: alright, but I'm keenly interested in getting this going, so I don't mind looking at junk that needs to be thrown out.
[21:48] <sbeattie> (I briefly started on it at one point, but operator error prevented me from getting my tests noticed by checkbox)
[21:58] <cr3> interactive tests as for sudo perhaps could be wrapped in an (py)expect script to be fully automated
[21:58] <cr3> or, they could remain a manual test by asking the user to perform a series of steps
[22:00] <fader> cr3: I'd rather keep them automated and just remove the requirement for sudo for the ones that will work when run as root
[22:01] <fader> sbeattie: Maybe we can come to an arrangement where you can beat on the test scripts and I can work on the suite definitions and make everything run from Checkbox :)
[22:01] <fader> (Or more likely annoy cr3 until he tells me what I'm doing wrong, but whatever.)
[22:32] <fader> sbeattie: I cleaned up what I've been playing with a bit so you can at least see what I'm doing, but it will still make the baby cr3 cry so don't show it to him
[22:32] <fader> sbeattie: https://code.launchpad.net/~fader/checkbox-certification/security-tests
[22:32] <fader> This is just the glibc test; I've also poked a bit at the gcc test but I'd rather hold onto it for a day or two
[22:33] <fader> (As it's in worse shape and doesn't run right at all)
[22:34] <fader> NB that you'll need build-essential installed to run the test.  That's not in a 'requires' field for the suite yet.  (Another reason not to tell cr3)
[22:54] <sbeattie> fader: cool, thanks.
[22:55] <fader> sbeattie: I'll be interested to hear if it works for you.  :)
[22:56] <fader> You can look at ~/.checkbox/submission.xml when it prompts you to enter a secure ID, which you probably don't have
[22:56] <fader> A quick and dirty way to see if the glibc script ran is to look for /tmp/glibc-security, which I haven't bothered to clean up at the end of the script yet :(
[23:00] <fader> Okay, time to find food and be interactive in meatspace for a bit... sniff you jerks later :)