/srv/irclogs.ubuntu.com/2012/04/13/#ubuntu-classroom.txt

=== delcoyot1 is now known as delcoyote
=== raju is now known as ubucop
=== bulldog98_ is now known as bulldog98
Liliantest15:52
bdfhjktest passed15:54
khler_marcuswhen does the workshop begin16:19
bdfhjkin 40 minutes16:19
khler_marcusoh, ok16:20
khler_marcusthanks16:20
=== ChanServ changed the topic of #ubuntu-classroom to: Welcome to the Ubuntu Classroom - https://wiki.ubuntu.com/Classroom || Support in #ubuntu || Upcoming Schedule: http://is.gd/8rtIi || Questions in #ubuntu-classroom-chat || Current Session: Ubuntu Algorithm Class - Instructors: bdfhjk
ClassBotLogs for this session will be available at http://irclogs.ubuntu.com/2012/04/13/%23ubuntu-classroom.html following the conclusion of the session.16:59
bdfhjkWelcome in the first Ubuntu Algorithm Classes!16:59
bdfhjkPlease don’t be afraid to ask questions.17:00
bdfhjkif something isn't clear17:00
bdfhjkI like active participants and questions are the important part of classes.17:00
bdfhjkClasses will be splitted into 3 topics.17:01
bdfhjkDuring first 20 minutes, I will talk about binary search, one of the basic algorithms.17:01
bdfhjkNext 20 minutes we will spend on learning hashing methods, a more advanced topic.17:01
bdfhjkThen, we are changing topic to introduction to artificial intelligence and usage of FANN library in practise17:02
bdfhjkLet’s start with simple example.17:02
bdfhjkYour friend has chosen randomly a number between 0 and 100. You have to guess it. If you tell him a number and this number isn’t same as his one, he tells you if the number you tried is greater or lower than his one17:03
bdfhjkHow many questions do You have to ask to guess your friend’s number in the worst case?17:03
khler_marcus5017:03
Lilianno17:03
bdfhjkthat isn't correct17:03
obounaimlog 10017:03
aANKOo100 ?17:03
Lilianlog10017:03
Lerhond100, if you're an idiot17:03
bdfhjkThe correct answer is 7. Each correct, smart question will reduce the set of possible answers to a half.17:04
bdfhjkso, In general case, we need only log2K  questions to guess17:04
Lilian100* log 10017:04
khler_marcussuch an small number? how could this be17:04
bdfhjkwhere k is the upperbound limit to number17:04
bdfhjkThis is an idea of binary search17:04
bdfhjkWe can use it to solve each problem, where it is easy to test, if correct answer is below tested value or above it.17:05
bdfhjkEvery time we ask about state of the element in the middle17:05
bdfhjkDepending on in which half the correct value is, we call this algorithm recursively in suitable range of initial set17:05
bdfhjkHow it work?17:06
bdfhjkBelow I’m presenting algorithm17:06
bdfhjkint p := 0 (number of first element)17:06
bdfhjkint k := n-1 (number of last element)17:06
bdfhjkwhile(p<k) {17:06
bdfhjk    int sr = (p+k)/2;17:06
bdfhjk    if (try(sr) = success) p := sr;17:06
bdfhjk        else k := sr-1;17:07
bdfhjk    if (k-p = 1 && try(p+1) = success) p++;17:07
bdfhjk    else k--;17:07
bdfhjk}17:07
bdfhjkPlease pay attention to line17:07
bdfhjkif (k-p = 1 && try(p+1) = success) p++; else k--;17:07
bdfhjkThis fragment prevents the infinite loop17:07
bdfhjkIt is clear now, how it work?17:08
bdfhjkOn notes, we have also a another algorithm17:08
bdfhjkShorter, but may be more difficult fo understand17:09
bdfhjkwhere we can forget about infinite loop17:09
bdfhjkThis algorithm work, when value is an integer17:10
bdfhjkso what if this is a float number?17:10
Lilianharder to find17:10
bdfhjkIf You plan to use binary search to find a float variable17:10
bdfhjkchoose an EPS value, a very small number (typical 0,000001) and break loop when k-p < EPS17:11
khler_marcusso sometimes it is impossible?17:11
bdfhjkit depend, if we can ask questions17:11
bdfhjk"One the which side of number x is answer"17:12
bdfhjkif we can ask this type of questions17:12
bdfhjkthen it is always possible17:12
bdfhjkif answer is a float17:12
bdfhjkthen we can reduce set of possible answer17:12
bdfhjkfor example17:12
bdfhjk0-117:13
bdfhjk0 - 0.517:13
bdfhjk0 - 0.2517:13
bdfhjk....17:13
bdfhjk0.1345 - 0.134617:13
bdfhjkand then we know17:13
bdfhjkthat the correct answer is about 0.134517:13
bdfhjkI hope that is clear enought17:14
bdfhjkit is a simple algorithm17:14
bdfhjkbut need some time to understand how it work17:14
bdfhjkIn the notes, I wrote about some practical usages of binary search17:14
bdfhjkOften on job interviews for programming, one of questions is to write binary search pseudocode17:14
bdfhjkand one of the common errors is infinite loop17:15
bdfhjkI saw, that we have questions on discussion room17:15
bdfhjkI wrote code as pseudocode17:15
bdfhjkit is used only in books or lecutres17:15
bdfhjkto present idea independent of language17:16
ClassBotjsjgruber-l84-p asked: Where will we find the ntes?17:16
bdfhjkhttps://docs.google.com/document/d/1CHU5TgXegp6IkRQhx7ok9HFNMPVONiWojbkIgfdFJrQ/edit17:16
bdfhjkhere17:16
ClassBotraju asked: where we can use this / area of usage ? I am pretty new to this17:16
bdfhjkPlease look at notes17:17
bdfhjkWe can use this in finding value in sorted set17:17
bdfhjkbut not only17:18
bdfhjkfor example I often use binary search method in debugging17:18
bdfhjktrying to find where I have a bug17:18
bdfhjkI put assertsa17:18
bdfhjk'asserts17:18
bdfhjkalways in the middle of code, where I may find my bug17:19
ClassBotkhler_marcus asked: Works this with characters too?17:19
bdfhjkYes, it work for integers, floats, chars17:20
bdfhjkand any datatype17:20
bdfhjkwith '>' operator - which we can compare, that the first is below/under the second17:21
bdfhjkI think that is all about binary search17:21
bdfhjkDo You have any questions?17:21
bdfhjkI don't see17:22
bdfhjkso17:22
bdfhjkThe second topic is hashing methods17:22
bdfhjkIn hashing, we are going to reduce amount of data (something like compressing it), in a determinist way17:22
bdfhjkWe assume, that each equal part of data always have equal hash17:22
bdfhjkThere is no return way. We can’t revert hashing.17:23
bdfhjkWe lose our initial data, but we gain something more useful in some cases.17:23
bdfhjkA hash is usually a number like 123143224117:23
bdfhjkThe most important thing is, that we can compare hashes and if they are equal - we know, that corresponding data is equal.17:23
bdfhjkThere is a small issue, which I describe later17:24
bdfhjkbut in the most cases we can assume that17:24
bdfhjkWe may use hashing in many areas, but today we limit it to hashing words / sequence of chars . Also we may store different information in hashes.17:24
bdfhjkI will present hashes, which store information about letters and their position in word - so we will be able to compare words17:25
ClassBotraju asked: if we got enough time please explain more about bug finding by using this .17:25
bdfhjkI will answer this question after classes17:25
bdfhjkso please stay online here17:25
bdfhjk" QUESTION : " We lose our initial data, but we gain something more useful in some cases" If we are losting the data means its loss only na , please explain more ,"17:26
bdfhjkTo make it more visual17:26
bdfhjkthink about beautiful landscape17:27
bdfhjkYou can make a photo17:27
bdfhjkone or two17:27
bdfhjkthen look at both17:27
bdfhjkand say - that is the same place17:27
bdfhjkbut in photo17:28
bdfhjkyou will lose the real beauty, that can not be captured17:28
bdfhjkbut your photo is smaller than landscape17:29
bdfhjkand You are able to put it in the pocket17:29
bdfhjkor share with friends17:29
bdfhjkso You losed some 'data'17:29
bdfhjkbut gained some capabilities17:29
bdfhjkWe have a few possible ways to implement hashes. I will teach you about one of the most popular storing information about position of chars and their type.17:30
bdfhjkWords are combinations of chars, each char is represented by a small number (ASCII code).17:30
bdfhjkOur hash is defined as s[i] + s[i+1]*p^1 + s[i+2]*p^2 + ... + s[j]*p^(j-i)17:31
bdfhjkwhere s is the array which contain word17:31
bdfhjkand every element in it is a char17:31
bdfhjknoted as number in ASCII code17:31
bdfhjkp is a small prime number17:31
bdfhjkq is a big prime number17:31
bdfhjkIt is quite complicated, why they should be a prime number17:32
bdfhjkI encourage you to think about it, but now I’m only saying, it reduces the probability of collision.17:32
bdfhjkWhat is the collision?17:32
bdfhjkSometimes, two different parts of data can be represented by the same hash, and we may think, that parts are the same17:33
bdfhjkIs said, that we define hash as17:33
bdfhjks[i] + s[i+1]*p^1 + s[i+2]*p^2 + ... + s[j]*p^(j-i)17:33
bdfhjkbut this may be a very large number17:33
bdfhjkso we will store and calculate it modulo q17:34
rajuooooooooooooyyyyyyyyyyyyyyyyyy17:34
bdfhjkq is a large prime number17:34
rajubdfhjk:  sorry for mis type17:35
bdfhjkThe greater q, the more potential hashes we can use17:35
bdfhjkbut also we need more memory to store it17:36
bdfhjkI ussualy use q = 10^9+3317:36
bdfhjkwe have only 1/q chance for that (if we choose q = 10^9+33, then we may have one collision per 10^9+33 comparation - it is small enough to forget about it in the most cases)17:36
bdfhjktwo hashes are identical and parts of data are different17:37
bdfhjkNow, how to use it in practise17:37
bdfhjkFor example, we have 10^6 big words, and we want to be able to fast compare lexicographically two of it in logarithmic time of length. We want to figure, which is bigger. That is useful eg. in sorting.17:38
bdfhjkThe first step is preprocessing - calculate hashes for each suffix17:38
bdfhjkwhen we know value of hash for each suffix, we can use the formula h(s[i..j]) = (h(i) - p^(j-i+1)*h(j+1)) mod q to calculate hash of fragment starting at char number i, and ending at char number j17:39
bdfhjkIt is clear, why this formula work?17:39
gang65not really17:40
gang65could you please explain this formula?17:40
ClassBotobounaim asked: what is q?17:40
bdfhjksorry not this question17:41
ClassBotjsjgruber-l84-p asked: what do you mean by suffix, here?17:41
bdfhjksuffix is a last part of word, beginning at specified position17:41
bdfhjkfor example17:42
bdfhjkword: algorithm17:42
bdfhjksufix17:42
bdfhjkm17:42
bdfhjkhm17:42
bdfhjkthm17:42
bdfhjkgorithm17:42
bdfhjkand algorthm as the longest suffix17:42
ClassBotfasked asked: Why we calculate j+1 at finish, but not j17:42
bdfhjkThat isn't true17:43
bdfhjkWe calculate hash for part i..j inclusive17:43
bdfhjkbut we store in h[i] hash for suffix starting on ith position17:43
bdfhjkso if we want suffix for i..j17:44
bdfhjkthen from suffix i17:44
bdfhjkwe must deduct suffix j+117:44
bdfhjkand we have i..j17:44
bdfhjkbecause suffix from i17:45
bdfhjkwe can denote as17:45
bdfhjki..j + (j+1)..end17:45
bdfhjkWe also multiply the j+1-th suffix17:46
bdfhjkby p^(j-i+1)17:46
bdfhjkbecause we removing chars at positions starting (j-i+1) after17:47
bdfhjki17:47
ClassBotLilian asked: Can you explain if this is the only formula for hashes or we can create one ourselves(but the main idea it to calculate everywhere the hash the same way)? ;)17:47
bdfhjkNo17:47
bdfhjkWe can create our formula17:48
bdfhjkIt depend, what informations we want to store17:48
bdfhjkfor example17:48
bdfhjkif we are interested only in number of chars17:48
bdfhjkthen we can simply add numbers of chars17:48
bdfhjkthen words abc and cba17:49
bdfhjkwill have the same hash17:49
ClassBotThere are 10 minutes remaining in the current session.17:49
bdfhjkor for example use prefix than suffix17:49
ClassBotjsjgruber-l84-p asked: Does the suffix hash arithmetic work while it is all modulo q?17:49
bdfhjkYes, we can multiply add and deduct in modulo arithmetic17:50
bdfhjkwe can also divide, but we must use fermat's lemat17:50
bdfhjkI can say more after the classes17:50
bdfhjkabout dividing hashes17:50
bdfhjkand for calculating p^x17:50
bdfhjkwe can use fast_binary_power algorithm17:51
ClassBotfasked asked: everything which allows to access data for O(1) is hash function, isn't it?17:51
bdfhjkThe definition of hash isn't strict17:52
bdfhjkAlways, when we store informations with data loss, we can say about hashing17:52
bdfhjkwhen we only reduce size without data losss17:53
bdfhjk'loss17:53
bdfhjkthen we only making a compression17:53
bdfhjkOk17:53
bdfhjkso to continue17:53
bdfhjkto compare two words, we use binary search to find first char, where there is a difference between two words.17:53
bdfhjkand to test, it two parts of word are the same17:54
bdfhjkwe test it's hash17:54
bdfhjkwhich can be done in O(1)17:54
bdfhjkso overall to compare lexicographically two words17:55
bdfhjkwe need O(log s) time17:55
bdfhjkwhere s is a size of the smallest word17:55
bdfhjkDo You have any questions about hashing methods?17:56
ClassBotgang65 asked: What will happen if we use too small p number  (ex. 7), for formula: s[i] + s[i+1]*p^1 + s[i+2]*p^2 + ... + s[j]*p^(j-i)17:56
bdfhjkif we store chars as numbers in ASCII code17:57
FeSludgeis hashing different than comparing the ASCII value of the characters?17:57
bdfhjkthen they will may be between <65.97>17:57
bdfhjkIf I remember this numbers correctly17:58
bdfhjkbut at least > 5017:58
bdfhjkI often use p=1317:58
bdfhjkbecasue if this number is prime17:59
bdfhjkthen our hashes always will be different17:59
bdfhjkbecause if we split they into prime factors18:00
bdfhjkalways will be different number of p there18:00
bdfhjkFeSludge : I don't understand your question, could You clarify?18:01
bdfhjkhashing allow us to make it faster18:01
bdfhjkFinally, the last but not the least topic : Introduction to the artificial intelligence18:02
bdfhjkThis is not such a difficult subject (in practise usage), as it might seem at first time18:03
bdfhjkArtificial intelligence can be used in a simple way to identify the language of the text18:03
bdfhjkWe can also guess the complicated patterns with a small number of possible outcomes18:03
bdfhjkI present, how to make the first mentioned task18:04
bdfhjkFirstly we must define the input and output of artificial neural network (ANN)18:04
bdfhjkThe input can be for example the number of occurrences of a each character type in the text, so we have 26 input neurons18:05
bdfhjkOutput neurons can be languages, that we are going to detect18:05
bdfhjkThese data can be easily counted by a simple program and saved to a file18:06
bdfhjkNext step is to create the ANN - artificial neural network18:06
bdfhjkThis can be done using FANN library, by calling18:06
bdfhjkstruct fann *ann = fann_create(1, 0.7, 3, 26, 13, 3);18:06
bdfhjkwhere:18:06
bdfhjk1 - connection rate (we aren’t pay attention to this argument now, for our purposes it should be 1)18:07
bdfhjk0.7 - learning rate - defines how aggressive will be the learning algorithm. It depends on our learning data, for this example 0.7 is a good value for relative small amount of training data.18:07
bdfhjk26, 13, 3 - our layers.  First layer is the input. The last one is the output.18:07
bdfhjkWhat is a (lucky) number 13?18:08
bdfhjkThis is the number of neurons in the hidden, middle layer.18:08
bdfhjkIn every problem, we must guess, what will be the best amount of hidden neurons  - there isn’t any strict rule. For some problems it is useful to make 2,3 or more hidden layers.18:09
bdfhjkThe next step is to train our ANN using pregenerated file18:09
bdfhjkfann_train_on_file(ann, "frequencies.data", 200, 10, 0.0001);18:09
bdfhjkFirst two arguments are easy to guess18:09
bdfhjkThrid represents the maximal number of epochs - how many times we ‘feed’ our ANN using data from file18:10
bdfhjkFourth is only for debugging, after how many epochs we want to see report at console.18:10
bdfhjkThe last one represent, what is maximal acceptable error on test data, so we can exit training after reaching it.18:10
bdfhjkAt this state we have operational ANN18:11
bdfhjkready for usage18:11
bdfhjkIf we want to detect language of text, we must only :18:11
bdfhjk1. Generate frequencies in this text18:11
bdfhjkfloat frequencies[26];18:11
bdfhjkgenerate_frequencies(argv[1], frequencies);18:11
bdfhjkthis function read input text and count frequencies of each char18:12
bdfhjk2. Run our ANN on this data18:12
bdfhjkfloat *output = fann_run(ann, frequencies);18:12
bdfhjk3. Print the best answer18:12
bdfhjkprintf max{output}18:12
bdfhjkwhat represent the biggest number from output values in output layer18:13
bdfhjkremind: each value represents one language18:13
bdfhjkin output layer18:13
bdfhjkAnd that is all18:14
bdfhjkDo You have any questions about the last part of classes?18:14
ClassBotLilian asked: printf max{output} - Python?18:14
bdfhjkno, I wrote in pseudocode here18:14
Lilianow, thx18:15
bdfhjkin c++ we must iterate all answers18:15
bdfhjkand pick the bigger18:15
bdfhjklanguage with the bigger probability18:15
bdfhjkany other questions?18:15
bdfhjkAt the end, I want to say some words about Ubuntu Algorithm Team18:16
bdfhjkOur main goal is to help programmers in problems related with algorithms18:16
bdfhjkYou can use our launchpad page https://launchpad.net/algorithms-headquarter to post your problems (as bugs)18:16
bdfhjkThanks everyone for participation!18:17
yoyooWe thank you.18:17
ClassBotjsjgruber-l84-p asked: So frequencies.data might contain something one line per language containing the frequecy of occurance of each letter?18:17
bdfhjkan example frequencies.data:18:18
Lilianbdfhjk: what do you mean by "problems"? Pseudo implementation of algorithms?18:18
bdfhjk12 26 318:18
bdfhjk0.103 0.016 0.054 0.060 0.113 0.010 0.010 0.048 0.05618:18
bdfhjk0.003 0.010 0.035 0.014 0.065 0.075 0.013 0.000 0.05118:18
bdfhjk0.083 0.111 0.030 0.008 0.019 0.000 0.016 0.00018:18
bdfhjk1 0 018:18
bdfhjk0.076 0.010 0.022 0.039 0.151 0.013 0.009 0.009 0.08118:18
bdfhjk0.001 0.000 0.058 0.024 0.074 0.061 0.030 0.011 0.06918:18
bdfhjk0.100 0.074 0.059 0.015 0.000 0.009 0.003 0.00318:19
bdfhjk0 1 018:19
bdfhjk...18:19
bdfhjkYou can refer on fann article18:19
bdfhjklinked in notes18:19
ClassBotfasked asked: We can post any problems or only discussed here to launchpad?18:19
bdfhjkBecasue we don't have many problems on launchpad, I allow to ask any question now18:20
bdfhjkIf in the future it change, then I inform about this18:20
bdfhjkgenerally18:20
bdfhjkanswers is for all algorthim-related questions18:20
bdfhjkbugs in for problems related with concrete program/application18:21
bdfhjkby problems18:21
bdfhjkI mean task18:21
bdfhjktrouble18:21
bdfhjkissue18:22
bdfhjkthe thing, which we want to do18:22
bdfhjkHere I used pseudocode18:22
bdfhjkto describe algorithms18:22
bdfhjksometime I mix it with c++18:23
bdfhjkbut all should be affordable18:23
bdfhjkif not, please ask18:23
bdfhjkhttp://www.surveymonkey.com/s/MQK9SKP18:24
bdfhjkhere is a link to survey18:24
bdfhjkplease share your thoughts18:24
bdfhjkabout classes18:24
ClassBotThere are 5 minutes remaining in the current session.18:24
bdfhjknow we have 5 minutes for any questions18:25
bdfhjknot necessary related with today's topics18:25
bdfhjkI remember, that I had to answer some questions18:25
bdfhjkplease remind them18:26
faskedWhich topics you want to discuss in the future?18:26
bdfhjkI have some proposition, not decided yet18:26
bdfhjkone of them are matrix18:26
bdfhjkhow to calc sum i=0 to n=10^1818:26
Lerhondinterval trees!18:26
bdfhjk3^i (i+1)(i+2)18:27
bdfhjkPlease send your ideas in survey18:27
bdfhjkonline judge18:27
bdfhjkYou can train coding today's algorithms18:27
bdfhjksolving tasks in online judge18:27
faskedbdfhjk: thanks for class18:28
bdfhjkfor each topic we prepared tasks18:28
bdfhjkso if You want to train18:28
bdfhjktry solve they18:29
bdfhjksome are easy, some are hard18:29
bdfhjkbinary search while debugging18:29
bdfhjksuppose18:29
bdfhjkYou have a program18:29
ClassBotLogs for this session will be available at http://irclogs.ubuntu.com/2012/04/13/%23ubuntu-classroom.html18:29
bdfhjkwhich crashes at some point18:29
=== ChanServ changed the topic of #ubuntu-classroom to: Welcome to the Ubuntu Classroom - https://wiki.ubuntu.com/Classroom || Support in #ubuntu || Upcoming Schedule: http://is.gd/8rtIi || Questions in #ubuntu-classroom-chat ||
bdfhjkthen to figure where it crashed18:30
bdfhjkYou place a line18:30
bdfhjklike printf("Here I'm still alive");18:30
bdfhjkand run your program18:30
bdfhjkif You see this message18:30
bdfhjkthan you know - program isn't crashed before this line18:30
bdfhjkin another case18:30
bdfhjkYou know, that bug is after this line18:31
bdfhjkso you can ask questions like in binary search18:31
Lilianbdfhjk: b-tree or fractral-tree ? :)18:31
bdfhjkand every time place this line in the middle of unreliable code18:31
bdfhjkLilian: b-trees are interesting18:31
bdfhjkbut I think18:31
Lilianfractal*18:32
bdfhjkare not often used18:32
bdfhjkare used primarily to read from disk18:32
bdfhjkfind an information in 3 or 4 asks18:32
Lilianfrom a big database...18:32
bdfhjkyes18:33
bdfhjkIf You are interested18:33
bdfhjkplease read about it in Cormen book18:33
bdfhjkIntroduction to Algorithms, Second Edition (9780262032933): Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein: Books.18:33
Lilianbdfhjk: nice, thx18:33
bdfhjkthere are very well described18:34
Lilianbdfhjk: I went to a job interview and I was attacked with a bunch of algorithms to explain...18:35
bdfhjkalgorithms are very useful18:35
=== yofel_ is now known as yofel
=== dduffey_afk is now known as dduffey
=== Resistance is now known as EvilResistance

Generated by irclog2html.py 2.7 by Marius Gedminas - find it at mg.pov.lt!