[16:04] Eickmeyer: is OCR of any importance? That is, taking a PDF that is a scan and converting it to text.
[16:05] OvenWerks: Yeah, in PDFs it's a nice-to-have because it makes the PDF searchable by embedding the text in the PDF. A scanned PDF is merely an image, but having the text embedded is certainly nice.
[16:05] Calibre is supposed to do that, but when I tried to use it I got a zero byte output file
[16:05] Calibre has a tendency to be completely broken at times, so I'm not surprised.
[16:07] I installed gimagereader-qt5 and it complained about no language file
[16:08] so I installed tesseract-ocr-eng
[16:08] That worked pretty good
[16:09] It may be that calibre also needed that installed
[16:09] That's possible, it might be under suggested packages.
[16:09] tesseract is the de facto standard for OCR.
[16:10] Not in recommends
[16:11] Muon does not allow highlighting text in its depends window :P
[16:11] Obviously not in recommends, otherwise it would be installed with it. It might be in suggests though. I'd look but ERR:NotEnoughTimeRightNow
[16:13] I just looked in apt-cache depends calibre and it doesn't show up as a suggests. Should probably be.
[16:13] No worries, tesseract being installed may be less than useful unless the user's language file is also installed
[16:13] Direct sync from Debian, might be worthy of a "wishlist" bug report.
[16:15] I'm converting a PDF to chordpro. I am getting tired of showing up to practice with always the wrong key printed out.
[16:22] Ooof, yeah.
[16:23] Even with tesseract installed calibre doesn't work for me.
[16:23] So probably I am doing something wrong :)
[16:24] gimagereader works just fine so I will use that.
[16:24] Most songs we can use from CCLI but there are a few that are written locally.
[16:24] Yeah, I'd say try the calibre snap as an experiment but it looks unmaintained.
[16:26] For what I am doing calibre is really not the right tool anyway. I am not making a book, just a page.
[16:26] And I am editing the output anyway.
[20:35] Eickmeyer: After reading through some more of calibre's docs, I have come to the conclusion that OCR is not included in the program and whoever I was reading that said it did was mistaken (plus one for google). It seems that some PDF documents have a scan that is what we see and also include an OCRed text portion in the file. Calibre is able to detect this text and grab it but the OCR has to
[20:35] have already been done elsewhere.
[20:35] Ah, that explains a lot.
[20:35] So maybe look at gimagereader-qt if we want an OCR app.
[20:36] * OvenWerks guesses the gimagereader (minus the -qt5) is a gtk app
[20:37] Most likely. We could throw that in the publishing seed, which doesn't get installed by default anymore.
[20:38] I would include at least the English language file it needs
[20:39] What's the package name on that?
[20:40] (fwiw, it looks like we already seed libtesseract5 somehow)
[20:40] tesseract-ocr-eng
[20:40] Maybe there is another application that tries to do OCR?
[20:41] It's wanted by the graphics and video tasks, so it's in there somewhere.
[20:41] libtesseract4 in the LTS
[20:43] Either way, I have no issue adding gimagereader-qt and at least tesseract-ocr-eng
[20:45] It's sort of a scanner-like application
[20:45] Right, kinda like gscan2pdf but with OCR I'd imagine, therefore much more handy.
[20:45] Not needed to create, but a utility
[20:47] FYI, tesseract-ocr and tesseract-ocr-eng are circular-deps of each other, so I'll just add tesseract-ocr.
[20:48] Eickmeyer: sure, I picked the language one because that's what my error came up with as missing
[20:48] It did add some deps too
[20:48] Heh, interesting.
[20:49] I'm keeping skanlite since that still can access network scanners, which most scanning applications cannot.
[20:50] I still install simple-scan
[20:51] it looks ugly but I am used to it
[20:51] Heh, no worries. You make your entire desktop ugly, but I don't judge. :)
[20:52] you might consider simple-scan beautiful then
[20:52] Hehe
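
Note on the 16:13 check: looking through `apt-cache depends calibre` by eye for a Suggests entry can be scripted. The following is only a rough sketch, not something from the conversation. The function names `parse_depends` and `relations_to` are made up for illustration, and the parser assumes the usual `apt-cache depends` layout of indented `Relation: package` lines, with an optional leading `|` for alternatives.

```python
import re
import subprocess

# Matches lines such as "  Suggests: tesseract-ocr" or " |Depends: python3".
# Indented provider lines (no "Relation: " prefix) are deliberately skipped.
RELATION_LINE = re.compile(r"^\s*\|?\s*([A-Za-z]+):\s+(\S+)$")


def parse_depends(output: str) -> dict[str, list[str]]:
    """Group the text printed by `apt-cache depends` by relation type."""
    relations: dict[str, list[str]] = {}
    for line in output.splitlines():
        match = RELATION_LINE.match(line)
        if match:
            kind, target = match.groups()
            relations.setdefault(kind, []).append(target)
    return relations


def relations_to(package: str, dependency: str) -> list[str]:
    """Return the relation types (e.g. "Suggests") linking package to dependency.

    Runs the real `apt-cache depends` command, so it only works on a
    Debian or Ubuntu system with apt installed.
    """
    result = subprocess.run(
        ["apt-cache", "depends", package],
        capture_output=True,
        text=True,
        check=True,
    )
    parsed = parse_depends(result.stdout)
    return [kind for kind, targets in parsed.items() if dependency in targets]
```

Calling `relations_to("calibre", "tesseract-ocr")` would return an empty list if calibre declares no relation to tesseract-ocr, which is what was found by hand at 16:13.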