[clug] Howto change /Title in PDFs?

Scott Ferguson scott.ferguson.clug at gmail.com
Tue Aug 14 22:48:07 MDT 2012


On 15/08/12 13:51, Alex Satrapa wrote:
> On 15/08/2012, at 13:05 , Scott Ferguson
> <scott.ferguson.clug at gmail.com> wrote:
> 
>> Any suggestions for simple methods to change/create the Title for
>> PDFs?
> 
>> The Title I wish to apply to the PDFs is the same as the PDF
>> file-name minus the .pdf extension, i.e. the output of
>> `date "+%H%M_%d-%m-%Y"`
> 
>> scanbuttond (get button push) > scanimage (GRAY 8-bit) >
>> imagemagick (convert to jpg) > imagemagick (convert to small pdf)
> 
> Have imagemagick output to a specific directory, with a converter
> monitoring that directory, updating the title, and placing the output
> in a different directory (e.g: "Scanned PDFs To Be Retitled" -> fix
> title -> "Scanned PDFs"). You could use this "waterfall" approach to
> handle OCR too ("Scanned PDFs to be OCRed" -> OCR package -> "Scanned
> PDFs to be Retitled"), assuming your PDFs have text in them that
> you'd like to be able to index/search/find.
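
A minimal sketch of one stage of the "waterfall" described above, assuming
plain polling rather than any particular file-watching tool. The names
process_stage and copy_only, and the directory names in the commented-out
loop, are made up for illustration; a real handler would set the title or
run the OCR step.

```sh
#!/bin/sh
# One "waterfall" stage: every PDF in INBOX is handed to HANDLER, and the
# result lands in OUTBOX. All names here are hypothetical.

# process_stage INBOX OUTBOX HANDLER
# HANDLER is any command or function called as: HANDLER input.pdf output.pdf
process_stage() {
    inbox=$1 outbox=$2 handler=$3
    mkdir -p "$outbox" || return 1
    for src in "$inbox"/*.pdf; do
        [ -e "$src" ] || continue            # the glob matched nothing
        dest=$outbox/$(basename "$src")
        # Remove the source only once the handler has succeeded.
        if "$handler" "$src" "$dest"; then
            rm -f -- "$src"
        else
            rm -f -- "$dest"
            echo "stage failed for $src" >&2
        fi
    done
}

# Placeholder handler: copies the file unchanged.
copy_only() { cp -- "$1" "$2"; }

# Poll every few seconds (uncomment to run as a watcher):
# while :; do
#     process_stage "$HOME/Scanned PDFs To Be Retitled" "$HOME/Scanned PDFs" copy_only
#     sleep 5
# done
```

Chaining stages is then a matter of making one stage's outbox the next
stage's inbox.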

The images are of text documents: Excel spreadsheets with no maths in
them (sigh). One of the principals loves Excel for documents, Acrobat,
and Comic Sans. The PDFs are OCRed at the office end (with a proprietary,
expensive equivalent of tesseract).

> 
> Check out the Perl package PDF::API2 at
> http://search.cpan.org/dist/PDF-API2/lib/PDF/API2.pm

Will do.

> 
> HTH HAND Alex
> 
> PS: I've had some inspiration, since I have a 300 page PDF manual for
> my camera which has no links between pages. Time to get into some
> serious programming and resolve this situation :)

Try tesseract... I've got a similar project in mind to automate using
OO.o, kdialog and tesseract.


Thanks Alex - I did look at the Perl package, and the Ruby equivalent.
This is a rewrite of something I did some time ago, so I'm looking for
minimal changes (lazy). Given the number of things ImageMagick is
capable of, I'm surprised it doesn't allow manipulation of PDF
metadata (it can edit/create metadata in images, though).

I explained the situation and needs poorly... this is for a script that
listens to scanbuttond. Scanbuttond monitors the four buttons on the
(Canon CanoScan LiDE) scanners, and the script (pressedbutton.sh)
provides the button functionality. This lets unskilled operators
*quickly* (under 2 minutes, using a low-powered device with a small
drive) do basic scans of worksheets, which are automatically attached
to an email and sent with mailx. Kdialog and festival keep the operator
informed during the process, and Okular allows them to check the PDF
before sending.

1. scan to a grayscale image (PGM) as /tmp/scanimage
2. "convert" /tmp/scanimage (PGM) to /tmp/scanimage.jpg (lazy way to
   get a small PDF)
3. "convert" /tmp/scanimage.jpg to
   ~/Documents/Pictures/Scanned/$TIMESTAMP.pdf
4. ? (pipe through titling process, and remove scanner lock file)
5. email ~/Documents/Pictures/Scanned/$TIMESTAMP.pdf
6. remove tmp files, rinse and repeat.
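
Roughly, the steps above could be strung together as below. This is a
sketch only, not the actual pressedbutton.sh: the function name
scan_and_send, the SCAN_* override variables, the lock file path and the
recipient argument are all assumptions, and the "-a" attachment flag is
the heirloom mailx one (other mailx builds differ).

```sh
#!/bin/sh
# Sketch of steps 1-6 as a single function. Paths mirror the list above;
# the lock file location and the mailx attachment flag are assumptions.

scan_and_send() {
    recipient=$1
    outdir=${SCAN_DIR:-$HOME/Documents/Pictures/Scanned}
    raw=${SCAN_TMP:-/tmp/scanimage}
    lock=${SCAN_LOCK:-/tmp/scanner.lock}   # hypothetical lock file path
    timestamp=$(date "+%H%M_%d-%m-%Y")
    pdf=$outdir/$timestamp.pdf

    mkdir -p "$outdir" || return 1
    scanimage --mode Gray > "$raw" || return 1   # 1. grayscale scan (PNM)
    convert "$raw" "$raw.jpg" || return 1        # 2. JPEG keeps the PDF small
    convert "$raw.jpg" "$pdf" || return 1        # 3. wrap the JPEG in a PDF
    # 4. the titling step would go here, working on "$pdf"
    rm -f -- "$lock"
    # 5. "-a FILE" attaches a file in heirloom mailx
    mailx -s "Scan $timestamp" -a "$pdf" "$recipient" < /dev/null || return 1
    rm -f -- "$raw" "$raw.jpg"                   # 6. clean up
}

# Example call (address is a placeholder):
# scan_and_send office@example.com
```

Note that a failed step returns early and leaves the lock and tmp files
in place, which may or may not be what is wanted.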

For the new step 4 the options I have are:
- your suggestion (PDF::API2)
- a similar Ruby wrapper
- python-pdfrw (and others)
- string manipulation (favoured, as it doesn't require more
  packages/liabilities/work)
- pdftk
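
For what it's worth, the pdftk option might look something like the
sketch below. It assumes pdftk is installed and uses its update_info
operation. The function names are made up for illustration, and the
"InfoBegin" line follows what recent pdftk versions emit from dump_data
(older versions may not need it). The title is taken from the file name
minus the .pdf extension.

```sh
#!/bin/sh
# Sketch: set a PDF's Title to its own file name, via pdftk update_info.
# Function names are hypothetical; pdftk itself must be installed.

# title_from_path FILE: print the file name without directory or .pdf suffix
title_from_path() {
    basename "$1" .pdf
}

# write_info_file TITLE: print one metadata record in pdftk's dump_data layout
write_info_file() {
    printf 'InfoBegin\nInfoKey: Title\nInfoValue: %s\n' "$1"
}

# set_pdf_title FILE: rewrite FILE with its Title set from its file name
set_pdf_title() {
    pdf=$1
    info=$(mktemp) || return 1
    write_info_file "$(title_from_path "$pdf")" > "$info"
    # pdftk cannot write onto its own input, so go via a sibling file.
    if pdftk "$pdf" update_info "$info" output "$pdf.retitled"; then
        mv -- "$pdf.retitled" "$pdf"
        status=$?
    else
        rm -f -- "$pdf.retitled"
        status=1
    fi
    rm -f -- "$info"
    return "$status"
}

# Example:
# set_pdf_title "$HOME/Documents/Pictures/Scanned/2248_14-08-2012.pdf"
```

Since the titles here are plain ASCII timestamps, no escaping of the
title text is attempted.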


I've a nagging feeling I'm missing something obvious though.


Kind regards

