the timestampz in a pdf, oh my

Matt Graham mhgraham at crow202.org
Wed Feb 1 08:54:24 MST 2017


On 2017-02-01 02:23, Joseph Sinclair wrote:
> On 02/01/2017 01:29 AM, der.hans wrote:
>> Each server should be generating a PDF that looks exactly the same as
>> from all the other servers.

>> The PDF generation includes sticking in a few timestamps and possibly
>> some hostnames or other dynamic content. The dynamic content eliminates
>> the option of just using checksums

>> Any suggestions on how I can write a command line check? Needing to
>> install a script would be far less than ideal in this situation.

This is probably complicated enough that you'll need a shell script, 
mostly because of the sentence below:

>> pdf2text won't really work as I need to verify the graphic content

If you have to make sure that the images inside one PDF are the same as
the images inside another PDF, you might have to use gs to convert the
PDFs into 2 images, then fuzzy-match a region such as (20,20,500,200) of
image1.png against the same region of image2.png.  This *should* be
doable through some sort of library or ImageMagick command.

For example, "compare -metric PSNR image1.png image2.png output.png"
returns "inf" and a status of 0 when image1.png and image2.png are
identical.  When image2.png has been color-shifted slightly via GIMP,
the same command returns "24.4951" and status 0.  The numbers get
smaller as the images become more different.  I do not know whether
this'll help, but it's a start.
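
A rough sketch of how the gs and compare steps might fit together as
shell functions. Everything named here is an assumption for
illustration: the function names, the 150 dpi render, the 30 dB
threshold, and reading (20,20,500,200) as x=20, y=20, width=500,
height=200 (ImageMagick geometry 500x200+20+20). It also assumes
ImageMagick 6 style "convert" and "compare" commands and that only the
first page matters.

```shell
#!/bin/sh
# Sketch only: render page 1 of each PDF, crop one region, compare by PSNR.
# Names, resolution and threshold are illustrative assumptions.

# render_page IN.pdf OUT.png : rasterize the first page with Ghostscript.
render_page() {
    gs -q -dBATCH -dNOPAUSE -sDEVICE=png16m -r150 \
       -dFirstPage=1 -dLastPage=1 -sOutputFile="$2" "$1"
}

# region_psnr A.png B.png GEOMETRY : print the PSNR of one region.
# GEOMETRY uses ImageMagick WxH+X+Y notation, e.g. 500x200+20+20.
region_psnr() {
    tmp=$(mktemp -d) || return 2
    convert "$1" -crop "$3" +repage "$tmp/a.png" &&
    convert "$2" -crop "$3" +repage "$tmp/b.png" &&
    # compare writes the metric to stderr, so fold it into stdout.
    compare -metric PSNR "$tmp/a.png" "$tmp/b.png" null: 2>&1
    rm -rf "$tmp"
}

# psnr_ok VALUE MIN : succeed if VALUE is "inf" or at least MIN.
# Anything non-numeric (such as an error message) counts as 0 and fails.
psnr_ok() {
    case "$1" in
        inf*) return 0 ;;
    esac
    awk -v v="$1" -v min="$2" 'BEGIN { exit !(v + 0 >= min + 0) }'
}

# Possible usage (file names are placeholders):
#   render_page a.pdf a.png && render_page b.pdf b.png
#   psnr_ok "$(region_psnr a.png b.png 500x200+20+20)" 30 && echo "region matches"
```

The threshold would need tuning against real output from the servers;
the 24.4951 figure above suggests a slight color shift already lands
well below 30.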

> Another alternative might be to use pdf2ps (part of ghostscript) to
> transform into postscript and compare that

Yes.  That'd almost certainly help with anything that's textual.  It 
won't help for graphics unless the images are guaranteed to always be in 
the same position and have the same content....
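
A minimal sketch of the pdf2ps idea, assuming the dynamic bits sit on
lines that a regular expression can pick out. DYNAMIC_RE and the
function names are hypothetical; the pattern below only targets a few
DSC header comments, if present, and would need extending to whatever
timestamps and hostnames actually show up. PostScript output may also
encode page content in ways a line filter cannot see, so treat this as
a starting point.

```shell
#!/bin/sh
# Sketch only: compare two PDFs as PostScript, ignoring lines known to vary.
# DYNAMIC_RE is a hypothetical pattern; adjust it to the real documents.
DYNAMIC_RE='^%%(CreationDate|Title|For):'

# strip_dynamic : filter stdin, dropping lines that match DYNAMIC_RE.
strip_dynamic() {
    grep -Ev "$DYNAMIC_RE"
}

# same_postscript A.pdf B.pdf : succeed if the filtered PostScript matches.
same_postscript() {
    tmp=$(mktemp -d) || return 2
    status=1
    if pdf2ps "$1" "$tmp/a.ps" &&
       pdf2ps "$2" "$tmp/b.ps" &&
       strip_dynamic < "$tmp/a.ps" > "$tmp/a.txt" &&
       strip_dynamic < "$tmp/b.ps" > "$tmp/b.txt" &&
       cmp -s "$tmp/a.txt" "$tmp/b.txt"
    then
        status=0
    fi
    rm -rf "$tmp"
    return "$status"
}

# Possible usage (file names are placeholders):
#   same_postscript server1.pdf server2.pdf && echo "text layer matches"
```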

-- 
Crow202 Blog: http://crow202.org/wordpress
There is no Darkness in Eternity
But only Light too dim for us to see.

