Re: the timestampz in a pdf, oh my

Author: Victor Odhner
Date:  
To: Main PLUG discussion list
Subject: Re: the timestampz in a pdf, oh my
Hi, Hans.

My complicated C-programmer mind says to take each PDF, process it in binary form and spit out a new PDF with all timestamp digits changed to zeroes, then use the postprocessed files for comparison. That’s probably not a practical approach as it stands, but it might spark an idea that could work.
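
A minimal sketch of that idea, written in Python rather than C for brevity. It assumes the timestamps are stored as uncompressed PDF date strings of the form D:YYYYMMDDHHmmSS with an optional timezone suffix; dates inside compressed streams, XMP metadata, or the trailer /ID would not be touched. The names zero_timestamps and same_after_masking are hypothetical.

```python
import re

# Assumption: timestamps appear as plain PDF date strings such as
# D:20170201103110-07'00'. Anything inside a compressed stream is missed,
# and a "D:" followed by digits in binary data would be masked by mistake.
PDF_DATE = re.compile(rb"D:\d{4,14}(?:[+\-Z]\d{0,2}'?\d{0,2}'?)?")


def zero_timestamps(data: bytes) -> bytes:
    """Return data with every digit inside a PDF date string set to 0."""
    def blank(match):
        # Replace digits only, so the byte length of the file is unchanged.
        return re.sub(rb"\d", b"0", match.group(0))
    return PDF_DATE.sub(blank, data)


def same_after_masking(path_a, path_b) -> bool:
    """Compare two files byte for byte after masking their date strings."""
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        return zero_timestamps(a.read()) == zero_timestamps(b.read())
```

Calling same_after_masking("web1.pdf", "web2.pdf") would then stand in for the checksum comparison. Hostnames or other dynamic content would need their own patterns.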

Victor
_________

On Feb 1, 2017, at 10:31:10, der.hans wrote:

On 01 Feb 2017, Joseph Sinclair wrote:

moin moin,

> Have you checked DiffPDF?
> It's supposed to do what you're looking for, although it's no longer actively maintained (author took it closed-source :( ).


Not familiar with it. Ah, comparepdf for the command line version.

comparepdf -ca web1.pdf web2.pdf

Will have to test with it to verify some basic tolerances.
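
One way to run that check across the whole pool could look like the sketch below. It assumes comparepdf exits with status 0 when it finds no differences, which is worth confirming against its man page before relying on it. The function names and the injectable run parameter are hypothetical; the latter is only there so the logic can be exercised without comparepdf installed.

```python
import subprocess


def pdfs_match(reference, candidate, run=subprocess.call):
    """True when comparepdf reports no differences in appearance mode."""
    # -ca is the appearance comparison used in the command above.
    # Assumption: an exit status of 0 means "no differences found".
    status = run(["comparepdf", "-ca", str(reference), str(candidate)])
    return status == 0


def mismatched(reference, candidates, run=subprocess.call):
    """Return the candidates that do not match the reference PDF."""
    return [c for c in candidates if not pdfs_match(reference, c, run)]
```

For example, mismatched("web1.pdf", ["web2.pdf", "web3.pdf"]) would list the servers whose output differs from web1's.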

> Another alternative might be to use pdf2ps (part of ghostscript) to
> transform into postscript and compare that, but you may need to do
> more massaging as timestamps and such would probably still be in the
> postscript.


I tried that. I expected the datestamps to carry over, but they appear not
to have. The files are still different, though :(.

I was thinking pdf2png or pdf2jpg type of thing might work. Have to check
if I always get the same output.
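
A rough sketch of that render-and-compare idea, assuming pdftoppm from poppler-utils is installed and that it renders deterministically on every server, which is exactly the thing to check. It uses pdftoppm's default PPM output, a raw pixel format with no embedded timestamp. The function names are hypothetical, and any timestamp that is visible on the page will still show up as a difference.

```python
import hashlib
import pathlib
import subprocess


def render_pages(pdf_path, out_dir, dpi=72):
    """Rasterize each page of pdf_path into out_dir as page*.ppm files."""
    out_dir = pathlib.Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # pdftoppm writes one PPM file per page, named from the given prefix.
    subprocess.run(
        ["pdftoppm", "-r", str(dpi), str(pdf_path), str(out_dir / "page")],
        check=True,
    )


def page_digests(out_dir):
    """Map each rendered file name to the SHA-256 of its contents."""
    return {
        p.name: hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(pathlib.Path(out_dir).glob("page*.ppm"))
    }


def same_rendering(dir_a, dir_b) -> bool:
    """True when both directories hold identical, non-empty page sets."""
    digests_a = page_digests(dir_a)
    return bool(digests_a) and digests_a == page_digests(dir_b)
```

Usage would be render_pages("web1.pdf", "out1"), render_pages("web2.pdf", "out2"), then same_rendering("out1", "out2").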

ciao,

der.hans

> On 02/01/2017 01:29 AM, der.hans wrote:
>> moin moin,
>>
>> I have some dynamically generated PDFs coming from a pool of web servers.
>>
>> Each server should be generating a PDF that looks exactly the same as from
>> all the other servers.
>>
>> The PDF generation includes sticking in a few timestamps and possibly some
>> hostnames or other dynamic content. The dynamic content eliminates the
>> option of just using checksums to verify the output file is the same from
>> all of the web servers.
>>
>> Any suggestions on how I can write a command line check? Needing to
>> install a script would be far less than ideal in this situation. Funnily
>> enough, needing to install a package would be less of an issue in this
>> particular case, especially something in CentOS 6.
>>
>> Me being me, I did try to just grep out the lines with timestamps :). That
>> didn't quite work :(. That probably indicates the files aren't as exactly
>> the same as I hope.
>>
>> I didn't see a pdf2sanity tool. pdf2text won't really work as I need to
>> verify the graphic content and hopefully the PDF wrapper.
>>
>> ciao,
>>
>> der.hans
>
>


-- 
#  http://www.LuftHans.com/        http://www.PhxLinux.org/
#  "Wasted day. Wasted life. Dessert, please."  -- Steven Meretzky
---------------------------------------------------
PLUG-discuss mailing list - 
To subscribe, unsubscribe, or to change your mail settings:
http://lists.phxlinux.org/mailman/listinfo/plug-discuss

