the timestampz in a pdf, oh my
Victor Odhner
vodhner at cox.net
Wed Feb 1 15:10:23 MST 2017
Hi, Hans.
My complicated C-programmer mind says to take each PDF, process it in binary form and spit out a new PDF with all timestamp digits changed to zeroes, then use the postprocessed files for comparison. That’s probably not a practical approach as it stands, but it might spark an idea that could work.
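A minimal sketch of that idea, in Python rather than C for brevity. It assumes the timestamps appear as uncompressed PDF date strings (`D:YYYYMMDDHHmmSS` plus an optional timezone) or as ISO 8601 dates in XMP metadata. The function names are hypothetical. Digits are replaced one for one, so the file length and byte offsets stay unchanged. Timestamps inside compressed streams, the trailer `/ID`, and embedded hostnames are not handled here.

```python
import re

# PDF date strings: D:YYYYMMDDHHmmSS with an optional timezone
# such as -07'00' or Z (assumed to be stored uncompressed).
PDF_DATE = re.compile(rb"D:\d{4,14}(?:[+\-Z](?:\d{2}'?(?:\d{2}'?)?)?)?")

# ISO 8601 timestamps, as typically found in XMP metadata.
ISO_DATE = re.compile(
    rb"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}(?::\d{2}(?:\.\d+)?)?(?:Z|[+\-]\d{2}:\d{2})?"
)


def _zero_digits(match):
    # Replace each digit with "0"; the length of the match is preserved.
    return re.sub(rb"\d", b"0", match.group(0))


def zero_timestamps(data: bytes) -> bytes:
    """Return data with every recognized timestamp digit set to zero."""
    data = PDF_DATE.sub(_zero_digits, data)
    return ISO_DATE.sub(_zero_digits, data)


def same_after_masking(path_a, path_b) -> bool:
    """Compare two files after masking their timestamps."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        return zero_timestamps(fa.read()) == zero_timestamps(fb.read())
```

Something like `same_after_masking("web1.pdf", "web2.pdf")` would then stand in for a plain checksum comparison, provided the timestamps are the only difference between the files.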
Victor
_________
On Feb 1, 2017, at 10:31:10, der.hans <PLUGd at LuftHans.com> wrote:
On 01 Feb 2017, Joseph Sinclair wrote:
moin moin,
> Have you checked DiffPDF?
> It's supposed to do what you're looking for, although it's no longer actively maintained (author took it closed-source :( ).
Not familiar with it. Ah, comparepdf for the command line version.
comparepdf -ca web1.pdf web2.pdf
Will have to test with it to verify some basic tolerances.
> Another alternative might be to use pdf2ps (part of ghostscript) to
> transform into postscript and compare that, but you may need to do
> more massaging as timestamps and such would probably still be in the
> postscript.
I tried that. I expected the datestamps to carry over, but they appear to
not have. The files are still different, though :(.
I was thinking pdf2png or pdf2jpg type of thing might work. Have to check
if I always get the same output.
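A rough sketch of the render-and-compare idea, assuming Python 3 and a Ghostscript `gs` binary on the PATH (Ghostscript is already in play via pdf2ps). The function names and the 72 dpi default are illustrative choices, not anything tested on these servers. Whether the PNGs come out byte-identical on every host still has to be verified, and a timestamp printed visibly on a page would still change the pixels.

```python
import hashlib
import subprocess
import tempfile
from pathlib import Path


def render_pages(pdf_path, out_dir, dpi=72):
    """Render each page of pdf_path to out_dir/page-NNN.png with Ghostscript."""
    subprocess.run(
        [
            "gs", "-q", "-dBATCH", "-dNOPAUSE", "-dSAFER",
            "-sDEVICE=png16m", f"-r{dpi}",
            f"-sOutputFile={Path(out_dir) / 'page-%03d.png'}",
            str(pdf_path),
        ],
        check=True,  # raise if Ghostscript reports an error
    )


def page_digests(directory):
    """Return the SHA-256 digest of every rendered page, in page order."""
    return [
        hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(Path(directory).glob("page-*.png"))
    ]


def same_rendering(pdf_a, pdf_b, dpi=72):
    """True if both PDFs render to the same number of identical pages."""
    with tempfile.TemporaryDirectory() as dir_a, \
            tempfile.TemporaryDirectory() as dir_b:
        render_pages(pdf_a, dir_a, dpi)
        render_pages(pdf_b, dir_b, dpi)
        return page_digests(dir_a) == page_digests(dir_b)
```

Calling `same_rendering("web1.pdf", "web2.pdf")` compares only what ends up on the page, so it says nothing about the PDF wrapper or metadata.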
ciao,
der.hans
> On 02/01/2017 01:29 AM, der.hans wrote:
>> moin moin,
>>
>> I have some dynamically generated PDFs coming from a pool of web servers.
>>
>> Each server should be generating a PDF that looks exactly the same as from
>> all the other servers.
>>
>> The PDF generation includes sticking in a few timestamps and possibly some
>> hostnames or other dynamic content. The dynamic content eliminates the
>> option of just using checksums to verify the output file is the same from
>> all of the web servers.
>>
>> Any suggestions on how I can write a command line check? Needing to
>> install a script would be far less than ideal in this situation. Funnily
>> enough, needing to install a package would be less of an issue in this
>> particular case, especially something in CentOS 6.
>>
>> Me being me, I did try to just grep out the lines with timestamps :). That
>> didn't quite work :(. That probably indicates the files aren't as exactly
>> the same as I hope.
>>
>> I didn't see a pdf2sanity tool. pdftotext won't really work as I need to
>> verify the graphic content and hopefully the PDF wrapper.
>>
>> ciao,
>>
>> der.hans
>
>
--
# http://www.LuftHans.com/ http://www.PhxLinux.org/
# "Wasted day. Wasted life. Dessert, please." -- Steven Meretzky
---------------------------------------------------
PLUG-discuss mailing list - PLUG-discuss at lists.phxlinux.org
To subscribe, unsubscribe, or to change your mail settings:
http://lists.phxlinux.org/mailman/listinfo/plug-discuss