ps2pdf making very large pdf files

Martijn van Oosterhout kleptog at
Tue Sep 17 14:27:37 EST 2002

On Tue, Sep 17, 2002 at 01:09:12PM +1000, John Griffiths wrote:
> well i can't code worth a damn but i can make and test
> At 12:47 PM 9/17/02 +1000, Michael Still wrote:
> >On Tue, 17 Sep 2002, Martijn van Oosterhout wrote:
> >
> >> Unfortunately I haven't seen an open-source product come close to it.
> >
> >How complicated is PDF to parse? We could write one using Panda!

How complicated is PostScript to parse? PDF is essentially a subset of
PostScript (its imaging model without the general-purpose programming
language).

What you would have to do is interpret the PostScript in the source file
(so something like Ghostscript). You would also have to parse the DSC
comments (Document Structuring Conventions) to identify the pages. You would
also have to identify the subdocuments within it (embedded EPS files, fonts,
function libraries) and extract them as separate objects.
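A rough sketch of just the page-identification step, assuming the input
follows the DSC conventions. The helper name split_dsc_pages is hypothetical.
It splits the file at %%Page: comments, ignoring any that sit inside
%%BeginDocument/%%EndDocument brackets (embedded EPS), and stops at
%%Trailer. It only splits text; it does not interpret any PostScript.

```python
import re

# A DSC page comment looks like "%%Page: <label> <ordinal>", e.g. "%%Page: 1 1".
PAGE_RE = re.compile(r"^%%Page:\s+(\S+)\s+(\d+)\s*$")


def split_dsc_pages(ps_text):
    """Split DSC-conforming PostScript into (prolog, [(label, body), ...]).

    Hypothetical helper. Everything before the first top-level %%Page:
    comment is the prolog; each page body runs until the next top-level
    %%Page: or %%Trailer comment.
    """
    prolog = []
    pages = []  # list of (label, [lines])
    depth = 0   # nesting level of embedded documents (EPS files)
    for line in ps_text.splitlines():
        if line.startswith("%%BeginDocument"):
            depth += 1
        elif line.startswith("%%EndDocument"):
            depth = max(0, depth - 1)
        elif depth == 0:
            m = PAGE_RE.match(line)
            if m:
                pages.append((m.group(1), []))
                continue
            if line.startswith("%%Trailer"):
                break
        # Lines inside an embedded document stay with the enclosing page.
        target = pages[-1][1] if pages else prolog
        target.append(line)
    return "\n".join(prolog), [(label, "\n".join(body)) for label, body in pages]
```

Tracking the %%BeginDocument depth is what keeps a %%Page: comment inside an
embedded EPS from being mistaken for a real page break.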

Then, once you've defined all the objects, you need to optimise. Remove the
duplicates, remembering to catch them even when they have different names.
This is so that if you include a dozen EPS files from Illustrator, you only
get the special Illustrator PostScript code once. Note this means moving the
PostScript code from inside each EPS file to a global scope so all the EPS
files can use the same code. But then you have to be careful about name
collisions.
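A minimal sketch of the duplicate-removal idea, assuming each embedded EPS
has already been reduced to a dict of resource name to PostScript source (the
function merge_resources and that input shape are hypothetical). Duplicates
are detected by hashing the code, so identical code is shared regardless of
its name, and a name reused for different code is renamed. Only byte-identical
code is caught; rewriting each EPS body to use the new names is not shown.

```python
import hashlib


def merge_resources(docs):
    """Merge per-EPS resources into one global set.

    docs: list of dicts, one per embedded EPS, mapping name -> PostScript code.
    Returns (global_res, renames): global_res maps a unique global name to its
    code; renames[i] maps doc i's local names to global names.
    """
    by_hash = {}     # SHA-256 of code -> global name already holding it
    global_res = {}  # global name -> code
    renames = []
    for doc in docs:
        mapping = {}
        for name, code in doc.items():
            digest = hashlib.sha256(code.encode("utf-8")).hexdigest()
            if digest in by_hash:
                # Same code seen before, possibly under another name: reuse it.
                mapping[name] = by_hash[digest]
                continue
            new_name, n = name, 1
            while new_name in global_res:
                # Same name but different code: pick a fresh name.
                new_name = f"{name}_{n}"
                n += 1
            global_res[new_name] = code
            by_hash[digest] = new_name
            mapping[name] = new_name
        renames.append(mapping)
    return global_res, renames
```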

Also, if the user has selected downsampling to 72 dpi, check all embedded EPS
files to see whether their images can be downsampled and whether that would
save space. This applies mostly to embedded bitmaps.
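The downsampling check comes down to a little arithmetic: an image of px_w
pixels placed w points wide (1 pt = 1/72 inch) has an effective resolution of
px_w * 72 / w dpi. A sketch under stated assumptions (hypothetical function
name; sizes are for raw, uncompressed samples, so the saving is only an
estimate once compression is applied):

```python
import math


def downsample_plan(px_w, px_h, placed_w_pt, placed_h_pt,
                    target_dpi=72, components=3, bits=8):
    """Return (new_w, new_h, bytes_saved), or None if downsampling won't help.

    Placed sizes are in PostScript points. Byte counts assume raw samples with
    each row padded to a whole byte, as in PostScript/PDF image data.
    """
    eff_dpi_x = px_w * 72 / placed_w_pt
    eff_dpi_y = px_h * 72 / placed_h_pt
    if max(eff_dpi_x, eff_dpi_y) <= target_dpi:
        return None  # already at or below the target resolution

    # Pixels needed to cover the placed area at the target resolution.
    new_w = min(px_w, math.ceil(placed_w_pt * target_dpi / 72))
    new_h = min(px_h, math.ceil(placed_h_pt * target_dpi / 72))

    def raw_size(w, h):
        return math.ceil(w * components * bits / 8) * h

    saved = raw_size(px_w, px_h) - raw_size(new_w, new_h)
    return (new_w, new_h, saved) if saved > 0 else None
```

For example, a 600x300 RGB image placed at 2 x 1 inches (144 x 72 pt) is
effectively 300 dpi, so it would shrink to 144x72 pixels.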

Update references and cross-references. Process the pdfmark commands to
enable the special PDF features. Work out which glyphs are actually used and
include only those. If a glyph is only used once, include it directly instead
of leaving it in the font (but only if it saves space).
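The glyph bookkeeping can be sketched as counting characters per font over
every text-showing operation. This assumes the interpreter has already
produced a list of (font name, decoded string) pairs with one character per
glyph, which holds for simple 8-bit encodings but not for composite fonts;
glyph_usage is a hypothetical name. Actually subsetting the font program and
the "only if it saves space" check are not shown.

```python
from collections import Counter


def glyph_usage(show_ops):
    """show_ops: iterable of (font_name, text) pairs, one per show operator.

    Returns (used, once): used maps font -> set of characters to keep in the
    subset; once maps font -> characters drawn exactly once (candidates for
    inlining instead of keeping them in the font).
    """
    counts = {}
    for font, text in show_ops:
        counts.setdefault(font, Counter()).update(text)
    used = {font: set(c) for font, c in counts.items()}
    once = {font: {ch for ch, n in c.items() if n == 1}
            for font, c in counts.items()}
    return used, once
```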

Finally, output the objects again as a compressed PDF. I don't know how much
of this ps2pdf does, but it's quite a bit of non-trivial work.
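The last step, writing the objects back out, is the easy part. A minimal
sketch, assuming the objects are already numbered 1..n and their dictionaries
already hold the updated references (write_pdf and its input shape are
hypothetical): streams are Flate-compressed with zlib, and the byte offset of
every object is recorded to build the cross-reference table that PDF readers
use to find them. The %PDF-1.4 header is an arbitrary choice, and features
such as object streams are not covered.

```python
import zlib


def write_pdf(objects, root_num):
    """Serialise objects numbered 1..n as a PDF file and return its bytes.

    objects: list of (num, text, stream). If stream is None, text is the whole
    object (e.g. "<< /Type /Catalog /Pages 2 0 R >>"). Otherwise text holds any
    extra dictionary entries and stream holds the uncompressed stream bytes.
    """
    if [num for num, _, _ in objects] != list(range(1, len(objects) + 1)):
        raise ValueError("objects must be numbered 1..n in order")
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, text, stream in objects:
        offsets.append(len(out))  # byte offset for this object's xref entry
        out += f"{num} 0 obj\n".encode("latin-1")
        if stream is None:
            out += text.encode("latin-1") + b"\n"
        else:
            data = zlib.compress(stream)  # zlib format, as /FlateDecode expects
            header = f"<< {text} /Length {len(data)} /Filter /FlateDecode >>"
            out += header.encode("latin-1") + b"\nstream\n" + data + b"\nendstream\n"
        out += b"endobj\n"
    xref_pos = len(out)
    out += f"xref\n0 {len(objects) + 1}\n".encode("latin-1")
    out += b"0000000000 65535 f \n"  # entry 0 heads the free list
    for off in offsets:
        out += f"{off:010d} 00000 n \n".encode("latin-1")  # 20-byte entries
    out += (f"trailer\n<< /Size {len(objects) + 1} /Root {root_num} 0 R >>\n"
            f"startxref\n{xref_pos}\n%%EOF\n").encode("latin-1")
    return bytes(out)
```

A one-page file would pass a catalog, a page tree, a page and one content
stream (e.g. b"0 0 1 rg 72 72 144 144 re f") as objects 1 to 4.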

I have to say, Adobe has a nice product there.
Martijn van Oosterhout   <kleptog at>
> There are 10 kinds of people in the world, those that can do binary
> arithmetic and those that can't.

More information about the linux mailing list