3. Data_Formats

3.1. Document

3.1.1. Publishing

3.1.1.1. PDF
3.1.1.1.1. Annotating PDF resp. PostScript using flpsed

...

3.1.1.1.1.1. flpsed does not obey PageOrientation, i.e. it does not automatically rotate pages appropriately, so what ...
		  pdftoppm ...				# caveat: you create huge pixmaps hereby
		  convert -orient TopLeft *.ppm x.ps	# convert is an ImageMagick utility

		  # these two operations look like a round trip that changes nothing,
		  # but they actually cut the file size down drastically:

		  convert x.ps  x.pdf
		  pdf2ps  x.pdf x.ps

		  # if you try other operations that should change nothing,
		  # check whether the tools cropped your pages silently!
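
One way to check for silent cropping is to compare the page size before and after a conversion. The following is only a minimal sketch, assuming that pdfinfo (from xpdf or poppler) is installed and prints a "Page size:" line; the function names page_size and same_page_size and the file names before.pdf and after.pdf are made up for illustration.

```shell
# Print "width height" (in points) for each "Page size:" line found in
# pdfinfo output read from stdin.
page_size() {
    awk '/^Page size:/ { print $3, $5 }'
}

# Compare the reported page size of two PDF files (hypothetical helper).
# Returns 0 if the sizes match, 1 otherwise.
same_page_size() {
    a=$(pdfinfo "$1" | page_size)
    b=$(pdfinfo "$2" | page_size)
    if [ "$a" = "$b" ]; then
        echo "page size unchanged: $a"
    else
        echo "page size changed: $a -> $b" >&2
        return 1
    fi
}
```

Usage would be something like same_page_size before.pdf after.pdf. Without its -f and -l options pdfinfo only reports the size of the first page, so this is a quick sanity check and not a proof.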
		

I tried applying convert with -orient ... directly to a PostScript file, but it did not obey -orient ... then.

Which orientations does convert support?

		  convert -list orientation
		

These orientation literals sound meaningful, but what do they mean? I am sorry, I don't know, try it yourself!

3.1.1.1.2. Converting PDF to some image format ...
3.1.1.1.2.1. Converting PDF to any image format using ImageMagick's convert

...

3.1.1.1.2.2. Converting PDF to any image format via PPM using pdftoppm

...

3.1.1.1.2.3. Converting PDF to any image format via TIFF using Ghostscript

Why would anybody want to convert PDF to TIFF or whatever? Because Microsoft Office Document Imaging is a nice product that is capable of recognizing text using OCR, but only from TIFF files. We can give it TIFF, so let's go ahead!

I got the options and parameters from a sample script (ps2ascii).

		$ gs -q -dNODISPLAY -dSAFER -dNOBIND -dWRITESYSTEMDICT -sDEVICE=tiffg3 -c save -sOutputFile=my_file.tiff my_file.pdf -c quit

If you don't trust the more obscure settings, simply do this:

		$ gs -sDEVICE=tiffg3 -sOutputFile=my_file.tiff my_file.pdf
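
For converting several files unattended, the simple call can be wrapped in a small function. This is only a sketch under a few assumptions: the function name pdf2tiff and the DRY_RUN variable are made up for illustration, and -dBATCH and -dNOPAUSE are added because, as far as the Ghostscript documentation says, they make gs exit after the last file and skip the prompt at the end of each page.

```shell
# Convert each PDF named on the command line to a G3 fax TIFF next to it.
# If DRY_RUN is set to a non-empty value, the gs commands are only printed.
pdf2tiff() {
    for pdf in "$@"; do
        # my_file.pdf -> my_file.tiff
        tiff="${pdf%.pdf}.tiff"
        ${DRY_RUN:+echo} gs -q -dBATCH -dNOPAUSE -sDEVICE=tiffg3 \
            "-sOutputFile=$tiff" "$pdf"
    done
}
```

Running it with DRY_RUN=1 first shows which commands would be executed without writing any files.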

3.2. Markup_Languages

3.2.1. TROFF

3.2.1.1. all the macros
3.2.1.1.1. the ms macros

...

3.2.1.1.2. the me macros

...

3.2.1.1.3. the mm macros

The mm macros were my favourite TROFF macro package for a long, long time, and it was Jürgen Gulbins who introduced me to them. I only stopped using the mm macros for my documents, letters, etc., when I started using DocBook. Sadly, you don't write letters with DocBook, as you do not have enough influence on how the output of the rendering step will look.

...

3.2.1.1.4. the man macros

If you're doing man on Solaris nowadays, it's actually showing you something derived from a DocBook man page source file. DocBook is winning!!!

...

3.2.1.2. all the preprocessors, all the postprocessors, ...
3.2.1.2.1. pic, tbl, eqn, ...

...

3.2.1.2.2. grap

I can't remember exactly how it happened, but somehow one day I got hold of the sources of grap, so I was probably one of the few groff users also using grap.

...

3.2.1.3. a ditroff previewer

When I worked for Jürgen Gulbins at PCS in Karlsruhe in 1987, one of my project assignments was something like xpdf for the output of ditroff.

I think ditroff could already generate PostScript then, but there was no WWW then and there were no search engines either. So there was probably already some work being done on Display PostScript, but how should we have known?

Actually, Donald Knuth had already invented the DVI file format, and that was something far smarter than PostScript and PDF and all that, and it was invented and developed much earlier.

3.2.2. SGML

Industrial-Strength SGML: An Introduction to Enterprise Publishing

URL

SGML CD

URL

PARSEME.1st: SGML for Software Developers

URL

3.2.2.1. Developing SGML applications
3.2.2.1.1. Developing SGML applications with perl

...

3.2.2.1.2. Developing SGML applications with python

Paul Prescod's links and software

3.2.2.2. Various SGML / XML DTD-s
3.2.2.2.1. BMEcat

Electronic commerce asks for standards and ways to exchange information such as product catalogs. Apparently, though, vendors do not build their systems around BMEcat, but instead offer BMEcat interfaces. That must be because their systems had already matured for quite a while before BMEcat was born.

3.2.2.2.2. DocBook

I started writing most of my documents using the DocBook DTD.

A few of my open issues:

  • The processing expectations for chapters and sections seem to be that they are numbered, typically or at least sometimes.

    What if I don't want certain chapters or sections to be numbered?

  • ...

  • ...

  • ...

  • ...

3.2.2.2.3. HR-XML
Human Resources XML ...
3.2.2.2.4. OFX
Open Financial Exchange

A DTD (and more) that defines bank transactions and statements, WWW bill presenting, ...

3.2.2.2.4.1. Electronic bill presenting

Around 2000-02, Deutsche Post AG announced that it would start offering services in the area of electronic bill presenting.

I trust the market not to appreciate this company's competence in this area. Why on earth should anyone associate banking competence with Deutsche Post AG?

I actually think the German market is too small to ask for a company like Checkfree.com, and luckily there is no such thing as a European market as such in vulnerable areas like banking. You want your bank to speak your language to you, don't you?

3.2.2.2.5. openTRANS

From my experience with invoices in general and with using openTRANS in particular, it needs quite a few enhancements, especially in the area of invoices.

3.2.2.2.5.1. The bank_code used within account

Nowadays, with cross-border direct account transfers, there is more than one kind of bank_code, so there should be an attribute to allow values like

swift for the SWIFT code
bic for the BIC or Bank Identifier Code
blz (the Bankleitzahl in Germany).

And these literals should be the allowed values of an attribute kind of bank_code.

3.2.2.2.5.2. The bank_account used within account

For proper use together with bank_code there should be an attribute to allow values like

iban
for_use_with_blz
other.

3.2.2.2.5.3. vat_id is still not enough tax identification

The German IRS (Finanzamt) also requires all invoices to show yet another tax identification. They call it the Steuernummer, and it must be shown together with the name of the Finanzamt. I suggest introducing another element by the name of tax_id, which is long enough to also hold the place of that Finanzamt.

3.2.2.2.5.4. Something in party to specify the legal status of a firm

No, vat_id and tax_id are still not enough. In Germany, for example, there must be a field to specify the legal status of a firm (in German: Handelsregistereintrag). Other countries certainly require a similar entry.

3.2.2.2.6. TEI
DSSSL TEI stylesheet

URL

3.2.3. UseModWiki