PDF/A

Filename extension: .pdf
Internet media type: application/pdf
Type code: 'PDF ' (including a single space)
Uniform Type Identifier (UTI): com.adobe.pdf
Magic number: %PDF
Developed by: ISO
Initial release: 2005
Extended from: PDF
Standard: ISO 19005[1][2]

PDF/A is an ISO-standardized version of the Portable Document Format (PDF) specialized for the digital preservation of electronic documents.[3]

PDF/A differs from PDF by prohibiting features ill-suited to long-term archiving, such as font linking (as opposed to font embedding).[3]

The ISO requirements for PDF/A file viewers include color management guidelines, support for embedded fonts, and a user interface for reading embedded annotations.

Standards

PDF/A-1 is based on the PDF Reference Version 1.4 from Adobe Systems Inc. (implemented in Adobe Acrobat 5 and later versions) and is defined by ISO 19005-1:2005, published on October 1, 2005 under the formal name Document management – Electronic document file format for long-term preservation – Part 1: Use of PDF 1.4 (PDF/A-1).[1]

PDF/A-2 is based on ISO 32000-1 – PDF 1.7 and is defined by ISO 19005-2:2011, published on June 20, 2011 under the formal name Document management – Electronic document file format for long-term preservation – Part 2: Use of ISO 32000-1 (PDF/A-2).[2]

PDF/A-3 is based on ISO 32000-1 – PDF 1.7 and is defined by ISO 19005-3:2012, published on October 15, 2012 under the formal name Document management – Electronic document file format for long-term preservation – Part 3: Use of ISO 32000-1 with support for embedded files (PDF/A-3).[4]

ISO 19005 - Document management - Electronic document file format for long-term preservation (PDF/A)
Part | Name | Formal name | Release date | Standard | Based on PDF version
Part 1 | PDF/A-1 | Use of PDF 1.4 (PDF/A-1) | 2005 | ISO 19005-1 | PDF 1.4 (Adobe Systems, PDF Reference third edition, 2001)
Part 2 | PDF/A-2 | Use of ISO 32000-1 (PDF/A-2) | 2011 | ISO 19005-2 | PDF 1.7 (ISO 32000-1:2008)
Part 3 | PDF/A-3 | Use of ISO 32000-1 with support for embedded files (PDF/A-3) | 2012 | ISO 19005-3 | PDF 1.7 (ISO 32000-1:2008)

Background

PDF is a standard for encoding documents in an "as printed" form that is portable between systems. However, the suitability of a PDF file for archival preservation depends on options chosen when the PDF is created: most notably, whether to embed the necessary fonts for rendering the document; whether to use encryption; and whether to preserve additional information from the original document beyond what is needed to print it.

PDF/A originated as a joint activity between NPES (The Association for Suppliers of Printing, Publishing and Converting Technologies) and the Association for Information and Image Management (AIIM) to develop an international standard defining the use of the Portable Document Format (PDF) for archiving documents.[5] The goal was to address the growing need to archive documents electronically in a way that preserves their contents over an extended period of time and ensures that they can be retrieved and rendered with a consistent and predictable result in the future.[6] This need exists in a wide variety of government and industry areas worldwide, including legal systems, libraries, newspapers, and regulated industries.[7]

Description

The PDF/A standard does not define an archiving strategy or the goals of an archiving system. It identifies a "profile" for electronic documents that ensures the documents can be reproduced exactly the same way using various software in years to come. A key element of this reproducibility is the requirement that PDF/A documents be 100% self-contained: all of the information necessary for displaying the document in the same manner is embedded in the file. This includes, but is not limited to, all content (text, raster images and vector graphics), fonts, and color information. A PDF/A document is not permitted to rely on information from external sources (e.g. font programs and data streams), but may include annotations (e.g. hypertext links) that link to external documents.[6]

Other key elements of PDF/A conformance include:[8][9][10]

  • Audio and video content is forbidden.
  • JavaScript and executable file launches are forbidden.
  • All fonts must be embedded and also must be legally embeddable for unlimited, universal rendering. This also applies to the so-called PostScript standard fonts such as Times or Helvetica.
  • Color spaces must be specified in a device-independent manner.
  • Encryption is forbidden.
  • Use of standards-based metadata is required.
  • External content references are forbidden.
  • LZW compression is forbidden due to intellectual property constraints. JPEG 2000 image compression is not allowed in PDF/A-1 (based on PDF 1.4), as it was first introduced in PDF 1.5. JPEG 2000 compression is allowed in PDF/A-2 and PDF/A-3.
  • Transparent objects and layers (Optional Content Groups) are forbidden in PDF/A-1, but are allowed in PDF/A-2.
  • Provisions for digital signatures in accordance with the PAdES (PDF Advanced Electronic Signatures) standard are supported in PDF/A-2.
  • Embedded files are forbidden in PDF/A-1, but PDF/A-2 allows embedding of PDF/A files, facilitating the archiving of sets of PDF/A documents in a single file. PDF/A-3 allows embedding of any file format such as XML, CAD and others into PDF/A documents.
  • The use of XML Forms Architecture (XFA) forms is forbidden in PDF/A. (XFA form data may be preserved in a PDF/A-2 file by moving it from the XFA key to the name tree that is the value of the XFAResources key of the Names dictionary of the document catalog dictionary.)
  • Interactive PDF form fields must have an appearance dictionary associated with the field's data. The appearance dictionary shall be used when rendering the field.
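
The version-dependent rules in the list above can be summarized as a small lookup table. The following Python sketch is illustrative only: the feature keys and the `is_allowed` helper are hypothetical labels invented for this example, and the table covers only items named in the list, not the full requirements of ISO 19005.

```python
# Illustrative summary of the list above. The feature keys are hypothetical
# labels chosen for this sketch; they are not terms defined by ISO 19005.
# Each feature maps to the set of PDF/A parts in which it is permitted.
ALLOWED_IN = {
    "audio_video": set(),
    "javascript": set(),
    "encryption": set(),
    "external_content_references": set(),
    "lzw_compression": set(),
    "xfa_forms": set(),
    "jpeg2000_compression": {2, 3},
    "transparency": {2, 3},
    "layers": {2, 3},
    "embedded_pdfa_files": {2, 3},
    "embedded_arbitrary_files": {3},
}


def is_allowed(feature: str, part: int) -> bool:
    """Return True if the feature is permitted in the given PDF/A part."""
    if part not in (1, 2, 3):
        raise ValueError(f"unknown PDF/A part: {part}")
    if feature not in ALLOWED_IN:
        raise KeyError(f"feature not covered by this sketch: {feature}")
    return part in ALLOWED_IN[feature]


if __name__ == "__main__":
    for name in ("jpeg2000_compression", "embedded_arbitrary_files", "encryption"):
        parts = sorted(p for p in (1, 2, 3) if is_allowed(name, p))
        print(name, "allowed in parts:", parts)
```

A table like this is only a reading aid; actual conformance is established by validating the file itself.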

Conformance levels and versions

PDF/A-1

The standard specifies two levels of compliance for PDF files:

  • PDF/A-1a – Level A compliance in Part 1
  • PDF/A-1b – Level B compliance in Part 1

PDF/A-1b's objective is to ensure reliable reproduction of the visual appearance of the document.

PDF/A-1a includes all the requirements of PDF/A-1b and additionally requires:[11]

  • document structure must be included (hierarchy)
  • Tagged PDF (use of alternative texts for images, tagging text spans and giving them an ID, replacement texts for symbols)
  • Unicode character maps
  • language specification.

PDF/A-1a's objective is to ensure that document content can be searched and repurposed.

The requirements for Level A conformance place greater responsibilities on writers preparing conforming files, but these requirements allow for a higher level of document preservation service and confidence over time. Level A conformance is intended to facilitate the accessibility of conforming files for physically impaired users but, unlike PDF/UA, does not include the technical specificity required to assure accessibility.[12]

According to the specification, the following terms are recommended for referring to ISO 19005-1:2005 when the full ISO name is not used:

  • PDF/A – a synonym for the ISO 19005 family of standards
  • PDF/A-1 – a synonym for ISO 19005-1
  • PDF/A-1a – a synonym for ISO 19005-1 Level A conformance
  • PDF/A-1b – a synonym for ISO 19005-1 Level B conformance

PDF/A-2

PDF/A-2 is the second part of ISO 19005. PDF/A-2 addresses some of the new features added with versions 1.5, 1.6 and 1.7 of the PDF Reference. PDF/A-1 files will not necessarily conform to PDF/A-2, and PDF/A-2 compliant files will not necessarily conform to PDF/A-1.

Part 2 of the PDF/A standard is based on PDF 1.7 (ISO 32000-1) rather than PDF 1.4, and offers a number of new features:

  • JPEG2000 image compression
  • support for transparency effects and layers
  • embedding of OpenType fonts
  • provisions for digital signatures in accordance with the PDF Advanced Electronic Signatures – PAdES standard
  • the option of embedding PDF/A files to facilitate archiving of sets of documents with a single file.[9]

Part 2 defines three conformance levels. PDF/A-2a and PDF/A-2b correspond to conformance levels a and b in PDF/A-1. A new conformance level, PDF/A-2u, represents Level B conformance (PDF/A-2b) with the additional requirement that all text in the document have Unicode mapping.[11][13]

PDF/A-3

PDF/A-3 (ISO 19005-3:2012, Part 3) differs from PDF/A-2 in only one regard: it allows embedding of arbitrary file formats (such as XML, CSV, CAD, word-processing documents, spreadsheet documents and others) into PDF/A-conforming documents.[14]

The PDF/A-3 specification was published on October 17, 2012.[15]
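
The short names used in this section (PDF/A-1a, PDF/A-2u and so on) combine a part number with a conformance level. As a minimal sketch, the Python snippet below parses such a name and checks the level against the part. The helper name `parse_designation` is hypothetical, and the level set for Part 3 is an assumption based on the statement that PDF/A-3 differs from PDF/A-2 only in its handling of embedded files.

```python
import re

# Conformance levels per part, as described in this section.
# Part 3 is assumed to share the levels of Part 2 (see the note above).
LEVELS_BY_PART = {
    1: {"a", "b"},
    2: {"a", "b", "u"},
    3: {"a", "b", "u"},
}


def parse_designation(name: str) -> tuple[int, str]:
    """Split a short name such as 'PDF/A-2u' into (part, level).

    Raises ValueError if the name is malformed or the level does not
    exist for that part (for example 'PDF/A-1u').
    """
    match = re.fullmatch(r"PDF/A-(\d)([a-z])", name)
    if match is None:
        raise ValueError(f"not a PDF/A designation: {name!r}")
    part, level = int(match.group(1)), match.group(2)
    if level not in LEVELS_BY_PART.get(part, set()):
        raise ValueError(f"level {level!r} is not defined for part {part}")
    return part, level


if __name__ == "__main__":
    print(parse_designation("PDF/A-2u"))
```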

Identification

A PDF/A document can be identified as such through PDF/A-specific metadata located in the "http://www.aiim.org/pdfa/ns/id/" namespace. This metadata represents a claim of conformance; in itself it does not assure conformance:

  • a PDF document can be PDF/A-compliant, except for its lack of PDF/A metadata. This may happen for instance with documents that were generated before the definition of the PDF/A standard, by authors aware of features that present long-term preservation issues.
  • a PDF document can be identified as PDF/A, but may incorrectly contain PDF features not allowed in PDF/A; hence, documents which claim to be PDF/A-compliant should be tested for PDF/A compliance.[16]
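
As a rough illustration of how the claim is read, the Python sketch below looks for the `part` and `conformance` properties of the PDF/A identification namespace in an XMP packet. It assumes the packet has already been extracted from the PDF file as a string, which is outside the scope of the sketch; the function name `claimed_conformance` and the sample packet are made up for this example. The result is only the claim the file makes, not evidence of conformance.

```python
import xml.etree.ElementTree as ET

PDFA_ID_NS = "http://www.aiim.org/pdfa/ns/id/"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def claimed_conformance(xmp_packet: str):
    """Return the claimed designation (e.g. 'PDF/A-1b'), or None if absent.

    The properties may be written either as child elements or as
    attributes of an rdf:Description element, so both forms are checked.
    """
    root = ET.fromstring(xmp_packet)
    part_key = f"{{{PDFA_ID_NS}}}part"
    level_key = f"{{{PDFA_ID_NS}}}conformance"
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        part = desc.get(part_key) or desc.findtext(part_key)
        level = desc.get(level_key) or desc.findtext(level_key)
        if part:
            return f"PDF/A-{part.strip()}{(level or '').strip().lower()}"
    return None


# Hand-written minimal packet used only to exercise the function.
SAMPLE = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about=""
        xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">
      <pdfaid:part>1</pdfaid:part>
      <pdfaid:conformance>B</pdfaid:conformance>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>"""

if __name__ == "__main__":
    print(claimed_conformance(SAMPLE))  # PDF/A-1b
```

A file for which this returns a designation should still be passed through a validator, for the reasons given in the list above.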

Establishing conformance

Many vendors license software claiming to produce PDF files that conform to PDF/A (i.e., files that include PDF/A metadata). Conformance with the PDF/A specification is obtained by way of validation.

A variety of vendors offer commercial validation tools. Two projects are intended to establish industry norms for valid PDF/A documents:

Isartor Test Suite

Industry collaboration in the original PDF/A Competence Center,[17] which began in 2006 following the release of PDF/A-1, led to the development of the Isartor Test Suite[18] in 2008. Isartor consists of a set of PDF files intentionally constructed to systematically fail each of the requirements for PDF/A-1b, allowing developers to check the ability of their software to conform to the core conformance level in the first part of the standard. Isartor was extended by PDFlib's Bavaria Test Suite[19] in 2009. No further work is planned for this test suite.

veraPDF

In response to the EU Commission's PREFORMA project,[21] the PDF Association, working with the other members of the veraPDF consortium, including the Open Preservation Foundation,[20] launched the PDF Validation Technical Working Group[22] in November 2014 to articulate a plan for developing a definitive PDF/A validator designed to win industry-wide acceptance.

Based on its test corpora (which incorporate the Isartor Test Suite) and its software development plan, the veraPDF consortium subsequently won phase 2 of the PREFORMA contract in April 2015.[23] Phase 2 was scheduled to be completed by December 2016.

PDF/A viewer mode

The PDF/A specification also states some requirements for a conforming PDF/A reader, which must:

  • ignore any data that are not described by the PDF and PDF/A standards;
  • ignore any linearization information provided by the file;
  • only use the embedded fonts (rather than any locally available, substituted or simulated fonts);
  • only display using the embedded colour profile;
  • ensure that form fields do not change the rendered presentation and are rendered without regard to the form data;
  • ensure that annotations are rendered consistently.

When encountering a file that claims conformance with PDF/A, some PDF viewers will default to a special "PDF/A viewing mode" to fulfill conforming reader requirements. To take one example, Adobe Acrobat and Adobe Reader 9 include an alert to advise the user that PDF/A viewing mode has been activated. Although not required by the PDF/A specification, Adobe Acrobat 9's PDF/A viewer mode disables functions for changing the document; this functionality was changed in Adobe Acrobat XI. Some PDF viewers allow users to disable the PDF/A viewing mode or to remove the PDF/A information from a file.[24][25]

Drawbacks

A PDF/A document must embed all fonts in use; accordingly, a PDF/A file will often be bigger than an equivalent PDF file that does not include embedded fonts.

The use of transparency is forbidden in PDF/A-1. The majority of PDF generation tools that allow for PDF/A document compliance, such as the PDF export in OpenOffice.org or PDF export tool in Microsoft Office 2007 suites, will also make any transparent images in a given document non-transparent. That restriction was removed in PDF/A-2.[8]

Some archivists have voiced concerns that PDF/A-3, which allows arbitrary files to be embedded in PDF/A documents, could result in circumvention of memory institution procedures and restrictions on archived formats.[26]

The PDF Association has addressed various misconceptions[27] regarding PDF/A in its publication "PDF/A in a Nutshell 2.0".

Converting a PDF (up to version 1.4) into a PDF/A-2 usually works as expected, except for problems with glyphs. According to the PDF Association, "Problems can occur before and/or during the generation of PDFs. A PDF/A file can be formally correct yet still have incorrect glyphs. Only a careful visual check can uncover this problem. Because generation problems also affect Unicode mapping, the problem attracts the attention when a visual check is carried out on the extracted text. In PDF/A, text/font usage is specified uniquely enough to ensure that it cannot be incorrect. If viewers or printers do not offer complete support for encoding systems, this can result in problems with regard to PDF/A."[28] In other words, a document can be fully compliant with the standard and internally correct, while the system used for viewing or printing it still produces undesired results.

A document produced with OCR conversion into PDF/A-2 or PDF/A-3 does not support the notdefglyph flag; this type of conversion can therefore result in unrendered content.
