Abstract

A Portable Web Publication is a collection of content items (e.g., pages, chapters, modules, articles) whose content is compatible with Web usage, and structured as a single, self-contained logical unit. This document describes the use cases that inform the requirements for a Portable Web Publication, and should be read as part of the Portable Web Publications for the Open Web Platform.

Status of This Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This document is a Work In Progress.

This document was published by the Digital Publishing Interest Group as an Editor's Draft. If you wish to make comments regarding this document, please send them to public-digipub-ig@w3.org (subscribe, archives). All comments are welcome.

Publication as an Editor's Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

This document is governed by the 1 September 2015 W3C Process Document.

1. Introduction

TK

2. Use Cases - Fundamental Features of a PWP

2.1 Collection of Resources

Publisher P works with multiple authors to create an anthology and uses resources from different rights holders at different locations on the web. Following current practice on the Web, the publication consists of many different resources (HTML, SVG, CSS, etc.). The publisher needs the collection of all the resources as a unit so that it can be included in its business workflow. The publication must also be deposited with the national library as a legal deposit.

A book on wines can be read from A to Z, or personalized to show only red wines or wines from a specific region. Each wine may be a separate resource or small chunk of data.

User A has access to materials only through an old computer in her local library. While she has time to read the entire copy of War and Peace, the system is unable to display the entire resource as one huge HTML file. Parsing one 2000-page HTML document is difficult and resource-intensive. Parsing a package of 20 10-page HTML documents is less resource-intensive.

2.2 Online/offline

Connectivity as a commodity: Students read e-textbooks in a village in Africa where there is no connection, or no reliable connection.

Many institutions, such as schools and government organizations (even in the wealthiest countries), do not have the resources to update equipment frequently. Therefore, it is necessary for publications to be accessible on current as well as older browsers.

Bob wants to read a PWP on an airplane.

2.3 Basic Actions

Anna is a self-publishing author. Anna creates a PWP, both packed and unpacked. Anna uses a cloud storage system such as Dropbox to publish her PWP online. Her friend Bob is able to read Anna's PWP online and offline, either packed or unpacked, as Bob sees fit.

2.4 Online Discovery

As a reader, I want to discover a publication on the web so that I can start reading it right away.

2.5 Reading in Browser

Bob finds a PWP online. His preference is to read the publication in his web browser.

2.6 Platform Independence

Publisher ACME creates a publication that is consumable across a variety of reader platforms, whether online or offline.

2.7 Manifests/Metadata

As a reading system, I need a list of things to process.

As a reading system, I need to know if I support a media type included in the publication.

As a reading system, I need to know the order in which to display material.

As a reading system, I need to know file size(s) because I have limited memory.

As a reading system, I need to know which files are required or essential for the user to perceive the content.

As a reading system, I need to learn enough about the publication efficiently and without preprocessing.

As a reading system, I need to know the title and cover image to display the publication on a shelf without downloading all its content.

As a reading system, I need to know which file is the next one.

As a reading system, I need to know if I need additional processing instructions, such as with MathML.

As a reading system, I need to know if I have the rights to put the resource offline.

As a reading system, I need to know when all the files are delivered, and that if a file is not in the manifest, I will not process it.

As a reading system, I need descriptions about the publication that travel with the publication whether online or offline. For example: author(s), title, size, rights and permissions, accessibility, multimedia details.

As a reading system, I want to know that the content is unaltered.

As a reading system, I want to know if the content is intentionally updated.

As a reading system, I want to know the origin of the publication.
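
The reading-system needs listed above can be pictured with a small data structure. This is a minimal sketch only: this document does not define a manifest format, so the class names, field names (`href`, `media_type`, `size_bytes`, `required`), and the sample publication are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Resource:
    """One entry in a hypothetical PWP manifest."""
    href: str
    media_type: str
    size_bytes: int
    required: bool = True  # essential for the user to perceive the content


@dataclass
class Manifest:
    """Hypothetical manifest: descriptive metadata plus an ordered resource list."""
    title: str
    authors: list[str]
    cover: Optional[str]
    reading_order: list[Resource] = field(default_factory=list)

    def next_after(self, href: str) -> Optional[Resource]:
        """Return the resource that follows `href` in reading order, if any."""
        hrefs = [r.href for r in self.reading_order]
        if href not in hrefs:
            return None
        i = hrefs.index(href)
        return self.reading_order[i + 1] if i + 1 < len(hrefs) else None

    def unsupported(self, supported: set[str]) -> list[Resource]:
        """List resources whose media type the reading system cannot handle."""
        return [r for r in self.reading_order if r.media_type not in supported]

    def required_size(self) -> int:
        """Total size in bytes of the resources marked as required."""
        return sum(r.size_bytes for r in self.reading_order if r.required)


# Illustrative publication; every value here is made up.
manifest = Manifest(
    title="Example Anthology",
    authors=["A. Author"],
    cover="cover.jpg",
    reading_order=[
        Resource("ch1.html", "text/html", 40_000),
        Resource("fig1.svg", "image/svg+xml", 12_000),
        Resource("clip.mp4", "video/mp4", 9_000_000, required=False),
    ],
)
```

With a structure like this, a reading system could show the title and cover on a shelf, find the next file, and estimate memory needs from the manifest alone, without fetching the content itself.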

3. Use Cases - Changing State

3.1 State Changing Document

Users engage with many publications, especially long-form fiction and non-fiction, for many hours. During this time, the user may shift states in many ways: starting consumption on an internet-enabled PC, moving to an internet-enabled portable device, going into offline mode on that device, and then going back to the PC.

During all of these experiences, the user needs to ensure they have access to critical pieces of data, while secondary assets have a predefined fallback that will allow the user to continue (for example, a poster image of a video that serves as a placeholder for an externally streamed video when internet is available).

Let's take the use case of a user; let's call him Nick. Nick is reading long-form narrative non-fiction: a publication filled with text, images, sounds, and multimedia files. Nick is also a multi-device user who wishes to consume the publication on multiple devices. Some of those devices have limited storage, and some of them have limited connectivity. Nick also rides the subway, where he loses his internet connection without warning for long stretches of time.

During offline or low-storage situations, there are still critical parts of the publication that are consumable, mainly the text (and possibly images). Having a reasonable fallback for video (a poster image or placeholder image) would allow Nick to read the content while offline or with limited storage. While this should be the job of the reading system, a method in the publication for the author to mark which items are critical and which need a fallback for limited connectivity/storage situations would greatly help the reading system. It would also give the publisher more control to ensure a consistent experience when consuming the publication.

Nick may know he's going to be in a no-connectivity situation and may want to obtain and locally store the entire (even non-critical) contents. This would be up to the reading system to provide a mechanism, but having a way to denote critical and fallback assets ensures that an entire package isn't downloaded when not necessary.

In the case of scripting, it's possible that certain items will be dynamic and will hit an external resource (or server) to generate on-the-fly data. In offline mode, the user would want to be alerted that content could not be obtained, or be shown some fallback set of data. In this case, being able to specify a "no-connectivity" or "offline-mode" alternative for scripts would allow the publication author to have more control over the user's experience and replace a potential error display with a limited subset of a good experience.
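
The critical/fallback distinction described in this section can be sketched as a small decision function. This is an illustration under stated assumptions, not a proposed mechanism: the `Asset` record and its `critical`, `fallback`, and `cached` fields are hypothetical names, as are the sample file names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Asset:
    """Hypothetical asset record; field names are illustrative only."""
    href: str
    critical: bool = False          # must be available in every state
    fallback: Optional[str] = None  # e.g. a poster image for a video
    cached: bool = False            # already stored on this device


def resolve(asset: Asset, online: bool) -> Optional[str]:
    """Pick what to render for `asset` in the current connectivity state.

    Returns the asset itself when it is reachable, otherwise the
    author-declared fallback, otherwise None (nothing can be shown).
    """
    if online or asset.cached:
        return asset.href
    return asset.fallback


def to_prefetch(assets: list[Asset]) -> list[str]:
    """Critical assets not yet cached, so they survive a sudden disconnect."""
    return [a.href for a in assets if a.critical and not a.cached]


# Illustrative assets for Nick's publication; all names are made up.
text = Asset("ch1.html", critical=True, cached=True)
video = Asset("https://example.org/talk.mp4", fallback="talk-poster.jpg")
chart = Asset("live-chart.js")  # dynamic script with no offline alternative
```

In this sketch a `None` result is the case where the reading system would have to alert the user, which is what an author-declared offline alternative for scripts is meant to avoid.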

3.2 Annotating Across States Via a Reader Plugin

Writer Annie has her book published on her own web space as a PWP [http/packed]. Reader Bob opens it online using the PWP reading plugin PWPRead-plugin, and selects a nice quote to bookmark via PWPMark-plugin.

Writer Annie has her book published on her own web space as a PWP [http/packed]. Reader Bob caches it using a plugin [cache/packed], and selects a nice quote to bookmark via PWPMark-plugin.

Writer Annie has her book published on her own web space as a PWP, both packed [http/packed] and unpacked [http/unpacked]. Reader Bob reads [http/unpacked] online and selects a nice quote to bookmark via PWPMark-plugin. Then, Bob downloads [http/packed] to his local filesystem [file/packed] to open in his reading system of choice, namely, PWPRead-soft.

PWPRead-soft synchronizes with Bob's PWPMark profile, and can show Bob's bookmarks when he continues reading [file/packed].

3.3 Annotating Across States Via a Browser Plugin

Writer Annie has her book published on her own web space as a PWP [http/unpacked]. Reader Bob reads it online, and selects a nice quote to bookmark via his browser's PWP bookmarking plugin: PWPMark-plugin.

When Bob re-opens the PWP offline, the bookmarks are shown via PWPMark-plugin.

When Bob re-visits the original PWP, PWPMark-plugin can also show Bob's bookmarks in the online version.

4. Use Cases - Creating the PWP

4.1 Configurability of Important Resources

Chef Bob writes a cookbook with a lot of embedded videos to explain certain techniques. Bob finds it very important that his videos remain available even offline, and configures this in his cookbook. Reader Annie starts reading Bob's cookbook online. When Annie gets disconnected, the fonts of the cookbook fall back to the system fonts, but the videos remain available.

Typographer Charlie writes a book on typography, and configures it differently: he finds fonts a very important aspect of his book, whilst the embedded videos may fall back to a still. Annie can read Charlie's book without error, online or offline. The fonts remain available, but the videos fall back to stills when offline.

Author David does not configure anything in his novel, but Annie can still read David's book without problems whether she is online or not.
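
One way to picture the three authors above is as optional overrides on top of a default offline policy. The sketch below assumes a hypothetical policy table keyed by resource kind with the values "keep" and "fallback". The defaults are chosen only so that they match the behavior described in the use cases (fonts and videos fall back unless the author says otherwise); nothing here is defined by this document.

```python
from typing import Optional

# Hypothetical default: what happens to each kind of resource when offline.
DEFAULT_POLICY = {
    "text": "keep",
    "image": "keep",
    "font": "fallback",   # fall back to system fonts
    "video": "fallback",  # fall back to a still image
}


def effective_policy(author_policy: Optional[dict[str, str]] = None) -> dict[str, str]:
    """Merge an author's overrides onto the defaults without mutating either."""
    policy = dict(DEFAULT_POLICY)
    policy.update(author_policy or {})
    return policy


bob = effective_policy({"video": "keep"})    # cookbook: videos stay available
charlie = effective_policy({"font": "keep"})  # typography book: fonts stay available
david = effective_policy()                    # novel: nothing configured
```

Because the defaults always apply, David's unconfigured novel still yields a complete policy, which is the point of the third use case.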

4.2 Updating PWPs

Corp.Inc. creates a PWP with dynamically updatable stock exchange information in chapter 4. Anna sends the locator for chapter 4 to Bob on April 1st. When Bob reads the PWP offline, chapter 4 is filled with some default content. However, when Bob gets online and clicks on the locator for chapter 4, he gets the updated stock exchange information, which might be different from the stock exchange information that Anna saw when she created the bookmark.

5. Use Cases - Security

5.1 Read and Write Controls Required

Alice is working on potentially Nobel Prize-winning research, and has drafted her paper describing her discoveries. She asks Bob to review the paper, but needs to make sure that the PWP retains specific protections, regardless of whether it is read online or offline.

6. Use Cases - Sharing, Distribution, and External Resources

6.1 Distribution

Publisher Corp. Inc. publishes a new PWP and sends it to its customers. This PWP is downloaded to devices, synced across several devices, or made available in a customer-specific cloud. Customers can access this file from different retailers, through different applications, either directly or downloaded from a private cloud. Thus, the PWP is duplicated many times, resulting in a huge number of items. There is one source manifestation, one ISBN identifier, and many items spread across devices and buyers.

Annie buys a book and downloads it offline. She bookmarks a certain chapter (i.e., creates a locator for that chapter). She sends that bookmark to Bob. Bob is able to use that locator on any item of the same PWP, and gets redirected to the correct chapter.
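
A locator that works on any item of the same PWP can be thought of as a part relative to the publication, combined with whichever base a given copy lives at. The sketch below uses Python's `urllib.parse.urljoin` for the combination. The base URLs, file paths, and function names are hypothetical and only illustrate the idea.

```python
from urllib.parse import urljoin


def to_relative(locator: str, base: str) -> str:
    """Strip a copy-specific base from a locator, leaving the relative part."""
    if not locator.startswith(base):
        raise ValueError(f"{locator!r} is not inside {base!r}")
    return locator[len(base):]


def rebase(locator: str, from_base: str, to_base: str) -> str:
    """Re-express a locator made against one copy so it points into another copy."""
    return urljoin(to_base, to_relative(locator, from_base))


# Hypothetical bases; each must end in "/" so urljoin appends rather than replaces.
CANONICAL = "https://publisher.example/pwp/example-book/"
ANNIE_COPY = "file:///home/annie/books/example-book/"
BOB_COPY = "file:///home/bob/library/example-book/"

# Annie bookmarks a chapter in her offline copy.
annie_bookmark = ANNIE_COPY + "chapters/ch04.html#sec2"

# The same bookmark, pointed at Bob's own copy of the publication.
bob_view = rebase(annie_bookmark, ANNIE_COPY, BOB_COPY)
```

Only the relative part travels between Annie and Bob in this sketch. How a reading system learns the base of its own copy is left open.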

6.2 Retail

(e)books that are sold need to be delivered, so that purchasers can load them on offline devices.

Purchased content comes with different expectations, one being that you have “something”, or reasonable use of that content in a logical way (such as always being able to read your Amazon purchases through the Amazon app).

The web is not permanent: sites go down. When you purchase a book, you need an offline copy that you can continue to read when the retailer you purchased from goes out of business.

Sales auditing: the ability to track what is sold so that it can be paid for. If all content is free and different chunks are purchased, chasing rights and payments is an issue. The package can have an ISBN associated with it so that it can be tracked for sale, even across multiple versions.

6.3 PWPs with Shared Resources

EsteemedPublisher creates apps to distribute several journals to readers. These apps share script libraries, CSS files, and other resources. Distributing many journals as a package should enable systems to call the shared resources once, speeding up processing and enabling offline reading.

6.4 Cross-references

Writer Annie writes a dissertation. She references her Master's thesis, published on the university website. Her colleague Bob has read her Master's thesis before. When he clicks the reference in Annie's dissertation, he gets redirected to his local copy of Annie's Master's thesis. Her friend, Charlie, hasn't read her Master's thesis before. Charlie needs to be online when clicking the reference, to read Annie's Master's thesis.

6.5 Publication with Static Data

Rosa submitted an article to EsteemedJournal and provided her research data in CSV format. She and EsteemedJournal wish to provide users access to the CSVs when they gain access to her article. EsteemedJournal recommends that the package is built in such a way that a system can query the manifest to assess whether it is situationally appropriate to offer downloads. For example, the package might not offer the option to download the CSV while a user is reading offline.

6.6 Publication with Interactive Data

(Extension to the Publication with Static Data use case): Rosa's article includes not only data, but also interactive graphics relying on that data, with interaction directed through JavaScript (or other) programming interfaces. These JavaScript programs may be fairly complex, and may also rely on external libraries (not necessarily integrated into the article itself).

7. Use Cases - Manifests and Packages

7.1 Streamlined Access to Disjoint Package Components

EsteemedJournalPublisher would like to offer the users of the EsteemedJournal of Chemistry App the opportunity to read only the abstracts of the journals in the app. The App Package must offer the user a list (table of contents) of abstracts (disjoint objects in the package with semantic information or metadata informing the package of the nature of the object).

(Is the abstract-only view built-in? spun-off using shared resource? totally independent publication?)

7.2 Manifest includes Information of New Content

Shoshana is an organic chemist. She has purchased the Esteemed Journal of Chemistry App. She downloads Organic Chem Quarterly in her lab and reads the first article over lunch. Shoshana begins the book reviews during office hours but must tend to her students' questions, so she closes the app. Shoshana opens the app on the train ride home to resume reading the book reviews. She is happy to find that the app opens to the exact location and opens quickly because most of the material does not need to be downloaded a second time.

An archival service needs to update an Archival Information Package (i.e., a previously harvested PWP) because a new version of a component of the PWP has been published.
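
Both cases above, Shoshana's app and the archival service, come down to comparing what is held locally with what the publisher now lists. A minimal sketch follows, assuming a hypothetical manifest that maps each resource to an opaque version tag. The tag values and file names are made up.

```python
def changed_resources(local: dict[str, str], remote: dict[str, str]) -> dict[str, list[str]]:
    """Compare per-resource version tags from two manifests.

    `local` and `remote` map a resource href to an opaque version tag.
    Returns which resources must be downloaded (new or changed) and
    which are no longer listed remotely.
    """
    return {
        "download": sorted(h for h, v in remote.items() if local.get(h) != v),
        "remove": sorted(h for h in local if h not in remote),
    }


# What the app already holds, and what the publisher's manifest lists now.
local = {"article1.html": "v1", "reviews.html": "v1", "style.css": "v3"}
remote = {
    "article1.html": "v1",
    "reviews.html": "v2",   # updated since the last download
    "style.css": "v3",
    "article2.html": "v1",  # newly published
}

plan = changed_resources(local, remote)
```

Only `reviews.html` and `article2.html` would be fetched here, so most of the material does not need to be downloaded a second time.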

8. Use Cases - Archival Interest

8.1 Integrity and Longevity

A government agency publishes information (e.g., laws, regulations, judicial decisions) that needs to persist forever without any loss of information.

A journal article (e.g., announcing a novel compound in chemistry) must be published in a manner that is persistent because it serves as the document of record for the scientific record.
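
One common way to detect loss or alteration over time is to record a cryptographic digest for each resource and recheck it later. The sketch below uses SHA-256 from Python's standard `hashlib` module. It is offered only as an illustration of a fixity check: this document does not prescribe a digest algorithm or say where digests would be stored, and the sample content is made up.

```python
import hashlib


def digest(data: bytes) -> str:
    """SHA-256 digest of a resource's bytes, as a hex string."""
    return hashlib.sha256(data).hexdigest()


def verify(resources: dict[str, bytes], expected: dict[str, str]) -> list[str]:
    """Return hrefs whose content is missing or does not match its recorded digest."""
    return sorted(
        href
        for href, recorded in expected.items()
        if href not in resources or digest(resources[href]) != recorded
    )


# Digests recorded at publication time (illustrative content).
original = {"law.html": b"<p>Section 1</p>", "seal.svg": b"<svg/>"}
recorded = {href: digest(data) for href, data in original.items()}

# A later copy in which one resource was altered and one was lost.
later = {"law.html": b"<p>Section 1 (edited)</p>"}
problems = verify(later, recorded)
```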

8.2 Retraction Notices

An archival service needs to harvest the retraction notice and update the Archival Information Package for the original PWP to include / link to the Retraction Notice.

8.3 Take-down Notices

An archival service needs to update an Archival Information Package (i.e., a previously harvested PWP) because it or one of its components has been taken down by the publisher.

9. Use Cases - Accessibility

9.1 Personalized Experience

Alice, a dyslexic student, downloads a textbook and proceeds to personalize the material with larger font and different contrast.

Supporting User Preferences

While reading a book on computer programming, Bob wants to change the font into a local font. However, the code should remain in a fixed-width font.

9.2 Adding Alternative Media

As a publisher of accessible content, I need to add content such as a braille style sheet, image descriptions, or video captioning (text / descriptive audio) to a PWP previously published by a third party.

9.3 Assisted Reader Technology

As a user of assistive technology, Alice wants to perceive the full PWP.

9.4 Time-based Media

Bobbie is learning to read and is viewing a picture book. The picture book is a fixed-layout publication that turns the page and reads aloud in sync with the page currently open.

9.5 Building a Custom PWP

Alice wants to download a PWP that captures only the external resources she needs to perceive the PWP.

10. CSS Requirements for Packaged Multiple Documents

10.1 Pagination and Generated Content

Pagination and indexing must work across the whole collection of documents that constitute a PWP, which may raise issues around transitions between documents.

As a reader, I want to choose between a scrolled view and a paginated view of content that extends across multiple HTML documents.

10.3 CSS customization

As a publisher I want my footnotes to number sequentially across the publication, even when the publication is constructed of multiple documents.
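
Sequential footnote numbering across documents amounts to giving each document a starting value equal to the number of footnotes in all the documents before it. The sketch below computes those starting values from per-document footnote counts. The function name and the counts are illustrative assumptions, and how a reading system or style sheet would apply the values is left open.

```python
from itertools import accumulate


def footnote_offsets(counts: list[int]) -> list[int]:
    """Number of footnotes that precede each document in reading order.

    A document whose offset is N numbers its first footnote N + 1.
    """
    return [0, *accumulate(counts)][:-1]


# Three documents containing 3, 0, and 5 footnotes respectively.
offsets = footnote_offsets([3, 0, 5])
```

Here the third document continues at footnote 4 even though the second document has no footnotes at all.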

Content may have significantly different styles between files. For instance, some Japanese books will have documents whose root element is vertical-rl and others whose root element is horizontal-tb. These root element styles must be preserved.

10.4 Transitions

Placeholder: Comics-like transition

Placeholder: Page Transition effects both within an HTML document, and between HTML documents

11. Use Cases - Other

11.1 Offline Publications

Corp.Inc. creates an internal manual for its employees as a PWP. This PWP is not published online, but is sent around to all its employees. Employee Anna has some questions about figure 2b, and sends an email to co-worker Bob with a locator to that figure. Bob clicks on that locator, and his company-branded PWP reader opens Bob's personal copy of the manual, and redirects immediately to that figure.

11.2 Publication plus Annotations

Oksana submits a scholarly article to EsteemedJournal. EsteemedJournal puts the article through the peer review process, during which EsteemedJournal editors and third-party reviewers provide comments on the article. Ultimately, EsteemedJournal chooses not to publish Oksana's article, but the comments (annotations) that have already been made should persist as she submits it to RespectedJournal.

11.3 Specialized Domain Semantics

Placeholder: Formal usage terms and engineering or legal documents, possibly for accessibility also.

Specialized semantics are required for users and processors.

12. Requirement(s)

Be able to handle intermittent changes in online/offline status during a single reading session while still minimizing the amount of material cached locally.

It SHOULD be possible to describe explicitly which resources do and which do not belong to the PWP or to the “portable” part of a PWP.

There MUST be a separation between a format-independent (“canonical”) and format-dependent locator.

It MUST be possible (and necessary) to use, for all cross-references, the canonical locator.

There MUST be a separation between the identifier (e.g., ISBN) and the (canonical) locator of a specific instance of a PWP.

It SHOULD be possible in the PWP to follow (if necessary) the copying (provenance) chain.

The identifier (e.g., ISBN) MAY serve as a canonical locator for a specific instance of a PWP.

It SHOULD be possible to use, in all circumstances, a relative locator to manipulate, annotate, etc., content in a PWP.

A PWP Processor MUST be able to combine a relative locator with the canonical as well as state-dependent locators of a PWP.

Locating a resource within a PWP should not depend on the PWP's state.

A state independent locator should be part of the PWP.

There needs to be persistence of identifiers across PWP instances.

Any setup and mechanism for handling canonical and state-dependent locators MUST be easy to configure on any server (albeit maybe not in the most efficient manner) based on basic server behavior control.

There should be the capability for dynamic updating of information based on online/offline status and time of publication.

A PWP must allow for access control and write protections as part of the resource.

Package may contain or point to textual, graphical, and media content as well as data (CSV, code repository).

Package should include a queryable manifest.

Package may contain JavaScript libraries, and possibly other types of processing engines, like Java.

JavaScript libraries may refer to services "outside", i.e., to services on the Web at large (e.g., reference to Wolfram Alpha, or online Google services).

JavaScript libraries may become relatively complex and may therefore be packaged; this means that the packaging format should allow for other packages as parts.

Content in package is fully annotatable

Annotations are considered part of the package

There needs to be metadata on package components

Manifest with metadata and semantic information

Customizable manifest functionality

Manifest indicates whether content is new (relative to last download) and triggers download only when content is new/updated.

There should be a discovery service or trigger available to indicate content changes or long-term availability changes.

The complex collection of documents, media, and other resources that comprise a publication must remain intact and complete across all transitions (online/offline), rendering in various formats, and distribution over time.

Data that affects content may be stored apart from the content.

The choice should be driven by content or workflow, not mandated by specification.

Long-form content should not be stored in one giant file. This affects performance, storage, and workflow.

The components of a publication should be aggregated or disaggregated without loss of information.

Every resource can hold its own rights. The rights of a resource should be preserved when the resource is distributed.

A collection of documents must be treated as a single unit for searching, values of counters (page counters, section numbering, footnotes), and user stylesheets (users must be able to adjust display, e.g., font selection, font-size adjustment, background color).

PWP needs to support rendering for visual output, tactile output, and audio output.

Package needs to support time-based media synchronized with text, such as audio synchronized with text, video synchronized with text, and sign language synchronized with text.