Saturday, January 31, 2009

"Digitize This Book!" book review: follow-up

Following last Tuesday's post (January 27) reviewing Gary Hall's Digitize This Book!, the dialogue below took place. It focuses on accessibility.

(I have picked up Bill Fitzgerald's Drupal for Education and E-Learning and will post the review here once completed.)


From: Claude Almansi
To: Peter Jones


Hi Peter,

Thanks for the link (on xmca list) to your very detailed review of Gary Hall's book. As you are also an advocate of computer accessibility, may I ask if Hall's text mentions the accessibility of digitized works?

For instance, the works offered by Google Books and The European Digital Library, with the exception of public domain ones, are presented as images of text that are mute to the screen readers used by blind people. So are all newspaper archives using Olive ActivePaper software.

This use of text images - apparently motivated by a wish to protect copyright - is not only a barrier to blind people: it makes such "digitized" works very clumsy to use for research, and for reading on a portable device with a small screen. It also means that they are difficult to find with a search engine*. And the paradox is that these applications do use a plain text version, but hide it, only offering these images of text.

I wrote about this problem in:

http://innovateblog.wordpress.com/2009/01/22/unhide-that-hidden-text-please/

where I quote Gabriele Ghirlanda's description of how he got his screen reader to read the content of such a "text image" article in an ActivePaper-powered archive:
"With a screenshot, the image definition was too low for ABBYY FineReader 8.0 Professional Edition [optical character recognition software] to extract a meaningful text. But by chance, I noticed that the article presented is made of several blocks of images, for the title and for each column. Right-click, copy image, paste in OpenOffice; export as PDF; then I put the PDF through ABBYY FineReader.
[...]
For a sighted person, it is no problem to create a document of good quality for each article, keeping it in image format, without having to go through OpenOffice and/or pdf."
Rather frustrating to think that this archive is powered by a hidden plain text...

Best

Claude Almansi

On 1/29/09, peter jones <h2cmng @ yahoo.co.uk> wrote:

> Hi Claude,

> Thanks very much for these points. I've checked and 'accessibility' does not feature in the index.

(Claude Almansi) Interesting. When the archives of the Journal de Genève were announced last December, I first objected on the mailing list of the Swiss Internet User Group (SIUG) to the claim that they were "in free access", in spite of their "text image" presentation made worse by scripts preventing full download (etc.: see my post on the Innovate blog). Now SIUG is very committed to accessibility. During the revision of the Swiss copyright law, we lobbied for maintaining the exceptions to the ban on circumventing DRM, stressing that these exceptions were fundamental for blind people using screen readers [1].

Nevertheless, the first reactions were along the lines of "don't rock the boat": no other Swiss paper offers non-paying access to its archives yet, and if we are too critical, even this might be withdrawn. So I first submitted the post for the Innovate blog to Norbert Bollow, SIUG's chairman, and he OK'd it, as I had put the issue in a wider context.

Sure, a digitized text image is better than no digitized text at all. Yet real text is so much better for everybody [2].

> (Peter Jones) As you may have noticed I have HCI accessibility resources on links I.

(Claude Almansi) Yes, I did: that's why I sent you the question about accessibility in Gary Hall's book.

[1] For instance, I recorded an interview on DRM and assistive tech with Luca Mascaro, a computer accessibility specialist, posted it in:

http://noimedia.podspot.de/post/luca-mascaro-drm-e-tecnologie-assistive/

added the transcript in

http://noimedia.wikispaces.com/tecnologia_assistiva_e_DRM

and the English translation in

http://noimedia.wikispaces.com/assistive_tech_and_DRM

- then Norbert Bollow, SIUG's chairman, translated it into German (see

http://siug.ch/URG/interview-mascaro-2007-05-09.html

) and sent it to the members of the Judicial Commission, all within a few hours: because the content producers (IFPI, etc.) had announced that they would exert the utmost pressure to have these exceptions to the ban on circumventing DRM removed from the law. Other groups lobbied for the exceptions from different viewpoints (consumers' rights, open standards, culture) and in the end the exceptions were maintained.

[2] An interesting example of good digitizing:

e-codices - Virtual Manuscript Library of Switzerland <http://www.e-codices.unifr.ch/en>.

They too give text images, but then how many sighted people can decipher a medieval or Renaissance MS, apart from scholars? So this is compensated by the fact that you can switch at any time from the image view of a MS to its text description, written in simple language, with interesting details linked to the images of the various pages of the MS. Only issue: the descriptions are not translated into all 4 languages of the site - but then that's Switzerland :D

Entirely done with Open Source software: see
http://www.e-codices.unifr.ch/en/info/webapplication


Hi Peter,

About digitizing books: the forward is the second message of a discussion, with the first one under it, on the A2K [=Access to Knowledge] mailing list. The article amply quoted in the first message is "Google & the Future of Books" by Robert Darnton, New York Review of Books, dated Feb 12, 2009 but already on line at - <http://www.nybooks.com/articles/22281>. Archived at -
<http://www.webcitation.org/5e6Qtv7Xs> in case the NYRB removes stuff from their site after a while.

BTW, the A2K list members send very good info about copyright, copyleft and culture. Pity the list archive doesn't have an RSS feed or a search engine of its own, but as the archive is public, you can find relevant messages in a search engine by adding A2K to the search words.

Best,
Claude

To: Peter Jones, Claude Almansi
From: Gary Hall
Subject: Re: Review of Digitize This Book!
Date: 29 January 2009 20:39:39


Dear Claude, and Peter,

Actually, this is an area that Steve Green, who set up the CSeARCH archive with me, and who is also one of the co-founders of the Culture Machine journal I co-edit, is very interested in. One of his issues with the new design of Culture Machine - we recently moved over to Open Journal Systems as part of Open Humanities Press (http://www.openhumanitiespress.org), which is something else I'm also involved in - is that the text size, even at the highest option, is actually quite small, which might make it difficult to read for some people with poor vision. Also the fonts and colours can't be selected. Steve and a colleague are currently giving the new version of Culture Machine an accessibility audit for us. I'll let you know the results when I have them.

In the meantime, I've also raised the issue with my colleagues at Open Humanities Press, just to make certain we address it there, especially with regard to the open access book publishing strand we are in the process of establishing.

As far as Digitize This Book! is concerned, I'm afraid I don't mention accessibility in terms of the screen readers used by blind people there, no. I did have plans to include a chapter critiquing the notion, frequently heard within open access debates, that making academic research available online OA means it is available everywhere, for everyone, for ever. My intention was to do so partly in terms of political economy (not everyone can afford access); partly in terms of language (what happens about translation issues and costs); and partly in terms of the geopolitics of academic publishing (whereby there are a few nations at the centre of this world who are exporting, and in effect universalising, their knowledge, and a whole host of other nations outside the centre of the academic and publishing networks who may be able to import ‘universal’ theory, but who don’t have enough opportunities to publish, export or even develop their own ‘universal’ theory to rival those of Foucault, Derrida, Deleuze, Mouffe, Agamben, Badiou, Butler, Latour, Negri, Rancière et al). However, I must confess that in the little work I'd done on this chapter, I hadn't addressed the issue you raise regarding the use of text images.

(I'm aware that Google Books and The European Digital Library use images of texts, but I hadn't made the connection to the screen readers you mention.)

In the end the chapter I had planned didn't make it into Digitize This Book!, for reasons of time and space. But I may include it in the book I'm working on at the moment. If I do I'll certainly endeavour to address the issue there.

Thanks for bringing it to my attention. I'm very grateful.

My best to you both,
Gary

13 Feb (update) Additional links (c/o Deborah Elizabeth Finn - Information Systems Forum):

http://onlineadvocacy.tacticaltech.org - developed by Tactical Tech -
http://www.tacticaltech.org