
The value of XML and HTML in society today: a short analysis and reflection


XML was originally developed by the World Wide Web Consortium (W3C) to overcome the limitations of HTML, the markup language for web page content. XML, the ‘Extensible Markup Language’, owes its name to the much greater freedom it allows, compared with HTML, to define new markup terms for different purposes. Its uses are correspondingly many. It has been described as a basis for the ‘semantic web’, that is, the creation of a web of ‘linked data’ through the development of varied schemas and customised markup vocabularies (or ‘languages’).(1) It has frequently been applied to data exchange and information management. Its usefulness for the former is such that XML has become the basis for ‘most electronic commerce applications’, and this has long been its ‘most popular’ usage.(2) However, XML has also proved valuable in information management and, to some extent, in publishing.
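
To illustrate the freedom XML allows, here is a minimal sketch in Python using only the standard library’s `xml.etree.ElementTree`. The purchase-order vocabulary (`order`, `customer`, `item`, `quantity`, `price` and the `sku` attribute) is invented for this example and is not a real e-commerce schema; the point is that XML itself defines none of these terms, so any two parties can agree on their own and exchange data in them.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# A hypothetical purchase-order vocabulary: none of these element or
# attribute names is defined by XML itself; they are made up for the example.
ORDER_XML = """\
<order id="A-1001" currency="EUR">
  <customer>Example Books Ltd</customer>
  <item sku="BK-42">
    <quantity>3</quantity>
    <price>12.50</price>
  </item>
  <item sku="BK-77">
    <quantity>1</quantity>
    <price>30.00</price>
  </item>
</order>
"""

def order_total(xml_text: str) -> Decimal:
    """Parse an order document and sum quantity * price over its items."""
    root = ET.fromstring(xml_text)
    total = Decimal("0")
    for item in root.findall("item"):
        quantity = int(item.findtext("quantity"))
        price = Decimal(item.findtext("price"))  # Decimal avoids float rounding on money
        total += price * quantity
    return total

if __name__ == "__main__":
    root = ET.fromstring(ORDER_XML)
    print(root.get("id"), order_total(ORDER_XML), root.get("currency"))
```

A receiving system only needs to know the agreed element names to process the order; a different trading partner could use an entirely different vocabulary with the same parser.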

In library and information studies, one good example of its use has been the creation of the Encoded Archival Description (EAD) standard to better link the contents of various marked-up archival catalogues, reflecting the intrinsic value of XML to the development of useful metadata standards.(3) A key feature of XML, as of all good standards, is that it is non-proprietary: its use does not depend on particular commercial software. Indeed, an XML document can be created, read and shared offline as well as online. This trait might be said to reflect the fact that ‘a formative influence’ on the creation of XML was the pre-existing Text Encoding Initiative (TEI), which has been called ‘the de facto standard for literary computing’ for the past few decades.(4) The TEI was motivated by the desire to create digital scholarly editions of texts that can be preserved perpetually.
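
As a rough illustration of how a shared standard lets separately produced catalogues be combined with free, non-proprietary tools, the following Python sketch reads two simplified finding aids and merges their collection titles and dates into one index. The fragments borrow a few genuine EAD element names (`ead`, `archdesc`, `did`, `unittitle`, `unitdate`) but are heavily abbreviated: they omit the required `eadheader` and the EAD namespace, so they are not valid EAD documents, and the collections described are hypothetical.

```python
import xml.etree.ElementTree as ET

# Two simplified, hypothetical finding aids. Real EAD files also carry an
# <eadheader> and a namespace declaration, omitted here for brevity.
CATALOGUE_A = """<ead><archdesc level="collection"><did>
  <unittitle>Papers of a printing firm</unittitle>
  <unitdate>1790-1830</unitdate>
</did></archdesc></ead>"""

CATALOGUE_B = """<ead><archdesc level="collection"><did>
  <unittitle>Records of a parish vestry</unittitle>
  <unitdate>1820-1900</unitdate>
</did></archdesc></ead>"""

def collection_summary(xml_text: str) -> tuple[str, str]:
    """Return (title, date) from the collection-level <did> element."""
    did = ET.fromstring(xml_text).find("archdesc/did")
    return did.findtext("unittitle").strip(), did.findtext("unitdate").strip()

def combined_index(*catalogues: str) -> list[tuple[str, str]]:
    """Because every catalogue uses the same element names, one function can
    read them all and merge the results into a single sorted index."""
    return sorted(collection_summary(c) for c in catalogues)

if __name__ == "__main__":
    for title, date in combined_index(CATALOGUE_A, CATALOGUE_B):
        print(f"{title} ({date})")
```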

A common mechanism in both XML and TEI is the formal definition of particular document types. One schema language for doing so, the Document Type Definition (DTD), originated with SGML: a DTD declares which elements and attributes a type of document may contain and how they may be nested. A document must be ‘well formed’ (syntactically correct, with a single root element and properly nested tags) to count as XML at all, but only a document that also conforms to its schema is ‘valid’. Well-formed XML and a valid TEI document are therefore not the same thing. The schema language favoured by the TEI programme is RELAX NG, standardised by the International Organization for Standardization (ISO), rather than the W3C’s XML Schema, and while the TEI Guidelines for encoding texts are remarkably extensive, they are also very specific.(5) In total, the TEI Guidelines define 503 elements and 210 attributes, organised into 21 modules. In 1995, however, a simplified subset of the full TEI encoding scheme, ‘TEI Lite’, was produced; it consists of only 145 elements. TEI Lite has been judged to ‘meet the needs of 90% of the TEI community 90% of the time’.(6)
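
The difference between being well formed and being valid can be sketched with Python’s standard library. `xml.etree.ElementTree` checks only well-formedness; it does not validate against a DTD or a RELAX NG schema, which requires a dedicated validator. The `CONTENT_MODEL` dictionary below is therefore a hypothetical and drastically simplified stand-in for a real schema, borrowing a few TEI element names without the TEI namespace; it is not the TEI schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy content model: element name -> child elements it may contain.
# A real DTD or RELAX NG schema also constrains order, attributes and text.
CONTENT_MODEL = {
    "TEI": {"teiHeader", "text"},
    "teiHeader": set(),
    "text": {"body"},
    "body": {"p"},
    "p": set(),
}

def is_well_formed(xml_text: str) -> bool:
    """Well formed = syntactically correct XML (one root, properly nested tags)."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

def is_valid(xml_text: str, root_tag: str = "TEI") -> bool:
    """Valid (in this toy sense) = well formed AND every element obeys CONTENT_MODEL."""
    if not is_well_formed(xml_text):
        return False
    root = ET.fromstring(xml_text)
    if root.tag != root_tag:
        return False
    for element in root.iter():
        allowed = CONTENT_MODEL.get(element.tag)
        if allowed is None:  # element not declared in the model at all
            return False
        if any(child.tag not in allowed for child in element):
            return False
    return True

broken = "<TEI><text><body><p>Unclosed paragraph</body></text></TEI>"
well_formed_only = "<TEI><text><p>Paragraph outside body</p></text></TEI>"
valid = "<TEI><teiHeader/><text><body><p>Hello</p></body></text></TEI>"

for label, doc in [("broken", broken), ("well formed only", well_formed_only), ("valid", valid)]:
    print(label, is_well_formed(doc), is_valid(doc))
```

The second document parses without error, so it is well formed, but it places a paragraph directly inside `text`, which the toy model forbids; a real schema rejects such documents in the same way.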

Although ‘XML exists because HTML was successful’,(7) the two languages play different roles. XML is usually used for the back-end process of data management (making it particularly useful for maintaining very large websites, such as online archives and commercial ventures), whereas HTML might be described as the markup language used for the ‘front page’ presentation of information online. Indeed, the very existence of HTML is fundamentally tied to the development of the Internet as a medium of communication,(8) which has had an impact on society comparable to that of mass print journalism in the mid-nineteenth century or of television in the mid-twentieth century.(9)
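
The division of labour described above, with XML holding the data and HTML presenting it, can be sketched in a few lines of Python. The `<archive>`/`<record>` vocabulary is hypothetical, and real sites would more often use XSLT or a templating system for this step; the sketch only shows that the same XML data can be rendered into HTML without itself being changed.

```python
import html
import xml.etree.ElementTree as ET

# Back end: data stored in a hypothetical XML vocabulary.
RECORDS_XML = """<archive>
  <record id="r1"><title>Minute book</title><year>1847</year></record>
  <record id="r2"><title>Letters &amp; accounts</title><year>1852</year></record>
</archive>"""

def records_to_html(xml_text: str) -> str:
    """Front end: render each <record> as an HTML list item."""
    items = []
    for record in ET.fromstring(xml_text).findall("record"):
        record_id = html.escape(record.get("id"))
        title = html.escape(record.findtext("title"))  # re-escape '&' etc. for HTML
        year = html.escape(record.findtext("year"))
        items.append(f'  <li id="{record_id}">{title} ({year})</li>')
    return "<ul>\n" + "\n".join(items) + "\n</ul>"

if __name__ == "__main__":
    print(records_to_html(RECORDS_XML))
```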

HTML has transcended two limitations of traditional print media, namely being bound to a physical format and bearing the associated costs of production, precisely because an HTML file, or ‘document’, is essentially a computer file that can be viewed remotely using a web browser. The name itself, Hypertext Markup Language, reflects the central role of hypertext, the technology that allows links to be created on the web and arguably the most important feature of HTML. Hyperlinks are what enable HTML-based projects to present multimedia content (audio and visual material as well as text) in one location, or to link files held on different servers by means of Uniform Resource Locators (URLs).(10)
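
Because links are ordinary markup (an `href` or `src` attribute holding a URL), a program can read them straight out of a page. The following sketch uses Python’s standard `html.parser` module to list the URLs an HTML document points to, including embedded media; the sample page and its `example.org` addresses are placeholders.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect URLs from <a href>, <img src> and <audio>/<source src> attributes."""

    URL_ATTRIBUTES = {"a": "href", "img": "src", "audio": "src", "source": "src"}

    def __init__(self) -> None:
        super().__init__()
        self.urls: list[tuple[str, str]] = []  # (tag, url) pairs in document order

    def handle_starttag(self, tag, attrs):
        wanted = self.URL_ATTRIBUTES.get(tag)
        for name, value in attrs:
            if name == wanted and value:
                self.urls.append((tag, value))

PAGE = """<!DOCTYPE html>
<html><body>
  <p>See the <a href="https://example.org/catalogue.html">catalogue</a>.</p>
  <img src="images/letter.jpg" alt="Scanned letter">
  <audio controls><source src="https://example.org/interview.mp3" type="audio/mpeg"></audio>
</body></html>"""

collector = LinkCollector()
collector.feed(PAGE)
for tag, url in collector.urls:
    print(tag, url)
```

The image uses a relative URL (resolved against the page’s own address), while the link and the audio file point to absolute URLs that could sit on entirely different servers.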

Like many computer files, an HTML file is alterable and versatile. It can be combined with other technologies (including Cascading Style Sheets, or ‘CSS’) to extend its formatting, or presentation, options. Its functionality can be enhanced with Hypertext Preprocessor (PHP) code, which can turn HTML files, or web pages, into ‘dynamic pages’ whose content is generated on request from data held in a Relational Database Management System (RDBMS), facilitating the creation of ‘big data’ from website content.(11) The possibility of altering web pages in this way is what gave rise to the idea of interactive, as opposed to static, websites (a development first nicknamed ‘Web 2.0’). The options available for defining their functionality are also what enable pages to be ‘responsive’: they can be designed to display differently depending on the device on which they are viewed. They can also be made easier to find through search engines by embedding ‘meta[data] tags’, or associated keywords, in the documents.
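
The idea of a ‘dynamic page’ can be sketched without PHP. Below, Python’s built-in `sqlite3` module stands in for the relational database and a small function plays the role of a server-side script, building the HTML on request from whatever rows the database currently holds. The `posts` table, its columns and the keywords are hypothetical. The generated page also carries a `viewport` meta tag (commonly used in responsive design) and a `keywords` meta tag of the kind described above.

```python
import html
import sqlite3

def build_page(conn: sqlite3.Connection, keywords: list[str]) -> str:
    """Generate an HTML page from the current contents of the 'posts' table."""
    rows = conn.execute("SELECT title, body FROM posts ORDER BY id").fetchall()
    articles = "\n".join(
        f"<article><h2>{html.escape(title)}</h2><p>{html.escape(body)}</p></article>"
        for title, body in rows
    )
    return f"""<!DOCTYPE html>
<html>
<head>
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <meta name="keywords" content="{html.escape(', '.join(keywords))}">
  <title>Posts</title>
</head>
<body>
{articles}
</body>
</html>"""

# Hypothetical in-memory database standing in for a site's RDBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
conn.execute("INSERT INTO posts (title, body) VALUES (?, ?)", ("XML and HTML", "A short reflection."))
print(build_page(conn, ["digital humanities", "XML", "HTML"]))

# Adding a row changes the next page served, without editing any HTML file.
conn.execute("INSERT INTO posts (title, body) VALUES (?, ?)", ("Open ethos", "Is it radical?"))
print(build_page(conn, ["digital humanities"]).count("<article>"))  # prints 2
```

The design choice being illustrated is the separation of content (rows in a database) from presentation (the template), which is what lets one script serve many different pages.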



The centrality of XML to commercial transactions in the business world is undoubtedly the most valuable example of the use of text encoding in society today. Citing specific examples is not practicable, however, because the schemas used in commercial transaction systems are necessarily kept confidential to protect their integrity. If this is the reality of the world of data exchange, what can we say in conclusion about text encoding in the world of publishing?

Like any language, a markup language is only as valuable as the uses to which it is put. Markup has been defined as ‘any means of making explicit an interpretation of a text’, while a knowledge of markup techniques has been described as ‘a core competence of digital humanities’, so much so that text encoding (including TEI) ‘should be a central plank’ of digital humanities curricula. This is because text encoding creates ‘the foundation for almost any use of computers in the humanities’.(12) Effective practice depends on the existence of effective standards. This is why the creation, through the TEI, of schemas and standards for the presentation, processing and preservation of literary documents in digital form is undoubtedly important. However, an ability to use HTML for web design and XML for information management is also valuable. As a practice, digital humanities is related to library and information studies and to archival science. However, the ‘digital humanities’ is also a scholarly discipline, in the sense that it exists to encourage all students of the humanities not only to become literate in text-encoding techniques but also to realise their value both in pursuing research questions and in presenting research answers. In so far as this technological reorientation takes place, scholars in the humanities may be said to be following what has already occurred in government and business in terms of the effective management and presentation of information (a.k.a. data), so that it can be more readily processed with a specific purpose in mind.
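
Markup as ‘making explicit an interpretation of a text’ can be illustrated with a short TEI-style passage in which an encoder has identified a person, a place and a date. The TEI elements `persName`, `placeName` and `date` (with its `when` attribute) are genuine, but the sentence, the name and the date are invented for the example, and the fragment is not a complete TEI document. The Python sketch shows how such encoded interpretations become machine-processable.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented sample sentence: the tags record the encoder's interpretation that
# "Maria Brennan" is a person, "Galway" a place and the date a specific day.
PASSAGE = """<p xmlns="http://www.tei-c.org/ns/1.0">
  <persName ref="#brennan">Maria Brennan</persName> wrote from
  <placeName>Galway</placeName> on
  <date when="1850-03-03">3 March 1850</date>.
</p>"""

root = ET.fromstring(PASSAGE)
people = [e.text for e in root.findall("tei:persName", TEI_NS)]
places = [e.text for e in root.findall("tei:placeName", TEI_NS)]
dates = [e.get("when") for e in root.findall("tei:date", TEI_NS)]  # normalised ISO dates
print(people, places, dates)
```

Once interpretations are encoded in this way, the same query could be run across thousands of transcribed documents, which is one reason text encoding is described as foundational to computing in the humanities.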


(2) Benoit Marchal, XML by Example (Indianapolis, 2000), 2 (quote), 6-7.

(4) Julianne Nyhan, ‘Text encoding and scholarly digital editions’, in C. Warwick, M. Terras and J. Nyhan (eds), Digital Humanities in Practice (London, 2012), 117 (quote).

(7) Benoit Marchal, XML by Example (Indianapolis, 2000), 7 (quote).

(8) Lee M. Cottrell, HTML and XHTML Demystified (New York, 2011), chapter 1.

(10) Lee M. Cottrell, HTML and XHTML Demystified (New York, 2011), 4.

(12) C. Warwick, M. Terras and J. Nyhan (eds), Digital Humanities in Practice (London, 2012), 121.


Is the open ethos of Digital Humanities something radical?


Miriam Posner of UCLA has recently suggested that digital humanities scholarship has an unrealised potential for radicalism through “critically investigating structures of power”. This is not necessarily a new idea. Almost a decade ago, Clay Shirky and other commentators suggested that the combination of free access to digital information and open-source scholarship was bound to challenge the long-established practices of institutions, be they educational or even governmental. This idea may seem far less novel today, given that open-source scholarship now has governmental support in both the United States and the European Union. Nevertheless, Posner suggests that digital humanities scholarship still has the capacity to promote new ideas through the novel interrogation of sources and, in turn, the raising of new debates.

An example of this may be seen in the extensive responses to her own discussion piece. These responses are accessible from her website and link one forum for debate with another, in the process potentially bringing greater vitality to each debate. For instance, Posner’s observations regarding how “profoundly ideological is the world being constructed around us with data” were linked by Angus Grieve-Smith to his own forum on the subject of “Technology and Language”. This expands upon her point that how we classify information, or even ideas, through language has tremendous power to shape how people think.

Does the digital research community reflect fully on the humanistic implications of this reality? One might be inclined to answer this question with a simple “no”. For instance, Posner (who comes from a film-studies background) refers to the work of her UCLA colleague Anne Gilliland, a leading figure in Library and Information Studies. Over the past decade, Gilliland has been a frequent contributor to new journals such as Archival Science on the role of Library and Information Studies specialists (such as herself) in redefining the archival profession through the development of new metadata standards for describing both records and collections. However, new metadata standards have not altered the traditional governmental objective of archives, which is to quantify information about both individuals and organisations so as to create records that enable more efficient governance.

It would be no exaggeration to say that this process of record creation has underpinned the concept of governance ever since the days of the Roman Empire and, in turn, shaped the very idea of civilisation itself. A logical process of recording and processing information has long been typified as the bedrock of civilisation, outside of which one may find only the chaotic world of nature where, in the absence of a logically ordered society, there is only disorder, ignorance and unreformed barbarism. Nobody may like the idea of their personal identity being reduced to statistical information within a governmental record, but without this process political society would essentially not exist.

This raises an interesting point: if an individual or organisation wishes to champion a particular cause (for instance, a campaign for social justice), then how they categorise that cause may, in itself, be the touchstone of its chances of success. Posner’s interest in feminist critiques of film studies prompts her to focus on ideas of gender and even race. Someone interested in archival science or history might focus more specifically on the concept of citizenship, for few words carry more consequential connotations than that term. If one can view classical civilisation as a root of our own, this was not only due to its preoccupation with the exercise of logic but also because it introduced the legal concept of citizenship. More often than not, this was defined against an idea of slavery. “Citizens” were protected by the law, while those who were not citizens had no legal rights at all and were generally classified as “slaves” or “barbarians” (hence the idea that society was a question of championing “civilisation” against “barbarism”). When people conceive of miscarriages of justice, or unwarranted subjugations of people, in our own day, the first concept that generally springs to their minds (or, indeed, to their lips) is still the question of “civic rights”. Is Posner, therefore, essentially focusing on the idea of a “digital citizen”?

The term “digital citizen” has recently been coined to promote the idea of responsible use of the internet. It rests on a moral code of respecting and protecting both oneself and others online, and on an essentially legal code based on respect for intellectual property. But can a “digital citizen” be more than this? Can the advent of the information highway allow non-governmental organisations to play a part in refining or, indeed, improving whatever ideas of citizenship may exist within the societies they inhabit? There is evidence that this process may already be underway, in which case one may argue from Posner’s perspective that digital humanities scholars may have a key role to play in creating a meaningful concept of “digital citizenship” in the years ahead. In this, they may benefit from participating in debates with archivists regarding the relationship between human rights and recordkeeping. This has been a regular theme of archival conferences in recent years, and a debate encouraged not least by the South African archivist Verne Harris. Professor Anne Gilliland of UCLA will be speaking on this broad theme at the Liverpool University Centre for Archive Studies on 28 November 2016.