Knowledge Blog – Scientific Publishing for the Web Generation (http://knowledgeblog.org)

Moving to Github (Mon, 11 Mar 2013) http://knowledgeblog.org/284

Most of the knowledge blog software is now transitioning to GitHub from its old home on Google Code. The code has been split up into multiple repositories. Although this makes them harder to browse, it does make them easier to work with. It should also make it easier for others to offer contributions, which are always welcome.

Knowledgeblog is Live (Mon, 13 Feb 2012) http://knowledgeblog.org/252

After a significant period, knowledgeblog is now live again. I can only apologise for the downtime. It clearly doesn’t do this site any good to be offline for so long. The sad story is that we were hacked and that restoration took a lot longer than it should have. As far as we can tell the content has been fully restored.

We have attempted to take this as positively as possible. It has given us a chance to clean up the server, as well as to increase its size, which should put us on a more stable footing for the future. We have also moved all mailing list functionality off the server and onto Google.

During the downtime, all of the content was available from the UK Web Archive, but this would not have been obvious to most people, as the link was only available from this site: rather a chicken-and-egg situation which I hope we can address in the future.

Please let us know about any problems you find. Otherwise, welcome back!

Introducing the Bioinformatics Knowledgeblog (Tue, 28 Jun 2011) http://knowledgeblog.org/234

Last week (21st June), a small group of bioinformaticians gathered in a little room in the basement of a building on the campus of Newcastle University, and quietly bashed out 17 articles about their discipline which now form the basis of the Bioinformatics Knowledgeblog.

The majority of these articles are tutorials, covering everything from large data integration suites like Ondex and cloud/grid computing infrastructure to metabolic modelling and some of the many facets of Ensembl.

We spent most of the day writing articles, and after about 4 hours dedicated to this, almost everyone had a post ready to publish (the advantage of short, tutorial-style articles is that they are quick to write). By the end of the day all of the participants had published at least one piece, and many of them were reviewed on the day too (of the 17 articles, 10 have at least one review). The blog has also received short reviews from people who did not participate in the write-a-thon.

The blog has made a reasonably successful start, garnering around 1,400 page views in its first week, thanks to some positive Twitter traffic on the launch day, and a nice post on the Inside-R blog, cross-posted in a few places.

We would like to take this opportunity to thank all of the workshop participants, who worked so diligently to produce such an excellent resource on the day. We would also like to invite contributions from other bioinformaticians who have tutorial material, or similar, that they would like to reach a larger audience, or who have an idea for a short article that they would like to see hosted alongside similar material. All knowledgeblogs are made available under a CC-BY license, and we will provide every article with a DOI. If you have a contribution to make, contact admin@knowledgeblog.org.

The Ontogenesis Knowledgeblog: Lightweight Semantic Publishing (Tue, 07 Jun 2011) http://knowledgeblog.org/128
The web has moved from a minority-interest tool to one of the most heavily used platforms for publication. Despite originally being designed by and for academics, it has left academic publishing largely untouched; most papers are available on-line, but as PDFs, and are most easily read once printed. Here, we describe our experiments with using commodity web technology to replace the existing publishing process; the resource describing ontologies that we have developed with this platform; and, finally, the implications that this may have for publishing in a semantic web framework.

Authors

Phillip Lord, Newcastle University, Newcastle-upon-Tyne, UK

Simon Cockell, Newcastle University, Newcastle-upon-Tyne, UK

Daniel C. Swan, Newcastle University, Newcastle-upon-Tyne, UK

Robert Stevens, University of Manchester, Manchester, UK

Introduction

The Web was invented around 1990 as a light-weight mechanism for publication of documents, enabling scientists to share their knowledge in the form of hypertext documents. Although scientists, and later most academics, like the rest of society, have made heavy use of the web, it has not had a significant impact on the academic publication process. While most journals now have websites, the publication process is still based around paper documents, or electronic representations of paper documents in the form of a PDF. Most conferences still handle submissions in the same way. Books on the web, for example, are often limited to a table of contents.

For the authors (certainly from our personal experience), the process is dissatisfying; book writing is time-consuming, tiring and takes a number of years to come to fruition. If the book has one or a few authors, it tends to reflect only a narrow slice of opinion. Multi-author collected works tend to be even harder work for the editor than writing a book solo. Books do not change frequently; they are therefore out-of-date as soon as they are available. Authors feel a greater pressure for correctness, as they will have to live with the consequences of mistakes for the many years it takes to produce a second edition; most scientists welcome feedback, but being asked to justify something you wish you had not said becomes tiresome, especially if you are waiting to update it.

For the consumer of the material (either a human reader or a computer), the experience is likewise limited. Books on paper are not searchable, not easy to carry around, rarely cheap to buy and more commonly very expensive. For the computer, the material is hard to understand, or to parse. Even distinguishing basic structure (where do chapters start, who is the author, where is the legend for a given figure) is challenging.

All of this points to a need to exploit the Web for scientists to publish in a different way than simply replicating the old publishing process. Here, we describe our experiment with a new (to academia!) form of publishing: we have used widely-available and heavily used commodity software (WordPress [7]), running on low-end hardware, to develop a multi-author resource describing the use of ontologies in the life sciences (our main field of expertise). From this experience, we have built on and enhanced the basic platform to improve the author experience of publishing in this manner. We are now extending the platform further to enable the addition of light-weight semantics by authors to their own papers, without requiring authors to directly use semantic web technologies, and within their own tool environment. In short, we believe that this platform provides a ‘cheap and cheerful’ framework for semantic publishing.

The requirements

The initial motivation for this work came from our experience within the bio-ontology community. Biomedicine is one of the largest domains for the use of ontology technology, producing large and complex ontologies such as the Gene Ontology [28] or SNOMED [27].

As an ontologist, one of the most common questions that one has is: ‘where is there a book or a tutorial that I can read which describes how to build an ontology?’. Currently, there is some tutorial information on the web and there are some books, but there is no clear answer to the question. Many of the books are collections of research-level papers, or are technologically biased. Currently, many ontologists have learned their craft through years of reading mailing lists, gathering information from the web and by word of mouth. We wished to develop a resource with short and succinct articles, published in a timely manner and freely available.

We also wished, however, to retain the core of academic publishing. This was for reasons pragmatic, principled and political. Consider, for example, Wikipedia, which could otherwise serve as a model. Our own experience suggests that referencing Wikipedia can be dangerous: it can and does change over time, meaning that critical or supportive comments in other articles can be ‘orphaned’. Wikipedia maintains a ‘neutral point-of-view’ which, in the opinion of many, makes it less suitable for areas where knowledge is uncertain and disagreement frequent. Finally, Wikipedia is relatively anonymous in terms of authorship: whether this affects the quality of articles has been a topic of debate [17], but was not our primary concern; pragmatically, the promotion and career structure for most academics requires a form of professional narcissism; they cannot afford to contribute to a resource for which they cannot claim credit. Of course, our experiences may not be reflective of the body academic overall; there has, for example, been substantial discussion of the issues of expertise on Wikipedia itself [8]. Although the reasons may not be clear, it is clear that academics largely do not contribute to Wikipedia, and that Wikimedia sees this as an issue [16].

We also had an explicit set of non-functional requirements. We needed the resource to be easy to administer and low-cost, as this mirrored our resource availability; authors should be offered an easy-to-use publishing environment with minimal ‘setup’ costs, or they would be unlikely to contribute; readers should see a simple, but reasonably attractive and navigable website, or they would be unlikely to read.

The Ontogenesis experience

Our previous experience with the use of blog software within academia was limited to ‘traditional’ blogging: short pieces about either the process of science (reports about conferences or papers, for example); journalistic articles about other people’s research; or personal blogging, that is, articles by people who just happen to be academics. Although we wished to develop different, more formal content, this experience suggested that many academics find blogging software convenient, straightforward enough and useful.

To test this, we decided to hold a small workshop of 17 domain experts over a two-day period, and task them with generating content, conducting peer-review of this content and publishing it as articles on a blog.

Terminology and the Process

Like many communities, the blogosphere has developed its own, and sometimes confusing, terminology. To describe the process we adopted, we first define some of this terminology. A blog is a collection of web pages, usually with a common theme. These web pages can be divided into: posts, which are published (or posted) on an explicit date and then unchanged; and pages, which are not dated and can change. Posts and pages have permalinks: although they may be accessible via several URLs, they have one permalink that is stable and never changes. Posts and pages can be categorised – grouped under a predefined hierarchy – or tagged – grouped using ad hoc words or phrases defined at the point of use. A blog is usually hosted with a blog engine, such as WordPress, which stores content in a database and combines it with style instructions (themes) to generate the pages and posts. Most blog engines support extensions to their core functionality with plugins. Most blogs also support comments: short pieces of content added to a post or page by people other than the original authors. Most blog engines also support trackbacks, which are bidirectional links: normally, a snippet from a linking post will appear as a comment in the linked-to post. Trackbacks work both within a single blog and between different distributed blogs. Many blogs support remote posting: as well as using a web form for adding new content, users can also post from third-party applications, through a programmatic interface using a protocol such as XML-RPC, or even by email. Posts and pages are ultimately written in headless HTML (that part of HTML which appears inside the body element), although the different editing environments can hide this fact from the user.
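
As a concrete illustration of remote posting, the sketch below publishes a post over XML-RPC using only the Python standard library. WordPress normally exposes XML-RPC at /xmlrpc.php and implements the metaWeblog API used here; the endpoint, credentials and post content are placeholders, so this is a sketch rather than a description of any particular K-Blog setup.

import xmlrpc.client

ENDPOINT = "http://example.knowledgeblog.org/xmlrpc.php"  # hypothetical blog endpoint
USER, PASSWORD = "author", "secret"                        # placeholder credentials

server = xmlrpc.client.ServerProxy(ENDPOINT)

post = {
    "title": "What is an ontology?",
    "description": "<p>The headless HTML body of the post...</p>",
    "categories": ["under review"],  # categorisation drives the review process described below
}

# metaWeblog.newPost(blogid, username, password, struct, publish)
post_id = server.metaWeblog.newPost("1", USER, PASSWORD, post, True)
print("Published post", post_id)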

Our initial process was designed to replicate the normal peer-review process, with a single adjustment: peer-review was open rather than blind. Papers would be world-visible once submitted; the identities of reviewers would be known to authors; and all reviews would be public. We adopted this approach for pragmatic reasons: WordPress has little support for authenticated viewing and none for anonymisation. The full process was as follows:

  • Authors write their content and publish it using whichever tooling they find appropriate.

  • The author posts their content, categorising it as under review.

  • An editor assigns two reviewers.

  • Reviewers publish reviews as posts or comments. Reviews link to articles, resulting in a trackback from article to review.

  • The author modifies the post to address reviews.

  • Once done to the editor’s satisfaction, the post is recategorised as reviewed.

Our expectation was that following this process, articles would not be changed or updated; this is in stark contrast to common usage for wiki-based websites. New articles could, however, be written updating, extending or refuting old ones.

Reflections on the Ontogenesis K-Blog

Our initial meeting functioned to ‘bootstrap’ the Ontogenesis K-Blog. This was useful to acquire a critical mass of content, but also, on this first outing, to explore the K-Blog process and technology. The setup for the day was a vanilla WordPress installation. The day started with a short presentation on the K-Blog manifesto [22] and an overview of the process, including authoring and reviewing. The guidelines to authors were to write short articles on an ontology subject (a list of suggestions was offered and authors also made their own choices) and to produce the article in whatever manner they felt appropriate. There was a certain level of uncertainty among authors as to the K-Blog process (partly because one of the objectives of the meeting was to ‘force out’ the process) and this, naturally, pointed to the need to document the K-Blog process so that authors could have the typical ‘instructions to authors’.

This first meeting produced a set of 20 completed and partially completed articles. Some even had reviews. Even on the day itself there was some external interest visible on Twitter. The first external blog post (outside of those produced by attendees) appeared during the meeting [19], with a second shortly after [18].

We also held a second content provision meeting, and together these generated a collection of articles that felt like an academic book in terms of content, but generated with considerably less effort. This experience was also sufficient to gather requirements on how to improve the K-Blog idea. A useful K-Blog article on the K-Blog process itself was produced by Sean Bechhofer [13]. There is also a K-Blog article looking back on the first year of the Ontogenesis K-Blog [23].

Several requirements emerged with respect to authorship. The principle of the short, more or less self-contained article was attractive (though the audience were somewhat self-selecting). Authoring directly in the editor provided by WordPress was felt to be poor by those who tried it. Authoring in a favourite editing tool and then publishing via WordPress worked reasonably well for most authors. There were, however, a variety of issues with the mechanics of this style of publishing, such as referring to articles that would be, but had not yet been, written. To some extent this was an artefact of the day (many articles being written simultaneously), but authors needed to refer to glossaries and articles in progress.

One stylistic issue was the habit of putting full affiliations at the top of an article. The Ontogenesis theme presents the first few lines when displaying many articles, but in many cases this simply showed the title and author affiliations, where it would be more useful to show the first sentence or so of the article itself.

For the whole K-Blog, a table of contents was felt to be important. This would give an overview of the contents and a simple place for navigation around the K-Blog. This raised the issue of attribution: the table of contents needed to expose the authors, including multiple, ordered authors. This is not a surprising need, as the authors’ scientific reputation is involved. In this vein, making K-Blog articles citable by issuing Digital Object Identifiers (DOIs) was requested.

For scientific credibility, the ability to handle citations easily was an obvious requirement. Natively, WordPress has little or no support for styling citations and references. The ability to cite via DOIs and, in this field, PubMed identifiers, automatically making links and producing a reference list, was felt to be important. Having the Ontogenesis K-Blog articles in PubMed would also be attractive to authors.

The last authorship issue was the mutability of articles. One aim of K-Blog is to enable articles to change in the light of experience and scientific development, as well as the procedural requirement for updates following review. There was felt to be a conflicting need for articles not to change, so that comments and links from other documents work in the longer term.

The last significant issue was the reviewing of articles. The aim was to have this managed by authors choosing reviewers (with editorial oversight). On the Ontogenesis K-Blog day this could work, with authors calling across the room for a review; it is not, however, a sustainable approach. WordPress lacks tracking facilities to manage the reviewing process, whether this is done by an author or an editor. The realisation that such management support is needed is not the greatest insight ever gained, but the requirement is there even in a light-weight publishing mechanism.

Improvements to the technology

Our initial experiment with the Ontogenesis K-Blog suggested a significant number of issues with the use of WordPress for scientific publication. In this section, we describe the extensions that we have made, or used, to the publication process, the documentation, and WordPress itself. Following our initial experience with Ontogenesis, we have started to trial these improvements, including through another workshop which resulted in a new K-Blog [12] describing the scientific workflow engine Taverna [24]; work is also in progress on the use of a K-Blog for bioinformatics [1], and another for public healthcare [3].

Currently, we have 11 plugins extending the basic WordPress environment. For completeness, all of these are shown in Table 1. Our theme is also extended in some places to support the plugins. In general, the plugins are orthogonal and will work independently of each other. One advantage of using WordPress is that many of these plugins are freely available, written and maintained by other authors; while other academic publication environments, such as the Open Journal System [5], exist and are relatively widely used, WordPress is used to host perhaps 10% of the web, making the plugin ecosystem extremely fertile.

Co-Authors Plus: Allows K-Blog posts to have more than one author. http://wordpress.org/extend/plugins/co-authors-plus/
COinS Metadata Exposer †: Provides COinS metadata on K-Blog posts (used by Zotero, Mendeley etc). http://code.google.com/p/knowledgeblog/
Edit Flow: Gives editorial process management infrastructure. http://editflow.org/
ePub Export: Exports K-Blog posts as ePub documents. http://wordpress.org/extend/plugins/epub-export/
KCite: Automatic processing of DOIs and PMIDs into in-text citations and bibliographies. http://knowledgeblog.org/kcite-plugin
Knowledgeblog Post Metadata Plugin: Exposes generic metadata in post headers. http://code.google.com/p/knowledgeblog/
Knowledgeblog Table of Contents: Produces a table of contents based on a category of articles; posts are listed with all authors. http://knowledgeblog.org/knowledgeblog-table-of-contents-plugin
MathJax-LaTeX: Enables use of TeX or MathML in posts, rendered in scalable web fonts. http://knowledgeblog.org/mathjax-latex-wordpress-plugin
Post Revision Display: Publicly exposes all revisions of an article after publication. http://wordpress.org/extend/plugins/post-revision-display/
SyntaxHighlighter Evolved: Syntax-highlights source code embedded in posts. http://wordpress.org/extend/plugins/syntaxhighlighter/
WP Post to PDF: Allows visitors to download posts in PDF format. http://wordpress.org/extend/plugins/wp-post-to-pdf/

Table 1: WordPress plugins employed by K-Blog. Several of these plugins are written by the authors; plugins marked with † are modified by the authors.

Reviewing: The initial process was self-managed and required two reviews per article; this was found to be cumbersome. We have addressed this in two ways. First, we have defined a number of different peer-review levels (public review, author review, editorial review [15]), including a light-weight process now being used for Ontogenesis; authors now select their own reviewers, and decide for themselves when articles are complete. Second, we have added software support. Initially, we attempted to use Request Tracker, an open source ticket system, but found the user interface too complex for this purpose. We are now using the Edit Flow plugin for WordPress, which was designed for managing a review process, albeit a hierarchical rather than a peer-review process.

Authoring Environment: The standard WordPress editor was found impractical by most authors, even for short articles. WordPress does provide ‘paste from Word’ functionality, but this removes all formatting, which defeats the point. While the lack of a good editing environment could have been a significant problem, our subsequent experimentation has shown that it is possible to post directly from a wide variety of tools, including ‘office’ tools such as Word, Google Docs, LiveWriter and OpenOffice. This is in addition to a variety of blog-specific tools and text formats (such as asciidoc), which are suitable for some users. We have added documentation to a kblog (http://process.knowledgeblog.org) to address these. In practice, only LaTeX proved problematic, having no specific support. To address this, we have produced a tool called latextowordpress; this is an adaptation of plasTeX, a Python-based TeX processor, to produce simplified HTML appropriate for WordPress publishing. Our experience with using these tools is that, while none are perfect, sometimes requiring ‘tweaking’ of the HTML in WordPress, most reduce publishing time to seconds.

Citations: We have addressed the lack of support for citations within WordPress with a plugin called kcite. This allows authors to add citations into documents as shortcodes with either a DOI or PubMed ID (other identifiers can be, and are being, added to kcite). Shortcodes are a commonly used form of markup of the form [tag att="att"]text[/tag]; they are often found where a simplified, HTML-like markup is desired. A bibliography is then generated automatically on the web server. Requiring authors to add markup to otherwise WYSIWYG tools is damaging to the user experience. We believe that this is soluble, however, by extending bibliographic tools, by developing a ‘kcite’ style file or template; we have a prototype of this (using CSL [10]) for Zotero and Mendeley, and another for asciidoc with bibtex. It is also possible to just use native tool support in Word or LaTeX, and convert bibliographies to HTML; the disadvantage of this approach is discussed later.
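
As an illustration of the kind of server-side lookup a citation plugin of this sort performs, the sketch below resolves a DOI to bibliographic metadata and formats a simple reference entry. The CrossRef REST endpoint is real, but the exact services kcite calls, and the formatting shown, are assumptions made for the purpose of the example.

import requests

def reference_for_doi(doi: str) -> str:
    # Fetch bibliographic metadata for a DOI from the CrossRef REST API.
    resp = requests.get(f"https://api.crossref.org/works/{doi}", timeout=10)
    resp.raise_for_status()
    work = resp.json()["message"]
    authors = ", ".join(a.get("family", "") for a in work.get("author", []))
    year = work.get("issued", {}).get("date-parts", [[None]])[0][0]
    title = (work.get("title") or [""])[0]
    return f"{authors} ({year}). {title}. doi:{doi}"

print(reference_for_doi("10.1000/xyz123"))  # placeholder DOI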

Archiving and Searching: Archiving is primarily a social, rather than technological, problem. A blog engine is fully capable of storing content in the long term, but authors and readers have to believe that it will do so. As a novel form of academic publishing, K-Blog is not automatically archived in the way a scientific journal would be. However, we have taken advantage of its web publication; the main K-Blog site is now explicitly archived by the UK Web Archive, as well as implicitly by other web archives. We have enhanced the website with an ‘easy crawl’ plugin, that is, a single web page pointing to all articles classified as reviewed. We now support the (technical) requirements for LOCKSS and PubMed. Simultaneously, this also enhances the searchability of K-Blog, fulfilling the requirements for Google Scholar.

Non-repudiability: The K-Blog process does not allow authors to make semantically meaningful changes after an article has been reviewed. Unfortunately, it is hard to define ‘semantically meaningful’ computationally, so we have made no attempt to address this by locking articles; rather, all versions of articles are now accessible to the reader (WordPress provides this facility to the authors by default). This enables community enforcement of a no-change policy.

Multiple Authors: We believe that authoring is best done outside WordPress. This also means that we do not support multiple authorship; we have made no attempt to add collaborative features to WordPress. However, we did need articles to carry a byline attributing the articles to multiple authors; although not critical to the functioning of a K-Blog, it is socially critical to appease the professional narcissism (see ‘The requirements’) of scientists. Fortunately, this is a common requirement, and a suitable WordPress plugin existed.

Identifiers: WordPress already supports permalinks; although we believe that URLs are entirely fit for purpose technologically, while DOIs do little other than introduce complexity [11], K-Blog required DOIs for professional narcissism. We considered becoming a DOI authority, but this proved impractical. Instead, we have used DataCite [2]. This has required a small extension to WordPress to extract appropriate metadata and to store the DOIs once minted.
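
For concreteness, the sketch below shows the final step of minting against a DataCite MDS-style service: registering the mapping from a newly minted DOI to the post’s permalink. The endpoint path, the test prefix, the credentials and the permalink are placeholders, and the (required) metadata registration step is omitted for brevity; this is a sketch of the mechanism, not the extension’s actual code.

import requests

MDS = "https://mds.datacite.org"
AUTH = ("DATACENTRE.SYMBOL", "password")                # placeholder credentials
doi = "10.5072/kblog-example-1"                         # 10.5072 is a DataCite test prefix
permalink = "http://ontogenesis.knowledgeblog.org/999"  # hypothetical post permalink

# Register the DOI-to-URL mapping; metadata must already have been deposited.
resp = requests.put(
    f"{MDS}/doi/{doi}",
    data=f"doi={doi}\nurl={permalink}".encode("utf-8"),
    headers={"Content-Type": "text/plain;charset=UTF-8"},
    auth=AUTH,
)
resp.raise_for_status()  # 201 Created on success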

Metadata: K-Blog now exposes various parts of its metadata in a number of ways; unfortunately, there appear to be a large number of (non-)standards in use, each with its own application. K-Blog currently provides: COinS, enabling integration with Zotero and Mendeley; meta tags for Google Scholar; and Dublin Core tags, for no reason other than completeness. We are in the process of providing bibtex export (for bibtex!), and a JSON representation to support citeproc-js [14] in the second generation of kcite.
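
As an illustration of what such exposure amounts to, the sketch below generates Google Scholar-style and Dublin Core meta tags for a post. The tag names (citation_title, citation_author, citation_publication_date, DC.title) follow the Google Scholar and Dublin Core conventions; the post dictionary and its values are hypothetical, and this is not the plugin’s actual code.

from html import escape

def post_meta_tags(post: dict) -> str:
    # Build the <meta> elements a plugin might emit into the page head.
    tags = [f'<meta name="citation_title" content="{escape(post["title"])}" />']
    tags += [f'<meta name="citation_author" content="{escape(a)}" />' for a in post["authors"]]
    tags.append(f'<meta name="citation_publication_date" content="{post["date"]}" />')
    tags.append(f'<meta name="DC.title" content="{escape(post["title"])}" />')
    return "\n".join(tags)

print(post_meta_tags({"title": "An Example K-Blog Article",
                      "authors": ["A. Author", "B. Author"],
                      "date": "2011/06/07"}))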

Mathematics and Presentation: We have also provided several pieces of technology that did not stem from concrete requirements arising from the initial Ontogenesis meeting. We have improved parts of the presentation system by adding, for example, syntax highlighting to code blocks. Additionally, we have created the mathjax-latex plugin, enabling the use of TeX (or MathML) markup in posts, which is then rendered in the browser using scalable fonts. WordPress has native math-mode TeX support, but this uses images, which do not scale and have an ugly, pixelated display.

Discussion

We have been motivated by a lack of enthusiasm for traditional book publishing to devise another mechanism by which we can achieve the same ends. We wished to avoid the downsides of an ‘all or nothing’ approach to creating a ‘static’ paper document that is read by relatively few people due to price. The K-Blog approach allows authors to publish in a piecemeal fashion, writing only that which they are motivated to write, using a mechanism that avoids a third party making arbitrary decisions on formatting with peculiar time-scales.

To avoid all this, the K-Blog is a light-weight publishing process based on commodity blogging software. We have taken an approach of writing short articles around a theme of ‘ontology in biology’: the Ontogenesis K-Blog. At the time of writing we have 26 articles and page viewing numbers that are pleasing (see Figure 1). These statistics are generated by WordPress directly, and represent (an approximation of) ‘real’ page reads, with robot and self-viewing removed. This is confirmed by the ten most read articles (Table 2), which reflect our expectations, with ‘What is an ontology?’ being first. In this sense, we consider the K-Blog process to be a success, especially when considered against the circulation of an equivalent book.

Figure 1: Monthly page view statistics for the Ontogenesis K-Blog.

What is an ontology? – 1,737
OWL Syntaxes – 1,246
Ontology Learning – 882
Table of Contents – 740
What is an upper level ontology? – 684
Reference and Application Ontologies – 630
Protege & Protege-OWL – 522
Semantic Integration in the Life Sciences – 517
Automatic maintenance of multiple inheritance ontologies – 469
Ontologies for Sharing, Ontologies for Use – 330

Table 2: Most viewed articles for the Ontogenesis K-Blog (total views).

The social processes within K-Blog are largely similar to traditional publishing, with one exception: reviewing is public. While we may have been interested in experimenting with this for principled reasons, in practice we adopted it because we did not know how to support blind, anonymous review with WordPress. Open review is not a new idea: Requests For Comments are common in standards processes, and both Nupedia [4] (the fore-runner of Wikipedia) and H2G2 [6] (which predates Nupedia) use public peer-review. It is still, however, unusual in academia. In our experience with Ontogenesis, it raised no concerns among our contributors, except that reviewers often wanted to be more involved in the proofing, a role normally played by authors further down the author list; an open review process blurs these lines somewhat.

One open area for discussion is the extent to which authors can, should and wish to change articles after publication. While the ability to update is inherent in the web, the desire for non-repudiability was considered to be important; the contradiction here appears fundamental, and we do not feel we have reached a good compromise yet. In one sense, our use of the post-revision display plugin solves this problem; even if the article changes, it is still possible to refer to a specific version. However, like all automated versioning tools, many versions get recorded, often with very fine-grained changes, which makes selection of the ‘right’ version hard or impossible. We could replace this with an explicit versioning tool, similar to a source code versioning system; but these systems are hard to use for those unused to them, as well as being difficult to implement well. An environment like K-Blog, however, does allow rapid publication of, and bi-directional linking with, articles; combined with typed linking using CiTO, the ability to publish errata, addenda and second editions may be a better solution.

Our experiences with K-Blog are, we think, useful in understanding how semantic web technology can and will impact on the publication and library process. Both from our initial work with Ontogenesis, and subsequent work with http://taverna.knowledgeblog.org, it has become obvious that good tool support is critical. ‘Good’ in this sense can be straightforwardly interpreted as ‘familiar’, which in general can be interpreted as MS Word. Our choice of a blogging engine here was (unexpectedly) well-advised, as this form of publication is already supported by many tools. It is also clear that there are many other tools that could be added; while Ontogenesis has the content, for example, that might be found in an academic book, it does not currently have the presentation of a book. Articles are already available as ePub, and more recent work has used our Table of Contents plugin to provide a single site-wide ePub of all articles [25]. Pre-existing tools such as Anthologize [9] may also be useful for adding organised collections of articles gathered from the whole.

This has a direct implication for the addition of further semantics to content. On the positive side, the use of WordPress makes semantic additions plausible in a way that many conventional publishing processes do not. For example, the publication of our (PWL, RS) recent paper [20] required conversion from the LaTeX source to PDF (by latex), to another PDF, to an MS Word file (by hand), and to XML, before arriving at the final HTML form. This process took many weeks and required multiple interactions between the authors and the publisher. It still failed to preserve the semantic use (to humans) of Courier font for highlighting in-text ontology terms, requiring post-publication correction. The equivalent blog post [21] gave us nearly instantaneous feedback on the final form, allowing us to check that the semantics were present and correct.

The requirements for semantics have, however, to be light. We have concentrated throughout K-Blog on the ease of delivery of content; even with this focus, it is hard. In most cases, asking for more work, for more semantics than authors are used to giving in papers, is problematic. For example, I (PWL) attempted to add microformat-based markup to Ontogenesis, again identifying ontology terms. So far, all article authors have ignored this markup (including, embarrassingly, myself).

One solution to this issue is to ensure that authors themselves benefit directly from extra semantics. For example, the MathJax-LaTeX plugin allows WordPress to present mathematics in TeX or MathML markup in the final document, which is more semantically meaningful than the default WordPress behaviour of rendering an image. From the author’s perspective, it also enables the use of TeX markup in Word, and the end product scales and looks less ugly on the web page.

With Kcite, we allow the user to embed DOIs or PubMed IDs; this can be achieved at no cost to the user if they already use a bibliography tool, as it can transparently produce citations for them using Kcite shortcodes. Development versions of Kcite already allow easy switching of bibliographic style, which we hope will become an option for the author (rather than the website or publisher, as is currently the case), and/or the reader. With this additional information, we can also embed more semantics into the end document at no additional cost to the author, using, for example, the least specific CiTO ‘cites’ term. However, further use of CiTO will require the author to decide which term to use, with relatively little gain to themselves, and may require extensions to bibliographic tools if we are to maintain the transparency of Kcite shortcodes; even if the tools are present, it is unclear whether authors will use them. We note that semantics useful to domain authors is likely to be domain-specific; mathematicians are more likely to care about maths presentation, but less likely to care about PubMed IDs. We need to be able to extend the publishing model and environment for different journals to cope.

From a technological perspective, we have found the use of shortcodes to be a good mechanism for authors to add semantics. They are simple and relatively easy to understand. In some cases they can be hidden from the user entirely; forcing users to add markup to otherwise WYSIWYG environments such as MS Word is best avoided. Although the direct use of a more standard XML markup would seem more sensible, in practice it requires tool support, as XML markup will be escaped by helpful remote posting tools. Extension of remote posting tools is hard (for tools like MS Word) or impossible (for cloud tools such as Google Docs or LiveWriter). A blogging engine such as WordPress makes it trivial to replace shortcodes both with a presentation format and a machine-interpretable microformat; for example, the development version of Kcite transforms DOI shortcodes ([cite]10.232/43243[/cite]) into in-text citations (Smith et al, (2002)) embedded in a span tag (<span kcite-id="10.232/43243">Smith et al, (2002)</span>) that is subsequently transformed into the final presentation form within the browser using JavaScript. The presentation form can also support additional semantic markup such as CiTO [26].
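
A toy re-implementation of this shortcode-to-span step, written in Python for illustration (the real Kcite plugin does this in PHP inside WordPress), is sketched below; the in-text rendering is a stub, since in Kcite the final form is produced in the browser by citeproc-js.

import re

CITE_RE = re.compile(r"\[cite\](?P<id>[^\[]+)\[/cite\]")

def render_intext(identifier: str) -> str:
    # Stub: a real implementation would resolve the DOI/PMID to author and year.
    return f"({identifier})"

def expand_shortcodes(html: str) -> str:
    # Replace each [cite]...[/cite] shortcode with a span carrying the identifier.
    return CITE_RE.sub(
        lambda m: f'<span kcite-id="{m.group("id")}">{render_intext(m.group("id"))}</span>',
        html,
    )

print(expand_shortcodes("As discussed elsewhere [cite]10.232/43243[/cite], ..."))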

Although we believe that additional semantics are a good thing, we will not enforce a requirement for additional semantics on authors. If authors choose not to use kcite, then this is their choice; we need to show that it is useful. Our experience with many (non-)standards, such as COinS, DOIs, OAI-ORE and LOCKSS, is that they are not simple, speaking primarily to publishers or librarians. For a semantic web approach to work, it must focus on authors and readers, as they produce and consume the content. Extracting even light-weight semantics from authors who are ontology experts is hard. For other domains, the situation may be worse.

Current publishing practices make the use of semantic web technology impractical; semantics added by authors are unlikely to be represented correctly if the end product is a PDF typeset by hand. Moreover, we can see little point in adding semantics to individual articles if this is done in a bespoke way. With K-Blog, we have focused on providing both content and a full process, with review, using existing tools and workflows, adding semantics secondarily or incidentally where we can. As a result, the level of semantics that we have achieved is light-weight. However, we believe that K-Blog and WordPress, combined with associated tooling, provide all the basic requirements for a publishing process, and that this provides an attractive framework on which to build a semantic web.

Acknowledgements

We would like to acknowledge the contribution of the authors of articles for both the Ontogenesis and Taverna K-Blogs, whose feedback was essential for this process. K-Blog is currently funded by JISC.

Bibliography

[1] Bioinformatics. http://bioinformatics.knowledgeblog.org.
[2] DataCite. http://datacite.org/.
[3] Health and Public Health. http://health.knowledgeblog.org.
[4] Nupedia. http://en.wikipedia.org/wiki/Nupedia.
[5] Open Journal System. http://pkp.sfu.ca/?q=ojs.
[6] The Guide to Life, the Universe and Everything. http://www.bbc.co.uk/h2g2/.
[7] WordPress. http://www.wordpress.org.
[8] Wikipedia: Expert retention, 2008. http://en.wikipedia.org/wiki/Wikipedia:Expert_retention.
[9] Anthologize, 2010. http://anthologize.org/.
[10] Citation Style Language, 2010. http://www.citations-styles.org.
[11] The problem with DOIs, 2011. http://www.russet.org.uk/blog/2011/02/the-problem-with-dois/.
[12] The Taverna Knowledgeblog, 2011. http://taverna.knowledgeblog.org.
[13] Sean Bechhofer. Reflections on blogging a book. Ontogenesis, 2011. http://ontogenesis.knowledgeblog.org/647.
[14] Frank Bennett. Citeproc-js. https://bitbucket.org/fbennett/citeproc-js/wiki/Home.
[15] Simon Cockell, Dan Swan, and Phillip Lord. Knowledgeblog types and peer-review levels. Process, 2010. http://process.knowledgeblog.org/archives/19.
[16] Zoe Corbyn. Wikipedia wants more contributions from academics, 2011. http://www.guardian.co.uk/education/2011/mar/29/wikipedia-survey-academic-contributions.
[17] Casper Grathwohl. Wikipedia comes of age. The Chronicle of Higher Education, 2011. http://chronicle.com/article/article-content/125899/.
[18] D. Kell. Metabolomics, food security and blogging a book, 2010. http://blogs.bbsrc.ac.uk/index.php/2010/01/metabolomics-food-security-blogging-book/.
[19] Jim Logan. What is an ontology? | Ontogenesis, 2010. http://ontogoo.blogspot.com/2010/01/what-is-ontology-ontogenesis.html.
[20] Phillip Lord and Robert Stevens. Adding a little reality to building ontologies for biology. PLoS One, 2010.
[21] Phillip Lord and Robert Stevens. Adding a little reality to building ontologies for biology, 2010. http://www.russet.org.uk/blog/2010/07/realism-and-science/.
[22] Phillip Lord and Robert Stevens. The Ontogenesis Manifesto, 2010. http://ontogenesis.knowledgeblog.org/manifesto.
[23] Phillip Lord and Robert Stevens. Ontogenesis: one year on. Ontogenesis, 2011. http://ontogenesis.knowledgeblog.org/1063.
[24] Tom Oinn, Mark Greenwood, Matthew Addis, M. Nedim Alpdemir, Justin Ferris, Kevin Glover, Carole Goble, Antoon Goderis, Duncan Hull, Darren Marvin, Peter Li, Phillip Lord, Matthew R. Pocock, Martin Senger, Robert Stevens, Anil Wipat, and Chris Wroe. Taverna: lessons in creating a workflow environment for the life sciences. Concurrency and Computation: Practice and Experience, 18:1067–1100, August 2006.
[25] Peter Sefton. Making ePub from WordPress (and other) web collections, 2011. http://jiscpub.blogs.edina.ac.uk/2011/05/25/making-epub-from-wordpress-and-other-web-collections/.
[26] David Shotton. CiTO, the Citation Typing Ontology. Journal of Biomedical Semantics, 1(Suppl 1):S6, 2010.
[27] M.Q. Stearns, C. Price, K.A. Spackman, and A.Y. Wang. SNOMED Clinical Terms: overview of the development process and project status. In AMIA Fall Symposium (AMIA-2001), pages 662–666. Hanley & Belfus, 2001.
[28] The Gene Ontology Consortium. Gene Ontology: tool for the unification of biology. Nature Genetics, 25:25–29, 2000.

Announcing a Bioinformatics Kblog writeathon (Mon, 09 May 2011) http://knowledgeblog.org/131

The Knowledgeblog team is holding a ‘writeathon’ to produce content for a tutorial-focused bioinformatics kblog.

The event will be taking place in Newcastle on the 21st June 2011.  We’re looking for volunteer contributors who would like to join us in Newcastle on the day, or would like to contribute tutorial material remotely to the project.

We will be sending invites shortly to a few invited contributors, but are looking for 15 to 20 participants in total.

Travel and accommodation costs (where appropriate) can be reimbursed.

If you would like to contribute tutorial material on microarray analysis, proteomics, next-generation sequencing, bioinformatics workflow development, bioinformatics database resources, network analysis or data integration, and receive a citable DOI for your work, please get in touch with us at admin@knowledgeblog.org.

For more information about Knowledgeblog please see http://knowledgeblog.org. For examples of existing Knowledgeblogs please see http://ontogenesis.knowledgeblog.org and http://taverna.knowledgeblog.org.

JISCMRD International Workshop (Tue, 29 Mar 2011) http://knowledgeblog.org/117

Phil Lord from the Knowledgeblog project has been at the JISCMRD International Workshop in Birmingham this week, where he gave a talk.

Here is a word cloud generated from the tweets from the meeting. @’s and #’s have been removed.

EDIT – some have suggested that the word cloud might be more interesting with the word “data” removed. So here is the alternative:

NB – Word clouds generated by Wordle.

MathJax-LaTeX version 0.2 released (Wed, 09 Feb 2011) http://knowledgeblog.org/106

The new release of Knowledgeblog’s MathJax-LaTeX plugin is now available from wordpress.org.

This release introduces the ability to configure the location of the MathJax JavaScript library for your system, so if you are already using MathJax on your server for some other purpose, you don’t have to install a second instance just for the plugin to work. Other than this important addition, the plugin functions as before; follow the installation instructions to get it working on your site.

Examples

Below are some examples of the syntax for MathJax-LaTeX, and the equations that are rendered in response.

The probability of getting \(k\) heads when flipping \(n\) coins:

[latex syntax='display']P(E)   = {n \choose k} p^k (1-p)^{ n-k}[/latex]

This is an inline equation: [latex syntax='inline']\sqrt{3x-1}+(1+x)^2[/latex]  it should be rendered without affecting the text around it.

OK, one more, definition of \(e\):

[latex syntax='display']e = \lim_{n\to\infty} \left( 1 + \frac{1}{n} \right)^n[/latex]

[mathjax]

The probability of getting \(k\) heads when flipping \(n\) coins:

\[P(E) = {n \choose k} p^k (1-p)^{ n-k}\]

This is an inline equation: \(\sqrt{3x-1}+(1+x)^2\)  it should be rendered without affecting the text around it.

OK, one more, definition of \(e\):

\[e = \lim_{n\to\infty} \left( 1 + \frac{1}{n} \right)^n\]

Taverna Knowledgeblog (Fri, 21 Jan 2011) http://knowledgeblog.org/103

We are pleased to announce that the new Taverna knowledgeblog is now available. Taverna is a widely used scientific workflow editor and processor. With a wide range of different users, the project has generated significant amounts of tutorial material over the years; the rapid publication environment provided by a kblog has provided a suitable framework for this material.

This initial material was generated at a workshop on 13th December, funded as part of the JISC knowledgeblog grant. The workshop was attended by 10 researchers who work with or on Taverna in some capacity. As Katy Wolstencroft of the University of Manchester said: “We have been planning to write a book about Taverna for some time, but traditional publishing methods are too slow to cope with the fast-paced development of our software and the resulting capabilities of the workflows. With the knowledgeblog, we were able to produce 21 new articles in a 1 day workshop, which we can add to whenever it is required”.

These articles range from a first-steps introduction to Taverna and workflows, through to advanced topics such as running R scripts in Taverna and case studies for real-world scenarios where Taverna may be an invaluable tool.

Research Outcomes (Thu, 13 Jan 2011) http://knowledgeblog.org/93

We’re pleased that the Research Outcomes blog has found us of interest. They are investigating ways to increase the accessibility of the outcomes of research projects, and are maintaining an active blog as a result. A by-product of this is that they are offering training on how to use WordPress.

It’s good to see another example of a project using commodity software as a mechanism for releasing the output of research.

A New Grant For Knowledge Blog (Mon, 02 Aug 2010) http://knowledgeblog.org/62

 

I’m very pleased to announce that knowledgeblog has received its first funding from JISC. This should enable us to enhance the process, the site and the content over the next year.

The grant proposal is attached here – the conversion from Word is imperfect, so apologies for this.

 

 

Outline Project Description

The project extends existing blogging tools for use as a lightweight, semantically linked publication environment. This enables researchers to create a hub in the linked-data environment, which we call a knowledge blog, or k-blog. K-blogs are convenient and straightforward for authors to use, integrating into researchers’ existing work practices and tools. They provide readers with distributed feedback and commenting mechanisms. We will support three communities (microarray, public health and workflow), providing immediate benefit, in addition to the long-term benefit of the platform as a whole. Additionally, this will enable a user-centric development approach, while showcasing the platform as the basis for next generation research publishing.

1. Introduction

This document describes a proposal for a project within the JISC “Managing Research Data” call. Data comes in many forms, from raw statistics, to highly structured databases, through to textual reports; natural language, although hard to search and manage, is still the richest form of representation, and data in the form of reports and publications are the central hub around which all other data sit. This project, therefore, will provide a lightweight, yet extensible, framework for scientific publishing, incorporating a software-supported peer-review process. Bi-directional links will be maintained both between publications and to other forms of data, using semantic markup to enhance the meaning of these links. We will also customise this framework for three communities which, as well as being directly useful, will provide real-world requirements. The project will largely develop “glue” between existing, widely-used, open-source software systems, ensuring its sustainability and usefulness past the end of the funding.

2. Fit to Programme Objectives and Project Outline


The project call identifies the complexity and hybrid nature of the UK research data environment; despite this, one central focal point remains: most researchers spend considerable amounts of time discussing their data in the form of “paper” publications. For some more theoretical disciplines, such as parts of computer science, the paper is the sole output; in others, such as biology, datasets are associated with papers and the barriers between “publication” and “data” are breaking down; most data sources in biology are rich in annotation, the text that supports and explains the raw data. It is normally the annotation, not the raw data, which defines the quality of the resource. In these cases, text is an intrinsic part of the data.

However, the conventional publication process has changed relatively little; web technologies have largely been adopted as a distribution mechanism. Publications are still expensive, either at subscription or publication time, depending on the business model of the publisher, and involve considerable, time-consuming interactions between author and publisher, often relating to display and presentation issues. This is in stark contrast to, for example, the biological data centres, where both raw and annotated data are often made available within hours of their generation.

This situation is unfortunate because it limits the ability of researchers to customise their publication process for the requirements of their own discipline. As demonstrated by Shotton et al. and Rousay et al., it is possible to add considerable value, both enhancing the paper for the reader and providing direct and semantically enhanced links to underlying data. The cost of the existing process, however, makes this form of publication unlikely for some data; for example, few scientists publish papers about negative results, resulting in an acknowledged publication bias. As a result, it is hard for the semantically enhanced publication to take its place as the central hub for a linked data environment as envisioned by Coles and Frey, linking to and between research datasets, and the published knowledge about these datasets.

In the last decade, the blog has become a common, web-based publication framework. There are now numerous off-the-shelf tools and platforms for managing blogs, providing a high degree of functionality. Many scientists blog about their work, about other published work (research blogging) or “live blog” about conferences and talks as they happen. In this case, the researcher is in charge of their own publication environment, can extend it to their requirements, and publication happens immediately. However, the blog has not yet become a standard means of publication for primary research output.

Recently, as part of the EPSRC-funded Ontogenesis network (ref), we trialled the Knowledge Blog process; in this case aimed at producing an educational resource describing many aspects of ontology development and usage, which might previously have been published in book form. We have shown that, with this technology base, it is possible to replicate many of the features of the open peer-review, scientific book publication process; following two small meetings, we have written around 20 articles, and the website maintains around 1000 post reads per month (not simple hits!). To achieve this, we used only two features of the blog: trackbacks (bidirectional links) and categories (hierarchical keywords); although we used the WordPress blogging software, these features are supported by most other systems. We call these articles k-blogs.

Currently, however, the k-blog process is not fully supported by blog software alone, nor does it fully support the referencing, advanced linking and provenance needed specifically for research publications. For this project, we propose to provide extensions to support data-rich publications, deeply and semantically linked to other k-blogs and to other forms of data repository. The project therefore addresses the objectives and aims of the call through four main work packages.

1) A documented k-blog process (WP1.1), describing different levels of peer-review suitable for different forms of research data, and an implementation (WP1.2), the k-blog platform, of this process based around open-source, off-the-shelf software.

2) Extensions to the k-blog platform supporting linking. This includes full support for referencing, including COinS metadata on posts (WP2.1), client-side and permanently linked versions (WP2.2) and bidirectional links (WP2.3) to other data sets. We will add semantics to these links using the Citation Typing Ontology (CiTO) (WP2.4).

3) Support for three specialist environments: healthcare (WP3.1), microarray (WP3.2) and workflows (WP3.3). All are useful in their own right and showcase the extensibility of the framework.

4) Documentation and tooling to integrate the k-blog process into scientists’ existing working practices and tooling; scientists will be able to publish from Word, OpenOffice, Google Docs or LaTeX (WP4.1). We will add tooling and documentation, as WP4.2, to support the use of reference management tools such as EndNote, Mendeley or Zotero, making use of deliverables from WP2.

3. Quality of proposal and Robustness of Workplan

 

3.1 WP1: Knowledge Blog Process

In this project, we aim to develop a light-weight publication framework, including the desirable aspects of the formal peer-review process. However, different forms of scientific publication require different levels of peer-review. For example, for http://ontogenesis.knowledgeblog.org, we require two reviews from an editorial board, assessing quality, appropriate for an educational resource. However, for http://process.knowledgeblog.org, which is intended to contain informal “how-to” and request-for-comment documents, a much lighter-weight, single editorial review assessing scope alone is more appropriate. Deliverable WP1.1 will consist of documentation describing, both formally and informally, a number of levels for the knowledge blog process, and how these can be achieved using a blog. These documents will, themselves, be published on http://process.knowledgeblog.org.

These processes will be implemented as Deliverable WP1.2, comprising freely available and widely used pieces of software, with additional “glue”. The basic publication framework will use WordPress 3 (WoP), an open-source, multi-site, multi-author blogging system used to provide the hosted blog service at http://www.wordpress.com. While we have found that WoP supports many aspects of this process, particularly from the reader’s perspective, a significant degree of “book-keeping” is required from authors, reviewers and editors. Readers know whether a paper has been reviewed or not, but authors have to remember for themselves who is reviewing the paper. Therefore, we will use a “ticket system”, specifically Request Tracker 3 (RT) (http://bestpractical.com/rt/). Both WoP and RT are extensible with plugins and will be extended and adapted to reflect the k-blog levels of WP1.1.

We will use this extensibility to provide a light-weight integration. RT operates as an email response system; by extending WoP to send email on submission of new papers, this can provide both an integration point and the main point of interaction for authors, reviewers and editors. To provide editorial and reviewer functionality, tickets can be moved between queues; extensions to RT will use standard blogging XML-RPC calls to feed back to WoP by, for example, re-categorising papers once accepted. OpenID (http://openid.net) will be used to integrate the user accounts between the two systems. WoP already supports this fully, while RT supports it in skeleton form.
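
As an illustration of this glue, the sketch below shows the kind of XML-RPC call an RT extension could make back to WordPress once a paper is accepted, re-categorising the post from “under review” to “reviewed” using the MovableType API that WordPress implements. The endpoint, credentials and post id are placeholders, and this is a sketch of the proposed mechanism rather than project code.

import xmlrpc.client

server = xmlrpc.client.ServerProxy("http://example.knowledgeblog.org/xmlrpc.php")
USER, PASSWORD = "editor", "secret"  # placeholder credentials
post_id = "42"                       # hypothetical post id passed over from the RT ticket

# Look up the id of the 'reviewed' category (assumed to exist), then attach it to the post.
cats = server.mt.getCategoryList("1", USER, PASSWORD)
reviewed = next(c for c in cats if c["categoryName"] == "reviewed")
server.mt.setPostCategories(post_id, USER, PASSWORD,
                            [{"categoryId": reviewed["categoryId"]}])
server.mt.publishPost(post_id, USER, PASSWORD)  # republish so the change is visible to readers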

Although we will provide an implementation of the k-blog process, the process will be described sufficiently generically to support complete and independent implementations.

 

3.2 WP2: References and Metadata

For k-blogs to become an integral part of the scientific record, they must fully support the semantic and linked-data environment. Although WoP supports standard URI-based linking to resources, and bidirectional "trackback" linking to other resources, it lacks complete functionality suitable for research communities; this is a rare example of functionality that is not already provided by WoP or an associated plugin. Deliverable WP2.1 will fulfil this need: we will support the insertion of at least DOIs and PubMed IDs (PMIDs), which will be resolved to full, human-readable reference lists for display, using APIs provided by CrossRef and NCBI eUtils respectively. To fully support computational agents wishing to access the same information, references will also carry COinS metadata, embedded into the display HTML.
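As a rough illustration of what the WP2.1 plugin will do, the sketch below resolves a DOI to bibliographic metadata and emits a COinS span alongside the human-readable output. It assumes DOI content negotiation (CSL JSON) as the resolution mechanism and includes only a minimal, illustrative subset of ContextObject fields; the DOI shown is a placeholder and does not resolve.

# Sketch of WP2.1-style reference resolution: turn a DOI into CSL JSON
# metadata (via DOI content negotiation) and emit a COinS span
# (Z39.88-2004 ContextObject) for computational agents. The ContextObject
# fields shown are a minimal, illustrative subset.
import html
import json
import urllib.parse
import urllib.request

def resolve_doi(doi: str) -> dict:
    """Fetch CSL JSON metadata for a DOI via content negotiation."""
    req = urllib.request.Request(
        "https://doi.org/" + urllib.parse.quote(doi),
        headers={"Accept": "application/vnd.citationstyles.csl+json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def coins_span(doi: str, meta: dict) -> str:
    """Embed minimal reference metadata as a COinS span in the post HTML."""
    fields = {
        "ctx_ver": "Z39.88-2004",
        "rft_id": "info:doi/" + doi,
        "rft.atitle": meta.get("title", ""),
        "rft.jtitle": meta.get("container-title", ""),
    }
    return '<span class="Z3988" title="%s"></span>' % html.escape(
        urllib.parse.urlencode(fields), quote=True)

if __name__ == "__main__":
    doi = "10.1000/demo.1"  # placeholder DOI; substitute a real one
    meta = resolve_doi(doi)
    print(meta.get("title"))
    print(coins_span(doi, meta))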

K-blog posts will also require outward-facing metadata that describes the resources they provide in a standards-compliant manner. The Open Archives Initiative (OAI) provides standards that aim to facilitate the efficient dissemination of content. Specifically, the Object Reuse and Exchange specification (OAI-ORE) is a standard for the description and exchange of compound digital objects (such as a WoP post or page). The WordPress OAI-ORE plugin provides link header elements that implement this specification.

Our initial investigations into the k-blog process showed that WoP support for versioning and provenance is lacking; the k-blog process involves updating papers after submission but before final acceptance. While WoP stores all of these versions, they are currently visible only to authors or editors through the administration interface. Whilst existing plugins for WoP already provide some of this functionality, Deliverable WP2.2 will expose these versions to readers, along with a defined permalink scheme giving access to all versions, providing full provenance.

WoP supports bidirectional links in the form of trackbacks; these are mediated by automated HTTP calls between resources when a link is made. This will support linking to data where, for example, the data is another k-blog; however, general data resources may lack support for this process. Therefore, as Deliverable WP2.3, we will provide a trackback proxy, hosted on the http://knowledgeblog.org server, storing and presenting these links on behalf of resources that cannot process trackbacks directly.
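A trackback ping is simply a form-encoded HTTP POST carrying the fields url, title, excerpt and blog_name, so the proxy itself can be small. The sketch below shows one possible shape for it, accepting pings on behalf of a named resource and listing them back; Flask, the URL layout and the in-memory store are illustrative choices only, not commitments of the proposal.

# Sketch of a WP2.3-style trackback proxy: accepts trackback pings on
# behalf of a data resource that cannot process them itself, and stores
# them so they can be displayed later. Flask and the in-memory store
# are illustrative choices.
from collections import defaultdict
from flask import Flask, request, Response

app = Flask(__name__)
PINGS = defaultdict(list)  # resource id -> list of received pings

TB_OK = '<?xml version="1.0" encoding="utf-8"?><response><error>0</error></response>'

@app.route("/trackback/<resource_id>", methods=["POST"])
def trackback(resource_id):
    # Standard trackback pings are form-encoded with these fields.
    ping = {k: request.form.get(k, "") for k in ("url", "title", "excerpt", "blog_name")}
    PINGS[resource_id].append(ping)
    return Response(TB_OK, mimetype="text/xml")

@app.route("/trackback/<resource_id>", methods=["GET"])
def list_pings(resource_id):
    # Simple JSON listing so a k-blog page can display the stored links.
    return {"resource": resource_id, "pings": PINGS[resource_id]}

if __name__ == "__main__":
    app.run(port=8080)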

To complete this work package, we will add semantics to these links using CiTO, as Deliverable WP2.4. As well as enabling easier data linking and provenance, this will enable the addition of meaning to the links themselves.
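In practice this can be as simple as typing the outgoing anchor with a CiTO property, which the sketch below illustrates; the choice of property (here cito:citesAsDataSource) and the target URL are purely illustrative, and the eventual plugin would let the author or editor choose the relation.

# Sketch of WP2.4-style semantic linking: render an outgoing link whose
# rel attribute carries a CiTO property (as a full IRI, one RDFa-style
# option), so that the citation has machine-readable meaning. The
# property and target URL below are illustrative only.
import html

CITO = "http://purl.org/spar/cito/"

def cito_link(target_url: str, text: str, relation: str = "citesAsDataSource") -> str:
    """Return an HTML anchor typed with a CiTO relation."""
    return '<a href="%s" rel="%s%s">%s</a>' % (
        html.escape(target_url, quote=True), CITO, relation, html.escape(text))

if __name__ == "__main__":
    print(cito_link("https://example.org/dataset/123",  # placeholder dataset URI
                    "the supporting dataset"))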

 

3.3 WP3: Specialist Environments

The k-blog platform and process are designed to be flexible and adaptable to the needs of specialist environments. We will use three main use cases to ensure the real-world applicability of the software, as well as to fulfil the immediate needs of these communities.

For Deliverable WP3.1, we will add features supporting the microarray community. Currently, the microarray community is well served in terms of metadata capture (MIAME) and deposition in public repositories (ArrayExpress, GEO). As part of WP2, we will support linking to these datasets through stable URIs. However, these resources deal only with data generation; post-processing and analysis are largely captured at the publication stage, often in supplementary material.

A substantial amount of this analysis uses BioConductor, a widely used, open-source platform for statistical microarray analysis based on the R statistical programming language. We will extend the k-blog with specific support for R and BioConductor. Authors will be able to embed code directly into k-blog papers, along with the figures that result; reviewers and readers will then be able to see a computationally precise description of the methods and to replicate the generation of the figures, should they choose.

Finally, we will investigate the possibility of publication to a k-blog using only R code and references to public databases, in a process similar to Sweave: figures will be generated on the server, providing guarantees of correctness and precise provenance. The limited scope of this call means this part of WP3.1 will be proof-of-principle only.
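As a proof-of-principle sketch only, the server-side step could look something like the following: embedded R blocks are extracted from a submitted post, run with Rscript in an isolated directory, and any figures the code writes are collected for inclusion in the rendered post. The block-delimiting convention, directory layout and figure-collection rule are assumptions made for illustration, and a production version would need sandboxing and resource limits.

# Proof-of-principle sketch for the Sweave-like part of WP3.1: extract
# embedded R code from a post, run it server-side with Rscript, and
# collect the figures it writes. The <pre class="r"> convention is an
# illustrative assumption, not a defined k-blog format.
import re
import subprocess
import tempfile
from pathlib import Path

R_BLOCK = re.compile(r'<pre class="r">(.*?)</pre>', re.DOTALL)

def run_embedded_r(post_html: str) -> list:
    """Execute embedded R code and return the PNG figures it produced."""
    code = "\n".join(R_BLOCK.findall(post_html))
    workdir = Path(tempfile.mkdtemp(prefix="kblog-r-"))
    script = workdir / "post.R"
    script.write_text(code)
    # The embedded code is expected to write its own figures,
    # e.g. png("figure1.png"); ...; dev.off()
    subprocess.run(["Rscript", str(script)], cwd=workdir, check=True)
    return sorted(workdir.glob("*.png"))

if __name__ == "__main__":
    example = ('Methods: <pre class="r">'
               'png("hist.png"); hist(rnorm(100)); dev.off()'
               '</pre>')
    print(run_embedded_r(example))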

For WP3.2, we will focus on the public health community (PHC): a key workforce in delivering high-quality and effective healthcare by providing timely and accurate public health intelligence (PHI). PHI is a varied activity involving statistical analyses that produce figures, diagrams and reports to communicate results to the wider health community. However, the PHC operates in small groups with little knowledge networking. The main aim of the k-blog here is to improve the availability of health information, data and knowledge, informing decisions on health protection and care standards as supported by the Quality Improvement Productivity and Prevention initiative. The NWeHealth e-Lab project, hosted at The University of Manchester, provides an environment that brings research objects together in a single location. As elsewhere, textual data forms the key hub that links together all the other forms of knowledge. By linking to e-Lab research objects from a k-blog, this link will be made explicit, available, interpretable and directly valuable to the PHC; as a result, WP3.2 is synergistic with the rest of the proposal. This community also brings a set of access-control requirements; to support these, we will use existing WoP facilities, providing a simple, easy-to-use, three-level access model.

 

For WP3.3, we will generate k-blog content about Taverna workflows and methods for building them. Workflows have become a popular way of realising computational analyses and an important form of data in their own right. The JISC-funded myExperiment project is widely used to disseminate the workflows themselves; knowledge about the issues surrounding workflows is, however, more difficult to produce and disseminate. A k-blog, with its ability to produce short, targeted articles as the need arises and as the resources for writing become available, suits the need for Taverna workflow documentation. We will seek k-blog articles on Taverna issues such as: the basics of workflow design; how to choose among a set of similar services when producing a workflow; and the testing of workflows. We will implement a light-weight mechanism, using trackbacks, to link between the k-blog and myExperiment.

 

As part of WP3, we will also hold four workshops, at three-month intervals, each focusing on one particular k-blog and community. These workshops will take the form previously trialled as part of the Ontogenesis network, and will serve several purposes: requirements gathering and feedback for us, education for the community, and the development of content that demonstrates the process to the general readership.

 

3.4 WP4: Integration with Existing Working Practices

For the k-blog process to be acceptable to communities such as those described in WP3, it must fit with existing working practices. Researchers mostly write documents using a word processor. Fortunately, as the k-blog platform is based on the widely used WoP, which in turn offers a widely supported API, this style of working can be readily integrated. It is already possible to author using Word (2007 onward), OpenOffice, Google Docs and LaTeX using integrated or existing technologies, as demonstrated by our previous work at http://ontogenesis.knowledgeblog.org. For Deliverable WP4.1, user-oriented documentation describing these tools will be developed. This documentation will also describe clearly how to present and organise papers in a way that is optimised for the k-blog process. While we expect this documentation to take a significant time to produce, and to be refined in response to user feedback, it is important to note that authoring to a k-blog is already possible and useful.

To take maximal advantage of the linking technologies developed in WP2, we will need to integrate with existing technologies for referencing. As Deliverable WP4.2, we will add tooling to enable the use of bibliographic tools such as EndNote, Mendeley, Zotero or BibTeX to insert references that a k-blog can directly translate. Largely, this should consist of "styles" modifying the in-text citation, as the reference plugin of WP2.1 will generate the reference lists. As with other deliverables, this tooling will include substantial documentation, developed using the k-blog process.
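One possible shape for this translation step is sketched below: in-text markers emitted by a reference manager style are rewritten into identifier-bearing links that the WP2.1 plugin could then resolve into a full reference list. The [cite]...[/cite] shortcode syntax, and the DOI and PMID handling, are assumptions made for illustration rather than a defined format.

# Sketch of the WP4.2 "styles" idea: rewrite in-text citation markers
# produced by a reference manager into identifier-bearing links that the
# WP2.1 reference plugin could resolve. The [cite]...[/cite] shortcode
# syntax is an illustrative assumption, not a defined k-blog format.
import re

CITE = re.compile(r"\[cite\](?P<id>[^\[\]]+)\[/cite\]")

def _render(match: re.Match) -> str:
    ident = match.group("id").strip()
    if ident.lower().startswith("pmid:"):
        pmid = ident.split(":", 1)[1]
        return '<a href="https://pubmed.ncbi.nlm.nih.gov/%s/">[PMID %s]</a>' % (pmid, pmid)
    # Otherwise treat the identifier as a DOI.
    return '<a href="https://doi.org/%s">[doi:%s]</a>' % (ident, ident)

def translate(post_html: str) -> str:
    """Replace citation shortcodes with resolvable in-text links."""
    return CITE.sub(_render, post_html)

if __name__ == "__main__":
    # The DOI here is a placeholder, not a real reference.
    print(translate("As shown previously [cite]10.1234/example.doi[/cite] ..."))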

4. Project Timeline

 

Name   | Start      | End        | Staff  | Notes
WP 1   | 02/08/2010 | 30/10/2010 |        |
WP 1.1 | 02/08/2010 | 31/08/2010 | All    | A documented k-blog process
WP 1.2 | 01/09/2010 | 30/10/2010 | DS, SC | Implementation with off-the-shelf software
WP 2   | 01/11/2010 | 30/04/2011 |        |
WP 2.1 | 01/11/2010 | 26/02/2011 | SC     | COinS metadata on posts
WP 2.2 | 01/11/2010 | 29/01/2011 | SC     | Client-side, permanently linked versions
WP 2.3 | 03/01/2011 | 26/02/2011 | DS     | Bidirectional links to other datasets
WP 2.4 | 01/03/2011 | 30/04/2011 | PL     | Semantic linking with CiTO
WP 3   | 01/11/2010 | 30/07/2011 |        |
WP 3.1 | 01/11/2010 | 30/07/2011 | DS     | Specialist environment: microarrays
WP 3.2 | 01/11/2010 | 30/07/2011 | GM     | Specialist environment: healthcare
WP 3.3 | 01/11/2010 | 30/07/2011 | RS     | Specialist environment: workflows
WP 4   | 02/08/2010 | 30/06/2011 |        |
WP 4.1 | 02/08/2010 | 30/04/2011 | GM, DS | Authoring documentation and tools
WP 4.2 | 02/05/2011 | 30/06/2011 | GM, SC | Referencing documentation and tools

 

5. Project Management Arrangements

The project will be managed from Newcastle University; primary management will be provided by Dr Lord, who will be responsible for:

  • Developing Project Management Plans;
  • Ensuring that the Project technical objectives are met;
  • Prioritising and reconciling conflicting opportunities;
  • Reporting to, and collaborating with, the JISC Programme Manager;
  • Dissemination of the k-blog platform.

Project progress will be evaluated through scheduled, short, "stand-up" meetings on a weekly basis, conducted face-to-face, via Skype or by phone as appropriate. Although most project staff are co-located, primary unscheduled communication will be via a public mailing list, ensuring maximum visibility and openness. User consultation will be via the public mailing list, as well as through a "dogfooding" k-blog. All project staff have been handpicked; they are highly experienced and self-directed, as outlined elsewhere. All are associated with several other projects and duties (research, research support, teaching and training), and are responsible for managing these independent workloads.

 

5.1 Risks

Staff risk: as with all projects, loss of staff could negatively impact this project; however, all staff are on permanent contracts and have long histories in research, so this is unlikely. Additionally, by dividing the work between five individuals, we limit the risk should a single person leave.

WoP 3 and other dependencies: the project depends on other software, most notably WoP, for which a new version (3.0) is now in beta; however, the software is widely supported. Other software is replaceable.

Shifting standards: the project depends on a number of standards, and these may change. In this project, we will not support standards; rather, we will use those standards that support us. Where standards change rapidly, their implementation will be delayed (until they stabilise) or dropped. None of the standards described here is critical to the success of the project.

 

5.2 IPR Position

All code will be developed under open-source licences. WoP and RT are licensed under the GPL, so code linking to them will be licensed likewise. Code that is separable will be released under the LGPL. Code will remain the copyright of the respective institutions or authors. Any documentation produced by project staff relating to the project will be licensed under a Creative Commons Attribution (CC-BY) licence. Licensing of individual k-blogs will be delegated, but permissive licences will be encouraged.

 

5.3 Sustainability

This project is largely based around the innovative, novel and leading use of existing software. As such, the sustainability of the majority of the technology base is not dependent on project members, but on large companies with established and proven business models. The k-blog process will be cleanly separated from its implementation, ensuring only weak dependencies on the underlying software. Where we produce software "glue", public and widely supported APIs will be used where possible; this will ensure that components are replaceable. All code, including historical versions, will be publicly available. Documents produced by project staff will be publicly available and clearly licensed, so they can be archived through internet "cloud" resources; we are also seeking explicit support for archiving from the British Library.

 

5.4 Staff Recruitment

All staff are already in post.

 

5.5 Key Beneficiaries

Our key beneficiaries are the public health, microarray and workflow communities; as the k-blog process is based around commodity software, these groups can use the basic environment from the first day of the project to generate and share content. As the project progresses, so will the process, the software that supports it and the documentation that explains it; at all stages, the k-blog process will fulfil a clear and immediate need. While we are specifically targeting these communities, the k-blog process and platform are sufficiently generic to support a wide range of research activities.

Although presented here as a single platform, the process and components are separable and can benefit communities independently. In particular, the tools and documentation from WP2 and WP4 will find use within the research blogging community, who find the lack of tooling for referencing particularly difficult. Finally, the statement of a peer-review process, and its implementation within RT, will be applicable to any peer-review environment regardless of the form of publication, including publications produced using wikis or other content management systems.

 

5.6 Engagement with Community

We consider mechanisms for engagement with four kinds of community. Engagement with our core content-generating community is an intrinsic part of this proposal, as described in WP3; further interaction with more disparate groups will be maintained through personal contacts, as each of the five individuals named in this proposal is experienced and embedded in a different community (health care, microarray, ontology, proteomics). Engagement with our core content-consuming community is, again, an intrinsic part of the proposal; all project communications will be via open mailing list or k-blog. Project members are active users of Web 2.0 social technologies; our initial trials as part of Ontogenesis showed this approach to be a highly effective form of dissemination, with minimal effort. Engagement with software users will be via the website and direct interaction; all software will be released and advertised via normal channels (website, version control and mailing list), including a Debian package repository for those wishing to set up their own server. Finally, developer communities will not be specifically targeted, but our open-source, continuously integrated development plan will be attractive, and we will accept suitably licensed contributions.

All communities will benefit from the open and agile development methodology we will adopt; changes to the environment will be integrated and released rapidly, ensuring continual improvement and facilitating rapid feedback cycles.

 

6. Previous Experience and Project Team

 

Dr. Phillip Lord is a Lecturer in Computing Science at Newcastle University. He has a PhD in yeast genetics from the University of Edinburgh, after which he moved into bioinformatics. He is well known for his work on ontologies in biology, as well as for his contributions to eScience, beginning with his role as an RA on the myGrid project. Since his move to Newcastle, he has been an investigator on three more eScience projects (CARMEN, ONDEX and InstantSOAP), as well as maintaining an active engagement in standards development (OBI, MIGS, MIBBI) and publishing on the fundamentals of ontology design. He was an active participant in the Ontogenesis network, and developed the initial idea for knowledge blogs as part of it. He is an active blogger and developer.

 

Dr. Georgina Moulton is an Education and Development Fellow at The University of Manchester. Since 2005, her main roles have been to co-ordinate the development and delivery of multi-disciplinary bio/health informatics education programmes, and to facilitate the engagement of biological and health communities in a variety of bio- and health-informatics research projects (e.g., ONDEX, Obesity e-Lab). For three years, Georgina was the EPSRC-funded Ontogenesis Network Manager, co-ordinating the activities of the network, expanding it by facilitating the development of new activities, and taking part in the trial k-blog process. More recently, her work includes the development and delivery, in conjunction with NHS partners, of an education and development programme tailored to the needs of North West public health analysts and the wider healthcare workforce.

 

Dr. Daniel Swan has a PhD in developmental biology and continued to work in developmental biology as a post-doctoral researcher before moving into bioinformatics in 2001. Subsequent positions included working for Bart's and the London Genome Centre and the Centre for Ecology and Hydrology in informatics-driven roles, dealing with large, distributed biological datasets generated by large user communities. Currently the manager of the Newcastle University Bioinformatics Support Unit, he leads a small team that helps biological researchers generate, capture, store and analyse their digital data. His interdisciplinary background gives him a grounding in both the computer and biological sciences, and he is comfortable working on CS-focused projects (CARMEN, InstantSOAP, Bio-Linux) as well as acting in a research capacity, analysing high-throughput data.

 

Dr. Simon Cockell has a PhD in Genetics from Leicester University, and refocused on bioinformatics with a Masters degree from Leeds in 2005. From there he moved to Newcastle, and to the Bioinformatics Support Unit. Since coming to Newcastle, Simon has worked on a range of projects involving large-scale analyses (AptaMEMS-ID), data integration (Ondex) and health informatics (MRC Mitochondrial Disease Cohort).

 

Dr. Robert Stevens is a Senior Lecturer in Bioinformatics in the Bio and Health Informatics group at the University of Manchester. His main areas of research are the development and use of semantics within the life sciences, blended with the use of eScience platforms to gather and manage the data and knowledge of the life sciences. He was PI on the Ontogenesis network that ran the meetings for the first k-blog. He is, or has been, a co-investigator on the myGrid and myExperiment grants, which will provide both content and technical input to this project. As well as the JISC-funded myExperiment project, Stevens was an investigator on the JISC-funded CO-ODE project that developed Protégé 4. On the back of this, Stevens has led the OWL training activities at Manchester that have directly fed into the Ontogenesis k-blog. This range of experience makes Stevens an ideal partner to lead the development of content within this project.


 

]]>
http://knowledgeblog.org/62/feed 3