Semantic Wave Hits STM Publishing, Part 2: Current Innovators

There is a lot of innovation in this space. The fundamental change is from commercial journal publishing to what is called “open access.” This is analogous to the move from commercial software to open source. Open access STM publishing may play out similarly to open source software, in which case we will see:

- A mix of commercial and open, but with the new default option being open.
- New value creation on top of open access by both the existing commercial publishers (the ones who manage their innovator’s dilemma effectively) as well as by entirely new ventures.

(Photo: Flickr/Joel Aron)

Open Access

Open access is the idea of making original research articles freely accessible on the web. It deconstructs the journal. The article is like a song on iTunes and the journal is like a CD.

(That analogy does not quite work. Scientists don’t want to “listen” to the song. They want to understand the harmony, find out exactly what instruments are playing and do semantic analysis on the lyrics.)

Open access is often defined by the stage in the publishing cycle:

• Stage 1: draft manuscript for consideration by a journal, often called a preprint

• Stage 2: peer reviewed “accepted manuscript,” ready for publication by a journal

• Stage 3: final published citable article available from the journal’s website. This is the “Version of Record.”

There are three options for open access:

1. Prior to publication. These are called “preprint servers.”

2. Immediately on publication. This is sometimes called full open access or the “gold” route.

3. Delayed open access, only available after an “embargo” period. This is what commercial publishers favor.

Types Of Innovation

We can see the following broad types of innovation:

1. Search Within A Single Scientific Discipline. This is analogous to vertical search in the commercial web market. For example: ChemSpider, released in March 2007, is a chemistry search engine. It aggregates and indexes chemical structures and their associated information into a single searchable repository and makes it available to everybody, at no charge. They describe the problem they are solving:

“There are tens if not hundreds of chemical structure databases and no single way to search across them. There are databases of curated literature data, chemical vendor catalogs, molecular properties, environmental data, toxicity data, analytical data and on and on.

The only way to know whether a specific piece of information is available for a chemical structure is to have simultaneous access to all of these databases. Since many of these databases are for profit there is no way to easily determine the availability of information within these commercial or even in the open access databases. With ChemSpider the intention is to aggregate into a single database all chemical structures available within open access and commercial databases and to provide the necessary pointers from the ChemSpider search engine to the information of interest. This service will allow users to either access the data immediately via open access links or have the information necessary to continue their searches into commercially available systems. The question ‘is there specific information about my chemical’ will be answered. Accessing the information may require a commercial transaction with the appropriate provider.”

2. Cross-Discipline Search. There are two big ones – Google Scholar and Wolfram Alpha. The latter is a fundamentally different approach, which we explore in another section.

3. Video Journals. One example is the Journal of Visualized Experiments (JoVE), a peer-reviewed, PubMed-indexed journal devoted to the publication of biological research in a video format. Here is how they describe the proposition:

“As every researcher in the life sciences knows, it can take weeks or even months to learn, perfect, and apply new experimental techniques. It is especially difficult to reproduce newly published studies describing the advanced state-of-the-art techniques. Thus, much time in the laboratory is spent learning techniques and procedures. This is a never ending process for experimental scientists as methodologies in this fast-growing field evolve and change with each coming year (e.g. genomics and proteomics, most dramatically). The time and resource-consuming process of learning and staying current with techniques and procedures is a rate-limiting step in the advancement of scientific research and drug discovery.”

Another example is the Video Journal of Orthopaedics. This is where you see:

“- The critical steps in a surgery with clear demonstrations.
- The rationale for why an intraoperative decision is made.
- Results of techniques, including complication rates.”

4. New Repositories. There are lots of these and they are the core of the open access movement. We dedicate a separate section to these.

5. Peer Networks. These are based on the social networking paradigm. We call them Peer Networks as the price of admission is verified status within the academic/scientific/professional domain. As these peer networks do not require the intermediation of a journal brand, they are fundamentally disruptive. We therefore dedicate a separate section to these Peer Networks.

6. Augmented Reality. This technology enables you to hold up a mobile device to a physical object and the service will overlay information about that object. The oft-quoted use case is overlaying Yelp reviews when you point your Android phone at a restaurant. But it may be that this “bleeding edge” technology gets its first traction in STM. Imagine the potential to find out more about any natural or man-made object by just pointing your phone at it. The researcher who does this can also annotate the record with their own findings on the spot.

7. Scientific Databases. These are different from journals. The data is not structured in the form of articles. You are looking at the underlying data sets. Two examples are GenBank and SciFinder:

GenBank is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences.

SciFinder, from the American Chemical Society (ACS), has aggregated and parsed millions of biological and chemical sequences and structures.

New Repositories

These are also sometimes called “citation indexes.”

The early pioneer was PubMed, a free database accessing the MEDLINE database of citations, abstracts and some full text articles on life sciences and biomedical topics. The United States National Library of Medicine (NLM) at the National Institutes of Health (NIH) maintains PubMed as part of the Entrez information retrieval system.
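For readers who want to query PubMed programmatically, NCBI exposes Entrez through its public E-utilities web interface. The snippet below is only a minimal sketch of how a search request might be assembled: the helper name `build_pubmed_search_url` and the example search term are illustrative assumptions, and the code builds the request URL without sending it.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_search_url(term: str, max_results: int = 20) -> str:
    """Return an ESearch URL asking PubMed for the IDs of matching citations.

    The function name and defaults are hypothetical, chosen for illustration.
    """
    params = {
        "db": "pubmed",          # search the PubMed database
        "term": term,            # free-text query
        "retmax": max_results,   # maximum number of IDs to return
        "retmode": "json",       # ask for a JSON response
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


# Example: build (but do not send) a query for a sample term.
url = build_pubmed_search_url("open access publishing", max_results=5)
print(url)
```

Fetching the resulting URL with any HTTP client should return a list of PubMed IDs for the query. Anything beyond light experimentation should follow NCBI's published usage guidelines.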

Wikipedia has a list of others as a good starting point:

• CiteSeer
• CiteULike
• The Collection of Computer Science Bibliographies
• DBLP (Digital Bibliography & Library Project)
• getCITED
• Web of Science (from Thomson Reuters Corp)
• Libra (Academic Search) (from Microsoft, free)
• Scirus (from Elsevier)
• Scopus (Elsevier)

There are two types:

1. Commercial, from major STM publishers, such as Thomson Reuters’ Web of Science and Elsevier’s Scopus and Scirus.

2. Open access and free, funded by institutions to promote science.

If you don’t see the repository that fits your needs, you can even build your own using SeerSuite:

“SeerSuite refers to a collection of open source tools that provide the underlying application software for creating academic search engines and digital libraries such as CiteSeerX, ChemXSeer, and ArchSeer. The collections of tools is now available on SourceForge under the Apache Software Foundation License.”

Peer Networks

There are three interesting ones that we found:

1. Nature Network

2. Mendeley

3. UniPHY

Nature Network is a free service from Nature Publishing Group (NPG), the publisher of Nature, with classic social networking features refined for a scientist:

- personal profile page
- group for your lab, department, institution or subject of interest
- forum
- follow your contacts
- blog
- listings of upcoming seminars and conferences


Mendeley is a free research management tool for desktop and web that bills itself as “Like iTunes for research papers.”

Mendeley Desktop organizes your research paper collection and citations. It automatically extracts references from documents, generates bibliographies, and is freely available on Windows, Mac OS X and Linux.

Mendeley Web lets you access your research paper library from anywhere, share documents in closed groups, and collaborate on research projects online. It connects you to like-minded academics and puts the latest research trend statistics at your fingertips.

Mendeley seems more like a personalization tool or personal information manager than a classic social network. However, if something like this becomes a trusted tool, it can evolve into a lot more.

UniPHY is from the American Institute of Physics (AIP) and bills itself as:

“SimpliPHY the search for colleagues and collaborators”

AIP UniPHY is a free online network linking 275,000 physical science professionals, academics, and serious enthusiasts worldwide. Anyone can join and explore the global connections they share through the commonality of publishing in the scholarly community. If you’ve published at least three topical papers, you have the beginnings of a customizable profile already set up on AIP UniPHY.

Personal Research Management Tools

One that is getting good traction is Zotero. As they describe themselves, Zotero is:

“a free, easy-to-use Firefox extension to help you collect, manage, and cite your research sources. It lives right where you do your work—in the web browser itself.”

The Web 2.0 lesson has been that simple tools with low barriers to adoption often win out over more complex solutions. Zotero’s evolution will be interesting to watch.

Wolfram Alpha: The Long Play On Complexity Science

The wildest out-of-the-box innovation is Wolfram Alpha. Contrast this with Google Scholar, the other cross-discipline search engine. Google Scholar is similar to Google’s other play in book publishing. They are playing their normal role, using their search engine to deliver extracts to searchers who then click to get the rest, thus delivering traffic to publishers.

Wolfram Alpha is fundamentally different both at a technical level and in terms of the end objective. To understand Wolfram Alpha one needs to understand Stephen Wolfram and his view on complexity science.

Wolfram’s view is that the science of the 21st century will be fundamentally different. He does not mean that 21st century science will be a better, faster, cheaper version of 20th century science. He means that it will be fundamentally different.

20th century science was about specialization. A physicist dug deep into her specialized domain and seemed to live in a totally different world from a biologist, or economist or mathematician. Complexity science, which evolved from chaos theory, found patterns that were common across all these disciplines.

To a non-scientist, this is intuitively obvious. All the sciences aim to describe the same world we all live in. The patterns must be the same.

If cross-discipline pattern recognition is key to 21st century science, then it is likely that something like Wolfram Alpha will play a big role.

This opens the intriguing possibility of bringing in the smart amateur, the person who sees a pattern that the specialists miss.

Google Scholar: Frenemy?

In the commercial publishing world there are two views of Google:

- Friend: they deliver traffic to my content
- Enemy: they take ad revenue away and commoditize my content

Usually, old media views Google as an enemy and new media views it as a friend – gross simplification alert ☺

The advertising industry has a similarly confused view of Google. Martin Sorrell, who runs WPP (the largest ad agency), coined the term “frenemy” to describe this.

How does the STM world view Google? One way or another Google will have a big impact on STM publishing. For background, here is an extract from the Wikipedia entry on Google Scholar:

“Released in beta in November 2004, the Google Scholar index includes most peer-reviewed online journals of Europe and America’s largest scholarly publishers. It is similar in function to the freely-available Scirus from Elsevier, CiteSeerX, and getCITED. It is also similar to the subscription-based tools, Elsevier’s Scopus and Thomson ISI’s Web of Science.

A significant problem with Google Scholar is the secrecy about its coverage. Some publishers do not allow it to crawl their journals. Elsevier journals were not included before mid-2007, when Elsevier began to make most of its ScienceDirect content available to Google Scholar and Google’s web search. As of February 2008 the absentees still included the most recent years of the American Chemical Society journals. Google Scholar does not publish a list of scientific journals crawled, and the frequency of its updates is unknown. It is therefore impossible to know how current or exhaustive searches are in Google Scholar. Nonetheless, it allows easy access to published articles without the difficulties encountered in some of the most expensive commercial databases.”

Google Scholar looks like a bit of a threat to commercial STM publishers. They face the same dilemma as other publishers. Do they take the adversarial approach that Rupert Murdoch takes to Google, basically by blocking their crawler? Or do they work out a fair use policy?

It looks like they have to work out a fair use policy. Rupert Murdoch’s News Corp really does own the content that it produces. It pays the journalists’ salaries. STM publishers don’t pay the scientists’ salaries.

Google’s problem in STM is how to have more relevant search than a) Wolfram Alpha, which has science at the core or b) scientific discipline specific search engines. As STM is today a sideshow for Google (it is a niche to them), it is possible that they will NOT win this one. It is complex and they are not focused on this.
