The “publish or perish” model of the academic world has followed much the same pattern since the middle of the last century. It takes about seven years to go from a researcher’s original “ah-ha” moment, to the publication of her paper, to the point where a critical mass of citations has formally gathered around it as others read the work and cite it in their own research, says Andrea Michalek, co-founder of startup Plum Analytics.
“Clearly the world moves much, much faster than that now,” she says. Researchers, for example, post slides from talks about their work online even before it is published, and tweets reference those discussions and link back to the content. “All this data exhaust is happening in advance of researchers’ getting those cited-by counts,” she says, and once a paper is published, the opportunities for online references to it grow.
Works live not only on the publisher’s web site but also in institutional repositories and, in many cases, in Open Access repositories. From there, it’s only a small step to Delicious bookmarks and other pointers to the research.
This new world is creating a need for alternative metrics, or altmetrics, as Plum calls them, to understand usage of and interaction with a researcher’s body of work. Those metrics can be delivered with the help of the Researcher Graph that Plum is building, which mines the web, social networks, and university-hosted data repositories, as well as other relevant offline sources, to create a map of the relationships between a researcher, his institution, his work, and those who follow or engage with it.
RDF is the underlying data model for this, and the University of Pittsburgh Library System (ULS) is the recently announced first customer for the solution. It plans to use Plum’s technology to provide aggregated open metrics for the university’s research output, as well as for individual researchers to measure their work’s impact beyond traditional citation metrics.
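To make the RDF framing concrete, here is a minimal sketch that represents a few researcher-graph relationships as subject-predicate-object triples in plain Python. The namespace, the predicate names (`affiliatedWith`, `authored`, `tweeted`) and the resources are hypothetical and are not Plum’s actual schema; a real deployment would use an RDF store queried with SPARQL rather than scanning a Python set.

```python
from typing import NamedTuple

# Hypothetical vocabulary: these URIs are illustrative, not Plum's schema.
EX = "http://example.org/researcher-graph#"


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def uri(local: str) -> str:
    """Expand a local name into a full URI in the hypothetical namespace."""
    return EX + local


graph = {
    Triple(uri("researcherA"), uri("affiliatedWith"), uri("univPitt")),
    Triple(uri("researcherA"), uri("authored"), uri("article1")),
    Triple(uri("researcherB"), uri("affiliatedWith"), uri("univOther")),
    Triple(uri("researcherB"), uri("tweeted"), uri("article1")),
}


def objects(g, subject, predicate):
    """Return every object linked from `subject` via `predicate`."""
    return {t.obj for t in g if t.subject == subject and t.predicate == predicate}


def subjects(g, predicate, obj):
    """Return every subject linked to `obj` via `predicate`."""
    return {t.subject for t in g if t.predicate == predicate and t.obj == obj}


# Who engaged with the articles researcher A wrote?
for article in objects(graph, uri("researcherA"), uri("authored")):
    print(article, "tweeted by", subjects(graph, uri("tweeted"), article))
```

Because every fact is a uniform triple, adding a new kind of relationship (a bookmark, a repository deposit) means adding a predicate, not changing a table layout.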
“We’re dealing with a complex space in that we are trying to look at both the individual researchers and their articles and also their affiliations,” says Michalek. “We are trying to get all that data together and that is part of why we feel strongly about how we model it – all those interactions do make a difference, so we can give people who are interested in the impact of research and those who are doing it a way to understand all that data.”
There’s more to the story of a work’s impact than raw counts of citations, she says, and it is often the relationships among data that reveal the stories inside it. The raw counts Plum Analytics collects are interesting, but “if you find from that Graph that certain sets of people tend to comment on these types of articles and they are all in the same discipline, then you can start to understand if you have engagement in that community or not,” she says. “Modeling the data to carry that kind of information is important.”
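The discipline-level question Michalek describes can be sketched as a simple aggregation over interaction records. The record layout, the sample data and the 50% `min_share` threshold below are hypothetical choices for illustration, not Plum’s method.

```python
from collections import Counter, defaultdict

# Hypothetical interaction records: (commenter, commenter's discipline, article topic).
interactions = [
    ("alice", "epidemiology", "vaccine-trials"),
    ("bob", "epidemiology", "vaccine-trials"),
    ("carol", "epidemiology", "vaccine-trials"),
    ("dave", "economics", "vaccine-trials"),
    ("erin", "economics", "labor-markets"),
]


def engagement_by_discipline(records):
    """Count distinct commenters per discipline for each article topic."""
    seen = defaultdict(set)
    for commenter, discipline, topic in records:
        seen[(topic, discipline)].add(commenter)
    by_topic = defaultdict(Counter)
    for (topic, discipline), people in seen.items():
        by_topic[topic][discipline] = len(people)
    return by_topic


def dominant_community(counts, min_share=0.5):
    """Return the discipline holding at least `min_share` of commenters, if any."""
    if not counts:
        return None
    total = sum(counts.values())
    discipline, n = counts.most_common(1)[0]
    return discipline if n / total >= min_share else None


stats = engagement_by_discipline(interactions)
print(dominant_community(stats["vaccine-trials"]))
```

Counting distinct people rather than raw comments keeps one prolific commenter from looking like a whole community.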
Leveraging the Graph
The more universities that sign on to the project, the more the Researcher Graph will develop, but Michalek says the company is taking a university-by-university approach. There’s value in having the Graph built up, as noted above, but even in its nascent stages there’s a data advantage, she says. “Even if you don’t know the who that you are interacting with, you have data right now…. You can show the impact of your research [community],” she notes.
For instance, for researchers still a few years away from their PhDs, the raw counts Plum collects make it possible for universities to see sooner that the work of certain individuals is gaining traction. “So if you figure that traditional measures won’t kick in until years and years later, that early data helps, and it doesn’t matter that you don’t know who you are interacting with.” In addition to seeing the impact of a work, universities can analyze what might be driving some researchers’ work while others’ stagnates. Perhaps, for instance, the researcher who is getting a lot of pickup is using Vimeo, and others in the department should be encouraged to take the same route. Or perhaps the university will find that too much energy is being wasted on videos that aren’t delivering a return on investment.
Individual researchers can benefit from the feedback loop, too: seeing engagement with their work can be more of a carrot than the stick of university mandates to put papers in institutional repositories or risk losing grants, raises, or other rewards.
In its work with ULS, Plum is in the initial stages of deciding where to start, choosing which researchers to map and creating researcher profiles that the librarians there will help vet. “We want to make sure we have quality vetted data about how we harvest metrics, associate people, roll this up to the department and so on, and then we also will be working with them to put an automated procedure in place to roll out at a larger scale across the university,” she says.
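Rolling metrics up from vetted researcher profiles to departments can be sketched as follows. The profile mapping, metric names and numbers are hypothetical; the design choice shown, keeping metrics for not-yet-vetted researchers in a separate bucket, is one assumed way to avoid attributing counts to the wrong department before librarians have confirmed a profile.

```python
from collections import Counter

# Hypothetical vetted profiles: researcher -> department.
profiles = {"r1": "chemistry", "r2": "chemistry", "r3": "history"}

# Hypothetical harvested metrics per researcher, keyed by metric type.
metrics = {
    "r1": Counter(downloads=120, tweets=8),
    "r2": Counter(downloads=30, bookmarks=5),
    "r3": Counter(downloads=12),
    "r9": Counter(tweets=3),  # not yet vetted, so no profile
}


def roll_up(profiles, metrics):
    """Sum each vetted researcher's metrics into their department's totals.

    Metrics for researchers without a vetted profile are collected separately
    rather than being attributed to any department.
    """
    departments = {}
    unassigned = Counter()
    for researcher, counts in metrics.items():
        dept = profiles.get(researcher)
        if dept is None:
            unassigned.update(counts)
        else:
            departments.setdefault(dept, Counter()).update(counts)
    return departments, unassigned


depts, unassigned = roll_up(profiles, metrics)
print(depts["chemistry"], unassigned)
```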
The role of librarians is changing from creators and collectors of physical data to service providers for the community, and Michalek believes the tool her company is creating will help them deliver value-added services, such as in-depth dialogues with university researchers to ensure their work can be found and gets counted. “Now they’re really seen as a partner,” she says.
The Researcher Graph is seeded with departmental ontologies, digital object identifiers (DOIs) for published articles, ISBNs for books, and other information universities typically already have. For each DOI, Plum is creating a set of aliases and rules to find the different URIs by which a work is referenced, since one paper can legitimately live at 50 different places around the web. So when someone tweets a link to one of these resources, the system knows it is the same work, even if it goes by a different alias at another publisher.
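The alias idea can be sketched as URL normalization followed by a lookup table that maps every known location of a work to one canonical identifier. The normalization rules, the alias table, and the example URLs and DOI below are hypothetical; Plum’s actual rules are not described in the article. The one grounded detail is that DOI names are case-insensitive, so the sketch lowercases them before comparing.

```python
from urllib.parse import unquote, urlsplit

# Hypothetical alias table: normalized URL -> canonical DOI.
# The URLs and the DOI are made up for illustration.
ALIASES = {
    "publisher-a.example/articles/123": "10.1234/abcd.5678",
    "repository.example/handle/42": "10.1234/abcd.5678",
}


def normalize(url: str) -> str:
    """Apply assumed normalization rules: lowercase the host, drop 'www.',
    and ignore scheme, query string, fragment and trailing slash."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = unquote(parts.path).rstrip("/")
    return host + path


def resolve(url: str):
    """Map any known alias of a work to its canonical DOI, or None."""
    key = normalize(url)
    if key.startswith(("doi.org/", "dx.doi.org/")):
        # A DOI link carries the identifier itself; DOI names are
        # case-insensitive, so lowercase it for comparison.
        return key.split("/", 1)[1].lower()
    return ALIASES.get(key)


print(resolve("https://www.Publisher-A.example/articles/123/?utm_source=twitter"))
```

With this in place, a tweet linking to the publisher page, a repository handle, or the DOI resolver all count toward the same work.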
“Now we have known researchers from known universities with known artifacts that we can go and gather metrics about,” she says, and the graph will grow outward from that known place. It’s far more difficult, she says, to build a product by starting with all the world’s data than to begin with a known set of data and the types of actions that matter. “If an action occurs with someone not yet in the graph, it just gets a count. If it’s someone we know from a different university, it gets built up, and is that much richer. We know Researcher A and B and then can track that interaction between the two of them.”
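The count-versus-relationship distinction in that quote can be sketched as a single `record` function. The seed data, names and action types are hypothetical, and keeping every action in a raw count while also adding a typed edge for known researchers is an assumed reading of “it just gets a count” versus “it gets built up.”

```python
from collections import Counter, defaultdict

# Hypothetical seed data: known researchers and the works they authored.
AUTHORS = {"work-1": {"researcher-A"}}
KNOWN = {"researcher-A", "researcher-B"}

raw_counts = Counter()      # work -> number of actions from anyone
edges = defaultdict(Counter)  # (actor, author) -> Counter of action types


def record(actor: str, action: str, work: str) -> None:
    """Record one action (e.g. a tweet) on a work.

    Every action increments the work's raw count. If the actor is a known
    researcher, an edge to each of the work's other authors is also updated.
    """
    raw_counts[work] += 1
    if actor not in KNOWN:
        return
    for author in AUTHORS.get(work, ()):
        if author != actor:  # skip self-interactions
            edges[(actor, author)][action] += 1


record("@someone", "tweet", "work-1")
record("researcher-B", "tweet", "work-1")
record("researcher-B", "bookmark", "work-1")
print(raw_counts["work-1"], dict(edges[("researcher-B", "researcher-A")]))
```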
Once the backend work now underway is complete, Plum plans to expose an API so that its partners and others can build applications on top of its directory of research data.
As the Researcher Graph continues to take shape, Michalek sees benefits not only for researchers and institutions but for students, too. Universities can showcase research currently underway on campus as they compete for students, and the students she has spoken to about the effort are also excited, she says. They like the idea of knowing in advance that the professor of a class they’d like to take has a particular passion for one angle of the subject, because it gives them an opportunity to engage with the instructor at a different level. Professors’ personal web sites often don’t do a good job of making that plain.
The company expects that by the end of summer the University of Pittsburgh Library System project should be ready to go live, and other customers are on the horizon. “The data is there,” she says. “Now it’s building the systems for the purpose of finding out certain types of interesting facts.”