David Hill Radcliffe of the OUPblog recently wrote, “The publication of the Oxford Dictionary of National Biography in September 2004 was a milestone in the history of scholarship, not least for crossing from print to digital publication. Prior to this moment a small army of biographers, myself among them, had worked almost entirely from paper sources, including the stately volumes of the first, Victorian ‘DNB’ and its 20th-century print supplement volumes. But the Oxford DNB of 2004 was conceived from the outset as a database and published online as web pages, not paper pages reproduced in facsimile. In doing away with the page image as a means of structuring digital information, the online ODNB made an important step which scholarly monographs and articles might do well to emulate.” Read more
Is SPARQL the SQL for NoSQL? The question will be discussed at this month’s Semantic Technology & Business Conference in San Jose by Arthur Keen, VP of solution architecture at startup SPARQL City.
It’s not the first time that the industry has considered common database query languages for NoSQL (see this story at our sister site Dataversity.net for some perspective on that). But as Keen sees it, SPARQL has the legs for the job. “What I know about SPARQL is that for every database [SQL and NoSQL alike] out there, someone has tried to put SPARQL on it,” he says, whereas other common query language efforts may be limited in database support. A factor in SPARQL’s favor is query portability across NoSQL systems. Additionally, “you can achieve much higher performance using declarative query languages like SPARQL because they specify the ‘What’ and not the ‘How’ of the query, allowing optimizers to choose the best way to implement the query,” he explains.
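Keen’s “What, not How” point can be illustrated with a minimal sketch: a SPARQL basic graph pattern names the triples you want, with variables for the unknowns, and leaves the execution strategy to the engine. The Python below is a toy pattern matcher over an in-memory triple set (the `match` function and the sample data are illustrative, not any vendor’s implementation):

```python
# A tiny in-memory triple set (illustrative sample data).
triples = {
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "worksAt", "SPARQLCity"),
}

def match(pattern, triples):
    """Yield variable bindings for one triple pattern.

    Variables are strings starting with '?', as in SPARQL.
    The caller states *what* to find; how the triples are
    scanned or indexed is the engine's choice.
    """
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                binding[term] = value   # bind the variable
            elif term != value:
                break                   # constant mismatch
        else:
            yield binding

# "Whom does alice know?" -- analogous to
# SELECT ?who WHERE { :alice :knows ?who }
results = [b["?who"] for b in match(("alice", "knows", "?who"), triples)]
print(results)  # ['bob']
```

A real optimizer exploits exactly this freedom: because the query only constrains the result, the engine can reorder joins and pick indexes per backend, which is what makes one declarative language portable across many stores.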
Benjamin Young of Cloudant reports, “Data is often stored and distributed in esoteric formats… Even when the data is available in a parse-able format (CSV, XML, JSON, etc), there is often little provided with the data to explain what’s inside. If there is descriptive meta data provided, it’s often only meant for the next developer to read when implementing yet-another-parser for said data. Really, it’s all quite abysmal… Enter, JSON-LD! JSON-LD (JSON Linked Data) is a simple way of providing semantic meaning for the terms and values in a JSON document. Providing that meaning with the JSON means that the next developer’s application can parse and understand the JSON you gave them.” Read more
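Concretely, a JSON-LD document carries an `@context` that maps plain JSON keys to vocabulary IRIs, so the meaning travels with the data instead of living in a README. A minimal sketch, using only the standard library (the document, the schema.org mappings chosen, and the `expand_key` helper are illustrative):

```python
import json

# A JSON-LD document: "@context" maps short JSON keys to
# well-defined vocabulary IRIs, so a consumer knows "name"
# here means schema.org's notion of a name.
doc = json.loads("""
{
  "@context": {
    "name": "http://schema.org/name",
    "homepage": {"@id": "http://schema.org/url", "@type": "@id"}
  },
  "name": "Benjamin Young",
  "homepage": "http://cloudant.com/"
}
""")

def expand_key(doc, key):
    """Resolve a short term to its full IRI via the @context."""
    entry = doc["@context"][key]
    # A term definition may be a plain IRI string or an object
    # with "@id" (and extras like "@type").
    return entry["@id"] if isinstance(entry, dict) else entry

print(expand_key(doc, "name"))      # http://schema.org/name
print(expand_key(doc, "homepage"))  # http://schema.org/url
```

Because the document is still ordinary JSON, existing parsers keep working; a JSON-LD-aware consumer additionally gets unambiguous, globally identified terms.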
Tom Simonite of the MIT Technology Review recently wrote, “For all its success, Google’s famous PageRank algorithm has never understood a word of the billions of Web pages it has directed people to over the years. That’s why in 2010 Google acquired Metaweb, a company building a database intended to give computers the ability to understand the world. Two years later the company’s technology resurfaced as the Knowledge Graph. John Giannandrea, vice president of engineering at Google and a Metaweb cofounder, says that it will lead to Google’s future products being able to truly understand the people who use them and the things they care about. He told MIT Technology Review’s Tom Simonite how a data store designed to link together all the knowledge on Earth might do that.” Read more
Max Smolaks of Tech Week Europe reports, “Andrew Fogg, co-founder of the UK start-up Import.io, thinks every web resource should have an Application Programming Interface (API). In order to make online data more accessible, his company turns any website into a spreadsheet or an API, for free. Fogg claims that in the past few months, the users of this service have created more Web APIs than the rest of the Internet combined. Jerome Bouteiller interviewed the entrepreneur at the LeWeb 2013 conference in Paris, where the two discussed the future of the company and the idea of the Semantic Web, proposed by the ‘father of the Web’ Sir Tim Berners-Lee.” Read more
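Import.io’s pipeline is proprietary, but the core idea – lifting a page’s HTML table into spreadsheet-like records that an API could serve – can be sketched with Python’s standard library alone (the `TableToRecords` class and the sample HTML are illustrative, not Import.io’s actual method):

```python
from html.parser import HTMLParser

class TableToRecords(HTMLParser):
    """Collect the cells of an HTML table into a list of rows."""
    def __init__(self):
        super().__init__()
        self.rows, self._row, self._in_cell = [], [], False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "tr" and self._row:
            self.rows.append(self._row)
        elif tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self._row.append(data.strip())

# Sample markup standing in for a scraped page.
html = """<table>
  <tr><th>company</th><th>founded</th></tr>
  <tr><td>Import.io</td><td>2012</td></tr>
</table>"""

parser = TableToRecords()
parser.feed(html)
header, *body = parser.rows
# First row becomes the column names; the rest become records.
records = [dict(zip(header, row)) for row in body]
print(records)  # [{'company': 'Import.io', 'founded': '2012'}]
```

The resulting list of dicts is exactly the shape a generated JSON API would return – one object per table row, keyed by column name.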
Our discussion of Big Data at SemTechBiz, begun here, continues:
The Enterprise Linked Data Cloud Needs Semantics, And More
Another exploration of Big Data’s intersection with semantic technology will take place at this session, where Dr. Giovanni Tummarello, senior research fellow at DERI and CTO of SindiceTech, will discuss how Big Data can become the enabler that makes semantic technology genuinely useful in the enterprise. “A lot of people say it’s via Big Data that semantic technologies like RDF will see a coming of age and clear applications in certain industries,” he says. There’s value in adding data first and understanding it later, and to that end, “semantic technologies give you the most agile tool to deal with data you don’t know, where there’s a lot of diversity, and you don’t know what of it particularly will be useful.”
Last month at its MarkLogic World 2013 conference, the enterprise NoSQL database platform provider talked semantics as it relates to its MarkLogic Server technology, which ingests, manages and searches structured, semi-structured, and unstructured data (see our story here). Late last week, the vendor was scheduled to provide an early-access release of MarkLogic 7, formally due by year’s end, to some dozens of initial users.
“People see a convergence of search and semantics,” Stephen Buxton, director of product management, recently told The Semantic Web Blog. To that end, many of the vendor’s customers have deployed MarkLogic technology alongside specialized triple stores, but what they really want, he says, is an integrated approach: “a single database that does both individually and both together. We see the future of search as semantics and the future of semantics as search, and they are very much converging.” At its recent conference, Buxton says, the company demonstrated a MarkLogic app it built to function like Google’s Knowledge Graph, to give an idea of the kinds of things the enterprise might do with search and semantics together.
Following up on the comments made by MarkLogic CEO Gary Bloom at his keynote address at the conference, Buxton explained that, “the function in MarkLogic we are working on in engineering is a way to store and manage triples in the MarkLogic database natively, right alongside structured and unstructured information – a specialized triples index so queries are very fast, and so you can do SPARQL queries in MarkLogic. So, with MarkLogic 7 we will have a world-class triple store and world-beating information store – no one else does documents, values and triples in combination the way MarkLogic 7 will.”
Philipp Grätzel von Grätz of Medical Xpress writes, “Imagine a hospital where patient data from numerous sources is made accessible to ward physicians with the help of hyperlinks and intelligent indexing. Imagine a healthcare system that hands its patients – not an envelope or a CD-ROM – but an integrated dataset that allows them to truly understand their illness, and even use the Internet to obtain additional information. Imagine a radiologist who uses semantic technologies to navigate smoothly through the myriad imaging data. Welcome to the future of semantic technologies in health information retrieval.” Read more
Enterprise NoSQL database platform provider MarkLogic has come into some cash: a $25 million round of growth capital from investors including Sequoia Capital, Tenaya Capital, Northgate Capital, CEO Gary Bloom and other corporate executives. Yesterday, at the company’s MarkLogic World 2013 conference, Bloom also prepared the audience to hear more today from company executives about MarkLogic’s next steps in semantics for its MarkLogic Server technology that ingests, manages and searches structured, semi-structured, and unstructured data.
“The way to think about this is that when we look at semantics, we didn’t … say we just want to check a box on semantics,” Bloom said – that is, by simply working with partners on some low-hanging fruit – although the company will be collaborating with partners on various semantic enrichment capabilities. “We think semantics is critical technology, and more interesting, I believe, is that it is a critical technology that is both a search technology as well as a database technology.” Others in the marketplace will focus on changing their search engines to do semantics, but optimum results won’t come if all that’s being done is layering semantics in at the search level, he said.
Algebraix Data Corporation today announced that its SPARQL Server™ RDF database successfully executed all 17 queries of the SP²Bench benchmark at scales of up to one billion triples on a single computer node. SP²Bench is the most computationally complex benchmark for testing SPARQL performance, and no other vendor has reported results for all of its queries at data sizes above five million triples. Read more