Posts Tagged ‘XML’

Lessons Learned On the Road To Linked Data

What’s the path from an XML-based e-government metadata application to a linked data version? At the upcoming Semantic Tech & Business Conference in Berlin, the road taken by the Dutch government will be described by Paul Hermans, lead architect of the Belgian project Erfgoedplus.be, which uses RDF/XML, OWL and SKOS to describe relationships to heritage types, concepts, objects, people, place and time.

Some 1,000 individual organizations make up the Dutch government, each with its own website. An effort a few years ago to have a search engine spider those separate websites and provide a single point of access didn’t work as anticipated. The next step to bring some order was to assign all the documents published on those sites a common kernel of metadata fields, which led to building an XML application to enable a structured approach. Linked Data entered the picture about a year and a half ago.

Announcing Semantic Tech & Business Conference - San Francisco 2012

Semantic Tech & Business Conference is returning to San Francisco in June! Join us from June 3-7 for complete coverage of Big Data, Linked Data, Extreme Information Management, and Semantic Web. From breakthrough approaches to solving business problems to the big data implications of fast-evolving technologies, SemTechBiz provides you with an unparalleled interactive experience and delivers tangible business value. We're offering a special early rate when you register by February 17. Sign up now!

Just How Big A Rock Star Is Eric Clapton?

Who has had the greatest impact on rock music? It’s a question that still isn’t answered, despite the efforts of Ronald P. Reck, principal at RRecktek LLC, and Kenneth B. Sall, principal systems engineer/XML data analyst at Ken Sall Consulting.

The team wanted to use semantic technology, along with DBpedia and MusicBrainz data sources, to try and figure out the answer. Reck and Sall recently published a paper, Determining the Impact of Eric Clapton on Music Using RDF Graphs: Selected Challenges of Semantics Across and Within Datasets, based on their experiences. Their plan was to use RDF and SPARQL to query properties and relationships among musical artists to reveal their activity, impact and “six degrees of Eric Clapton” connections to other artists.
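The “six degrees” idea can be sketched as a shortest-path search over collaboration triples. The sketch below is plain Python rather than SPARQL, and it is not the method Reck and Sall used. The artist identifiers are placeholders and the `ex:playedWith` predicate is a hypothetical name invented for illustration, not taken from DBpedia or MusicBrainz.

```python
from collections import deque

# Placeholder triples (subject, predicate, object). The identifiers and the
# "ex:playedWith" predicate are hypothetical, not real dataset terms.
TRIPLES = [
    ("artist:A", "ex:playedWith", "artist:B"),
    ("artist:B", "ex:playedWith", "artist:C"),
    ("artist:C", "ex:playedWith", "artist:D"),
    ("artist:E", "ex:playedWith", "artist:F"),
]


def build_graph(triples, predicate):
    """Build an undirected adjacency map from triples that use `predicate`."""
    graph = {}
    for subject, pred, obj in triples:
        if pred != predicate:
            continue
        graph.setdefault(subject, set()).add(obj)
        graph.setdefault(obj, set()).add(subject)
    return graph


def degrees_of_separation(graph, start, target):
    """Return the number of hops between two artists, or None if unconnected."""
    if start not in graph or target not in graph:
        return None
    if start == target:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    # Breadth-first search visits artists in order of increasing distance,
    # so the first time the target is reached the hop count is minimal.
    while queue:
        node, distance = queue.popleft()
        for neighbour in graph[node]:
            if neighbour == target:
                return distance + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, distance + 1))
    return None


GRAPH = build_graph(TRIPLES, "ex:playedWith")
```

With this placeholder data, `degrees_of_separation(GRAPH, "artist:A", "artist:D")` finds a three-hop chain, while `artist:A` and `artist:E` are reported as unconnected.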

Reck and Sall initially saw this project as a door-opener to showing relationships between pieces of data, and drawing inferences and conclusions from them, for a more serious purpose: “We were interested in music, but the real application, especially in the government, is tying the clues together, for example, around terrorists,” says Sall. It turns out that musicians and terrorists have some things in common — they tend to have specific roles in their organizations, and may cross-partner with other groups in loose relationships.

While the work didn’t result in answering the original question posed, it did reveal, as Sall puts it, “what can go wrong in doing this kind of semantic analysis.” That’s in itself useful, as it presents an opportunity to find at least some solutions around those pitfalls.

An RDF based Permissions Model

One of the primary challenges in putting together a good content management system is building a decent permissions model. Whether a particular user or process is able to perform some kind of an action upon a resource or not can be remarkably difficult to establish, especially when there are multiple constraints involved. For an XML-based CMS, this can be even more of a challenge, because the n-dimensional nature of such a constraint model is often difficult to model in hierarchical structures.

However, RDF is far better suited to this particular role. A permissions system is, at its core, a set of assertions about who can do what to what, which fits nicely with the “subject predicate object” model that RDF exemplifies. Moreover, such models are sparse: the number of assertions is likely to be very small compared to the total number of assertions that are possible. That sparseness suits a data model where sparse data is the norm (again, RDF) better than storing this information (expensively) in tabular fields, as with a relational database.
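To make the “who can do what to what” idea concrete, here is a minimal sketch in plain Python that stores permissions as subject-predicate-object assertions. It is an illustration only, not the model developed in the article: the `user:`, `role:`, `perm:` and `doc:` identifiers, the `perm:hasRole` predicate and the `is_allowed` helper are all hypothetical.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical assertions. Only the permissions that exist are stored,
# which is what keeps the model sparse.
ASSERTIONS = {
    Triple("user:alice", "perm:canRead", "doc:press-release-1"),
    Triple("user:alice", "perm:canEdit", "doc:press-release-1"),
    Triple("role:editor", "perm:canPublish", "doc:press-release-1"),
    Triple("user:bob", "perm:hasRole", "role:editor"),
}


def match(store, subject=None, predicate=None, obj=None):
    """Return every triple that matches the given pattern (None is a wildcard)."""
    return [
        t for t in store
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


def is_allowed(store, user, action, resource):
    """Check a permission granted directly or through one of the user's roles."""
    if Triple(user, action, resource) in store:
        return True
    roles = [t.obj for t in match(store, subject=user, predicate="perm:hasRole")]
    return any(Triple(role, action, resource) in store for role in roles)
```

In this sketch `user:bob` can publish the document only because of the `role:editor` assertion; anything not asserted is denied by default.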

I’m working on building an XML-based CMS (specifically on a MarkLogic platform, though I would like to keep it portable), and realized as I was working on it that while the user permissions system that MarkLogic employs is powerful, it’s not portable and there are facets that don’t fit nicely into that particular model. Thus, I decided to pursue the RDF triples approach to see if that would work better for this. (The end product may very well be a hybrid approach to take advantage of fast queries, but that’s beyond the scope of this particular article.)

Mending Media’s Tangled Relationship With the Web

The media industry has had a complicated relationship with the Web, and that’s putting it kindly. While other sectors pretty quickly realized ways to take advantage of that new thing called the Internet – to sell goods, accelerate supply chains, and build deeper customer relationships – established content providers spent years trying to figure it out. And many still are tussling with big issues, such as whether or not to charge for access to content.

Given the Web’s impact on their business model and their revenues, you can forgive publishers if they might prefer that the darn Internet just stood still for a few minutes and let them catch their breath and catch up. Since that isn’t about to happen, the thing to do is to make peace with those changes, many of them thanks to Semantic Web technologies, and figure out fast how they’re going to profit from them.

They’ll have an opportunity to do just that at the upcoming Semantic Web Media Summit in New York City, where speakers will include Michael Dunn, VP and CTO at Hearst Interactive Media, on the topic of why media companies should be interested in this critical part of the Web 3.0 world.

Dunn sees a number of reasons for using Semantic Web technologies as the means for structuring the wealth of content that publishers produce. There’s improving its discoverability by the world via search and social, of course, but it matters for internal operations, too. And add to that the relationship with online advertising so that content can be better monetized.

The Value of Semantic Markup to Retailers

A recent article informs online retailers that “Starting now, you’re going to need good structured markup on your X/HTML in addition to your white hat tactics. I see structured markup as being equally important to authoritative inbound links as a ranking factor when optimizing content. Why? Because search robots are designed to serve search engine users by matching their search query expectations, known as user intent. These bots are machines, and they’re trying to discern the human mind’s evaluation of information in answer to human-entered keywords.”

Jeni Tennison on Web Development

Jeni Tennison recently shared her experiences working with web standards in her work at legislation.gov.uk. In particular, Tennison looks at how her organization has needed to use multiple technologies in concert to achieve various publishing goals and satisfy various types of data consumers. She begins, “One of the things that’s been niggling at the back of my mind since the schema.org announcement is how small a role search engine results plays in the wider data sharing efforts that I’m more familiar with in my work on legislation.gov.uk, and more generally how my day job experience differs from (what seem to be) more common experiences of development on the web. In this post, I’m going to talk about that experience, and about the particular problems that I see with the coexistence of microdata and RDFa as a result.”

Linked Data and US Law

Bob DuCharme recently discussed the value of linked data in US law. DuCharme notes, “At a recent W3C Government Linked Data Working Group working group meeting, I started thinking more about the role in linked data of laws that are published online. To summarize, you don’t want to publish the laws themselves as triples, because they’re a bad fit for the triples data model, but as online resources relevant to a lot of issues out there, they make an excellent set of resources to point to, although you may not always get the granularity you want.”

LAC Releases Government of Canada Core Subject Thesaurus

The government of Canada has released a new downloadable version of its Core Subject Thesaurus in SKOS/RDF format. According to Library and Archives Canada, “The Government of Canada Core Subject Thesaurus is a bilingual thesaurus consisting of terminology that represents all the fields covered in the information resources of the Government of Canada. Library and Archives Canada is exploring the potential for linked data and the semantic web with LAC vocabularies, metadata and open content.”

Native XML Databases and RDF

There are three trends that I observed at SemTech 2011 in San Francisco last week. First was the increased role of native XML databases used in combination with RDF data stores. Second was the many natural-language processing tools and vendors at the conference. And third was the role of semantic annotations and standards directly in web content. I think these trends are related.

One of the keynote presentations at the SemTech 2011 conference was given by the BBC. They presented their core architecture for managing web content as having two main components: a native XML database (MarkLogic) for content and an RDF triple store for “metadata.” These tools were at the core of their architecture for their web sites.

Another presentation was done by the Mayo Clinic.  They also are using MarkLogic for web content and are also using semantic web technologies.  Their diagrams show that there are many ways for these systems to interact.

Ontologies Here, There, and Everywhere

What do the Open Travel Association, the Filoli historic site of the National Trust for Historic Preservation, and the Ontology Platform Special Interest Group (PSIG) at the Object Management Group all have in common? Ontologies, of course!

Starting at the most obvious point, the Ontology PSIG developed the Ontology Definition Metamodel (ODM), and at the upcoming SemTech conference, Elisa Kendall, CEO of Sandpiper Software, co-editor of ODM and co-chair of the Ontology PSIG, will discuss some of the latest work underway there. Of which, she tells The Semantic Web Blog, there is “a ton.”

Among the efforts underway are making ODM current with W3C specs, including support for OWL 2 (which should be available towards year’s end), and others that depend to some degree on the standard and build on that baseline.

These include vertical industry efforts such as Common Terminology Services (CTS) 2 from the health care sector’s HL7 standardization body. Kendall says this builds on the first version of ODM, with the focus on using ontologies and depending on semantics for providing the terminology, translation and cross-correlation of the maze of hospital and insurance codes to enable interchange of this data among parties.

The CTS2 effort has been generalized so that it can support terminology services for other verticals as well, which the OMG Ontology PSIG group hopes will make it more broadly useful. “We’ll have to see how that plays out in practice, since it’s only just being published this summer,” Kendall says.
