Semantic Data Integration for the Enterprise – Oracle Semantic Technologies

The Semantic Web vision of the World Wide Web Consortium (W3C) is to extend the current Web so that “information is given well-defined meaning, better enabling computers and people to work in cooperation.” [1] This is important because the mix of content on the Web, and in applications built using Web architectures, is shifting from exclusively human-oriented content to computer-mediated content.

In the Semantic Web, data are defined and linked in a way that enables their use for more effective discovery, automation, integration, and re-use across various applications. Toward this end, the W3C has adopted standards such as RDF and OWL to advance the use of semantic technologies.

The data representations defined by Semantic Web initiatives can be seen as the next step in the evolution of data management. One of the challenges of data management is sharing and analyzing data stored by independent applications. Precursors to semantic technology, such as data exchange formats, have maintained the distinction between data and the data that describes it (schema or metadata). By minimizing that distinction in the data representation, semantic technologies move one step closer to data sharing and integration.

Oracle Database 11g now supports both RDF and OWL data management, providing developers with the industry’s leading software infrastructure for scalable and secure semantic applications. Commercial applications are now using this technology to solve complex problems in defense and national intelligence, life sciences, and geospatial applications.

As part of Oracle Spatial 11g, an option for Oracle Database 11g Enterprise Edition, Oracle delivers an advanced semantic data management capability not found in any other commercial or open source triple store. With native support for RDF/RDFS/OWL standards, this semantic data store enables application developers to benefit from an open, scalable, secure, integrated, efficient platform for RDF and OWL-based applications. These semantic database features enable storing, loading, and DML access to RDF/OWL data and ontologies, inference using OWL and RDFS semantics and user-defined rules, querying of RDF/OWL data and ontologies using SPARQL-like graph patterns embedded in SQL, and ontology-assisted querying of enterprise (relational) data.

Figure 1: Oracle Database 11g Semantic Technologies


Store, Load, and DML operations on the Semantic Data Store: Oracle Semantic Database features support storing, loading, and DML operations on RDF/OWL models. Each model contains a set of subject-predicate-object triples organized as an RDF/OWL graph of directed, labeled edges. Each edge is the link (or relationship) that connects a subject node to an object node and is labeled by a predicate. A normalized storage architecture manages the complexity that arises because the typically long URIs and literal values used as subjects, predicates, and objects are repeated across many triples. This leads to space-efficient storage and to scalable, efficient loading, querying, and inference of RDF/OWL data.
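
The normalized storage idea described above can be illustrated with a minimal Python sketch. It is not Oracle's storage schema: the TripleStore class, its method names, and the example namespace are all hypothetical. It only shows the general technique of storing each distinct URI or literal once and representing every triple as three small integer IDs.

```python
class TripleStore:
    """Toy normalized triple store (hypothetical, for illustration only).

    Each distinct URI or literal is stored once and given an integer ID;
    a triple is then just three IDs.
    """

    def __init__(self):
        self.value_to_id = {}  # lexical value -> integer ID
        self.id_to_value = []  # integer ID -> lexical value
        self.triples = set()   # (subject_id, predicate_id, object_id)

    def _intern(self, value):
        # Reuse the existing ID if this value has been seen before.
        if value not in self.value_to_id:
            self.value_to_id[value] = len(self.id_to_value)
            self.id_to_value.append(value)
        return self.value_to_id[value]

    def add(self, subject, predicate, obj):
        ids = tuple(self._intern(v) for v in (subject, predicate, obj))
        self.triples.add(ids)

    def decoded(self):
        # Translate ID triples back into their lexical form, sorted.
        return sorted(
            tuple(self.id_to_value[i] for i in triple) for triple in self.triples
        )


EX = "http://www.example.org/family/"  # hypothetical namespace

store = TripleStore()
store.add(EX + "John", EX + "fatherOf", EX + "Suzie")
store.add(EX + "John", EX + "fatherOf", EX + "Matt")
store.add(EX + "Suzie", EX + "sisterOf", EX + "Matt")

distinct_values = len(store.id_to_value)  # long strings stored once each
triple_count = len(store.triples)         # each triple is three integers
```

In this sketch the three triples reference nine terms but only five distinct strings are stored, which is the space saving that normalization aims for when long URIs recur across many triples.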

Native Inference engine for OWL, RDFS, and user-defined rules: Application developers can add meaning to data and metadata by defining a set of terms and the relationships between them. These sets of terms (“ontologies”) enable enhanced query, analysis, and actions based on semantic content rather than simply on data values. Ontologies are increasingly used to build applications that utilize domain-specific knowledge. Ontological data sets, often containing hundreds of millions of data items and relationships, can be stored in groups of three, or "triples," using the RDF data model. Oracle Database 11g enables such repositories to scale into the billions of triples, thereby meeting the needs of the most demanding applications.

Oracle Semantic Database features include a native inference engine for efficient and scalable inference using the most commonly used subset of OWL semantics. This OWL inference engine also makes the existing native inference for RDF, RDFS, and user-defined rules (used for additional specialized inference capabilities) more efficient and scalable. Inference can be done using any combination of these supported entailment regimes.
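
To make the notion of inference concrete, the following minimal Python sketch applies two standard RDFS entailment rules (rdfs11, transitivity of rdfs:subClassOf, and rdfs9, propagation of rdf:type to superclasses) until no new triples appear. It is a naive fixpoint loop written for illustration, not Oracle's inference engine; the function name rdfs_closure and the ex: class names are hypothetical.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def rdfs_closure(triples):
    """Return the input triples plus those entailed by rules rdfs9 and rdfs11."""
    inferred = set(triples)
    while True:
        new = set()
        for s, p, o in inferred:
            for s2, p2, o2 in inferred:
                # Only pairs of the form (s p o), (o rdfs:subClassOf o2) matter.
                if o != s2 or p2 != SUBCLASS:
                    continue
                if p == SUBCLASS:    # rdfs11: subClassOf is transitive
                    new.add((s, SUBCLASS, o2))
                elif p == RDF_TYPE:  # rdfs9: instances belong to superclasses
                    new.add((s, RDF_TYPE, o2))
        if new <= inferred:          # fixpoint reached: nothing new was derived
            return inferred
        inferred |= new


# Hypothetical example data.
asserted = {
    ("ex:Cat", SUBCLASS, "ex:Mammal"),
    ("ex:Mammal", SUBCLASS, "ex:Animal"),
    ("ex:felix", RDF_TYPE, "ex:Cat"),
}

entailed = rdfs_closure(asserted) - asserted
```

Here the entailed set holds three triples that were never asserted: ex:Cat is a subclass of ex:Animal, and ex:felix is both an ex:Mammal and an ex:Animal. A production engine would use indexing and incremental evaluation rather than this quadratic loop.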

Some organizations are using semantic approaches to create an information model (the ontology) based on data schemas taken from a particular enterprise organization or industry. Individual application database schemas are mapped to a standard information model in order to make the meaning of the concepts in different, application-specific data schemas explicit and to relate them to each other. The resulting information architecture provides a unified view of the data sources in the organization. As shown in Figure 2, application users can begin to query these enterprise semantic (metadata) models, which comprise RDF data or ontologies. Standard ontologies reconcile queries that need access to heterogeneous data sources and application-specific schemas. This results in solutions that have the power to address unique problems facing enterprise and Web-based systems:

  • data integration across a heterogeneous, expanding set of corporate/public data sources,
  • tracking provenance information, and
  • modeling probabilistic data and schema.
Figure 2. Enterprise Integration Workflow


Query the Semantic Data Store: RDF/OWL data can be queried in Oracle Database 11g using SQL. The SEM_MATCH table function, which can be embedded in a SQL query, can search for an arbitrary graph pattern against the RDF/OWL models and, optionally, against data inferred using RDFS, OWL, and user-defined rules. The SEM_MATCH function meets most of the requirements identified by the W3C SPARQL standard for graph queries.

The ability to embed a graph-pattern match query in a SQL query has several advantages: 1) Users can specify a query against RDF/OWL graphs as a graph-pattern match query, avoiding the need to manually translate what is naturally a graph query into a relational query; 2) The results returned from one or more graph-pattern match queries embedded in a SQL query can be further processed using SQL constructs (e.g., aggregate functions) and can be joined with other relational tables; 3) The graph-pattern match query is automatically rewritten into a SQL subquery that is merged into the outer SQL query, which avoids staging intermediate results and lets the Oracle SQL optimizer plan the whole query, leading to efficient query processing.
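
The following Python sketch illustrates what a graph-pattern match computes and how its results can then be aggregated, in the spirit of advantages 1) and 2) above. It is a toy in-memory matcher, not the SEM_MATCH implementation or its SQL rewrite; the match function, the "?"-prefixed variable convention as used here, and the family data are assumptions made for illustration.

```python
from collections import Counter


def match(patterns, triples, binding=None):
    """Yield variable bindings that satisfy every triple pattern.

    A pattern term starting with "?" is a variable; any other term
    must equal the corresponding value in the triple.
    """
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against its earlier binding.
                if candidate.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            # All three positions matched: continue with the remaining patterns.
            yield from match(rest, triples, candidate)


# Hypothetical example data.
triples = [
    ("John", "fatherOf", "Suzie"),
    ("John", "fatherOf", "Matt"),
    ("Suzie", "motherOf", "Cathy"),
    ("Matt", "fatherOf", "Tom"),
    ("Matt", "fatherOf", "Jack"),
    ("Tom", "fatherOf", "Ann"),
]

# Graph pattern: ?x is the father of ?y, who is the father of ?z.
pattern = [("?x", "fatherOf", "?y"), ("?y", "fatherOf", "?z")]

# Post-process the matches with an aggregate, as an outer SQL query might
# with COUNT(*) ... GROUP BY.
grandchildren = Counter(b["?x"] for b in match(pattern, triples))
```

The pattern is stated directly in graph terms, and the shared variable ?y plays the role that a self-join condition would play in a hand-written relational query.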

Ontology-assisted Query for Relational Data: Queries can extract more semantically complete results from relational data by associating that data with ontologies that organize the relevant domain knowledge. For example, if a column in a relational table contains names of diseases, a query asking for a match on ‘Immunodeficiency Syndrome’ will be able to retrieve rows containing the value ‘AIDS’ if the values in that column are interpreted in the context of the NCI Cancer Ontology [NCI], which states that ‘AIDS’ is a type of ‘Immunodeficiency Syndrome’. Oracle Spatial 11g enhancements include support for the new SEM_RELATED semantic operator (and, optionally, its SEM_DISTANCE ancillary operator) for efficient ontology-assisted querying of relational data.
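
The disease example can be sketched in Python as follows. This is only a conceptual illustration of ontology-assisted filtering, loosely analogous to testing relatedness and reporting a distance; it is not the SEM_RELATED or SEM_DISTANCE implementation. The function subclass_distance, the patient rows, and the 'Rheumatoid Arthritis' entry are hypothetical; only the 'AIDS' to 'Immunodeficiency Syndrome' relationship comes from the text above.

```python
def subclass_distance(term, target, subclass_of):
    """Return the number of subclass edges from term up to target, or None."""
    frontier, distance, seen = [term], 0, {term}
    while frontier:
        if target in frontier:
            return distance
        next_frontier = []
        for current in frontier:
            for parent in subclass_of.get(current, []):
                if parent not in seen:
                    seen.add(parent)
                    next_frontier.append(parent)
        frontier = next_frontier
        distance += 1
    return None


# Tiny stand-in for an ontology: child term -> list of parent terms.
SUBCLASS_OF = {
    "AIDS": ["Immunodeficiency Syndrome"],          # relationship from the text
    "Rheumatoid Arthritis": ["Autoimmune Disease"],  # hypothetical entry
}

# Made-up relational rows: (patient_id, diagnosis).
patients = [
    (1, "AIDS"),
    (2, "Rheumatoid Arthritis"),
    (3, "Immunodeficiency Syndrome"),
]

target = "Immunodeficiency Syndrome"

# Plain value comparison finds only the literal match.
exact = [pid for pid, diagnosis in patients if diagnosis == target]

# Ontology-assisted comparison also finds rows whose value is a subtype.
assisted = []
for pid, diagnosis in patients:
    distance = subclass_distance(diagnosis, target, SUBCLASS_OF)
    if distance is not None:
        assisted.append((pid, diagnosis, distance))
```

The exact comparison returns only patient 3, while the ontology-assisted comparison also returns patient 1, whose diagnosis is one subclass step away from the target term.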

Advanced Performance and Scalability for Semantic Web Applications: Oracle Semantic Database incorporates three key performance and scalability features that address the most demanding enterprise-class semantic web solutions. Oracle Spatial semantic database features exploit the benefits of Advanced Compression and Partitioning, while fully supporting Real Application Clusters (RAC). For this reason, users of the RDF/OWL features of the Oracle Spatial option are required to license Oracle Database Enterprise Edition and both the Oracle Advanced Compression and Partitioning options.

Incorporate Leading Partner Tools Into An Open Data Management Solution: The Semantic Database features in Oracle Spatial are directly integrated with tools from the leading semantic technology vendors. Because Oracle’s RDF and OWL data type is compliant with open W3C standards, Oracle Database can serve as an interoperable knowledge base. Semantic data can be shared more easily within organizations and across the enterprise, so you can realize an increased return on knowledge bases while reducing costs.

Oracle consistently works to help shape, drive, implement, and support the latest open standards for the Semantic Web. Oracle is a W3C member and actively participates in various technical working groups, such as the OWL WG and the Data Access WG. Oracle is also committed to supporting the standard specifications for RDF, RDFS, OWL, and SPARQL.

With Oracle Semantic Database features, Oracle brings the power and value of semantic analysis to your business applications.  These advanced knowledge management features support semantic applications in domains ranging from national intelligence and financial fraud detection to the life sciences. Only Oracle provides world-class performance, scalability, security, and manageability to your semantic data assets, while reducing costs, with support from the leading tools vendors.

1. Tim Berners-Lee, James Hendler, and Ora Lassila, "The Semantic Web," Scientific American, May 2001.
