Posts Tagged ‘RDBMS’

Down with the Data Warehouse! Long Live the Semantic Data Warehouse!

I had a call with a Fortune 100 IT team that is looking at using semantic technology as an alternative to the Data Warehouse.  This is my favorite kind of conversation, since I firmly believe the traditional data warehouse is dead but just doesn’t know it yet.

This is the situation the IT team explained:

We need to aggregate information and present it to the user, so we build a warehouse.  We spend all this time building and designing the warehouse, and when it’s done they need something else.  Unfortunately, it’s not so easy to modify a warehouse once it’s running, so we build another one.  And then another.  The cycle has been repeating itself for years and is not sustainable.

The alternative to warehousing is Data Virtualization (EII, Data Federation…lots of terms for it)…or, at least, that’s what they, and many others, see!  Essentially, they have been burned by years of working with an inflexible technology, so they are looking to dump the approach altogether.

I get this.  If a Durian is the only fruit you’ve ever smelled, you’d think all fruit were really stinky.

Read more

Semantic Tech & Business Conference Returns to San Francisco

Semantic Tech & Business Conference returns to San Francisco in June! Join us from June 3-7 for complete coverage of Big Data, Linked Data, Extreme Information Management, and Semantic Web. From breakthrough approaches to solving business problems to the big data implications of fast-evolving technologies, SemTechBiz provides you with an unparalleled interactive experience and delivers tangible business value. We're offering a special early rate when you register by February 17. Sign up now!

Two Kinds of Big Data

Rob Gonzalez, Cambridge Semantics

With all the hullabaloo around Big Data, I’ve been a little surprised that there hasn’t been more talk about how to consume the vast petabytes that people are talking about…until I realized that there are really two Big Data problems out there!

Roughly speaking, the two primary ways in which data scales are by adding depth and by adding breadth.  The first is what most people mean when they refer to Big Data.  Want to run analytics on every single transaction that Wal*Mart has done over 10 years to analyze trends?  THAT is vertical scale.  Technically, you can characterize it as having lots and lots of similarly structured data.  That is where technologies like Hadoop and column-based data storage make a big difference.

Horizontal Big Data, on the other hand, is like the Linked Data Cloud.   It has all kinds of random information that ranges from highly structured and numeric to highly unstructured.  Significantly, it tends to change quite a bit over time with increasing heterogeneity.  That’s a completely different kind of scale, and one that is not well solved by using highly structured, vertically scaling technologies.
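
To make the contrast concrete, here is a rough sketch in Python. The records, the "ex:" names, and the variable names are all made up for illustration; the point is only the difference in shape between the two kinds of data.

```python
# Hypothetical data for illustration; none of these records come from a real system.

# Vertical ("deep") data: many rows that all share one fixed structure.
transactions = [
    {"store": 1, "sku": "A100", "amount": 4.50},
    {"store": 1, "sku": "B200", "amount": 12.00},
    {"store": 2, "sku": "A100", "amount": 4.50},
]
# Every row has the same columns, so an aggregate is a single pass over one column.
total_sales = sum(row["amount"] for row in transactions)

# Horizontal ("broad") data: facts of many different shapes about many kinds of things.
facts = [
    ("ex:store1", "ex:locatedIn", "ex:cityA"),
    ("ex:cityA", "ex:population", 100000),
    ("ex:A100", "ex:reviewText", "Great value, would buy again."),
]
# A new kind of fact needs no schema change; it is just one more triple.
facts.append(("ex:A100", "ex:recalledOn", "2012-01-15"))

# The set of properties grows as the data becomes more heterogeneous.
distinct_properties = {predicate for _, predicate, _ in facts}
```

In the first case the work is in processing volume; in the second it is in coping with a set of properties that keeps growing.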

Read more

Introduction to: SPARQL

Hello, my name is SPARQL
SPARQL is the standardized query language for RDF, the same way SQL is the standardized query language for relational databases. If this is your first look at SPARQL, but you’re familiar with SQL, you will see some similarities because it shares several keywords such as SELECT, WHERE, etc. It also has new keywords that you will not have seen if you come from the SQL world, such as OPTIONAL, FILTER and many more.

Recall that an RDF statement is a triple composed of a subject, predicate and object. A SPARQL query consists of a set of triple patterns in which the subject, predicate and/or object can be variables. The idea is to match the triple patterns in the SPARQL query against the existing RDF triples and find solutions for the variables. A SPARQL query is executed on an RDF dataset, which can be a native RDF database, or on a Relational Database to RDF (RDB2RDF) system, such as Ultrawrap.  These databases have SPARQL endpoints which accept queries and return results via HTTP.
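
The matching idea can be sketched in a few lines of plain Python. This is a toy illustration, not a real SPARQL engine: the sample triples, the "ex:" names, and the helper functions are all hypothetical, and variables are simply strings that start with "?".

```python
# Illustrative only: a tiny in-memory matcher, not a real SPARQL engine.
# The data, the "ex:" names, and the helper functions are all hypothetical.

TRIPLES = [
    ("ex:alice", "ex:name", "Alice"),
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:bob", "ex:name", "Bob"),
]


def is_var(term):
    """Variables are written like SPARQL variables, with a leading '?'."""
    return isinstance(term, str) and term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Try to extend `binding` so that `pattern` matches `triple`."""
    result = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if result.setdefault(p, t) != t:
                return None  # variable already bound to a different value
        elif p != t:
            return None  # constant term does not match
    return result


def query(patterns, triples):
    """Return every variable binding that satisfies all triple patterns."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            for triple in triples:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


# Roughly equivalent to:
#   SELECT ?person ?friendName
#   WHERE { ?person ex:knows ?friend . ?friend ex:name ?friendName }
results = query(
    [("?person", "ex:knows", "?friend"), ("?friend", "ex:name", "?friendName")],
    TRIPLES,
)
```

Here `results` holds a single solution, binding `?person` to `ex:alice`, `?friend` to `ex:bob` and `?friendName` to `"Bob"`. A real engine adds much more on top of this, such as indexes, OPTIONAL and FILTER.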

A basic example

Read more

Native XML Databases and RDF

There are three trends that I observed at SemTech 2011 in San Francisco last week.  First was the increased role of native XML databases used in combination with RDF data stores.  Second was the many natural-language processing tools and vendors at the conference.  And third was the role of semantic annotations and standards directly in web content.  I think these trends are related.

One of the keynote presentations at the SemTech 2011 conference was given by the BBC.  They presented their core architecture for managing web content as having two main components: a native XML database (MarkLogic) for content and an RDF triple store for “metadata.”  These tools were at the core of the architecture for their web sites.

Another presentation was given by the Mayo Clinic.  They are also using MarkLogic for web content, along with semantic web technologies.  Their diagrams show that there are many ways for these systems to interact.

Read more

Empire: RDF & SPARQL Meet JPA

Empire is an implementation of the Java Persistence API (JPA) for RDF and the Semantic Web. Rather than being yet another JPA implementation for relational databases, Empire implements JPA for RDF and SPARQL, thus allowing developers who are familiar with JPA, but not with semantic web technologies like RDF, to make an easy transition into this brave, new world. JPA is a specification for managing Java objects, most commonly with an RDBMS; it’s the industry standard for Java ORMs.
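
The underlying idea, mapping the fields of an object to RDF triples, can be sketched conceptually in Python. To be clear, this is not Empire's API (Empire is a Java library and works through JPA annotations); the class, the namespace and the `to_triples` helper are hypothetical and only show the shape of the mapping.

```python
from dataclasses import dataclass, fields

# Conceptual sketch only. This is NOT Empire's API; the class, namespace,
# and helper below are hypothetical and exist purely for illustration.
EX = "http://example.org/"


@dataclass
class Book:
    uri: str     # identifies the object; becomes the subject of every triple
    title: str
    author: str


def to_triples(obj):
    """Map each non-identifier field of an object to one RDF-style triple."""
    return [
        (obj.uri, EX + f.name, getattr(obj, f.name))
        for f in fields(obj)
        if f.name != "uri"
    ]


book = Book(uri=EX + "book/1", title="An Example Title", author="A. Writer")
triples = to_triples(book)
```

Each field becomes a predicate and each value an object, which is roughly the role an ORM plays for rows and columns.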

Read more

TopQuadrant Announces First Semantic Web Enterprise Vocabulary Management Solution

TopBraid EVMS Addresses Core Issue for Improving Data Quality, Search and Analysis

SEMANTIC TECHNOLOGY CONFERENCE 2009 – SAN JOSE, Calif. – June 16, 2009 – TopQuadrant™, the global leader in Semantic Web application development technology, today announced TopBraid™ Enterprise Vocabulary Management Solution (EVMS) to help organizations connect data vocabulary assets in a modular fashion. TopBraid EVMS leverages W3C Semantic Web standards such as RDF, SPARQL and SKOS to construct a dynamic web of terminology, ultimately improving data quality and cohesiveness, resulting in better utilization of all information assets.

Most enterprises aim for a single, centralized master vocabulary system to create intuitive and understandable connections between data. But the reality is that enterprise vocabularies are managed in a disconnected, distributed manner – typically within spreadsheets and on individual user desktops. Even when enterprise systems include vocabulary capabilities, the result is a host of disconnected vocabularies – which subverts the value of a controlled vocabulary.

By leveraging Semantic Web technology, TopBraid EVMS applies the modularity and extensibility of the Web to vocabulary management. Instead of requiring everyone within an enterprise ecosystem to standardize on all the terms, TopBraid EVMS uses web-standard Uniform Resource Identifiers (URIs) to uniquely identify each concept. Each URI may be mapped to various terms used by the different groups of people to describe the same concept. This modular approach enables a sustainable, shared and distributed vocabulary management web.
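
As a rough illustration of that mapping (not TopBraid EVMS itself), the idea of one URI per concept with a different term per group can be sketched in Python. The URI, the group names, the terms and the `resolve` helper are all invented for this example.

```python
# Hypothetical vocabulary for illustration; the URI, groups, and terms are invented
# and do not reflect how TopBraid EVMS stores or exposes its data.
CONCEPT = "http://example.org/vocab/concept/customer"

# Each group keeps its own preferred term, but all of them point at the same URI.
labels = {
    CONCEPT: {"sales": "Client", "support": "Account Holder", "finance": "Customer"},
}


def resolve(term, group):
    """Return the concept URI that a group's term refers to, or None if unknown."""
    for uri, by_group in labels.items():
        if by_group.get(group) == term:
            return uri
    return None


# Two different terms from two different groups resolve to the same concept.
same_concept = resolve("Client", "sales") == resolve("Customer", "finance")
```

Because the URI is the shared key, two vocabularies that use the same URI can be merged without first agreeing on a single term.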

"Information is becoming a product for many organizations," said Irene Polikoff, co-founder and CEO of TopQuadrant. "Vocabularies are used to communicate and integrate data across the supply chain and standardize the publication of company information to external partners and customers. But if companies cannot effectively manage reference terms across the organization, information navigation becomes disjointed and results in poor information quality and faulty communication. This directly impacts productivity, quality of customer service, competitiveness and, ultimately, revenues."

TopBraid EVMS offers simplified development and management of controlled vocabularies. Content owners create relationships between terminology elements that are recorded as hyperlinks on the World Wide Web. Connections that previously had to be accomplished with custom software are easily described in a standard, declarative format. TopBraid EVMS provides flexible, customizable approaches for managing taxonomies and business vocabularies to support data integration, search, navigation, content delivery and data analysis.

The following capabilities are available out-of-the-box with TopBraid EVMS:

Vocabulary Processing: Standard hierarchical, associative and equivalency relationships; repositioning and numbering of terms; crosswalk mapping and graph capabilities.

Automatic Processing: Ability to create validation rules and automated script processing via SPIN and SPARQLMotion.

Import/Export: Import/Export from RDBMS, RDF Store, SPARQL endpoints, spreadsheets (CSV), XML, RDF, SKOS and OWL.

Merging: RDF standard universal identifiers provide easy "hooks" for merging vocabularies.

Systems Integration: Integrate with existing enterprise or vocabulary management systems via web services or APIs.

TopBraid EVMS customers receive specialized models for taxonomy development, governance and version control. The solution also includes SPIN Rules and SPARQLMotion scripts for key vocabulary management functionality, such as approval workflows and commonly used editing activities. TopBraidLive server and TopBraid Ensemble licenses are included in the solution along with TopBraid Ensemble web application templates that are optimized for vocabulary development.

Understanding The Semantic Value Proposition

Part 1 – Understanding The Semantic Value Proposition

The term “Semantic Web” has developed some interesting yet confusing connotations since it was first introduced in the early 2000s. Those misconceptions include but are not limited to:

Read more