Gates

One of the primary challenges in putting together a good content management system is building a decent permissions model. Establishing whether a particular user or process may perform some action on a resource can be remarkably difficult, especially when multiple constraints are involved. For an XML-based CMS this is even more of a challenge, because the n-dimensional nature of such a constraint model is hard to represent in hierarchical structures.

RDF, however, is far better suited to this role. A permissions system is, at its core, a set of assertions about who can do what to which resource, and that maps naturally onto the “subject predicate object” model that RDF exemplifies. Moreover, such models are sparse: the number of actual assertions is likely to be very small compared to the total number of possible assertions. RDF handles sparse data well, whereas storing the same information in tabular fields, as in a relational database, is comparatively expensive.

I’m working on building an XML-based CMS (specifically on a MarkLogic platform, though I would like to keep it portable). While working on it, I realized that although the user permissions system MarkLogic employs is powerful, it isn’t portable, and there are facets of my requirements that don’t fit nicely into that model. Thus, I decided to pursue the RDF triples approach to see if it would work better. (The end product may very well be a hybrid approach that takes advantage of fast queries, but that’s beyond the scope of this article.)

As it turns out, I believe this approach has a lot of promise, not only for direct permissions management, but also for defining workflow models that can be integrated into such a system.

A Few Words on Turtle Notation and RDF

I’ve become a fan of RDF, though I’m still very ambivalent about RDF-XML, which is remarkably difficult to use for teaching purposes because of the combination of namespaces and the XML notation itself. A far more user-friendly alternative notation is Turtle, short for Terse RDF Triple Language (sometimes abbreviated TRTL).

RDF itself is ultimately about assertions of the form “subject-predicate-object”. Breaking things down in this manner makes it possible to create graphs of data, where the object of one assertion can be the subject of another. Each term of the assertion is generally a URI (an object may also be a literal value), which can be read as a namespace URI followed by a term in that namespace, such as

http://xmltoday.org/xmlns/permissions#read

where the part up to and including the hash is the namespace and the part after it is the term.
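As a minimal Python sketch of that split, assuming hash-style namespaces like the one above (slash-style namespaces would need a different rule), the namespace and term can be separated at the last “#”:

```python
def split_uri(uri: str) -> tuple[str, str]:
    """Split a hash-style URI into (namespace, term).

    Assumes the namespace ends with '#', as in the example above.
    """
    namespace, sep, term = uri.rpartition("#")
    if not sep:
        raise ValueError(f"no '#' in {uri!r}")
    return namespace + sep, term


ns, term = split_uri("http://xmltoday.org/xmlns/permissions#read")
print(ns)    # http://xmltoday.org/xmlns/permissions#
print(term)  # read
```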

Unfortunately, expressing such triples longhand can make for some very long assertions:

<http://xmltoday.org/resource#resource1>
<http://xmltoday.org/xmlns/permissions#read>
<http://xmltoday.org/domains#domain2> .

Because of this, it’s common to declare prefixes beforehand and use them in place of the namespace URIs:

@prefix resource: <http://xmltoday.org/resource#> .
@prefix per: <http://xmltoday.org/xmlns/permissions#> .
@prefix domain: <http://xmltoday.org/domains#> .
resource:resource1 per:read domain:domain2 .

This is usually at least somewhat easier to read, especially when, as here, you have several namespaces to deal with. The period marks the end of each statement.
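To make the prefix mechanism concrete, here is a minimal Python sketch (the prefix table simply mirrors the declarations above; the function names are illustrative) that expands prefixed names back into the longhand form shown earlier:

```python
# Prefix table mirroring the @prefix declarations above.
PREFIXES = {
    "resource": "http://xmltoday.org/resource#",
    "per": "http://xmltoday.org/xmlns/permissions#",
    "domain": "http://xmltoday.org/domains#",
}


def expand(qname: str, prefixes: dict[str, str] = PREFIXES) -> str:
    """Turn a prefixed name such as 'per:read' into a full <URI>."""
    prefix, sep, local = qname.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError(f"undeclared prefix in {qname!r}")
    return f"<{prefixes[prefix]}{local}>"


def longhand(subject: str, predicate: str, obj: str) -> str:
    """Write one triple in full, terminated by the period."""
    return f"{expand(subject)} {expand(predicate)} {expand(obj)} ."


print(longhand("resource:resource1", "per:read", "domain:domain2"))
```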

Turtle offers a few additional shortcuts that I’ve used here. For starters, the “;” character indicates that a given subject has more than one predicate/object pair associated with it, while the “,” character indicates that a predicate has two or more objects bound to it. Thus,

resource:resource1 per:read domain:domain1, domain:domain2 ;
                   per:write domain:domain1 .

is a shorthand notation for:

resource:resource1 per:read domain:domain1 .
resource:resource1 per:read domain:domain2 .
resource:resource1 per:write domain:domain1 .

In the remaining examples, I’ve dispensed with the prefix declarations altogether. They are implicitly there (and you will need to declare them explicitly if you write this as Turtle yourself), but because the namespaces in question are specific to my application, they are pretty much meaningless beyond simply being unique names.

Understanding Permissions

The permissions ontology itself is fairly simple:

@prefix per: <http://xmltoday.org/xmlns/permissions#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
[] a owl:AllDifferent ;
   owl:distinctMembers ( per:read per:create per:update per:delete per:purge per:clone per:list ) .

This set describes the permissions that the system itself exposes, most of which should be familiar as the typical CRUD operations. There are a few subtle distinctions. per:delete indicates that a role has permission to set the delete flag of a resource to true, which makes it invisible to standard queries, while per:purge indicates permission to have the resource removed from the database on the next purge cycle.

The per:clone permission says that a given resource can be cloned with a new resource id, which is frequently useful for templatization. (Some document types don’t permit cloning because of privacy issues, as might be the case for an electronic health record.)

Finally, per:list controls whether an item that would otherwise match a query actually shows up in the results. An item might be withheld this way if it has a rights management or security restriction placed upon it.
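The difference between per:delete and per:purge is easiest to see in storage terms. The Python sketch below is hypothetical throughout (the Resource record, its field names and the helper functions are illustrative, not part of the CMS): deleting only sets a flag that standard queries respect, while purging physically removes the record when the purge cycle runs.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """Hypothetical storage record; field names are illustrative only."""
    rid: str
    deleted: bool = False        # the delete flag set via per:delete
    purge_pending: bool = False  # queued for removal via per:purge


def soft_delete(store: dict[str, Resource], rid: str) -> None:
    """per:delete: set the delete flag; the record stays in the store."""
    store[rid].deleted = True


def mark_for_purge(store: dict[str, Resource], rid: str) -> None:
    """per:purge: queue the record for the next purge cycle."""
    store[rid].purge_pending = True


def standard_query(store: dict[str, Resource]) -> list[str]:
    """Standard queries skip anything whose delete flag is set."""
    return sorted(rid for rid, res in store.items() if not res.deleted)


def purge_cycle(store: dict[str, Resource]) -> None:
    """Physically remove every record queued for purging."""
    for rid in [rid for rid, res in store.items() if res.purge_pending]:
        del store[rid]


store = {"r1": Resource("r1"), "r2": Resource("r2")}
soft_delete(store, "r2")
print(standard_query(store), "r2" in store)  # ['r1'] True
mark_for_purge(store, "r2")
purge_cycle(store)
print("r2" in store)                         # False
```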

One of the more important points to consider in this model is that a permission is always a relationship between two distinct objects, rather than a state that a given object has. Thus,

user:user1 per:read resource:r1.

establishes a relationship that says that a given user has a read relationship on a resource. Note that class relationships hold as well:

role:role1 per:read resourceType:rt1.

indicates that anyone holding role1 can read resources of type rt1. Inheritance plays a part here as well: if a resource r1 is of type rt1, then at the time r1 is created it inherits the per:read state. In this model, changing the permission on the role afterwards will not change the permission of an individual resource that has already been created. Finally, if a user holds more than one role, the user has a given permission on a resource if any one of those roles grants it.
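One way to picture creation-time inheritance is as a snapshot. The Python sketch below assumes (this is an interpretation, not something the model prescribes) that “inheriting” means copying each role’s per:* triple on the type into a matching triple on the new resource, with the graph held as a set of (subject, predicate, object) tuples:

```python
Triple = tuple[str, str, str]


def create_resource(graph: set[Triple], resource: str, rtype: str) -> None:
    """Add `resource` with type `rtype`, snapshotting role permissions.

    Assumption for this sketch: inheritance copies every
    (role, per:*, rtype) triple to (role, per:*, resource) at creation.
    """
    graph.add((resource, "rdf:type", rtype))
    inherited = {(s, p, resource) for (s, p, o) in graph
                 if o == rtype and p.startswith("per:")}
    graph.update(inherited)


g = {("role:role1", "per:read", "resourceType:rt1")}
create_resource(g, "resource:r1", "resourceType:rt1")

# Later the role loses per:read on the type...
g.discard(("role:role1", "per:read", "resourceType:rt1"))
create_resource(g, "resource:r2", "resourceType:rt1")

print(("role:role1", "per:read", "resource:r1") in g)  # True: kept its snapshot
print(("role:role1", "per:read", "resource:r2") in g)  # False: created afterwards
```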

Because users are related to roles and resources are related to resource types, you can use a SPARQL ASK query to determine whether any of a given user’s roles grants permission to read a given resource, even if there isn’t a direct relationship between the two (as there probably won’t be):

ASK WHERE {
user:user1 rdf:type ?role .
resource:r1 rdf:type ?resourceType .
?role per:read ?resourceType .
}

If the result of this query is true, then the designated user has the right to read the given resource, without necessarily needing to know which role gave that permission.
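For readers without a SPARQL endpoint at hand, the same pattern join can be sketched in plain Python over a set of triples. The role names role:editor and role:reviewer are hypothetical; the function returns true as soon as any role/type pair matches, mirroring the any-role semantics of the ASK query:

```python
Triple = tuple[str, str, str]


def ask_can(graph: set[Triple], user: str, perm: str, resource: str) -> bool:
    """Plain-Python equivalent of the ASK query above.

    True if some role of `user` grants `perm` on some type of
    `resource`; which role matched is irrelevant.
    """
    roles = {o for (s, p, o) in graph if s == user and p == "rdf:type"}
    types = {o for (s, p, o) in graph if s == resource and p == "rdf:type"}
    return any((role, perm, rtype) in graph
               for role in roles for rtype in types)


graph = {
    ("user:user1", "rdf:type", "role:editor"),    # hypothetical role
    ("user:user1", "rdf:type", "role:reviewer"),  # hypothetical role
    ("resource:r1", "rdf:type", "resourceType:rt1"),
    ("role:reviewer", "per:read", "resourceType:rt1"),
}
print(ask_can(graph, "user:user1", "per:read", "resource:r1"))    # True
print(ask_can(graph, "user:user1", "per:update", "resource:r1"))  # False
```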

Binding Permissions

One of the challenges I faced in putting together this CMS was that the underlying resources (articles, advertisements and so forth) might be used by multiple domains. As a consequence, one of the first questions that arose was how to associate permissions for a document with a given domain.

All resources have domains of applicability. Put another way, if you treat each domain as a term in an ontology (domain:domain1, domain:domain2, etc.), then each resource has a set of assertions:

resource:r1 per:read domain:domain1, domain:domain2, domain:domain3;
        per:create  domain:domain1, domain:domain3;
        per:update  domain:domain1, domain:domain3;
        per:noread  domain:domain4;
        per:nocreate  domain:domain5;
        per:noupdate  domain:domain5.

Additionally, resource types may be set for specific domains:

resourceType:t1 per:read domain:domain1 .
resource:r1 rdf:type resourceType:t1 .

This means that a given document is accessible within a given domain if either the document itself has an explicit per:read for that domain, or its type has a per:read for that domain and the document has no per:noread for it.
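That rule translates directly into a short Python sketch over a set of triples. The resource:r2 triple and the per:read from resourceType:t1 to domain:domain4 are hypothetical additions, included only so that each branch of the rule gets exercised:

```python
Triple = tuple[str, str, str]


def readable_in_domain(graph: set[Triple], resource: str, domain: str) -> bool:
    """Explicit per:read on the document, or per:read on its type
    with no per:noread on the document for that domain."""
    if (resource, "per:read", domain) in graph:
        return True
    if (resource, "per:noread", domain) in graph:
        return False
    types = {o for (s, p, o) in graph if s == resource and p == "rdf:type"}
    return any((t, "per:read", domain) in graph for t in types)


graph = {
    ("resource:r1", "per:read", "domain:domain2"),
    ("resource:r1", "per:noread", "domain:domain4"),
    ("resource:r1", "rdf:type", "resourceType:t1"),
    ("resource:r2", "rdf:type", "resourceType:t1"),      # hypothetical
    ("resourceType:t1", "per:read", "domain:domain1"),
    ("resourceType:t1", "per:read", "domain:domain4"),   # hypothetical
}
print(readable_in_domain(graph, "resource:r1", "domain:domain2"))  # True: explicit
print(readable_in_domain(graph, "resource:r1", "domain:domain4"))  # False: noread
print(readable_in_domain(graph, "resource:r2", "domain:domain4"))  # True: via type
```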

Documents are also subject to assertions at the role and user levels, which follow the same behavior:

role:role1 per:read resourceType:t1,resourceType:t2;
           per:create  resourceType:t1;
           per:update  resourceType:t2.
role:role2 per:read resourceType:t1.
resource:r1 rdf:type resourceType:t1.
user:user1 per:read resource:r1.
user:user2 rdf:type role:role1;
           per:noread resource:r1.

This means that r1 can be read by anyone with role1 or role2, or by user1, but not by user2, even though user2 is a member of role1.
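The user-level outcome can be sketched in Python as well. This sketch assumes user-level assertions of the form user:user1 per:read resource:r1 and user:user2 per:noread resource:r1, and assumes a precedence order in which a user-level per:noread beats everything, a user-level per:read comes next, and role grants come last; user:user3 is a hypothetical member of role2:

```python
Triple = tuple[str, str, str]


def user_can_read(graph: set[Triple], user: str, resource: str) -> bool:
    """Resolve per:read for one user under an assumed precedence:
    user-level noread, then user-level read, then any role grant."""
    if (user, "per:noread", resource) in graph:   # explicit denial wins
        return False
    if (user, "per:read", resource) in graph:     # explicit grant
        return True
    roles = {o for (s, p, o) in graph if s == user and p == "rdf:type"}
    types = {o for (s, p, o) in graph if s == resource and p == "rdf:type"}
    return any((r, "per:read", t) in graph for r in roles for t in types)


graph = {
    ("role:role1", "per:read", "resourceType:t1"),
    ("role:role1", "per:read", "resourceType:t2"),
    ("role:role2", "per:read", "resourceType:t1"),
    ("resource:r1", "rdf:type", "resourceType:t1"),
    ("user:user2", "rdf:type", "role:role1"),
    ("user:user3", "rdf:type", "role:role2"),     # hypothetical user
    ("user:user1", "per:read", "resource:r1"),    # assumed assertion
    ("user:user2", "per:noread", "resource:r1"),  # assumed assertion
}
for u in ("user:user1", "user:user2", "user:user3"):
    print(u, user_can_read(graph, u, "resource:r1"))
```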

It may be worth considering a representation constraint:

role:role1 per:read face:face1, face:face2;
           per:create  face:face1;
           per:update  face:face1.

In this case, if, for instance, face1 is the XML representation and face2 the HTML one, then role1 can read, create and update the XML face but can only read the HTML representation. Because resources initially inherit from their roles, resources will in fact pick up this relationship as well:

resource:r1 rdf:type role:role1.
resource:r1 per:read face:face1, face:face2;
            per:update face:face1.

Note again that such inheritance only occurs at the time of creation. A given resource document can have representational faces added or removed after it has been created, and if the role permissions given above are changed, those changes will not propagate down to resources that already exist. These will have to be changed directly.

Integrating Workflows with Permissions

A workflow can be defined using RDF triples in the same way:

workflow:workflow1 workflow:hasState      wkflToken:draft1,
            wkflToken:approval1,
            wkflToken:published1,
            wkflToken:deleted1.
wkflToken:draft1 workflow:nextState  wkflToken:approval1, wkflToken:deleted1;
      workflow:nextPrimaryState  wkflToken:approval1;
      workflow:hasPermissions per:read, per:create, per:update, per:delete.
wkflToken:approval1 workflow:nextState  wkflToken:draft1, wkflToken:published1, wkflToken:deleted1;
      workflow:nextPrimaryState  wkflToken:published1;
      workflow:hasPermissions per:read, per:delete.
wkflToken:published1 workflow:nextState  wkflToken:draft1, wkflToken:deleted1;
      workflow:hasPermissions per:read, per:delete.
wkflToken:deleted1 workflow:nextState  wkflToken:draft1;
      workflow:hasPermissions per:delete.
resourceType:t1 workflow:hasWorkflow  workflow:workflow1.
resource:r1 workflow:workflowState wkflToken:draft1;
            workflow:hasWorkflow  workflow:workflow1.

In this particular case, a workflow workflow1 is created that has four states: draft1, approval1, published1 and deleted1. Each of these is a token unique to this workflow, which means that the published1 state in workflow1 is different from a published2 state in workflow2. A given resource type may have an associated workflow, but it doesn’t have to; the absence of a workflow simply indicates that all states are available at all points in a document’s life cycle.

At each stage in a workflow, the document will have one or more workflow:nextState values, which are the states it can transition to from the current state, and zero or one workflow:nextPrimaryState value, which indicates the state considered the primary transition (e.g., the one associated with the “submit” button).
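Read as a state machine, workflow1 can be sketched in Python with two lookup tables copied from the Turtle above. The transition and submit helpers are hypothetical names for illustration; submit follows nextPrimaryState, which published1 and deleted1 do not define:

```python
# workflow1 from the Turtle above, as plain lookup tables.
NEXT_STATE = {
    "draft1": {"approval1", "deleted1"},
    "approval1": {"draft1", "published1", "deleted1"},
    "published1": {"draft1", "deleted1"},
    "deleted1": {"draft1"},
}
NEXT_PRIMARY_STATE = {"draft1": "approval1", "approval1": "published1"}


def transition(current: str, target: str) -> str:
    """Move to `target` only if it is one of current's nextState values."""
    if target not in NEXT_STATE[current]:
        raise ValueError(f"{current} -> {target} is not allowed")
    return target


def submit(current: str) -> str:
    """Follow nextPrimaryState, as a 'submit' button would."""
    if current not in NEXT_PRIMARY_STATE:
        raise ValueError(f"{current} has no primary transition")
    return NEXT_PRIMARY_STATE[current]


state = submit("draft1")         # approval1
state = submit(state)            # published1
state = transition(state, "draft1")
print(state)                     # draft1
```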

Each workflow state has a set of permissions applicable to either the resource or the resourceType. Thus, in workflow1, the draft1 state enables read, create, update and delete, while approval1 enables only read and delete. If a document has an associated workflow, the workflow takes priority over any per-document permissions (in essence, the workflow determines the permissions of the document, bypassing any user control).
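The override rule can be sketched as a replacement rather than a merge: when a document has a workflow state, that state’s permission set (taken from workflow1 above) is used instead of the document’s own. The effective_permissions function is a hypothetical helper for illustration:

```python
from typing import Optional

# workflow:hasPermissions for each state of workflow1.
STATE_PERMISSIONS = {
    "draft1": {"per:read", "per:create", "per:update", "per:delete"},
    "approval1": {"per:read", "per:delete"},
    "published1": {"per:read", "per:delete"},
    "deleted1": {"per:delete"},
}


def effective_permissions(doc_permissions: set[str],
                          workflow_state: Optional[str]) -> set[str]:
    """Workflow state permissions replace the document's own;
    without a workflow, the document's permissions apply."""
    if workflow_state is None:
        return set(doc_permissions)
    return set(STATE_PERMISSIONS[workflow_state])


doc = {"per:read", "per:update"}
print(sorted(effective_permissions(doc, "approval1")))  # ['per:delete', 'per:read']
print(sorted(effective_permissions(doc, None)))         # ['per:read', 'per:update']
```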

Given this scheme, a document can actually be under two or more workflows simultaneously, such as a publishing pipeline and a separate promotional pipeline. In that case, determining the scope and precedence of permissions can become fairly complex, even within this model.

Summary

This article explores the basics of how an RDF-based permissions system could be put together, but there are obviously larger issues. How do you integrate such a system into your CMS? How do you deal with precedence? How are such permissions set, both at ingestion and afterwards? I will be covering these in a subsequent article.

My goal here was to illustrate the basic principle of how such a system might look. Because there are often multiple constraints that apply to such models at various levels within a given permissions system, the ability to use RDF and SPARQL opens up the possibility of managing resources in multiple dimensions at once, which has applications well beyond content management systems.

Kurt Cagle is an information architect for Avalon Consulting (http://www.avalonconsult.com), specializing in XML and Semantic Web related issues, MarkLogic XQuery programming, and W3C web standards. He is a member of the XForms working group and the author of seventeen books on web technology, with another (HTML5 Web Graphics with SVG) to be published by O’Reilly Media later in 2011. He can be reached at caglek@avalonconsult.com, and is the managing editor for XML Today (http://www.xmltoday.org).