Posts Tagged ‘ETL’

Keep On Keeping On

“There is nothing more difficult to plan, more doubtful of success, nor more dangerous to manage than the creation of a new order of things…. Whenever his enemies have the ability to attack the innovator, they do so with the passion of partisans, while the others defend him sluggishly, so that the innovator and his party alike are vulnerable.”
–Niccolò Machiavelli, The Prince (1513)

In case you missed it, a series of recent articles has made a Big Announcement:

The Semantic Web is not here yet.

Additionally, neither are flying cars, the cure for cancer, humans traveling to Mars or a bunch of other futuristic ideas that still have merit.

A problem with many of these articles is that they conflate the Vision of the Semantic Web with the practical technologies associated with the standards. While the Whole Enchilada has yet to emerge (and may never do so), the individual technologies are finding their way into ever more systems in a wide variety of industries. These are not all necessarily on the public Web; they are simply Webs of Data. There are plenty of examples of this happening and I won’t reiterate them here.

Instead, I want to highlight some other things that are going on in this discussion that are largely left out of these narrowly-focused, provocative articles.

First, the Semantic Web has a name attached to its vision and it has for quite some time. As such, it is easy to remember and it is easy to remember that it Hasn’t Gotten Here Yet. Every year or so, we have another round of articles that are more about cursing the darkness than lighting candles.

In that same timeframe, however, we’ve seen the ascent and burnout of Service-Oriented Architectures (SOA), Enterprise Service Buses (ESBs), various MVC frameworks, server-side architectures, etc. Everyone likes to announce $20 million sales of an ESB to clients. No one generally reports on the $100 million write-downs on failed initiatives when they surface in annual reports a few years later. So we are left with a skewed perspective on the efficacy of these big “conventional” initiatives.


Two Kinds of Big Data

Rob Gonzalez, Cambridge Semantics

With all the hullabaloo around Big Data, I’ve been a little surprised that there hasn’t been more talk about how to consume the vast petabytes that people are talking about…until I realized that there are really two Big Data problems out there!

Roughly speaking, the two primary ways in which data scales are by adding depth and by adding breadth. The first is what most people mean when they refer to Big Data. Want to run analytics on every single transaction that Wal*Mart has done over 10 years to analyze trends? THAT is vertical scale. Technically, you can characterize it as having lots and lots of similarly structured data. That is where technologies like Hadoop and column-based data storage make a big difference.

Horizontal Big Data, on the other hand, is like the Linked Data Cloud. It has all kinds of random information that ranges from highly structured and numeric to highly unstructured. Significantly, it tends to change quite a bit over time with increasing heterogeneity. That’s a completely different kind of scale, and one that is not well solved by using highly structured, vertically scaling technologies.


Time for Semantic ETL?

What’s the link between the trends of more and more objects and even commercial transactions on the web being described in a machine-readable, semantic format and the endless streaming of all that data? Revenue-funded startup First Retail, whose principals Anne Jude Hunt and Simon G. Handley will be speaking at the upcoming Semantic Technology Conference in June, thinks the answer is semantic ETL.

Extract, transform, load (ETL) is a widely known concept in the well-charted terrain of the IT world. It is about transforming a collection of heterogeneous data to unify it within a data warehouse and get some use out of it.

Semantic ETL, says Hunt, is brought on by the fact that today people want to deal with the growing loads of streaming data while it’s streaming and that “people want intelligent data, machine-readable tags, [they want] to slice and dice it for BI in lots of different ways, so the traditional data warehouse and relational database approach is just not working for people.” Cleansed and integrated semantic data loaded into distributed, scalable triple stores can come to the rescue.
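To make the idea a little more concrete, here is a minimal sketch of what a semantic ETL pass might look like. It is not First Retail's method; the namespace, the field mappings, the sample records and the in-memory "store" are all assumptions invented for illustration. It extracts records from two differently shaped sources, cleanses and unifies their field names, and loads the result as subject-predicate-object triples.

```python
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical namespace and field mappings, invented for this sketch.
BASE = "http://example.org/"
FIELD_MAP = {
    "sku": "productId",
    "product_id": "productId",
    "title": "name",
    "name": "name",
    "price": "price",
    "cost": "price",
}


def extract() -> List[Dict[str, str]]:
    """Stand-in for reading from two sources with different schemas."""
    return [
        {"sku": "A1", "title": " Blue Mug ", "price": "4.50"},
        {"product_id": "A2", "name": "Red Mug", "cost": "5.00"},
    ]


def transform(records: Iterable[Dict[str, str]]) -> List[Triple]:
    """Cleanse values, unify field names, and emit triples."""
    triples: List[Triple] = []
    for record in records:
        # Map each known source field onto the shared vocabulary.
        unified = {
            FIELD_MAP[key]: value.strip()
            for key, value in record.items()
            if key in FIELD_MAP
        }
        subject = f"{BASE}product/{unified['productId']}"
        for predicate, value in unified.items():
            if predicate != "productId":
                triples.append((subject, BASE + predicate, value))
    return triples


def load(triples: Iterable[Triple], store: Set[Triple]) -> None:
    """A plain set stands in for a real triple store here."""
    store.update(triples)


def run_pipeline() -> Set[Triple]:
    store: Set[Triple] = set()
    load(transform(extract()), store)
    return store
```

Because every fact is its own triple, a source that shows up later with extra fields only needs new entries in the mapping rather than a schema migration, which is roughly the appeal over a fixed warehouse table.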
