This week Dandelion, which bills itself as the one-stop shop for smart, high-quality Geo and Linked Data from trusted sources, starts its private beta. The service comes from SpazioDati and makes a different promise to each audience: for end users, quality data that is normalized, linked and enriched for their apps and reports; for developers, a simple API for any kind of language on any kind of platform; and for corporate and government entities, a way to publish and profit from their data.
That company is the creation of four Italian entrepreneurs – CEO Michele Barbera, president Gabriele Antonelli, partnerships director Andrea Di Benedetto, and Luca Pieraccini – who lived first-hand the frustrating experience of trying to find and leverage useful data for the custom web and mobile apps they were developing while running and working in small IT consulting companies. In an attempt to reverse the ratio of finding and cleaning data to actually building apps, says Barbera, the founders began participating in several EU-funded research projects and in the Open Data movement in Europe and Italy, including founding the non-profit Linked Open Data Italy. They also started experimenting with Semantic Web technologies.
“Open Data helps us to find valuable data and to build value-added web and mobile apps,” says Barbera. “So, let’s say that we solved partly the first problem of finding data, but not the second one, normalizing and cleaning data, since it is still very difficult to merge different data sources to put data in context.”
SpazioDati was founded to take a new approach to making data accessible to all users, not just scientists and data geeks, as Barbera puts it. “Yes, Linked Data is trying to solve this problem. However, people do not ‘think in the graph,’ yet. They think in tables. They’re not familiar with SPARQL and RDF, yet. The idea of querying, linking, smushing, [and] materializing data from several sources and then finally interpreting and visualizing it to extract valuable information is still far too difficult,” he says, adding that the proliferation of terms and licenses to deal with only adds to the challenge.
“Besides, if you want to manipulate geographic data, things are even worse since you need to know and understand other myriad protocols, formats, projections, acronyms, technologies and tools. Once you reach the point of having at your disposal a huge set of data, still you have to face big data technologies (such as NoSQL and Map Reduce just to name a few), which are not yet always easy to grasp.”
SpazioDati, whose management team also includes Giovanni Tummarello, founder of the semantic web index Sindice, received seed funding from Trentorise and from the Chamber of Commerce of Pisa. It started developing Dandelion.eu this January. “We strongly believe that the real value is putting data in context,” says Barbera. The private beta will be based on its “data backbone,” a set of data drawn from DBpedia (SpazioDati maintains the Italian DBpedia), OpenStreetMap, GeoNames and some government Open Data sources. It will feature two datasets: Italian points of interest and local events. “The aim of the private beta is to test the usability of our APIs so it’s mainly targeted to developers,” says Barbera.
Going forward, Dandelion envisions establishing many partnerships to bring more data into its market, including with private data providers who want to distribute their data for free or for a fee. The service can link private corporate data coming from partners with government- and community-generated Open Data. “Each data source, public or private, is then connected to our data ingestion and curation pipeline to be cleaned, enriched, linked and finally published on Dandelion,” says Barbera. At the moment SpazioDati operates the curation pipeline itself, but by the end of the year the company plans to open it to users, who will then be able to publish their own datasets on Dandelion. For now, users can access one part of the pipeline, its entity extraction and linking API, which Barbera says works well even on very short texts in English and Italian.
Dandelion is familiar with the fate of other data marketplaces, such as the Kasabi Linked Data Marketplace, which explained when it closed last year that it was still too early in the open data revolution for third-party data marketplaces to have a clear niche. That raises the question of whether a newcomer can make it in the data marketplace arena. Kasabi, Barbera says, was actually a source of inspiration for Dandelion, and he and the team at SpazioDati were sad to see it go. SpazioDati sees three reasons things could go differently this time. First, it agrees that it was too early for data markets at the time, but believes that is no longer the case. Second, says Barbera, while Kasabi was very powerful, it was still too difficult for many users. And third, he adds, Kasabi was relying too much (and too early) on being an exchange rather than a market – “that is, it was focused on offering tools for data producers to publish and distribute their data rather than offering high-quality datasets to its users.”
Dandelion, in contrast, is focusing on the demand side. “The possibility for users to publish their own data will come next, when there will be already a sufficient critical mass of data to which user-generated datasets can be linked,” he says. And to address the complexity concern, it’s keeping “‘the graph’ and semantic jargon completely hidden,” just exposing simple, easy-to-use REST APIs to consume tabular data. “Look at Open Data (and Linked Data) initiatives and projects: there’s clearly a shortage of applications based on this data,” he says, and that shortage can be attributed to data still being difficult to use even when it’s found. That, Barbera says, is where Dandelion comes in.
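To make the idea of hiding the graph concrete, here is a minimal Python sketch of flattening graph-style triples into the kind of table users are said to prefer. It is an illustration only, not Dandelion's implementation or API: the `triples_to_rows` function, the `poi:` identifiers and the sample values are all hypothetical.

```python
from collections import defaultdict

# Hypothetical (subject, predicate, object) triples, made up for illustration.
# They are NOT taken from Dandelion or any real dataset.
TRIPLES = [
    ("poi:1", "name", "Torre di Pisa"),
    ("poi:1", "city", "Pisa"),
    ("poi:1", "lat", "43.72"),
    ("poi:2", "name", "Castello del Buonconsiglio"),
    ("poi:2", "city", "Trento"),
]


def triples_to_rows(triples):
    """Pivot triples into table rows: one row per subject, one column per predicate."""
    by_subject = defaultdict(dict)
    for subject, predicate, obj in triples:
        # If a predicate repeats for the same subject, the last value wins.
        by_subject[subject][predicate] = obj

    # Use every predicate seen anywhere as a column, in a stable order.
    columns = sorted({predicate for _, predicate, _ in triples})

    rows = []
    for subject in sorted(by_subject):
        row = {"id": subject}
        for column in columns:
            # Missing values become None instead of being dropped.
            row[column] = by_subject[subject].get(column)
        rows.append(row)
    return columns, rows


if __name__ == "__main__":
    columns, rows = triples_to_rows(TRIPLES)
    print(["id"] + columns)
    for row in rows:
        print([row["id"]] + [row[c] for c in columns])
```

Each subject becomes one row and each predicate one column, with `None` where a source had nothing to say. That is roughly the shape a consumer of a tabular response would work with instead of the underlying graph.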
“It’s just a matter of time for data producers and data consumers to understand the potential of this approach,” he says. As for the chasm that exists between data owners and end users, Dandelion wants to bridge it by leveraging the community of developers and IT pros at small and medium enterprises, who can mix in their creativity with the data to produce smart value-added applications that end up in the hands of everyday end users. As for the end-user group of non-developer data professionals – statisticians, architects, policy makers and marketers – “we’ll offer the possibility to use the data to build reports and visualizations that they can easily export and use,” he says.
While its primary targets right now are developers and IT departments at smaller companies, SpazioDati also wants to be a resource for corporations that may not even know what treasure resides in their own data. “Serendipity is the secret ingredient: data that originates in a corporation in a certain sector might be used for doing something different in a completely unrelated context,” says Barbera. “The way we approach data providers depends on their business model. We basically offer them the opportunity to extract the hidden value in their data by making it available for free or for a fee to hundreds of consumers.”
Plans are for Dandelion to launch the first live product, which will still be focused on data about Italy, this summer. “By the end of the year we’ll go international by publishing data with a European focus, and we’ll allow users to upload, curate, link and publish their own datasets,” says Barbera. “Gradually, we’ll start ‘exposing the graph.’ By the end of the year we’ll also extend the set of languages supported by the curation pipeline.”