Posts Tagged ‘Gil Elbaz’

Blekko Data Donation Is A Big Benefit To Common Crawl

Common Crawl, the non-profit organization creating a repository of openly and freely accessible web crawl data, is getting a present from search engine provider blekko. The company is donating its search engine ranking metadata for 140 million websites and 22 billion webpages to Common Crawl.

“The blekko data donation is a huge benefit to Common Crawl,” Common Crawl director Lisa Green told The Semantic Web Blog. “Knowing what the blekko team is crawling and how they rate those pages allows us to improve our crawler and enrich our corpus for high-value webpages.”

Data Markets & the Data Economy

Gil Elbaz has written an article for TechCrunch in which he shares his take on emerging data markets: “The term data market brings to mind a traditional structure in which vendors sell data for money. Indeed, this form of market is on the rise with companies large and small jumping in. Think of Azure Data Marketplace (Microsoft), data.com (Salesforce.com), InfoChimps.com, and DataMarket.com. While this model allows organizations to acquire valuable data, the term is evolving to include a variety of forms, each with varying degrees of adoption success. At the heart of it, data markets enable organizations to access data in new ways, where the currency does not only have to be money, but can be in the form of data or insight.”

Facebook’s Instagram Acquisition: Fueling More Startup Fever and Semantic Startups’ Dreams

The news this week of Facebook’s acquisition of mobile photo-sharing service Instagram for $1 billion may be fueling the dreams of tech startups of every stripe, including those in the semantic tech community. In fact, they may have even greater reason to be inspired: a recent report has it that Instagram has been slowly rolling out an Open Graph integration for the app, built in collaboration with Facebook, that seamlessly publishes photos to users’ Timelines, in what may be the first of several similar partner deals down the road.

Other startups infused with semantic tech smarts may be on the lookout for funding opportunities as an important part of making those dreams come true. Thomson Reuters and The National Venture Capital Association this week released funding stats for the first quarter of 2012 that could put a bit of a damper on things: they show a 35 percent decrease in dollar commitments and a 9 percent decline in the number of funds, compared to the first quarter of 2011. But, according to a statement by Mark Heesen, president of the NVCA, venture firms “appear to be more optimistic about the fundraising environment in 2012.”

Common Crawl Founder Gil Elbaz Speaks About New Relationship With Amazon, Semantic Web Projects Using Its Corpus, And Why Open Web Crawls Matter To Developing Big Data Expertise

The Common Crawl Foundation’s repository of openly and freely accessible web crawl data is about to go live as a Public Data Set on Amazon Web Services. The non-profit Common Crawl is the vision of Gil Elbaz, who founded Applied Semantics, creator of the AdSense technology for which Google acquired it, as well as the Factual open data aggregation platform. It counts Nova Spivack, who has been behind semantic services from Twine to Bottlenose, among its board of directors.

Elbaz’ goal in developing the repository: “You can’t access, let alone download, the Google or the Bing crawl data. So certainly we’re differentiated in being very open and transparent about what we’re crawling and actually making it available to developers,” he says.

“You might ask why is it going to be revolutionary to allow many more engineers and researchers and developers and students access to this data, whereas historically you have to work for one of the big search engines…. The question is, the world has the largest-ever corpus of knowledge out there on the web, and is there more that one can do with it than Google and Microsoft and a handful of other search engines are already doing? And the answer is unquestionably yes.”
