Posts Tagged ‘Diffbot’

Diffbot – the Next Google?


Wade Roush of Xconomy last week wrote, “In tech journalism, it’s inadvisable to call any company ‘the next Google.’ It’s almost always breathless hype or marked naïveté. After all, people have been predicting the search giant’s demise for nearly as long as the company has existed. I wrote a Technology Review cover story called ‘Search Beyond Google’ nearly 10 years ago. But with unlimited brainpower and money at its disposal, the company has managed to stay at the forefront in search, while also getting very good at other things, like mobile hardware. So when I tell you that a seven-employee company called Diffbot really could be the next Google, I need to be very specific about what I mean.” Read more

Diffbot and Semantria Team Up for Better Text Analytics

Jordan Novet of Venture Beat reports, “Analyzing text on the Internet to measure how positive it is — product reviews on, for example — has become easier and less expensive with tools from AlchemyAPI, Semantria, and other companies. But finding the text actually worth mining can be a chore in itself. To do this, Semantria has announced a formal partnership with a company called Diffbot that does the grunt work of finding important passages. Diffbot uses what it calls ‘computer vision’ technology to scour websites for meaningful information, shedding things like complex surrounding Web code. It then churns out clean text for analysis. Once Diffbot supplies Semantria with the structured text, Semantria assesses its meaning and tone. Semantria’s goal is to ‘bring text and sentiment analysis into the hands of a nontechnical person in under 3 minutes and for less than $1,000,’ according to founder and chief executive Oleg Rogynskyy.” Read more

Diffbot Is Teaching Robots to Shop

John Davi of the Diffbot blog recently wrote, “Diffbot’s human wranglers are proud today to announce the release of our newest product: an API for… products! The Product API can be used for extracting clean, structured data from any e-commerce product page. It automatically makes available all the product data you’d expect: price, discount/savings amount, shipping cost, product description, any relevant product images, SKU and/or other product IDs. Even cooler: pair the Product API with Crawlbot, our intelligent site-spidering tool, and let Diffbot determine which pages are products, then automatically structure the entire catalog.” Read more

The Future of E-Commerce Data Interpretation: Semantic Markup, or Computer Vision?

How will webpage data be interpreted in the next few years?  The Semantic Web community has high hopes that ever-evolving semantic standards will help systems identify and extract rich data found on the web, ultimately making it more useful.  With the announcement of support for GoodRelations in November, it seems clear that semantic progress is now being made on the e-commerce front, and at an accelerated rate.  Martin Hepp, founder of GoodRelations, expects the rate of adoption of rich, structured e-commerce data to increase significantly this year.

However, Mike Tung, founder and CEO of a data parsing service called Diffbot, has less faith that the standards necessary for a true Semantic Web will ever be completely and effectively implemented.  In an interview on Xconomy, he states that for semantic standards to work correctly, content owners must mark up the content once for the web and a second time for the semantic standards.  This requires extra work and gives them the opportunity to engage in content stuffing (SEO spam).

Read more

“Semantic Hack” Hackathon Announced for Semantic Technology & Business Conference

Semantic Hack - June 1, 2013 at the Semantic Technology & Business Conference

What could you build if the entire web was your database?

A hackathon has been added to the agenda of the Semantic Technology & Business Conference. Semantic Hack, organized by and Diffbot, will be an opportunity for developers and designers to work with RDF, SPARQL, OWL, entity extraction, natural language processing, sentiment analysis, newly available datasets, and other semantic technologies that help make the web more readable, accessible and dynamic for humans and more interpretable by machines. Semantic Hack is free to attend and prior experience with semantics is NOT required to participate.

Registration is open, but space is limited. Hackathon organizers are currently seeking coaches and sponsors; those interested in either role should contact the organizers.

  • Who: Developers, designers, and others interested in semantic technology
  • What: A day-long hackathon to build applications that help further expand the semantic web, or demonstrate the power of accessible web data
  • Where: Hilton San Francisco Union Square
  • When: Saturday, June 1, 2013, 9am – 9pm

Current sponsors include Bosatsu Consulting, The National Center for Biomedical Ontology, Protégé, and Stardog.

Diffbot Launches Page Classifier

Diffbot, a web content analysis startup that we spoke with last year, has launched a new beta API called Page Classifier. Sean Ludwig reports that Page Classifier can “reveal the page type and language behind any URL… Diffbot’s first APIs emphasized the scanning, parsing, and extracting of information from web pages. Developers could use these APIs to scan articles or homepages to pull the most meaningful content. Now it will expand its dev cred with Page Classifier, which could have a variety of uses.” Read more

Diffbot – Finding Meaning Visually

We sat down with Mike Tung, CEO of Diffbot, to learn more about this innovative technology that takes a different approach to deriving meaning from web pages.
What is Diffbot?
Mike Tung: Diffbot is a technology that allows software applications to interpret web pages the way human beings do–visually.  We offer an API to developers that lets them visually extract semantic information from web pages depending on the page type.  We’ve observed that the entire web can be classified into roughly 30 structural page types and have trained our visual extraction algorithm on two of those page types so far–frontpage and article pages.

Read more

Diffbot Announces Winners from Web Mining Hack Day

SemanticWeb recently sponsored a Web Mining Hack Day. Diffbot, one of the event organizers, reports, “over 80 of Silicon Valley’s top hackers, designers, and students gathered at the Diffbot offices in Palo Alto in the hopes of building the next great Web 3.0 app. Over the next 13 hours, participants learned about data and analysis APIs, formed teams, and wrote code. At the end of the night, only one team could win the prize for best App.” Read more

Calling All Coders! Web Mining Hack Day Scheduled for June 25!

SemanticWeb is pleased to announce that we are sponsoring a Web Mining Hack Day on Saturday, June 25, 2011, in Palo Alto, California. More details after the jump.

Hosted by AOL, and organized by Diffbot and StartX (the Stanford University incubator), the Hack Day promises to be a great opportunity for back-end coders and UI/UX design experts to get together with the goal of building exciting semantic applications. The organizers suggest that participants will be able to:

  1. Meet and network with other web mining experts, hackers, and students.
  2. Learn about new semantic technologies and open web APIs.
  3. See the new AOL West Coast Headquarters, StartX, Stanford University’s startup accelerator.  Have some pizza on us.
  4. Hack on new ideas and show off your projects.

Read more

Semantic Web Jobs: Diffbot

Diffbot, a semantic start-up in Palo Alto, CA, is looking for Machine Learning Interns and Web Development Interns. According to the post, “At Diffbot, we apply computer vision techniques to web documents to extract out semantic metadata. These services are used within hundreds of products at companies such as Cisco, Evernote, StumbleUpon, and AOL. We also offer free access to our technology to developers via an open API. Internally, we are using our technology to develop the next generation semantic results engine for the web. Check out for more information about our technology and APIs.” Read more