Wade Roush of Xconomy last week wrote, “In tech journalism, it’s inadvisable to call any company ‘the next Google.’ It’s almost always breathless hype or marked naïveté. After all, people have been predicting the search giant’s demise for nearly as long as the company has existed. I wrote a Technology Review cover story called ‘Search Beyond Google’ nearly 10 years ago. But with unlimited brainpower and money at its disposal, the company has managed to stay at the forefront in search, while also getting very good at other things, like mobile hardware. So when I tell you that a seven-employee company called Diffbot really could be the next Google, I need to be very specific about what I mean.”
Posts Tagged ‘Diffbot’
Jordan Novet of VentureBeat reports, “Analyzing text on the Internet to measure how positive it is — product reviews on Amazon.com, for example — has become easier and less expensive with tools from AlchemyAPI, Semantria, and other companies. But finding the text actually worth mining can be a chore in itself. To do this, Semantria has announced a formal partnership with a company called Diffbot that does the grunt work of finding important passages. Diffbot uses what it calls ‘computer vision’ technology to scour websites for meaningful information, shedding things like complex surrounding Web code. It then churns out clean text for analysis. Once Diffbot supplies Semantria with the structured text, Semantria assesses its meaning and tone. Semantria’s goal is to ‘bring text and sentiment analysis into the hands of a nontechnical person in under 3 minutes and for less than $1,000,’ according to founder and chief executive Oleg Rogynskyy.”
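The two-stage pipeline described here (extract clean text first, then score its tone) can be illustrated with a small sketch. Neither stage reflects how Diffbot or Semantria actually work. The extractor below is a crude tag-based filter that simply drops `<script>`/`<style>` contents, standing in for Diffbot's visual analysis. The sentiment stage is a hypothetical toy word list, standing in for Semantria's models. `TextExtractor`, `extract_text`, `sentiment`, and `LEXICON` are all illustrative names.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical stand-in for the extraction stage: keeps visible text and
    drops <script>/<style> contents. (Diffbot itself analyzes pages visually.)"""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped element
        self.chunks = []      # visible text fragments, in document order

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def extract_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical toy lexicon standing in for the sentiment stage.
LEXICON = {"great": 1, "love": 1, "excellent": 1, "poor": -1, "broken": -1, "bad": -1}


def sentiment(text: str) -> float:
    """Average polarity of lexicon words found in the text; 0.0 if none match."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    hits = [LEXICON[w] for w in words if w in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0


page = """<html><head><script>var x = 1;</script><style>p{}</style></head>
<body><p>Great blender, I love it.</p><p>The lid arrived broken.</p></body></html>"""

text = extract_text(page)
print(text)             # Great blender, I love it. The lid arrived broken.
print(sentiment(text))  # 0.333... (two positive hits, one negative)
```

The point of the split is the one the article makes: the scoring step only sees clean prose, so script code and layout markup never reach it.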
John Davi of the Diffbot blog recently wrote, “Diffbot’s human wranglers are proud today to announce the release of our newest product: an API for… products! The Product API can be used for extracting clean, structured data from any e-commerce product page. It automatically makes available all the product data you’d expect: price, discount/savings amount, shipping cost, product description, any relevant product images, SKU and/or other product IDs. Even cooler: pair the Product API with Crawlbot, our intelligent site-spidering tool, and let Diffbot determine which pages are products, then automatically structure the entire catalog.”
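As a rough sketch of what a client might do with the fields the post lists (price, discount, shipping, description, images, SKU/product IDs), the code below loads a JSON product record into a typed object. The JSON keys (`price`, `regularPrice`, `shipping`, `sku`, `upc`, and so on) are assumptions chosen for illustration, not Diffbot's documented response schema. `Product` and `from_response` are hypothetical names. Prices are parsed as `Decimal` so that the savings arithmetic is exact.

```python
import json
from dataclasses import dataclass, field
from decimal import Decimal
from typing import Dict, List, Optional


@dataclass
class Product:
    """Hypothetical record for the fields named in the post (not Diffbot's schema)."""
    title: str
    price: Decimal
    regular_price: Optional[Decimal] = None  # list price before any discount
    shipping: Decimal = Decimal("0")
    description: str = ""
    images: List[str] = field(default_factory=list)
    product_ids: Dict[str, str] = field(default_factory=dict)  # e.g. {"sku": "..."}

    @property
    def savings(self) -> Decimal:
        """Discount amount; zero when no regular price is known."""
        if self.regular_price is None:
            return Decimal("0")
        return max(self.regular_price - self.price, Decimal("0"))

    @property
    def landed_cost(self) -> Decimal:
        """Price plus shipping."""
        return self.price + self.shipping


def _money(value) -> Decimal:
    # str() first so floats, strings, and Decimals all convert the same way.
    return Decimal(str(value))


def from_response(data: dict) -> Product:
    """Build a Product from a parsed JSON dict using the assumed keys above."""
    return Product(
        title=data["title"],
        price=_money(data["price"]),
        regular_price=_money(data["regularPrice"]) if "regularPrice" in data else None,
        shipping=_money(data.get("shipping", "0")),
        description=data.get("description", ""),
        images=list(data.get("images", [])),
        product_ids={k: str(data[k]) for k in ("sku", "upc") if k in data},
    )


raw = ('{"title": "Blender", "price": 39.99, "regularPrice": 49.99, '
       '"shipping": 5.00, "sku": "BL-100", "images": ["https://example.com/b.jpg"]}')
p = from_response(json.loads(raw, parse_float=Decimal))
print(p.savings, p.landed_cost)  # 10.00 44.99
```

Passing `parse_float=Decimal` to `json.loads` keeps values like `49.99` exact. This is why the computed savings comes out as `10.00` rather than a binary-float approximation.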
How will webpage data be interpreted in the next few years? The Semantic Web community has high hopes that ever-evolving semantic standards will help systems identify and extract the rich data found on the web, ultimately making it more useful. With the announcement in November that Schema.org supports GoodRelations, it seems clear that semantic progress is now being made on the e-commerce front, and at an accelerated rate. Martin Hepp, founder of GoodRelations, expects adoption of rich, structured e-commerce data to increase significantly this year.
However, Mike Tung, founder and CEO of a data parsing service called Diffbot, has less faith that the standards necessary for a true Semantic Web will ever be completely and effectively implemented. In an interview on Xconomy, he says that for semantic standards to work correctly, content owners must mark up their content twice: once for the web and a second time for the semantic standards. This requires extra work, and it gives them an opportunity to engage in content stuffing (SEO spam).
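Tung's objection can be made concrete with a small example. A publisher renders the same product facts twice. One copy is the HTML visitors see. The other is a schema.org `Product`/`Offer` description, shown here as JSON-LD. Nothing forces the two copies to agree, so the structured copy can carry stuffed keywords that the visible page lacks. The helpers `render_html`, `render_jsonld`, and `markup_matches` are hypothetical names for illustration only. The consistency check is one possible mitigation, not a feature of any standard.

```python
import html
import json

product = {"name": "Blender", "price": "39.99", "currency": "USD"}


def render_html(p: dict) -> str:
    """First copy: what human visitors see."""
    return (f'<h1>{html.escape(p["name"])}</h1>'
            f'<span class="price">${html.escape(p["price"])}</span>')


def render_jsonld(p: dict) -> str:
    """Second copy: the same facts as schema.org Product/Offer JSON-LD."""
    return json.dumps({
        "@context": "https://schema.org",
        "@type": "Product",
        "name": p["name"],
        "offers": {"@type": "Offer", "price": p["price"], "priceCurrency": p["currency"]},
    })


def markup_matches(visible: dict, jsonld: str) -> bool:
    """Check that the structured copy states the same name and price as the page."""
    data = json.loads(jsonld)
    offer = data.get("offers", {})
    return data.get("name") == visible["name"] and offer.get("price") == visible["price"]


honest = render_jsonld(product)
stuffed = render_jsonld({**product, "name": "Blender best cheap blender deals free"})
print(markup_matches(product, honest))   # True
print(markup_matches(product, stuffed))  # False
```

Every field a publisher wants machines to understand has to be written a second time in this way. That duplication is the extra work Tung describes, and the unchecked gap between the two copies is where spam can creep in.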
What could you build if the entire web were your database?
A hackathon has been added to the agenda of the Semantic Technology & Business Conference. Semantic Hack, organized by SemanticWeb.com and Diffbot, will be an opportunity for developers and designers to work with RDF, SPARQL, OWL, entity extraction, natural language processing, sentiment analysis, newly available datasets, and other semantic technologies that help make the web more readable, accessible and dynamic for humans and more interpretable by machines. Semantic Hack is free to attend and prior experience with semantics is NOT required to participate.
Registration is open, but space is limited. Hackathon organizers are currently seeking coaches and sponsors; those interested in either role should contact the organizers.
- Who: Developers, designers, and others interested in semantic technology
- What: A day-long hackathon to build applications that help further expand the semantic web, or demonstrate the power of accessible web data
- Where: Hilton San Francisco Union Square
- When: Saturday, June 1, 2013, 9am – 9pm
Diffbot, a web content analysis startup that we spoke with last year, has launched a new beta API called Page Classifier. Sean Ludwig reports that Page Classifier “can reveal the page type and language behind any URL… Diffbot’s first APIs emphasized the scanning, parsing, and extracting of information from web pages. Developers could use these APIs to scan articles or homepages to pull the most meaningful content. Now it will expand its dev cred with Page Classifier, which could have a variety of uses.”
SemanticWeb.com: What is Diffbot?
Mike Tung: Diffbot is a technology that allows software applications to interpret web pages the way human beings do: visually. We offer an API to developers that lets them visually extract semantic information from web pages depending on the page type. We’ve observed that the entire web can be classified into roughly 30 structural page types and have trained our visual extraction algorithm on two of those page types so far: frontpage and article pages.
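The design Tung describes works in two steps. First, decide which structural type a page belongs to. Then run the extractor trained for that type, if one exists. The sketch below shows only that dispatch shape. Diffbot uses trained visual models. Here, hand-written rules over three hypothetical layout features take their place, and the thresholds are arbitrary assumptions. The per-type "extractors" are placeholders that only name the fields they would return. `PageFeatures`, `classify`, `EXTRACTORS`, and `extract` are all illustrative names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class PageFeatures:
    link_count: int          # number of anchors on the page
    longest_text_block: int  # characters in the largest contiguous text region
    headline_count: int      # headline-like elements (large or linked titles)


def classify(f: PageFeatures) -> str:
    """Hand-written rules standing in for a trained visual classifier.
    Thresholds are arbitrary illustrative assumptions."""
    if f.longest_text_block >= 1500 and f.headline_count <= 3:
        return "article"    # one dominant body of prose
    if f.headline_count >= 10 or f.link_count >= 100:
        return "frontpage"  # many teasers pointing elsewhere
    return "other"          # one of the many types with no extractor yet


# Placeholder per-type extractors; only two types are supported, mirroring the interview.
EXTRACTORS: Dict[str, Callable[[PageFeatures], dict]] = {
    "article": lambda f: {"type": "article", "fields": ["title", "author", "date", "text"]},
    "frontpage": lambda f: {"type": "frontpage", "fields": ["items"]},
}


def extract(f: PageFeatures) -> Optional[dict]:
    """Classify first, then dispatch; None when no extractor exists for the type."""
    extractor = EXTRACTORS.get(classify(f))
    return extractor(f) if extractor else None


print(classify(PageFeatures(link_count=40, longest_text_block=4200, headline_count=1)))   # article
print(classify(PageFeatures(link_count=250, longest_text_block=300, headline_count=25)))  # frontpage
print(extract(PageFeatures(link_count=5, longest_text_block=200, headline_count=0)))      # None
```

Keeping classification separate from extraction means that supporting a new page type only requires registering one more entry in the dispatch table.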
SemanticWeb.com recently sponsored a Web Mining Hack Day. Diffbot, one of the event organizers, reports, “over 80 of Silicon Valley’s top hackers, designers, and students gathered at the Diffbot offices in Palo Alto in the hopes of building the next great Web 3.0 app. Over the next 13 hours, participants learned about data and analysis APIs, formed teams, and wrote code. At the end of the night, only one team could win the prize for best App.”
SemanticWeb.com is pleased to announce that we are sponsoring a Web Mining Hack Day on Saturday, June 25, 2011, in Palo Alto, California. Details follow.
Hosted by AOL, and organized by Diffbot and StartX (the Stanford University incubator), the Hack Day promises to be a great opportunity for back-end coders and UI/UX design experts to get together with the goal of building exciting semantic applications. The organizers suggest that participants will be able to:
- Meet and network with other web mining experts, hackers, and students.
- Learn about new semantic technologies and open web APIs.
- See the new AOL West Coast Headquarters and StartX, Stanford University’s startup accelerator. Have some pizza on us.
- Hack on new ideas and show off your projects.
Diffbot, a semantic start-up in Palo Alto, CA, is looking for Machine Learning Interns and Web Development Interns. According to the post, “At Diffbot, we apply computer vision techniques to web documents to extract out semantic metadata. These services are used within hundreds of products at companies such as Cisco, Evernote, StumbleUpon, and AOL. We also offer free access to our technology to developers via an open API. Internally, we are using our technology to develop the next generation semantic results engine for the web. Check out http://diffbot.com for more information about our technology and APIs.”