Bringing structured data to users from unstructured web content: that’s what Webhose.io is offering with a new API service that works with millions of blogs, forums, reviews, news sites and comment posts. The company is applying its technology roots as a message board search engine, Omgili, to the work of delivering posts’ clean text content, dates, authors, links, language and so on in JSON, XML and RSS formats rather than as unstructured content in an HTML file.
For Omgili to do its job in the message board space, a world untouched by microformats and other semantic web structures, the company had to create a crawler that uses heuristic techniques to extract text, titles and other details from those posts, Geva says. “Once we created that we were able to extract data from less complex sources like blogs, news sites and so on,” he adds.
“The output is similar to what import.io does (see The Semantic Web Blog post on that technology here), but on a much larger scale,” says founder Ran Geva. The import.io service, he says, is great when users want specific data from one or two sites, but more is required for the heavy lifting of leveraging a lot of data. “We save and download millions of posts per day,” he says. “When it comes to getting structured content out of complicated sources on a mass scale, that’s a unique technology challenge we conquered a while ago.”
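To illustrate why structured output is easier to work with than raw HTML, here is a minimal Python sketch that parses a JSON payload of posts into typed records. The payload and its field names (posts, title, text, author, published, language, url) are assumptions made up for illustration; they are not taken from Webhose.io's actual response schema.

```python
import json
from dataclasses import dataclass
from datetime import datetime
from typing import List

# Hypothetical payload: the structure and field names are illustrative
# assumptions, not Webhose.io's documented schema.
SAMPLE_RESPONSE = """
{
  "posts": [
    {
      "title": "Router keeps dropping Wi-Fi",
      "text": "Has anyone else seen this after the last firmware update?",
      "author": "netfan42",
      "published": "2014-12-18T09:30:00",
      "language": "english",
      "url": "http://forum.example.com/thread/1"
    },
    {
      "title": "Re: Router keeps dropping Wi-Fi",
      "text": "Rolling back the firmware fixed it for me.",
      "author": "packetloss",
      "published": "2014-12-18T10:05:00",
      "language": "english",
      "url": "http://forum.example.com/thread/1#reply-1"
    }
  ]
}
"""


@dataclass
class Post:
    title: str
    text: str
    author: str
    published: datetime
    language: str
    url: str


def parse_posts(raw: str) -> List[Post]:
    """Turn a JSON document of posts into a list of Post records."""
    data = json.loads(raw)
    return [
        Post(
            title=item["title"],
            text=item["text"],
            author=item["author"],
            # Parse the ISO 8601 timestamp into a datetime object.
            published=datetime.fromisoformat(item["published"]),
            language=item["language"],
            url=item["url"],
        )
        for item in data["posts"]
    ]


if __name__ == "__main__":
    for post in parse_posts(SAMPLE_RESPONSE):
        print(post.published.date(), post.author, "-", post.title)
```

With the fields already separated, filtering by author, date or language is a one-line comparison rather than an exercise in scraping markup.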
Radiance Technologies is looking for a Senior Software Engineer in Rome, NY. The post states, “Radiance Technologies, a rapidly growing employee owned company supporting Air Force Research Laboratory Information Directorate is searching for a talented, Software Engineer to join our development team. We are looking for a self-starter programer experienced working with Semantic Web Technologies to create and maintain Domain Ontologies for government projects.” Read more
Cambridge Semantics Adds KeyLines Network Visualization Tool to Deliver New Clarity to Big Data Analytics
CAMBRIDGE, Mass. (PRWEB) December 18, 2014 — Cambridge Semantics, the leading provider of smart data solutions driven by Semantic Web technology, today announced the integration of its Anzo Smart Data Platform (Anzo SDP) with the KeyLines network visualization tool. The advancement will enable business analysts and IT professionals in Global 2000 companies to gain new “big picture” business insights from their big data queries on diverse data.
“The combination of our Anzo SDP semantic technology-driven solution and the KeyLines graph visualization technology enables our customers to enjoy fast, easy and accurate views of relationships, hierarchies and patterns within the data to facilitate better investigations, root cause analysis and new business insights,” said Alok Prasad, president of Cambridge Semantics. Read more
Deep Speech is a new system for speech recognition, built with the goal of improving accuracy in noisy environments (for example, restaurants, cars and public transportation), as well as other challenging environments (highly reverberant and far-field situations).
Key to the Deep Speech approach is a well-optimized recurrent neural net (RNN) training system that uses multiple GPUs, as well as a set of novel data synthesis techniques that allowed Baidu researchers to efficiently obtain a large amount of varied data for training. Read more
Booz Allen Hamilton is looking for a Senior Systems Administrator in McLean, VA. According to the post, this position will “Perform primary or alternate systems and site administration duties in support of FSG infrastructure, including IBM Rational Suite, Atlassian Suite, and HP Quality Management Suite and custom applications, including Process Director, SharePoint Program Management Environment, and Data Drill. Perform custom application life cycle management systems integration and software and database component development. Perform service desk functions, including customer support, account management, change and configuration management, user communications, and incident and problem management.” Read more
NEW YORK & TEL AVIV, Israel–(BUSINESS WIRE)–Jinni, the world’s leading semantic discovery solution provider for linear TV and on-demand content, announced today that it is powering the semantic taste-based discovery experience features for VUDU, a Walmart company and a leader in OTT video streaming with the world’s largest HD library. VUDU has joined other leading US content providers who have adopted Jinni’s Emmy® award winning semantic technology to provide users with an intuitive content discovery experience based on their personal tastes and mood. Jinni’s solution is part of VUDU’s next generation UI providing millions of online US users with mood based search and taste based recommendations over smart TVs, desktops, game consoles, second screen and internet-connected DVD/Blu-ray players. Read more
Amit Chowdhry of Forbes reports, “Facebook has over a trillion status updates, text posts, photos and pieces of content archived, which is why the social network company has been heavily focused on improving its search engine. Over the last few years, Facebook displayed results from Bing.com for keyword searches since they had a partnership with Microsoft. However, Facebook recently decided to completely remove Microsoft Bing search results. ‘We’re not currently showing web search results in Facebook Search because we’re focused on helping people find what’s been shared with them on Facebook,’ said a Facebook spokesperson in an interview with Reuters. ‘We continue to have a great partnership with Microsoft in lots of different areas’.” Read more
The former lead architect at the BBC who handled its semantic publishing projects, such as its FIFA World Cup 2010, 2012 Olympics and redesigned BBC Sports Site, today is building up the semantic infrastructure at another media giant, The Financial Times.
Jem Rayfield, who holds the title Head of Solution Architecture Technology there, is rebuilding the Financial Times’ whole publishing stack to use semantic technology. “We are working on republishing the architecture on the back end, basically engineering the whole of the backend stack to use the RDF model” for data interchange, he says. That work involves modeling ontologies for companies, organizations, brands, exchanges, shares, financial instruments and other key business terms.
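As a rough illustration of what using the RDF model for data interchange can look like, the Python sketch below describes a company, a share and an exchange as subject-predicate-object triples and serializes them as N-Triples. The example.org namespace, the class names (Company, Share) and the properties (issuedBy, listedOn) are hypothetical and are not drawn from the Financial Times’ ontologies.

```python
# Hypothetical namespace and vocabulary, invented for this sketch.
EX = "http://example.org/ontology/"

# Standard RDF and RDFS terms.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def iri(term: str) -> str:
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{term}>"


def literal(text: str) -> str:
    """Quote a plain string literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


# Each statement is one (subject, predicate, object) triple.
TRIPLES = [
    (iri(EX + "company/acme"), iri(RDF_TYPE), iri(EX + "Company")),
    (iri(EX + "company/acme"), iri(RDFS_LABEL), literal("Acme Corp")),
    (iri(EX + "share/ACME"), iri(RDF_TYPE), iri(EX + "Share")),
    (iri(EX + "share/ACME"), iri(EX + "issuedBy"), iri(EX + "company/acme")),
    (iri(EX + "share/ACME"), iri(EX + "listedOn"), iri(EX + "exchange/XLON")),
]


def to_ntriples(triples) -> str:
    """Serialize triples in N-Triples form: one statement per line, ending in ' .'."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


if __name__ == "__main__":
    print(to_ntriples(TRIPLES))
```

Because every statement has the same three-part shape, systems that share the vocabulary can exchange and merge such data without agreeing on a custom record layout first.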
Syapse is looking for a Senior Semantic Server Engineer in Palo Alto, CA. According to the post, “We are looking for an experienced back end developer to join us in building out our server-side semantic data stack. At Syapse, semantic technologies (RDF, SPARQL, OWL) are our foundation for organizing and integrating biomedical, genomics, and clinical data. Building this foundation out involves a mix of applying innovative technologies, and building an industrial strength, scalable back end. You will be joining a team that includes some semantic technology veterans, but has plenty of room for fresh thinking and design. You will be working with a mix of open-source and proprietary software to construct the back end for our data platform and application suite.” Read more