Lars Hard of Beta News recently wrote, “Artificial intelligence (AI) has become a bit of a buzzword among technology professionals (and even within the mainstream public) but truthfully, most people do not know how it works or how it is already being integrated within leading enterprise businesses. AI for businesses is today mostly made up of machine learning, wherein algorithms are applied in order to teach systems to learn from data to automate and optimize processes and predict outcomes and gain insights. This simplifies, scales and even introduces new important processes and solutions for complex business problems as machine learning applications learn and improve over time. From medical diagnostics systems, search and recommendation engines, robotics, risk management systems, to security systems, in the future nearly everything connected to the internet will use a form of a machine learning algorithm in order to bring value.” Read more
Chloe Green of Information Age recently wrote, “Handling immense data sets requires a combination of scientific and technological skills to determine how data is stored, searched and accessed. In science, the importance of data scientists in ensuring that data is handled correctly from the outset is not underestimated; other industries can learn from the scientific approach. Text-mining tools and the use of relevant taxonomies are essential. If we think about big data as a huge number of data points in some multi-dimensional space, the problem is one of analysis, i.e. frequently finding very similar or very dissimilar points which cannot be compared. In life sciences, taxonomies assign data points a class, thus comparison of two points is as easy as looking up other data points in the same class.” Read more
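The class-lookup idea Green describes can be sketched in a few lines of Python. The taxonomy labels and data points below are invented for illustration; the point is that once each data point carries a class, finding similar points is a dictionary lookup rather than a pairwise comparison across the whole space:

```python
from collections import defaultdict

# Hypothetical taxonomy-tagged data points: (point_id, taxonomy_class).
points = [
    ("P1", "kinase"), ("P2", "kinase"),
    ("P3", "protease"), ("P4", "kinase"), ("P5", "protease"),
]

# Index points by class once; "similar" points then come from a
# lookup instead of distance computations against every other point.
by_class = defaultdict(list)
for point_id, taxon in points:
    by_class[taxon].append(point_id)

def similar_to(point_id, taxon):
    """Return the other points sharing the same taxonomy class."""
    return [p for p in by_class[taxon] if p != point_id]

print(similar_to("P1", "kinase"))  # → ['P2', 'P4']
```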
Seth Grimes, president and principal consultant of Alta Plana Corp. and founding chair of the Sentiment Analysis Symposium, has put together a thorough new report, Text Analytics 2014: User Perspectives on Solutions and Providers. Among the interesting findings of the report is that “growth in text analytics, as a vendor market category, has slackened, even while adoption of text analytics, as a technique, has continued to expand rapidly.”
Grimes explains that in a fragmented market, consisting of everything from text analytics services to solution-embedded technologies, the opportunities for users to practice text analytics are strong, but that increasingly text analytics is not the focal point of the solutions being leveraged.
Reflecting the diversity of options, respondents listed among their providers a number of open-source offerings such as Apache OpenNLP and GATE, API services such as AlchemyAPI and Semantria, and enterprise software solution and business suite providers like SAP. The word cloud above was generated by Alta Plana at Wordle.net to show how users responded to the question of companies they know provide text/content analytics functionality. Nearly 50 percent of users are likely to recommend their most important provider.
OpenText yesterday made its secure file sharing and synchronization product, Tempo Box, available for free to customers using its OpenText Content Suite enterprise information management tool.
“A lot of our customers have major concerns about employees sharing documents with cloud tools like Dropbox,” says Lubor Ptacek, VP of strategic marketing. Employees want their documents to be available, synced and sharable across all their devices, but using such services can create security and compliance problems. By deploying Tempo Box on top of their existing infrastructure, at no charge to all internal employees and any external parties they may need to share content with, companies get a seamless and cost-effective way to share files in the cloud without compromising security, records management requirements or storage optimization, he says – “the things that enterprise customers care about, especially those operating in regulated environments.”
Among those capabilities is automatic content classification, which is usually required for records management reasons – for example, helping companies determine whether a document is an employee record they must keep for five years or a tax record they have to hold for seven. That under-the-hood classification engine is an outgrowth of OpenText’s acquisition a few years back of text mining, analytics and search company Nstein. Since the acquisition, says Ptacek, the company has been looking for ways to apply the technology to specific business problems and make it part of its applications.
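The retention example can be made concrete with a minimal Python sketch of rule-based retention classification. The keyword lists and retention periods here are illustrative only and stand in for a real engine like the Nstein-derived one described above:

```python
# Hypothetical retention rules: document class -> years to retain.
RETENTION_YEARS = {"employee_record": 5, "tax_record": 7}

# Toy keyword-based classifier standing in for a real
# content-classification engine.
KEYWORDS = {
    "employee_record": {"salary", "performance review", "hire date"},
    "tax_record": {"w-2", "deduction", "irs"},
}

def classify(text):
    """Assign a document class based on which keywords appear."""
    text = text.lower()
    for doc_class, terms in KEYWORDS.items():
        if any(term in text for term in terms):
            return doc_class
    return "unclassified"

def retention_years(text):
    """Look up how long a document must be retained."""
    return RETENTION_YEARS.get(classify(text), 0)

print(retention_years("Attached is the IRS deduction summary."))  # → 7
```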
Data scientists can add another tool to their toolset today: GraphLab has launched GraphLab Create 1.0, which bundles everything from data cleaning and engineering tools to state-of-the-art machine learning and predictive analytics capabilities.
Think of it, company execs say, as the single platform that data scientists or engineers can leverage to unleash their creativity in building new data products, enabling them to write code at scale on their own laptops. The driving concept behind the solution, they say, is to make large-scale machine learning and predictive analytics easy enough that companies won’t have to hire huge teams of data scientists and engineers and build the big hardware infrastructures that lie behind many of today’s Big Data-intensive products. And the data scientists and engineers who do use it won’t need to be experts in machine-learning algorithms – just experienced enough to write Python code.
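The clean-then-model pipeline that GraphLab Create bundles can be illustrated with a standard-library-only Python sketch. The messy data and the trivial through-the-origin least-squares model below are invented for illustration and are not GraphLab Create’s actual API:

```python
# Sketch of a clean -> train -> predict pipeline; data is illustrative.
raw = [" 1.0, 2.0 ", "2.0, 4.1", "bad row", "3.0, 6.0"]

def clean(rows):
    """Parse rows into (x, y) pairs, silently dropping malformed ones."""
    out = []
    for row in rows:
        try:
            x, y = (float(v) for v in row.split(","))
            out.append((x, y))
        except ValueError:
            continue  # drop rows that don't parse
    return out

def fit_slope(samples):
    """Least-squares slope through the origin: sum(x*y) / sum(x*x)."""
    return sum(x * y for x, y in samples) / sum(x * x for x, _ in samples)

data = clean(raw)
slope = fit_slope(data)
print(round(slope, 2))  # → 2.01
```

A real GraphLab Create workflow wraps these steps (plus distributed execution) behind library calls, which is the ease-of-use argument the company is making.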
Versium Leverages Microsoft Azure Machine Learning For New Predictive GivingScore Solution To Improve Fundraising
Versium, which earlier this year launched its Predictive FraudScore solution (covered here), today releases its Predictive GivingScore solution, designed to help charitable institutions and political organizations better predict who is likely to donate, be a repeat donor, or make a more significant contribution. Predictive GivingScore is the latest of the company’s predictive Score products, which also include churn, social influencer and shopper scoring – and it’s by no means the last.
It was built with Microsoft Azure Machine Learning, a managed cloud service for building predictive analytics solutions that was publicly unveiled just a short time ago. CEO Chris Matty says the platform helps Versium rapidly build its new score solutions. (Just shy of ten Versium scoring products are currently in use or in development.) Azure ML, Matty notes, contains dozens of machine learning algorithms and mathematical computation models that Versium leverages to easily and effectively experiment with, create and tune models to achieve the highest accuracy in its predictive scoring solutions.
“Once we have a score built it just takes little tuning. But when we are building a new score we need to look at some different models and see what works better,” he says. “We want to move quickly by evaluating the different models, and we can visualize very easily the process of building the predictive model.”
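Matty’s evaluate-several-models-and-keep-the-best workflow can be sketched in plain Python. The toy threshold “models” below are illustrative stand-ins, not Azure ML algorithms:

```python
# Train several candidate models on the same data and keep the most
# accurate one. Data and models are invented for illustration.
data = [(1, 0), (2, 0), (3, 0), (4, 1), (5, 1), (6, 1)]  # (feature, label)

def make_threshold_model(threshold):
    """A trivial classifier: predict 1 when the feature >= threshold."""
    return lambda x: 1 if x >= threshold else 0

def accuracy(model, samples):
    """Fraction of samples the model labels correctly."""
    return sum(model(x) == y for x, y in samples) / len(samples)

candidates = {t: make_threshold_model(t) for t in (2, 4, 6)}
scores = {t: accuracy(m, data) for t, m in candidates.items()}
best = max(scores, key=scores.get)
print(best, scores[best])  # → 4 1.0
```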
Daedalus (which The Semantic Web Blog originally covered here) has just made its Textalytics meaning-as-a-service APIs available for Excel and GATE (General Architecture for Text Engineering), a Java suite of tools used for natural language processing tasks, including information extraction in many languages. Connecting its semantic analysis tools with these systems is one step in a larger plan to extend its integration capabilities with more API plug-ins.
“For us, integration options are a way to lower barriers to adoption and to foster the development of an ecosystem around Textalytics,” says Antonio Matarranz, who leads marketing and sales for Daedalus. The three main ecosystem scenarios, he says, include personal productivity tools, of which the Excel add-in is an example, and NLP environments, of which GATE is an example. “But UIMA (Unstructured Information Management Applications) is also a target,” he says. The list also is slated to include content management systems and search engines, among them open source systems like WordPress, Drupal, and Elasticsearch.
James Kobielus of InfoWorld recently wrote, “Machine-generated log data is the dark matter of the big data cosmos. It is generated at every layer, node, and component within distributed information technology ecosystems, including smartphones and Internet-of-things endpoints… Clearly, automation is key to finding insights within log data, especially as it all scales into big data territory. Automation can ensure that data collection, analytical processing, and rule- and event-driven responses to what the data reveals are executed as rapidly as the data flows. Key enablers for scalable log-analysis automation include machine-data integration middleware, business rules management systems, semantic analysis, stream computing platforms, and machine-learning algorithms.” Read more
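The rule- and event-driven responses Kobielus describes can be sketched as a small Python rule table; the log patterns and response actions below are invented for illustration:

```python
import re

# Each rule pairs a log pattern with a response action, so matching and
# reaction keep pace with the log stream instead of manual review.
RULES = [
    (re.compile(r"ERROR.*disk"), "page_ops"),
    (re.compile(r"WARN.*latency"), "log_metric"),
]

def respond(line):
    """Return the action for the first rule matching a log line."""
    for pattern, action in RULES:
        if pattern.search(line):
            return action
    return "ignore"

log = [
    "INFO request served in 12ms",
    "WARN latency above threshold on node-3",
    "ERROR disk full on node-7",
]
print([respond(line) for line in log])  # → ['ignore', 'log_metric', 'page_ops']
```

Production systems replace the rule list with a rules management system and the loop with a stream-computing platform, but the shape of the automation is the same.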
Nancy Gohring of Computerworld recently wrote, “The market for connected devices like fitness wearables, smart watches and smart glasses, not to mention remote sensing devices that track the health of equipment, is expected to soar in the coming years. By 2020, Gartner expects, 26 billion units will make up the Internet of Things, and that excludes PCs, tablets and smartphones. With so many sensors collecting data about equipment status, environmental conditions and human activities, companies are growing rich with information. The question becomes: What to do with it all? How to process it most effectively and use it in the smartest way possible?” Read more
So much for a new take on an old joke. The real answer is about six figures, according to a report released earlier this spring: the Burtch Works Executive Recruiting survey, Salaries of Data Scientists. The median base salary of data scientist managers is $160,000, it says, while individual contributors average about $120,000. The figures come from 171 data scientists for whom the recruiting firm has complete and current information. Whether at a lower or higher job level, data scientists across the board are doing better financially than other Big Data professionals, the report shows.