The Semantic Web in Industry Today

by Nick Berente
Disclaimer: This web site is intended to be a broad introduction to concepts associated with the semantic web and its current application. It is a result of my own investigation into these concepts, and because I am a non-technical person, it is limited by my understanding.


The World Wide Web enables people to share and publish information, largely in the form of HTML pages like this one. This data is syntactic. That is, the content of the web consists of raw symbols (syntax) that require a human to interpret.

In the future, many believe that data will be effectively contextualized to enable computers to interpret meaning (semantics) accurately. When computers can understand the information within the web, a great deal of additional power is expected to be unleashed from the Internet. This vision of the Internet, where the content is understandable by computers and unequivocally understood by humans, is known as "the semantic web."

The technologies underlying the semantic web enable people to associate meaning with the information they put in web pages. This is done through RDF (Resource Description Framework) statements that link data to its meaning. Meaning is identified by URIs, and can be interpreted consistently across groups through shared ontologies, typically written in OWL (the Web Ontology Language). See the “in-depth links” to the left for more detailed explanations.
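
As a rough illustration of how linking data to URIs lets a program follow meaning, here is a minimal sketch in plain Python. It models RDF-style statements as (subject, predicate, object) triples. The two FOAF property URIs are real vocabulary terms; the example.org people are hypothetical, and a real application would use an RDF library rather than a Python list.

```python
# Minimal sketch: RDF-style statements as (subject, predicate, object) triples.
# The FOAF property URIs are real vocabulary terms; the example.org URIs are
# hypothetical placeholders invented for this illustration.

FOAF = "http://xmlns.com/foaf/0.1/"

triples = [
    ("http://example.org/people/alice", FOAF + "name", "Alice"),
    ("http://example.org/people/bob", FOAF + "name", "Bob"),
    ("http://example.org/people/alice", FOAF + "knows", "http://example.org/people/bob"),
]


def objects_of(subject, predicate):
    """Return every object stated for the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def names_of_acquaintances(person):
    """Follow foaf:knows links, then look up each acquaintance's foaf:name."""
    names = []
    for friend in objects_of(person, FOAF + "knows"):
        names.extend(objects_of(friend, FOAF + "name"))
    return names


print(names_of_acquaintances("http://example.org/people/alice"))
```

Because "knows" and "name" are identified by shared URIs rather than by a column name private to one application, any program that recognizes those URIs can interpret the statements in the same way.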

Most people agree that the realization of a broad semantic web is still quite a way off in the future, if it ever arrives (see “Metacrap” to the left). One much-discussed area where semantic web principles are present today is social networking applications such as “friend of a friend” (FOAF). Similar networking implementations exist in other forms, such as localized social networking, business networking, and all-purpose networking groups.

There is an often overlooked area, however, where organizations are beginning to apply the principles of the semantic web in what looks to be a promising direction: business data integration.

Business Data Integration

The documented information and knowledge of organizations typically lie in the relational databases of specific applications, in text-based documents, in web-based collaboration applications, and so on. These programs typically do not freely interchange data within an organization, let alone between organizations. The current solution to this problem is enterprise application integration (EAI) software. EAI solutions tend to require significant customization and maintenance of common data models, and are therefore often expensive, troublesome, and in need of constant attention.

An initial application for semantic web technologies looks to be an alternative method for extracting data from a wide array of applications in a usable form. Below are five industry examples of pioneering vendors that use semantic web principles in business data integration applications: Autonomy, Celcorp, Network Inference, Stratify, and Verity.

Autonomy created the IDOL server, which uses advanced pattern mapping techniques to associate meaning with unstructured text data. The engine is based on non-linear adaptive signal processing and is "rooted in theories of Bayesian Inference and Claude Shannon's Principles of Information." NASA uses the IDOL server to help the software engineers who are developing earth and star observing platforms sort through huge quantities of data in a personalized fashion.

Celcorp’s products use a recorder to capture task information and store this information as models in a knowledge base. A reasoning engine and agent technology are employed to interpret this knowledge base and present data accordingly. CIT’s commercial services unit is an example of a successful implementation of Celcorp’s technology. CIT has reduced the time it takes to settle invoice disputes from over 8 days to under 1 day by using Celcorp software to access invoice information within internal systems and across multiple external systems.

Network Inference has developed an inference engine that adaptively interprets data of multiple forms, and mediates conflicts of meaning using standard ontologies. Based on W3C standards, Network Inference addresses a wide array of applications where a semantic web solution is superior to traditional systems that store data in “non-reusable software algorithms.” Network Inference offers an example of an electronic components manufacturer that took its quarterly Wall Street reporting cycle from six weeks to one day.

Stratify develops software that automatically categorizes and classifies data from within the applications themselves using APIs. Classifications are generated from the taxonomy server, and standard taxonomies are the means by which users effectively access data and documents from multiple applications. Dialog, an on-line news service, has purchased the Stratify taxonomy server to classify real-time news data and documents for its clients.

A publicly traded business data integration firm that uses semantic web technology, Verity is a $100+ million provider of knowledge management solutions. Within its suite of products are tools for taxonomy management, automatic classification of data, and data extraction from a variety of applications. One unique application of its taxonomy management software addresses the research needs of a large multi-national law firm. In addition to other searching enhancements, lawyers from around the world will be able to search for information across multi-language data sources using a uniform taxonomy.

The organizations listed above are not meant to represent an exhaustive list, or even an accurate cross section. Rather, they were chosen because each company offered an example of its application being used in industry, and each of the five companies describes its mechanism for structuring unstructured data differently. Below are some additional firms that are doing data integration based on principles of the semantic web (again, not an exhaustive list):

Brandsoft - Resource Manager uses semantic web technology to manage enterprise web content and applications

ClearForest - Text-based bridge between structured and unstructured data

Cogito - knowledge management solution that accesses data from various databases then 'atomizes' it, automatically creating documents, etc.

Contivo - Vocabulary Management Solution uses dictionary and thesaurus in its integrator to access semantic information from legacy and flat file data

Cyscom - semantic data integration engine that structures MS Office data for ERP and other business applications

Empolis - solutions for rationalizing business processes and processing both structured and unstructured information

Enigmatec - Execution Management System is targeted at companies who want to build agile applications that take advantage of grid computing

HP - HP has a semantic web research group that, among other things, developed 'Jena' - a semantic web toolkit

IBM - IBM's Institute of Search and Text Analysis, among other things, developed an "Unstructured Information Management Architecture"

Metatomix - leverages semantic web based technologies to build enterprise resource interoperability platforms that correlate data from multiple sources

Pantero - engine that uses metadata to model data exchange across service oriented architectures (SOAs)

Semagix - Freedom architecture at the core of semantic web based solutions for content & knowledge management, homeland security, and anti-money laundering

Semaview - semantic web based calendaring (excellent semantic web intro white paper)

TopQuadrant - offers semantic web based consulting services working off of 'capability cases,' or best practices for specific issues

Tucana - semantic web suite of products and services targeted at enterprise information integration

UB Access - Semantic Web Accessibility Platform enables companies to make web content accessible through "non-invasive" technologies

Unicorn - consulting-based product that creates a custom information model and maps existing data to that model

Based on this cursory analysis, it appears that there are two main types of data integration applications for semantic web based technologies. The first is to access unstructured data and put meaning around it, giving it structure. The second is to take existing forms of structured data and map them to each other, essentially bridging their structures. Of course, some companies look to do both of these.
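
The second type, bridging existing structures, can be pictured with a small sketch in plain Python. Everything here is an assumption made for illustration: the two source systems, their field names, and the shared vocabulary are hypothetical, and none of the vendors above is known to work exactly this way.

```python
# Sketch: two systems describe the same invoice with different field names.
# Mapping both onto one shared vocabulary makes their records comparable.
# All field names and the shared vocabulary are hypothetical.

CRM_TO_SHARED = {"cust_name": "customerName", "inv_no": "invoiceNumber", "amt": "amountDue"}
ERP_TO_SHARED = {"Customer": "customerName", "InvoiceID": "invoiceNumber", "Balance": "amountDue"}


def to_shared(record, mapping):
    """Rename a record's fields to the shared vocabulary, dropping unmapped fields."""
    return {mapping[field]: value for field, value in record.items() if field in mapping}


crm_record = {"cust_name": "Acme Ltd", "inv_no": "A-17", "amt": 250.0, "rep": "jd"}
erp_record = {"Customer": "Acme Ltd", "InvoiceID": "A-17", "Balance": 250.0}

# Once translated, the two records can be compared field by field.
print(to_shared(crm_record, CRM_TO_SHARED) == to_shared(erp_record, ERP_TO_SHARED))
```

The design point is that each application needs only one mapping to the shared vocabulary, rather than a separate mapping to every other application.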

Two organizations that are encouraging certain semantic-based standards are XBRL and UDDI.


The eXtensible Business Reporting Language (XBRL) was created by an international group of hundreds of organizations with the purpose of standardizing business reporting, such as financial statement extraction. The organization has developed a taxonomy for business data, and will work with companies to implement XBRL tags within their data so that uniform, standard, and easily-compared reporting data can be quickly generated by member organizations. Companies currently reporting using XBRL include Microsoft, Edgar Online, Reuters, and TSX Group (Canada).

The Universal Description, Discovery and Integration (UDDI) protocol was created by a consortium of enterprise software vendors, and is intended to enable dynamic interactions between enterprise applications through web services defined by a standard taxonomy rather than static APIs. An application of a UDDI registry would be to offer partnering businesses a web-based “service broker” linked with the ERP systems of both parties and authenticated using XML Digital Signatures. Developers can register their products as UDDI compliant through IBM, Microsoft, or SAP.
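
The idea of finding a service by its taxonomy category, instead of hard-coding a connection to one partner, can be sketched with a toy in-memory registry. This is not the UDDI API. The class names, categories, providers, and URLs below are all hypothetical and exist only to show the lookup pattern.

```python
# Toy registry sketch: services are published under a taxonomy category and
# discovered by that category at run time. This is NOT the UDDI API; every
# name, category and URL here is a hypothetical placeholder.
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceEntry:
    provider: str
    category: str  # taxonomy category the service is classified under
    endpoint: str


class Registry:
    def __init__(self):
        self._entries = []

    def publish(self, entry):
        """Add a service description to the registry."""
        self._entries.append(entry)

    def find(self, category):
        """Return all services classified under the given category."""
        return [e for e in self._entries if e.category == category]


registry = Registry()
registry.publish(ServiceEntry("Supplier A", "order-status", "https://supplier-a.example/orders"))
registry.publish(ServiceEntry("Supplier B", "order-status", "https://supplier-b.example/orders"))
registry.publish(ServiceEntry("Supplier A", "invoicing", "https://supplier-a.example/invoices"))

for entry in registry.find("order-status"):
    print(entry.provider, entry.endpoint)
```

A new partner that publishes an "order-status" service becomes discoverable without any change to the code that performs the lookup.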

Research Agenda

To study the implementation of semantic web principles in business data integration, a researcher might take a number of approaches. One approach is to fully understand the paradigms, technical foundations, and standards of the competing products, then follow the industry as “dominant designs” evolve. A second approach might be to understand the social ramifications of technology that can potentially enable more flexible access to information at broader levels of cross-application and cross-organization transparency.

