Mark Gould wrote a nice overview of Calais. Because this was an introduction to Calais for a new audience oriented toward brand and marketing, I thought it was worthwhile to respond with a basic overview of what Calais is about and why we’re doing it. Given that the response ended up being fairly lengthy, I thought I’d share it here as well. It covers some general thoughts on the Semantic Web vs. the semantic stack, barriers to adoption, getting to critical mass, and reality vs. philosophy.

First, thanks for taking note of Calais. We’re still deep in the learning curve and the more that different people with different needs think about it, try it out and give us feedback the better.

If you’re just starting to look into this area, a word of warning. It’s very important to distinguish between the vision of the Semantic Web and the stack, the defined set of standards, that will enable the Semantic Web. In my view the Semantic Web is an aspiration made up of 1) use of the semantic stack and 2) a critical mass of adoption across the web. While we’re seeing many instances of adoption of the technologies, we have a long way to go before we reach critical mass.

So, how do we move toward critical mass? What Calais is trying to do is address what we see as the central rate-limiting factor for adoption: the generation of high-quality semantic metadata for unstructured content such as news, reports, novels, whatever. While the standards for how to represent this metadata are well defined, we’re still left with one simple issue: producing the metadata takes time and costs money. Given that the “semantic consumer” end of the story is still relatively undeveloped, few writers and publishers can afford to invest that time and money.
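To make "semantic metadata" a little more concrete, here is a minimal sketch of what a few machine-readable statements about a news article could look like when written out as RDF triples in the N-Triples format. This is not Calais output and not the Calais API. The `example.org` namespace, the `Entity` class, the `mentions` property, and the `to_ntriples` function are all hypothetical names made up for illustration; only the `rdf:type` and `rdfs:label` identifiers come from the W3C standards.

```python
from dataclasses import dataclass

# Standard W3C identifiers for "is a" and "human-readable label".
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Hypothetical namespace used only for this illustration.
EX = "http://example.org/"


@dataclass(frozen=True)
class Entity:
    """A thing mentioned in a document (hypothetical structure)."""
    slug: str   # URL-safe identifier, e.g. "acme-corp"
    kind: str   # e.g. "Company" or "Person"
    label: str  # the name as it appears in the text


def escape_literal(text: str) -> str:
    """Escape backslashes, quotes, and line breaks for an N-Triples literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def to_ntriples(doc_slug: str, entities: list[Entity]) -> str:
    """Describe each entity and link it to the document that mentions it."""
    doc = f"<{EX}doc/{doc_slug}>"
    lines = []
    for entity in entities:
        subject = f"<{EX}entity/{entity.slug}>"
        lines.append(f"{subject} <{RDF_TYPE}> <{EX}type/{entity.kind}> .")
        lines.append(f'{subject} <{RDFS_LABEL}> "{escape_literal(entity.label)}" .')
        lines.append(f"{doc} <{EX}mentions> {subject} .")
    return "\n".join(lines)


if __name__ == "__main__":
    found = [
        Entity("acme-corp", "Company", "Acme Corp"),
        Entity("jane-doe", "Person", "Jane Doe"),
    ]
    print(to_ntriples("article-1", found))
```

Writing the triples out is the easy part, as the sketch shows. The time and money go into deciding, for every article, which entities and relationships belong in a list like `found`.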

Calais doesn’t solve this problem, but it does throw some fuel on the fire. By automating the generation of semantic metadata with a very high degree of accuracy, we hope to jump-start the adoption curve. If there’s lots of semantic content out there, people will build great semantically enabled applications. If there are great applications, people will invest in semantically enabled content.

The best way to take it for an initial spin is with the Calais viewer application. Copy a news article or similar text, paste it in and see how we do. In general you’ll see better results with the viewer than with the proxy, because the proxy has additional work to do, such as cleaning HTML pages. This work can create noise that reduces accuracy.

One last point. You don’t have to believe in or even agree with all of the philosophy around the Semantic Web to take advantage of it. There is a well-defined set of standards, from RDF to SPARQL, and capabilities such as Calais that can add value to what you’re doing today. Grab a piece of that stack and make something cool happen.


