2006-07-21

Semantic Web - Reinventing the Wheel

Since computer scientists don't bother with history and can't stomach looking through all the stuff that has been published in the field over the last 50 years or so, we constantly have situations where people are actually reinventing the wheel, so to speak. Some examples:

  • Parsing was dead as a doornail. Compiler construction was being removed from curricula all over the world. Then XML shows up as a great way to transfer data between applications because the data is tagged to suggest semantics. I am greatly amused at the feeble efforts some people make to parse these rather well-formed terms. People tend to be very surprised that there is actually mathematics behind this (going back to the '50s!) that tells us how to parse, for example, things that can be expressed with a context-free grammar.
  • We've known how to set up good file systems with Unix for years. But Windows keeps making up new systems that are broken. Ditto for graphical systems, databases, operating systems, you name it. People just seem to start from scratch without looking at best practices.
  • The Semantic Web takes the cake, in my opinion. I am violently in disagreement with the opinion that one can set up a hierarchy of key words ("Ontology") for describing all of something. I have worked too long in library systems to believe this. If only people would look at how libraries look for materials, we would all be better off.
    Then they go about trying to "match" things. They use extremely primitive methods of matching, often using string matchers. There is an entire body of knowledge on the topic of string matching alone, of which many authors writing about the Semantic Web seem to be entirely ignorant. They completely ignore most of the work done in mechanical verification on term rewriting (a great book is "Term Rewriting and All That" by Franz Baader and Tobias Nipkow) and unification, both great mathematical tools for finding things that would be the same if we could force this bit to match that bit.
    Yes, I know. There is a lot of scary mathematics in there. Live with it!
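
To show that the mathematics is less scary than it sounds, here is a minimal sketch of first-order unification in Python. The term representation is my own assumption for illustration: variables are strings starting with "?", constants are other strings, and compound terms are tuples whose first element is the function symbol. The names is_var, walk, occurs, unify and resolve are hypothetical helpers, not taken from any library or from the book mentioned above.

```python
# Minimal first-order unification sketch (illustrative assumptions only).
# Variables: strings starting with "?". Constants: any other string.
# Compound terms: tuples of the form (functor, arg1, arg2, ...).

def is_var(t):
    return isinstance(t, str) and t.startswith("?")

def walk(t, subst):
    # Follow variable bindings until we reach an unbound variable or a non-variable.
    while is_var(t) and t in subst:
        t = subst[t]
    return t

def occurs(v, t, subst):
    # Occurs check: does variable v appear inside term t under subst?
    t = walk(t, subst)
    if t == v:
        return True
    if isinstance(t, tuple):
        return any(occurs(v, arg, subst) for arg in t[1:])
    return False

def unify(a, b, subst=None):
    # Return a substitution (dict) that makes a and b equal, or None if there is none.
    subst = {} if subst is None else subst
    a, b = walk(a, subst), walk(b, subst)
    if a == b:
        return subst
    if is_var(a):
        return None if occurs(a, b, subst) else {**subst, a: b}
    if is_var(b):
        return None if occurs(b, a, subst) else {**subst, b: a}
    if (isinstance(a, tuple) and isinstance(b, tuple)
            and a[0] == b[0] and len(a) == len(b)):
        for x, y in zip(a[1:], b[1:]):
            subst = unify(x, y, subst)
            if subst is None:
                return None
        return subst
    return None  # Clash: different constants, functors, or arities.

def resolve(t, subst):
    # Apply the substitution to a term, replacing every bound variable.
    t = walk(t, subst)
    if isinstance(t, tuple):
        return (t[0],) + tuple(resolve(arg, subst) for arg in t[1:])
    return t

if __name__ == "__main__":
    left = ("book", "?author", "Term Rewriting")
    right = ("book", "Nipkow", "?title")
    s = unify(left, right)
    print(s)
    print(resolve(left, s))
```

Two descriptions with different holes in them are forced to match by filling each hole from the other side, which is exactly the "force this bit to match that bit" idea. A plain string matcher would simply report that the two descriptions differ.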
How can we get our colleagues to really look at what has already been done before they start doing stuff? We seem to need a web of all the materials already published in computing, and then a semantic web to help us find stuff...
