What do all these newly unveiled search engines, Powerset, Yahoo! Correlator and Wolfram Alpha, have in common? Well, aside from touting themselves as Google-killer semantic search engines, they all rely on Wikipedia for most (if not all) of their information. The Semantic Web is a hard problem, mostly because it hasn't caught on with everyone yet and most of the data out in the wild is not annotated, that is, not structured in an easy-to-use format. To save the day, screen scraping and Natural Language Processing come to the rescue. Wikipedia, on the other hand, is a huge resource, and with "infoboxes" and other templates in articles (listing things like notable natives of cities, geodata, categories, etc.) it is a relatively easy target to mine (its openness is another advantage). This is the reason for the proliferation of web sites that try to make use of Wikipedia in a more machine-learning-oriented way.
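
Just to make the "easy target to mine" point concrete, here is a rough sketch of pulling an infobox out of an article through the MediaWiki API. The page title and the naive brace-counting parser are my own illustrative choices, not a robust template parser.

```python
# Sketch: mining a Wikipedia infobox via the public MediaWiki API.
# The page title and the brace-counting extraction are illustrative only;
# real templates can nest in ways this does not handle.
import requests

API = "https://en.wikipedia.org/w/api.php"

def fetch_wikitext(title):
    """Fetch the raw wikitext of an article through the API."""
    params = {"action": "parse", "page": title,
              "prop": "wikitext", "format": "json"}
    resp = requests.get(API, params=params)
    resp.raise_for_status()
    return resp.json()["parse"]["wikitext"]["*"]

def extract_infobox(wikitext):
    """Return the first {{Infobox ...}} block, using naive brace counting."""
    start = wikitext.find("{{Infobox")
    if start == -1:
        return None
    depth, i = 0, start
    while i < len(wikitext) - 1:
        if wikitext[i:i + 2] == "{{":
            depth += 1
            i += 2
        elif wikitext[i:i + 2] == "}}":
            depth -= 1
            i += 2
            if depth == 0:
                return wikitext[start:i]
        else:
            i += 1
    return None

if __name__ == "__main__":
    print(extract_infobox(fetch_wikitext("Athens")))
```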

Apart from all the buzz around commercial applications like Powerset et al., there is an open-source project called DBpedia which has been really successful in providing the infrastructure needed for such applications. If you want to learn more about DBpedia and its promises, I suggest you watch their excellent presentation, made just a month ago and provided by videolectures. I am really excited about all the stuff that can be done with this kind of leverage. I am already thinking about using DBpedia in my current project. It seems that the Semantic Web and screen scraping & NLP are converging. I wonder how things will go in the next few years.
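
To give a flavour of the leverage I mean, here is a quick sketch of asking DBpedia's public SPARQL endpoint for people born in Athens. The endpoint URL is the real one; the specific query and the Python result handling are just my illustration, not part of any official client.

```python
# Sketch: querying the public DBpedia SPARQL endpoint.
# The example query (people born in Athens) is illustrative only.
import requests

ENDPOINT = "http://dbpedia.org/sparql"

QUERY = """
PREFIX dbo:  <http://dbpedia.org/ontology/>
PREFIX dbr:  <http://dbpedia.org/resource/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?person ?name WHERE {
  ?person dbo:birthPlace dbr:Athens ;
          rdfs:label ?name .
  FILTER (lang(?name) = "en")
}
LIMIT 10
"""

def run_query(query):
    """Send a SPARQL query and return the JSON result bindings."""
    resp = requests.get(
        ENDPOINT,
        params={"query": query},
        headers={"Accept": "application/sparql-results+json"},
    )
    resp.raise_for_status()
    return resp.json()["results"]["bindings"]

if __name__ == "__main__":
    for row in run_query(QUERY):
        print(row["name"]["value"], "->", row["person"]["value"])
```

The nice part is that all of this structure was mined from Wikipedia infoboxes in the first place, which is exactly the convergence of screen scraping and the Semantic Web mentioned above.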