How To Find Semantic Links On A Website

By Mads Kristensen
Article Date: 2008-07-16

Imagine a visitor who enters his website URL into a textbox and clicks the submit button, and you are then able to retrieve all kinds of information about him.

His name, company info, online profiles, interests and so on, all from just a URL. It's actually pretty easy if the website links to FOAF, APML or SIOC documents.

What you have to do is download the HTML from the website and look for <link> elements in the <head> section that match FOAF, APML or SIOC type links. Then retrieve the URL of each document from the href attribute and load it into an XML document. Now you can use XPath to find all the information you need.
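The last two steps, loading the document and querying it with XPath, could look roughly like this. This is a minimal sketch in Python rather than the author's code: the names download and read_foaf_person and the sample document are assumptions made for illustration, and it relies on the limited XPath subset of the standard xml.etree.ElementTree module.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Namespace prefixes used in the XPath expressions below.
NAMESPACES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}

# Hypothetical FOAF document; a real one would come from the <link> href.
SAMPLE_FOAF = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person>
    <foaf:name>Jane Doe</foaf:name>
    <foaf:homepage rdf:resource="http://example.com/"/>
    <foaf:interest rdf:resource="http://example.com/topics/semantic-web"/>
  </foaf:Person>
</rdf:RDF>"""


def download(url):
    """Fetch a document over HTTP and return its raw bytes."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read()


def read_foaf_person(xml_text):
    """Pull a few fields from the first foaf:Person in a FOAF document."""
    root = ET.fromstring(xml_text)
    person = root.find("foaf:Person", NAMESPACES)
    if person is None:
        return None
    # rdf:resource is a namespaced attribute, so it needs the {uri}name form.
    resource = "{%s}resource" % NAMESPACES["rdf"]
    homepage = person.find("foaf:homepage", NAMESPACES)
    return {
        "name": person.findtext("foaf:name", default=None, namespaces=NAMESPACES),
        "homepage": homepage.get(resource) if homepage is not None else None,
        "interests": [el.get(resource)
                      for el in person.findall("foaf:interest", NAMESPACES)],
    }


person = read_foaf_person(SAMPLE_FOAF)
```

With a live site you would pass the result of download(url) to read_foaf_person instead of the sample string. Real FOAF files vary in structure, so treat the paths above as a starting point.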

Here is what a FOAF link element looks like:

<link type="application/rdf+xml" rel="meta" title="FOAF" href="http://example.com/foaf.xml" />

SIOC and APML links use the same attributes in the same way, so we can use the title attribute to figure out which kind of document it is. All we need is a method that uses regular expressions to retrieve the document URLs from the HTML.

The code

This is a method that finds all the semantic links of a certain type in an HTML string.
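A minimal sketch of such a method, written in Python rather than taken from the author. The function name find_semantic_links and both regular expressions are assumptions. It compares the title attribute case-insensitively and does not depend on the order of the attributes inside the tag.

```python
import re

# Matches each <link ...> tag, whatever order its attributes come in.
LINK_TAG = re.compile(r"<link\b[^>]*>", re.IGNORECASE)
# Matches name="value" or name='value' pairs inside a tag.
ATTRIBUTE = re.compile(r"""([\w:-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')""")


def find_semantic_links(html, kind):
    """Return the href of every <link> whose title equals `kind`, e.g. "foaf"."""
    urls = []
    for tag in LINK_TAG.findall(html):
        attrs = {}
        for name, double_quoted, single_quoted in ATTRIBUTE.findall(tag):
            # Only one of the two quoted groups is non-empty for a given match.
            attrs[name.lower()] = double_quoted or single_quoted
        if attrs.get("title", "").lower() == kind.lower() and attrs.get("href"):
            urls.append(attrs["href"])
    return urls
```

Regular expressions are enough for well-behaved markup like the link element shown earlier, but they will miss unquoted attribute values and tags that contain a ">" inside an attribute. A real HTML parser is the safer choice for arbitrary pages.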



Example

To find all the FOAF links in a page you can write something like this:
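A call could look roughly like the sketch below. It is Python, not the author's code. The helper find_semantic_links is a hypothetical, condensed version of the link-finding method, included only so the snippet runs on its own, and the page markup is made up.

```python
import re


def find_semantic_links(html, kind):
    # Hypothetical stand-in for the link-finding method: keep the <link> tags
    # whose title equals `kind` (ignoring case) and return their href values.
    urls = []
    for tag in re.findall(r"<link\b[^>]*>", html, re.IGNORECASE):
        title = re.search(r"""\btitle\s*=\s*["']([^"']*)["']""", tag, re.IGNORECASE)
        href = re.search(r"""\bhref\s*=\s*["']([^"']*)["']""", tag, re.IGNORECASE)
        if title and href and title.group(1).lower() == kind.lower():
            urls.append(href.group(1))
    return urls


# Made-up page markup; in practice this would be the downloaded HTML.
html = """<html><head>
<link type="application/rdf+xml" rel="meta" title="FOAF" href="http://example.com/foaf.xml" />
</head><body></body></html>"""

foaf_urls = find_semantic_links(html, "foaf")
for url in foaf_urls:
    print(url)  # prints http://example.com/foaf.xml
```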



If you want to search for APML or SIOC, just replace "foaf" with either "apml" or "sioc" in the method parameter. You might also want to take a look at my experimental FOAF parser class.


About the Author:
Mads Kristensen currently works as a Senior Developer at Traceworks in Copenhagen, Denmark. Mads graduated from Copenhagen Technical Academy with a multimedia degree in 2003, but has been a professional developer since 2000. His main focus is ASP.NET, but he is also responsible for Winforms, Windows services and web services in his daily work. A true .NET developer with great passion for the simple solution.

http://www.madskristensen.dk/





TheDevWeb is an iEntry, Inc. ® publication - 1998-2008 All Rights Reserved