RSS Content Syndication
RSS is a number of things to a number of different communities. RSS is an XML vocabulary for describing a Web site that happens to be ideal for lightweight content syndication. Today, RSS is one of the most widely used Web site XML applications.
Its popularity and wide use have uncovered utility in many more scenarios than its creators originally anticipated. RSS can therefore also be thought of as a portal content language, as a metadata syndication framework, and even as a content syndication system.
You can see the model for RSS in Figure 13.5. Content providers publish RSS files that contain pointers to their HTML pages. These pointers are aggregated and then made available to a larger audience through the aggregator portal.
History of RSS
RSS was originally introduced in 1999 by Netscape as a channel description framework for its My Netscape Network (MNN) portal (http://dmoz.org/Netscape/My_Netscape_Network/). RSS is simply an XML application that provides a novel content-gathering mechanism that’s beneficial to Netscape, those providing content, and those using the content on the Web. RSS enables content gathering by providing a simple “snapshot in a document” for Web sites. This document enables Web sites to acquire an audience through the presence of their content on the My Netscape portal. Also, RSS gives users a centralized location into which content from their favorite Web sites flows to enable a one-stop reading experience.
As a result of My Netscape Network, users soon found that RSS could be used as an XML-based lightweight syndication format for headlines. Using RSS, headlines could be taken outside the My Netscape Network site and used in other RSS-based portals. Examples such as xmlTree (http://www.xmltree.com) began to cater to general subject markets and to specialized vertical markets as well. RSS gained grassroots acceptance and quickly became a viable alternative to the ad-hoc syndication systems being developed by commercial interests. RSS adoption has flourished because it provides for simple syndication without unnecessary complexity or bulk. Today, RSS feeds carry various content types to thousands of Web sites, including CNET, CNN, Disney, Forbes, Motley Fool, Wired, Red Herring, Salon, Slashdot, and ZDNet.
In order for RSS to work, a mechanism for finding RSS feeds was needed. One solution is the RSS registry. The first step toward establishing an RSS registry was Internet Alchemy’s OCS format. This format provides a way of listing RSS channels that have been made available on a Web site. As the number of RSS feeds grew, the next step was the establishment of registries. XmlTree (http://www.xmltree.com) is a registry that provides a facility for RSS content to be registered and classified for end use. UserLand (http://my.userland.com) provides a registry facility as well.
RSS Shift Toward Syndication
If My Netscape Network was the first RSS portal, UserLand was the first RSS aggregator. The main difference between My Netscape Network and UserLand is archiving. My Netscape Network displays only the latest version of RSS channel feeds. UserLand archives snapshots of content on an hourly basis. The revolutionary advance that aggregators brought was the ability to decouple items from their parent channels. This means that RSS can be presented as the intersection of simultaneous feeds from disparate sources to focus on timeliness, not on the channel. Meerkat (http://www.oreillynet.com/meerkat), an open wire service, presents items in reverse chronological order, but also allows for filtering, grouping, sharing, and searching.
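The idea of decoupling items from their channels can be sketched in a few lines of Python. This is only an illustration, not how UserLand or Meerkat is actually implemented: the SeenItem class, its first_seen field (the time at which an aggregator first noticed an item), and the example.org-style URLs are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class SeenItem:
    channel: str          # title of the channel the item came from
    title: str
    link: str
    first_seen: datetime  # hypothetical: when the aggregator first saw the item


def merge_newest_first(*channels):
    """Flatten several per-channel item lists into one stream, newest first."""
    merged = [item for items in channels for item in items]
    return sorted(merged, key=lambda item: item.first_seen, reverse=True)


# Two made-up channels with items seen at different times.
site_a = [
    SeenItem("Site A", "Older story", "http://a.example/1", datetime(2001, 5, 1, 9, 0)),
    SeenItem("Site A", "Newest story", "http://a.example/2", datetime(2001, 5, 1, 12, 0)),
]
site_b = [
    SeenItem("Site B", "Middle story", "http://b.example/1", datetime(2001, 5, 1, 10, 30)),
]

for item in merge_newest_first(site_a, site_b):
    print(item.first_seen.isoformat(), item.channel, item.title)
```

Because the merged stream is ordered by time rather than by source, the reader sees the most recent items first regardless of which channel supplied them.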
The real shift of RSS toward syndication began when RSS 0.91 was released. In this version, RSS dropped RDF and became a simple XML vocabulary. RSS 0.91 added new item-level <description> tags that enabled RSS to clearly move into content syndication. The description field had a 500-character constraint. This enabled RSS to carry more than a headline but still limited its ability to carry heavyweight content.
As use of RSS increased, users began to voice a need for enhancements. The item-level title and description elements were being overloaded with metadata and HTML as some tried to use RSS for more than it was intended to do. Some people began to insert unofficial ad-hoc elements to augment the metadata facilities within RSS 0.91, which is why elements such as <category>, <date>, and <author> came into use. The evolution of RSS seemed to be inevitable. RSS needed a richer metadata framework and a way to become extensible. But it also needed to be backward compatible so that the entrenched user base could continue to work with RSS. The issue was how to make this happen in a unified fashion.
A new group, RSS-DEV, began to work on a new version of RSS that met these requirements. This version of RSS moved ahead to include namespaces and brought RSS back to RDF for metadata specification. RSS-DEV released RSS 1.0 in December of 2000.
The original version of RSS (RSS 0.9+) is currently being maintained and advanced by the open-source community working with UserLand. One of the goals of the RSS 0.9+ group is to advance RSS capabilities while maintaining its simplicity. According to Dave Winer of UserLand, “Today, RSS is simple, largely because it only builds on XML 1.0 and does not use namespaces or schemas, and it isn’t a dialect of RDF. There’s a logical route forward for RSS that says it should adapt to include all these concepts, but in doing so it would become vastly more complex, and, at the content provider level, would buy us almost nothing for the added complexity.”
This leaves us with a lack of clarity about what RSS is and which version of RSS we should use. The reality is that some sites have a preference for one RSS version over the other. Other sites support both versions of RSS. This is not much different from the browser wars between Netscape and Microsoft, and the implications for those trying to use the “standard” are much the same! There has been talk of giving each flavor of RSS its own name, retaining RSS for 0.9+ and earlier and giving RSS 1.0 a new name. To date, the communities have not reached consensus on even the name, so for the moment everyone continues to use “RSS” for both flavors.
Three easy steps are required to use RSS on your Web site:
1. Create and maintain RSS files for your Web site.
2. Register your RSS files with an RSS aggregator.
3. Publish relevant RSS content from others on your site.
You’ll learn more about using RSS in this section.
Introduction to RSS Elements
Because RSS is an XML vocabulary, it follows the XML well-formedness rule that all RSS elements must nest inside one root element. For RSS, that element is <rss>. RSS has a single, required child element, <channel>. See Listing 13.6 for the XML element declaration for RSS.
LISTING 13.6 Root Element Declaration in RSS .91 DTD
<!ELEMENT rss (channel)>
<!ATTLIST rss
    version CDATA #REQUIRED> <!--version must be filled in here!> -->
RSS is made up of a rather simple set of elements and subelements. The basic layout of the RSS file is as follows:
RSS root element
Image listings (optional, you can list several)
Item listings (one or more)
The channel element is made up of a number of channel metadata fields. In RSS .91, these fields are predefined, and hence not extensible. Some fields within <channel> are optional and others are required. Here’s a list of these fields:
• title. The title of the RSS channel. The title is how people identify your service. The title of your channel should be the same as the title of your HTML Web site. The maximum length is 100 characters. This field is required.
• link. A URL pointing to the Web site named in the <title> element. The maximum length is 500 characters. This field is required.
• description. A phrase that describes your channel—your channel’s positioning statement. The maximum length is 500 characters. This field is required.
• language. Indicates the content language of the channel. This is intended to allow aggregators to group all Spanish language sites, for example, on a single page. This field is required (enumerated value selection in RSS specification).
• copyright. The copyright notice for content. The maximum length is 100 characters. This field is optional.
• managingEditor. The e-mail address of the managing editor of the channel. The maximum length is 100 characters. This field is optional.
• webmaster. The e-mail address of the Webmaster of the channel. The maximum length is 100 characters. This field is optional.
• rating. The PICS rating for the channel. The maximum length is 500 characters. This field is optional.
• pubDate. The publication date of the channel. It must conform to the date/time standard (RFC 822). This field is optional.
• lastBuildDate. The last time the content of the channel was updated (RFC 822). This field is optional.
• docs. The URL for the documentation for the coding of the RSS site. This field is optional.
• textInput. Describes a text input field. Contains the required subelements <title>, <description>, <name>, and <link> for each text input field. This field is optional.
• skipDays. Contains any number of <day> subelements, such as <day>Friday</day>, that indicate days on which aggregators may not read this channel.
• skipHours. Contains any number of <hour> subelements, such as <hour>14</hour>, that indicate hours in GMT during which aggregators may not read this channel.
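How an aggregator might honor <skipDays> and <skipHours> can be sketched as follows. This is a minimal illustration under stated assumptions: the may_read function and the sample channel are invented for the example, and the sketch treats an <hour> value as a plain hour number in GMT, as described above.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# English day names, indexed the same way as datetime.weekday() (Monday is 0).
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# A made-up channel fragment carrying only the two skip elements.
CHANNEL_XML = """<channel>
  <skipDays><day>Friday</day></skipDays>
  <skipHours><hour>14</hour></skipHours>
</channel>"""


def may_read(channel, when):
    """Return False if `when` falls on a skipped day or in a skipped GMT hour."""
    when = when.astimezone(timezone.utc)
    skipped_days = {day.text.strip() for day in channel.findall("skipDays/day")}
    skipped_hours = {int(hour.text) for hour in channel.findall("skipHours/hour")}
    return (DAY_NAMES[when.weekday()] not in skipped_days
            and when.hour not in skipped_hours)


channel = ET.fromstring(CHANNEL_XML)
friday_morning = datetime(2001, 5, 4, 9, 0, tzinfo=timezone.utc)
thursday_morning = datetime(2001, 5, 3, 9, 0, tzinfo=timezone.utc)
print(may_read(channel, friday_morning))    # a Friday, so the channel is skipped
print(may_read(channel, thursday_morning))  # neither a skipped day nor hour
```

Converting the timestamp to GMT before the comparison matters because the hours in <skipHours> are expressed in GMT, not in the aggregator's local time.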
In addition to the elements that give aggregators information about the channel, the channel element contains one or more <item> elements. Each <item> element is an item of content, such as a news story. The <item> element is made up of three required subelements designed to assist aggregators.
• title. The title of the item. The title is how people identify the content within the channel. The maximum length is 100 characters.
• link. A URL pointing to the Web page named in the item <title>. The maximum length is 500 characters.
• description. A phrase that describes the item. The maximum length is 500 characters.
Finally, a channel may contain one or more images. The images contain the following subelements, which enable aggregators to locate and use images within the channel:
• title. The title of the image. The title is how people identify the image. The maximum length is 100 characters. Required.
• url. A URL pointing to the image named in the <title> element. The maximum length is 500 characters. Required.
• link. A URL pointing to the site where the image named in the <title> element can be found. In practice, this should be the same as the URL of the channel. The maximum length is 500 characters. Required.
• description. A phrase that describes the image. The maximum length is 500 characters. Optional.
• height. Indicates the height of the image in pixels. The maximum value is 400; the default value is 31. Optional.
• width. Indicates the width of the image in pixels. The maximum value is 144; the default value is 88. Optional.
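The length limits listed above lend themselves to a simple automated check. The following Python sketch is one way to do it, not an official validator: the check_item function and the ITEM_LIMITS table are names invented for the example, and the limits are simply those given for the <item> subelements in this section.

```python
import xml.etree.ElementTree as ET

# Character limits for <item> subelements, taken from the field descriptions above.
ITEM_LIMITS = {"title": 100, "link": 500, "description": 500}


def check_item(item):
    """Return a list of problems found in one <item> element."""
    problems = []
    for name, limit in ITEM_LIMITS.items():
        text = item.findtext(name)
        if text is None:
            problems.append(f"missing <{name}>")
        elif len(text) > limit:
            problems.append(f"<{name}> is {len(text)} characters; the limit is {limit}")
    return problems


good = ET.fromstring(
    "<item><title>XML Roadmap</title>"
    "<link>http://example.org/</link>"
    "<description>A roadmap.</description></item>"
)
bad = ET.fromstring(
    "<item><title>" + "x" * 101 + "</title>"
    "<link>http://example.org/</link></item>"
)
print(check_item(good))
print(check_item(bad))
```

The same pattern could be extended to the channel and image fields by adding their limits to a similar table.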
Creating Your Own RSS File
One of the easiest ways to create an RSS file for your Web content is to look at an example and modify it to fit your needs. Therefore, let’s look at Listing 13.7.
LISTING 13.7 A Simple RSS File
<?xml version="1.0" encoding="ISO-8859-1" ?>
<rss version="0.91">
  <channel>
    <link>http://idealliance.org</link>
    <description>XML Resources, XML Conferences, XML Tutorials, User-Driven XML Standards, XML Files Newsletter, XML Users Association</description>
    <copyright>Copyright 2001, idealliance.org.</copyright>
    <image>
      <title>IDEAlliance Logo</title>
      <url>http://idealliance.org/images/idealogo.gif</url>
      <link>http://idealliance.org</link>
      <description>Logo for IDEAlliance</description>
    </image>
    <item>
      <title>XML Files: Monthly Newsletter</title>
      <link>http://www.idealliance.org/whats_xml/whats_xml_xmlfiles.htm/</link>
      <description>Monthly XML Newsletter. Highlights W3C standards development for the month, XML-related events, XML Book Review</description>
    </item>
    <item>
      <title>XML Roadmap</title>
      <link>http://www.idealliance.org/whats_xml/xmlroadmap/TOC/toc.htm</link>
      <description>A roadmap to all XML related standards and vocabularies, completely indexed and hyperlinked.</description>
    </item>
  </channel>
</rss>
This RSS file is an example of RSS 0.91. It describes some content on the IDEAlliance.org Web site. One image and two items have been included in the IDEAlliance RSS channel. The first item makes the XML Files monthly newsletter available for syndication. The second item makes the XML Roadmap available for syndication. Of course, you may add as many items and images as you want when you modify this RSS file for your own uses.
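Before publishing a modified file, it can help to confirm that it still has the structure you expect. The Python sketch below parses a shortened feed modeled on Listing 13.7 and reports its version, image count, and item titles. The summarize function is a name invented here, and the channel <title> and <language> values in the sample are placeholders rather than values from the listing.

```python
import xml.etree.ElementTree as ET

# Shortened feed modeled on Listing 13.7; channel title and language are placeholders.
RSS_TEXT = """<rss version="0.91">
  <channel>
    <title>Example Channel</title>
    <link>http://idealliance.org</link>
    <description>XML resources</description>
    <language>en-us</language>
    <image>
      <title>IDEAlliance Logo</title>
      <url>http://idealliance.org/images/idealogo.gif</url>
      <link>http://idealliance.org</link>
    </image>
    <item>
      <title>XML Files: Monthly Newsletter</title>
      <link>http://www.idealliance.org/whats_xml/whats_xml_xmlfiles.htm/</link>
      <description>Monthly XML Newsletter.</description>
    </item>
    <item>
      <title>XML Roadmap</title>
      <link>http://www.idealliance.org/whats_xml/xmlroadmap/TOC/toc.htm</link>
      <description>A roadmap to XML standards.</description>
    </item>
  </channel>
</rss>"""


def summarize(rss_text):
    """Parse an RSS 0.91 document and report its basic structure."""
    root = ET.fromstring(rss_text)
    channel = root.find("channel")
    if root.tag != "rss" or channel is None:
        raise ValueError("expected an <rss> root with a <channel> child")
    return {
        "version": root.get("version"),
        "images": len(channel.findall("image")),
        "items": [item.findtext("title") for item in channel.findall("item")],
    }


print(summarize(RSS_TEXT))
```

A parse error at this stage usually points to a well-formedness problem, such as an unclosed tag, that would also stop an aggregator from reading the file.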
Publishing Your RSS File
When you have created your own RSS file, put it somewhere on your Web server. Remember that your RSS file is only as good as the information in the file itself. This means that you should update your RSS file every time you change the content on your Web site or when your Web site layout changes. If the RSS file is outdated, it is of little value. Once you have created a baseline RSS file for your Web site, you may want to consider writing scripts that will “read” your Web site and automatically update fields within your RSS file.
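A script of that kind might look something like the following Python sketch. It assumes that some other step has already gathered a list of pages from your site; the build_rss function, the CHANNEL and PAGES data, and the example.org URLs are all hypothetical. The sketch regenerates the channel and stamps it with a <lastBuildDate>. Note that formatdate from the standard library writes a four-digit year, the style of RFC 2822, which superseded RFC 822.

```python
import xml.etree.ElementTree as ET
from email.utils import formatdate

# Hypothetical channel metadata and page list; a real script would gather
# the pages by reading the Web site itself.
CHANNEL = {
    "title": "Example Site",
    "link": "http://www.example.org/",
    "description": "News from an example site",
    "language": "en-us",
}
PAGES = [
    {"title": "First page", "link": "http://www.example.org/1.htm",
     "description": "The first page."},
    {"title": "Second page", "link": "http://www.example.org/2.htm",
     "description": "The second page."},
]


def build_rss(channel_info, pages, build_time=None):
    """Return RSS 0.91 text for the given channel metadata and pages."""
    rss = ET.Element("rss", version="0.91")
    channel = ET.SubElement(rss, "channel")
    for name in ("title", "link", "description", "language"):
        ET.SubElement(channel, name).text = channel_info[name]
    # build_time is seconds since the epoch; None means the current time.
    ET.SubElement(channel, "lastBuildDate").text = formatdate(build_time, usegmt=True)
    for page in pages:
        item = ET.SubElement(channel, "item")
        for name in ("title", "link", "description"):
            ET.SubElement(item, name).text = page[name]
    return ET.tostring(rss, encoding="unicode")


print(build_rss(CHANNEL, PAGES))
```

Running such a script whenever the site changes keeps the RSS file in step with the content it describes.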
Registering Your RSS File with RSS Aggregators
You have now created an RSS file and placed it on your site. How can you let others know that you are making content available to them? Well, of course, you can notify others using e-mail and listservs. However, the best approach is to register with one of the services that posts RSS directories.
Each RSS directory has a slightly different method for registering. Some are automated, and others are not. The major RSS directories include (in alphabetical order) http://www.MoreOver.com, http://dmoz.org/Netscape/My_Netscape_Network, http://My.UserLand.com, and http://www.xmlTree.com.
Registering with MoreOver.com
MoreOver.com offers a wide array of possibilities for content syndication. You can add news channels to your own sites by stepping through a wizard on the MoreOver Web site. You just have to select channels, specify their visual appearance, and the code will be mailed to you for inclusion on your Web site.
Getting your content listed with MoreOver is time consuming because it does not have an automated process. To register content, just send an e-mail to newssource@moreover.com that includes a pointer to your RSS file. MoreOver evaluates each listing personally. Your addition to MoreOver may take as long as three months, so be patient.
Registering with My Netscape
My Netscape publishes a huge collection of channels from organizations and individuals. Examples of channels offered through My Netscape include the Weather Channel and Nasdaq. My Netscape offers no support for publishing its channels anywhere other than my.netscape.com.
In order to get your channel included in the listings at my.netscape.com, you must first register with Netscape’s Netcenter at http://www.netscape.com. Only registered members can submit a channel. Also, each registered member can submit only one RSS file of 8KB or less in size. You must have a valid e-mail address associated with your membership in order to register your RSS channel.
Registering with UserLand
UserLand also enables users to submit their RSS channels. UserLand divides its service into a frontend and a backend: the Web interface for reading news is the frontend, whereas the backend offers the same content in various formats, over different protocols. For example, content may be XML offered over SOAP.
UserLand uses an aggregator tool to update its RSS listings. To list your channel, you must first go to http://aggregator.userland.com and register. The UserLand aggregator reads all the registered XML files every hour and picks up all new items. It flows the items out to the affiliate sites using XML-RPC.
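The "picks up all new items" step can be illustrated with a short Python sketch. It is not UserLand's implementation: the new_items function is invented for the example, and it assumes that an item counts as new when its <link> has not been seen on an earlier read.

```python
import xml.etree.ElementTree as ET


def new_items(rss_text, seen_links):
    """Return items whose <link> has not been seen before, and remember them."""
    fresh = []
    for item in ET.fromstring(rss_text).iter("item"):
        link = item.findtext("link")
        if link and link not in seen_links:
            seen_links.add(link)
            fresh.append({"title": item.findtext("title"), "link": link})
    return fresh


# Two made-up snapshots of the same channel, one hour apart.
FIRST_READ = """<rss version="0.91"><channel>
  <item><title>Story one</title><link>http://www.example.org/1</link></item>
</channel></rss>"""
SECOND_READ = """<rss version="0.91"><channel>
  <item><title>Story two</title><link>http://www.example.org/2</link></item>
  <item><title>Story one</title><link>http://www.example.org/1</link></item>
</channel></rss>"""

seen = set()
print(new_items(FIRST_READ, seen))   # the first read reports Story one
print(new_items(SECOND_READ, seen))  # the second read reports only Story two
```

Keying on the link rather than the title means that a retitled story is not reported twice, although other choices are equally defensible.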
Publishing RSS Content from Others on Your Site
Now that you have made your content available to others using RSS, you may want to add content from the outside to your own Web site.
My Netscape will be of limited use here. The channels on My Netscape are designed for use on your own personalized interface at http://my.netscape.com. There, you can build and customize your own page. But that is really the extent of this use of RSS.
Options for including content from UserLand are much more viable. Here, you will want to go to http://backend.userland.com. Backend is an open technology that enables you to build your own applications based on its content flow. Most content is archived in XML form and is publicly accessible through HTTP.
MoreOver.com currently has over 250 publicly available free news categories. The headlines of these free categories can be read at http://www.MoreOver.com. MoreOver harvests news headline links from 1,500 online news sources and uses both human and computer editing to produce the newsfeeds in various formats, such as JavaScript and XML.
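Whichever source you choose, a feed delivered as RSS still has to be turned into HTML before it can appear on your page. The Python sketch below shows one way this might be done. The headlines_to_html function and the sample feed are invented for the example and do not reflect the format of any particular provider's newsfeed.

```python
import xml.etree.ElementTree as ET
from html import escape


def headlines_to_html(rss_text, limit=5):
    """Render up to `limit` item headlines from an RSS feed as an HTML list."""
    channel = ET.fromstring(rss_text).find("channel")
    rows = []
    for item in channel.findall("item")[:limit]:
        # Escape the text so that markup characters in a headline cannot break the page.
        title = escape(item.findtext("title", default=""))
        link = escape(item.findtext("link", default=""), quote=True)
        rows.append(f'<li><a href="{link}">{title}</a></li>')
    return "<ul>\n" + "\n".join(rows) + "\n</ul>"


# A made-up feed from another site.
FEED = """<rss version="0.91"><channel>
  <title>Someone else's channel</title>
  <item><title>Q&amp;A with the editor</title><link>http://www.example.org/qa</link></item>
  <item><title>Second headline</title><link>http://www.example.org/2</link></item>
</channel></rss>"""

print(headlines_to_html(FEED, limit=1))
```

The limit parameter keeps a busy channel from crowding out the rest of the page.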