RSS Content Syndication
RSS means different things to different communities. It is an XML vocabulary for describing a Web site, and it happens to be ideal for lightweight content syndication. Today, RSS is one of the most widely used XML applications on the Web.
Its popularity and wide use have uncovered utility in many more scenarios than its creators originally anticipated. As a result, RSS can also be thought of as a portal content language, a metadata syndication framework, and even a content syndication system.
You can see the model for RSS in Figure 13.5. Content providers embed RSS into their HTML pages. These RSS pointers are aggregated and then made available to a larger audience through the aggregator portal.
History of RSS
RSS was originally introduced in 1999 by Netscape as a channel description framework for its My Netscape Network (MNN) portal (http://dmoz.org/Netscape/My_Netscape_Network/). RSS is simply an XML application that provides a novel content-gathering mechanism that benefits Netscape, those providing content, and those using the content on the Web. RSS enables content gathering by providing a simple “snapshot in a document” for Web sites. This document enables Web sites to acquire an audience through the presence of their content on the My Netscape portal. RSS also gives users a centralized location into which content from their favorite Web sites flows, enabling a one-stop reading experience.
As a result of My Netscape Network, users soon found that RSS could be used as a lightweight, XML-based syndication format for headlines. Using RSS, headlines could be taken outside the My Netscape Network site and used in other RSS-based portals. Portals such as xmlTree (http://www.xmltree.com) began to cater to general subject markets as well as to specialized vertical markets. RSS gained grassroots acceptance and quickly became a viable alternative to the ad-hoc syndication systems being developed by commercial interests.
RSS adoption has flourished because it provides for simple syndication without unnecessary complexity or bulk. Today, RSS feeds carry various content types to thousands of Web sites, including CNET, CNN, Disney, Forbes, Motley Fool, Wired, Red Herring, Salon, Slashdot, and ZDNet.
For RSS to work, a mechanism for finding RSS feeds was needed. One solution is the RSS registry. The first step toward establishing an RSS registry was Internet Alchemy’s OCS format, which provides a way of listing the RSS channels that have been made available on a Web site. As the number of RSS feeds grew, the next step was the establishment of registries. xmlTree (http://www.xmltree.com) is a registry that provides a facility for RSS content to be registered and classified for end use. UserLand (http://my.userland.com) provides a registry facility as well.
RSS Shift Toward Syndication
If My Netscape Network was the first RSS portal, UserLand was the first RSS aggregator. The main difference between My Netscape Network and UserLand is archiving: My Netscape Network displays only the latest version of each RSS channel feed, whereas UserLand archives snapshots of content on an hourly basis. The revolutionary advance that aggregators brought was the ability to decouple items from their parent channels. This means that RSS can be presented as the intersection of simultaneous feeds from disparate sources, focusing on timeliness rather than on the channel. Meerkat (http://www.oreillynet.com/meerkat), an open wire service, presents items in reverse chronological order but also allows for filtering, grouping, sharing, and searching.
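To make the idea of decoupling items from their channels concrete, here is a minimal Python sketch of what an aggregator in the spirit of UserLand or Meerkat might do: pool the items of several feeds into a single reverse-chronological timeline and optionally filter them by keyword. The Item class, the merge_feeds function, and the sample feeds are hypothetical illustrations, not the interface of either service.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Item:
    channel: str       # title of the channel the item came from
    title: str
    link: str
    fetched: datetime  # when the aggregator picked the item up


def merge_feeds(feeds: Dict[str, List[Item]], keyword: Optional[str] = None) -> List[Item]:
    """Pool items from every channel into one timeline, newest first.

    If keyword is given, keep only items whose title contains it
    (case-insensitive), a simple stand-in for wire-service filtering.
    """
    merged = [item for items in feeds.values() for item in items]
    if keyword is not None:
        merged = [item for item in merged if keyword.lower() in item.title.lower()]
    return sorted(merged, key=lambda item: item.fetched, reverse=True)


# Hypothetical sample data: two channels with made-up items.
feeds = {
    "Example News": [
        Item("Example News", "XML standard approved", "http://example.com/1", datetime(2001, 3, 1, 9)),
    ],
    "Example Tech": [
        Item("Example Tech", "New XML parser", "http://example.org/a", datetime(2001, 3, 1, 11)),
        Item("Example Tech", "Hardware review", "http://example.org/b", datetime(2001, 3, 1, 10)),
    ],
}

for item in merge_feeds(feeds, keyword="xml"):
    print(item.fetched.isoformat(), item.channel, "|", item.title)
```

Because the timeline is keyed on when each item was fetched rather than on which channel it belongs to, items from unrelated sources interleave freely, which is the shift in emphasis from channel to timeliness described above.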
The real shift of RSS toward syndication began when RSS 0.91 was released. In this version, RSS dropped RDF and became a simple XML vocabulary. RSS 0.91 added new item-level <description> tags that enabled RSS to move clearly into content syndication. The description field had a 500-character constraint, which enabled RSS to carry more than a headline but still limited its ability to carry heavyweight content.
As use of RSS increased, the user audience began to voice a need for enhancements. The item-level title and description elements were being overloaded with metadata and HTML, as some tried to use RSS for more than it was intended for. Some people began to insert unofficial, ad-hoc elements such as <category>, <date>, and <author> to augment the metadata facilities within RSS 0.91. The evolution of RSS seemed inevitable. RSS needed a richer metadata framework and a way to become extensible. But it also needed to remain backward compatible so that the entrenched user base could continue to work with RSS. The issue was how to make this happen in a unified fashion.
A new group, RSS-DEV, then began to work on a new version of RSS that met its requirements. This version of RSS added namespaces and brought RSS back to RDF for metadata specification. RSS-DEV released RSS 1.0 in December 2000.
The original version of RSS (RSS 0.9+) is currently being maintained and
advanced by the open-source community working with UserLand. One of the goals
of the RSS 0.9+ group is to advance RSS capabilities while maintaining its
simplicity. According to Dave Winer of UserLand, “Today, RSS is simple, largely because it only
builds on XML 1.0 and does not use namespaces or schemas, and it isn’t a
dialect of RDF. There’s a logical route forward for RSS that says it should
adapt to include all these concepts, but in doing so it would become vastly
more complex, and, at the content provider level, would buy us almost nothing
for the added complexity.”
This leaves us with a lack of clarity about what RSS is and which
version of RSS we should use. The reality is that some sites have a preference
for one RSS version over the other. Other sites support both versions of RSS.
This is not too much different from the browser wars between Netscape and
Microsoft—and the implications for those trying to use the “standard” are much
the same! There has been talk of giving new names for each different flavor of
RSS, retaining RSS for 0.9+ and earlier, and giving RSS 1.0 a new name. To
date, there has not even been consensus among the communities on the name, so
for the moment, everyone continues to use “RSS” for both flavors of RSS.
Three easy steps are required to use RSS on your Web site:
• Create and maintain RSS files for your Web site.
• Register your RSS files with an RSS aggregator.
• Publish relevant RSS content from others on your site.
You’ll learn more about using RSS in this section.
Introduction to RSS Elements
Because RSS is an XML vocabulary, it follows the XML well-formedness
rule that all RSS elements must nest inside one root element. For RSS, that
element is <rss>. RSS has a single, required child element, <channel>. See Listing 13.6 for the XML
element declaration for RSS.
LISTING 13.6 Root Element Declaration in RSS 0.91 DTD
<!ELEMENT rss (channel)>
<!ATTLIST rss
          version CDATA #REQUIRED> <!-- version must be filled in -->
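As an illustration of what the declaration in Listing 13.6 requires, the following minimal Python sketch uses the standard library's xml.etree.ElementTree to check a document's root: it must be <rss>, it must carry a version attribute, and its only child must be a single <channel>. The check_root function is a hypothetical helper, not a full DTD validator.

```python
import xml.etree.ElementTree as ET
from typing import List


def check_root(rss_text: str) -> List[str]:
    """Report deviations from the root-element rules in Listing 13.6:
    the root must be <rss>, carry a version attribute, and contain
    exactly one child, <channel>."""
    root = ET.fromstring(rss_text)  # raises ET.ParseError if not well-formed
    problems = []
    if root.tag != "rss":
        problems.append(f"root element is <{root.tag}>, expected <rss>")
    if "version" not in root.attrib:
        problems.append("missing required version attribute")
    children = [child.tag for child in root]
    if children != ["channel"]:
        problems.append(f"expected exactly one <channel> child, found {children}")
    return problems


print(check_root('<rss version="0.91"><channel/></rss>'))  # []
print(check_root('<rss><channel/></rss>'))
```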
RSS is made up of a rather simple set of elements and subelements. The basic layout of an RSS file is as follows:
RSS root element
    Channel metadata
    Image listings (optional)
    Item listings (one or more)
The channel element is made up of a number of channel metadata fields. In RSS 0.91, these fields are predefined and hence not extensible. Some fields within <channel> are required and others are optional. Here’s a list of these fields:
• title. The title of the RSS channel. The title is how people identify your service, and it should be the same as the title of your HTML Web site. The maximum length is 100 characters. This field is required.
• link. A URL pointing to the Web site named in the <title> element. The maximum length is 500 characters. This field is required.
• description. A phrase that describes your channel—your channel’s positioning statement. The maximum length is
500 characters. This field is required.
• language. Indicates the content language of the channel. This is intended to allow aggregators to group all Spanish-language sites, for example, on a single page. This field is required (its value is selected from an enumerated list in the RSS specification).
• copyright. The copyright notice for content. The maximum length is 100 characters. This field is optional.
• managingEditor. The e-mail address of the managing editor of the channel. The maximum length is 100 characters. This field is optional.
• webMaster. The e-mail address of the Webmaster of the channel. The maximum length is 100 characters. This field is optional.
• rating. The PICS rating for the channel. The maximum length is 500 characters. This field is optional.
• pubDate. The publication date of the channel. It must conform to the RFC 822 date/time standard. This field is optional.
• lastBuildDate. The last time the content of the channel was updated (RFC 822). This field is optional.
• docs. The URL of the documentation for the format used in the RSS file. This field is optional.
• textInput. Contains the required subelements <title>, <description>, <name>, and <link>, which describe a text input field for the channel. This field is optional.
• skipDays. Contains any number of <day> subelements, such as <day>Friday</day>, that indicate days on which aggregators may not read this channel.
• skipHours. Contains any number of <hour> subelements, such as <hour>14</hour>, that indicate hours in GMT during which aggregators may not read this channel.
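The required flags and maximum lengths in the list above lend themselves to a simple table-driven check. This is a minimal sketch, assuming element names are spelled as in the RSS 0.91 specification (for example, webMaster) and that lengths are counted in characters after trimming surrounding whitespace; check_channel and CHANNEL_FIELDS are hypothetical names.

```python
import xml.etree.ElementTree as ET
from typing import List

# name: (required, max_length); None means the list gives no length limit.
# Spellings follow the RSS 0.91 element names (assumption: webMaster, not webmaster).
CHANNEL_FIELDS = {
    "title": (True, 100),
    "link": (True, 500),
    "description": (True, 500),
    "language": (True, None),
    "copyright": (False, 100),
    "managingEditor": (False, 100),
    "webMaster": (False, 100),
    "rating": (False, 500),
}


def check_channel(channel: ET.Element) -> List[str]:
    """Return a list of missing required fields and over-long values."""
    problems = []
    for name, (required, max_len) in CHANNEL_FIELDS.items():
        elem = channel.find(name)
        if elem is None:
            if required:
                problems.append(f"missing required <{name}>")
            continue
        text = (elem.text or "").strip()
        if max_len is not None and len(text) > max_len:
            problems.append(f"<{name}> is {len(text)} characters; maximum is {max_len}")
    return problems


channel = ET.fromstring(
    "<channel><title>Example Site</title><link>http://www.example.com/</link>"
    "<description>XML resources</description></channel>"
)
print(check_channel(channel))  # ['missing required <language>']
```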
In addition to the elements that give aggregators information about the channel, the channel element contains one or more <item> elements. Each <item> element is an item of content, such as a news story. The <item> element is made up of three required subelements designed to assist aggregators:
• title. The title of the item. The title is how people identify the content within the channel. The maximum length is 100 characters.
• link. A URL pointing to the Web page named in the item <title>. The maximum length is 500 characters.
• description. A phrase that describes the item. The maximum length is 500 characters.
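A minimal sketch of building an <item> with its three subelements while enforcing the length limits above might look like this. The make_item helper and the example.com URL are hypothetical.

```python
import xml.etree.ElementTree as ET

# Maximum lengths, in characters, from the item field list.
ITEM_LIMITS = {"title": 100, "link": 500, "description": 500}


def make_item(title: str, link: str, description: str) -> ET.Element:
    """Build an RSS 0.91 <item>, rejecting values over the maximum lengths."""
    item = ET.Element("item")
    for name, value in (("title", title), ("link", link), ("description", description)):
        if len(value) > ITEM_LIMITS[name]:
            raise ValueError(f"{name} exceeds {ITEM_LIMITS[name]} characters")
        ET.SubElement(item, name).text = value
    return item


item = make_item("XML Roadmap", "http://www.example.com/roadmap.html",
                 "An index of XML-related standards.")
print(ET.tostring(item, encoding="unicode"))
```

Raising an error rather than silently truncating is a design choice; a publishing script could just as well shorten an over-long description before building the item.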
Finally, a channel may contain one or more images. The images contain the following subelements, which enable aggregators to locate and use images within the channel:
• title. The title of the image. The title is how people identify the image. The maximum length is 100 characters. Required.
• url. A URL pointing to the image named in the <title> element. The maximum length is 500 characters. Required.
• link. A URL pointing to the site where the image named in the <title> element can be found. In practice, this should be the same as the URL of the channel. The maximum length is 500 characters. Required.
• description. A phrase that describes the image. The maximum length is 500 characters. Optional.
• height. Indicates the height of the image in pixels. The maximum value is 400;
the default value is 31. Optional.
• width. Indicates the width of the image in pixels. The maximum value is 144;
the default value is 88. Optional.
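The defaults and maximums for image dimensions can be applied mechanically. The sketch below, built around a hypothetical helper named image_size, assumes an omitted <width> or <height> falls back to its default and that zero or negative values are as invalid as oversized ones.

```python
import xml.etree.ElementTree as ET
from typing import Tuple

IMAGE_DEFAULTS = {"width": 88, "height": 31}
IMAGE_MAXIMA = {"width": 144, "height": 400}


def image_size(image: ET.Element) -> Tuple[int, int]:
    """Return (width, height) in pixels for an <image> element, using the
    default for an omitted dimension and rejecting out-of-range values."""
    size = {}
    for dim in ("width", "height"):
        elem = image.find(dim)
        value = int(elem.text) if elem is not None and elem.text else IMAGE_DEFAULTS[dim]
        if not 0 < value <= IMAGE_MAXIMA[dim]:
            raise ValueError(f"image {dim} {value} is outside 1..{IMAGE_MAXIMA[dim]}")
        size[dim] = value
    return size["width"], size["height"]


logo = ET.fromstring("<image><title>Logo</title><url>u</url><link>l</link></image>")
print(image_size(logo))  # (88, 31)
```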
Creating Your Own RSS File
One of the easiest ways to create an RSS file for your Web content is to look at an example and modify it to fit your needs. Therefore, let’s look at the simple RSS file in Listing 13.7.
LISTING 13.7 A Simple RSS File
<?xml version="1.0" encoding="ISO-8859-1" ?>
<rss version="0.91">
  <channel>
    <title>…</title>
    <link>…</link>
    <description>XML Resources, XML Conferences,
      XML Tutorials, User-Driven XML Standards,
      XML Files Newsletter, XML Users Association</description>
    <language>…</language>
    <image>
      <title>…</title>
      <url>…</url>
      <link>…</link>
    </image>
    <item>
      <title>XML Files: Monthly …</title>
      <link>…</link>
      <description>Monthly XML Newsletter. Highlights
        W3C standards development for the month,
        XML-related events, XML Book …</description>
    </item>
    <item>
      <title>…</title>
      <link>…</link>
      <description>A roadmap to all XML related
        standards and vocabularies, completely
        indexed and hyperlinked.</description>
    </item>
  </channel>
</rss>
This RSS file is an example of RSS 0.91. It describes some content on the IDEAlliance.org Web site. One image and two items have been included in the IDEAlliance RSS channel. The first item makes the XML Files monthly newsletter available for syndication. The second item makes the XML Roadmap available for syndication. Of course, you may add as many items and images as you want when you modify this RSS file for your own uses.
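Rather than editing a file like Listing 13.7 by hand, you could generate it. This minimal Python sketch builds an RSS 0.91 document with one image and two items using xml.etree.ElementTree (ET.indent requires Python 3.9 or later). The build_rss function and every title and example.com URL in it are hypothetical placeholders, not the IDEAlliance values.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional


def build_rss(channel_info: Dict[str, str],
              items: List[Dict[str, str]],
              image: Optional[Dict[str, str]] = None) -> bytes:
    """Assemble an RSS 0.91 document: channel metadata, an optional
    image, then one <item> per entry in items."""
    rss = ET.Element("rss", version="0.91")
    channel = ET.SubElement(rss, "channel")
    for name in ("title", "link", "description", "language"):
        ET.SubElement(channel, name).text = channel_info[name]
    if image is not None:
        img = ET.SubElement(channel, "image")
        for name in ("title", "url", "link"):
            ET.SubElement(img, name).text = image[name]
    for fields in items:
        item = ET.SubElement(channel, "item")
        for name in ("title", "link", "description"):
            ET.SubElement(item, name).text = fields[name]
    ET.indent(rss)  # pretty-print for human readers (Python 3.9+)
    return ET.tostring(rss, encoding="ISO-8859-1", xml_declaration=True)


# Hypothetical values; substitute your own site's titles and URLs.
document = build_rss(
    {"title": "Example Site", "link": "http://www.example.com/",
     "description": "XML resources and tutorials", "language": "en-us"},
    [{"title": "Monthly Newsletter", "link": "http://www.example.com/news.html",
      "description": "Highlights of the month's XML standards work."},
     {"title": "XML Roadmap", "link": "http://www.example.com/roadmap.html",
      "description": "An index of XML-related standards."}],
    image={"title": "Example Site", "url": "http://www.example.com/logo.gif",
           "link": "http://www.example.com/"},
)
print(document.decode("ISO-8859-1"))
```

ElementTree writes the XML declaration with single quotes, which is equally valid XML. Generating the file from data also makes it easy to rerun whenever the site changes.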
Publishing Your RSS File
When you have created your own RSS file, put it somewhere on your Web server. Remember that your RSS file is only as good as the information in it. This means that you should update your RSS file every time you change the content on your Web site or when your Web site layout changes. An outdated RSS file is of little value. Once you have created a baseline RSS file for your Web site, you may want to consider writing scripts that “read” your Web site and automatically update fields within your RSS file.
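One field such a script would update is <lastBuildDate>. The following sketch, using a hypothetical helper named touch_last_build_date, sets it from a Python datetime. It relies on email.utils.format_datetime, which emits the RFC 2822 form of the RFC 822 date syntax (four-digit year, GMT); whether a given aggregator accepts that form is an assumption worth checking.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def touch_last_build_date(rss_text: str, when: datetime) -> str:
    """Set (or add) the channel's <lastBuildDate> to `when`, as an RFC 822-style date in GMT."""
    root = ET.fromstring(rss_text)
    channel = root.find("channel")
    if channel is None:
        raise ValueError("no <channel> element")
    elem = channel.find("lastBuildDate")
    if elem is None:
        elem = ET.SubElement(channel, "lastBuildDate")
    # usegmt=True requires a UTC-aware datetime, so convert first.
    elem.text = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    return ET.tostring(root, encoding="unicode")


original = '<rss version="0.91"><channel><title>Example Site</title></channel></rss>'
print(touch_last_build_date(original, datetime(2001, 3, 1, 12, 0, tzinfo=timezone.utc)))
```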
Registering Your RSS File with RSS Aggregators
You have now created an RSS file and placed it on your site. How can you let others know that you are making content available to them? Well, of course, you can notify others using e-mail and listservs. However, the best approach is to register with one of the services that posts RSS directories.
Each RSS directory has a slightly different method for registering. Some are automated, and others are not. The major RSS directories include (in alphabetical order) http://www.MoreOver.com, http://dmoz.org/Netscape/My_Netscape_Network, http://My.UserLand.com, and http://www.xmlTree.com.
Registering with MoreOver.com
MoreOver.com offers a wide array of possibilities for content
syndication. You can add news channels to your own sites by stepping through a
wizard on the MoreOver Web site. You just have to select channels, specify
their visual appearance, and the code will be mailed to you for inclusion on
your Web site.
Getting your content listed with MoreOver is time consuming because MoreOver does not have an automated process. To register content, just send an e-mail that includes a pointer to your RSS file to newssource@moreover.com. MoreOver reviews each listing by hand, and your addition to MoreOver may take as long as three months, so be patient.
Registering with My Netscape
My Netscape publishes a huge collection of channels from organizations and individuals. Examples of channels offered through My Netscape include the Weather Channel and Nasdaq. My Netscape offers no support for publishing its channels anywhere other than my.netscape.com.
To get your channel included in the listings at my.netscape.com, you must first register with Netscape’s Netcenter at http://www.netscape.com. Only registered members can submit a channel, and each registered member can submit only one RSS file of 8KB or less in size. You must have a valid e-mail address associated with your membership in order to register your channel. To register, you provide the full URL of your RSS 0.91 file and select an update frequency for your channel (the interval at which you would like Netscape to retrieve your RSS file). When My Netscape retrieves your RSS file, it will send you an e-mail to let you know that you are now listed. It will also provide you with an “add this site” button for your site that enables others to add your content to their own My Netscape pages.
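Before submitting, you might check a file against the two My Netscape constraints described above: the size limit and the RSS 0.91 version. In this sketch, the helper ready_for_my_netscape is hypothetical, and "8KB" is read as 8,192 bytes, which is an assumption; Netscape may have counted differently.

```python
import xml.etree.ElementTree as ET
from typing import List

MAX_BYTES = 8 * 1024  # assumption: "8KB or less" means 8,192 bytes


def ready_for_my_netscape(rss_bytes: bytes) -> List[str]:
    """Check the raw bytes of an RSS file against the size and version rules."""
    problems = []
    if len(rss_bytes) > MAX_BYTES:
        problems.append(f"file is {len(rss_bytes)} bytes; limit is {MAX_BYTES}")
    root = ET.fromstring(rss_bytes)
    if root.tag != "rss" or root.get("version") != "0.91":
        problems.append("file is not an RSS 0.91 document")
    return problems


print(ready_for_my_netscape(b'<rss version="0.91"><channel/></rss>'))  # []
```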
Registering with UserLand
UserLand also enables users to submit their RSS channels. UserLand distinguishes between a frontend and a backend: the Web interface for reading news is the frontend, whereas the backend offers the same content in various formats, over different protocols. For example, content may be offered as XML over SOAP.
UserLand uses an aggregator tool to update its RSS listings. To list your channel, you must first go to http://aggregator.userland.com and register. The UserLand aggregator reads all the registered XML files every hour and picks up all new items. It flows the items out to the affiliate sites using XML-RPC.
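An aggregator that polls every hour should also respect a channel's <skipHours> and <skipDays> elements. The minimal sketch below, using a hypothetical helper named should_fetch, assumes hour values are the GMT hour of day numbered 0 to 23 (check the specification your feeds follow) and that day names are English weekday names.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# English day names indexed by datetime.weekday() (Monday == 0); avoids locale-dependent strftime.
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def should_fetch(channel: ET.Element, now: datetime) -> bool:
    """Decide whether an hourly poller may read this channel at `now`,
    honouring the channel's <skipHours> (GMT) and <skipDays> lists."""
    now = now.astimezone(timezone.utc)
    skip_hours = {int(h.text) for h in channel.findall("skipHours/hour") if h.text}
    skip_days = {d.text.strip() for d in channel.findall("skipDays/day") if d.text}
    if now.hour in skip_hours:
        return False
    return DAYS[now.weekday()] not in skip_days


channel = ET.fromstring(
    "<channel><skipHours><hour>14</hour></skipHours>"
    "<skipDays><day>Saturday</day></skipDays></channel>"
)
print(should_fetch(channel, datetime(2001, 3, 1, 14, 30, tzinfo=timezone.utc)))  # False
```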
Publishing RSS Content from Others on Your Site
Now that you have made your content available to others using RSS, you
may want to add content from the outside to your own Web site.
My Netscape will be of limited use here. The channels on My Netscape are
designed for use on your own personalized interface at http://my.netscape.com. There, you can build and
customize your own page. But that is really the extent of this use of RSS.
UserLand offers much more viable options for including its content. Here, you will want to go to http://backend.userland.com. Backend is an open technology that enables you to build your own applications based on its content flow. Most content is archived in XML form and is publicly accessible through HTTP.
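As a sketch of republishing headlines from an XML feed retrieved over HTTP, the following Python code downloads an RSS file with urllib and turns its items into an HTML list of links. The fetch_rss and headlines_html functions are hypothetical, and the code assumes an RSS 0.91-style file with <item> elements directly under <channel>; it does not reflect Backend's actual file layout.

```python
import html
import urllib.request
import xml.etree.ElementTree as ET


def fetch_rss(url: str, timeout: float = 10.0) -> bytes:
    """Download an RSS file over HTTP; the URL is supplied by the caller."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def headlines_html(rss_bytes: bytes, limit: int = 10) -> str:
    """Turn up to `limit` <item> elements into an HTML list of links."""
    channel = ET.fromstring(rss_bytes).find("channel")
    if channel is None:
        return "<ul></ul>"
    rows = []
    for item in channel.findall("item")[:limit]:
        # Escape feed text so markup in someone else's feed cannot break your page.
        title = html.escape(item.findtext("title", default="").strip())
        link = html.escape(item.findtext("link", default="").strip(), quote=True)
        rows.append(f'<li><a href="{link}">{title}</a></li>')
    return "<ul>\n" + "\n".join(rows) + "\n</ul>"


sample = (b'<rss version="0.91"><channel><item><title>A &amp; B</title>'
          b'<link>http://example.com/a</link></item></channel></rss>')
print(headlines_html(sample))
```

In practice you would call headlines_html(fetch_rss(...)) with the URL of a feed you are permitted to republish, and cache the result rather than fetching on every page view.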
MoreOver.com currently has over 250 publicly available free news categories. The headlines of these free categories can be read at http://www.MoreOver.com. MoreOver harvests news headline links from 1,500 online news sources and uses both human and computer editing to produce newsfeeds in various formats, such as JavaScript and XML.