The Personal Adaptive Web Sentinel, PAWS, provides a personal Web library of information that is automatically updated, filled with new information, and purged of old information. Based on the user's preferences, information for the library is automatically collected from different index services, like Alta Vista, Yahoo, Aliweb, Lycos(1), WAIS, Harvest, NetFind etc. The library is then continuously updated and cleaned in response to the user's feedback on the collected information.
PAWS will be used as a base for further specialised information services that use agent-to-agent communication to implement features such as group annotations and profile subscriptions, enabling a host of applications.
Software Agents, WWW, Genetic Algorithms, Search engines/indexes, Oz, KQML, Tk, Adaptive Filtering.
Since the introduction of the World Wide Web, WWW, in 1991 the Web has grown at tremendous speed. In a press release in October 1995, Lycos [1] claims that current calculations put the number of public Uniform Resource Locators, URLs, at 11,745,521 hosted around the world on 103,059 Web servers. To be able to tap this immense amount of data we need new tools to help with navigation and filtering.
Etzioni and Weld [2] write:
Forecast: In the next two years, a new breed of indexing agents will emerge. These agents, which we might call automated information brokers, will choose among existing indexes on the basis of factors such as cost of access, coverage and speed. Information brokers will prune the results returned by indexing agents, using clues about the contents of each page.
The Web can be seen as a huge virtual library where information is constantly added, updated and removed. It would be desirable to see a part of this vast information space, structured according to personal preferences and automatically updated and maintained. A possible way of doing this would be to extend existing search indexes with mechanisms to continuously explore the search space and adjust according to variations of the user's preferences. To achieve this we have implemented the required adaptivity in a separate system that uses existing index services. This system uses information provided by the user regarding areas of interest, referred to as profiles in this paper, to search for and recommend Web pages.
The full potential of PAWS can only be tapped when agent-to-agent communication is added. This will include functionality like subscription to other users' public libraries and group annotations. The architecture also allows for personalized visualization tools.
We will first give an overview of the system, its interface and adaptivity. The rest of the paper covers agent collaboration, impact on index server providers, and related work.
PAWS builds a personal library of information from the Web by using the functionality of already existing index servers combined with adaptive retrievers and filters to provide the user with automated profile based retrieval of new information.
The system is implemented in a mixture of Oz [3] and Perl [4]. Oz was chosen for a number of reasons:
Perl was chosen because of its extensive and proven libraries for WWW communication, which make it very easy to implement the Index Access Agents and the Link Maintenance Module.
PAWS is designed to be part of a multi-agent system, influenced by the agent architecture proposed by the Knowledge Sharing Effort [5], using KQML [6] for agent communication.
The system consists of one or more Service Brokers, a PAWS agent for each user and numerous Index Access Agents. PAWS can use Service Brokers, Index Access Agents, and communicate with other agents anywhere on the Internet provided that these agents use KQML.
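As a rough illustration of the message format, a KQML performative is an s-expression with keyword parameters such as :sender, :receiver and :content. The sketch below renders one in Python. The agent names, the reply label and the content expression are hypothetical and are not taken from the PAWS implementation.

```python
def kqml(performative, **params):
    """Render a KQML performative as an s-expression string.

    Underscores in parameter names become hyphens (reply_with -> :reply-with).
    """
    fields = " ".join(
        f":{key.replace('_', '-')} {value}" for key, value in params.items()
    )
    return f"({performative} {fields})"


# Hypothetical query from a user's PAWS agent to an Index Access Agent.
message = kqml(
    "ask-all",
    sender="paws-sue",
    receiver="index-agent-1",
    reply_with="q1",
    content='"(keywords atm switching)"',
)
print(message)
```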
PAWS itself is a set of small communicating modules, where the kernel is responsible for controlling the other modules and for the communication with external agents.
The Keyword Search Module and Filter Module work closely together to adapt the content of the library to the user's preferences. An Index Access Agent is a simple interface agent that allows the Keyword Search Module to pose queries to the index service and translates the result to a format that PAWS can understand. This information is pruned by the Filter Module to remove Web pages already seen by the system and filter out pages based on content. These modules fill the library with information. The Link Maintenance Module is responsible for keeping the library consistent. It removes outdated and broken links, notifies the user when a Web page in the library has changed and, when possible, updates references to moved pages.
The user interacts with PAWS through a graphical user interface. This interface allows the user to train the profiles, give feedback while viewing recommended Web pages, and edit the profiles. It also enables the user to change library parameters and to request information about the agent's behaviour.
PAWS is based on Internet technology in general and WWW technology in particular. It was therefore an obvious decision to implement a Web interface, as was the choice to separate PAWS from any specific browser. As no agent-enabled WWW browser exists, we could not integrate the entire interface within the browser and had to move parts of the functionality to a Tk [7] window. The library, status of the profiles and recommended pages are presented in the Web interface and the profiles are trained and edited using the Tk window.
PAWS is initiated by defining one or more profiles describing the library. Profiles are activated by either Web pages or keywords provided by the user. This kind of training, as well as feedback on recommended pages, is necessary to keep the profiles and library active and adaptive.
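The pruning step performed by the Filter Module can be sketched as follows. This is a minimal illustration under our own assumptions: the Page record, the prune function and the content_ok predicate are hypothetical names, and the real content filter is adaptive rather than a fixed predicate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    url: str
    text: str


def prune(results, seen_urls, content_ok):
    """Drop pages the system has already seen, then apply a content filter.

    seen_urls is updated in place so that a page is only considered once.
    """
    fresh = []
    for page in results:
        if page.url in seen_urls:
            continue  # already seen by the system
        seen_urls.add(page.url)
        if content_ok(page):
            fresh.append(page)
    return fresh
```

Only the pages returned by prune would be added to the library as recommendations; everything else is remembered as seen so it is not fetched or evaluated again.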
PAWS can be trained while the user is browsing for information on the Web. Below is an example of how the profile editing tool might look during a session.
The agent keeps track of the latest Web page being accessed and this URL is displayed in the Current URL field. When the page is an example of information that a profile should retrieve, the user selects that profile from the Profiles menu which updates the Profile field. The Add URL button then adds the current URL to the current profile.
Keywords can be added to the profile in one of the categories supported by PAWS:
Under the Edit menu there are tools for editing profiles and the parameters controlling each profile. This includes selection of which Index Access Agents to use, limits on unseen recommendations and the size of the library.
Recommended pages are accessed through the Web interface. Below is an example of what the page of recommended pages could look like. It depicts a page that has not been given any feedback from the user.
By giving feedback the user trains the profile and keeps it active. PAWS uses feedback of the form:
This relieves the user from rating every document; only the extreme cases, really good or really bad recommendations, need to be marked. The feedback is forwarded to the profile that recommended the page as training data for the next update.
The same page after the user has given positive feedback.
It is important to see why a specific Web page was recommended. PAWS implements functionality to explain its behaviour to the user.
PAWS uses genetic algorithms [8] to implement adaptivity in the Keyword Search and Filter Modules. For the Keyword Search Module, sets of keywords are grouped into queries and these are treated as the gene pool for each profile. The gene pools, one for each profile, used by the genetic algorithm in the Filter Module consist of structured representations [9] of Web pages that the user has recommended and given feedback on. The Filter Module uses essentially the same filter techniques as Newt [10], a netnews filter agent that uses adaptive keyword-based filters implemented with genetic algorithms. We cannot use exactly the same approach, because netnews is delivered continuously, which provides the filters with new information, while PAWS relies on information retrieved from index servers. To feed the Filter Module with new information the Keyword Search Module must be adaptive. If it were static, the queries would soon deplete the index server of new relevant information.
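To make the notion of queries as genes concrete, the following Python sketch treats a query as a set of keywords and shows one possible pair of genetic operators. The crossover and mutate functions and their rules are assumptions chosen for illustration; the operators actually used by PAWS are not specified here.

```python
import random


def crossover(parent_a, parent_b, rng):
    """Build a child query from keywords drawn from both parent queries."""
    pool = sorted(set(parent_a) | set(parent_b))
    size = max(1, (len(parent_a) + len(parent_b)) // 2)
    return frozenset(rng.sample(pool, min(size, len(pool))))


def mutate(query, vocabulary, rng, rate=0.1):
    """With probability `rate`, swap one keyword for an unused vocabulary word."""
    if rng.random() >= rate:
        return frozenset(query)
    candidates = sorted(set(vocabulary) - set(query))
    if not candidates:
        return frozenset(query)
    words = sorted(query)
    words[rng.randrange(len(words))] = rng.choice(candidates)
    return frozenset(words)
```

Mutation is what lets a profile issue queries that were never given by the user, which is one way an adaptive Keyword Search Module could avoid exhausting an index server of new relevant pages.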
The picture below presents the major events in the retrieval, filter and feedback cycle.
To allow the user to control the rate of adaptivity we parameterise certain aspects of the feedback. Genes that bring the user an excellent page are rewarded by getting a higher fitness value. Fit genes get to reproduce and are thus favoured by the genetic algorithm. Genes that recommend irrelevant pages to the user, however, get their fitness values decreased.
In the Keyword Search Module, the fitness values of genes are adjusted not only by feedback from the user, but also when a gene finds pages already rated by the user. By using a parameter to modify the feedback values when the gene finds already rated pages, we can control the keyword search adaptivity. If the parameter value is high, the gene is rewarded for finding already seen Web pages. If on the other hand the value is low, the gene is encouraged to find new Web pages.
By introducing aged fitness, a second parameter to control the adaptivity is added. Both fitness values are given the same feedback but the aged fitness is reduced over time. Fast ageing rewards genes that have found new information lately. Both fitness values are used when creating a new generation of genes. This will reward both overall performance and those genes that have performed best lately.
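The two sources of fitness adjustment can be sketched as below. The update rules are assumptions made for illustration: direct feedback is added as a fixed step, and a rediscovered page contributes its earlier rating scaled by a hypothetical rediscovery_weight parameter between 0 and 1.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    keywords: tuple
    fitness: float = 0.0


def apply_user_feedback(gene, rating, step=1.0):
    """rating is +1 for an excellent page and -1 for an irrelevant one."""
    gene.fitness += step * rating


def apply_rediscovery(gene, previous_rating, rediscovery_weight):
    """Adjust fitness when the gene finds a page the user has already rated.

    A high weight rewards genes for finding known good pages again;
    a weight near zero gives no credit, pushing genes toward new pages.
    """
    gene.fitness += rediscovery_weight * previous_rating
```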
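A minimal sketch of the two fitness values follows. It assumes exponential decay for ageing and a weighted sum for combining the values at selection time; both choices, and the names decay and aged_weight, are illustrative assumptions rather than the scheme used in PAWS.

```python
from dataclasses import dataclass


@dataclass
class Fitness:
    overall: float = 0.0
    aged: float = 0.0


def give_feedback(fitness, value):
    """Both fitness values receive the same feedback."""
    fitness.overall += value
    fitness.aged += value


def age(fitness, decay):
    """Reduce only the aged fitness; a smaller decay factor means faster ageing."""
    fitness.aged *= decay


def selection_score(fitness, aged_weight=0.5):
    """Combine overall and recent performance when breeding a new generation."""
    return (1.0 - aged_weight) * fitness.overall + aged_weight * fitness.aged
```

With a high aged_weight, a gene that was rewarded recently can outrank a gene with better overall fitness whose rewards are old.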
All parameters are local to the profile to enable the profiles to have different behaviour. By changing the parameters during the lifetime of the profile it is possible to change the behaviour of the profile. This can for example be used to freeze a profile that the user is satisfied with or to thaw a profile that is too specialised.
To limit resource use, the number of queries issued to the index servers must be minimized. To achieve this and still maintain a high level of adaptivity, a kind of time-sharing is implemented. By weighted random sampling of the gene pools we can control the probability of any given gene being selected for use. This is done based on two criteria: the fitness of the individual gene and the fitness of its profile. By adjusting these weights the user can control another aspect of the system's adaptivity.
PAWS as presented above is a personal library manager. By enabling PAWS to communicate with its peers it will no longer be restricted to managing a personal library. Subscribing to other users' public profiles, viewing the public part of other libraries, group annotations etc. are powerful building blocks for new information services.
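The weighted random sampling described above can be sketched with Python's random.choices. The way the two criteria are combined (a weighted sum with a small positive floor so that unfit genes are rarely, but not never, selected) is an assumption for illustration, as are the names pick_query, gene_weight and profile_weight.

```python
import random


def pick_query(profiles, gene_weight, profile_weight, rng, floor=0.01):
    """Pick one (profile, gene) pair to send to an index server.

    profiles maps a profile name to (profile_fitness, {gene: gene_fitness}).
    """
    candidates, weights = [], []
    for name, (profile_fitness, genes) in profiles.items():
        for gene, gene_fitness in genes.items():
            score = gene_weight * gene_fitness + profile_weight * profile_fitness
            candidates.append((name, gene))
            # random.choices needs non-negative weights with a positive total.
            weights.append(max(score, floor))
    return rng.choices(candidates, weights=weights, k=1)[0]
```

Raising profile_weight shifts the query budget toward genes of successful profiles, while raising gene_weight favours individually fit genes regardless of profile.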
Profiles can be either private, public or given access privileges on a user by user basis. By allowing other users to subscribe to profiles PAWS provides the WWW version of moderated newsgroups. If Tom has a great profile for SDH and is willing to share, others interested in SDH can benefit from his profile. If Sue subscribes to one of Tom's profiles, Tom's agent will send any reference that it presents to Tom to Sue's agent (or just those that Tom actually views and doesn't give negative feedback on).
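The subscription mechanism can be sketched as follows. The PawsAgent class, its access table and its inbox are hypothetical stand-ins; in PAWS itself the forwarding would be carried by KQML messages between agents rather than by direct method calls.

```python
class PawsAgent:
    def __init__(self, owner):
        self.owner = owner
        self.access = {}       # profile -> "public", "private", or a set of allowed owners
        self.subscribers = {}  # profile -> list of subscribed agents
        self.inbox = []        # references received: (publisher, profile, url)

    def may_subscribe(self, profile, user):
        rule = self.access.get(profile, "private")
        if rule == "public":
            return True
        if rule == "private":
            return False
        return user in rule  # user-by-user access privileges

    def subscribe(self, profile, subscriber):
        if not self.may_subscribe(profile, subscriber.owner):
            raise PermissionError(f"{subscriber.owner} may not subscribe to {profile}")
        self.subscribers.setdefault(profile, []).append(subscriber)

    def present(self, profile, url):
        """Forward a reference presented to the owner to every subscriber."""
        for agent in self.subscribers.get(profile, []):
            agent.inbox.append((self.owner, profile, url))
```

Forwarding only the references that the owner views without giving negative feedback would be a small variation: present would be called after the owner's feedback is known rather than at recommendation time.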
For shared annotations PAWS could co-exist with Stanford's ComMentor [11] system, but we propose a different sharing mechanism analogous to the sharing mechanism for profiles. Each user owns and administers their own annotation database. As with profiles, annotations can be private, public or given access to on a user-by-user basis. Annotations are tagged with the profile from which the annotated document comes (if any). These annotations could then be viewed, commented upon etc. by other users. This would enable queries of the form "Let me view all the Web pages regarding ATM that Sue recommends". A slight extension to the annotation mechanism would give us SOAPs (Seals of Approval).
It is possible to add aspects of social filtering by comparing subscriptions. If Sue and Tom share many subscriptions (i.e. both subscribe to the same profiles), Sue could be notified if Tom subscribes to a new profile.
A property of genetic algorithms is that the gene pool has to be large to make the adaptation efficient. As genes represent queries, not restricting PAWS with respect to the number of profiles and queries would make the load on both index servers and the net as a whole unacceptable.
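One way to decide whether two users share enough subscriptions is to compare their subscription sets with the Jaccard index. The measure and the threshold value are assumptions for illustration; no particular similarity measure is prescribed here.

```python
def overlap(subs_a, subs_b):
    """Jaccard index of two subscription sets: shared / total distinct."""
    union = subs_a | subs_b
    return len(subs_a & subs_b) / len(union) if union else 0.0


def should_notify(own_subs, peer_subs, new_profile, threshold=0.5):
    """Notify the user about a peer's new subscription if the two users
    already overlap enough and the user does not subscribe to it yet."""
    if new_profile in own_subs:
        return False
    return overlap(own_subs, peer_subs) >= threshold
```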
Possible solutions:
We propose a solution that will reduce the number of queries issued and consult the index servers less frequently (see the last paragraph of chapter 4). Restricting the size of parts of the library will reduce the number of queries. Furthermore, by setting a limit on the total number of queries issued each day, the user's browsing activity will influence the number of queries sent to the index servers. If a user is inactive, the library will soon be filled, reducing the number of queries. An active user will of course keep the profiles busy and thus hit the limit. Subscription to other users' profiles will also reduce the total number of queries sent for a population of users.
PAWS is tested inside our own network using our own index servers and will not use resources on the Internet until acceptable levels of resource use are achieved. PAWS will follow the robot exclusion protocol [12] and will thus not consult index servers that bar robots.
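A daily query limit can be sketched as a small counter that resets when the date changes. The sketch counts queries per index server, which is one possible reading of the limit; the QueryBudget class and its method names are hypothetical.

```python
import datetime


class QueryBudget:
    """Caps the number of queries sent to each index server per day."""

    def __init__(self, daily_limit):
        self.daily_limit = daily_limit
        self.day = None
        self.used = {}  # index server name -> queries issued today

    def try_spend(self, server, today):
        """Return True and count the query if the server still has budget."""
        if today != self.day:
            self.day, self.used = today, {}  # new day: reset all counters
        if self.used.get(server, 0) >= self.daily_limit:
            return False
        self.used[server] = self.used.get(server, 0) + 1
        return True
```

The Keyword Search Module would call try_spend before each query and simply skip the query when it returns False, so an inactive user with a full library never reaches the limit.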
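Checking the robot exclusion protocol before querying an index server can be illustrated with Python's standard urllib.robotparser module. The robots.txt content, the host name and the "PAWS" user agent string below are made up for the example, and the rules are parsed from an in-memory list so that no network access is needed.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt of an index server that bars robots from its query script.
robots_txt = [
    "User-agent: *",
    "Disallow: /cgi-bin/",
]

parser = RobotFileParser()
parser.parse(robots_txt)

query_allowed = parser.can_fetch("PAWS", "http://index.example/cgi-bin/query?q=atm")
page_allowed = parser.can_fetch("PAWS", "http://index.example/about.html")
```

An Index Access Agent following this convention would refrain from sending the query whenever can_fetch returns False for the query URL.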
By imposing set limits on the number of queries to any given index server each day and by following the robot exclusion protocol we hope that PAWS is accepted by both the net community and index server providers.
PAWS can easily replace a bookmark list. The ability to add and save interesting Web pages is maintained but the functionality is significantly enhanced since the agent will search for, recommend, and add similar information. Like newer versions of bookmark lists PAWS will also maintain the information so that outdated and broken links are removed, and updated information is highlighted. However, as mentioned in the chapter on agent-to-agent interaction, we see PAWS also as a base for building more specialized information services.
Some index servers finance their service by advertisements and would probably not be interested in agents that strip away the advertisements. Is it possible that these index servers would allow agents that include the advertisements in their search result? In such a case it could be worthwhile to extend the protocol between the Index Access Agents and PAWS to include advertisements and to extend the Presentation Module to display them. This problem is general for all sponsored information providers and has already been noted by E. Selberg and O. Etzioni in their paper on the MetaCrawler [13].
We have collected the parts of PAWS that interact with the Index Access Agents and present the results into a separate tool, similar to MetaCrawler, if simpler.
It would have simplified our design had we been able to use MetaCrawler as the base for the entire keyword-based search. That would however prevent access to local indexes, hide the agent's identity from the index server providers and swamp the MetaCrawler with queries.
We are experimenting with parameters to control how often the library, or a part of the library, is updated and also parameters to shut down the library, or parts of it. It may be wise to allow the user to reduce the update-interval when going away for longer periods of time. Parts of the library serving as data for monitoring a specific area could also be turned off, using only the verification and update facilities. PAWS would then act much like a smart bookmark list.
The size of the library is also open for editing: how many documents, or rather how many links to documents, the library should contain. This can be done for several reasons, for example to keep the memory/disk usage at a specified level or to keep the library at a manageable size.
Another area to look into would be the presentation of the library. Right now it is presented in HTML in a Web browser, but it should be possible to view the library graphically. Providing a graphical viewer [14] for the library would allow a personalised view, where it would be possible to choose the representation of the library as well as the representation of a Web page.
Work on agents and tools to provide assistance while browsing the Web and finding information on the Web has really taken off over the last year. We see search indexes based on different information gathering strategies, ranging from Web robots to information volunteered by users.
Webdoggie [15] and SILK [16] are examples of agents and tools for searching for and recommending potentially interesting information from the Web. Both of these resemble PAWS in some ways, SILK in searching the Web and Webdoggie in recommending information. However, neither of these maintains a library in the same way as PAWS.
WebWatcher [17] and Letizia [18] are agents that will assist you while browsing. Both of these agents will track and observe a user's behaviour and reactions as the user traverses the Web in search of information. Based on the collected data, these agents will explore and examine links ahead of the user and recommend the most promising ones. This is a different kind of tool from PAWS, which explores the Web not from the page being browsed but rather from collections of already indexed Web pages.
Softbot [19] uses a model of which resources are available on the network and how to access them. The agent will then use the appropriate resources on the network to perform a given task. ILA [20] is an agent that will automatically learn about new information sources on the network. ILA will also learn how to interact with and use these information sources.
ComMentor provides a platform for sharing structured in-place annotations. This allows for implementation of services like collaborative filtering, seals of approval, tours, etc. These annotations are stored in annotation sets on annotation servers and access to these sets is set up by a group administrator. PAWS differs in that a user's annotations are handled by the user's agent, thereby giving the user full control of access privileges.