Thoughts on an Information Utility

James H. Morris

Information Technology Center

Carnegie Mellon University

June 24, 1986

This essay motivates a proposal that Carnegie Mellon’s Information Technology Center develop a prototype information utility for universities and the industrial research and development community. What follows are some ideas for things to do and some arguments why such activities are important and appropriate.


Background

In the early part of this decade Carnegie Mellon embarked on an aggressive program to incorporate and exploit computing technology in its educational and research programs. A major part of this program is the Information Technology Center, a joint project with IBM aimed at building the technical underpinnings of a campus computing system. With a highly qualified and motivated technical staff numbering about twenty-five, the ITC has created the Andrew system, consisting of a network, a file system, and workstation applications, all integrated to support the creation of new educational software and communications services suitable for a university.1

Andrew has been deployed for use at Carnegie Mellon and distributed to the general university community for review. It is hoped that it will become a standard for future university computing systems.

Carnegie Mellon plans to use the Andrew system as its basic computing service beginning in 1987. At that point it would be counterproductive for the ITC, a research-oriented group, to continue upgrading the basic system. Therefore, it is important to choose new technical challenges that exploit the talents of the group and serve the goals of the university, the country, and the general computer industry. Creation of an information utility and its supporting technology is one such challenge. This activity wouldn’t represent the entire charter of the ITC, but it is the part that requires some concentrated, innovative thought.

What is an Information Utility?

Many things can be done with computer systems to provide people with better information and communications. The following list of services is just a sample of the kinds of things we could create. Each of these projects requires some software development, a small staff of professional “information managers” to administer it on a continuing basis, and a large amount of experimentation with a good user community. While the software requires significant talent to write, our ability to create it should not be in doubt. The general strategy is to try new facilities on the Carnegie Mellon community and export the successful ones via a national network.

The common goal of these new services is to create a useful source of information for thousands of people in the university and research and development community. A success measure for this effort is the answer to “What percentage of the information you receive each day comes from your computer?”

A good model for many of these services is the newspaper or magazine, and we should understand how they work. Their various departments (news, want ads, letters to the editor, market data) can have interesting, and much more powerful, electronic analogs. Furthermore, many of their visual techniques for presenting information have stood the test of time; we can emulate some of them with current display and printing technology. Finally, in terms of the foregoing success measure, print publishers can be both competitors and collaborators.

News and Information

To start, we should get access to every conceivable information source for users: AP wire, Dow Jones, Encyclopedia, Lockheed’s Dialog, CompuServe, The Source, OCLC, USENet News, ARPA news groups. However, this alone will not make a big change in the way people do business. Each of these information sources can be hard to use and their diversity is bewildering. Intelligent selection, distribution, and presentation of information are necessary; that’s what newspapers do. The same things need to be done for computer information services. Some person (or program) called an editor must make judgments about what will interest the readership of a particular publication. Given suitable search facilities and expertise, one can package news along with its background material. Thus a particular news story can lead to many previous stories, government reports, scientific articles, etc.

If one has the choice of reading a magazine or seeing its stories scrolling by on a twenty-four by eighty character screen, one will choose the magazine unless the information has great time value. A conventional computer screen is too difficult to read and does not permit browsing. Large bitmapped displays with good tools for managing the material can redress this balance. Efforts should be made to use all the tools at our disposal to make information easier to understand. High quality printing and fonts, graphics, color, video, sound, etc. are important components of this.

Electronic Forums

The use of electronic mail, bulletin boards, and forums is a realization of Marshall McLuhan’s “Global Village” notion beyond what he might have expected. These systems offer truly open and interactive communication for large communities. From a university’s viewpoint they permit an incredibly enhanced version of the casual dormitory discussion. The discussion can be entirely random; people are free to wander in and out, contributing as they wish. What is new is that it can involve hundreds of speakers and listeners and go on for many days!

One can easily imagine that a high proportion of “classroom” discussion will be carried out through electronic mail. The teacher and students will belong to a message group that exchanges messages on chosen topics throughout the semester. In many ways this is a more reasonable medium for thoughtful discussions than face-to-face group meetings. People have more time to think about what they are saying; indeed, they are forced to when they must type it. The teacher who wishes to base grading on classroom discussion will have a written record of who actually said what.

There are a number of technical and organizational improvements to make in the current systems, which have grown in an ad hoc fashion. Moderators and digest creators are essential to reduce the chaos and random quality of the information. Methods of controlling access for purposes of authentication, confidentiality, and revenue production are needed. Ways of easily creating new forums are needed.

Want ads, Brokering, and Advertising

Electronic bulletin boards are filled with want ads of various kinds, sometimes to the consternation of their sponsors. Want ads on bulletin boards are attractive because they are immediate and widespread, but terrible in that they clutter up everyone’s general bulletin boards and often don’t reach the right people. We need a special purpose system that will perform the matching functions between buyers and sellers. This can begin very simply as a structured database and become increasingly sophisticated by allowing negotiation, bidding, and settlements.
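As an illustration of how simply such a matching service could begin, here is a minimal sketch in Python. The record layout (kind, item, price, contact) and the function name match_ads are assumptions made for this example, not part of any existing system; negotiation, bidding, and settlement are left out entirely.

```python
from dataclasses import dataclass

@dataclass
class Ad:
    kind: str      # "sell" or "buy"
    item: str      # what is offered or wanted
    price: float   # asking price for "sell", maximum offer for "buy"
    contact: str   # who placed the ad

def match_ads(ads):
    """Pair each buyer with every seller of the same item whose asking
    price does not exceed the buyer's maximum offer."""
    sellers = [a for a in ads if a.kind == "sell"]
    buyers = [a for a in ads if a.kind == "buy"]
    matches = []
    for b in buyers:
        for s in sellers:
            if s.item.lower() == b.item.lower() and s.price <= b.price:
                matches.append((b.contact, s.contact, s.item, s.price))
    # Cheapest offers first.
    return sorted(matches, key=lambda m: m[3])

# Hypothetical ads, for illustration only.
ads = [
    Ad("sell", "bicycle", 80.0, "alice"),
    Ad("sell", "bicycle", 120.0, "bob"),
    Ad("buy", "Bicycle", 100.0, "carol"),
    Ad("buy", "desk", 40.0, "dave"),
]
matches = match_ads(ads)
```

Because the ads are structured records rather than free text, only the interested parties need to be notified, and the general bulletin boards stay uncluttered.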

Although it seems alien to computer systems, general advertising should not be ignored as a way to get information services paid for. If the advertisers can be guaranteed that their message gets carried along with all copies of information or software they sponsor, the nettlesome copy protection problem evaporates because they will want everyone to make copies of the material. The most obvious commodities to sell over a computer network are ones that can be delivered via that network: information and software. Methods of displaying samples, receiving payment, and preventing retransmission or copying are needed.

Surveys and Group Decision-Making

Unlike broadcast media, computer information services can be used to gather information as well as dispense it. Surveys and polls are easier to administer. Masuda2 has written extensively about the possibilities for more representative governmental decision-making. Consider a department that has a meeting every semester to discuss the progress of graduate students. This might involve getting about thirty faculty members in a room to summarize the progress of 120 students. This can be unwieldy, and the obvious way to save time is to delegate decision-making power. However, it is well known that standards and policies will become inconsistent and probably lower if that is done. Why isn’t some sort of central database used to carry out part of such deliberative efforts?

The creation, refereeing, and editing of scientific journals is an appropriate target of opportunity. To some degree it has already started informally. The EXPRES effort to electrify proposal processing is an example of an ambitious effort in this area.3

International Electronic Dead Letter Office

A major problem these days is discovering someone’s electronic mail address. Someone should take global responsibility for getting electronic mail through. Many universities have their local mail troubleshooters (usually a system programmer with another job) who can sort out one’s mail problems. SRI’s Network Information Center has played this role for the ARPANet. One approach to this problem is to get everyone to agree on procedures and rules; that might take a long time. A better approach is to create a central information service that will try to deliver anything sent to it, no matter how poorly addressed, or that alternatively will serve as an electronic information operator. This service should be staffed by people supported by a programmed expert system.
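One small piece of such a service, suggesting likely recipients for a poorly addressed message, might be sketched as follows. This is only an illustration under stated assumptions: the directory contents, the resolve function, and the similarity cutoff are hypothetical, and approximate string matching stands in for the much richer expert system the proposal envisions. Ambiguous results would still go to a human operator.

```python
import difflib

# Hypothetical directory of known mailboxes; a real service would
# draw on many sources and far more entries.
DIRECTORY = {
    "james.morris@example.edu": "James Morris",
    "jane.smith@example.edu": "Jane Smith",
    "john.smythe@example.org": "John Smythe",
}

def resolve(address, cutoff=0.8):
    """Return known addresses resembling a possibly misspelled one,
    best match first. An empty list means a person must take over."""
    address = address.lower()
    if address in DIRECTORY:
        return [address]
    return difflib.get_close_matches(address, list(DIRECTORY), n=3, cutoff=cutoff)

candidates = resolve("james.moris@example.edu")
```

The first entry in candidates is the closest known address; the cutoff controls how much misspelling the service tolerates before giving up.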

Problem Solving Courses

Occasionally one hears of students who succeed very well because they know how to search the literature. They take every problem to the library and search until they find a good solution. Sometimes they must do some invention, but they always begin with what they can find from the past. Some may feel that this is cheating, but there is no doubt that they are going to be effective practicing professionals if they have access to their information base. Computer designers live with their chip catalogues. Similarly, many of the most effective programmers are the ones who make the greatest use of existing libraries of programs. Operating in this way actually requires more experience and talent than traditional programming. Someone needs to explicitly teach the skills one needs to solve problems with the aid of retrieved information. Some of those skills are the librarian’s. Law and medicine also use information services extensively. Some problem domains suitable for “solution by lookup” need to be identified along with experts who do it.

Democratically created, intelligently edited databases will greatly amplify a venerable, if dubious, campus tradition: the fraternity files. If, year after year, one captures all the written communication associated with a course (assignments, tests, papers, etc.), the potential exists to change the nature of how people learn the body of material. It will no longer be just the nominal teacher of the course who does the teaching; it will be generations of unseen students transmitting their solutions to problems and views on issues. This potential can be fully realized, however, only if this data is suitably culled for what is truly worth studying.

Research and Development Problems

There are some hard problems to be solved in support of the foregoing facilities. They require high-powered academic research. Some are technical problems requiring innovation, others require extensive development and experimentation, yet others will require complete rethinking of the problem.

Heterogeneous Documents

While the fancy document editor is a commercial reality, it has not approached its potential as a communications tool. We must solve several crucial problems:

· Accommodating heterogeneous visual forms—drawings, equations, tables, scanned images, animations, etc.—in an open-ended way because the foregoing “etc.” will never be filled in.

· Presenting a document on heterogeneous media. The differing capabilities of paper, graphics displays, and character displays must all be dealt with. Secondarily, there are significant differences in the computer and software environments (operating systems, document compilers, etc.) from which documents emerge.

· Exchanging editable documents. While reasonable standards for static documents are emerging (e.g. PostScript), there is nothing beyond ASCII for exchanging documents that can be edited or excerpted.

· Incorporating sound and video into editable documents.

Encryption

How can encryption be used on a regular basis to provide security and authentication? We need to find hardware structures to make it fast and need to experiment on how it fits into the information system so that it can be used effectively. A university campus is a good place to carry out applied research on security techniques since breakdowns in the system, while embarrassing, are not crippling, and needn’t be covered up.

Copy Protection

Software vendors and buyers are continually bothered by this problem. The existence of networks makes it even harder. Can one use a network to provide both convenience for the user and accountability for the information purveyor?

Tamperproof Software

Here is a problem that is impossible to solve by the usual criteria: Create a program that runs on an unprotected, hostile workstation and talks to your machine in such a way that you are confident that you are talking to it, the original program—not a masquerader. For example, the workstation component of a distributed file system might be such a program. A related problem is to create a program that spits out occasional advertisements on users’ screens but resists their attempts to excise them without altering the function of the program. A mathematical way of posing the problem: Devise a method of obfuscating a program so that n hours of obfuscation effort would force an opponent to spend 2n hours of clarifying effort to understand the program.

Transmission Technology

The installed information transmission facilities of the world are going to improve much more slowly than the cost of computing power and memory. How can one get perceived megabit rates out of the current telephone system? Hints: use huge amounts of processing and memory, transmit only changes from previous messages.
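The second hint, transmitting only changes from previous messages, can be sketched in a few lines of Python. This is an illustrative sketch, not a proposed protocol: the function names make_delta and apply_delta and the operation format are assumptions for this example, and the sample messages are invented. The sender spends processing time comparing the new message with the one the receiver already holds, and only the differing text crosses the line.

```python
import difflib

def make_delta(old, new):
    """Describe `new` as copy/insert operations against `old`,
    a message the receiver is assumed to hold already."""
    ops = []
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))        # receiver reuses old[i1:i2]
        elif tag in ("replace", "insert"):
            ops.append(("insert", new[j1:j2]))  # only this text is transmitted
        # "delete" needs no operation: the receiver skips old[i1:i2]
    return ops

def apply_delta(old, ops):
    """Rebuild the new message on the receiving side."""
    parts = []
    for op in ops:
        if op[0] == "copy":
            parts.append(old[op[1]:op[2]])
        else:
            parts.append(op[1])
    return "".join(parts)

# Invented messages, for illustration only.
old = "Meeting moved to room 4623 at 3 pm on Tuesday."
new = "Meeting moved to room 5409 at 4 pm on Tuesday."
ops = make_delta(old, new)
rebuilt = apply_delta(old, ops)
sent = sum(len(op[1]) for op in ops if op[0] == "insert")
```

Here sent counts the characters of new text actually transmitted; the copy operations carry only a pair of positions each. The trade is exactly the one the hints suggest: more computation and memory at both ends in exchange for fewer bits on the wire.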

Databases & Knowledge Bases

Simplistically, a knowledge base is what you get if you add an expert system to a database. It isn’t good enough to just accumulate information and index it. We need systems that turn it into useful information. Examples: provide the dead letter office described above, automate the Center for Disease Control so that doctors and pharmacists don’t have to read bulletins, create a liver-transplant brokering system that is medically competent and has explicit, auditable policies.

Intelligent Retrieval Systems

As mentioned above, there is bewildering diversity in the information sources available today. One needs an expert to use what exists effectively. A few companies attempt to package retrieval services, but the systems are very crude. Unlike a knowledge base, an intelligent retrieval system does not focus on a single subject, but attempts to be an expert librarian, covering the breadth of many subjects.

Information Economics

If the system is changing as much as the futurists claim, we will need a whole new theory to explain how things of value are exchanged. Another aspect of this problem can be stated more prosaically: how does one make money in the pure software business? Hint: Current economics are based almost entirely on non-copyable things, including money. The time value of information is obviously key.

Why is this work important?

Progress in the twenty-first century is going to be dominated by information and its use. People will still eat food and drive cars, but the most significant changes in society will center on how they obtain and use information. Just as competitiveness in agriculture came to depend on using industrial goods, competitiveness in industry will depend upon effective information usage. This thesis has been put forward many times by many people.2,4,5,6 They have done some fascinating speculation about the ways in which business and society might change.

As technologists we can certainly see some interesting trends today:

· The speed and volume of information are increasing drastically. Technologies such as lasers, fiber optics, and satellites are being exploited to produce an information flood.

· The variety and quality of media are increasing. Newspapers are using color, compact discs are replacing records and tapes, TV is being broadcast with stereo sound, digital TV is coming, etc.

· Communication is becoming more personalized. The Xerox copier gave everyone the ability to become their own publisher. Electronic mail gives computer users the ability to create their own wire services.

Although the computer appeared as new technology at the same time as television, it has had much less direct impact on society. With the proliferation of personal computers, it is possible for computers to play a more explicit role, and we are in a position to shape that role.

Importance to the Nation

Our economic growth is increasingly dependent on information processing; over half the GNP involves information processing. Information processing is applicable to virtually every field and is unique in that it can be used to facilitate its own development. There can be a significant technological multiplier effect on investments in it.

A better communications infrastructure is needed. No one doubts that computers can facilitate productivity, but their ability to do so in isolation is limited. In business, it is obvious that linking one’s computer to the company databases is an important first step. However, any study of communication patterns in business will reveal that most communication is not through formal databases, but through informal channels with colleagues, customers, and suppliers. For example, sophisticated production techniques like “just in time” production control require one to have strong communication links with suppliers. To be useful, computers will have to capture a large share of this informal, cross-organizational communication. We need to think in terms of nationwide, heterogeneous communication systems that accommodate diversity in information representation and transmission.

The nation’s research and development community can use a service to enhance communication of important results and activities. While the marketplace might create such services for the commercial world, it won’t for specialized communities. The ARPANet has been an indispensable tool for the computer science community; indeed, it helped create it. Its expansion to other research communities, through networks such as NSFNet, has begun. Any information utility must begin with such a network as its base. A major goal is to increase the pace of technical communication in every scientific subfield.

Unlike conventional research and development, the work to build an information infrastructure will benefit this country directly and will be less exploitable by foreign competitors. A system built into the existing telecommunications and computing culture of the academic and scientific community cannot be easily translated to another place.

Importance to the Computer Industry

Like the early automobile industry, the personal computer industry cannot grow significantly without infrastructure. When automobiles appeared eighty years ago they were like personal computers today: one bragged about buying one but was not sure what to use it for. Before cars became really useful and finally a necessity, a great deal of infrastructure had to be put in place. The most obvious pieces of infrastructure were the road system and the oil industry. As these basic components grew, many items of soft infrastructure came about: the tourist industry, the supermarket, the suburbs, national restaurant chains. These later changes supported a major evolution in the way people live, making cars a necessity. The electrical industry is an even more obvious example of the need for infrastructure preceding the widespread use of appliances. The current market for personal computers is based upon software that enhances the productivity of individuals in isolation, e.g. word-processing, data management, and spreadsheets. There will be more tools like this invented, but their potential to sell computers will not be dramatic.

The market may also expand by making machines easier to use, but this is a slow process. The price will come down, but who wants to give computers away? Among the potential incentives for people to buy computers, better access to important information has yet to make its impact, which may be very large. When people can use computers to find large amounts of information relevant to their interests, a true mass market for computers will develop. Computers can consume information in great quantities. If one must type in all the information that a computer consumes, one won’t need a very powerful computer. One small computer can serve a family of four quite nicely while they seem to need two cars, two television sets, several radios, and tape devices.

Appropriateness of Time and Place

Granting the basic correctness of the thesis that information services are important, it is fair to ask some hard questions about making a major push towards that future here and now: Is it too early? What has happened lately to change conditions? Is a university like Carnegie Mellon the right place to pursue it?

In 1980 no IBM PCs existed; now there are over five million. Approximately six million personal computers of all types were sold in 1985. That is a big change. Nobody knows the absolute number of computers one must have in place for an information revolution to occur, but it is plausible that the recent increase put us there. Furthermore, unless something surprising occurs, there is unlikely to be another such dramatic acceleration in computer ownership soon. A recent comment in the trade press: “There is one microcomputer now for every five white-collar workers. There will be two for every five in five years, but the real impact of the penetration is over.”7 In other words, there are now unexploited opportunities associated with the new, widespread ownership of computers, and there is no point in waiting for another major change.

Although teletext businesses directed at localized, general consumer markets seem to fail, national information services aimed at professionals appear to flourish: Dow Jones, Dialog, Mead Data Central, McGraw Hill, CompuServe, The Source, and Lotus’s market data, not to mention burgeoning efforts abroad such as the French phone company’s Minitel. Besides leveraging off the installed base of personal computers, these services take advantage of communications services such as Telnet. National bulletin boards are flourishing. The ARPA and UNIX network communities deal with a flood of messages. One finds the computer trade press reporting things heard on CompuServe forums. A recent news story tells how someone used the Source electronic forum to collect material for a book and to enlist the help of hundreds of contributors and reviewers.

As creators and disseminators of knowledge, universities will play a large role in an information society6; it is natural to build prototypes in this environment. Although computer-aided instruction is obviously in a university’s bailiwick, information services can be just as crucial to the educational process. Educational pundit Jacques Barzun once asserted that the most important buildings on a campus were the dormitories, followed by the library, and the lecture halls. This followed from his claims that a student’s peers are the most important teachers and that libraries are a richer source of knowledge for the motivated than the faculty. Whatever the merits of these claims, giving students, faculty, and staff a good computer-mediated communication system and access to the electronic libraries of the future will have a profound effect on their education, construed broadly.

One of the major problems facing any new communications system is finding a critical mass of subscribers with common interests and contexts. Carnegie Mellon is such a community and has a unique base, the Andrew System1, for its communication system. The network, the file system, the mail and bulletin board systems, the workstations, and the printers constitute a state-of-the-art information system that rivals what commercial enterprises aspire to. As the community commits to using Andrew we are likely to witness new phenomena because of the size of the community and the power of the tools. We are likely to see the shape of new problems here before they appear elsewhere and will have to develop solutions quickly.

Conclusions

The time and the place are ripe for a concentrated effort to construct an information utility. Only by trying some directed experiments will we begin to understand the meaning of this vague term. There is real synergy between the activities of creating new information services and computer science research. Carnegie Mellon has a unique opportunity to make a contribution in this area.

References

1. Morris, James H., et al., “Andrew: A Distributed Personal Computing Environment”, Communications of the ACM, March 1986.

2. Masuda, Yoneji, The Information Society as Post-industrial Society, World Future Society, Bethesda, Maryland, 1980.

3. National Science Foundation, “EXPRES Project Solicitation for Research Groups”, May 1986.

4. Toffler, Alvin, The Third Wave, New York, Bantam Books, 1981.

5. Kahn, Robert, personal communication.

6. Bell, Daniel, The Coming of Post-industrial Society; a venture in social forecasting, New York, Basic Books, 1973.

7. Gantz, John, “Andromeda Theory Fits PC Industry Quite Well”, InfoWorld, June 2, 1986.