A few years ago I wrote an article about the desire to bring talkers out of their strictly console-based world. I’m reproducing it here so it has a permanent home.
Despite the rapidly rising popularity of instant messaging on the Internet, talkers have maintained a loyal following due to the unrivaled sense of presence and community they offer. However, their implementations have remained largely unchanged since their inception, and they have failed to take advantage of the past decade of developments in Internet technologies. This article presents a case for the collaborative development of standard talker protocols.
The state of talker development
Browsing the source code of any current popular talker implementation will reveal signs of a long heritage of modification upon modification. The most popular talkers have long departed from the stock implementations they began with, each adding a rich diversity of new features. More recent talkers such as Amnuts and PG+ are derived from talker code written in 1992. In software development terms, this is a long time. To put it into historical perspective, when Talkserv and Elsewhere were first released, Microsoft had just released Windows 3.1. These talkers were conceptually based on MUDs implemented as far back as 1978.
Talkers are based upon simple client/server architecture. While it is often claimed that they are TELNET servers, in fact most do not adhere to the TELNET protocol, and TELNET clients are just used in their capacity as terminal emulators. Both talker clients and talker servers have limited terminal functionality, and because of this they are limited to line-based input processing. This results in a non-intuitive and often off-putting user interface. For example, most talkers offer the facility to send messages similar to e-mails between users, but there is very little message-editing functionality. While it would be possible to provide a curses-based interface for such operations, the added complexity of doing so has prevented its adoption.
Furthermore, the look and feel of the user interface is defined in the code of a talker server. When establishing a new talker, a sysop first chooses a talker base code to start from. This is usually based on personal preference for style, with the biggest decision being EWToo-style versus NUTS-style. This decision will usually have a large impact on which users will use the talker. The sysop must then customise the talker to give it unique characteristics. Some of these customisations simply involve changing text files supplied in the talker distribution package, but most involve changing or adding to the talker’s source code. This means the sysop must be a programmer, or at least have available a programmer who is willing to donate his time. This is accepted practice, but it’s easy to see how absurd it is by imagining having to recompile your web server in order to update your web site! The customisations come in two forms: modifications or additions to the behaviour of the talker, and modifications to the appearance of the talker. The latter, while relatively easy, is tedious and error prone because each talker’s output is intermingled with its control logic, meaning that many disparate functions must be modified in order to create a new unified visual appearance.
Because of the ad-hoc nature of the additions and the lack of separation between logic and presentation, changes are rarely returned to the stock implementation they derived from, and features are often reimplemented afresh in other talkers. This leads to problems later; when the original code base is updated with important fixes the author of the derivative must then decide whether to attempt to isolate and integrate those fixes, or to abandon his code and begin again with the new code base. As a large proportion of the modifications are customisations that provide the derived talker with its uniqueness, this can be a difficult decision to take. It also means that additional code is not reviewed, increasing the risk of introducing security problems.
The problems
The existing problems identified above can be summarised as follows:
- Non-intuitive UI. Because the talker emulates a text terminal, the server—not the clients—defines the UI, and this UI is restrictive.
- Output interleaved with logic. A developer must change many disparate functions to obtain a new unified visual appearance, even though those functions may contain logic which is unchanged. This is tedious, leads to errors, and prevents code merging.
- EWToo versus NUTS. Choosing one style of talker over another restricts the userbase. Others, such as Nilex-style talkers, have even more limited appeal.
- A sysop must be, or must have, a programmer. Very little of the talker can be changed without changing its source code and recompiling it.
- Ad-hoc design. Talker code has evolved over many years, under different developers, with no common design goals.
- Fork and forget. When a new talker is developed the base code is forked and rarely merged. Fixes in the base code are difficult to isolate and integrate.
- No code review. Single developers develop most talker code. This provides little opportunity for them to receive feedback about code quality.
- Features are reimplemented. Because no widely used talker supports the notion of plug-in components, it is difficult for developers to release packaged features.
User agents
When MUDs and talkers were first developed Internet access was uncommon, and mostly limited to academic users with Unix accounts. These users were quite used to text-based interfaces driven by abbreviated command names, but most of today’s talker users are more familiar with GUIs, multimedia, the World Wide Web and instant messaging software. It’s therefore unsurprising that there are a number of MUD and talker clients, such as Pueblo and Z-MUD, that offer enhancements over basic terminal emulation. Talkers can use Pueblo’s protocol, while MUDs have even more extensions such as MUD Sound Protocol, MUD eXtension Protocol and MUD Client Protocol. These protocols differ in their design, but they all have a common goal: to allow the client to provide a richer user experience while maintaining compatibility with non-enhanced clients.
To help illustrate the kind of experience an enhanced user agent might provide, the following scenarios suggest some likely interactions.
Alice is idly chatting on a talker. She sees a message telling her that Bob is requesting a game of Connect 4 with her. She clicks on the message and a small window containing a playing board opens up. She hears the familiar sound as Bob places his first piece, then clicks on another column to make her own move. She then returns to chatting while Bob ponders his next move.
Charlie logs into a talker for the first time in a couple of weeks. He clicks the who’s online? button on the toolbar and looks at the list of users. He doesn’t recognise Dan, so he clicks on Dan’s profile icon. He sees that Dan is a new user who joined today, so he clicks on his name to open a private chat with him, and welcomes him to the talker.
Emma is chatting in the main room, but is getting annoyed with Fred. She clicks on his name, and chooses ignore from the menu. She no longer sees anything Fred says.
As Gini logs into her usual talker, a message pops up telling her there are new news items. She chooses to read them now, so a message reading window opens with two messages in it. She reads the messages and deletes them. While she has the message reading window open, she looks back at a couple of old talker mail messages and decides to reply to one of them, before closing the window and returning to chatting.
Hayley is chatting in the main room, but also having a private conversation with Ian. It’s busy in the main room, and she keeps missing messages from Ian, so she opens up a conversation with Ian window. Her messages from Ian now appear in there instead of in the main window, so she can keep track of both conversations.
These are the sort of interactions that users are familiar with from using other GUI applications. UI design is complex and, to some extent, subjective, so no restrictions on how such a client should behave are given here. Instead, it is anticipated several clients would be developed independently, catering for people with differing tastes.
A client/server protocol
If, instead of the current terminal emulation approach, talkers and their clients communicated using a domain-specific protocol, a number of possibilities would open up. Most importantly, it would allow for a radically different kind of user agent that would be able to present information in a much clearer way.
It would also allow other software to communicate with the talker. Bots are a common example of software agents that need to do this. Most bots currently parse the human-readable output from a talker and respond with the same commands a user would use. This is not a foolproof strategy, and depends on the talker’s output for a given event not changing as the talker is developed.
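The fragility of screen-scraping can be shown with a small sketch. Both the "Bob says: hello" display format and the JSON event shape below are assumptions made purely for illustration; neither is taken from any real talker.

```python
import json
import re

# Hypothetical display format; every real talker words its output differently.
SAY_PATTERN = re.compile(r"^(?P<user>\w+) says: (?P<text>.*)$")


def parse_screen_line(line):
    """Scrape a 'say' event from display text; None if the wording is unfamiliar."""
    match = SAY_PATTERN.match(line)
    if match is None:
        return None
    return {"type": "say", "user": match["user"], "text": match["text"]}


def parse_structured_event(payload):
    """Decode a machine-readable event (JSON is an assumed encoding)."""
    return json.loads(payload)


scraped = parse_screen_line("Bob says: hello")
# The same event after a purely cosmetic change to the talker's output.
restyled = parse_screen_line("[Bob] hello")
structured = parse_structured_event('{"type": "say", "user": "Bob", "text": "hello"}')
```

The scraper recovers the event only while the wording stays the same; after the cosmetic change it returns None, whereas the structured event is unaffected by how a talker chooses to display it.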
Another recent trend is that of embedding other services into the talker. Examples include the HTTP and SMTP servers in Anthony Biacco’s Ncohafmuta talker code. These are only partial implementations of the respective protocols, and are likely to introduce bugs that, if exploited, may crash the entire talker process. A talker protocol would allow for software agents acting as gateways to web servers and mail servers respectively.
Finally, a standardised protocol would allow for talker-to-talker links between talkers using different implementations, so long as each talker adhered to a common standard.
The diagram here shows the structure of the client, gateway and server tiers. Because of the vastly increased flexibility such a protocol would bring, it would be the cornerstone of new talker developments, and careful design would be vital. Any protocol to be considered would need to satisfy certain requirements:
- The session layer must support the transfer of arbitrary data, including binary types. This does not preclude the use of textual data such as XML documents at the presentation layer.
- It must provide an inline mechanism for negotiating encrypted connections so that such connections would not require a separate port.
- It must support authentication methods including, but not limited to, plain passwords and asymmetric keys.
- It must provide a facility for bi-directional delivery of asynchronous events, and for bi-directional request/response pairs. (It would be acceptable for the latter to be implemented in terms of the former.)
In addition, the application layer would need to not only support the features found in a large subset of current talkers, but also be extensible enough to support future features.
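As a rough sketch of the last requirement in the list, request/response pairs can be layered on top of asynchronous events by tagging a request with an identifier and having the reply refer back to it. The `Endpoint` class, the frame fields (`name`, `body`, `id`, `reply_to`) and the use of JSON are all assumptions chosen for brevity, not a proposed wire format.

```python
import itertools
import json


class Endpoint:
    """One side of a connection; `send` is any callable that delivers bytes to the peer."""

    def __init__(self, send):
        self._send = send
        self._ids = itertools.count(1)
        self._pending = {}   # request id -> callback waiting for the reply
        self._handlers = {}  # event name -> function that handles it

    def on(self, name, handler):
        self._handlers[name] = handler

    def emit(self, name, body, request_id=None, reply_to=None):
        """Send an asynchronous event, optionally marked as a request or a reply."""
        frame = {"name": name, "body": body}
        if request_id is not None:
            frame["id"] = request_id
        if reply_to is not None:
            frame["reply_to"] = reply_to
        self._send(json.dumps(frame).encode("utf-8"))

    def request(self, name, body, callback):
        """A request is just an event carrying an id; the reply arrives as another event."""
        request_id = next(self._ids)
        self._pending[request_id] = callback
        self.emit(name, body, request_id=request_id)

    def receive(self, data):
        frame = json.loads(data.decode("utf-8"))
        if "reply_to" in frame:
            callback = self._pending.pop(frame["reply_to"], None)
            if callback is not None:
                callback(frame["body"])
            return
        handler = self._handlers.get(frame["name"])
        if handler is None:
            return
        result = handler(frame["body"])
        if "id" in frame:
            self.emit(frame["name"], result, reply_to=frame["id"])


# Wire two endpoints directly together in place of a network connection.
client = Endpoint(lambda data: server.receive(data))
server = Endpoint(lambda data: client.receive(data))

server.on("who", lambda body: {"users": ["Alice", "Bob"]})
replies = []
client.request("who", {}, replies.append)

events = []
client.on("say", events.append)
server.emit("say", {"user": "Bob", "text": "hi"})
```

Either side can call `emit` or `request`, so both kinds of traffic are bi-directional. A real design would also need framing, error replies and timeouts, none of which are shown here.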
Cryptography
A small number of talker developers have expressed a desire to enable end-to-end encryption between their talker and its clients. This relatively straightforward application of cryptography could be implemented without too much difficulty on the server side, using a free Transport Layer Security implementation, and similarly on the client side if clients such as those described above were used.
Authentication
However, once cryptography has been introduced, it opens up a number of interesting possibilities. The first of these is asymmetric key authentication. Asymmetric key algorithms use a pair of keys: one public, and one private. The two are mathematically related, but to derive one from the other is considered computationally infeasible. Such algorithms are now widespread, and used extensively in protocols such as PGP, SSL/TLS and SSH. This authentication scheme has significant security advantages, because the server need only ever know a user’s public key. This public key can be used on every talker the user connects to, and as long as the corresponding private key is never revealed no security is compromised. Typically, private keys themselves are encrypted using a passphrase. It is this passphrase that a user would type when connecting to a talker, and the passphrase never leaves the user’s computer. In fact, a client could be designed so that once the user enters their passphrase to connect to one talker, they don’t need to enter it again until they restart the client, no matter how many talkers they connect to. This is functionality similar to that provided by SSH agents such as ssh-agent and Pageant.
Trust networks
A problem that talker sysops face regularly is that of user identity. If a sysop wishes to ban a malevolent user, there are no real ways to ensure he stays gone. The user may reconnect at any time using a different name and from a different address, or from an address the sysop knows is used by many users (such as that of a shell server). Because of the high value the Internet places on users’ right to anonymity there is unlikely to be a complete solution to this problem, but it can be approached from a different angle. Instead of trying to track users as they change their identity, we can persuade them to use only a single identity.
Suppose that Alice is a user who wants to use a particular talker called Foo Hills. She uses several other talkers, and she knows Bob, who is already a user of Foo Hills. Bob has used the talker for several months, and the talker’s sysop has indicated his trust in Bob by using the talker’s private key to sign Bob’s public key. Bob knows Alice is also trustworthy, so he similarly indicates this by using his own private key to sign her public key. Now Alice becomes a user of the talker and a chain of trust exists from the sysop, to Bob, to Alice.
Now suppose that Carl wants to join the talker. He has a public key that has been signed by the private key of a small, little-known talker called The Bar. However, no trust relationship exists between Foo Hills and The Bar, so Carl is considered an untrusted user. Depending on the policy chosen by the Foo Hills sysop, he may be denied access, or be allowed to connect as an untrusted user. This concept of trusted and untrusted users could form the basis of what many talkers refer to as citizenship.
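The Foo Hills example can be modelled as a search over signatures. The sketch below assumes that each signature has already been verified cryptographically and is represented simply as a (signer, signed key) pair; the key names and the `max_depth` policy limit are hypothetical.

```python
from collections import deque


def is_trusted(root, subject, signatures, max_depth=3):
    """Return True if a chain of at most `max_depth` signatures links root to subject."""
    signed_by = {}
    for signer, signed in signatures:
        signed_by.setdefault(signer, set()).add(signed)

    queue = deque([(root, 0)])
    seen = {root}
    while queue:
        key, depth = queue.popleft()
        if key == subject:
            return True
        if depth == max_depth:
            continue  # do not follow chains longer than the policy allows
        for next_key in signed_by.get(key, ()):
            if next_key not in seen:
                seen.add(next_key)
                queue.append((next_key, depth + 1))
    return False


# Signatures from the scenario: (signer, key that was signed).
signatures = {
    ("foo-hills", "bob"),
    ("bob", "alice"),
    ("the-bar", "carl"),
}

alice_trusted = is_trusted("foo-hills", "alice", signatures)
carl_trusted = is_trusted("foo-hills", "carl", signatures)
```

Alice is reachable from the Foo Hills key through Bob, while Carl is not, because nothing links Foo Hills to The Bar. Limiting the chain length is one way a sysop's policy could be expressed, since a long chain arguably implies weaker trust.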
Portable objects
Trust networks require that the talker have its own private key, which can be used to sign users’ public keys. One interesting possibility that arises from this is the ability to export signed data from the talker, such that anything else with access to that talker’s public key can assert two things about that data: that it was indeed exported from that talker, and that it hasn’t been modified since it was exported from that talker. Objects (items that users carry, wear, use etc) could be exported in this fashion, and then used in another talker, providing that the importing talker understood the nature of the objects, and had a trust relationship with the exporting talker. The same is true of the ‘currency’ used on talkers, which suggests that ideas regarding simple economics could be explored. Curiously, there have been several instances of items from MUDs being auctioned off on eBay (for real money). This does suggest that such a feature might have some appeal.
Directories
The information stored about a user on a talker can be divided into three categories: transient state, local profile, and user information.
Transient state is implementation-specific data regarding the user’s session. This data is discarded when the user disconnects. Local profile includes the user’s description, how much currency they have, and which room they connect in. This information is saved when the user disconnects.
Most users use more than one talker. Some users use many talkers, often using the same identity on each. There are many pieces of information associated with users that they must enter manually into each talker. These include name, e-mail address, sex, age or date of birth, IM handles and homepage URL. It would make sense for this information to be kept in one location. An LDAP directory would be one possible solution, even though it leaves a number of details to be considered, such as who would keep the directory online, and what would happen in the event of a failure.
Internationalisation
Talkers don’t currently attempt to deal with internationalisation (i18n) issues. This is understandable; it’s a complex issue. For example, should the sequence of bytes EF BB BF E4 BD A0 E5 A5 BD look like “ï»¿ä½ å¥½” or “你好”? It’s clear to us which one is correct, but not to either the talker or the clients, because the answer depends on the character set being used. Talkers make little attempt to interpret input characters other than the few they directly act upon, instead passing them straight to the clients. If the two conversing parties are using non-ASCII characters (i.e. those with code points above 127) but are using the same character set, then this isn’t a problem. However, if they’re using incompatible character sets then the non-ASCII characters will be displayed wrongly. The TELNET protocol allows the discovery of a client’s character set using a sub-option, but this isn’t currently used. Even if it were, the talker would have to perform complex conversions between character sets. A new talker architecture could overcome this problem by storing and transferring all text in Unicode, a character coding system that assigns a single unique number to every character used by modern languages today (and then some). When the talker and all its clients know that they’re exchanging Unicode text, agreeing on which characters are being exchanged is no longer a problem. Other issues, such as directionality and normalisation, must still be addressed by client software, but the Unicode Consortium gives clear guidelines on this.
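The byte sequence in the example can be decoded both ways to see the ambiguity directly. This short sketch assumes the two candidate character sets are ISO 8859-1 (Latin-1) and UTF-8 with a leading byte order mark.

```python
raw = bytes.fromhex("EF BB BF E4 BD A0 E5 A5 BD")

# Read as Latin-1, every byte becomes one character: nine characters of mojibake.
as_latin1 = raw.decode("latin-1")

# Read as UTF-8, the first three bytes are a byte order mark (dropped by the
# "utf-8-sig" codec) and the remaining six bytes encode two Chinese characters.
as_utf8 = raw.decode("utf-8-sig")

print(len(as_latin1), len(as_utf8))
```

Nothing in the bytes themselves says which reading is intended, which is why both ends need to agree on the encoding in advance.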
Addressing
Names
Most long-term talkers are hosted on servers that also host several other talkers. The server’s DNS name might be server.example.com, but, as each talker requires a unique rendezvous point, a TCP port number must also be specified. Many talkers can be partially referenced by their own DNS name, such as mytalker.com, but as this still resolves to the same IP address it must still be disambiguated with a port number. We use names for addresses because they’re easier to remember than numbers, but while we don’t have to remember an IP address, we still have to remember a port number.
A similar problem existed in web hosting. Before HTTP 1.1, the solution was to give the network interface of a server multiple IP addresses, and allow one web server to bind to each of these addresses. The extremely rapid expansion of the Web and the limited supply of IP addresses meant a better solution was needed, so the current HTTP protocol requires that the DNS name used to access the web site be specified in each request to the web server. This was a great improvement, but because it required a change to the protocol it was impossible to retrospectively apply the same principle to other services.
DNS SRV resource records offer an even more flexible solution. These records are similar to A records in that they are looked up by DNS name, but instead of an IP address they return the host name of a target server along with a port number, and weighting and priority indicators. The practical result of this is that specifying mytalker.com would be sufficient to direct a next-generation talker client to the desired server. The weight field is intended for use in load balancing situations, and is unlikely to be used by talkers. The priority field, which has the same purpose as the priority field in MX records, could be used to allow clients to automatically fail over to a backup server.
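A client's failover logic might look something like the following sketch. It works on records that have already been fetched, since Python's standard library has no SRV lookup; the host names, port numbers and the `can_connect` callable are hypothetical, and the weight field is ignored on the assumption that talkers would not use it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SrvRecord:
    priority: int  # lower values are tried first, as with MX records
    weight: int    # load-balancing hint, ignored in this sketch
    port: int
    target: str


def first_reachable(records, can_connect):
    """Try each server in priority order and return the first that accepts."""
    for record in sorted(records, key=lambda r: r.priority):
        if can_connect(record.target, record.port):
            return record
    return None


# Hypothetical answer to an SRV query for mytalker.com.
records = [
    SrvRecord(priority=20, weight=0, port=4001, target="backup.example.com"),
    SrvRecord(priority=10, weight=0, port=2010, target="server.example.com"),
]

# Simulate the primary server being down.
unreachable = {"server.example.com"}
chosen = first_reachable(records, lambda host, port: host not in unreachable)
```

With the primary marked unreachable, the client falls back to the backup server and its port without the user having to remember either.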
URIs
A possible extension to addressing talkers solely by name is to address entities (or resources) within the talker. URIs provide a natural facility for doing this. For example, a talker might have a room called entrance, which could be identified by the URI talker://mytalker.com/rooms/entrance. (Note that this is an example only, and talker is not being suggested as a URI scheme.) An ‘advanced’ option in a talker client might allow such a URI as an indication of which room the user wanted to be in after connecting. URIs might also identify users, groups of users, objects, message boards, messages and administrative controls.
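A client could split such a URI into its parts with a general-purpose URI parser. In this sketch the `talker` scheme is still only the illustrative placeholder used above, and the "kind/name" path layout is an assumption.

```python
from urllib.parse import urlsplit


def parse_talker_uri(uri):
    """Split an example talker URI into host, port and the resource it names."""
    parts = urlsplit(uri)
    segments = [segment for segment in parts.path.split("/") if segment]
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,  # None when the URI gives no explicit port
        "kind": segments[0] if segments else None,
        "name": segments[1] if len(segments) > 1 else None,
    }


resource = parse_talker_uri("talker://mytalker.com/rooms/entrance")
```

Because no port appears in the URI, the client would still need some other means, such as a DNS lookup, to find where the talker is listening.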
Towards a solution
If a co-ordinated effort is made to develop the next generation of talkers, the problems mentioned can be overcome, and the new features introduced. However, doing so is a delicate task; it is essential to ensure that even if talkers move forwards technologically they still retain the distinct character that separates them from the instant messengers, MUDs and—perhaps most importantly—IRC that they’re competing with.
I think the primary goal is to effect a paradigm shift where the talker stops being a program that people interact with directly, and becomes a service that people use. The Web is a good model of this, having user agents (web browsers such as IE and Mozilla), servers (such as Apache and IIS) and resources (primarily HTML pages, but also graphical and interactive content). The server delivers the resources to the user via the user agent. In the case of talkers the user agent would be the client software and the resource would be something that defines the unique characteristics of a talker.
Providing a way in which a talker is defined separately from the code which delivers it yields a number of strong advantages:
- the talker is no longer cluttered with boilerplate code for networking, logging, authentication, loading and saving resources, error handling, etc
- the server can be replaced independently of the talker definition
- while it would be important to create a reference implementation of the server, independent implementations could be created by others, giving a choice to those creating a talker definition
The last point also applies to user agents where the freedom of a user to choose an implementation that suits him or her is even more important.
This is only possible with standardisation. Just as HTML describes web pages, a method of defining talkers would have to be devised—a mixture of static and scripted content. There would be many issues to address here. For example, would a single scripting language be chosen to aid interoperability? Popular contenders would no doubt be Python, Ruby, Lua and ECMAScript (also known as JavaScript), but each has its champions and critics.
The protocol used for communication between clients and servers would also need to be defined. The previous section placed some requirements on this, but still leaves much open for discussion. While I think a purely XML-based protocol such as XMPP (developed for use by Jabber) is inappropriate, there are other possible starting points such as BEEP.
Once these were defined with reference implementations, all that would be left is the hurdle of persuading people to adopt the new technology. Hopefully, those talkers run by the people involved in creating the new specifications would be compelling demonstrations of the way forward.