I’ve been working on instant messaging software for seven years, so I’ve been exposed to a lot of IM protocols. The “protocol” is the structure of bytes that gets sent back and forth between your computer and the IM service.
The major IM protocols (AIM, MSN, Yahoo, etc.) are fairly well thought out and logical. But sometimes things go horribly wrong. An example that I recently learned about, and the impetus for this post, is the format used for Yahoo IMs. Here’s a handy pocket reference:
Mixture of ANSI escape sequences and HTML
Bold, italic, underline and font color are specified using ANSI escape sequences, but font size and font face are specified using the <font> HTML tag.
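Because two unrelated markup systems share one string, a client has to recognize both before it can render anything. Below is a minimal tokenizer sketch. It assumes the escapes take the usual ANSI form of ESC, then `[`, then parameters, then `m` (for example `"\x1b[1m"`); the specific codes Yahoo assigns to bold, italic, and so on are not covered here, and `tokenize` is a hypothetical helper name.

```python
import re

# Assumption: formatting escapes look like ANSI SGR sequences, ESC '[' ... 'm'.
# HTML formatting is limited to <font ...> tags, as described above.
TOKEN_RE = re.compile(r"(\x1b\[[^m]*m)|(<font\b[^>]*>)", re.IGNORECASE)

def tokenize(message):
    """Split a message into ('ansi', s), ('font', s) and ('text', s) tokens."""
    tokens = []
    pos = 0
    for match in TOKEN_RE.finditer(message):
        # Anything between the previous token and this one is plain text.
        if match.start() > pos:
            tokens.append(("text", message[pos:match.start()]))
        kind = "ansi" if match.group(1) else "font"
        tokens.append((kind, match.group(0)))
        pos = match.end()
    if pos < len(message):
        tokens.append(("text", message[pos:]))
    return tokens

# Usage: bold via an escape sequence, typeface via an HTML tag.
print(tokenize("\x1b[1mhi <font face='Georgia'>there"))
```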
HTML tags aren’t closed
Subsequent tags just override the value of the previous tags. Message formatting is more linear than hierarchical. For example, “<font face='Georgia'>test1<font face='Courier'>test2.”
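To display such a message with a normal HTML renderer, a client can convert the linear stream into closed tags. Here is a sketch under the assumption that a later `<font>` tag replaces only the attributes it sets, while the others (for example `size`) carry over from earlier tags; `close_font_tags` is a hypothetical name, not part of any real client.

```python
import re

FONT_RE = re.compile(r"<font\b([^>]*)>", re.IGNORECASE)
ATTR_RE = re.compile(r"""(\w+)\s*=\s*['"]([^'"]*)['"]""")

def close_font_tags(message):
    """Rewrite unclosed, overriding <font> tags as a flat run of closed tags."""
    state = {}      # current attribute values, e.g. {'face': 'Courier'}
    out = []
    pos = 0
    is_open = False
    for match in FONT_RE.finditer(message):
        out.append(message[pos:match.start()])
        if is_open:
            out.append("</font>")  # close the tag this one overrides
        # Assumption: a new tag overrides only the attributes it sets.
        state.update((k.lower(), v) for k, v in ATTR_RE.findall(match.group(1)))
        attrs = " ".join(f"{k}='{v}'" for k, v in sorted(state.items()))
        out.append(f"<font {attrs}>")
        is_open = True
        pos = match.end()
    out.append(message[pos:])
    if is_open:
        out.append("</font>")
    return "".join(out)

print(close_font_tags("<font face='Georgia'>test1<font face='Courier'>test2"))
```

Emitting sibling tags rather than nesting them mirrors the linear model: each run of text gets exactly one set of attributes.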
HTML font tag size attribute is in points
For example, “<font size='14'>test.” Normally the size given in the font tag is a relative value between 1 and 7, with 1 being “small” and 7 being “large.”
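A client that renders standard HTML therefore has to translate points into the 1 to 7 scale. The sketch below picks the nearest match. The point value assigned to each size is a commonly cited approximation and an assumption here, since browsers and IM clients differ; `points_to_html_size` is a hypothetical helper.

```python
# Assumed approximation of HTML <font size> 1..7 in points; clients vary.
POINTS_FOR_SIZE = {1: 8, 2: 10, 3: 12, 4: 14, 5: 18, 6: 24, 7: 36}

def points_to_html_size(points):
    """Return the HTML font size (1-7) whose point value is closest.

    Ties go to the smaller size, because min() keeps the first minimum
    and the dict is iterated in ascending size order.
    """
    return min(POINTS_FOR_SIZE, key=lambda size: abs(POINTS_FOR_SIZE[size] - points))

print(points_to_html_size(14))  # the <font size='14'> example above
```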
Special HTML characters aren’t escaped
For example, if an IM contains a less-than sign, it is sent as “alien < predator.” Normally <, > and & are written as &lt;, &gt; and &amp; in HTML documents so that programs can accurately determine whether a < is the start of an HTML tag or a literal less-than sign.
Why does this matter? It means the user cannot send this IM as literal text, because the receiving client interprets it as a font tag instead of plain text: “<font size='32'>Huge text.” This generally isn’t a problem for normal users, but it can be a nuisance for web developers, who may want to IM that text to a friend and have it appear exactly the way they typed it.
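The ambiguity is easy to see side by side. This sketch contrasts standard escaping, done with Python’s `html.escape`, against sending the text unescaped the way the Yahoo format does. The function names are hypothetical and for illustration only.

```python
import html

def encode_standard(text):
    """Standard HTML: escape &, < and > so they arrive as literal characters."""
    return html.escape(text, quote=False)

def encode_yahoo_style(text):
    """Unescaped, as in the Yahoo format: text is sent exactly as typed."""
    return text

typed = "<font size='32'>Huge text"      # what the user literally typed
formatted = "<font size='32'>" + "Huge text"  # real formatting plus text

# Unescaped, the literal text and the real formatting are the same bytes.
print(encode_yahoo_style(typed) == formatted)
# Escaped, the literal text is distinguishable and round-trips cleanly.
print(encode_standard(typed))
print(html.unescape(encode_standard(typed)) == typed)
```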