Encode::IMAPUTF7 - modification of UTF-7 encoding for IMAP
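use utf8;                 # the literal below contains a non-ASCII character
use Encode qw/encode decode/;
use Encode::IMAPUTF7;     # load the module so the IMAP-UTF-7 encoding is defined

print encode('IMAP-UTF-7', 'Répertoire');
print decode('IMAP-UTF-7', 'R&AOk-pertoire');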
IMAP mailbox names are encoded in a modified UTF-7 when they contain international characters outside the printable ASCII range. The modified UTF-7 encoding is defined in RFC 2060 (section 5.1.3).
There is another CPAN module with the same purpose, Unicode::IMAPUtf7. However, it works correctly only with strings whose encoded form does not contain a plus sign. For example, the Cyrillic string \x{043f}\x{0440}\x{0435}\x{0434}\x{043b}\x{043e}\x{0433} is represented in UTF-7 as +BD8EQAQ1BDQEOwQ+BDM-. Note the second plus sign, 4 characters before the end. Unicode::IMAPUtf7 encodes the above string as +BD8EQAQ1BDQEOwQ&BDM-, which is not valid modified UTF-7 (the ampersand and the plus are swapped; the valid form is &BD8EQAQ1BDQEOwQ+BDM-). The current module solves this problem. It is a slightly modified Encode::Unicode::UTF7 and has nothing in common with Unicode::IMAPUtf7.
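To see where the second plus sign comes from, the encoded form can be rebuilt by hand with core modules only. The following is a minimal sketch, not the code of either module: it encodes the string as UTF-16BE, applies BASE64 without padding, and wraps the result in the shift characters. The variable names are chosen for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode);
use MIME::Base64 qw(encode_base64);

my $name = "\x{043f}\x{0440}\x{0435}\x{0434}\x{043b}\x{043e}\x{0433}";

# BASE64 of the UTF-16BE octets, on one line, with "=" padding removed.
my $payload = encode_base64(encode('UTF-16BE', $name), '');
$payload =~ s/=+\z//;

# Plain UTF-7 shifts in with "+"; modified UTF-7 shifts in with "&".
# (A "/" in the payload would also become ","; this payload has none.)
my $utf7     = "+$payload-";
my $modified = "&$payload-";

print "$utf7\n$modified\n";
```

Only the leading shift character differs between the two forms. The "+" inside the payload is BASE64 data and must be left alone.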
By convention, international mailbox names are specified using a modified version of the UTF-7 encoding described in [UTF-7]. The purpose of these modifications is to correct the following problems with UTF-7:
1) UTF-7 uses the "+" character for shifting; this conflicts with the common use of "+" in mailbox names, in particular USENET newsgroup names.
2) UTF-7's encoding is BASE64 which uses the "/" character; this conflicts with the use of "/" as a popular hierarchy delimiter.
3) UTF-7 prohibits the unencoded usage of "\"; this conflicts with the use of "\" as a popular hierarchy delimiter.
4) UTF-7 prohibits the unencoded usage of "~"; this conflicts with the use of "~" in some servers as a home directory indicator.
5) UTF-7 permits multiple alternate forms to represent the same string; in particular, printable US-ASCII characters can be represented in encoded form.
In modified UTF-7, printable US-ASCII characters except for "&" represent themselves; that is, characters with octet values 0x20-0x25 and 0x27-0x7e. The character "&" (0x26) is represented by the two-octet sequence "&-".
All other characters (octet values 0x00-0x1f, 0x7f-0xff, and all Unicode 16-bit octets) are represented in modified BASE64, with a further modification from [UTF-7] that "," is used instead of "/". Modified BASE64 MUST NOT be used to represent any printing US-ASCII character which can represent itself.
"&" is used to shift to modified BASE64 and "-" to shift back to US-ASCII. All names start in US-ASCII, and MUST end in US-ASCII (that is, a name that ends with a Unicode 16-bit octet MUST end with a "-").
For example, here is a mailbox name which mixes English, Japanese, and Chinese text: ~peter/mail/&ZeVnLIqe-/&U,BTFw-
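The rules above can be sketched as a pair of functions built on core modules. This is an illustrative sketch and not the implementation used by Encode::IMAPUTF7: the names imap_utf7_encode_sketch, imap_utf7_decode_sketch and decode_run are hypothetical, and the decoder assumes well-formed input and performs no validation.

```perl
use strict;
use warnings;
use Encode qw(encode decode);
use MIME::Base64 qw(encode_base64 decode_base64);

# Hypothetical encoder: character string in, modified UTF-7 (ASCII) out.
sub imap_utf7_encode_sketch {
    my ($name) = @_;
    my $out = '';
    pos($name) = 0;
    while (pos($name) < length $name) {
        if ($name =~ /\G([\x20-\x25\x27-\x7e]+)/gc) {
            $out .= $1;                  # printable US-ASCII represents itself
        }
        elsif ($name =~ /\G&/gc) {
            $out .= '&-';                # a literal "&" becomes "&-"
        }
        elsif ($name =~ /\G([^\x20-\x7e]+)/gc) {
            my $run = $1;                # everything else, as one shifted run
            my $b64 = encode_base64(encode('UTF-16BE', $run), '');
            $b64 =~ s/=+\z//;            # no "=" padding
            $b64 =~ tr{/}{,};            # "," is used instead of "/"
            $out .= "&$b64-";            # shift in with "&", out with "-"
        }
    }
    return $out;
}

# Hypothetical helper: decode the text between "&" and "-".
sub decode_run {
    my ($b64) = @_;
    return '&' if $b64 eq '';            # "&-" stands for a literal "&"
    $b64 =~ tr{,}{/};                    # back to the standard alphabet
    $b64 .= '=' x ((4 - length($b64) % 4) % 4);   # restore the padding
    return decode('UTF-16BE', decode_base64($b64));
}

# Hypothetical decoder: modified UTF-7 in, character string out.
sub imap_utf7_decode_sketch {
    my ($encoded) = @_;
    (my $decoded = $encoded) =~ s/&([A-Za-z0-9+,]*)-/decode_run($1)/ge;
    return $decoded;
}

# Round trip of the mailbox name from the example above.
my $mailbox = '~peter/mail/&ZeVnLIqe-/&U,BTFw-';
my $decoded = imap_utf7_decode_sketch($mailbox);
print imap_utf7_encode_sketch($decoded), "\n";
```

Decoding the example yields the characters U+65E5 U+672C U+8A9E and U+53F0 U+5317 in place of the two shifted runs, and encoding that string again gives back the original name.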
Please report any requests, suggestions or bugs via the RT bug-tracking system at http://rt.cpan.org/ or email to bug-Encode-[email protected].
http://rt.cpan.org/NoAuth/Bugs.html?Dist=Encode-IMAPUTF7 is the RT queue for Encode::IMAPUTF7. Please check to see if your bug has already been reported.
Copyright 2005 Sava Chankov
Sava Chankov, [email protected]
This software may be freely copied and distributed under the same terms and conditions as Perl.
Peter Makholm <[email protected]>, current maintainer
Sava Chankov <[email protected]>, original author
perl(1), Encode.