TIP #345: KILL THE 'IDENTITY' ENCODING
========================================

Version: $Revision: 1.2 $
Author: Alexandre Ferrieux
State: Draft
Type: Project
Tcl-Version: 8.7
Vote: Pending
Created: Thursday, 05 February 2009
URL: https://tip.tcl-lang.org/345.html
Post-History:

-------------------------------------------------------------------------

ABSTRACT
==========

This TIP proposes to remove the 'identity' encoding, which is the Pandora's box of invalid UTF-8 string representations.

BACKGROUND
============

The contract of string representations in Tcl states that the /bytes/ field (the *strep*) of a Tcl_Obj must be a valid UTF-8 byte sequence. Violating it leads, at best, to inconsistent and shimmer-sensitive string comparisons.

Fortunately, nearly all of the Tcl code takes careful steps to enforce this contract, with one exception: the 'identity' encoding. This encoding allows any byte sequence to be copied verbatim into the strep of a value, either as a side effect of computing the strep of a ByteArray while [*encoding system*] returns "identity", or through [*encoding convertfrom identity*]. Hence an invalid UTF-8 sequence can easily make its way into the strep and start wreaking havoc.

PROPOSED CHANGE
=================

This TIP proposes to simply close that single window to the dark side by removing the 'identity' encoding altogether.

RATIONALE
===========

The risk of compatibility breakage is exceptionally low in this case, since the 'identity' encoding has only ever been documented in tcltest.

REFERENCE EXAMPLE
===================

See Bug 2564363.

COPYRIGHT
===========

This document has been placed in the public domain.

-------------------------------------------------------------------------

TIP AutoGenerator - written by Donal K. Fellows