TIP #259: MAKING 'EXEC' OPTIONALLY BINARY SAFE
================================================

  Version:       $Revision: 1.4 $
  Author:        Andreas Leitgeb
  State:         Draft
  Type:          Project
  Tcl-Version:   8.7
  Vote:          Pending
  Created:       Monday, 12 December 2005
  URL:           https://tip.tcl-lang.org/259.html
  Post-History:

-------------------------------------------------------------------------

ABSTRACT
==========

A new option shall be added to the *exec* command. It lets the user specify that Tcl must not translate input redirected from immediate data (using *<<*), the data received from the external command, or both.

MOTIVATION
============

External programs may read binary data, write binary data, do both, or do neither. Whether a program reads and writes binary data or platform-encoded data is specific to that program. The programmer who intends to *exec* it from a Tcl script generally knows which case applies. For example, a hexdump utility presumably reads binary data and outputs text.

DEFICIENCIES OF CURRENT STATE OF EXEC
=======================================

*Problem 1:* When passing string data to external programs, *exec* currently behaves arguably incorrectly, because it does not pass \0 bytes through.

*Problem 2:* When returning the result of external programs, *exec* applies translations based on the system-default encoding. This is fine in most cases, but not for programs that output binary data.

Problem 1 is actually a bug. For compatibility reasons, however, the internal function responsible cannot be changed. It is believed to be used by some extensions, although it is not officially exported, and this blocks fixing the bug. This TIP goes further than a plain fix. Rather than settling on one particular consistent behaviour, it lets the script developer decide which behaviour is right for his needs.

Problem 2 prevents the output of programs that emit binary data from being retrieved correctly.
PROPOSAL FOR A NEW OPTION
===========================

The *exec* command already has an interface for options. It currently supports one option, *-keepnewline*, and the end-of-options marker *--*. This means that adding a new option will not adversely affect any existing scripts.

This TIP proposes a new option, *-binary* /arg/.

/arg/ can be a single boolean value:

  - If it is a boolean true, then both the input (if a *<<* redirection is present) and the return value are passed verbatim between Tcl and the external program.

  - If it is a boolean false (the default), then the behaviour stays as it is now, with one difference: input becomes \0-safe. System translation still takes place as appropriate (e.g. for line endings).

/arg/ can also be one of the keywords *in*, *out* or *both* (equivalent to 1), or an empty string (equivalent to 0), for more readable code. The directions are seen from the external program's perspective.

If /arg/ is any true boolean value, *both* or *out*, then the option *-keepnewline* is implied.

If a particular use of *exec* does not use *<<* string redirection, then the *in* bit has no visible effect.

For now, no binary flag is defined for stderr. This might be the subject of a future TIP, or it might be left out for lack of need.

ALTERNATIVES
==============

Benjamin Riefenstahl suggested a different approach. Instead of a binary (yes/no) decision for each of input and output, the option would directly specify the encodings to use, with "binary" as one valid encoding.

This has a subtle usability disadvantage. As proposed, there is *one* option whose argument takes effectively 4 different values, and each value makes it evident on which channels conversion takes place. Specifying arbitrary encodings independently for two channels would require a list of encodings. For consistency with other commands, the stdin encoding would have to come first, even though it is used less often than the output encoding. So most of the time one would have to write

    -encoding {{} binary}

(yuck!) just to specify an encoding for the output only.

General channels, as used for open |... or sockets, carry open-ended streams. The data going through *exec*, by contrast, is always "limited". Before *exec* is called, the input already exists completely in memory as an argument to *exec*. Afterwards, the output is returned as one single return value. Each of these chunks can therefore be handled as binary by *exec* and explicitly converted with *encoding convertto* (for input) or *encoding convertfrom* (for output).

Encoding per system default is achieved by simply not adding any -binary option to *exec*. That probably covers more than 95% of all uses anyway; the remaining 5% are the ones for whom this TIP has been written.

IMPLEMENTATION
================

No implementation exists right now, although this may change in the near future.

An implementation would have to change the functions listed below. Some of them cannot be modified because they are used elsewhere. Each of those needs to be replaced by an extended version. The old function can then be changed to call the new one with appropriate extra arguments, and eventually be phased out.

Tcl_ExecObjCmd:
    Handle the new option and set new bits in the flags argument of /Tcl_OpenCommandChannel/, or add a new bitset argument.

Tcl_OpenCommandChannel:
    Deal with the new bits, or with the new argument, and pass them (or it) on.

TclCreatePipeline:
    Needs a new /flags/ argument, which also solves bug #768678 along the way. It must pass the new arguments on to a new variant of /TclpCreateTempFile/.

TclpCreateTempFile:
    Both the Unix and the Windows versions currently receive a plain C-style char pointer with no length information. They need two extra arguments: a length and the appropriate binary bit. Since /TclpCreateTempFile/ is in the stubs table, a new version of it is definitely needed, e.g. /TclpCreateTempFileEx/, which will take and use these new arguments.

COPYRIGHT
===========

This document has been placed in the public domain.
-------------------------------------------------------------------------

TIP AutoGenerator - written by Donal K. Fellows