TIP:            297
Title:          Integer Type Introspection and Conversion
Version:        $Revision: 1.3 $
Author:         Don Porter
State:          Draft
Type:           Project
Vote:           Pending
Created:        20-Nov-2006
Post-History:
Tcl-Version:    8.7
Keywords:       Tcl, number, expression

~ Abstract

This TIP proposes changes to complete the set of commands to test and
convert among Tcl's integer types.

~ Background

There are four integer types that appear in Tcl's C API. They are
''int'', ''long'', ''Tcl_WideInt'', and ''mp_int''. The corresponding
routines to pull a value of each of those types from a ''Tcl_Obj'' are
'''Tcl_GetIntFromObj''', '''Tcl_GetLongFromObj''',
'''Tcl_GetWideIntFromObj''', and '''Tcl_GetBignumFromObj'''.

These integer types form increasing sets. That is, every ''Tcl_Obj''
that can return an ''int'' can also return a ''long'', ''Tcl_WideInt'',
or ''mp_int''.

Strictly speaking, the set of ''Tcl_Obj'' values that can successfully
return an ''int'', ''long'', or ''Tcl_WideInt'' is platform-dependent,
because the size of each of these types is platform-dependent.

'''Tcl_GetIntFromObj''' accepts integer values in any format (decimal,
binary, octal, hexadecimal, etc., see '''TCL_PARSE_INTEGER_ONLY''' in
[249]) that are within the inclusive platform-dependent range
('''-UINT_MAX''', '''UINT_MAX''').

'''Tcl_GetLongFromObj''' accepts integer values in any format that are
within the inclusive platform-dependent range ('''-ULONG_MAX''',
'''ULONG_MAX''').

'''Tcl_GetWideIntFromObj''' accepts integer values in any format that
are within the inclusive platform-dependent range ('''-ULLONG_MAX''',
'''ULLONG_MAX'''), or the appropriate equivalent for the platform.

'''Tcl_GetBignumFromObj''' accepts integer values in any format with
(effectively) no limit on range.

The most common example of platform dependence of results seen at the
script level is the different results of '''[[expr int(.)]]''' on most
32-bit systems,

| % set tcl_platform(wordSize)
| 4
| % expr int(1<<31)
| -2147483648

compared with LP64 systems.

| % set tcl_platform(wordSize)
| 8
| % expr int(1<<31)
| 2147483648

These differences show up most unfortunately when implementing
algorithms designed to operate explicitly on 32-bit buffers, where the
only portable way to do the operations in Tcl is with careful
application of masking (''& 0xffffffff''). For one well-known example,
see the ''sha1'' package in tcllib. The additional operations in Tcl
expressions harm performance.

There are other Tcl routines that pull values from a ''Tcl_Obj'' and
accept supersets of one of the integer types. An example is
'''Tcl_GetIndexFromObj''', which will accept anything that
'''Tcl_GetIntFromObj''' accepts, as well as other string values.

There are also Tcl built-in commands that accept arguments that are
supersets of one of the integer types. An example is '''uplevel''',
which accepts as its level argument anything that
'''Tcl_GetIntFromObj''' accepts, as well as other string values.

All Tcl commands are ultimately defined by the C command procedures
that run to implement them. When those command procedures use the
routines mentioned above to pull values from command arguments, the
Tcl commands succeed or fail depending on whether the caller has
provided an integer value of the right type. As a simple example:

| % lindex {} 0xffffffff
| % lindex {} 0x100000000
| bad index "0x100000000": must be integer or end?-integer?

In order to avoid errors from commands, a cautious programmer may wish
to test whether a value is of an acceptable type before passing it to
a command. The '''string is integer''' command has long offered this
facility for commands that require (a superset of) an ''int''.

| % string is integer 0xffffffff
| 1
| % string is integer 0x100000000
| 0

Most of Tcl's built-in commands that accept an integer-valued argument
require that argument to be acceptable to '''Tcl_GetIntFromObj''', and
the existing '''string is integer''' command provides sufficient
introspection for them.

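The liberal range test described above can be illustrated with a small
C sketch. It is not Tcl's implementation: ''liberal_int_ok'' is a
hypothetical helper written for this illustration, it parses with the
standard '''strtoll''' routine rather than Tcl's own number parser (so
Tcl-specific prefixes such as ''0b'' and ''0o'' are not handled), and
it assumes that ''long long'' is wider than ''unsigned int''.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper, not part of Tcl's API.  Returns 1 when the
 * string parses as an integer within the inclusive range
 * [-UINT_MAX, UINT_MAX], the liberal range the text attributes to
 * Tcl_GetIntFromObj, and 0 otherwise.
 * Assumption: long long is wider than unsigned int.
 */
static int liberal_int_ok(const char *s)
{
    char *end;
    long long v;

    errno = 0;
    v = strtoll(s, &end, 0);   /* base 0: decimal, 0x hex, leading-0 octal */
    if (end == s || *end != '\0' || errno == ERANGE) {
        return 0;              /* no digits, trailing junk, or overflow */
    }
    return v >= -(long long)UINT_MAX && v <= (long long)UINT_MAX;
}
```

On a platform with a 32-bit ''unsigned int'', this sketch accepts
''0xffffffff'' and rejects ''0x100000000'', matching the
'''string is integer''' results shown above.
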
[188] created the new command '''string is wideinteger''', and that is
suitable for testing values for the small number of Tcl commands that
strictly require a value acceptable to '''Tcl_GetWideIntFromObj'''.
Those commands are:

| after $wide
| binary format w $wide
| chan seek $chan $wide
| chan truncate $chan $wide
| clock add $wide
| clock format $wide

There are some built-in Tcl commands that require an argument that is
acceptable to '''Tcl_GetBignumFromObj'''. That is, the argument must
be an integer, but no range limit is imposed. Currently there is no
test command appropriate for argument checking for these commands.

| dict incr $dictVar $bignumkey $bignum
| expr srand($bignum)
| expr ~$bignum
| expr $bignum % $bignum
| expr $bignum << $int
| expr $bignum >> $bignum
| expr $bignum & $bignum
| expr $bignum ^ $bignum
| expr $bignum | $bignum
| incr $bignumVar $bignum
| format $integerSpecifier $bignum

There are some built-in Tcl commands that require an argument that is
acceptable to '''Tcl_GetLongFromObj'''. Currently there is no test
command appropriate for argument checking for these commands.

| binary format i $long
| binary format s $long
| binary format c $long
| file atime $path $long
| file attributes $path -permissions $long
| file mtime $path $long

Note that the accepted ranges of the '''Tcl_GetFooFromObj''' routines
can lead to surprising results. For example, '''Tcl_GetIntFromObj'''
accepts values from '''-UINT_MAX''' to '''UINT_MAX'''. For some things
this is good, since it supports

| binary format i 0x80000000

on 32-bit platforms, which is a common coding style. However the same
range acceptance leads to surprising (and arguably incorrect, in the
presence of bignum support) things like:

| % string repeat a -4294967290
| aaaaaa

It seems there are good uses for both strict and liberal routines for
pulling integer ranges from a ''Tcl_Obj''. Compatibility concerns
would favor keeping the existing routines liberal, and adding strict
counterparts.
If this is pursued, however, another collection of '''string is'''
test commands would be needed as well.

~ Proposed Changes

Still pondering how best to react to this background. Discussion
invited on TCLCORE.

~ Compatibility

~ Reference Implementation

~ Copyright

This document has been placed in the public domain.