TIP #22: MULTIPLE INDEX ARGUMENTS TO LINDEX ============================================= Version: $Revision: 1.22 $ Author: David Cuthbert Kevin Kenny Don Porter Donal K. Fellows State: Final Type: Project Tcl-Version: 8.4a2 Vote: Done Created: Friday, 19 January 2001 URL: https://tip.tcl-lang.org22.html Discussions-To: news:comp.lang.tcl mailto:kennykb@acm.org Post-History: ------------------------------------------------------------------------- ABSTRACT ========== Obtaining access to elements of sublists in Tcl often requires nested calls to the /lindex/ command. The indices are syntactically listed in most-nested to least-nested order, which is the reverse from other notations. In addition, the nesting of command substitution brackets further decreases readability. This proposal describes an extension to the /lindex/ command that allows it to accept multiple index arguments, in least-nested to most-nested order, to automatically extract elements of sublists. RATIONALE =========== The heterogeneous nature of Tcl lists allows them to be applied to a number of useful data structures. In particular, lists can contain elements that are, themselves, valid lists. In this document, these elements are referred to as /sublists./ Extracting elements from sublists often requires nested calls to /lindex./ Consider, for example, the following Tcl script that prints the center element of a 3-by-3 matrix: set A {{1 2 3} {4 5 6} {7 8 9}} puts [lindex [lindex $A 2] 2] When these calls are deeply nested - e.g., embedded in an /expr/ arithmetic expression, having results extracted through /lrange,/ etc. - the results are difficult to read: # Print the sum of the center indices of two 3x3 matrices set p [expr {[lindex [lindex $A 2] 2] + [lindex [lindex $A 2] 2]}] # Get all but the last font in the following parsed structure: set pstruct {text {ignored-data { ... } } {valid-styles {justifiction {left centered right full}} {font {courier helvetica times}} } } return [lrange [lindex [lindex [lindex $pstruct 1] 2] 2] 0 end-1] Note that the list of indices in the latter example is listed in the reverse order of vector indices. In most other languages/domains, the last line might take on one of the following forms: return list_range(pstruct[2][2][1], 0, end-1); return pstruct[[2, 2, 1]][[0:-1]] temp = pstruct(2, 2, 1); result = range(temp, 0, length(temp) - 1); Allowing the /lindex/ command to accept multiple arguments would allow this more-natural style of coding to be written in Tcl. SPECIFICATION =============== 1. Under this proposal, the syntax for the /lindex/ command is to be modified to accept one of two forms: lindex list indexList or lindex list index1 index2... In either form of the command, the /list/ parameter is expected to be a well-formed Tcl list. In the first form of the command, the /indexList/ argument is expected to be a Tcl list comprising one or more list indices, each of which must be an integer, the literal string /end/, or the literal string /end-/ followed by an integer with no intervening whitespace. The existing /lindex/ command is a degenerate form of this first form, where the list comprises a single index. In the second form of the command, each of the /index/ arguments is expected to be a list index, which again may be an integer, the literal string /end/, or the literal string /end-/ followed by an integer with no intervening whitespace. 2. In either form of the command, once the /indexList/ parameter is expanded, there is a single /list/ parameter and one or more /index/ parameters. If there is a single /index/ parameter /N/, the behavior is identical to today's /lindex/ command, and the command returns the /N/th element of /list/; if /N/ is less than zero or at least the length of the list, a null string is returned. 3. If more than one /index/ parameter is given, then the behavior is defined recursively; the result of lindex $list $index0 $index1 ... or lindex $list [list $index0 $index1 ...] is indentical to that of lindex [lindex $list $index0] $index1... or, equivalently, lindex [lindex $list $index0] [list $index1...] (This specification does not constrain the implementation, which may be iterative, recursive, or even expanded inline.) 4. When an invalid index is given, an error of the form, /bad index "invalid_index": must be integer or end?-integer?/, where /invalid_index/ is the first invalid index encountered, must be returned. 5. If the list argument is malformed, the error resulting from an attempt to convert the list argument to a list must be returned. This behaviour is unchanged from the current implementation. SIDE EFFECTS ============== 1. Whether the result of the /lindex/ operation is successful, the underlying Tcl_Obj that represents the list argument may have its internal representation invalidated or changed to that of a list. DISCUSSION ============ Some attention must be paid to giving the /lindex/ command adequate performance. In particular, the implementation should address the common case of lindex $list $i where /$i/ is a "pure" integer (that is, one whose string representation has not yet been formed). If the above specification is followed naively, the flow will be as follows. Since /objc/ is three, /objv[2]/ is expected to be a list. Since it is not, it must be converted from its string representation. It does not have one yet, so the string representation must be formed. Now the string representation is parsed as a list. An array (of length one) of Tcl_Obj pointers is allocated to hold the list in question, and a Tcl_Obj is allocated to hold the single element. Memory is allocated to hold the element's string representation. Now the /lindex/ command converts the first element of the list to an index (in this case an integer). This elaborate ballet of type shimmering requires converting the integer to a string and back again. It also requires four calls to /ckalloc:/ 1. Allocate a buffer for the string representation. 2. Allocate the array of Tcl_Obj pointers for the list representation. 3. Allocate the Tcl_Obj that represents the first (and only) element of the list. 4. Allocate a buffer for the string representation of that element. And at the end, the result is the same integer that was passed as a parameter originally. To avoid all this overhead in the common case, the proposed implementation shall (in the case where /objc==3/) 1. Test whether /objv[2]/ designates an object whose internal representation holds an integer. If so, simply use it as an index. 2. Test whether /objv[2]/ designates an object whose internal representation holds a list. If so, perform the recursive extraction of indexed elements from sublists described above. 3. Form the string representation of /objv[2]/ and test whether it is /end/ or /end-/ followed by an integer. If so, use it as an index. 4. Attempt to coerce /objv[2]/ to an integer; if successful, use the result as an integer. 5. Attempt to coerce /objv[2]/ to a list; if successful, use the result as an index list. 6. Report a malformed /index/ argument; the /indexList/ parameter is not a well-formed list. This logic handles all the cases of singleton lists transparently; it is effectively a simple-minded type inference that optimizes away needless conversions. Assuming that the related [TIP #33] is approved, this logic will most likely be combined with the identical logic required in that proposal for parsing /index/ arguments to the /lset/ command. ------------------------------------------------------------------------- COMMENTS ========== /Don Porter / I agree that it would be helpful to many programmers to provide a multi-dimensional array data structure that can be accessed in the manner described in this TIP. In the /struct/ module of /tcllib/, several other data structures are being developed: graph, tree, queue, stack. I would support adding another data structure to that module that provides an interface like the one described in this TIP, with the intent that all of these helpful data structures find their way into the BI distribution. I don't see any advantage to adding complexity to [lindex] as an alternative to development of a multi-dimensional array structure. Without a compelling advantage, I'm inclined against making [lindex] more complex. I like having Tcl's built-in commands provide primitive operations, and leave it to packages to combine the primitives into more useful, more complex resources. This TIP should also consider how any changes to [lindex] mesh with the whole [listx] overhaul of Tcl's [list] command that has been discussed. /Dave Cuthbert responds/ Don makes a good point -- with a good set of data structures in tcllib, the need for this TIP is lessened or even eliminated. Nonetheless, I see this as a way of implementing the structures he describes. In other words, a more powerful primitive (which, in reality, adds fairly little complexity when measured in number of lines of code changed) would benefit these structures. As for the [listx] overhaul, there are many competing proposals for the specification it is difficult to come up with a metric. In writing this TIP, I assumed a vacuum -- that is, a listx command would not be added to the core in the near future. /Donal K. Fellows points out/ Although there is tcllib and [listx] to think about, they are certainly not reasons for rejecting this TIP out of hand. The availability of tcllib is not currently anything like universal (not stating whether this is a good, bad or ugly thing) and all the [listx] work will need its own TIP to make it into the core (you tend to have availability problems if it is an extension.) It is not as if the core is short of syntactic sugar right now (the [foreach] command is ample demonstration of this.) /Don Porter follows up/ I'll leave the discussion above in place so the history of this TIP is preserved, but I have withdrawn my objection. ------------------------------------------------------------------------- There was quite a discussion on about using lindex to return multiple arguments. For example: % set list {a {b1 b2} c d e} % lindex $list 1 b1 b2 % lindex $list 1 0 {b1 b2} a % lindex $list {1 0} b1 In other words, the list index arguments can, themselves, be lists. Only when the argument is a list would the "recursive selection" procedure of the TIP be used. For multiple arguments, the behaviour is akin to lindex $list a b c -> list [lindex $list a] [lindex $list b] [lindex $list c] /Summarised by Dave Cuthbert / /Donal K. Fellows points out/ The problems with the above version of multiple indexing are that it loses the property that [lindex] always returns a single element (making writing robust code harder) and that it forces use of the [list] constructor a lot or inefficient type handling when some of the indices must be computed. Then there is the whole question of what happens when you have indexes that are lists of lists, which is a major can of worms. Luckily, we could always put this sort of behaviour into a separate command (e.g. called [lselect]) which addresses at least the majority of my concerns, and which (in my opinion) need not even form part of this TIP. /Dave Cuthbert adds/ I intentionally left [lselect] out of the original TIP (and it is still not present in the 02-April-2001 version). As Donal points out, it is a major can of worms and, though there was general agreement on c.l.t that such a command would be useful, people had differing opinions on what form it should take. Perhaps my view on TIPs is incorrect, but I try to include only sure-fire "yeah, we ought to have done that a few versions ago" items. ------------------------------------------------------------------------- NOTES ON HISTORY OF THIS TIP ============================== /This TIP was originally written by Dave Cuthbert , but ownership has passed (beginning of April) to Kevin Kenny ./ This TIP underwent substantial revision in May of 2001, to add the syntax where all the /index/ parameters could be grouped as a list rather than placed inline on the command line. SEE ALSO ========== [TIP #33]. COPYRIGHT =========== This document has been placed in the public domain. ------------------------------------------------------------------------- TIP AutoGenerator - written by Donal K. Fellows