The problem is finding those 3-5 items. Choosing every possible combination of 5 would take too long. Choosing every combination of 3 is fast enough, but leaves a lot of extras. The simple solution is to first try all the combos of 3, then try all the combos of 4 on the remainder, then try all the combos of 5 on the remainder of that. The smaller and smaller pile makes the larger and larger choices feasible.
My boss, who is a competent practical programmer, suggests we have 3 procedures: One that does a 3 nested loop, one that does a 4, and one that does a 5. Like this (in Tcl):
set items {apple orange banana grape strawberry}
set count 5
for {set i 0} {$i < $count} {incr i} {
for {set j [expr $i + 1]} {$j < $count} {incr j} {
for {set k [expr $j + 1]} {$k < $count} {incr k} {
set string [lindex $items $i]
lappend string [lindex $items $j]
lappend string [lindex $items $k]
puts $string
}
}
}
That's just the 3 level loop because the other two look almost the same. The 4 and 5 level loops are identical except for having additional levels. As a programmer who values elegance over readability, the phrase "identical except for" is a red flag. Why have 3 separate procedures when all you are adjusting is a single parameter? What I need is a new control structure that is basically a for loop but lets me control how many nested fors there are.
Tcl makes it easy to create new control structures. Here's the control structure definition:
# Usage:
# indexcount - how many items are being chosen
# itemcount - how many items are being chosen from
# indexvar - variable to hold current combination of indexes
# body - code to execute for each combination
proc chooseloop {indexcount itemcount indexvar body} {
if {$indexcount > $itemcount} {
error "More indexes than items"
}
if {$indexcount == 0} { return }
set indexes {}
for {set i 0} {$i < $indexcount} {incr i} {
lappend indexes $i
}
set maxindexval [expr $itemcount - 1]
while {1} {
# make new body that sets indexvar first
set newbody "set $indexvar {$indexes}\n$body"
# do this iteration
uplevel 1 [list eval $newbody]
# find incrementable index
set found no
for {set i 0} {$i < $indexcount} {incr i} {
set index [lindex $indexes end-$i]
set thismax [expr $maxindexval - $i]
if {$index < $thismax} {
set found yes
break
}
}
if {!$found} { break }
# increment this index and set all following ones
set incrindex [expr ($indexcount - 1) - $i]
set precedingval [lindex $indexes $incrindex]
for {set j $incrindex} {$j < $indexcount} {incr j} {
lset indexes $j [expr $precedingval + 1]
set precedingval [lindex $indexes $j]
}
}
}
Now to get my 3 level loop, I can call like this:
chooseloop 3 5 indexes {
set str ""
foreach index $indexes {
append str "[lindex $items $index] "
}
puts $str
}In order to do 3, then 4, then 5, I can call like this:
for {set i 3} {$i < 6} {incr i} {
chooseloop $i 5 indexes {
set str ""
foreach index $indexes {
append str "[lindex $items $index] "
}
puts $str
}
puts "--"
}
The output of that last one is:
apple orange banana apple orange grape apple orange strawberry apple banana grape apple banana strawberry apple grape strawberry orange banana grape orange banana strawberry orange grape strawberry banana grape strawberry -- apple orange banana grape apple orange banana strawberry apple orange grape strawberry apple banana grape strawberry orange banana grape strawberry -- apple orange banana grape strawberry --
Tcl represent.
ReplyDeleteI thought you'd like that.
ReplyDeleteAnd the great news is that for a given nesting level it only takes 10 times longer than the nested loop version! There's nothing Tim loves more than putting elegance over speed, so I expect a big raise.
I made this program mainly as a joke, but actually now I'm a little puzzled: Why exactly is it so slow? If I put the exprs in {}, I shave off about 10-20%, I can see that.
ReplyDeleteThe main problem seems to be looping through the indexes in the caller's body. If I could somehow set some named variables, I think it would be nearly as fast as the nested version. But of course I'd have to have a constant-size list of named variables, which doesn't help me.