## Wednesday, September 10, 2008

### New Control Structure Considered Useful

We have a large data processing problem at work. Basically, we get thousands of items every day and have to figure out which ones go together. The only way to know if they go together is to try them. Some items may not go with anything, some may fit with a hundred or more other items. The one nice thing is that once we've found 3-5 items that go together, we can zip through all the other members of the group.

The problem is finding those 3-5 items. Choosing every possible combination of 5 would take too long. Choosing every combination of 3 is fast enough, but leaves a lot of extras. The simple solution is to first try all the combos of 3, then try all the combos of 4 on the remainder, then try all the combos of 5 on the remainder of that. The smaller and smaller pile makes the larger and larger choices feasible.

My boss, who is a competent practical programmer, suggests we have 3 procedures: One that does a 3 nested loop, one that does a 4, and one that does a 5. Like this (in Tcl):

```set items {apple orange banana grape strawberry}
set count 5

for {set i 0} {\$i < \$count} {incr i} {
for {set j [expr \$i + 1]} {\$j < \$count} {incr j} {
for {set k [expr \$j + 1]} {\$k < \$count} {incr k} {
set string [lindex \$items \$i]
lappend string [lindex \$items \$j]
lappend string [lindex \$items \$k]
puts \$string
}
}
}
```

That's just the 3 level loop because the other two look almost the same. The 4 and 5 level loops are identical except for having additional levels. As a programmer who values elegance over readability, the phrase "identical except for" is a red flag. Why have 3 separate procedures when all you are adjusting is a single parameter? What I need is a new control structure that is basically a for loop but lets me control how many nested fors there are.

Tcl makes it easy to create new control structures. Here's the control structure definition:

```# Usage:
#  indexcount - how many items are being chosen
#  itemcount  - how many items are being chosen from
#  indexvar   - variable to hold current combination of indexes
#  body       - code to execute for each combination
proc chooseloop {indexcount itemcount indexvar body} {

if {\$indexcount > \$itemcount} {
error "More indexes than items"
}

if {\$indexcount == 0} { return }

set indexes {}
for {set i 0} {\$i < \$indexcount} {incr i} {
lappend indexes \$i
}

set maxindexval [expr \$itemcount - 1]

while {1} {

# make new body that sets indexvar first
set newbody "set \$indexvar {\$indexes}\n\$body"

# do this iteration
uplevel 1 [list eval \$newbody]

# find incrementable index
set found no
for {set i 0} {\$i < \$indexcount} {incr i} {
set index [lindex \$indexes end-\$i]
set thismax [expr \$maxindexval - \$i]
if {\$index < \$thismax} {
set found yes
break
}
}

if {!\$found} { break }

# increment this index and set all following ones
set incrindex [expr (\$indexcount - 1) - \$i]
set precedingval [lindex \$indexes \$incrindex]
for {set j \$incrindex} {\$j < \$indexcount} {incr j} {
lset indexes \$j [expr \$precedingval + 1]
set precedingval [lindex \$indexes \$j]
}
}
}
```

Now to get my 3 level loop, I can call like this:

```chooseloop 3 5 indexes {
set str ""
foreach index \$indexes {
append str "[lindex \$items \$index] "
}
puts \$str
}```

In order to do 3, then 4, then 5, I can call like this:

```for {set i 3} {\$i < 6} {incr i} {
chooseloop \$i 5 indexes {
set str ""
foreach index \$indexes {
append str "[lindex \$items \$index] "
}
puts \$str
}
puts "--"
}
```

The output of that last one is:

```apple orange banana
apple orange grape
apple orange strawberry
apple banana grape
apple banana strawberry
apple grape strawberry
orange banana grape
orange banana strawberry
orange grape strawberry
banana grape strawberry
--
apple orange banana grape
apple orange banana strawberry
apple orange grape strawberry
apple banana grape strawberry
orange banana grape strawberry
--
apple orange banana grape strawberry
--

```