KWIC, quickly
β’ 359 words β’ ~2 min read β²
I recently reread David Parnas classic paper On the Criteria To Be Used in Decomposing Systems into Modules. Parnas shows two different approaches to program decompostion illustrated on one example problem, the KWIC index. He writes:
This is a small system. Except under extreme circumstances (huge data base, no supporting software), such a system could be produced by a good programmer within a week or two.
The remark about the effort, which is actually only an insignificant aside in the paper, caught my eye. Two weeks is the typical length of a sprint, so most working programmers will have an intuition about the amount of work that is on average fitted into such a timeframe. Here is what would supposedly have been a reasonable expectation in 1972:
The KWIC index system accepts an ordered set of lines, each line is an ordered set of words, and each word is an ordered set of characters. Any line may be "circularly shifted" by repeatedly removing the first word and appending it at the end of the line. The KWIC index system outputs a listing of all circular shifts of all lines in alphabetical order.
The following implementation, straight from that spec, took less than 20 minutes, which included 10 minutes of reading up on the various configuration options of localeCompare (TL;DR: the defaults are quite decent):
function circularShift(line) {
let [head, ...rest] = line;
return [...rest, head];
}
function circularShifts(line) {
let all = [line];
let last = line;
for (let i = 1; i < line.length; i++) {
last = circularShift(last);
all.push(last);
}
return all;
}
function lineComparator(l1, l2) {
let first = l1.join(" ");
let second = l2.join(" ");
return first.localeCompare(second);
}
/**
* @param {string[][]} lines
*/
function kwicIndex(lines) {
return lines
.map(line => circularShifts(line))
.flat()
.sort(lineComparator);
}
Under the assumption Parnas' estimate of the required effort was not hyperbolic, and let's take the lower end of it (a 40h work week), that results in a factor of 120 in improved productivity (120 * 20min = 40h), simply for having bog-standard Javascript available (I presume it will translate to the same order of magnitude in most higher programming languages), and all without having to bother myself with a Large Looting Model.