% -*- lang: icon -*-

\documentclass[11pt]{article}
\usepackage[letterpaper]{geometry}
\usepackage {noweb}
\usepackage{alltt}
\usepackage[hypertex,colorlinks=true,linkcolor=blue,extension=dvi]{hyperref}
\pagestyle {noweb}


\newcommand{\chunkref}[1] {$\langle$\subpageref{#1}$\rangle$}


\title {Mpp, a Multilingual Pretty-Printer for Noweb}
\author {Kostas N. Oikonomou \\  \textsf{oikonomou@att.com}}

\begin {document}
\maketitle
\tableofcontents

\section {Features and usage}

[[mpp]] is a pretty-printer, written in Icon, for the [[noweb]] system.  Its
main features are
\begin{itemize}
  \item Any chunk can be written in any language, and the pretty-printer will
    switch among languages.  The language in which a chunk is written is
    indicated by writing its name in parentheses at the end of the chunk
    name: e.g. [[@<<Comparisons of reals (C)@>>]] or [[@<<Filtering (Icon)@>>]]
    \footnote{A file [[legit_lang_names]] lists the strings that are
      considered to name languages, so when [[mpp]] sees, for example,
      [[@<<Comparisons of reals (again)@>>]] it will not attempt to switch to
      language ``again''.  Actually there is a more general mechanism: the
      options [[-d1]], [[-d2]] allow the user to specify two strings that
      delimit the language name; these default to parentheses.}.
  \item If no language is specified, a chunk inherits the language of its
    parent.
  \item A default language is specified when invoking [[mpp]].  So for a
    single-language [[noweb]] file, no chunk needs to specify a language.
  \item Languages are described by external ASCII files.  Users can add their
    own, or modify some of the supplied language files without touching
    [[mpp]]'s code.
  \item [[mpp]] does not touch the user's indentation or line-breaking.
  \item The spec file for language $L$ specifies $L$'s reserved words, which are
    typeset in bold, strings such as [[>=]] which are typeset in math mode and
    appear as $\ge$, and arbitrary translations of strings into \TeX\ code.
  \item The strings that denote comments or quotes are customizable, read from
    the spec file for $L$.  Comments are typeset in roman font, and \TeX 's math
    mode is active in comments.
\end {itemize}
[[mpp]] is a [[noweb]] filter, invoked as
\begin{quote}
  [[mpp -lib]] $\langle$\textit{path}$\rangle$ [[-L]]
  $\langle$\textit{language}$\rangle$
  [ [[-d1]] $\langle$\textit{s1}$\rangle$ [[-d2]] $\langle$\textit{s2}$\rangle$]
\end{quote}
where the (full) library path specifies where the language spec files are to be
found, and the language is the initial or default one.


\section {The basic design}

The pretty-printer's design is based on the following two premises:
\begin {itemize}
  \item It should be as independent of the target language as possible, and
  \item We don't want to write a full-blown scanner for the target language.
\end {itemize}
Strings of characters of the target language which we want to typeset specially
are called ``interesting tokens''.  There are three categories of interesting
tokens:
\begin {enumerate}
  \item Reserved words of the target language: we want to typeset them in bold,
    say.
  \item Other strings that we want to typeset specially, usually in math mode:
    e.g. $\le$ for [[<=]].
  \item Comment and quoting tokens (characters): we want what follows them or
    what is enclosed by them to be typeset literally.
\end {enumerate}
By reading the language spec file, a table [[trans]] is constructed that defines
a translation into \TeX\ code of every interesting token in the target language.
Here is an excerpt from the language spec file for Icon; lines beginning with
[[#]] are comments.
\begin{alltt}
  # Reserved words
  +by
  +break
  +case
  # Keywords
  +&ascii
  +&clock
  # Translator directives
  +$include
  +$line
  # Mathematical translations
  $<=   \verb|\|le
  $>=   \verb|\|ge
  $>>   \verb|\|succ
  # Arbitrary translations
  .\verb|{|     \verb|\{|
  .\verb|\|     \verb|\\|
  .~==
 \end{alltt}
Entries beginning with a ``$+$'' are typeset in bold, those beginning with a ``\$''
define math mode translations, and entries beginning with a ``.'' followed by a
pair of strings substitute the second string for the first.  We use four sets of
strings to define the tokens in categories 2 and 3:
\begin {center}
  [[special]], [[comment1]], [[comment2]], [[quote2]].
\end {center}
[[comment1]] is for unbalanced comment strings (e.g.\ [[%]] in Turing, [[#]] in
Icon, [[!]] in Fortran), [[comment2]] is for balanced comment strings (e.g.\ [[/*]]
and [[*/]] in C, or [[(*]] and [[*)]] in Mathematica), and [[quote2]] is for
literal quotes, such as [["]], which we assume to be balanced.

Our approach to recognizing the interesting tokens while scanning a line is to
have a set of characters [[interesting]] (an Icon cset), containing all the
characters by which an interesting token may begin. [[interesting]] is the union
of
\begin {itemize}
  \item the cset defining the characters which may begin a reserved word
  \item the cset containing the initial characters of all strings in the special,
    comment, and quote sets.
\end {itemize}
The basic idea is this: given a line of text, we scan up to a character in
[[interesting]], and, depending on what this character is, we may try to
complete the token by further scanning.  If this succeeds, we look up the token
in the [[trans]] table, and if the token is found, we output its translation,
otherwise we output the token itself unchanged.  When comment or quote tokens
are recognized, further processing of the line may stop altogether, or
temporarily, until a matching (closing) token is found.
<<*>>=
link options, strings, fullimag
<<Data structure holding a language spec>>
global language, trans
global res_word_chars, special, comment1, comment2, quote2, interesting
global begin_res_word, begin_special, begin_comment1, begin_comment2, begin_quote2
global in_comment1, in_comment2, in_quote
global libpath, line_num
<<Reading the language spec file>>
<<Switching languages>>
<<Language-independent procedures>>
<<[[main]] procedure>>
<<Utilities>>
@

@ 
<<Data structure holding a language spec>>=
record lang_spec(res_word_chars, special, comment1, comment2, quote2, 
  begin_res_word, begin_special, begin_comment1, begin_comment2, begin_quote2, 
  interesting, trans)
global known_langs       # table of \texttt{lang\_spec} indexed by language.
global legit_lang_names  # set of strings that can be language names
@


@ [[main]] interacts with [[TeXify]] when it comes to comments and quotes.
\enlargethispage{2cm}
<<[[main]] procedure>>=
procedure main(args)
  local line, chunk_name, kind, keyword, rest, L, p0
  local opts, d1, d2, f
  legit_lang_names := set() # strings that can be language names
  known_langs := table()    # languages whose specs have been loaded
  language := table() # indexed by chunk name, gives the language used by the chunk
  <<Set the library path and the initial language>>
  <<{\TeX} definitions at the top of the output file>>
  line_num := 0 # line no. in the input file
  while line := read() do {
    line_num +:= 1
    line ? {
      keyword := tab(upto(' ')|0) &
      rest := if tab(match(" ")) then {p0 := &pos;  tab(0)} else &null
      }
    case keyword of {
      "@begin" : {
        rest ? kind := tab(many(&letters))
        write(line)
        }
      "@defn" : {
        if kind == "code" then {
          <<Does this chunk set a new language $L$?>>
          }
        write(line)
        }
      "@use" : {
        <<Set the language of any chunk used by this chunk to $L$>>
        write(line)
        }
      "@text" : 
        if \kind == "code" then TeXify(rest,L,p0) else write(line)
      "@nl" : {
        if \in_comment1 then { # unbalanced comment
          write("@literal \\endcom{}");  in_comment1 := &null
          }
        write(line)
        }
      "@index" | "@xref" : {
        <<Suppress if in comment or quote>>
        }
      default : write(line)
      }
  }
end
@


@ The pretty-printer must be called with two arguments:
<<Set the library path and the initial language>>=
opts := options(args, "-lib:-L:-d1:-d2:")
libpath := (\opts["lib"] || "/") | 
  stop("mpp: you must specify the full path for the language spec files!")
L := \opts["L"] | stop("mpp: you must specify a language as an argument!")
f := open(libpath || "legit_lang_names") |
  stop("mpp: can't open `", libpath || "legit_lang_names", "'!")
while line := read(f) do
  every insert(legit_lang_names, words(line))
if member(legit_lang_names, L) then
  if /known_langs[L] then load_language_spec(L) else switch_to_language(L)
else
  stop("mpp: language `", L, "' is not in `legit_lang_names'!")
d1 := \opts["d1"] | "("
d2 := \opts["d2"] | ")"
@


@ To switch languages, chunk name must end with the name of a legitimate
language in parentheses.  See \chunkref{c:getl}.
<<Does this chunk set a new language $L$?>>=
chunk_name := rest
if L := get_language(chunk_name,d1,d2) then {
  if /known_langs[L] then load_language_spec(L) else switch_to_language(L)
  write("@language ", map(L, &ucase, &lcase))
  }
@


@
<<Set the language of any chunk used by this chunk to $L$>>=
assert(\L)
chunk_name := rest
if \language[chunk_name] ~== L then 
  stop("mpp: <", chunk_name, ">'s language already set to `", language[chunk_name], "'!")
else
  language[chunk_name] := L
@


@
\section {Language files, translation tables, and interesting tokens}
\label{sec:lang}

\subsection {Language specification files}

Language specification files are named [[Icon_pp_spec]], [[C_pp_spec]], etc.
They can contain comments, which are lines beginning with [[#]], and empty
lines.  The file format is described in the following table\footnote{To
  understand this better, look at one of the examples.}.  $s_i$ is a string,
$c_i$ is a character, $C_i$ is one of the standard Icon csets (character sets),
such as [[&letters]] or [[&digits]].
\begin{center}
  \framebox{%
  \begin{tabular}{ll}
    [[res_word_chars]]: & $C_1$ $C_2$ \ldots\ $c_1$ $c_2$ \ldots \\
    [[comment1]]: & $s_1$ $s_2$ \ldots \\
    [[comment2]]: & $s_1$ $s'_1$ \quad $s_2$ $s'_2$ \ldots \\
    [[quote2]]:   & $c_1$ $c'_1$ \quad $c_2$ $c'_2$ \ldots \\
    $\langle$translation table$\rangle$: & see \S\ref{sec:tt}
  \end{tabular}}
\end{center}
<<Reading the language spec file>>=
procedure load_language_spec(L)
  local name, f, line, wlist, w, w1, e, I
  name := libpath || L || "_pp_spec"
  f := open(name, "r") | stop("mpp: can't open file `", name, "'!")
  <<Get next line>>
  w1 == "res_word_chars:" | stop("mpp: `res_word_chars:' expected!")
  res_word_chars := ''
  <<Parse the [[res_word_chars]] line>>
  <<Get next line>>
  w1 == "comment1:" | stop("mpp: `comment1:' expected!")
  comment1 := wlist       # a list
  <<Get next line>>
  w1 == "comment2:" | stop("mpp: `comment2:' expected!")
  comment2 := []          # a list of pairs
  while put(comment2, [get(wlist),get(wlist)])
  <<Get next line>>
  w1 == "quote2:" | stop("mpp: `quote2:' expected!")
  quote2 := []            # a list of pairs
  while put(quote2, [get(wlist),get(wlist)])
  <<Set up the translation table and special tokens for $L$>>
  close(f)
  <<Detecting the beginning of a token>>
  known_langs[L] := lang_spec(res_word_chars, special, comment1, comment2, quote2,
    begin_res_word, begin_special, begin_comment1, begin_comment2, begin_quote2,
    interesting, trans)
end
@

@ Get the next non-comment, non-empty line and its words.
<<Get next line>>=
while line := read(f) do
  if line ~== "" & line[1] ~== "#" then break
wlist := get_words(line);  w1 := get(wlist)
@

@ Can't use [[variable]] and [[name]] for cset keywords because they are not
variables.  So
<<Parse the [[res_word_chars]] line>>=
every w := !wlist do {
  if *w = 1 then # character
    res_word_chars ++:= w
  else {         # cset
    res_word_chars ++:=
      case w of {
        "&letters" : &letters
        "&digits"  : &digits
        "&lcase"   : &lcase
        "&ucase"   : &ucase
        "&ascii"   : &ascii
        "&cset"    : &cset
        default: stop("mpp: unknown cset in `res_word_chars' line!")
        }
    }
}
@


@ Rather nifty code, using Icon's [[variable]] and [[name]] constructs to avoid
a lot of assignments.  For every field of [[Lspec]] named $n$, assign its value
to the global variable named $n$.
<<Switching languages>>=
procedure switch_to_language(L)
  local Lspec, n
  Lspec := \known_langs[L] | stop("mpp : `", L, "'should be known here!")
  every n := name(!Lspec) do {
    n ?:= {tab(find(".")+1) & tab(0)}
    variable(n) := Lspec[n]
    }
end
@


@
\subsection {Translation tables}
\label{sec:tt}

The translation table is now read from the language description file.  There are
four kinds of translations: make bold (entry starts with a ``$+$''), make
slanted (entry starts with a ``\texttt{\~}''), turn into a mathematical symbol
(entry starts with a ``\$''), or arbitrary substitition (line starts with a
``.'', followed by a pair of strings, of which the 2nd may be empty).  As
translations are read, we also create the list of [[special]] tokens, and the
cset [[begin_res_word]].
<<Set up the translation table and special tokens for $L$>>=
trans := table()  # global
special := []
begin_special := begin_res_word := ''
while line := read(f) do {
  if line[1] ~== "#" & line ~== "" then {
    wlist := get_words(line)
    w1 := wlist[1]
    case w1[1] of {
      "+" : w1 ? {      # make bold
        move(1);  w := tab(0)
        trans[w] := "{\\ttb{}" || w || "}"
        begin_res_word ++:= w[1]
        }
      "~" : w1 ? {      # make slanted
        move(1);  w := tab(0)
        trans[w] := "{\\tts{}" || w || "}"
        begin_res_word ++:= w[1]
        }
      "$" :  w1 ? {     # math special token
        move(1);  w := tab(0);
        # We don't use ``\$'' in the translation because ``\$'' might be part of the language $L$.
        trans[w] := "\\(" || wlist[2] || "\\)"
        put(special, w)
        begin_special ++:= w[1]
        }
      "." : w1 ? {      # arbitrary special translation, possibly empty
        move(1);  w := tab(0);
        trans[w] := \wlist[2] | &null
        put(special, w)
        begin_special ++:= w[1]
        }
      default : {
        }
      }
    }
  }
special := sort_by_length(special)
@


@
\section {Language-independent pretty-printing}
\label{sec:ind} 

Find out what is the language of chunk [[chunk_name]].  It must be a legitimate
language name, otherwise the string in parentheses (more generally between
the delimiters [[d1]] and [[d2]]) is ignored.
\nextchunklabel{c:getl}
<<Language-independent procedures>>=
procedure get_language(chunk_name,d1,d2)
  local L,i,n
  n := *chunk_name
  chunk_name ? {
    every i := find(" " || d1)  # get the last occurrence
    if \i > 0 then {
      move(i+(*d1));  L := tab(find(d2))
      if &pos = n then 
        if member(legit_lang_names, L) then return L
      }
    }
end
@

@ First we set up the typewriter bold font [[\ttb]], corresponding to pcrb8r,
and the typewriter slanted font [[\tts]].  Then we define the macros [[\begcom]]
(begin comment) and [[\endcom]].  [[\begcom]]
\begin {itemize}
  \item switches to [[\rmfamily]],
  \item activates [[$]] by changing its catcode to 3,
  \item makes the characters ``\texttt{\^{}}'' and ``[[_]]'' active for
    superscripts and subscripts,
  \item changes the catcode of the space character to 10.  This way comments
    will be typeset normally, and not as if [[\obeyspaces]] were active.
\end {itemize}
<<{\TeX} definitions at the top of the output file>>=
write("@literal \\DeclareFontShape{OT1}{cmtt}{bx}{n}{ <-> pcrb8r }{}")
write("@nl")
write("@literal \\def\\ttb{\\bfseries}")
write("@nl")
write("@literal \\def\\tts{\\slshape}")
write("@nl")
write("@literal \\def\\begcom{\\begingroup\\rmfamily \\catcode`\\$=3_
  \\catcode`\\^=7 \\catcode`\\_=8 \\catcode`\\ =10}")
write("@nl")
write("@literal \\def\\endcom{\\endgroup}")
write("@nl")
@


@ Don't output spurious [[@index use]] or [[@xref]] lines when in a comment or
quote.  ([[@index]] is produced by [[finduses]] and [[@xref]] by [[noidx]].)
However, we do want to output [[@index defn]] lines.
All of this works only if the language filter is run \emph{before} [[noidx]].
\nextchunklabel{c:cqfix}
<<Suppress if in comment or quote>>=
if (/in_comment1 & /in_comment2 & /in_quote) | match("defn", rest) then
  write(line)
@

@ For each interesting category define a cset containing the characters by which
a token in that category may begin and set [[interesting]] to their union.
\nextchunklabel{c:disjoint}
<<Detecting the beginning of a token>>=
begin_comment1 := begin_comment2 := begin_quote2 := ''
every e := !comment1 do begin_comment1 ++:= cset(e[1])
every e := !comment2 do begin_comment2 ++:= cset(e[1][1])
every e := !quote2 do begin_quote2 ++:= cset(e[1])
@ The token recognition method used in procedure [[TeXify]] assumes that the
various subsets of [[interesting]] are mutually disjoint.  If this assumption
does not hold, the results are unpredictable.
<<Detecting the beginning of a token>>=
I := begin_res_word ** begin_comment1 ** begin_comment2 ** begin_quote2 ** begin_special
*I = 0 | stop("mpp: the characters in the set\n", image(I),
  "\n may begin tokens in more than one interesting category!")
interesting := begin_res_word ++ begin_comment1 ++ begin_comment2 ++
  begin_quote2 ++ begin_special
@


@
\subsection {Formatting a line}

This procedure formats [[@text]] lines in the [[noweb]] file.  Note that every
\TeX{}ified line is a ``literal'' in [[noweb]]'s sense. 
<<Language-independent procedures>>=
procedure TeXify(line, L, p0)
  local token, emb, c, i, q, qs, c_open, q_open, closing
  static c_close, q_close, TeXspecial
  initial {TeXspecial := '\\${}&#^_%~'} # The cset of characters treated specially by \TeX.
  writes("@literal ")
  while line ~== "" do
    line ? {
      if \in_comment1 then {
        <<Write unbalanced comment text>>
        }
      else if \in_comment2 then {
        <<Write balanced comment text>>
        }
      else if \in_quote then {
        <<Write quoted text>>
        }
      else {
        <<Not inside a comment or quote>>
        }
      line := tab(0) # There may be more on the line!
      }
  write()
end
@

@
\nextchunklabel{c:notin}
<<Not inside a comment or quote>>=
while writes(tab(upto(interesting))) do
  case &pos+1 of {
    # The \texttt{&pos+1} is because \texttt{any($C,s,i$)} will produce $i+1$ if $s[i]\in C$.
    any(begin_res_word) : {
      <<Reserved word>>
      }
    any(begin_special) : {
      <<Possible ``special'' token>>
      }
    any(begin_comment1) : {
      <<Possible unbalanced comment>>
      }
    any(begin_comment2) : {
      <<Possible balanced comment>>
      }
    any(begin_quote2) : {
      <<Possible quote>>
      }
    default :
      <<Internal error!>>
  }
# Now write out the (uninteresting) rest of the line:
writes(tab(0))
@

@ Well, if we got here there's something wrong in the scanning algorithm. [[p0]] is
the position in the line of the source file where the argument [[line]] of [[TeXify]]
begins.  Note: the reported column is in the Emacs sense, i.e. the first
character is in column 0.
<<Internal error!>>=
stop("\nmpp: error in procedure TeXify:\n  language = ", L, ", input line ",
  line_num, ", column ", p0+&pos-2)
@


@
\subsection {Handling the interesting tokens}
\label{sec:it}

Check for the situation where we have an ``embedded'' reserved word.
E.g. suppose [[when]] is a reserved word and any letter can occur in reserved
words.  We don't want [[when]] matched in [[so_when]].
<<Reserved word>>=
emb := any('_', &subject, &pos-1) | &null
token := tab(many(res_word_chars))
writes((/emb & \trans[token]) | token)
@


@ There are two issues here.  Suppose we want [[=]] and [[==]] to be typeset
specially, but not [[=-]].  So we put [[=]] and [[==]] in [[special]].  Now what
happens when we encounter [[=]]?  First, we have to find out if this is really the
string [[==]].  So (a) we must match the {\em longest\/} token in [[special]], in
case a special token is a prefix of another special token.  Second, we must check
that we do not have the string [[=-]], because we do not want it to appear in the
output as the translation of [[=]] followed by ``[[-]]''.

(a) is easily ensured: [[match(!special)]] will match the longest token if the
list [[special]] is arranged so that longest tokens come first, as specified in
\S\ref{sec:lang}.  (b) is a bigger pain.  We solve it as follows: in the example
given above, we \emph{do} put [[=-]] in [[special]], but \emph{don't} define a
translation for it.  So
<<Possible ``special'' token>>=
if (token := tab(match(!special)) | pos(0)) then
  writes(\trans[token] | token)
else
  writes(move(1))
@


@
\subsection {Comments and quotes}
\label{sec:candq}

In principle, comments and quotes could be handled by Icon procedures such as
[[bal()]], or the more sophisticated ones in [[procs/scan.icn]].  What precludes
this easy solution is the fact that other filters in the [[noweb]] pipeline may
\emph{break up} comments and quotes that begin and end on the same line into
multiple lines.  For example, the [[finduses]] and [[noidx]] filters are
language-independent, and so can insert spurious [[@index]] and [[@xref]] lines
\emph{in the middle} of commented or quoted text of the target language. 

This complicates greatly the handling of balanced comments, and especially of
unbalanced comments and quotes.  In fact, proper handling of unbalanced comments
forces procedures [[filter]] and [[TeXify]] to \emph{interact}, as [[TeXify]]
cannot detect the end of an unbalanced comment that has been broken up into
multiple lines. So [[filter]] and [[TeXify]] interact via the variables
[[in_comment]] and [[in_quote]] when handling comments and quotes, and it is
[[filter]] that detects the end of an unbalanced comment when it encounters a
[[@nl]] line.
@

@ If we match a token in [[comment1]], we output it and the rest of the line as
is, but in [[\rm]] font. Within a comment, characters special to \TeX\ are
active, so \verb+$x^2$+ will produce $x^2$. A problem with this is that if you
comment out the (C) line \verb+printf("Hi there!\n")+, \TeX\ will complain that
[[\n]] is an undefined control sequence.
<<Possible unbalanced comment>>=
if writes(tab(match(!comment1))) then {
  in_comment1 := "yes"
  writes("\\begcom{}" || tab(0))
  break  # We let \texttt{filter} detect the end of the comment.
  }
else
  writes(move(1))  # The character wasn't the beginning of a comment token.
@


@ If we are at this point, it is not necessarily true that we have found a
comment.  For example, in \textsl{Mathematica} comments begin with a [[(]],
which may also appear in [[x+(y+z)]].  The additional complexity comes from the
fact the we have to handle comments extending over many lines.
<<Possible balanced comment>>=
every c := !comment2 do {
  c_open := &null
  writes(c_open := tab(match(c[1]))) & c_close := c[2] & break
}
if \c_open then {
  in_comment2 := "yes";  writes("\\begcom{}")
  <<Write balanced comment text>>
  }
else
  writes(move(1)) # The character wasn't the beginning of a comment after all.
@

@ Quoted strings may extend over multiple lines for the reasons mentioned at the
beginning of \S\ref{sec:candq}.  Except for the formatting, we handle them like
balanced comments.  The possibility of escaped quotation marks inside the quoted
string makes things more difficult.
\nextchunklabel{c:quotes}
<<Possible quote>>=
every q := !quote2 do {
  writes(q_open := tab(match(q[1]))) & q_close := q[2] & break
  }
if \q_open then {
  in_quote := "yes"
  <<Write quoted text>>
  }
else
  writes(move(1)) # The character wasn't the beginning of a quoting token.
@

@
<<Write unbalanced comment text>>=
writes(tab(0))
@

@
<<Write balanced comment text>>=
if writes(tab(find(c_close))) then { # Comment ends here
  writes("\\endcom{}" || move(*c_close))
  in_comment2 := &null
  }
else # Comment doesn't close on this line
  writes(tab(0)) 
@

@ After having encountered a quote we write literally, except that we precede
every character special to \TeX\ by a backslash and follow it by an empty
group\footnote{The empty group is necessary for the characters ``\~{}'' and
``\^{}''.}.  Detecting the end of a quoted string is tricky: a [[q_close]]
character doesn't do it if it is escaped by a backslash, unless the backslash is 
itself escaped by another backslash!  Below, [[qs]] is the string between
quotes, or a piece of it (recall the beginning of \S\ref{sec:candq}).  
<<Write quoted text>>=
if qs := tab(find(q_close)) then closing := "yes" else qs := tab(0)
qs ? {
  while writes(tab(upto(TeXspecial))) do writes("\\" || move(1) || "{}")
  writes(tab(0))
}
# This took a while to get right. Is there a simpler way to express it?
if \closing then {
  if \qs[-1] then {
    if qs[-1] == "\\" then {
      if \qs[-2] then
        if qs[-2] == "\\" then <<Quote ends here>>
      }
    else <<Quote ends here>>
    }
  else <<Quote ends here>>
}
@
<<Quote ends here>>=
{in_quote := closing := &null;  writes(move(1))}  # \texttt{q\_close}
@


@
<<Utilities>>=
procedure get_words(s) # Also see \texttt{words} in the \texttt{strings} library.
  static it
  local i, L
  initial{it := &cset -- ' \t,'} # words are separated by blanks, tabs, or commas
  L := []
  s ? while tab(upto(it)) do {i := tab(many(it)); put(L,i)}
  return L
end
@

@ Sort a list of strings (or other things with size) by the length of its
elements, longest first.
<<Utilities>>=
procedure sort_by_length(L)
  local L1, L2, s, T
  T := table()
  every s := !L do T[s] := -(*s)
  L2 := sort(T,4)
  L1 := []
  while put(L1, get(L2)) do get(L2)
  return L1
end
@

@
<<Utilities>>=
procedure print_language_spec(L,more)
  local n, tt, s
  s := \known_langs[L] | stop("mpp: `", L, "' is unknown!")
  write("res_word_chars: ", fullimage(s.res_word_chars))
  write("comment1: ", fullimage(s.comment1))
  write("comment2: ", fullimage(s.comment2))
  write("quote2: ", fullimage(s.quote2))
  write("special: ", fullimage(s.special))
  if \more then {
    write("begin_quote2: ", fullimage(s.begin_quote2))
  }
  if \more > 1 then {
    tt := sort(s.trans,1)
    every write(fullimage(!tt))
    }
end
@


@
\section {Unresolved issues}
\label{sec:todo}
\begin{enumerate}
  \item Find a good way to handle indexing and cross-referencing when there are
    many languages.
  \item There is a niggling unresolved issue, exemplified by Icon.  [[mpp]]
    translates the symbol ``\&''as ``$\land$'', even though ``\&'' is {\em
      not\/} in Icon's [[special]].  This happens because ``\&'' is in Icon's
    [[res_word_chars]], and a translation for it is defined in
    [[known_langs[icon].trans]].  So when [[TeXify]] encounters it, it
    recognizes it as an Icon reserved word, and uses the translation defined for
    it.  Now if this translation is not wanted, remove ``\&'' from
    [[known_langs[icon].trans]] and don't bother me any more.  However, if this
    translation is ok, we have an inconsistency, in that ``\&'' is not in
    [[special]].
    
    While this is not a real problem, achieving consistency (which may be needed
    in a more general case) is not so easy.  If we add ``\&'' to [[special]],
    the check in \chunkref{c:disjoint} will fail.  To fix this, we could
    \begin {enumerate}
      \item Add a constraint to the recognition of a reserved word: it has to be
        a token of length $>1$.
      \item Revise the [[case]] structure in \chunkref{c:notin}, as it will no
        longer work.
      \end {enumerate}
      We could also consider having a separate translation table for special
      tokens.
\end{enumerate}


@


@
\appendix
\section {Index}
\nowebindex

\end {document}