$Id: INTERNALS,v 1.1 1997/03/26 01:29:39 dps Exp $

Here is how the program works.

reader.cc (1.10)

read_character reads characters from a Word document, suitably
translated, including distinguishing between multiple and single ^Gs,
etc. The output is fetched by chunk_reader::read_chunk_raw, which
assembles it into bits, ignoring inclusions. chunk_reader::read_chunk
gets these chunks and parcels them out with inclusions separated out.

tok_seq::rd_token adds start and end tags for rows, fields, paragraphs
and all the rest, storing the tokens that belong to a table on a
separate queue before transferring them all onto the main queue.
tok_seq::rd_token also keeps track of the size and detects the probable
end of the table.

tok_seq::feed_token takes a token off the queue and requests a refill
at the appropriate time. At the end of the document it tests a flag
and, if the flag is not set, adds a document end entry (and then feeds
it to the caller).

OK, so far? Now the fun begins! If you look at the output now you see
horrific stuff like

    550 *eq \F(foom bar)= 42

so the input is further processed by tok_seq::math_collect().

math_collect() uses saved_tok as a one-token push-back mechanism and
will use this token before asking feed_token() for one.
Non-paragraphs and non-equations go straight through.

When math_collect sees a paragraph it peers at the next item. If this
is not an equation it just forwards the token and stashes the item it
got in saved_tok (saved_tok is definitely free: either it was used or
feed_token supplied something). If the next item is an equation it
calls math_reverse_scan to work out whether there is any equation in
the string (guesswork, but it works quite nicely). If
math_reverse_scan decides it is all real text, the token is just
forwarded (with the extra token still stashed in saved_tok).

Assuming math_reverse_scan found something to move, that material is
moved into the equation, and both ntok and the current token are
modified.
saved_tok still points to ntok, so we use the same structure but new
strings. The reduced paragraph token is returned.

-----

When the code sees an equation special (quite possibly saved_tok from
the paragraph process above) it asks feed_token() for the next two
tokens. The next token is the end token for the special, and the one
after that is the interesting one and will be called T (the token
itself is *ntok in the code).

If T is an equation, the end special token is junked and the two
equations are joined. One of the equations is then junked. The end
special is pushed onto the start of the output for feed_token to find
there; saved_tok is pointed to the expanded equation. The code then
returns to the original read-a-token state so further aggregation can
take place.

If T is a paragraph, the code uses math_forward_scan to see how much
of it is consumed as part of the equation. If none, the end special
and paragraph tokens are pushed onto the front of the output queue and
saved_tok is invalidated. The code then returns the current (equation
special) token. The end special passes straight through and then the
accumulation can begin again.

If T (a paragraph) is partially consumed, the current equation and T
are adjusted, and processing then continues as if the paragraph had no
formula contents.

If T (a paragraph) is entirely consumed, its contents are added onto
the equation text, the paragraph is junked, and the end special is
pushed back. saved_tok is pointed to the current, expanded equation.
The code then returns to the original read-a-token state so further
aggregation can take place.

The output now contains nice stuff like

    550 * \F(foo,bar) = 29

and even horrors that Word viewer renders as displayed equations like

    550 * \F(foo,bar) = 29.

This output is requested by tok_seq::read_token(), which is the public
method. It is not devoid of tricks, however. Anything other than the
start of a paragraph passes straight through.
When it sees a paragraph it pushes it onto a separate queue and
accumulates totals of the characters and specials it sees. The loop
exits when any of the following applies:

    The paragraph character total exceeds the (small, currently 3)
    threshold.

    The end of the paragraph is spotted.

    A non-special, non-paragraph, non-other character is seen (if this
    happens we add the threshold to the count to be sure of being >=
    to it).

On exit from the loop, if the total is less than the critical value,
the queue is reversed and inserted at the front of the output queue,
minus the paragraph items. Since the tokens are inserted as the first
element of the output, they appear in reverse order of insertion
(hence the reverse makes the elements appear in the original order on
the output queue). This deletes that extraneous and wrong full stop,
for example. Otherwise the elements are transferred to the front of
the output queue in the existing order (this actually just sets a
couple of pointers). Either way the temporary queue is now empty and
is deleted. The first item dequeued is returned. (This is what rtest2
shows you.)

Future development will include processing to spot lists and stuff
like that.... As you now know, everything is very simple and plain.

Oh yes, and the *TeX output format includes plenty of context queue
use too... There is also a bit in the ascii output. Overall this tends
towards my idea of a complex AI program using context queues to do the
right stuff about what Word throws at it!!

I hope this is now 100% clear.
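
For illustration only, here is a minimal sketch of the requeue step at
the end of tok_seq::read_token() as described above: a short paragraph
is reversed and pushed item by item onto the front of the output queue
with its paragraph items dropped, and anything else goes back in its
existing order. It is not the real code. Token, Kind, requeue,
join_text and kThreshold are hypothetical names, std::deque stands in
for whatever queue class the program really uses, and the range insert
stands in for the "sets a couple of pointers" transfer.

```cpp
#include <algorithm>
#include <cassert>
#include <deque>
#include <string>

// Hypothetical token kinds; the program's real token types are not
// shown in this document.
enum class Kind { ParaStart, ParaEnd, Char, Special };

struct Token {
    Kind kind;
    std::string text;
};

// The "small, currently 3" threshold mentioned in the text.
constexpr int kThreshold = 3;

// Put the tokens collected on the temporary queue back onto the front
// of the output queue. char_total is the paragraph character total
// accumulated by the collecting loop.
void requeue(std::deque<Token>& out, std::deque<Token>& pending,
             int char_total) {
    assert(char_total >= 0);
    if (char_total < kThreshold) {
        // Short paragraph: each push_front lands ahead of the previous
        // one, so reversing first restores the original order. The
        // paragraph start and end items are dropped on the way.
        std::reverse(pending.begin(), pending.end());
        for (const Token& t : pending) {
            if (t.kind == Kind::ParaStart || t.kind == Kind::ParaEnd) {
                continue;
            }
            out.push_front(t);
        }
    } else {
        // Otherwise transfer everything in the existing order.
        out.insert(out.begin(), pending.begin(), pending.end());
    }
    pending.clear();  // either way the temporary queue ends up empty
}

// Helper for inspecting a queue: concatenates the token texts.
std::string join_text(const std::deque<Token>& q) {
    std::string s;
    for (const Token& t : q) {
        s += t.text;
    }
    return s;
}
```

With out holding a single token X and pending holding a paragraph
start, the characters a and b, and a paragraph end, calling
requeue(out, pending, 2) leaves out as a, b, X: the paragraph items are
gone and the characters are in their original order.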