These notes will help people who want to use ISO 8859-1 characters in WAIS.
We have only done this on Suns running SunOS 4.1.3. To run it successfully
under SunOS 4.1.1, you will need patches from Sun that correct problems with
colldef(8).

1- Install the collation table.

You might already have one; check /etc/locale/LC_COLLATE to see what's in
it. You probably have only C and default in there. Due to a bug in SunOS
4.1.3, it is useless to put the table in /usr/share/lib/locale/LC_COLLATE,
since that directory is ignored. (This caused us much grief.)

We decided to use external definitions of the 8859-1 table, which we also
put in /etc/locale/LC_COLLATE; we called this file iso_charmap, but you can
use any name as long as the same name appears in the input to colldef(8).
As root,

    /usr/etc/colldef

    setlocale(LC_CTYPE,"iso_8859_1");
    setlocale(LC_COLLATE,"iso_8859_1");

in the source, THIS DOES NOT SUFFICE! One must also have

    LC_CTYPE=iso_8859_1
    LC_COLLATE=iso_8859_1

in the run-time environment. This also caused us much grief. This problem
might be specific to Suns under 4.1.3. Note that Sun has put in the right
stuff for LC_CTYPE, which they have not done for LC_COLLATE.

One might also want to add to the stoplist. To more or less quote the docs:
when one generates an index, the -stop stop_list_file_name option permits
the use of another list. We've called our (French, ISO 8859) stoplist
stop.uqam and put it in here.

You might want to add the setlocale stuff to xwais if you're using it. In
~/x/xwais.c and ~/x/xwaisq.c, add

    #include <locale.h>
    setlocale(LC_CTYPE,"iso_8859_1");
    setlocale(LC_COLLATE,"iso_8859_1");

3- Gopher stuff.

If WAIS is called by Gopher, gopherd must be changed for all this to work.
The same 3 lines must be added to ~/gopherd/gopherd.c, i.e.

    #include <locale.h>
    setlocale(LC_CTYPE,"iso_8859_1");
    setlocale(LC_COLLATE,"iso_8859_1");

and the run-time environment must contain

    LC_CTYPE=iso_8859_1
    LC_COLLATE=iso_8859_1

for the same reasons as in WAIS.

Good luck!
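The two setlocale() calls and the run-time environment requirement described above can be combined into one start-up check for WAIS, xwais or gopherd. This is only a sketch under stated assumptions: the helpers init_iso_locale() and iso_env_values_ok() are hypothetical names, not part of any of those programs, and the locale name iso_8859_1 is the SunOS 4.1.3 name used in these notes. The helper only reports problems; it cannot fix the environment, which must already contain LC_CTYPE and LC_COLLATE when the program starts. Where no locale called iso_8859_1 is installed, setlocale() returns NULL and the helper just warns.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Locale name assumed throughout (SunOS 4.1.3 naming from these notes). */
#define ISO_LOCALE "iso_8859_1"

/* Returns 1 if both values equal ISO_LOCALE, 0 otherwise.
   A NULL value stands for an unset environment variable. */
int iso_env_values_ok(const char *ctype, const char *collate)
{
    return ctype != NULL && collate != NULL &&
           strcmp(ctype, ISO_LOCALE) == 0 &&
           strcmp(collate, ISO_LOCALE) == 0;
}

/* Hypothetical start-up helper: performs the two setlocale() calls and
   warns when the environment lacks the matching LC_CTYPE/LC_COLLATE
   settings. Returns how many of the two categories setlocale() accepted. */
int init_iso_locale(void)
{
    int ok = 0;

    if (setlocale(LC_CTYPE, ISO_LOCALE) != NULL)
        ok++;
    else
        fprintf(stderr, "warning: LC_CTYPE locale %s not available\n",
                ISO_LOCALE);

    if (setlocale(LC_COLLATE, ISO_LOCALE) != NULL)
        ok++;
    else
        fprintf(stderr, "warning: LC_COLLATE locale %s not available\n",
                ISO_LOCALE);

    /* Per the notes, the calls above were not enough on SunOS 4.1.3:
       the variables must also be set in the run-time environment. */
    if (!iso_env_values_ok(getenv("LC_CTYPE"), getenv("LC_COLLATE")))
        fprintf(stderr, "warning: set LC_CTYPE=%s and LC_COLLATE=%s "
                        "in the environment\n", ISO_LOCALE, ISO_LOCALE);
    return ok;
}
```

One would call init_iso_locale() early in main(), before any strcoll() or <ctype.h> call; a return value below 2 means at least one category is still using the default "C" rules.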
Sylvie St-Georges
st-georges.sylvie@uqam.ca

Our iso_charmap file
--------------------------------------------------
A-grave  \xc0
A-acute  \xc1
A-circu  \xc2
A-tilde  \xc3
A-diaer  \xc4
A-ring   \xc5
AE       \xc6
C-cedil  \xc7
E-grave  \xc8
E-acute  \xc9
E-circu  \xca
E-diaer  \xcb
I-grave  \xcc
I-acute  \xcd
I-circu  \xce
I-diaer  \xcf
ETH      \xd0
N-tilde  \xd1
O-grave  \xd2
O-acute  \xd3
O-circu  \xd4
O-tilde  \xd5
O-diaer  \xd6
MULT     \xd7
O-stroke \xd8
U-grave  \xd9
U-acute  \xda
U-circu  \xdb
U-diaer  \xdc
Y-acute  \xdd
THORN    \xde
s-sharp  \xdf
a-grave  \xe0
a-acute  \xe1
a-circu  \xe2
a-tilde  \xe3
a-diaer  \xe4
a-ring   \xe5
ae       \xe6
c-cedil  \xe7
e-grave  \xe8
e-acute  \xe9
e-circu  \xea
e-diaer  \xeb
i-grave  \xec
i-acute  \xed
i-circu  \xee
i-diaer  \xef
eth      \xf0
n-tilde  \xf1
o-grave  \xf2
o-acute  \xf3
o-circu  \xf4
o-tilde  \xf5
o-diaer  \xf6
DIVIS    \xf7
o-stroke \xf8
u-grave  \xf9
u-acute  \xfa
u-circu  \xfb
u-diaer  \xfc
y-acute  \xfd
thorn    \xfe
y-diaer  \xff
--------------------------------------------------

Our input file to colldef
--------------------------------------------------
charmap /etc/locale/LC_COLLATE/iso_charmap
substitute "\xc6" with "AE"
substitute "\xdf" with "ss"
substitute "\xe6" with "ae"
order \x00;...;\x20;\x21;\x22;\x23;\x24;\x25;\x26;\x27;\x28;\x29;\
\x2A;\x2B;\x2C;\x2D;\x2E;\x2F;0;1;2;3;4;5;6;7;8;9;\
\x3A;\x3B;\x3C;\x3D;\x3E;\x3F;\x40;\
\x5B;\x5C;\x5D;\x5E;\x5F;\x60;\x7B;\x7C;\x7D;\x7E;\x7F;\
(A,,,,,,,\
a,,,,,,);\
(B,b);(C,,c,);(D,d);\
(E,,,,,\
e,,,,);\
(F,f);(G,g);(H,h);\
(I,,,,,\
i,,,,);\
(J,j);(K,k);(L,l);(M,m);(N,,n,);\
(O,,,,,,,\
o,,,,,,);\
(P,p);(Q,q);(R,r);(S,s);(T,t);\
(U,,,,,\
u,,,,);\
(V,v);(W,w);(X,x);(Y,,y,,);(Z,z)
--------------------------------------------------

--------------------------------------------------------------------------

Jean-Pierre Kuypers writes the following:

About the credits, we won't forget the work of Sylvie St-Georges (Hello,
Sylvie) on an ISO 8859-1-capable version of WAIS. Pascal Maes uses it for
his proposal.
The work is available on ftp.uqam.ca:/pub/WAIS/, where the files alire
(French) and README.iso8859 (English) explain what to do and how.

I tried to install the new jughead version on SunOS 4.1.1. After some
trouble, as usual, I now have a jughead server able to search non-ASCII
items. After uncommenting the CFLAG in the Makefile and making/installing,
I had to do a lot of things before it worked:

- I ran the "colldef" command with Sylvie's two files. But I had to delete
  all the "-", "<", and ">" in these files to avoid "Syntax error" messages
  from colldef.
- I had to set the LC_COLLATE environment variable to iso_8859_1 (I already
  had LC_CTYPE).
- I had to make a link /etc/locale -> /usr/share/lib/locale. So I didn't put
  the files in /etc/locale/LC_COLLATE/ (as Sylvie wrote), but in
  /usr/share/lib/locale/LC_COLLATE/.
- I had to run "jughead -tB" to rebuild the index table correctly. It's
  mandatory!

After that, I have a more or less working jughead server. Curiously, e'
(e-acute) and e` (e-grave) are seen as t and l. So re'seau and re`gles
match as rtseau and rlgles. But they match correctly! To search for
"re'seau", I can give either "re'seau" or "rtseau". With other letters,
such as a`, u`, e^, i^, o^, there is no problem. But E^ doesn't work.
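The order statement in Sylvie's colldef input file groups each letter with its accented and case variants, and the substitute lines expand \xc6, \xdf and \xe6 into two letters. Assuming that, as in the BSD colldef syntax, characters inside one pair of parentheses share a primary weight, a word's primary sort key is roughly the word with every letter folded to its unaccented capital, so "r\xe9seau" (re'seau) and "R\xc9SEAU" both reduce to RESEAU. The C sketch below computes that folding for the letters in the iso_charmap table. The function iso_primary_key() is hypothetical and is not how colldef or strcoll() work internally: it ignores the secondary (accent and case) order inside each group and the order of punctuation, and it leaves ETH, THORN, MULT and DIVIS unchanged because the order statement does not mention them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Primary letter for each byte 0xC0..0xFF, following the iso_charmap
   names: '.' = not in the order statement (ETH, MULT, THORN, DIVIS),
   '*' = expanded by a substitute line (\xc6, \xdf, \xe6). */
static const char iso_base[] =
    "AAAAAA*CEEEEIIII"   /* 0xC0-0xCF */
    ".NOOOOO.OUUUUY.*"   /* 0xD0-0xDF */
    "AAAAAA*CEEEEIIII"   /* 0xE0-0xEF */
    ".NOOOOO.OUUUUY.Y";  /* 0xF0-0xFF */

/* Hypothetical helper: writes an approximate primary sort key for the
   NUL-terminated ISO 8859-1 string `in` into `out` (at most outsz-1
   bytes plus a NUL) and returns the key length. */
size_t iso_primary_key(const char *in, char *out, size_t outsz)
{
    size_t n = 0;

    assert(outsz > 0);
    for (; *in != '\0'; in++) {
        unsigned char c = (unsigned char)*in;
        char buf[2];
        size_t len = 1;

        if (c >= 'a' && c <= 'z') {
            buf[0] = (char)(c - 'a' + 'A');        /* (B,b) etc.: fold case */
        } else if (c >= 0xC0) {
            char b = iso_base[c - 0xC0];
            if (b == '*') {                        /* substitute lines */
                buf[0] = (c == 0xDF) ? 'S' : 'A';  /* "ss", or "AE"/"ae" */
                buf[1] = (c == 0xDF) ? 'S' : 'E';
                len = 2;
            } else {
                buf[0] = (b == '.') ? (char)c : b; /* drop the accent */
            }
        } else {
            buf[0] = (char)c;                      /* digits, punctuation */
        }
        if (n + len >= outsz)
            break;                                 /* keep room for NUL */
        memcpy(out + n, buf, len);
        n += len;
    }
    out[n] = '\0';
    return n;
}
```

Comparing such keys with strcmp() gives an accent- and case-insensitive first pass, which is roughly the primary-level comparison the order table is meant to give strcoll() once the locale is installed.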