FASTR Version 2.04/04-02-01
 *
 * FASt Term Recognizer
 *
 *  fastr/README
 *  Version 2.04/04-02-01
 *
 *  Copyright (C) 2004  Christian Jacquemin, LIMSI-CNRS
 *  BP 133, 91403 ORSAY, FRANCE 
 *  tel +33 (0)1 69 85 80 22 / fax -- 80 88
 *  http://www.limsi.fr/Individu/jacquemi/
 *
 *  This file contains a general presentation of Fastr
 *

Overview
********

Fastr is a parser for term and variant recognition. 
Fastr takes as input a corpus and a list of terms and 
outputs the indexed corpus, in which terms and their variants are
recognized.

Fastr can be used in two modes:
 o controlled indexing: input consists of a corpus and
   a list of terms,
 o free indexing: input consists of a corpus only; the list
   of terms is automatically acquired from the corpus.
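
The choice between the two modes can be sketched as a small shell
function. This is only an illustration: the function name
fastr_index_command is hypothetical, and the script names are those
given in the Installation section (the Files section lists them in
lowercase, so adjust them to your installation). The function prints
the command instead of running it.

```shell
#!/bin/sh
# Hypothetical helper (not part of Fastr): print the indexing command
# for a language and a set of input files.
# With a term list, controlled indexing is chosen; without one, free indexing.
fastr_index_command() {
    if [ "$#" -lt 2 ] || [ "$#" -gt 3 ]; then
        echo "usage: fastr_index_command en|fr corpus [terms]" >&2
        return 2
    fi
    case "$1" in
        en|fr) ;;
        *) echo "unsupported language: $1" >&2; return 1 ;;
    esac
    if [ "$#" -eq 3 ]; then
        # corpus + list of terms: controlled indexing
        echo "Fastr-controlled-indexing-$1 $2 $3"
    else
        # corpus only: terms are acquired from the corpus (free indexing)
        echo "Fastr-free-indexing-$1 $2"
    fi
}

fastr_index_command en text-en.txt terms-en.txt
# prints: Fastr-controlled-indexing-en text-en.txt terms-en.txt
fastr_index_command fr corpus-fr.txt
# prints: Fastr-free-indexing-fr corpus-fr.txt
```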

Fastr uses the following resources:
 o the corpus and the list of terms are tagged by the TreeTagger:
   http://www.ims.uni-stuttgart.de/Tools/DecisionTreeTagger.html
 o if available, a list of morphological families and a list
   of semantic links are used to calculate morphological and
   semantic variation.

The formalism of Fastr is close to PATR-II.

Pointers to the User Manual
***************************

The Fastr distribution includes its own manual pages. 
They can be viewed with "nroff -man Fastr.1 | less",
"nroff -man Fastrconf.1 | less", "nroff -man Fastrlang.1 | less", 
or "nroff -man Fastrdata.1 | less".

See the online publications at 
http://www.limsi.fr/Individu/jacquemi/
for more information and examples.

Installation
************

Fastr is currently available for the following languages:
 o French,
 o English,
and for the following operating systems:
 o Linux,
 o Solaris.

To install Fastr you MUST do the following: 

 1. get the distribution of Fastr (XXXX is the name of the 
    operating system [linux|sunos|solaris]):
    fastr-XXXX.tar.gz

 2. get the linguistic resources and configuration file 
    corresponding to the desired language(s) (XX is the name
    of the language [en|fr]):
    fastr-language-XX.tar.gz

 3. execute the install-Fastr script.

 4. install the TreeTagger, or use another tagger and adapt
    the tag transcription file lib/TAGS-TreeTagger-XX (XX is 
    the name of the language [en|fr]).

 5. test your installation for free or controlled indexing
    through the following commands (XX is the name of the 
    language [en|fr]):
    Fastr-controlled-indexing-XX text-XX.txt terms-XX.txt
    Fastr-free-indexing-XX corpus-XX.txt
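
The archive names used in steps 1 and 2 can be derived mechanically
from the operating system and the language(s). The following is a
minimal sketch; fastr_archives is a hypothetical helper, not a script
shipped with Fastr, and it only prints file names (it does not
download or unpack anything).

```shell
#!/bin/sh
# Hypothetical helper (not part of Fastr): list the archives needed
# for one operating system and one or more languages.
fastr_archives() {
    if [ "$#" -lt 1 ]; then
        echo "usage: fastr_archives linux|sunos|solaris [en|fr ...]" >&2
        return 2
    fi
    os=$1
    shift
    case "$os" in
        linux|sunos|solaris) echo "fastr-$os.tar.gz" ;;
        *) echo "unsupported operating system: $os" >&2; return 1 ;;
    esac
    # One language archive per requested language.
    for l in "$@"; do
        case "$l" in
            en|fr) echo "fastr-language-$l.tar.gz" ;;
            *) echo "unsupported language: $l" >&2; return 1 ;;
        esac
    done
}

fastr_archives linux en fr
```

For "linux en fr" this prints fastr-linux.tar.gz,
fastr-language-en.tar.gz and fastr-language-fr.tar.gz, one per line.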

Improvement
***********

You can enrich your system by adding linguistic resources:
morphological and semantic families. Scripts are provided for
creating the following resources:

    type of links | language | source database  | script
    --------------+----------+------------------+------------------
    semantic      | English  | WordNet 1.6      | WordNetPreProc.sh
    --------------+----------+------------------+------------------
    morphological | English  | CELEX            | CelexPreProc.sh
    --------------+----------+------------------+------------------
    semantic      | French   | Microsoft Word97 | WordNetPreProc.sh
    --------------+----------+------------------+------------------

Files
*****

# configuration file for environment variables
/etc/fastr.conf

# linguistic configuration files
# default file
/etc/fastr.conf-empty
${FASTREMPTY}
# files for French and English
# selected by the script depending on the processed language
# can be given as -C argument to fastr
/etc/fastr.conf-en
/etc/fastr.conf-fr
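
As an illustration of how a wrapper script might select the
configuration file, the sketch below builds the -C option from the
language. The helper name fastr_conf_option is hypothetical. It is
also an assumption that ${FASTREMPTY} holds the path of the default
file, and that falling back to the default file for other languages
is the desired behaviour. Only the -C option is printed, since the
rest of the fastr command line is described in the manual pages.

```shell
#!/bin/sh
# Hypothetical helper (not part of Fastr): print the -C option for a language.
# Assumption: for any other language, the default (empty) configuration
# is used, taken from FASTREMPTY if set, else /etc/fastr.conf-empty.
fastr_conf_option() {
    case "$1" in
        en|fr) echo "-C /etc/fastr.conf-$1" ;;
        *)     echo "-C ${FASTREMPTY:-/etc/fastr.conf-empty}" ;;
    esac
}

fastr_conf_option fr
# prints: -C /etc/fastr.conf-fr
```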

# morphological and semantic link files for French and English
# also defined in scripts
/usr/share/fastr/data/lib/der-families-fr 
${FASTR}/data/lib/der-families-fr 
/usr/share/fastr/data/lib/der-families-en
${FASTR}/data/lib/der-families-en
/usr/share/fastr/data/lib/sem-links-fr 
${FASTR}/data/lib/sem-links-fr
/usr/share/fastr/data/lib/sem-classes-en
${FASTR}/data/lib/sem-classes-en

# tag conversion files from tagger tags to fastr tags
# files for French and English
# selected by the script depending on the processed language
/usr/share/fastr/data/lib/TAGS-TreeTagger-fr
${FASTR}/data/lib/TAGS-TreeTagger-fr
/usr/share/fastr/data/lib/TAGS-TreeTagger-en
${FASTR}/data/lib/TAGS-TreeTagger-en
/usr/share/fastr/data/lib/TAGS-Cordial-fr
${FASTR}/data/lib/TAGS-Cordial-fr

# linguistic data files (mainly metarules) for French and English
# selected by the script depending on the processed language
/usr/share/fastr/data/lib/fastr.lang-en
${FASTR}/data/lib/fastr.lang-en
/usr/share/fastr/data/lib/fastr.lang-fr
${FASTR}/data/lib/fastr.lang-fr
# an empty language file (without metarules) to use fastr without lexicon
/usr/share/fastr/data/lib/fastr.lang-empty
${FASTR}/data/lib/fastr.lang-empty

# fastr resource file (mainly error message strings)
/usr/share/fastr/data/lib/fastr.res
${FASTR}/data/lib/fastr.res
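
Each file in data/lib is listed above under two locations, the system
path and the path relative to ${FASTR}. A script that needs one of
these files could look it up as sketched below. The helper name
fastr_lib_path is hypothetical, and the search order (${FASTR} first,
then /usr/share/fastr) is an assumption, not documented behaviour.

```shell
#!/bin/sh
# Hypothetical helper (not part of Fastr): locate a file from data/lib.
# Assumption: ${FASTR}, when set, takes precedence over /usr/share/fastr.
fastr_lib_path() {
    name=$1
    for root in "${FASTR:-}" /usr/share/fastr; do
        # Skip the first candidate when FASTR is unset or empty.
        [ -n "$root" ] || continue
        if [ -f "$root/data/lib/$name" ]; then
            echo "$root/data/lib/$name"
            return 0
        fi
    done
    echo "not found: $name" >&2
    return 1
}
```

For example, "fastr_lib_path fastr.lang-en" prints the full path of
the English language file if it exists in either location, and
returns a non-zero status otherwise.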

# man files
/usr/share/man/man1/fastrconf.1
/usr/share/man/man1/fastrdata.1
/usr/share/man/man1/fastr.1
/usr/share/man/man1/fastrlang.1

# scripts
/usr/bin/CelexPreProc
/usr/bin/CelexPreProc.prl
/usr/bin/CelexPreProc-new.prl
/usr/bin/fastr-free-indexing-en
/usr/bin/fastr-controlled-indexing-en
/usr/bin/TermerforFastr.prl
/usr/bin/TermtoRules.prl
/usr/bin/TreeTaggertoFastr.prl
/usr/bin/TreeTaggertoTerms.prl
/usr/bin/WordNetPreProc
/usr/bin/WordNetPreProc.prl
/usr/bin/WordtoFamilies.prl
/usr/bin/fastr-controlled-indexing-fr
/usr/bin/fastr-free-indexing-fr
${FASTRBIN}/CelexPreProc
${FASTRBIN}/CelexPreProc.prl
${FASTRBIN}/CelexPreProc-new.prl
${FASTRBIN}/fastr-free-indexing-en
${FASTRBIN}/fastr-controlled-indexing-en
${FASTRBIN}/TermerforFastr.prl
${FASTRBIN}/TermtoRules.prl
${FASTRBIN}/TreeTaggertoFastr.prl
${FASTRBIN}/TreeTaggertoTerms.prl
${FASTRBIN}/WordNetPreProc
${FASTRBIN}/WordNetPreProc.prl
${FASTRBIN}/WordtoFamilies.prl
${FASTRBIN}/fastr-controlled-indexing-fr
${FASTRBIN}/fastr-free-indexing-fr

# binary
/usr/bin/fastr
${FASTRBIN}/fastr

# sample text for free indexing
/usr/share/fastr/data/corpus-en.txt
${FASTR}/data/corpus-en.txt
/usr/share/fastr/data/corpus-fr.txt
${FASTR}/data/corpus-fr.txt
# sample text and associated term files
# for controlled indexing
/usr/share/fastr/data/text-en.txt
${FASTR}/data/text-en.txt
/usr/share/fastr/data/terms-en.txt
${FASTR}/data/terms-en.txt
/usr/share/fastr/data/text-fr.txt
${FASTR}/data/text-fr.txt
/usr/share/fastr/data/terms-fr.txt
${FASTR}/data/terms-fr.txt

# documentation - sample external unifier file
/usr/share/doc/fastr-2.04/test-external-unif.c
/usr/share/doc/fastr-2.04/fastr.unif
# documentation - scientific papers
/usr/share/doc/fastr-2.04/en/jacqklavtzou-ACL97.ps
/usr/share/doc/fastr-2.04/en/jacquemin-ACL99.ps
/usr/share/doc/fastr-2.04/en/MorphoFastr.ps

Acknowledgement
***************
I am very grateful to all the people who supported the development of
fastr, particularly Jean Royauté and Xavier Polanco from INIST,
Béatrice Daille and Emmanuel Morin from IRIN, Evelyne Tzoukermann from
Bell Labs, Judith Klavans from Columbia University, Kyo Kageura and
Fuyuki Yoshikane from the National Institute of Informatics, and Jorge
Vivaldi from Pompeu Fabra University. I also thank Guillaume Rousse
from INRIA for his help in packaging fastr for the Mandrake distribution.

More Information
****************

Author:
  Christian Jacquemin
  LIMSI-CNRS
  BP 133, 91403 ORSAY, FRANCE 
  URL: http://www.limsi.fr/Individu/jacquemi/


