texttool-architecture/soft-search.tex - diff

Diff for /texttool-architecture/soft-search.tex between versions 1.1 and 1.2

-version 1.1, 2004/06/01 12:00:21
+version 1.2, 2004/06/01 12:14:48
  Line 1
- \subsubsection{reg.cgi (register text)}
+ \subsubsection{q1 (corpus-wide search)}
- \label{sec:rec.cgi}
+ \label{q1}
  \paragraph
- On the ECHO server, the registration of new texts is implemented by
+ This section describes the software associated with the ECHO
- means of a cgi script, reg.cgi
+ lemmatized text search
- (archimedes/web/cgi-bin/toc/admin/reg.cgi ). reg.cgi retrieves a
+ \url{http://echo.mpiwg-berlin.mpg.de/ECHOVIEW/ECHO_view.css}
- metadata file in MPIWG archive metadata format from the entered uri
- (currently only local paths are supported) and constructs from this
- file a toc.cgi object file (see below), which it writes to toc.cgi's
- data section. [corpus???] It should be stressed that this is a
- registration procedure developed for a particular implementation of
- toc.cgi and not a part of the core application.
- \paragraph
- reg.cgi takes two parameters, path and show.  Path should give the
- local path to the metadata file for the text that is being
- registered. If ``show'' is set to 1, reg.cgi will return for
- inspection the toc.cgi object file that it has built out of the
- submitted metadata file.
- \paragraph{input metadata file}
- The input metadata file must have the following form
- \paragraph
- \begin{verbatim}
- <resource>
-     ...
-     <meta>
-       <meta>
-                 <bib type="Book">
- <title>Mainzer Untergerichtsordnung (von 1534)</title>
- <author>anon</author>
- <year>1580</year>
-         <texttool><display>yes</display>
-     <image>pageimgtif</image>
-     <text>/mpiwg/online/experimental/echo_DRQEdit_test/anon_Mainz_1580/fulltextDW/mainzugo02_utf8.xml</text>
-     <pagebreak>pb</pagebreak><presentation>01-presentation/info.xml</presentation></texttool></meta>
-     </meta>
- \end{verbatim}
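A reader of such a metadata file needs only the bibliographic fields and the texttool block. The following Python sketch shows one way to extract them. It is not the reg.cgi implementation (which is a perl cgi script); the function name `read_metadata` is hypothetical, and the sample is a well-formed reduction of the excerpt above in which the closing tags and the exact nesting are assumptions, since the excerpt elides them.

```python
import xml.etree.ElementTree as ET

# Well-formed sample built only from fields shown in the excerpt above.
# The closing tags and the nesting of <bib>/<texttool> are assumptions.
SAMPLE = """<resource>
  <meta>
    <bib type="Book">
      <title>Mainzer Untergerichtsordnung (von 1534)</title>
      <author>anon</author>
      <year>1580</year>
    </bib>
    <texttool>
      <display>yes</display>
      <image>pageimgtif</image>
      <text>/mpiwg/online/experimental/echo_DRQEdit_test/anon_Mainz_1580/fulltextDW/mainzugo02_utf8.xml</text>
      <pagebreak>pb</pagebreak>
      <presentation>01-presentation/info.xml</presentation>
    </texttool>
  </meta>
</resource>"""


def read_metadata(xml_text):
    """Return the bib fields and the texttool settings as plain dicts."""
    root = ET.fromstring(xml_text)
    bib = root.find(".//bib")
    tool = root.find(".//texttool")
    record = {"type": bib.get("type")}
    # One entry per child element of <bib>: title, author, year.
    record.update({child.tag: child.text for child in bib})
    # The texttool block tells the viewer where text and images live.
    record["texttool"] = {child.tag: child.text for child in tool}
    return record


if __name__ == "__main__":
    print(read_metadata(SAMPLE))
```

The result is a nested dictionary; how reg.cgi actually lays out the toc.cgi object file is not described in this section, so no attempt is made to reproduce that format.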
- \paragraph{archimedes object registration}
- \subsubsection{toc.cgi (display text)}
- \label{sec:toc.cgi}
- \paragraph{plan of this section }
- \begin{enumerate}
- \item An overview of toc.cgi architecture
- \item A walk-through of typical cgi queries for toc.cgi
- \item An index of cgi parameters and values with short descriptions of function
- \end{enumerate}
- \paragraph{Overview of toc.cgi architecture}
- \subparagraph{}
- toc.cgi is a perl script for displaying collections of xml texts and
- linking them to related resources such as page-images, morphological
- analysis, commentaries, dictionaries, etc. It implements generic methods
- for resource-linking provided by a series of perl modules which are in
- turn based mainly on generic open-source tools for xml manipulation and networking
- written in C.
- \subparagraph{toc.cgi collections--Network transparency}
- Each of the collections in toc.cgi is a ``virtual'' collection, that
- is, a collection of links or uri's to resources that reside somewhere on an accessible
- network, local or remote.
- \subparagraph{toc.cgi collections--remote resources}
- What is at the other end of the link is of no concern to toc.cgi, as
- long as the resource referenced by the link meets minimal toc.cgi
- requirements--how the resource is actually implemented and exposed is
- a matter for the resource provider. The link may, for instance, point
- directly to an xml text or it may point to a container which exposes a
- particular xml view of an underlying resource that is perhaps not in
- xml format at all.
- \subparagraph{resource registry}
- \paragraph{cgi parameters -- standard queries}
- \htmladdnormallink{ http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi?step=corpus }{ http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi?step=corpus }
- \newline
- \newline
- get a listing of corpora
- \htmladdnormallink{ http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi?step=xmlcorpusmanifest }{ http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi?step=xmlcorpusmanifest }
- \newline
- \newline
- get an xml listing of corpora
- \htmladdnormallink{ http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi }{ http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi }
- \newline
- \newline
- get a listing of works in default corpus
- \htmladdnormallink{ http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi?corpus=1 }{ http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi?corpus=1 }
- \newline
- \newline
- get a listing of works in corpus 1 [default corpus = 0]
- \htmladdnormallink{ http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi?step=xmlcorpuslist }{ http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi?step=xmlcorpuslist }
- \newline
- \newline
- get an xml listing of works in default corpus
- \htmladdnormallink{ http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi?step=xmlcorpuslist;corpus=1 }{ http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi?step=xmlcorpuslist;corpus=1 }
- \newline
- \newline
- get an xml listing of works in corpus 1
- \htmladdnormallink{ http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi?dir=baifl_renav_006_la_1537;step=thumb }{ http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi?dir=baifl_renav_006_la_1537;step=thumb }
- \newline
- \newline
- get a work from default corpus with thumbnail navbar displayed left
- \htmladdnormallink{ http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi?dir=jorda_ponde_050_la_1533;step=thumb;ftype=thumbright }{ http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi?dir=jorda_ponde_050_la_1533;step=thumb;ftype=thumbright }
- \newline
- \newline
- get a work from default corpus with thumbnail navbar displayed right
- \htmladdnormallink{ http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi?dir=jorda_ponde_050_la_1533;step=textonly;corpus=;page=22 }{ http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi?dir=jorda_ponde_050_la_1533;step=textonly;corpus=;page=22 }
- \newline
- \newline
- get a page of text from a work from default corpus
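The standard queries listed above share one pattern: the toc.cgi base URL followed by `key=value` pairs separated by semicolons. A minimal Python sketch of that pattern follows; the helper name `toc_url` is hypothetical and is not part of toc.cgi, and only parameters that appear in the examples above are used.

```python
from urllib.parse import quote

BASE = "http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi"


def toc_url(**params):
    """Build a toc.cgi query URL with ';'-separated parameters.

    Hypothetical helper: it only mirrors the URL shape of the examples above.
    """
    if not params:
        return BASE
    query = ";".join(
        f"{quote(str(key))}={quote(str(value))}" for key, value in params.items()
    )
    return f"{BASE}?{query}"


if __name__ == "__main__":
    print(toc_url(step="corpus"))
    print(toc_url(dir="jorda_ponde_050_la_1533", step="thumb", ftype="thumbright"))
```

Keyword order is preserved, so the generated URLs list their parameters in the same order as the calls.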
- \subsubsection{Indexing}
- \label{sec:indexing}
- \paragraph{Status quo ECHO}
- Currently indexing is not implemented on the ECHO server.
- \paragraph{Plan ECHO}
- \begin{enumerate}
- \item construct remote (141.14.236.86) index for each file at
-   per-change or daily intervals
- \item store indices locally in
- archimedes/data/db/PROJECT_NAME/CORPUS_NAME/WORK
- \item 2 progs on server 1. cgi: indexer 2. backend da_remote
- \item 2 progs on client 1. cgi: sendindex 2. backend getindex
- \item indexing transaction handled by two cgi scripts, one on the
-   server, the other on the client [this is the first implementation because
-   it is the easiest and there are no port issues, but it would probably be
-   better to have a separate port].
- \item client cgi: getindex -- sends 1.  list of files to index
-   2. uri to which xml notification of completion is to be sent. Upon
-   notification, activates backend prog that fetches and installs the
-   indices.
- \item server cgi: indexer receives filelist and notification
-   address. Activates backend that fetches files, indexes, places
-   completed indexes in a networked location, then sends xml
-   notification back to client.
- \item single script provides backend access to indices
- \item leave front-end issues like display, collection and navigation
-   to web-design programmers. Do only a  sample for now.
- \end{enumerate}
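The transaction planned above (client sends a file list and a notification uri; server fetches, indexes, and notifies; client installs the indices) can be simulated in a few lines. The Python sketch below is an in-memory illustration only: all function names, the `notification` and `index` element names, and the word-count "index" are assumptions made for the example, and no network transport or real index format is implied.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def build_request(files, notify_uri):
    """Client side: the file list plus the uri to notify on completion."""
    return {"files": list(files), "notify_uri": notify_uri}


def index_text(text):
    """Toy index (lower-cased word -> frequency), a stand-in for the real indexer."""
    return Counter(text.lower().split())


def run_indexer(request, fetch):
    """Server side: fetch each file, index it, and build the xml notification."""
    indices = {path: index_text(fetch(path)) for path in request["files"]}
    note = ET.Element("notification", {"to": request["notify_uri"]})
    for path in indices:
        ET.SubElement(note, "index", {"file": path})
    return indices, ET.tostring(note, encoding="unicode")


def install_indices(notification_xml, indices, store):
    """Client backend: on notification, copy the announced indices into store."""
    for item in ET.fromstring(notification_xml).iter("index"):
        store[item.get("file")] = indices[item.get("file")]
    return store


if __name__ == "__main__":
    docs = {"a.xml": "de motu de", "b.xml": "pondus"}
    request = build_request(docs, "http://client.example/notify")
    built, note = run_indexer(request, docs.get)
    print(install_indices(note, built, {}))
```

Passing `fetch` as a parameter keeps the sketch independent of how files are actually retrieved; in the plan above that step is a network fetch performed by the server backend.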
- \subsubsection{Morphology}
- \label{sec:morphology}
- \subsubsection{Dictionary server}
- \label{sec:dictionary-server}
- \subsubsection{helper programs}
- \paragraph{addarch.pl ARCHIMEDES}
- Automatically registers new texts as toc.cgi objects when they appear in
- cvs. Automatically updates relevant morphological indices (slow!) each
- time a cvs update occurs. This program is called by a hook in the cvs
- ``loginfo'' configuration file.
- \paragraph{makelemma.pl ARCHIMEDES}
- Updates lemmatization indices.
- Parameters:
- No parameter--update all lemmatization indices
- [latin | ital | greek | en | nl | de]--  update this language
- \paragraph{makefast.pl ARCHIMEDES}
- Updates the toc.cgi morphology indices
- Parameters
- No parameter--update all lemmatization indices
- [latin | ital | greek | en | nl | de]--  update this language
- \subsubsection{summary of differences between the archimedes toc.cgi
-   implementation and the echo toc.cgi implementation (toc.x.cgi)}
- \paragraph{missing in archimedes}
  \begin{enumerate}
+ \item xml-rpc interface to 141.14.236.86 and implementations (archimedes/bin/getindex
- \item html templates (coded but phased out of cvs branch)
+   and archimedes/bin/make_indices)
- \end{enumerate}
+ \item search module archimedes/code/IncPerl/Archim/Toc/Search.pm and
+   implementation ( archimedes/web/cgi-bin/search/q1 )
- \paragraph{missing in echo}
- \begin{enumerate}
- \item word-coloring?
- \item remote text method may work differently
- \end{enumerate}
- \paragraph{differences}
- \begin{enumerate}
- \item structure of info.xml
- \item resource-discovery algorithm for info.xml
  \end{enumerate}
+ \paragraph
