File:  [Repository] / texttool-architecture / soft-cgi.tex
Revision 1.16: download - view: text, annotated - select for diffs - revision graph
Wed Mar 17 09:31:01 2004 UTC (20 years, 3 months ago) by bcfuchs
Branches: MAIN
CVS tags: HEAD
added stuff about POD format and Modules

    1: \subsubsection{rec.cgi (register text)}
    2: \label{sec:rec.cgi}
    3: 
    4: \paragraph
    5: On the ECHO server, the registration of new texts is implemented by
    6: means of a cgi script, reg.cgi
    7: (archimedes/web/cgi-bin/toc/admin/reg.cgi ). reg.cgi retrieves a
    8: metadata file  in MPIWG archive metadata format from the entered uri
    9: (currently only local paths are supported ) and constructs from this
   10: file a toc.cgi object file (see below) , which it writes to toc.cgi's
   11: data section. [corpus???] It should be stressed that this is a
   12: registration procedure developed for a particular implementation of
   13: toc.cgi and not a part of the core application. 
   14: 
   15: \paragraph
   16: reg.cgi takes two parameters, path and show.  Path should give the
   17: local path to the metadata file for the text that is being
   18: registered. If ``show'' is set to 1, reg.cgi will return for
   19: inspection the toc.cgi object file that it has built out of the
   20: submitted metadata file. 
   21: 
   22: \paragraph{input metadata file}
   23: 
   24: The input metadata file must have the following form
   25: 
   26: \paragraph
   27: 
   28: <resource>
   29:     ...
   30:     <meta>
   31:       <meta>
   32:                 <bib type=''Book''>
   33: 
   34: <title>Mainzer Untergerichtsordnung (von 1534)</title>
   35: <author>anon</author>
   36: <year>1580</year>
   37:         <texttool><display>yes</display>
   38: 	<image>pageimgtif</image>
   39: 	<text>/mpiwg/online/experimental/echo_DRQEdit_test/anon_Mainz_1580/fulltextDW/mainzugo02_utf8.xml</text>
   40: 	<pagebreak>pb</pagebreak><presentation>01-presentation/info.xml</presentation></texttool></meta>
   41: 
   42:     </meta>
   43: 
   44: \paragraph{archimedes object registration}
   45: 
   46: \subsubsection{toc.cgi (display text)}
   47: \label{sec:toc.cgi}
   48: 
   49: \paragraph{plan of this section }
   50: 
   51: \begin{enumeration}
   52: \item An overview of toc.cgi architecture
   53: \item A walk-through of typical cgi queries for toc.cgi
   54: \item An index of cgi parameters and values with short descriptions of function
   55: \item The TOC Perl modules
   56: \end{enumeration}
   57: 
   58: \paragraph{Overview of toc.cgi architecture}
   59: 
   60: \subparagraph{}
   61: toc.cgi is a perl script for displaying collections of xml texts and 
   62: linking them to related resources such as page-images, morphological
   63: analysis, commentaries, dictionaries, etc. It implements generic methods
   64: for resource-linking provided by a series of perl modules which are in
   65: turn based mainly on generic open-source tools for xml manipulation and networking
   66: written in C. 
   67: 
   68: \subparagraph{toc.cgi collections--Network transparency}
   69: Each of the collections in toc.cgi is a ``virtual'' collection, that
   70: is, a collection of links or uri's to resources that reside somewhere on an accessible
   71: network, local or remote.  
   72: 
   73: \subparagraph{toc.cgi collections--remote resources}
   74: 
   75: What is at the other end of the link is of no concern to toc.cgi, as
   76: long as the resource referenced by the link meets minimal toc.cgi
   77: requirements--how the resource is actually implemented and exposed is
   78: a matter for the resource provider. The link may, for instance, point
   79: directly to an xml text or it may point to a container which exposes a
   80: particular xml view of an underlying resource that is perhaps not in
   81: xml format at all. 
   82: 
   83: 
   84: \subparagraph{resource registry}
   85: 
   86: 
   87: 
   88: 
   89: \paragraph{cgi parameters -- standard queries}
   90: 
   91: \htmladdnormallink{ http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi?step=corpus }{ http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi?step=corpus }
   92: \newline
   93: \newline
   94: get a listing of corpora
   95: 
   96: 
   97: \htmladdnormallink{ http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi?step=xmlcorpusmanifest }{ http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi?step=xmlcorpusmanifest }
   98: \newline
   99: \newline
  100: get an xml listing of corpora 
  101: 
  102: 
  103: \htmladdnormallink{ http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi }{ http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi }
  104: \newline
  105: \newline
  106: get a listing of works in default corpus
  107: 
  108: \htmladdnormallink{ http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi?corpus=1 }{ http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi?corpus=1 }
  109: \newline
  110: \newline
  111: get a listing of works in corpus 1 [default corpus = 0]
  112: 
  113: \htmladdnormallink{ http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi?step=xmlcorpuslist }{ http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi?step=xmlcorpuslist }
  114: \newline
  115: \newline
  116: get an xml listing of works in default corpus 
  117: 
  118: \htmladdnormallink{ http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi?step=xmlcorpuslist;corpus=1 }{ http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi?step=xmlcorpuslist;corpus=1 }
  119: \newline
  120: \newline
  121: get an xml listing of works in corpus 1
  122: 
  123: \htmladdnormallink{ http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi?dir=baifl_renav_006_la_1537;step=thumb }{ http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi?dir=baifl_renav_006_la_1537;step=thumb }
  124: \newline
  125: \newline
  126: get a work from default corpus with thumbnail navbar displayed left
  127: 
  128: 
  129: \htmladdnormallink{ http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi?dir=jorda_ponde_050_la_1533;step=thumb;ftype=thumbright }{ http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi?dir=jorda_ponde_050_la_1533;step=thumb;ftype=thumbright }
  130: \newline
  131: \newline
  132: get a work from default corpus with thumbnail navbar displayed right
  133: 
  134: \htmladdnormallink{ http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi?dir=jorda_ponde_050_la_1533;step=textonly;corpus=;page=22 }{ http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi?dir=jorda_ponde_050_la_1533;step=textonly;corpus=;page=22 }
  135: \newline
  136: \newline
  137: get a page of text from a work from default corpus 
  138: 
  139: 
  140: \paragraph{TOC Perl Modules}
  141: \subparagraph{general}The documentation for the Toc Perl Modules is
  142: located in the modules themselves in POD format. The POD is the
  143: definitive documentation for the modules. 
  144: 
  145: The modules are available to archimedes staff from cvs on the archimedes server at
  146: 141.14.236.86:/perseus/cvsroot in the module
  147: /perseus/cvsroot/mpitexts/perl/perllib. To get them, log on to the
  148: archimedes server and use the commandline command: 
  149: 
  150:         cvs -d /perseus/cvsroot co /perseus/cvsroot/mpitexts/perl/perllib
  151: 
  152: or from a remote location
  153: 
  154:       bash; export CVS_RSH=ssh; cvs -d :ext:myusername@141.14.236.86:/perseus/cvsroot co /perseus/cvsroot/mpitexts/perl/perllib
  155: 
  156: \subsubsection{Indexing}
  157: \label{sec:indexing}
  158: 
  159: \paragraph{Status quo ECHO}
  160: Currently indexing is not implemented on the ECHO server.
  161: 
  162: \paragraph{Plan ECHO}
  163: 
  164: \begin{enumeration}
  165: \item construct remote (141.14.236.86) index for each file at
  166:   per-change or daily intervals
  167: \item store indices locally in
  168: archimedes/data/db/PROJECT_NAME/CORPUS_NAME/WORK
  169: \item 2 progs on server 1. cgi: indexer 2. backend da_remote
  170: \item 2 progs on client 1. cgi: sendindex 2. backend getindex
  171: \item indexing transaction handled by two cgi scripts, one on the
  172:   server the other on the client [this is the 1st implementation bcs
  173:   its easiest and there are no port issues, but probably it'd be
  174:   better to have a separate port]. 
  175: \item client cgi: getindex -- sends 1.  list of files to index
  176:   2. uri to which xml notification of completion is to be sent. Upon
  177:   notification, activates backend prog that fetches and installs the
  178:   indices.  
  179: \item server cgi: indexer receives filelist and notification
  180:   addess. Activates backend that fetches files, indexes, places
  181:   completed indexes in a networked location, then sends xml
  182:   notification back to client. 
  183: \item single script provides backend access to indices 
  184: \item leave front-end issues like display, collection and navigation
  185:   to web-design programmers. Do only a  sample for now. 
  186: \end{enumeration}
  187: 
  188: \subsubsection{Morphology}
  189: \label{sec:morphology}
  190: 
  191: 
  192: \subsubsection{Dictionary server}
  193: \label{sec:dictionary-server}
  194: 
  195: 
  196: \subsubsection{helper programs}
  197: 
  198: \paragraph{addarch.pl ARCHIMEDES} 
  199: 
  200: Automatically registers new texts as toc.cgi objects when they appear in
  201: cvs. Automatically updates relevant morphological indices (slow!) each
  202: time a cvs update occurs. This program is called by a hook in the cvs
  203: ``loginfo'' configuration file. 
  204: 
  205: 
  206: \paragraph{makelemma.pl ARCHIMEDES}
  207: 
  208: Updates lemmatization indices. 
  209: Parameters: 
  210: No parameter--update all lemmatization indices
  211: [latin | ital | greek | en | nl | de]--  update this language
  212: 
  213: \paragraph{makefast.pl ARCHIMEDES} 
  214: 
  215: Updates the toc.cgi morphology indices
  216: Parameters
  217: No parameter--update all lemmatization indices
  218: [latin | ital | greek | en | nl | de]--  update this language
  219: 
  220: \subsubsection{summary of differences btwn the archimedes toc.cgi
  221:   implementation and the echo toc.cgi impelementation (toc.x.cgi)}
  222: 
  223: \paragraph{missing in archimedes}
  224: \begin{enumeration}
  225: 
  226: \item html templates (coded but phased out of cvs branch)
  227: \end{enumeration}
  228: 
  229: \paragraph{missing in echo}
  230: \begin{enumeration}
  231: 
  232: \item word-coloring?
  233: \item remote text method may work differently
  234: 
  235: 
  236: 
  237: \end{enumeration}
  238: \paragraph{differences}
  239: \begin{enumeration}
  240: \item structure of info.xml
  241: \item resource-discovery algorithm for info.xml
  242: \end{enumeration}
  243: 
  244: 
  245: 
  246: %%% Local Variables: 
  247: %%% mode: latex
  248: %%% TeX-master: "texttools"
  249: %%% End: 
  250: 

FreeBSD-CVSweb <freebsd-cvsweb@FreeBSD.org>