Annotation of texttool-architecture/soft-cgi.tex, revision 1.16

\subsubsection{reg.cgi (register text)}
\label{sec:rec.cgi}

\paragraph{}
On the ECHO server, new texts are registered by a CGI script, reg.cgi
(archimedes/web/cgi-bin/toc/admin/reg.cgi). reg.cgi retrieves a
metadata file in MPIWG archive metadata format from the URI that was
entered (currently only local paths are supported), constructs a
toc.cgi object file from it (see below), and writes that object file
to toc.cgi's data section. [corpus???] Note that this registration
procedure was developed for one particular implementation of toc.cgi
and is not part of the core application.

\paragraph{}
reg.cgi takes two parameters, path and show. path must give the
local path to the metadata file of the text being registered. If
show is set to 1, reg.cgi returns the toc.cgi object file that it
built from the submitted metadata file, so that it can be inspected.
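
As a rough illustration of the two parameters, the following Python sketch assembles a reg.cgi request URL. The host name echo.example.org, the sample metadata path, and the helper \texttt{registration\_url} are assumptions made for this example; only the parameter names path and show are taken from the description above.

```python
from urllib.parse import urlencode

# Hypothetical base URL: the real host serving reg.cgi is not stated here.
REG_CGI = "http://echo.example.org/cgi-bin/toc/admin/reg.cgi"


def registration_url(metadata_path, show=False):
    """Build a reg.cgi request URL from the two documented parameters."""
    params = {"path": metadata_path}
    if show:
        # show=1 asks reg.cgi to return the generated object file for inspection.
        params["show"] = "1"
    return REG_CGI + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical local path to a metadata file.
    print(registration_url("/tmp/index.meta", show=True))
```

The sketch only builds the URL; it does not contact a server.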

\paragraph{input metadata file}

The input metadata file must have the following form:

\begin{verbatim}
<resource>
    ...
    <meta>
      <meta>
        <bib type="Book">
          <title>Mainzer Untergerichtsordnung (von 1534)</title>
          <author>anon</author>
          <year>1580</year>
        </bib>
        <texttool>
          <display>yes</display>
          <image>pageimgtif</image>
          <text>/mpiwg/online/experimental/echo_DRQEdit_test/anon_Mainz_1580/fulltextDW/mainzugo02_utf8.xml</text>
          <pagebreak>pb</pagebreak>
          <presentation>01-presentation/info.xml</presentation>
        </texttool>
      </meta>
    </meta>
</resource>
\end{verbatim}
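
To make the role of the texttool block more concrete, here is a minimal Python sketch that reads it from a metadata file. It is not how reg.cgi itself works; the function \texttt{read\_texttool} is hypothetical, and the embedded sample is a shortened variant of the file above that assumes the bib element is closed before texttool begins.

```python
import xml.etree.ElementTree as ET

# Shortened, well-formed variant of the sample metadata file.
# Assumption: <bib> is closed before <texttool>, and <resource> is closed.
SAMPLE = """<resource>
  <meta>
    <bib type="Book">
      <title>Mainzer Untergerichtsordnung (von 1534)</title>
      <author>anon</author>
      <year>1580</year>
    </bib>
    <texttool>
      <display>yes</display>
      <image>pageimgtif</image>
      <pagebreak>pb</pagebreak>
      <presentation>01-presentation/info.xml</presentation>
    </texttool>
  </meta>
</resource>"""


def read_texttool(xml_text):
    """Return the children of the first <texttool> element as a dict."""
    root = ET.fromstring(xml_text)
    tool = root.find(".//texttool")
    if tool is None:
        raise ValueError("metadata file has no <texttool> element")
    # Map each child tag (display, image, ...) to its stripped text content.
    return {child.tag: (child.text or "").strip() for child in tool}


if __name__ == "__main__":
    for key, value in read_texttool(SAMPLE).items():
        print(key, "=", value)
```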

\paragraph{archimedes object registration}

\subsubsection{toc.cgi (display text)}
\label{sec:toc.cgi}

\paragraph{plan of this section}

\begin{enumerate}
\item An overview of the toc.cgi architecture
\item A walk-through of typical CGI queries for toc.cgi
\item An index of CGI parameters and values, with short descriptions of their function
\item The TOC Perl modules
\end{enumerate}

\paragraph{Overview of toc.cgi architecture}

\subparagraph{}
toc.cgi is a Perl script for displaying collections of XML texts and
linking them to related resources such as page images, morphological
analyses, commentaries, and dictionaries. It implements generic methods
for resource linking that are provided by a series of Perl modules,
which in turn are based mainly on generic open-source tools for XML
manipulation and networking written in C.

\subparagraph{toc.cgi collections--Network transparency}
Each collection in toc.cgi is a ``virtual'' collection, that
is, a collection of links (URIs) to resources that reside somewhere
on an accessible network, local or remote.

\subparagraph{toc.cgi collections--remote resources}

What lies at the other end of a link is of no concern to toc.cgi, as
long as the resource referenced by the link meets the minimal toc.cgi
requirements; how the resource is actually implemented and exposed is
a matter for the resource provider. The link may, for instance, point
directly to an XML text, or it may point to a container that exposes a
particular XML view of an underlying resource that is perhaps not in
XML format at all.


\subparagraph{resource registry}

\paragraph{cgi parameters -- standard queries}

\htmladdnormallink{http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi?step=corpus}{http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi?step=corpus}
\newline
\newline
Get a listing of corpora.


\htmladdnormallink{http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi?step=xmlcorpusmanifest}{http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi?step=xmlcorpusmanifest}
\newline
\newline
Get an XML listing of corpora.


\htmladdnormallink{http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi}{http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi}
\newline
\newline
Get a listing of works in the default corpus.

\htmladdnormallink{http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi?corpus=1}{http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi?corpus=1}
\newline
\newline
Get a listing of works in corpus 1 (the default corpus is 0).

\htmladdnormallink{http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi?step=xmlcorpuslist}{http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi?step=xmlcorpuslist}
\newline
\newline
Get an XML listing of works in the default corpus.

\htmladdnormallink{http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi?step=xmlcorpuslist;corpus=1}{http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi?step=xmlcorpuslist;corpus=1}
\newline
\newline
Get an XML listing of works in corpus 1.

\htmladdnormallink{http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi?dir=baifl_renav_006_la_1537;step=thumb}{http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi?dir=baifl_renav_006_la_1537;step=thumb}
\newline
\newline
Get a work from the default corpus, with the thumbnail navbar displayed on the left.


\htmladdnormallink{http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi?dir=jorda_ponde_050_la_1533;step=thumb;ftype=thumbright}{http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi?dir=jorda_ponde_050_la_1533;step=thumb;ftype=thumbright}
\newline
\newline
Get a work from the default corpus, with the thumbnail navbar displayed on the right.

\htmladdnormallink{http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi?dir=jorda_ponde_050_la_1533;step=textonly;corpus=;page=22}{http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi?dir=jorda_ponde_050_la_1533;step=textonly;corpus=;page=22}
\newline
\newline
Get a single page of text (here page 22) from a work in the default corpus.
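
The queries listed above all follow the same pattern: the toc.cgi address, a question mark, and name=value pairs joined by semicolons. The Python sketch below assembles such URLs. The helper \texttt{toc\_url} is hypothetical and only mirrors the URL shape of the examples; it makes no claim about which parameter combinations toc.cgi accepts.

```python
from urllib.parse import quote

# Address taken from the example queries in this section.
TOC_CGI = "http://archimedes.mpiwg-berlin.mpg.de/cgi-bin/toc/toc.cgi"


def toc_url(**params):
    """Join name=value pairs with ';' as in the example queries."""
    if not params:
        # No parameters: listing of works in the default corpus.
        return TOC_CGI
    query = ";".join(
        quote(str(name)) + "=" + quote(str(value))
        for name, value in params.items()
    )
    return TOC_CGI + "?" + query


if __name__ == "__main__":
    print(toc_url(step="corpus"))
    print(toc_url(step="xmlcorpuslist", corpus=1))
    print(toc_url(dir="jorda_ponde_050_la_1533", step="thumb", ftype="thumbright"))
```

Keyword arguments keep their order, so the parameters appear in the URL in the order in which they are passed.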


\paragraph{TOC Perl Modules}
\subparagraph{general}
The documentation for the TOC Perl modules is located in the modules
themselves, in POD format. The POD is the definitive documentation
for the modules.

The modules are available to archimedes staff from CVS on the archimedes server at
141.14.236.86:/perseus/cvsroot, in the module
/perseus/cvsroot/mpitexts/perl/perllib. To get them, log on to the
archimedes server and run:

\begin{verbatim}
cvs -d /perseus/cvsroot co /perseus/cvsroot/mpitexts/perl/perllib
\end{verbatim}

or, from a remote location, in a bash shell:

\begin{verbatim}
export CVS_RSH=ssh
cvs -d :ext:myusername@141.14.236.86:/perseus/cvsroot co /perseus/cvsroot/mpitexts/perl/perllib
\end{verbatim}

\subsubsection{Indexing}
\label{sec:indexing}

\paragraph{Status quo ECHO}
Indexing is currently not implemented on the ECHO server.

\paragraph{Plan ECHO}

\begin{enumerate}
\item Construct a remote (141.14.236.86) index for each file, either on
  every change or at daily intervals.
\item Store the indices locally in
  archimedes/data/db/PROJECT\_NAME/CORPUS\_NAME/WORK.
\item Two programs on the server: (1) the CGI script indexer; (2) the backend da\_remote.
\item Two programs on the client: (1) the CGI script sendindex; (2) the backend getindex.
\item The indexing transaction is handled by two CGI scripts, one on the
  server and the other on the client. [This is the first implementation because
  it is the easiest and there are no port issues, but it would probably be
  better to have a separate port.]
\item Client CGI: getindex sends (1) the list of files to index and
  (2) the URI to which the XML notification of completion is to be sent. Upon
  notification, it activates the backend program that fetches and installs the
  indices.
\item Server CGI: indexer receives the file list and the notification
  address. It activates the backend that fetches the files, indexes them, places
  the completed indices in a networked location, and then sends the XML
  notification back to the client.
\item A single script provides backend access to the indices.
\item Leave front-end issues such as display, collection, and navigation
  to the web-design programmers. Do only a sample for now.
\end{enumerate}
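
The planned exchange between client and server can be sketched in Python as follows. Since indexing is not yet implemented, everything concrete here is an assumption: the function names, the request layout, and the XML element names (indexing-done, location, file) are invented for illustration, and no network traffic takes place.

```python
import xml.etree.ElementTree as ET


def build_index_request(files, notify_uri):
    """Client side: the file list plus the URI to notify on completion."""
    return {"files": list(files), "notify": notify_uri}


def build_notification(request, index_location):
    """Server side: XML notification sent once the indices are in place.

    The element names are hypothetical; the plan fixes no format.
    """
    root = ET.Element("indexing-done")
    ET.SubElement(root, "location").text = index_location
    for name in request["files"]:
        ET.SubElement(root, "file").text = name
    return ET.tostring(root, encoding="unicode")


def parse_notification(xml_text):
    """Client side: recover where to fetch the indices and for which files."""
    root = ET.fromstring(xml_text)
    return root.findtext("location"), [f.text for f in root.findall("file")]


if __name__ == "__main__":
    # Hypothetical URIs and file names.
    request = build_index_request(
        ["work1.xml", "work2.xml"],
        "http://client.example.org/cgi-bin/getindex",
    )
    note = build_notification(request, "http://server.example.org/indices/")
    print(parse_notification(note))
```

In a real deployment the client backend would fetch the indices from the reported location after parsing the notification and install them in the local index directory.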

\subsubsection{Morphology}
\label{sec:morphology}


\subsubsection{Dictionary server}
\label{sec:dictionary-server}


\subsubsection{helper programs}

\paragraph{addarch.pl ARCHIMEDES}

Automatically registers new texts as toc.cgi objects when they appear in
CVS, and automatically updates the relevant morphological indices (slow!) each
time a CVS update occurs. This program is called by a hook in the CVS
``loginfo'' configuration file.


\paragraph{makelemma.pl ARCHIMEDES}

Updates the lemmatization indices.
\newline
Parameters:
\newline
No parameter: update all lemmatization indices.
\newline
\texttt{[latin | ital | greek | en | nl | de]}: update only the indices for this language.

\paragraph{makefast.pl ARCHIMEDES}

Updates the toc.cgi morphology indices.
\newline
Parameters:
\newline
No parameter: update all morphology indices.
\newline
\texttt{[latin | ital | greek | en | nl | de]}: update only the indices for this language.

\subsubsection{summary of differences between the archimedes toc.cgi
  implementation and the echo toc.cgi implementation (toc.x.cgi)}

\paragraph{missing in archimedes}
\begin{enumerate}

\item HTML templates (coded but phased out of the CVS branch)
\end{enumerate}

\paragraph{missing in echo}
\begin{enumerate}

\item word-coloring?
\item remote text method may work differently

\end{enumerate}
\paragraph{differences}
\begin{enumerate}
\item structure of info.xml
\item resource-discovery algorithm for info.xml
\end{enumerate}



%%% Local Variables: 
%%% mode: latex
%%% TeX-master: "texttools"
%%% End: 
