Changes between Version 5 and Version 6 of digi-tools-doku


Timestamp:
Jan 29, 2015, 1:32:26 PM (9 years ago)
Author:
Klaus Thoden
  • digi-tools-doku

    v5 v6  
     1= XML Workflow Tool =
     2
    13[[Image(dm2e.png)]]
    24
     
    68
    79[[Image(01_workflowTools.png, 40%, title="XML Workflow tools entry page", alt="Entry page")]]
     10
    811[http://ocropus2.rz-berlin.mpg.de:8080/digitizing-tools/pages/home.iface XML workflow tools]
     12
     13Input for a typical workflow is a transcription that was created following the [wiki:despecs Data Entry Specifications]. The work is divided into three phases: checking, creating well-formed XML, and creating valid XML. Each phase is described in detail below.
    914
    1015== Phase 1: Checking ==
    1116[[Image(02_checkTags.png, 40%, title="02_checkTags.png", alt="02_checkTags.png")]]
     17
     18The text document is uploaded via the webpage, and the scripts are generally started by clicking on the "Run" button.
     19
    1220[[Image(03_checkTagsOutput.png, 40%, title="03_checkTagsOutput.png", alt="03_checkTagsOutput.png")]]
     21
     22If there are errors, they are displayed in the "error" tab; other information is shown in the "console" tab.
     23
    1324[[Image(04_modificationInEditor.png, 40%, title="04_modificationInEditor.png", alt="04_modificationInEditor.png")]]
     25
     26Because the input file contains errors, the document has to be corrected locally in a text editor.
     27
    1428[[Image(05_reCheckTags.png, 40%, title="05_reCheckTags.png", alt="05_reCheckTags.png")]]
     29
     30After re-running the script, the text has passed the test and the next steps can be taken.
     31
    1532[[Image(06_nextWorkflowStep.png, 40%, title="06_nextWorkflowStep.png", alt="06_nextWorkflowStep.png")]]
     33
     34Below the output of the current script is a window that suggests the next meaningful script in the workflow. The output of the current script is used as input to the next one. Other scripts can be selected as well.
     35
    1636[[Image(07_findPagebreaks.png, 40%, title="07_findPagebreaks.png", alt="07_findPagebreaks.png")]]
     37
     38Checking the pagebreaks is important for synchronizing the digital facsimiles with the transcription.
     39
    1740[[Image(08_pagebreaksChecker.png, 40%, title="08_pagebreaksChecker.png", alt="08_pagebreaksChecker.png")]]
     41
    1842[[Image(09_pagebreakCheckerMore.png, 40%, title="09_pagebreakCheckerMore.png", alt="09_pagebreakCheckerMore.png")]]
     43
     44The webservice displays the pages with the first few lines of their content. The user now checks manually whether the text corresponds to what is seen on the digital facsimile. Links are provided to make this check easier.
     45
    1946[[Image(10_outputs.png, 40%, title="10_outputs.png", alt="10_outputs.png")]]
     47
     48In this case, the script is divided into two parts. The first part creates a configuration file which can be altered by the user (shown below). This file is then evaluated in the second part.
     49
    2050[[Image(11_pbConfigurationFile.png, 40%, title="11_pbConfigurationFile.png", alt="11_pbConfigurationFile.png")]]
    2151
     52The configuration file for the pagebreak script. The last entry should be removed because there is no corresponding pagebreak for that image in the transcription.
     53
     54[[Image(12_showDiff.png, 40%, title="12_showDiff.png", alt="12_showDiff.png")]]
     55
     56A useful feature is the Diff, which shows the effect of a script by displaying the current and the previous version side by side.
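A comparison of this kind can be approximated outside the web tool. The following is a minimal sketch using Python's standard `difflib` module; it produces a unified diff rather than a side-by-side view, and the sample lines, the `<pb>` notation, and the function name `changed_lines` are made up for illustration and are not taken from the tool itself.

```python
import difflib


def changed_lines(previous, current):
    """Return the unified diff between two text versions as a list of lines."""
    return list(difflib.unified_diff(
        previous.splitlines(),
        current.splitlines(),
        fromfile="previous",
        tofile="current",
        lineterm="",
    ))


# Hypothetical sample input; the real tool works on the uploaded transcription.
previous = "<pb>\nFirst line of page one\n<pb>\nFirst line of page two"
current = "<pb> 0001.jpg\nFirst line of page one\n<pb> 0002.jpg\nFirst line of page two"

for line in changed_lines(previous, current):
    print(line)
```

Lines starting with `-` come from the previous version and lines starting with `+` from the current one; two identical inputs yield an empty list.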
     57
     58[[Image(13_DiffTool.png, 40%, title="13_DiffTool.png", alt="13_DiffTool.png")]]
     59
     60The Diff tool showing the beginning of the text document after application of the pagebreak script. Green lines have been altered: the filename has been written after each pagebreak pseudo-tag (pseudo, because this is not really XML yet).
     61
     62[[Image(14_unknownCharacterWarning.png, 40%, title="14_unknownCharacterWarning.png", alt="14_unknownCharacterWarning.png")]]
     63
     64Another preparatory step is the treatment of unknown characters. Characters that were not recognized during data entry were assigned a code and collected in a list together with a screenshot of each character.
     65
     66[[Image(15_unknownCharacterFile.png, 40%, title="15_unknownCharacterFile.png", alt="15_unknownCharacterFile.png")]]
     67
     68A configuration file maps each of these codes to its corresponding Unicode character. This file is evaluated in the next step, where the replacements take place.
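The replacement step can be sketched as follows. This is only an illustration: the code notation (`{unknown01}`), the mapping, and the function name `replace_unknown_characters` are hypothetical, since the actual format of the configuration file is not described here.

```python
def replace_unknown_characters(text, replacements):
    """Replace every placeholder code in text with its Unicode character."""
    for code, character in replacements.items():
        text = text.replace(code, character)
    return text


# Hypothetical mapping; in the workflow it would come from the configuration file.
replacements = {
    "{unknown01}": "\u0153",  # LATIN SMALL LIGATURE OE
    "{unknown02}": "\u017F",  # LATIN SMALL LETTER LONG S
}

print(replace_unknown_characters("c{unknown01}ur, pa{unknown02}so", replacements))
```

Text that contains none of the codes is returned unchanged, so the step can safely be re-run.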
     69
     70[[Image(16_unknownCharacterOutput.png, 40%, title="16_unknownCharacterOutput.png", alt="16_unknownCharacterOutput.png")]]
     71
    2272== Phase 2: Creating well-formed XML ==
    23 [[Image(12_showDiff.png, 40%, title="12_showDiff.png", alt="12_showDiff.png")]]
    24 [[Image(13_DiffTool.png, 40%, title="13_DiffTool.png", alt="13_DiffTool.png")]]
    25 [[Image(14_unknownCharacterWarning.png, 40%, title="14_unknownCharacterWarning.png", alt="14_unknownCharacterWarning.png")]]
    26 [[Image(15_unknownCharacterFile.png, 40%, title="15_unknownCharacterFile.png", alt="15_unknownCharacterFile.png")]]
    27 [[Image(16_unknownCharacterOutput.png, 40%, title="16_unknownCharacterOutput.png", alt="16_unknownCharacterOutput.png")]]
    28 [[Image(17_nextSteps.png, 40%, title="17_nextSteps.png", alt="17_nextSteps.png")]]
     73These important preparatory steps are followed by the conversion to a well-formed XML document. This involves additional replacements and the resolution of shorthands that were used during data entry.
     74
    2975[[Image(18_helpText.png, 40%, title="18_helpText.png", alt="18_helpText.png")]]
     76
     77The built-in help describes the functionality of each script.
     78
    3079[[Image(19_stringInputForXML.png, 40%, title="19_stringInputForXML.png", alt="19_stringInputForXML.png")]]
     80
     81One of the final steps is the insertion of metadata that already reside in the system. For that reason, the identifier of the document has to be entered at this point.
     82
    3183[[Image(20_wellformedXML.png, 40%, title="20_wellformedXML.png", alt="20_wellformedXML.png")]]
     84
     85The first lines of the XML, displayed in the browser.
     86
    3287[[Image(21_testWellformedness.png, 40%, title="21_testWellformedness.png", alt="21_testWellformedness.png")]]
    3388[[Image(22_XMLWellformed.png, 40%, title="22_XMLWellformed.png", alt="22_XMLWellformed.png")]]
    3489
     90A script checks whether the XML is well-formed.
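For a local check before uploading, a well-formedness test can be sketched with Python's standard `xml.etree.ElementTree` parser. This is an assumption about how such a check might look, not the script used by the workflow tool; the function name `is_well_formed` is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return (True, "") if xml_text parses, else (False, parser message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as error:
        # The message contains the line and column of the first problem.
        return False, str(error)
    return True, ""


print(is_well_formed("<text><p>ok</p></text>"))
print(is_well_formed("<text><p>missing end tag</text>"))
```

The parser stops at the first error, so a document with several problems has to be corrected and re-checked repeatedly, much like the tag check in Phase 1.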
     91
    3592== Phase 3: Creating valid XML ==
     93A well-formed XML document is not necessarily valid against an XML schema. A few more steps are needed, and these are carried out in the third phase.
     94
    3695[[Image(23_moveFloatsDiff.png, 40%, title="23_moveFloatsDiff.png", alt="23_moveFloatsDiff.png")]]
    37 [[Image(24_insertLineBreaks.png, 40%, title="24_insertLineBreaks.png", alt="24_insertLineBreaks.png")]]
     96
     97Floating elements such as notes and images are moved away from their original positions and replaced by an anchor. The diff shows the effect of this.
     98
    3899[[Image(25_divStructure.png, 40%, title="25_divStructure.png", alt="25_divStructure.png")]]
    39 [[Image(26_testValidity.png, 40%, title="26_testValidity.png", alt="26_testValidity.png")]]
     100
     101Later on, a div structure is added, which is also used for creating a table of contents.
     102
    40103[[Image(27_XMLValid.png, 40%, title="27_XMLValid.png", alt="27_XMLValid.png")]]
    41104
     105In the final stage, the document is tested for validity against a schema. In this case, the document is valid. In some cases, the document has to be edited locally to pass this test.
    42106== Extras ==
     107[[Image(28_whatshallwedonow.png, 40%, title="28_whatshallwedonow.png", alt="28_whatshallwedonow.png")]]
     108
     109Further extras, available in the bottom window, can be applied to the XML document. For example, mathematical formulas written in LaTeX can be converted to MathML here.
    43110
    44111== Upload ==
    45 [[Image(28_whatshallwedonow.png, 40%, title="28_whatshallwedonow.png", alt="28_whatshallwedonow.png")]]
    46112[[Image(29_uploadSandbox.png, 40%, title="29_uploadSandbox.png", alt="29_uploadSandbox.png")]]
     113
     114A valid XML file can then be uploaded to the Sandbox for further checking.
     115
    47116[[Image(30_indexmeta.png, 40%, title="30_indexmeta.png", alt="30_indexmeta.png")]]
     117
     118In addition, the file containing the metadata of the work has to be extended with the path and name of the XML document so that it is also [http://echo.mpiwg-berlin.mpg.de/ECHOdocuView?url=/permanent/library/UR271U6Y/ displayed in the ECHO display environment].
     119
    48120[[Image(31_operationStatus.png, 40%, title="31_operationStatus.png", alt="31_operationStatus.png")]]
    49121
    50 == Output =
     122The operation status shows that the text has been uploaded successfully. During that process, the text is also analysed morphologically and connected to various dictionaries available in the system.
     123
     124== Output ==
    51125[[Image(32_resultPollux.png, 40%, title="32_resultPollux.png", alt="32_resultPollux.png")]]
    52 [[Image(33_resultPubby.png, 40%, title="33_resultPubby.png", alt="33_resultPubby.png")]]
    53 [[Image(34_resultEuropeana.png, 40%, title="34_resultEuropeana.png", alt="34_resultEuropeana.png")]]
    54126
     127[http://mpdl-system.mpiwg-berlin.mpg.de:30060/mpdl/page-query-result.xql?document=%2Fecho%2Fit%2FZonca_1656_UR271U6Y.xml&mode=textPollux&pn=20&query-type=&query=&query-result-pn=0 The text being displayed in the sandbox]. Maroon-coloured words can be clicked to show the corresponding dictionary entries.
    55128
     129[[Image(33_wordInfo.png, 40%, title="33_wordInfo.png", alt="33_wordInfo.png")]]
    56130
     131[http://mpdl-service.mpiwg-berlin.mpg.de/mpiwg-mpdl-lt-web/lt/GetDictionaryEntries?query=legnami&queryDisplay=legnami&language=it&outputFormat=html&outputType=morphCompact&outputType=dictFull Word information]
     132
     133[[Image(35_resultPubby.png, 40%, title="35_resultPubby.png", alt="35_resultPubby.png")]]
     134
     135[http://data.dm2e.eu/data/html/resourcemap/mpiwg/rara/MPIWG_UK75HAUV/20141001101216473 After ingestion into the DM2E triple store], the results can be easily browsed.
     136
     137[[Image(36_resultEuropeana.png, 40%, title="36_resultEuropeana.png", alt="36_resultEuropeana.png")]]
     138
     139The source has also been delivered to [http://europeana.eu/portal/record/2048607/data_item_mpiwg_rara_MPIWG_UK75HAUV.html Europeana].