annotate src/main/java/de/mpiwg/indexmeta/AnnotateIndexMeta.java @ 6:7a2a98655236

Some more changes. Class is now in a stable state.
author Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
date Fri, 12 Apr 2013 17:48:10 +0200
parents 8f6c4dab5d17
children
rev   line source
0
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
1 package de.mpiwg.indexmeta;
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
2 // import stuff
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
3 import java.io.File;
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
4 import java.io.IOException;
1
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
5 import java.util.ArrayList;
0
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
6 import java.util.Arrays;
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
7 import java.util.List;
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
8
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
9 import javax.xml.parsers.DocumentBuilder;
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
10 import javax.xml.parsers.DocumentBuilderFactory;
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
11 import javax.xml.parsers.ParserConfigurationException;
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
12 import javax.xml.transform.Transformer;
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
13 import javax.xml.transform.TransformerException;
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
14 import javax.xml.transform.TransformerFactory;
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
15 import javax.xml.transform.dom.DOMSource;
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
16 import javax.xml.transform.stream.StreamResult;
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
17
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
18 import org.w3c.dom.Attr;
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
19 import org.w3c.dom.Document;
1
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
20 import org.w3c.dom.Element;
0
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
21 import org.w3c.dom.NamedNodeMap;
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
22 import org.w3c.dom.Node;
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
23 import org.w3c.dom.NodeList;
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
24 import org.xml.sax.SAXException;
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
25
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
26 public class AnnotateIndexMeta {
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
27
1
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
28 public static void main(String argv[]) {
0
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
29 System.out.println("in main");
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
30
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
31 // method call
6
7a2a98655236 Some more changes. Class is now in a stable state.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 1
diff changeset
32 String filepath = "/Users/kthoden/eclipse/workspace/IndexMetaContextualizer/data/index.meta/index.meta_FQPFR8XP";
0
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
33 // this is a list of all the elements we want to contextualize
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
34 List<String> contextualizableList = Arrays.asList("author", "editor", "publisher", "city", "holding-library", "keywords");
1
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
35 try {xmlParse(filepath,contextualizableList);
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
36 }
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
37 catch (Exception e) {
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
38 e.printStackTrace();
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
39 }
0
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
40 System.out.println("Done");
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
41 }
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
42
1
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
43 /**
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
44 * Parses the XML file given as first argument and writes attributes in elements that are to be contextualized. These serve simply as markers for the next tools that are going to fetch these elements to put them in the database.
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
45 * @param filepath path to the file. It will also be used as the basis for the output file (this adds "-annot").
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
46 * @param contextualizableList contains the elements that shall be given a context identifier which is later used to grab the contents and put them into the database to have it contextualized.
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
47 * @throws Exception if the source index.meta file already contains context-id markers.
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
48 *
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
49 */
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
50 public static void xmlParse(String filepath, List<String> contextualizableList) throws Exception {
0
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
51 try {
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
52 // name of the output file
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
53 String outfilepath = filepath + "-annot";
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
54 // open the file and parse it
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
55 DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
56 DocumentBuilder docBuilder = docFactory.newDocumentBuilder();
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
57 Document doc = docBuilder.parse(filepath);
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
58
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
59 // iterate through the document
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
60 Integer count = 0;
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
61 for(String contextElement : contextualizableList){
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
62 NodeList nodeList = doc.getElementsByTagName(contextElement);
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
63 for(int i=0; i < nodeList.getLength(); i++){
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
64 Node iter2 = nodeList.item(i);
6
7a2a98655236 Some more changes. Class is now in a stable state.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 1
diff changeset
65 String currentNodeValue = iter2.getTextContent(); // unlike getFirstChild().getNodeValue(), does not fail on empty elements
0
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
66 NamedNodeMap attr = iter2.getAttributes();
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
67 // make a new attribute
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
68 if (attr.getNamedItem("context-id") == null){
1
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
69 Attr attribute = doc.createAttribute ("context-id");
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
70 attribute.setValue (count.toString());
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
71 attr.setNamedItem (attribute);
0
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
72 }
1
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
73 else {throw new Exception("There is already at least one context-id attribute in the source index.meta. This is not allowed.");
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
74 }
0
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
75 // Just for comfort. Print it out.
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
76 System.out.println(contextElement);
6
7a2a98655236 Some more changes. Class is now in a stable state.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 1
diff changeset
77 if ("author".equals(contextElement) || "editor".equals(contextElement)) {
7a2a98655236 Some more changes. Class is now in a stable state.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 1
diff changeset
78 checkExistingContext(doc, currentNodeValue);
1
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
79 }
0
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
80 count++;
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
81 }
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
82 // get the element by name (so they should be unique?)
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
83 //Node iter2 = doc.getElementsByTagName(contextElement).item(0);
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
84 }
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
85 // write the content into xml file
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
86 TransformerFactory transformerFactory = TransformerFactory.newInstance();
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
87 Transformer transformer = transformerFactory.newTransformer();
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
88 DOMSource source = new DOMSource(doc);
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
89 StreamResult result = new StreamResult(new File(outfilepath));
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
90 transformer.transform(source, result);
1
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
91 /*
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
92 * should these really go inside this method?
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
93 */
0
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
94 } catch (ParserConfigurationException pce) {
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
95 pce.printStackTrace();
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
96 } catch (TransformerException tfe) {
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
97 tfe.printStackTrace();
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
98 } catch (IOException ioe) {
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
99 ioe.printStackTrace();
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
100 } catch (SAXException sae) {
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
101 sae.printStackTrace();
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
102 }
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
103 }
1
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
104
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
105 /**
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
106 * Checks the current index.meta file for existing contextualizations. For example, newer generations of index.meta (as of 2013) already carry GND information for persons associated with the object in question.
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
107 * However, for the sake of backwards compatibility, the nearly deprecated "author" element still exists (as does "city", which is meant to be replaced by "place", which in turn might be superseded by "geo-location").
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
108 * Technically, we walk the person elements of the parsed XML and collect each matching person's name, role, and remote ID in a list.
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
109 * @param doc the parsed index.meta document
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
110 * @param currentNodeValue text of the author or editor element to look up among the person entries
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
111 */
6
7a2a98655236 Some more changes. Class is now in a stable state.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 1
diff changeset
112 public static void checkExistingContext(Document doc, String currentNodeValue) {
1
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
113 // first, define some variables
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
114 String nameOfPerson = "";
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
115 String roleOfPerson = "";
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
116 String idOfPerson= "";
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
117
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
118 // next, we try to see if there is already a contextualized author
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
119 // let us concentrate on that element
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
120 // then we look for tags called person
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
121 // if there are any, we take the liberty of querying them. This is a NodeList
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
122 NodeList personList = doc.getElementsByTagName("person");
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
123 // Debug information for the human eye.
6
7a2a98655236 Some more changes. Class is now in a stable state.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 1
diff changeset
124 // System.out.println("The current node value is "+ currentNodeValue + ". Let's do something useful in the checkExistingContext method.");
1
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
125 // System.out.println("This node list has " + personList.getLength() + " members: " + personList.item(0) + "and" + personList.item(1));
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
126 // Integer personCounter = 1;
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
127 // look at every element in the list of persons
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
128 for(int countPerson=0; countPerson < personList.getLength(); countPerson++){
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
129 // just some control
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
130 // System.out.println("This is person number " + personCounter);
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
131 // drill down a bit further. We now can access the person list
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
132 Node iterPerson = personList.item(countPerson);
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
133
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
134 // this here produces the role of a person
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
135 if (iterPerson instanceof Element) {
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
136 Element e = (Element)iterPerson;
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
137 roleOfPerson = e.getAttribute("role");
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
138 // System.out.println("Role: " + roleOfPerson);
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
139
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
140 // a name should also be attached, according to the index.meta specification. Can we trust that?
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
141 NodeList l0 = e.getElementsByTagName("name");
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
142 if(l0.getLength() > 0){
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
143 Node name = l0.item(0);
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
144 nameOfPerson = name.getFirstChild().getNodeValue();
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
145 // System.out.println("Name: " + nameOfPerson);
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
146 }
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
147
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
148 // and the identifier, this should be there, too. Maybe it's not...
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
149 NodeList l1 = e.getElementsByTagName("identifier");
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
150 if(l1.getLength() > 0){
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
151 Node name = l1.item(0);
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
152 idOfPerson = name.getFirstChild().getNodeValue();
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
153 //System.out.println("Identifier: " + idOfPerson);
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
154 }
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
155 // System.out.println("Current Node Value " + currentNodeValue + ". Name of Person " + nameOfPerson);
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
156 // now the final check and why we did all this:
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
157 if (nameOfPerson.equals(currentNodeValue)) {
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
158 ArrayList<String> authorInfo = new ArrayList<String>();
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
159 authorInfo.add(nameOfPerson);
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
160 authorInfo.add(roleOfPerson);
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
161 authorInfo.add(idOfPerson);
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
162
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
163 System.out.println("This person has already been contextualized: " + nameOfPerson + " has the role " + roleOfPerson + " and the identifier " + idOfPerson + ".");
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
164 }}
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
165 // personCounter ++;
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
166 }
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
167 }
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
168 }
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
169
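
The annotation loop in xmlParse can be tried without touching the file system. The following is only a minimal sketch under assumptions: the class name ContextIdSketch, the helpers parse and annotate, and the sample XML are hypothetical and not part of the repository. It numbers the listed elements with a running context-id, as the listing does, and rejects a document that already carries the attribute. Unlike the original, it reports that case with an unchecked IllegalStateException and returns the number of annotated elements.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical sketch; the class and method names are assumptions. */
class ContextIdSketch {

    /** Parses an XML string into a DOM document (illustrative helper). */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            // Wrap checked parser exceptions so callers need no throws clause.
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    /**
     * Adds a running context-id attribute to every element whose tag name
     * is in tagNames and returns the number of elements annotated.
     */
    static int annotate(Document doc, List<String> tagNames) {
        int count = 0;
        for (String tagName : tagNames) {
            NodeList nodes = doc.getElementsByTagName(tagName);
            for (int i = 0; i < nodes.getLength(); i++) {
                // getElementsByTagName only returns elements, so the cast is safe.
                Element element = (Element) nodes.item(i);
                if (element.hasAttribute("context-id")) {
                    throw new IllegalStateException(
                            "context-id already present on <" + tagName + ">");
                }
                element.setAttribute("context-id", Integer.toString(count));
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Made-up sample input, loosely modelled on the element names in the listing.
        Document doc = parse(
                "<meta><author>Galilei</author><editor/><city>Pisa</city></meta>");
        int annotated = annotate(doc, Arrays.asList("author", "editor", "city"));
        System.out.println("annotated elements: " + annotated);
    }
}
```

Element.setAttribute replaces the createAttribute/setNamedItem sequence of the listing and should behave the same for a plain attribute. Returning the count makes the result easy to check in a test.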