annotate src/main/java/de/mpiwg/indexmeta/AnnotateIndexMeta.java @ 7:bc57f2660b0f

implementation of web service
author Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
date Fri, 12 Apr 2013 17:48:42 +0200
parents 8f6c4dab5d17
children 9ce7979fd037
rev   line source
0
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
1 package de.mpiwg.indexmeta;
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
2 // import stuff
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
3 import java.io.File;
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
4 import java.io.IOException;
1
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
5 import java.util.ArrayList;
0
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
6 import java.util.Arrays;
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
7 import java.util.List;
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
8
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
9 import javax.xml.parsers.DocumentBuilder;
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
10 import javax.xml.parsers.DocumentBuilderFactory;
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
11 import javax.xml.parsers.ParserConfigurationException;
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
12 import javax.xml.transform.Transformer;
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
13 import javax.xml.transform.TransformerException;
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
14 import javax.xml.transform.TransformerFactory;
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
15 import javax.xml.transform.dom.DOMSource;
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
16 import javax.xml.transform.stream.StreamResult;
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
17
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
18 import org.w3c.dom.Attr;
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
19 import org.w3c.dom.Document;
1
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
20 import org.w3c.dom.Element;
0
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
21 import org.w3c.dom.NamedNodeMap;
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
22 import org.w3c.dom.Node;
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
23 import org.w3c.dom.NodeList;
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
24 import org.xml.sax.SAXException;
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
25
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
26 public class AnnotateIndexMeta {
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
27
1
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
28 public static void main(String[] argv) {
0
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
29 System.out.println("in main");
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
30
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
31 // method call
1
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
32 String filepath = "/Users/kthoden/eclipse/workspace/IndexMetaContextualization/data/index.meta/index.meta_FQPFR8XP";
0
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
33 // this is a list of all the elements we want to contextualize
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
34 List<String> contextualizableList = Arrays.asList("author", "editor", "publisher", "city", "holding-library", "keywords");
1
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
35 try {xmlParse(filepath,contextualizableList);
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
36 }
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
37 catch (Exception e) {
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
38 e.printStackTrace();
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
39 }
0
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
40 System.out.println("Done");
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
41 }
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
42
1
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
43 /**
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
44 * Parses the XML file given as first argument and adds a context-id attribute to each element that is to be contextualized. These attributes serve only as markers for the downstream tools that fetch the elements and store them in the database.
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
45 * @param filepath path to the input file. The output file name is derived from it by appending "-annot".
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
46 * @param contextualizableList names of the elements that receive a context identifier, which is later used to fetch their contents and store them in the database for contextualization.
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
47 * @throws Exception if the source index.meta file already contains markers for contextualization.
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
48 *
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
49 */
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
50 public static void xmlParse(String filepath, List<String> contextualizableList) throws Exception {
0
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
51 try {
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
52 // name of the output file
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
53 String outfilepath = filepath + "-annot";
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
54 // open the file and parse it
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
55 DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
56 DocumentBuilder docBuilder = docFactory.newDocumentBuilder();
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
57 Document doc = docBuilder.parse(filepath);
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
58
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
59 // iterate through the document
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
60 Integer count = 0;
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
61 for(String contextElement : contextualizableList){
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
62 NodeList nodeList = doc.getElementsByTagName(contextElement);
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
63 for(int i=0; i < nodeList.getLength(); i++){
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
64 Node iter2 = nodeList.item(i);
7
bc57f2660b0f implementation of web service
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents: 1
diff changeset
65 String currentNodeValue = null;
bc57f2660b0f implementation of web service
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents: 1
diff changeset
66 //String currentNodeValue = iter2.getTextContent();
0
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
67 NamedNodeMap attr = iter2.getAttributes();
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
68 // make a new attribute
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
69 if (attr.getNamedItem("context-id") == null){
1
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
70 Attr attribute = doc.createAttribute ("context-id");
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
71 attribute.setValue (count.toString());
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
72 attr.setNamedItem (attribute);
0
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
73 }
1
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
74 else {throw new Exception("There is already at least one context-id attribute in the source index.meta. This is not allowed.");
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
75 }
0
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
76 // Just for comfort. Print it out.
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
77 System.out.println(contextElement);
1
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
78 if ("author".equals(contextElement)) { // compare string contents, not object references
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
79 findContext(doc, currentNodeValue);
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
80 }
0
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
81 count++;
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
82 }
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
83 // get the element by name (so they should be unique?)
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
84 //Node iter2 = doc.getElementsByTagName(contextElement).item(0);
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
85 }
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
86 // write the content into xml file
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
87 TransformerFactory transformerFactory = TransformerFactory.newInstance();
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
88 Transformer transformer = transformerFactory.newTransformer();
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
89 DOMSource source = new DOMSource(doc);
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
90 StreamResult result = new StreamResult(new File(outfilepath));
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
91 transformer.transform(source, result);
1
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
92 /*
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
93 * should these really go inside this method?
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
94 */
0
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
95 } catch (ParserConfigurationException pce) {
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
96 pce.printStackTrace();
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
97 } catch (TransformerException tfe) {
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
98 tfe.printStackTrace();
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
99 } catch (IOException ioe) {
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
100 ioe.printStackTrace();
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
101 } catch (SAXException sae) {
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
102 sae.printStackTrace();
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
103 }
dfce13a5f5f9 nit project!
Jorge Urzua <jurzua@mpiwg-berlin.mpg.de>
parents:
diff changeset
104 }
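The annotation step of xmlParse can be tried without touching the file system. The following is a minimal sketch, not the project's code: the class `ContextIdSketch`, its helpers `parse` and `annotate`, and the sample XML are hypothetical names chosen for illustration. It assumes the same rule as above, namely that each listed element gets a running `context-id` and that an already present `context-id` is an error. It throws an unchecked exception instead of a plain `Exception`.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ContextIdSketch {

    /** Parses an XML string into a DOM document (hypothetical helper). */
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    /**
     * Adds a running "context-id" attribute to every element whose tag name
     * is listed in tagNames and returns the number of annotated elements.
     */
    public static int annotate(Document doc, List<String> tagNames) {
        int count = 0;
        for (String tagName : tagNames) {
            // getElementsByTagName only returns elements, so the cast is safe
            NodeList nodes = doc.getElementsByTagName(tagName);
            for (int i = 0; i < nodes.getLength(); i++) {
                Element element = (Element) nodes.item(i);
                if (element.hasAttribute("context-id")) {
                    throw new IllegalStateException(
                            "context-id already present on <" + tagName + ">");
                }
                element.setAttribute("context-id", Integer.toString(count));
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // made-up sample input, not a real index.meta file
        Document doc = parse("<resource><author>A</author><city>B</city></resource>");
        int annotated = annotate(doc, Arrays.asList("author", "city"));
        System.out.println(annotated + " elements annotated");
    }
}
```

Because identifiers are handed out per tag name in list order, all `author` elements are numbered before any `city` element, regardless of where they occur in the document.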
1
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
105
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
106 /**
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
107 * Checks the current index.meta file for already existing contextualizations. For example, newer generations of index.meta (as of 2013) already carry GND information for persons associated with the object in question.
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
108 * However, for the sake of backwards compatibility, the nearly deprecated "author" element is still present (as is "city", which is meant to be replaced by "place", which in turn might be superseded by "geo-location").
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
109 * Technically, we walk through the parsed XML and collect each matching person's name, role and remote ID in a list.
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
110 * @param doc the parsed index.meta document
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
111 * @param currentNodeValue text content of the author element that is compared against the person names
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
112 */
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
113 public static void findContext(Document doc, String currentNodeValue) {
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
114 // first, define some variables
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
115 String nameOfPerson = "";
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
116 String roleOfPerson = "";
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
117 String idOfPerson= "";
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
118
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
119 // next, we try to see if there is already a contextualized author
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
120 // let us concentrate on that element
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
121 // then we look for tags called person
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
122 // if there are any, we take the liberty of querying them. This is a NodeList
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
123 NodeList personList = doc.getElementsByTagName("person");
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
124 // Debug information for the human eye.
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
125 // System.out.println("The current node value is "+ currentNodeValue + ". Let's do something useful in the findContext method.");
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
126 // System.out.println("This node list has " + personList.getLength() + " members: " + personList.item(0) + "and" + personList.item(1));
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
127 // Integer personCounter = 1;
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
128 // look at every element in the list of persons
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
129 for(int countPerson=0; countPerson < personList.getLength(); countPerson++){
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
130 // just some control
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
131 // System.out.println("This is person number " + personCounter);
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
132 // drill down a bit further. We now can access the person list
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
133 Node iterPerson = personList.item(countPerson);
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
134
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
135 // this here produces the role of a person
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
136 if (iterPerson instanceof Element) {
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
137 Element e = (Element)iterPerson;
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
138 roleOfPerson = e.getAttribute("role");
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
139 // System.out.println("Role: " + roleOfPerson);
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
140
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
141 // there will also be a name attached. It is so written in the index.meta specification. Can we trust that?
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
142 NodeList l0 = e.getElementsByTagName("name");
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
143 if(l0.getLength() > 0){
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
144 Node name = l0.item(0);
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
145 nameOfPerson = name.getFirstChild().getNodeValue();
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
146 // System.out.println("Name: " + nameOfPerson);
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
147 }
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
148
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
149 // and the identifier, this should be there, too. Maybe it's not...
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
150 NodeList l1 = e.getElementsByTagName("identifier");
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
151 if(l1.getLength() > 0){
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
152 Node name = l1.item(0);
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
153 idOfPerson = name.getFirstChild().getNodeValue();
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
154 //System.out.println("Identifier: " + idOfPerson);
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
155 }
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
156 // System.out.println("Current Node Value " + currentNodeValue + ". Name of Person " + nameOfPerson);
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
157 // now the final check and why we did all this:
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
158 if (nameOfPerson.equals(currentNodeValue)) {
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
159 ArrayList<String> authorInfo = new ArrayList<String>();
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
160 authorInfo.add(nameOfPerson);
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
161 authorInfo.add(roleOfPerson);
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
162 authorInfo.add(idOfPerson);
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
163
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
164 System.out.println("This person has already been contextualized: " + nameOfPerson + " has the role " + roleOfPerson + " and the identifier " + idOfPerson + ".");
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
165 }}
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
166 // personCounter ++;
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
167 }
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
168 System.out.println("printing author");
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
169 }
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
170 }
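The person lookup described for findContext can be sketched as a function that returns its result instead of printing it. This is only an illustration under stated assumptions: the class `PersonLookupSketch` and the helpers `parse`, `childText` and `findPerson` are hypothetical, the sample XML and the identifier `gnd:0000` are made up, and the `person`/`name`/`identifier` layout is inferred from the tag names queried above rather than taken from the index.meta specification. Unlike `getFirstChild().getNodeValue()`, the `childText` helper tolerates an empty `name` or `identifier` element.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class PersonLookupSketch {

    /** Parses an XML string into a DOM document (hypothetical helper). */
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    /** Text of the first descendant element named tagName, or "" if it is missing or empty. */
    public static String childText(Element parent, String tagName) {
        NodeList matches = parent.getElementsByTagName(tagName);
        if (matches.getLength() == 0) {
            return "";
        }
        // an empty element such as <name/> has no child node at all
        Node first = matches.item(0).getFirstChild();
        if (first == null || first.getNodeValue() == null) {
            return "";
        }
        return first.getNodeValue();
    }

    /**
     * Returns [name, role, identifier] for the first person whose name equals
     * authorName, or an empty list if no person matches.
     */
    public static List<String> findPerson(Document doc, String authorName) {
        NodeList persons = doc.getElementsByTagName("person");
        for (int i = 0; i < persons.getLength(); i++) {
            Element person = (Element) persons.item(i);
            String name = childText(person, "name");
            if (name.equals(authorName)) {
                return Arrays.asList(name, person.getAttribute("role"),
                        childText(person, "identifier"));
            }
        }
        return Collections.emptyList();
    }

    public static void main(String[] args) {
        // made-up sample input, not a real index.meta file
        Document doc = parse("<resource><author>Euclid</author>"
                + "<person role=\"author\"><name>Euclid</name>"
                + "<identifier>gnd:0000</identifier></person></resource>");
        System.out.println(findPerson(doc, "Euclid"));
    }
}
```

Returning the list leaves it to the caller to decide whether to log the match or to copy the identifier onto the legacy `author` element. A `null` author name simply yields an empty result, since `equals(null)` is false.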
8f6c4dab5d17 First version. Annotates the elements to be contextualized and checks whether some authors already have an ID.
Klaus Thoden <kthoden@mpiwg-berlin.mpg.de>
parents: 0
diff changeset
171