Harvesting the WWW pages with Nutch
MPIWG Nutch plugins
Two plugins exist for harvesting the institute's web pages.
parse-mpiwg
source:mpiwg-nutch-plugins/src/plugin/parse-mpiwg
parse-MPIWG-metaTag
source:mpiwg-nutch-plugins/src/plugin/parse-MPIWG-metaTag
HTML Tags
Meta tags for members
<meta name="description" content="member"/>
Classes
<span class="mpiwg-first_name">First Name</span>
<span class="mpiwg-last_name">Last Name</span>
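The member markup above can be read with a few lines of code. The following is only a minimal sketch in Python using the standard library, not the code of the Nutch plugins (which are written in Java); the names `MemberNameParser` and `parse_member_name` are hypothetical and chosen for illustration.

```python
from html.parser import HTMLParser


class MemberNameParser(HTMLParser):
    """Collects the text of the mpiwg-first_name and mpiwg-last_name spans."""

    # Maps the CSS class used in the page to the key used in the result.
    FIELDS = {"mpiwg-first_name": "first_name", "mpiwg-last_name": "last_name"}

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._current = None  # result key of the span being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            # Spans with any other class (or none) map to None and are ignored.
            self._current = self.FIELDS.get(dict(attrs).get("class"))

    def handle_data(self, data):
        if self._current:
            self.fields[self._current] = self.fields.get(self._current, "") + data

    def handle_endtag(self, tag):
        if tag == "span":
            self._current = None


def parse_member_name(html):
    """Return a dict with whichever of first_name / last_name the page contains."""
    parser = MemberNameParser()
    parser.feed(html)
    return {key: value.strip() for key, value in parser.fields.items()}
```

A page that lacks one of the two spans simply yields a dict without that key, so callers can decide how to treat incomplete member pages.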
Meta tags for projects
<meta name="description" content="project"/>
Classes
<h1 class="mpiwg-title">History of Scientific Objectivity, 18th-19th Cs</h1>
<p class="mpiwg-authors">
  <a class="mpiwg-author">Name of responsible person</a>
</p>
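To illustrate how the project classes could be consumed, here is a minimal Python sketch based on the standard library. It is not the plugins' Java implementation; `ProjectParser` and `parse_project` are hypothetical names, and the sketch assumes that each responsible person is marked with its own `mpiwg-author` link.

```python
from html.parser import HTMLParser


class ProjectParser(HTMLParser):
    """Collects the mpiwg-title heading and every mpiwg-author link."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.authors = []
        self._target = None  # "title", "author", or None

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "h1" and "mpiwg-title" in classes:
            self._target = "title"
        elif tag == "a" and "mpiwg-author" in classes:
            self._target = "author"
            self.authors.append("")  # start a new author entry

    def handle_data(self, data):
        if self._target == "title":
            self.title += data
        elif self._target == "author":
            self.authors[-1] += data

    def handle_endtag(self, tag):
        if tag in ("h1", "a"):
            self._target = None


def parse_project(html):
    """Return the project title and the list of author names found in html."""
    parser = ProjectParser()
    parser.feed(html)
    return {
        "title": parser.title.strip(),
        "authors": [name.strip() for name in parser.authors],
    }
```

Collecting authors into a list rather than a single string keeps projects with several responsible persons representable without changing the markup.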
Meta tags for features
<meta name="description" content="feature"/>
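The three description values listed in this section (member, project, feature) can be used to decide what kind of page was harvested. The sketch below shows one possible way to do this in Python with the standard library; it is an illustration only, not the plugins' Java code, and `page_type` and `KNOWN_TYPES` are hypothetical names.

```python
from html.parser import HTMLParser

# The description values documented in this section.
KNOWN_TYPES = {"member", "project", "feature"}


class _DescriptionParser(HTMLParser):
    """Remembers the content of the first <meta name="description"> tag."""

    def __init__(self):
        super().__init__()
        self.description = None

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> are also routed here.
        attrs = dict(attrs)
        is_description = (attrs.get("name") or "").lower() == "description"
        if tag == "meta" and is_description and self.description is None:
            self.description = attrs.get("content")


def page_type(html):
    """Return "member", "project" or "feature", or None for any other page."""
    parser = _DescriptionParser()
    parser.feed(html)
    value = (parser.description or "").strip().lower()
    return value if value in KNOWN_TYPES else None
```

Returning None for unknown descriptions means that ordinary pages, whose description is free text, are not mistaken for one of the three special types.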