Harvesting the web pages with Nutch
MPIWG Nutch plugins
Two plugins exist for harvesting the institute's web pages.
parse-mpiwg
source:mpiwg-nutch-plugins/src/plugin/parse-mpiwg
parse-MPIWG-metaTag
source:mpiwg-nutch-plugins/src/plugin/parse-MPIWG-metaTag
Configuration
Configuration of search, server, cron, etc.
HTML Tags
Meta tags for members
<meta name="description" content="member"/>
Classes
<span class="mpiwg-first_name">First Name</span> <span class="mpiwg-last_name">Last Name</span>
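To illustrate how the member markup above can be read by a harvester, here is a minimal Python sketch using only the standard library. It is not the code of the Nutch plugins (those are Java); the names `MemberNameParser` and `parse_member` are hypothetical, and the sketch assumes the name spans are not nested.

```python
from html.parser import HTMLParser


class MemberNameParser(HTMLParser):
    """Collects the text of the mpiwg-first_name / mpiwg-last_name spans."""

    # Maps the CSS class from the page to a (hypothetical) result key.
    FIELDS = {"mpiwg-first_name": "first_name", "mpiwg-last_name": "last_name"}

    def __init__(self):
        super().__init__()
        self.result = {}
        self._current = None  # result key currently being filled, if any

    def handle_starttag(self, tag, attrs):
        css_classes = (dict(attrs).get("class") or "").split()
        for name in css_classes:
            if tag == "span" and name in self.FIELDS:
                self._current = self.FIELDS[name]
                self.result[self._current] = ""

    def handle_data(self, data):
        if self._current:
            self.result[self._current] += data

    def handle_endtag(self, tag):
        if tag == "span":
            self._current = None


def parse_member(html):
    """Return the name fields found in an HTML fragment."""
    parser = MemberNameParser()
    parser.feed(html)
    parser.close()
    return {key: value.strip() for key, value in parser.result.items()}


sample = ('<span class="mpiwg-first_name">First Name</span> '
          '<span class="mpiwg-last_name">Last Name</span>')
member = parse_member(sample)
print(member)
```

Matching on the class name rather than on the position of the span keeps the extraction independent of the page layout, which is presumably why the pages carry these classes.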
Meta tags for projects
<meta name="description" content="project"/>
Classes
<h1 class="mpiwg-title">History of Scientific Objectivity, 18th-19th Cs</h1>
<p class="mpiwg-authors">
  <a class="mpiwg-author">Name of responsible person</a>
</p>
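The project markup can be read in the same spirit. The following Python sketch pulls the title and the list of authors out of a page fragment; it is an illustration under stated assumptions, not the plugin implementation, and `ProjectParser` and `parse_project` are hypothetical names. It assumes one `mpiwg-author` link per responsible person.

```python
from html.parser import HTMLParser


class ProjectParser(HTMLParser):
    """Collects the mpiwg-title heading and every mpiwg-author link."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.authors = []
        self._field = None  # "title", "author" or None

    def handle_starttag(self, tag, attrs):
        css_classes = (dict(attrs).get("class") or "").split()
        if tag == "h1" and "mpiwg-title" in css_classes:
            self._field = "title"
        elif tag == "a" and "mpiwg-author" in css_classes:
            self._field = "author"
            self.authors.append("")  # start a new author entry

    def handle_data(self, data):
        if self._field == "title":
            self.title += data
        elif self._field == "author":
            self.authors[-1] += data

    def handle_endtag(self, tag):
        if tag in ("h1", "a"):
            self._field = None


def parse_project(html):
    """Return (title, authors) for an HTML fragment."""
    parser = ProjectParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip(), [a.strip() for a in parser.authors]


sample = ('<h1 class="mpiwg-title">History of Scientific Objectivity, '
          '18th-19th Cs</h1> <p class="mpiwg-authors"> '
          '<a class="mpiwg-author">Name of responsible person</a> </p>')
title, authors = parse_project(sample)
print(title)
print(authors)
```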
Meta tags for features
<meta name="description" content="feature"/>
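All three page kinds are distinguished by the `content` value of the `description` meta tag. As a rough sketch of how a harvester could classify a page by that value, assuming that any other value means the page is of no special type (an assumption, the page does not say so), with hypothetical names `DescriptionParser` and `page_type`:

```python
from html.parser import HTMLParser

# The three values documented on this page.
KNOWN_TYPES = {"member", "project", "feature"}


class DescriptionParser(HTMLParser):
    """Remembers the content of the first <meta name="description"> tag."""

    def __init__(self):
        super().__init__()
        self.description = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        is_description = (attributes.get("name") or "").lower() == "description"
        if tag == "meta" and is_description and self.description is None:
            self.description = (attributes.get("content") or "").strip()


def page_type(html):
    """Return "member", "project" or "feature", or None for anything else."""
    parser = DescriptionParser()
    parser.feed(html)
    parser.close()
    value = (parser.description or "").lower()
    return value if value in KNOWN_TYPES else None


print(page_type('<meta name="description" content="feature"/>'))
print(page_type('<meta name="description" content="Some free text"/>'))
```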
Last modified on Oct 25, 2013, 6:34:30 AM