
Harvesting the web pages with Nutch

MPIWG Nutch plugins

Two plugins exist for harvesting the institute's web pages.

parse-mpiwg

source:mpiwg-nutch-plugins/src/plugin/parse-mpiwg

parse-MPIWG-metaTag

source:mpiwg-nutch-plugins/src/plugin/parse-MPIWG-metaTag

Configuration

Configuration of the search, server, cron, etc.

HTML Tags

Meta tags for members

  <meta name="description" content="member"/> 

Classes

  <span class="mpiwg-first_name">First Name</span>
  <span class="mpiwg-last_name">Last Name</span>
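
The class markup above can be read back out of a page with a few lines of code. The following is only a minimal sketch in Python using the standard library, not the code of the parse-mpiwg plugin; the names MemberNameParser and parse_member are hypothetical, and nested spans are not handled.

```python
from html.parser import HTMLParser


class MemberNameParser(HTMLParser):
    """Collects the text of spans marked mpiwg-first_name / mpiwg-last_name."""

    # Maps the class names from the wiki page to (hypothetical) field names.
    FIELDS = {"mpiwg-first_name": "first_name", "mpiwg-last_name": "last_name"}

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._current = None  # field currently being read, if any

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        css_class = dict(attrs).get("class") or ""
        # A class attribute may hold several names; use the first known one.
        for name in css_class.split():
            if name in self.FIELDS:
                self._current = self.FIELDS[name]
                self.fields[self._current] = ""
                break

    def handle_data(self, data):
        if self._current is not None:
            self.fields[self._current] += data

    def handle_endtag(self, tag):
        if tag == "span":
            self._current = None


def parse_member(html):
    """Return a dict with the first_name / last_name found in html."""
    parser = MemberNameParser()
    parser.feed(html)
    parser.close()
    return {key: value.strip() for key, value in parser.fields.items()}
```

Calling parse_member on the two spans shown above would yield a dict with the keys first_name and last_name; a page without these classes yields an empty dict.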

Meta tags for projects

  <meta name="description" content="project"/> 

Classes

  <h1 class="mpiwg-title">History of Scientific Objectivity, 18th-19th Cs</h1>
  
  <p class="mpiwg-authors">
    <a class="mpiwg-author">Name of responsible person</a>
  </p>
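
As an illustration of how the project markup could be consumed, the sketch below pulls the title and the list of authors out of such a page. It is an assumption-laden example in Python, not the plugin's actual implementation; ProjectParser and parse_project are hypothetical names.

```python
from html.parser import HTMLParser


class ProjectParser(HTMLParser):
    """Reads the h1.mpiwg-title text and the text of every a.mpiwg-author."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.authors = []
        self._target = None      # "title", "author", or None
        self._target_tag = None  # tag whose end closes the current target

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "h1" and "mpiwg-title" in classes:
            self._target, self._target_tag = "title", tag
        elif tag == "a" and "mpiwg-author" in classes:
            self._target, self._target_tag = "author", tag
            self.authors.append("")  # start a new author entry

    def handle_data(self, data):
        if self._target == "title":
            self.title += data
        elif self._target == "author":
            self.authors[-1] += data

    def handle_endtag(self, tag):
        if tag == self._target_tag:
            self._target = self._target_tag = None


def parse_project(html):
    """Return the project title and the list of author names found in html."""
    parser = ProjectParser()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.title.strip(),
        "authors": [author.strip() for author in parser.authors],
    }
```

Class names are compared as whole words, so the enclosing mpiwg-authors paragraph is not mistaken for an individual mpiwg-author entry.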

Meta tags for features

  <meta name="description" content="feature"/> 
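
All three page types are distinguished by the content of the description meta tag. A minimal sketch of that classification step, assuming Python and the standard library only (the names DescriptionParser, KNOWN_TYPES and page_type are hypothetical and do not come from the plugins), might look like this:

```python
from html.parser import HTMLParser

# Page types listed on this wiki page.
KNOWN_TYPES = {"member", "project", "feature"}


class DescriptionParser(HTMLParser):
    """Remembers the content of the first <meta name="description"> tag."""

    def __init__(self):
        super().__init__()
        self.description = None

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta .../> also end up here, because the
        # default handle_startendtag delegates to handle_starttag.
        if tag == "meta" and self.description is None:
            attributes = dict(attrs)
            if (attributes.get("name") or "").lower() == "description":
                self.description = attributes.get("content")


def page_type(html):
    """Return "member", "project" or "feature", or None for other pages."""
    parser = DescriptionParser()
    parser.feed(html)
    parser.close()
    value = (parser.description or "").strip().lower()
    return value if value in KNOWN_TYPES else None
```

Pages whose description holds any other text, or that have no description at all, are reported as None so that they can be indexed as ordinary pages.
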

Last modified on Oct 25, 2013, 6:34:30 AM