Search-Service
The search service is available at:
http://md.mpiwg-berlin.mpg.de:8983/solr/#/mpiwgSources
Experiments
Experiment with Solr 4
Config File (data-config.xml):
<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8" />
  <document>
    <entity name="f"
            processor="FileListEntityProcessor"
            excludes="^\..*"
            baseDir="/Volumes/online_permanent/einstein/annalen"
            fileName=".*\.meta"
            recursive="true"
            rootEntity="false"
            dataSource="null">
      <entity name="page"
              processor="de.mpiwg.itgroup.solr.transformer.ExtendedXPathEntityProcessor"
              stream="true"
              url="${f.fileAbsolutePath}"
              transformer="RegexTransformer,DateFormatTransformer"
              xsl="/Users/dwinter/Documents/Projekte/MetaDataManagement/testData/indexMeta_to_field.xsl"
              useSolrAddSchema="true">
      </entity>
    </entity>
  </document>
</dataConfig>
indexMeta_to_field.xsl converts index.meta files into the Solr doc format for indexing. Every element below bib is turned into a field whose name is the element name with the prefix "IM_". In addition, the value of each of these elements is also mapped into the multi-valued field "all-bib-data".
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:template match="/">
    <add>
      <doc>
        <field name="archive-path"><xsl:value-of select="/resource/archive-path"/></field>
        <xsl:for-each select="/resource/meta/bib//*">
          <field><xsl:attribute name="name">IM_<xsl:value-of select="name()"/></xsl:attribute><xsl:value-of select="."/></field>
          <field name="all-bib-data"><xsl:value-of select="."/></field>
        </xsl:for-each>
      </doc>
    </add>
  </xsl:template>
</xsl:stylesheet>
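To illustrate what the stylesheet produces, the following minimal sketch applies it to a small document using the XSLT processor that ships with the JDK. The sample index.meta, its archive path, and the class name IndexMetaToFieldDemo are made up for this example; real index.meta files contain considerably more. The stylesheet is inlined (without the XML declaration, and with single-quoted attributes) only to keep the example self-contained.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class IndexMetaToFieldDemo {

    // Same templates as indexMeta_to_field.xsl, inlined for the demo.
    static final String XSL =
        "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>"
      + "<xsl:template match='/'><add><doc>"
      + "<field name='archive-path'><xsl:value-of select='/resource/archive-path'/></field>"
      + "<xsl:for-each select='/resource/meta/bib//*'>"
      + "<field><xsl:attribute name='name'>IM_<xsl:value-of select='name()'/></xsl:attribute>"
      + "<xsl:value-of select='.'/></field>"
      + "<field name='all-bib-data'><xsl:value-of select='.'/></field>"
      + "</xsl:for-each>"
      + "</doc></add></xsl:template></xsl:stylesheet>";

    // Hypothetical, heavily reduced index.meta (assumption, not a real record).
    static final String SAMPLE_INDEX_META =
        "<resource>"
      + "<archive-path>/example/archive/path</archive-path>"
      + "<meta><bib type='article'>"
      + "<author>Einstein, Albert</author><year>1905</year>"
      + "</bib></meta>"
      + "</resource>";

    // Runs the stylesheet over the given XML and returns the Solr <add> document.
    static String transform(String xml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSL)));
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform(SAMPLE_INDEX_META));
    }
}
```

For the sample above, the result should contain one archive-path field, the fields IM_author and IM_year, and two all-bib-data fields carrying the same values.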
In schema.xml:
<field name="all-bib-data" type="text_general" indexed="true" stored="true" multiValued="true"/>
<field name="archive-path" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<dynamicField name="IM_*" type="text_general" indexed="true" stored="true"/>
<uniqueKey>archive-path</uniqueKey>
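With this schema, a single bib field can be searched through its IM_ name. The sketch below only builds such a query URL. It assumes that the core is called mpiwgSources (taken from the admin URL above) and that Solr's standard /select handler is enabled there; the field IM_author and the class name SearchUrlSketch are likewise assumptions for illustration.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class SearchUrlSketch {

    // Assumption: core name from the admin URL, default /select request handler.
    static final String SELECT_URL =
        "http://md.mpiwg-berlin.mpg.de:8983/solr/mpiwgSources/select";

    // Builds a query URL of the form ...?q=<field>:<term>&wt=json
    static String buildQueryUrl(String field, String term) {
        String q = field + ":" + term;
        return SELECT_URL
            + "?q=" + URLEncoder.encode(q, StandardCharsets.UTF_8)
            + "&wt=json";
    }

    public static void main(String[] args) {
        // IM_author is a hypothetical field; it exists only if bib has an <author> element.
        System.out.println(buildQueryUrl("IM_author", "einstein"));
    }
}
```

The field name and the search term are encoded together, so the colon between them appears as %3A in the resulting URL.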
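ExtendedXPathEntityProcessor is a more fault-tolerant extension of XPathEntityProcessor. It catches any exception thrown while reading a row, prints the stack trace, and returns null instead, so that a single broken file does not abort the whole import.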
package de.mpiwg.itgroup.solr.transformer;

import java.util.Map;

import org.apache.solr.handler.dataimport.XPathEntityProcessor;

public class ExtendedXPathEntityProcessor extends XPathEntityProcessor {

    @Override
    public Map<String, Object> nextRow() {
        Map<String, Object> r;
        try {
            r = super.nextRow();
        } catch (Exception e) {
            // Log the error and return null, which signals "no more rows"
            // for this entity instead of propagating the exception.
            e.printStackTrace();
            r = null;
        }
        return r;
    }
}