annotate solr/schema.xml @ 0:a2b4f67e73dc default tip

initial
author Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
date Mon, 08 Jun 2015 10:21:54 +0200
parents
children
<?xml version="1.0" encoding="UTF-8" ?>
<!--
 Licensed to the Apache Software Foundation (ASF) under one or more
 contributor license agreements.  See the NOTICE file distributed with
 this work for additional information regarding copyright ownership.
 The ASF licenses this file to You under the Apache License, Version 2.0
 (the "License"); you may not use this file except in compliance with
 the License.  You may obtain a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0
 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
-->

<!--
 This is the Solr schema file. This file should be named "schema.xml" and
 should be in the conf directory under the solr home
 (i.e. ./solr/conf/schema.xml by default)
 or located where the classloader for the Solr webapp can find it.

 This example schema is the recommended starting point for users.
 It should be kept correct and concise, usable out-of-the-box.

 For more information on how to customize this file, please see
 http://wiki.apache.org/solr/SchemaXml

 PERFORMANCE NOTE: this schema includes many optional features and should not
 be used for benchmarking.  To improve performance one could
  - set stored="false" for all fields possible (esp large fields) when you
    only need to search on the field but don't need to return the original
    value.
  - set indexed="false" if you don't need to search on the field, but only
    return the field as a result of searching on other indexed fields.
  - remove all unneeded copyField statements
  - for best index size and searching performance, set indexed="false"
    for all general text fields, use copyField to copy them to the
    catchall "text" field, and use that for searching.
  - For maximum indexing performance, use the StreamingUpdateSolrServer
    java client.
  - Remember to run the JVM in server mode, and use a higher logging level
    that avoids logging every request
-->
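<!-- Illustrative sketch of the first two tips above. It is not part of the
     committed schema, and the field names "body_search" and "thumbnail_url"
     are hypothetical.

     Search-only field: indexed but not stored, so it can be queried but the
     original value cannot be returned:
     <field name="body_search" type="text_general" indexed="true" stored="false"/>

     Return-only field: stored but not indexed, so it can be returned in
     results but not searched or sorted on:
     <field name="thumbnail_url" type="string" indexed="false" stored="true"/>
-->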

<schema name="example" version="1.5">
  <!-- attribute "name" is the name of this schema and is only used for display purposes.
       version="x.y" is Solr's version number for the schema syntax and
       semantics.  It should not normally be changed by applications.

       1.0: multiValued attribute did not exist, all fields are multiValued
            by nature
       1.1: multiValued attribute introduced, false by default
       1.2: omitTermFreqAndPositions attribute introduced, true by default
            except for text fields.
       1.3: removed optional field compress feature
       1.4: autoGeneratePhraseQueries attribute introduced to drive QueryParser
            behavior when a single string produces multiple tokens.  Defaults
            to off for version >= 1.4
       1.5: omitNorms defaults to true for primitive field types
            (int, float, boolean, string...)
  -->
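  <!-- Illustrative sketch only (hypothetical field name, not part of the
       committed schema). With version="1.5" a string field omits norms by
       default; a field that needs index-time boosts could opt back in
       explicitly:
       <field name="series_title" type="string" indexed="true" stored="true" omitNorms="false"/>
  -->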

  <fields>
    <!-- Valid attributes for fields:
      name: mandatory - the name for the field
      type: mandatory - the name of a field type from the
        <types> fieldType section
      indexed: true if this field should be indexed (searchable or sortable)
      stored: true if this field should be retrievable
      multiValued: true if this field may contain multiple values per document
      omitNorms: (expert) set to true to omit the norms associated with
        this field (this disables length normalization and index-time
        boosting for the field, and saves some memory).  Only full-text
        fields or fields that need an index-time boost need norms.
        Norms are omitted for primitive (non-analyzed) types by default.
      termVectors: [false] set to true to store the term vector for a
        given field.
        When using MoreLikeThis, fields used for similarity should be
        stored for best performance.
      termPositions: Store position information with the term vector.
        This will increase storage costs.
      termOffsets: Store offset information with the term vector. This
        will increase storage costs.
      required: The field is required.  It will throw an error if the
        value does not exist
      default: a value that should be used if no value is specified
        when adding a document.
    -->
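    <!-- Illustrative sketch only (hypothetical field names, not part of the
         committed schema) combining several of the attributes listed above.

         A required, single-valued identifier:
         <field name="shelfmark" type="string" indexed="true" stored="true"
                required="true" multiValued="false"/>

         A full-text field that stores term vectors with positions and
         offsets, for example for MoreLikeThis, at the price of a larger index:
         <field name="abstract" type="text_general" indexed="true" stored="true"
                termVectors="true" termPositions="true" termOffsets="true"/>

         A field that falls back to a default value when a document omits it:
         <field name="language" type="string" indexed="true" stored="true"
                default="unknown"/>
    -->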

    <!-- field names should consist of alphanumeric or underscore characters only and
      not start with a digit.  This is not currently strictly enforced,
      but other field names will not have first class support from all components
      and back compatibility is not guaranteed.  Names with both leading and
      trailing underscores (e.g. _version_) are reserved.
    -->

    <field name="archive-path" type="string" indexed="true" stored="true" required="true" multiValued="false" />

    <field name="archive-creation-date" type="string" indexed="true" stored="true" required="false" multiValued="false" />
    <field name="year" type="int" indexed="true" stored="true" required="false" multiValued="true" />

    <!-- Common metadata fields, named specifically to match up with
      SolrCell metadata when parsing rich documents such as Word, PDF.
      Some fields are multiValued only because Tika currently may return
      multiple values for them. Some metadata is parsed from the documents,
      but there are some which come from the client context:
        "content_type": From the HTTP headers of incoming stream
        "resourcename": From SolrCell request param resource.name
    -->
    <field name="title" type="text_general" indexed="true" stored="true" multiValued="true"/>
    <field name="subtitle" type="text_general" indexed="true" stored="true" multiValued="true"/>
    <field name="author" type="author" indexed="true" stored="true" multiValued="true"/>
    <field name="author_c" type="author_c" indexed="true" stored="true" multiValued="true"/>
    <field name="keyword" type="string" indexed="true" stored="true" multiValued="false"/>
    <field name="date" type="text_general" indexed="true" stored="true" multiValued="true"/>
    <field name="access" type="text_general" indexed="true" stored="true" multiValued="false"/>
    <field name="access-type" type="text_general" indexed="true" stored="true" multiValued="false"/>
    <field name="mpiwg-dri" type="text_general" indexed="true" stored="true" multiValued="false"/>
    <field name="all-bib-data" type="text_general" indexed="true" stored="true" multiValued="true"/>

    <field name="doc-type" type="string" indexed="true" stored="true" multiValued="true"/>

    <field name="content" type="text_general" indexed="true" stored="true" multiValued="true"/>
    <field name="collectionid" type="string" indexed="true" stored="true" multiValued="true"/>
    <field name="collection" type="string" indexed="true" stored="true" multiValued="true"/>

    <!-- catchall field, containing all other searchable text fields (implemented
         via copyField further on in this schema -->
    <field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>

    <!-- catchall text field that indexes tokens both normally and in reverse for efficient
         leading wildcard queries. -->
    <field name="text_rev" type="text_general_rev" indexed="true" stored="false" multiValued="true"/>

    <!-- non-tokenized version of manufacturer to make it easier to sort or group
         results by manufacturer.  copied from "manu" via copyField -->

    <field name="_version_" type="long" indexed="true" stored="true"/>

    <!-- Uncommenting the following will create a "timestamp" field using
         a default value of "NOW" to indicate when each document was indexed.
    -->
    <!--
    <field name="timestamp" type="date" indexed="true" stored="true" default="NOW" multiValued="false"/>
    -->

    <!-- Dynamic field definitions allow using convention over configuration
         for fields via the specification of patterns to match field names.
         EXAMPLE:  name="*_i" will match any field ending in _i (like myid_i, z_i)
         RESTRICTION: the glob-like pattern in the name attribute must have
         a "*" only at the start or the end.  -->
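    <!-- Illustrative sketch only (hypothetical, not part of the committed
         schema; it assumes an "int" field type is defined in <types>).
         A suffix pattern declared as
         <dynamicField name="*_i" type="int" indexed="true" stored="true"/>
         would let documents carry fields such as "pages_i" or "volume_i"
         without a dedicated <field> declaration for each. A pattern with the
         "*" in the middle, such as "IM_*_i", would violate the restriction
         stated above.
    -->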

    <dynamicField name="IM_*" type="text_general" indexed="true" stored="true" multiValued="true"/>
    <dynamicField name="TT_*" type="text_general" indexed="true" stored="true" multiValued="true"/>
    <field name="text-url-path" type="string" indexed="true" stored="true" multiValued="false"/>

    <!-- String typed fields for better sorting-->
    <dynamicField name="*_s" type="string" indexed="true" stored="true" multiValued="true"/>

    <!-- Type used to index the lat and lon components for the "location" FieldType -->

    <!-- some trie-coded dynamic fields for faster range queries -->

    <!-- uncomment the following to ignore any fields that don't already match an existing
         field name or dynamic field, rather than reporting them as an error.
         alternately, change the type="ignored" to some other type e.g. "text" if you want
         unknown fields indexed and/or stored by default -->
    <!--dynamicField name="*" type="ignored" multiValued="true" /-->
    <field name="date2year" type="date2year" indexed="true" stored="true" multiValued="true"/>
  </fields>

  <!-- Field to use to determine and enforce document uniqueness.
       Unless this field is marked with required="false", it will be a required field
  -->
  <uniqueKey>archive-path</uniqueKey>
  <defaultSearchField>title</defaultSearchField>
  <copyField source="IM_year" dest="year"/>
  <copyField source="IM_year" dest="date"/>

  <copyField source="IM_date" dest="date2year"/>

  <!-- copy to author field (author is split by ';') -->
  <copyField source="IM_author" dest="author"/>
  <copyField source="IM_author" dest="author_c"/>
  <copyField source="IM_title" dest="title"/>
  <copyField source="subtitle" dest="title"/>
  <copyField source="IM_keyword" dest="keyword"/>

  <copyField source="IM_title" dest="title_s"/>
  <copyField source="subtitle" dest="title_s"/>

  <copyField source="IM_title" dest="maintitle_s"/>
  <copyField source="TT_text-url-path" dest="text-url-path"/>
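  <!-- Illustrative sketch only. The real "author" field type used by the
       copyField rules above is defined outside this section and may differ.
       Assuming author strings arrive as "Name One; Name Two", a type that
       splits on ';' could look like the following (the type name
       "author_split_sketch" is hypothetical):

       <fieldType name="author_split_sketch" class="solr.TextField" positionIncrementGap="100">
         <analyzer>
           <tokenizer class="solr.PatternTokenizerFactory" pattern=";\s*"/>
           <filter class="solr.TrimFilterFactory"/>
           <filter class="solr.LowerCaseFilterFactory"/>
         </analyzer>
       </fieldType>
  -->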

  <!-- Above, multiple source fields are copied to the [text] field.
       Another way to map multiple source fields to the same
       destination field is to use the dynamic field syntax.
       copyField also supports a maxChars to copy setting.  -->

  <!-- <copyField source="*_t" dest="text" maxChars="3000"/> -->

  <!-- copy name to alphaNameSort, a field designed for sorting by name -->
  <!-- <copyField source="name" dest="alphaNameSort"/> -->
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
215
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
216 <types>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
217 <!-- field type definitions. The "name" attribute is
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
218 just a label to be used by field definitions. The "class"
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
219 attribute and any other attributes determine the real
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
220 behavior of the fieldType.
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
221 Class names starting with "solr" refer to java classes in a
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
222 standard package such as org.apache.solr.analysis
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
223 -->
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
224
225 <!-- The StrField type is not analyzed, but indexed/stored verbatim. -->
226 <fieldType name="string" class="solr.StrField" sortMissingLast="true" />
227
228 <!-- boolean type: "true" or "false" -->
229 <fieldType name="boolean" class="solr.BoolField" sortMissingLast="true"/>
230
231 <!-- The sortMissingLast and sortMissingFirst attributes are optional and are
232 currently supported on types that are sorted internally as strings
233 and on numeric types.
234 This includes "string", "boolean", and, as of 3.5 (and 4.x),
235 int, float, long, date, double, including the "Trie" variants.
236 - If sortMissingLast="true", then a sort on this field will cause documents
237 without the field to come after documents with the field,
238 regardless of the requested sort order (asc or desc).
239 - If sortMissingFirst="true", then a sort on this field will cause documents
240 without the field to come before documents with the field,
241 regardless of the requested sort order.
242 - If sortMissingLast="false" and sortMissingFirst="false" (the default),
243 then default lucene sorting will be used which places docs without the
244 field first in an ascending sort and last in a descending sort.
245 -->
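As a sketch of the sorting rules described in the comment above, the fragment below declares a string type that places documents lacking the field first, plus a field that uses it. The names `string_missing_first` and `title_sort` are hypothetical and do not appear in this schema; only `solr.StrField` and the `sortMissingFirst` attribute come from the text above.

```xml
<!-- Hypothetical type and field names, for illustration only. -->
<fieldType name="string_missing_first" class="solr.StrField" sortMissingFirst="true"/>
<field name="title_sort" type="string_missing_first" indexed="true" stored="false"/>
<!-- With this type, both sort=title_sort asc and sort=title_sort desc
     should list documents that have no title_sort value before all others. -->
```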
246
247 <!--
248 Default numeric field types. For faster range queries, consider the tint/tfloat/tlong/tdouble types.
249 -->
250 <fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>
251 <fieldType name="float" class="solr.TrieFloatField" precisionStep="0" positionIncrementGap="0"/>
252 <fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
253 <fieldType name="double" class="solr.TrieDoubleField" precisionStep="0" positionIncrementGap="0"/>
254
255 <!--
256 Numeric field types that index each value at various levels of precision
257 to accelerate range queries when the number of values between the range
258 endpoints is large. See the javadoc for NumericRangeQuery for internal
259 implementation details.
260
261 Smaller precisionStep values (specified in bits) will lead to more tokens
262 indexed per value, slightly larger index size, and faster range queries.
263 A precisionStep of 0 disables indexing at different precision levels.
264 -->
265 <fieldType name="tint" class="solr.TrieIntField" precisionStep="8" positionIncrementGap="0"/>
266 <fieldType name="tfloat" class="solr.TrieFloatField" precisionStep="8" positionIncrementGap="0"/>
267 <fieldType name="tlong" class="solr.TrieLongField" precisionStep="8" positionIncrementGap="0"/>
268 <fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8" positionIncrementGap="0"/>
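A minimal sketch of how one of the precision-stepped types above might be used, assuming a numeric field that is queried by range. The field name `year` is hypothetical and is not taken from this schema.

```xml
<!-- Hypothetical field name, for illustration only. -->
<field name="year" type="tint" indexed="true" stored="true"/>
<!-- A range query such as year:[1500 TO 1600] is the kind of request
     that benefits from the extra precision levels indexed by tint. -->
```

A field that is only sorted on or matched exactly could use the plain `int` type instead, which indexes a single token per value.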
269
270 <!-- The format for this date field is of the form 1995-12-31T23:59:59Z, and
271 is a more restricted form of the canonical representation of dateTime
272 http://www.w3.org/TR/xmlschema-2/#dateTime
273 The trailing "Z" designates UTC time and is mandatory.
274 Optional fractional seconds are allowed: 1995-12-31T23:59:59.999Z
275 All other components are mandatory.
276
277 Expressions can also be used to denote calculations that should be
278 performed relative to "NOW" to determine the value, e.g.:
279
280 NOW/HOUR
281 ... Round to the start of the current hour
282 NOW-1DAY
283 ... Exactly 1 day prior to now
284 NOW/DAY+6MONTHS+3DAYS
285 ... 6 months and 3 days in the future from the start of
286 the current day
287
288 Consult the DateField javadocs for more information.
289
290 Note: For faster range queries, consider the tdate type
291 -->
292 <fieldType name="date" class="solr.TrieDateField" precisionStep="0" positionIncrementGap="0"/>
293
294 <!-- A Trie based date field for faster date range queries and date faceting. -->
295 <fieldType name="tdate" class="solr.TrieDateField" precisionStep="6" positionIncrementGap="0"/>
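The date format and the date-math expressions described above can be sketched with a field declaration and a filter. The field name `indexed_at` is hypothetical; `default="NOW"` is assumed here so that each document is stamped with the time at which it is indexed.

```xml
<!-- Hypothetical field name, for illustration only. -->
<field name="indexed_at" type="tdate" indexed="true" stored="true" default="NOW"/>
<!-- An explicit value must use the restricted form, e.g. 1995-12-31T23:59:59Z -->
<!-- Example filter using date math: indexed_at:[NOW/DAY-7DAYS TO NOW] -->
```

Rounding with `/DAY` in the lower bound keeps that bound stable for the whole day, which may help such filters get reused from the cache.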
296
297
298 <!-- Binary data type. The data should be sent/retrieved as Base64-encoded strings. -->
299 <fieldType name="binary" class="solr.BinaryField"/>
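A minimal sketch of sending a value to a binary field, assuming a stored-only field. The field names `thumbnail` and `id` and the document id are hypothetical; `SGVsbG8=` is the Base64 encoding of the ASCII text "Hello".

```xml
<!-- In schema.xml (hypothetical field name): -->
<field name="thumbnail" type="binary" indexed="false" stored="true"/>

<!-- In a separate XML update message: -->
<add>
  <doc>
    <field name="id">example-1</field>
    <field name="thumbnail">SGVsbG8=</field>
  </doc>
</add>
```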
300
301 <!--
302 Note:
303 These should only be used for compatibility with existing indexes (created with lucene or older Solr versions).
304 Use Trie based fields instead. As of Solr 3.5 and 4.x, Trie based fields support sortMissingFirst/Last
305
306 Plain numeric field types that store and index the text
307 value verbatim (and hence don't correctly support range queries, since the
308 lexicographic ordering isn't equal to the numeric ordering)
309 -->
310 <fieldType name="pint" class="solr.IntField"/>
311 <fieldType name="plong" class="solr.LongField"/>
312 <fieldType name="pfloat" class="solr.FloatField"/>
313 <fieldType name="pdouble" class="solr.DoubleField"/>
314 <fieldType name="pdate" class="solr.DateField" sortMissingLast="true"/>
315
316 <!-- The "RandomSortField" is not used to store or search any
317 data. You can declare fields of this type in your schema
318 to generate pseudo-random orderings of your docs for sorting
319 or function purposes. The ordering is generated based on the field
320 name and the version of the index. As long as the index version
321 remains unchanged, and the same field name is reused,
322 the ordering of the docs will be consistent.
323 If you want different pseudo-random orderings of documents,
324 for the same version of the index, use a dynamicField and
325 change the field name in the request.
326 -->
327 <fieldType name="random" class="solr.RandomSortField" indexed="true" />
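The dynamicField approach mentioned in the comment above could look like the following sketch. The `random_*` pattern and the seed-like suffixes in the requests are illustrative; whether this schema declares such a dynamic field is not shown in this section.

```xml
<!-- Illustrative dynamic field: any field name starting with random_ uses the random type. -->
<dynamicField name="random_*" type="random"/>
<!-- Requests (outside schema.xml):
       sort=random_1234 desc   gives one pseudo-random ordering
       sort=random_5678 desc   gives a different ordering for the same index version -->
```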
328
329 <!-- solr.TextField allows the specification of custom text analyzers
330 specified as a tokenizer and a list of token filters. Different
331 analyzers may be specified for indexing and querying.
332
333 The optional positionIncrementGap puts space between multiple fields of
334 this type on the same document, with the purpose of preventing false phrase
335 matching across fields.
336
337 For more info on customizing your analyzer chain, please see
338 http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
339 -->
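To illustrate the purpose of positionIncrementGap described above, here is a sketch with a multi-valued field. The field names `keywords` and `id` and the document contents are hypothetical; the `text_ws` type is the one defined in this schema with positionIncrementGap="100".

```xml
<!-- In schema.xml (hypothetical field name): -->
<field name="keywords" type="text_ws" indexed="true" stored="true" multiValued="true"/>

<!-- In a separate XML update message, two values for the same field: -->
<add>
  <doc>
    <field name="id">example-2</field>
    <field name="keywords">natural philosophy</field>
    <field name="keywords">history of science</field>
  </doc>
</add>
<!-- The phrase query keywords:"philosophy history" is not expected to match,
     because the gap of 100 positions separates the last token of the first
     value from the first token of the second value. -->
```

A phrase query with a slop of 100 or more could still match across the two values, so the gap should be larger than any slop the application uses.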
340
341 <!-- One can also specify an existing Analyzer class that has a
342 default constructor via the class attribute on the analyzer element.
343 Example:
344 <fieldType name="text_greek" class="solr.TextField">
345 <analyzer class="org.apache.lucene.analysis.el.GreekAnalyzer"/>
346 </fieldType>
347 -->
348
349 <!-- A text field that only splits on whitespace for exact matching of words -->
350 <fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
351 <analyzer>
352 <tokenizer class="solr.WhitespaceTokenizerFactory"/>
353 </analyzer>
354 </fieldType>
355
356 <!-- A general text field that has reasonable, generic
357 cross-language defaults: it tokenizes with StandardTokenizer,
358 removes stop words from case-insensitive "stopwords.txt"
359 (empty by default), and lowercases tokens. At query time only, it
360 also applies synonyms. -->
361
362
363
364
365
366
367 <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
368 <analyzer type="index">
369 <tokenizer class="solr.StandardTokenizerFactory"/>
370 <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
371 <filter class="solr.LengthFilterFactory" min="2" max="100"/>
372 <!-- in this example, we will only use synonyms at query time
373 <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
374 -->
375 <filter class="solr.LowerCaseFilterFactory"/>
376 </analyzer>
377 <analyzer type="query">
378 <tokenizer class="solr.StandardTokenizerFactory"/>
379 <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
380 <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
381 <filter class="solr.LowerCaseFilterFactory"/>
382 <filter class="solr.LengthFilterFactory" min="2" max="100" />
383 </analyzer>
384 </fieldType>
385
386
387 <fieldType name="author" class="solr.TextField" positionIncrementGap="100">
388 <analyzer type="index">
389 <charFilter class="solr.PatternReplaceCharFilterFactory"
390 pattern="(\[Hrsg\.\])" replacement=""/>
391 <charFilter class="solr.MappingCharFilterFactory" mapping="author_replace.txt"/>
392 <tokenizer class="solr.PatternTokenizerFactory" pattern="\s*[;\n\r]\s*"/>
393 <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
394 <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
395 <filter class="solr.LengthFilterFactory" min="2" max="100"/>
396 <filter class="solr.TrimFilterFactory"/>
397 <!-- disabled alternative: a separate index-time synonym list (synonyms.txt is already applied above)
398 <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
399 -->
400 <filter class="solr.LowerCaseFilterFactory"/>
401 </analyzer>
402 <analyzer type="query">
403 <tokenizer class="solr.PatternTokenizerFactory" pattern="\s*;\s*"/>
404 <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
405 <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
406 <filter class="solr.LowerCaseFilterFactory"/>
407 <filter class="solr.TrimFilterFactory"/>
408 <filter class="solr.LengthFilterFactory" min="2" max="100" />
409 </analyzer>
410 </fieldType>
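The following sketch shows how the `author` type above might treat a semicolon-separated list of names. The field names `author` and `id` and the sample value are hypothetical, and the expected terms assume that stopwords.txt, synonyms.txt and author_replace.txt (whose contents are not shown here) leave these names unchanged.

```xml
<!-- In schema.xml (hypothetical field declaration): -->
<field name="author" type="author" indexed="true" stored="true"/>

<!-- In a separate XML update message: -->
<add>
  <doc>
    <field name="id">example-3</field>
    <field name="author">Humboldt, Alexander von; Humboldt, Wilhelm von [Hrsg.]</field>
  </doc>
</add>
<!-- Under the assumptions above, the index analyzer would be expected to:
     1. remove "[Hrsg.]" via the PatternReplaceCharFilter,
     2. split on the semicolon, yielding one token per author,
     3. trim and lowercase each token, giving the terms
        "humboldt, alexander von" and "humboldt, wilhelm von". -->
```

Because each author becomes a single term, a query has to supply the full name in the same form (for example as a quoted value) in order to match.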
411
412 <fieldType name="author_c" class="solr.TextField" positionIncrementGap="100">
413 <analyzer type="index">
414 <charFilter class="solr.PatternReplaceCharFilterFactory"
415 pattern="(\[Hrsg\.\])" replacement=""/>
416 <charFilter class="solr.MappingCharFilterFactory" mapping="author_replace.txt"/>
417 <tokenizer class="solr.PatternTokenizerFactory" pattern="\s*[;\n\r]\s*"/>
418 <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
419 <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
420 <filter class="solr.LengthFilterFactory" min="2" max="100"/>
421 <filter class="solr.TrimFilterFactory"/>
422 <!-- disabled alternative: a separate index-time synonym list (synonyms.txt is already applied above)
423 <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
424 -->
425
426 </analyzer>
427 <analyzer type="query">
428 <tokenizer class="solr.PatternTokenizerFactory" pattern="\s*;\s*"/>
429 <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
430 <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
431 <filter class="solr.TrimFilterFactory"/>
432 <filter class="solr.LengthFilterFactory" min="2" max="100" />
433 </analyzer>
434 </fieldType>
435
436
437 <fieldType name="date2year" class="solr.TextField" positionIncrementGap="100">
438 <!--<analyzer type="index">
439 <tokenizer class="solr.PatternTokenizerFactory" pattern="?!(.*\.)"/>
440 </analyzer>
441 <analyzer type="query">
442 <tokenizer class="solr.PatternTokenizerFactory" pattern="?!(.*\.)"/>
443 </analyzer>-->
444 </fieldType>
445
446
447 <!-- A text field with defaults appropriate for English: it
448 tokenizes with StandardTokenizer, removes English stop words
449 (lang/stopwords_en.txt), lowercases tokens, protects words from protwords.txt, and
450 finally applies Porter's stemming. The query time analyzer
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
451 also applies synonyms from synonyms.txt. -->
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
452 <fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
453 <analyzer type="index">
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
454 <tokenizer class="solr.StandardTokenizerFactory"/>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
455 <!-- in this example, we will only use synonyms at query time
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
456 <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
457 -->
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
458 <!-- Case insensitive stop word removal.
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
459 add enablePositionIncrements=true in both the index and query
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
460 analyzers to leave a 'gap' for more accurate phrase queries.
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
461 -->
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
462 <filter class="solr.StopFilterFactory"
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
463 ignoreCase="true"
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
464 words="lang/stopwords_en.txt"
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
465 enablePositionIncrements="true"
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
466 />
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
467 <filter class="solr.LowerCaseFilterFactory"/>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
468 <filter class="solr.EnglishPossessiveFilterFactory"/>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
469 <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
470 <!-- Optionally you may want to use this less aggressive stemmer instead of PorterStemFilterFactory:
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
471 <filter class="solr.EnglishMinimalStemFilterFactory"/>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
472 -->
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
473 <filter class="solr.PorterStemFilterFactory"/>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
474 </analyzer>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
475 <analyzer type="query">
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
476 <tokenizer class="solr.StandardTokenizerFactory"/>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
477 <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
478 <filter class="solr.StopFilterFactory"
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
479 ignoreCase="true"
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
480 words="lang/stopwords_en.txt"
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
481 enablePositionIncrements="true"
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
482 />
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
483 <filter class="solr.LowerCaseFilterFactory"/>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
484 <filter class="solr.EnglishPossessiveFilterFactory"/>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
485 <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
486 <!-- Optionally you may want to use this less aggressive stemmer instead of PorterStemFilterFactory:
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
487 <filter class="solr.EnglishMinimalStemFilterFactory"/>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
488 -->
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
489 <filter class="solr.PorterStemFilterFactory"/>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
490 </analyzer>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
491 </fieldType>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
492
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
493 <!-- A text field with defaults appropriate for English, plus
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
494 aggressive word-splitting and autophrase features enabled.
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
495 This field is just like text_en, except it adds
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
496 WordDelimiterFilter to enable splitting and matching of
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
497 words on case changes, letter/digit boundaries, and
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
498 non-alphanumeric chars. This means certain compound word
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
499 cases will work, for example query "wi fi" will match
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
500 document "WiFi" or "wi-fi".
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
501 -->
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
502 <fieldType name="text_en_splitting" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
503 <analyzer type="index">
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
504 <tokenizer class="solr.WhitespaceTokenizerFactory"/>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
505 <!-- in this example, we will only use synonyms at query time
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
506 <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
507 -->
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
508 <!-- Case insensitive stop word removal.
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
509 add enablePositionIncrements=true in both the index and query
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
510 analyzers to leave a 'gap' for more accurate phrase queries.
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
511 -->
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
512 <filter class="solr.StopFilterFactory"
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
513 ignoreCase="true"
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
514 words="lang/stopwords_en.txt"
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
515 enablePositionIncrements="true"
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
516 />
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
517 <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
518 <filter class="solr.LowerCaseFilterFactory"/>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
519 <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
520 <filter class="solr.PorterStemFilterFactory"/>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
521 </analyzer>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
522 <analyzer type="query">
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
523 <tokenizer class="solr.WhitespaceTokenizerFactory"/>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
524 <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
525 <filter class="solr.StopFilterFactory"
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
526 ignoreCase="true"
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
527 words="lang/stopwords_en.txt"
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
528 enablePositionIncrements="true"
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
529 />
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
530 <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
531 <filter class="solr.LowerCaseFilterFactory"/>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
532 <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
533 <filter class="solr.PorterStemFilterFactory"/>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
534 </analyzer>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
535 </fieldType>
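<!-- Illustrative token trace for text_en_splitting (a sketch that ignores
     stop words and stemming). The field name "title_split" is hypothetical
     and not part of this schema.
       index "Wi-Fi"  =>  wi, fi, wifi   (catenateWords="1" also adds the joined form)
       index "WiFi"   =>  wi, fi, wifi   (splitOnCaseChange="1")
       query "wi fi"  =>  wi, fi         (catenateWords="0" at query time)
     Both query terms therefore occur in either document.
     <field name="title_split" type="text_en_splitting" indexed="true" stored="true"/>
-->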
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
536
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
537 <!-- Less flexible matching, but fewer false matches. Probably not ideal for product names,
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
538 but may be good for SKUs. Can insert dashes in the wrong place and still match. -->
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
539 <fieldType name="text_en_splitting_tight" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
540 <analyzer>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
541 <tokenizer class="solr.WhitespaceTokenizerFactory"/>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
542 <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
543 <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
544 <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
545 <filter class="solr.LowerCaseFilterFactory"/>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
546 <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
547 <filter class="solr.EnglishMinimalStemFilterFactory"/>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
548 <!-- this filter can remove any duplicate tokens that appear at the same position - sometimes
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
549 possible with WordDelimiterFilter in conjunction with stemming. -->
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
550 <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
551 </analyzer>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
552 </fieldType>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
553
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
554 <!-- Just like text_general except it reverses the characters of
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
555 each token, to enable more efficient leading wildcard queries. -->
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
556 <fieldType name="text_general_rev" class="solr.TextField" positionIncrementGap="100">
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
557 <analyzer type="index">
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
558 <tokenizer class="solr.StandardTokenizerFactory"/>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
559 <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
560 <filter class="solr.LowerCaseFilterFactory"/>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
561 <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true"
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
562 maxPosAsterisk="3" maxPosQuestion="2" maxFractionAsterisk="0.33"/>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
563 </analyzer>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
564 <analyzer type="query">
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
565 <tokenizer class="solr.StandardTokenizerFactory"/>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
566 <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
567 <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
568 <filter class="solr.LowerCaseFilterFactory"/>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
569 </analyzer>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
570 </fieldType>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
571
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
572 <!-- charFilter + WhitespaceTokenizer -->
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
573 <!--
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
574 <fieldType name="text_char_norm" class="solr.TextField" positionIncrementGap="100" >
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
575 <analyzer>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
576 <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
577 <tokenizer class="solr.WhitespaceTokenizerFactory"/>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
578 </analyzer>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
579 </fieldType>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
580 -->
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
581
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
582 <!-- This is an example of using the KeywordTokenizer along
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
583 with various TokenFilterFactories to produce a sortable field
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
584 that does not include some properties of the source text
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
585 -->
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
586 <fieldType name="alphaOnlySort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
587 <analyzer>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
588 <!-- KeywordTokenizer does no actual tokenizing, so the entire
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
589 input string is preserved as a single token
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
590 -->
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
591 <tokenizer class="solr.KeywordTokenizerFactory"/>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
592 <!-- The LowerCase TokenFilter does what you expect, which can be
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
593 useful when you want your sorting to be case insensitive
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
594 -->
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
595 <filter class="solr.LowerCaseFilterFactory" />
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
596 <!-- The TrimFilter removes any leading or trailing whitespace -->
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
597 <filter class="solr.TrimFilterFactory" />
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
598 <!-- The PatternReplaceFilter gives you the flexibility to use
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
599 a Java regular expression to replace any sequence of characters
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
600 matching a pattern with an arbitrary replacement string,
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
601 which may include back references to portions of the original
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
602 string matched by the pattern.
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
603
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
604 See the Java Regular Expression documentation for more
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
605 information on pattern and replacement string syntax.
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
606
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
607 http://java.sun.com/j2se/1.6.0/docs/api/java/util/regex/package-summary.html
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
608 -->
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
609 <filter class="solr.PatternReplaceFilterFactory"
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
610 pattern="([^a-z])" replacement="" replace="all"
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
611 />
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
612 </analyzer>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
613 </fieldType>
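<!-- Illustrative trace for alphaOnlySort (a sketch; the field name
     "title_sort" is hypothetical and not part of this schema):
       input                 "  Hello, World 42 "
       KeywordTokenizer      "  Hello, World 42 "   (one token)
       LowerCaseFilter       "  hello, world 42 "
       TrimFilter            "hello, world 42"
       PatternReplaceFilter  "helloworld"
     Note that [^a-z] also removes digits, spaces, and accented letters.
     <field name="title_sort" type="alphaOnlySort" indexed="true" stored="false"/>
-->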
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
614
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
615 <fieldtype name="phonetic" stored="false" indexed="true" class="solr.TextField" >
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
616 <analyzer>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
617 <tokenizer class="solr.StandardTokenizerFactory"/>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
618 <filter class="solr.DoubleMetaphoneFilterFactory" inject="false"/>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
619 </analyzer>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
620 </fieldtype>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
621
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
622 <fieldtype name="payloads" stored="false" indexed="true" class="solr.TextField" >
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
623 <analyzer>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
624 <tokenizer class="solr.WhitespaceTokenizerFactory"/>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
625 <!--
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
626 The DelimitedPayloadTokenFilter can put payloads on tokens... for example,
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
627 a token of "foo|1.4" would be indexed as "foo" with a payload of 1.4f
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
628 Attributes of the DelimitedPayloadTokenFilterFactory :
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
629 "delimiter" - a one character delimiter. Default is | (pipe)
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
630 "encoder" - how to encode the following value into a payload
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
631 float -> org.apache.lucene.analysis.payloads.FloatEncoder,
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
632 integer -> o.a.l.a.p.IntegerEncoder
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
633 identity -> o.a.l.a.p.IdentityEncoder
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
634 or the fully qualified name of a class implementing PayloadEncoder; the encoder must have a no-arg constructor.
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
635 -->
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
636 <filter class="solr.DelimitedPayloadTokenFilterFactory" encoder="float"/>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
637 </analyzer>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
638 </fieldtype>
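<!-- Illustrative use of the payloads type (a sketch; the field name
     "weighted_terms" is hypothetical and not part of this schema).
     With the default "|" delimiter and encoder="float", the field value
       solr|2.0 lucene|0.5
     is split on whitespace and indexed as the terms "solr" and "lucene",
     carrying the float payloads 2.0 and 0.5 respectively.
     <field name="weighted_terms" type="payloads"/>
-->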
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
639
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
640 <!-- lowercases the entire field value, keeping it as a single token. -->
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
641 <fieldType name="lowercase" class="solr.TextField" positionIncrementGap="100">
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
642 <analyzer>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
643 <tokenizer class="solr.KeywordTokenizerFactory"/>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
644 <filter class="solr.LowerCaseFilterFactory" />
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
645 </analyzer>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
646 </fieldType>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
647
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
648 <!--
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
649 Example of using PathHierarchyTokenizerFactory at index time, so
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
650 queries for paths match documents at that path, or in descendent paths
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
651 -->
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
652 <fieldType name="descendent_path" class="solr.TextField">
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
653 <analyzer type="index">
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
654 <tokenizer class="solr.PathHierarchyTokenizerFactory" delimiter="/" />
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
655 </analyzer>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
656 <analyzer type="query">
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
657 <tokenizer class="solr.KeywordTokenizerFactory" />
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
658 </analyzer>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
659 </fieldType>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
660 <!--
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
661 Example of using PathHierarchyTokenizerFactory at query time, so
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
662 queries for paths match documents at that path, or in ancestor paths
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
663 -->
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
664 <fieldType name="ancestor_path" class="solr.TextField">
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
665 <analyzer type="index">
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
666 <tokenizer class="solr.KeywordTokenizerFactory" />
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
667 </analyzer>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
668 <analyzer type="query">
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
669 <tokenizer class="solr.PathHierarchyTokenizerFactory" delimiter="/" />
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
670 </analyzer>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
671 </fieldType>
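<!-- Illustrative trace for the two path types above (a sketch).
     With delimiter="/", PathHierarchyTokenizer turns "/a/b/c" into the tokens
       /a    /a/b    /a/b/c
     descendent_path: a document indexed with "/a/b/c" matches the query "/a/b",
                      because "/a/b" is one of the indexed tokens.
     ancestor_path:   a document indexed with "/a/b" matches the query "/a/b/c",
                      because "/a/b" is one of the query tokens.
-->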
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
672
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
673 <!-- since fields of this type are by default not stored or indexed,
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
674 any data added to them will be ignored outright. -->
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
675 <fieldtype name="ignored" stored="false" indexed="false" multiValued="true" class="solr.StrField" />
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
676
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
677 <!-- This point type indexes the coordinates as separate fields (subFields).
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
678 If subFieldType is defined, it references a type, and a dynamic field
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
679 definition is created matching *___<typename>. Alternately, if
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
680 subFieldSuffix is defined, that is used to create the subFields.
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
681 Example: if subFieldType="double", then the coordinates would be
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
682 indexed in fields myloc_0___double,myloc_1___double.
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
683 Example: if subFieldSuffix="_d" then the coordinates would be indexed
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
684 in fields myloc_0_d,myloc_1_d
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
685 The subFields are an implementation detail of the fieldType, and end
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
686 users normally should not need to know about them.
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
687 -->
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
688 <fieldType name="point" class="solr.PointType" dimension="2" subFieldSuffix="_d"/>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
689
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
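    <!-- Illustrative only: the field name "myloc" follows the comment above and is
         not necessarily declared in this schema.
           <field name="myloc" type="point" indexed="true" stored="true"/>
         A value is supplied as comma-separated coordinates, e.g. "1.5,2.5", and
         with subFieldSuffix="_d" it would be split into myloc_0_d=1.5 and
         myloc_1_d=2.5. This assumes a dynamicField pattern matching "*_d"
         (typically a double type) exists elsewhere in the schema. -->
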
    <!-- A specialized field for geospatial search. If indexed, this fieldType must not be multivalued. -->
    <fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate"/>

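    <!-- A hedged usage sketch; the field name "store" is hypothetical:
           <field name="store" type="location" indexed="true" stored="true"/>
         Values are given as "lat,lon", e.g. "45.17614,-93.87341". A radius
         filter could then look like:
           fq={!geofilt sfield=store pt=45.15,-93.85 d=5}
         where d is the distance in kilometers. This assumes a dynamicField
         matching "*_coordinate" is defined elsewhere in the schema to hold
         the latitude and longitude subFields. -->
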
    <!-- An alternative geospatial field type new to Solr 4. It supports multiValued and polygon shapes.
      For more information about this and other Spatial fields new to Solr 4, see:
      http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4
    -->
    <!-- <fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType"
        geo="true" distErrPct="0.025" maxDistErr="0.000009" units="degrees" /> -->

    <!-- Money/currency field type. See http://wiki.apache.org/solr/MoneyFieldType
        Parameters:
          defaultCurrency: Specifies the default currency if none is specified. Defaults to "USD"
          precisionStep:   Specifies the precisionStep for the TrieLong field used for the amount
          providerClass:   Lets you plug in another exchange rate provider backend:
                           solr.FileExchangeRateProvider is the default and takes one parameter:
                             currencyConfig: name of an XML file holding exchange rates
                           solr.OpenExchangeRatesOrgProvider uses rates from openexchangerates.org:
                             ratesFileLocation: URL or path to rates JSON file (default: latest.json on the web)
                             refreshInterval: Number of minutes between each rates fetch (default: 1440, min: 60)
    -->
    <!-- <fieldType name="currency" class="solr.CurrencyField" precisionStep="8" defaultCurrency="USD" currencyConfig="currency.xml" /> -->

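    <!-- Sketch only, assuming the currency type above is uncommented and a
         hypothetical field "price" is declared with it:
           <field name="price" type="currency" indexed="true" stored="true"/>
         A value combines an amount and a currency code, e.g. "10.00,EUR"; when
         no code is given, defaultCurrency applies. -->
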
    <!-- some examples for different languages (generally ordered by ISO code) -->

    <!-- Arabic -->
    <fieldType name="text_ar" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <!-- for any non-Arabic -->
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ar.txt" enablePositionIncrements="true"/>
        <!-- normalizes alef maksura (ى) to yeh (ي), etc -->
        <filter class="solr.ArabicNormalizationFilterFactory"/>
        <filter class="solr.ArabicStemFilterFactory"/>
      </analyzer>
    </fieldType>

    <!-- Bulgarian -->
    <fieldType name="text_bg" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_bg.txt" enablePositionIncrements="true"/>
        <filter class="solr.BulgarianStemFilterFactory"/>
      </analyzer>
    </fieldType>

    <!-- Catalan -->
    <fieldType name="text_ca" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <!-- removes l', etc -->
        <filter class="solr.ElisionFilterFactory" ignoreCase="true" articles="lang/contractions_ca.txt"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ca.txt" enablePositionIncrements="true"/>
        <filter class="solr.SnowballPorterFilterFactory" language="Catalan"/>
      </analyzer>
    </fieldType>

    <!-- CJK bigram (see text_ja for a Japanese configuration using morphological analysis) -->
    <fieldType name="text_cjk" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <!-- normalize width before bigram, as e.g. half-width dakuten combine -->
        <filter class="solr.CJKWidthFilterFactory"/>
        <!-- for any non-CJK -->
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.CJKBigramFilterFactory"/>
      </analyzer>
    </fieldType>

    <!-- Czech -->
    <fieldType name="text_cz" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_cz.txt" enablePositionIncrements="true"/>
        <filter class="solr.CzechStemFilterFactory"/>
      </analyzer>
    </fieldType>

    <!-- Danish -->
    <fieldType name="text_da" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_da.txt" format="snowball" enablePositionIncrements="true"/>
        <filter class="solr.SnowballPorterFilterFactory" language="Danish"/>
      </analyzer>
    </fieldType>

    <!-- German -->
    <fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_de.txt" format="snowball" enablePositionIncrements="true"/>
        <filter class="solr.GermanNormalizationFilterFactory"/>
        <filter class="solr.GermanLightStemFilterFactory"/>
        <!-- less aggressive: <filter class="solr.GermanMinimalStemFilterFactory"/> -->
        <!-- more aggressive: <filter class="solr.SnowballPorterFilterFactory" language="German2"/> -->
      </analyzer>
    </fieldType>

    <!-- Greek -->
    <fieldType name="text_el" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <!-- Greek-specific lowercase for sigma -->
        <filter class="solr.GreekLowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="false" words="lang/stopwords_el.txt" enablePositionIncrements="true"/>
        <filter class="solr.GreekStemFilterFactory"/>
      </analyzer>
    </fieldType>

    <!-- Spanish -->
    <fieldType name="text_es" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_es.txt" format="snowball" enablePositionIncrements="true"/>
        <filter class="solr.SpanishLightStemFilterFactory"/>
        <!-- more aggressive: <filter class="solr.SnowballPorterFilterFactory" language="Spanish"/> -->
      </analyzer>
    </fieldType>

    <!-- Basque -->
    <fieldType name="text_eu" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_eu.txt" enablePositionIncrements="true"/>
        <filter class="solr.SnowballPorterFilterFactory" language="Basque"/>
      </analyzer>
    </fieldType>

    <!-- Persian -->
    <fieldType name="text_fa" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <!-- for ZWNJ -->
        <charFilter class="solr.PersianCharFilterFactory"/>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.ArabicNormalizationFilterFactory"/>
        <filter class="solr.PersianNormalizationFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_fa.txt" enablePositionIncrements="true"/>
      </analyzer>
    </fieldType>

    <!-- Finnish -->
    <fieldType name="text_fi" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_fi.txt" format="snowball" enablePositionIncrements="true"/>
        <filter class="solr.SnowballPorterFilterFactory" language="Finnish"/>
        <!-- less aggressive: <filter class="solr.FinnishLightStemFilterFactory"/> -->
      </analyzer>
    </fieldType>

    <!-- French -->
    <fieldType name="text_fr" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <!-- removes l', etc -->
        <filter class="solr.ElisionFilterFactory" ignoreCase="true" articles="lang/contractions_fr.txt"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_fr.txt" format="snowball" enablePositionIncrements="true"/>
        <filter class="solr.FrenchLightStemFilterFactory"/>
        <!-- less aggressive: <filter class="solr.FrenchMinimalStemFilterFactory"/> -->
        <!-- more aggressive: <filter class="solr.SnowballPorterFilterFactory" language="French"/> -->
      </analyzer>
    </fieldType>

    <!-- Irish -->
    <fieldType name="text_ga" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <!-- removes d', etc -->
        <filter class="solr.ElisionFilterFactory" ignoreCase="true" articles="lang/contractions_ga.txt"/>
        <!-- removes n-, etc. enablePositionIncrements is intentionally false! -->
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/hyphenations_ga.txt" enablePositionIncrements="false"/>
        <filter class="solr.IrishLowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ga.txt" enablePositionIncrements="true"/>
        <filter class="solr.SnowballPorterFilterFactory" language="Irish"/>
      </analyzer>
    </fieldType>

    <!-- Galician -->
    <fieldType name="text_gl" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_gl.txt" enablePositionIncrements="true"/>
        <filter class="solr.GalicianStemFilterFactory"/>
        <!-- less aggressive: <filter class="solr.GalicianMinimalStemFilterFactory"/> -->
      </analyzer>
    </fieldType>

    <!-- Hindi -->
    <fieldType name="text_hi" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- normalizes Unicode representation -->
        <filter class="solr.IndicNormalizationFilterFactory"/>
        <!-- normalizes variation in spelling -->
        <filter class="solr.HindiNormalizationFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_hi.txt" enablePositionIncrements="true"/>
        <filter class="solr.HindiStemFilterFactory"/>
      </analyzer>
    </fieldType>

904 <!-- Hungarian -->
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
905 <fieldType name="text_hu" class="solr.TextField" positionIncrementGap="100">
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
906 <analyzer>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
907 <tokenizer class="solr.StandardTokenizerFactory"/>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
908 <filter class="solr.LowerCaseFilterFactory"/>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
909 <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_hu.txt" format="snowball" enablePositionIncrements="true"/>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
910 <filter class="solr.SnowballPorterFilterFactory" language="Hungarian"/>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
911 <!-- less aggressive: <filter class="solr.HungarianLightStemFilterFactory"/> -->
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
912 </analyzer>
a2b4f67e73dc initial
Dirk Wintergruen <dwinter@mpiwg-berlin.mpg.de>
parents:
diff changeset
913 </fieldType>

    <!-- Armenian -->
    <fieldType name="text_hy" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_hy.txt" enablePositionIncrements="true"/>
        <filter class="solr.SnowballPorterFilterFactory" language="Armenian"/>
      </analyzer>
    </fieldType>

    <!-- Indonesian -->
    <fieldType name="text_id" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_id.txt" enablePositionIncrements="true"/>
        <!-- for a less aggressive approach (only inflectional suffixes), set stemDerivational to false -->
        <filter class="solr.IndonesianStemFilterFactory" stemDerivational="true"/>
      </analyzer>
    </fieldType>

    <!-- Italian -->
    <fieldType name="text_it" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <!-- removes l', etc -->
        <filter class="solr.ElisionFilterFactory" ignoreCase="true" articles="lang/contractions_it.txt"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_it.txt" format="snowball" enablePositionIncrements="true"/>
        <filter class="solr.ItalianLightStemFilterFactory"/>
        <!-- more aggressive: <filter class="solr.SnowballPorterFilterFactory" language="Italian"/> -->
      </analyzer>
    </fieldType>

    <!-- Japanese using morphological analysis (see text_cjk for a configuration using bigramming)

         NOTE: If you want to optimize search for precision, use default operator AND in your query
         parser config with <solrQueryParser defaultOperator="AND"/> further down in this file. Use
         OR if you would like to optimize for recall (default).
    -->
    <fieldType name="text_ja" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="false">
      <analyzer>
        <!-- Kuromoji Japanese morphological analyzer/tokenizer (JapaneseTokenizer)

             Kuromoji has a search mode (default) that does segmentation useful for search. A heuristic
             is used to segment compounds into their parts, and the compound itself is kept as a synonym.

             Valid values for attribute mode are:
                normal: regular segmentation
                search: segmentation useful for search, with compounds kept as synonyms (default)
              extended: same as search mode, but splits unknown words into unigrams (experimental)

             For some applications it might be good to use search mode for indexing and normal mode for
             queries to reduce recall and prevent parts of compounds from being matched and highlighted.
             Use <analyzer type="index"> and <analyzer type="query"> for this, with mode="normal" on the
             query tokenizer.

             Kuromoji also has a convenient user dictionary feature that allows overriding the statistical
             model with your own entries for segmentation, part-of-speech tags and readings without a need
             to specify weights. Note that user dictionaries have not been subject to extensive testing.

             User dictionary attributes are:
                       userDictionary: user dictionary filename
               userDictionaryEncoding: user dictionary encoding (default is UTF-8)

             See lang/userdict_ja.txt for a sample user dictionary file.

             Punctuation characters are discarded by default. Use discardPunctuation="false" to keep them.

             See http://wiki.apache.org/solr/JapaneseLanguageSupport for more on Japanese language support.
          -->
        <tokenizer class="solr.JapaneseTokenizerFactory" mode="search"/>
        <!--<tokenizer class="solr.JapaneseTokenizerFactory" mode="search" userDictionary="lang/userdict_ja.txt"/>-->
        <!-- Reduces inflected verbs and adjectives to their base/dictionary forms (辞書形) -->
        <filter class="solr.JapaneseBaseFormFilterFactory"/>
        <!-- Removes tokens with certain part-of-speech tags -->
        <filter class="solr.JapanesePartOfSpeechStopFilterFactory" tags="lang/stoptags_ja.txt" enablePositionIncrements="true"/>
        <!-- Normalizes full-width romaji to half-width and half-width kana to full-width (Unicode NFKC subset) -->
        <filter class="solr.CJKWidthFilterFactory"/>
        <!-- Removes common tokens that are typically not useful for search but have a negative effect on ranking -->
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ja.txt" enablePositionIncrements="true" />
        <!-- Normalizes common katakana spelling variations by removing a trailing long sound character (U+30FC) -->
        <filter class="solr.JapaneseKatakanaStemFilterFactory" minimumLength="4"/>
        <!-- Lower-cases romaji characters -->
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
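
    <!-- A minimal sketch of the index/query split described in the text_ja comment, assuming a
         hypothetical field type name text_ja_split that is not part of the original schema.
         Documents are tokenized in search mode and queries in normal mode. The filter chain
         copies text_ja; its explanatory comments are left out because XML comments cannot nest.
         Treat this as an illustration under those assumptions, not as a tested configuration,
         and remove the surrounding comment markers to try it.

    <fieldType name="text_ja_split" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="false">
      <analyzer type="index">
        <tokenizer class="solr.JapaneseTokenizerFactory" mode="search"/>
        <filter class="solr.JapaneseBaseFormFilterFactory"/>
        <filter class="solr.JapanesePartOfSpeechStopFilterFactory" tags="lang/stoptags_ja.txt" enablePositionIncrements="true"/>
        <filter class="solr.CJKWidthFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ja.txt" enablePositionIncrements="true"/>
        <filter class="solr.JapaneseKatakanaStemFilterFactory" minimumLength="4"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.JapaneseTokenizerFactory" mode="normal"/>
        <filter class="solr.JapaneseBaseFormFilterFactory"/>
        <filter class="solr.JapanesePartOfSpeechStopFilterFactory" tags="lang/stoptags_ja.txt" enablePositionIncrements="true"/>
        <filter class="solr.CJKWidthFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ja.txt" enablePositionIncrements="true"/>
        <filter class="solr.JapaneseKatakanaStemFilterFactory" minimumLength="4"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
    -->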

    <!-- Latvian -->
    <fieldType name="text_lv" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_lv.txt" enablePositionIncrements="true"/>
        <filter class="solr.LatvianStemFilterFactory"/>
      </analyzer>
    </fieldType>

    <!-- Dutch -->
    <fieldType name="text_nl" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_nl.txt" format="snowball" enablePositionIncrements="true"/>
        <filter class="solr.StemmerOverrideFilterFactory" dictionary="lang/stemdict_nl.txt" ignoreCase="false"/>
        <filter class="solr.SnowballPorterFilterFactory" language="Dutch"/>
      </analyzer>
    </fieldType>

    <!-- Norwegian -->
    <fieldType name="text_no" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_no.txt" format="snowball" enablePositionIncrements="true"/>
        <filter class="solr.SnowballPorterFilterFactory" language="Norwegian"/>
        <!-- less aggressive: <filter class="solr.NorwegianLightStemFilterFactory"/> -->
        <!-- singular/plural: <filter class="solr.NorwegianMinimalStemFilterFactory"/> -->
      </analyzer>
    </fieldType>

    <!-- Portuguese -->
    <fieldType name="text_pt" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_pt.txt" format="snowball" enablePositionIncrements="true"/>
        <filter class="solr.PortugueseLightStemFilterFactory"/>
        <!-- less aggressive: <filter class="solr.PortugueseMinimalStemFilterFactory"/> -->
        <!-- more aggressive: <filter class="solr.SnowballPorterFilterFactory" language="Portuguese"/> -->
        <!-- most aggressive: <filter class="solr.PortugueseStemFilterFactory"/> -->
      </analyzer>
    </fieldType>

    <!-- Romanian -->
    <fieldType name="text_ro" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ro.txt" enablePositionIncrements="true"/>
        <filter class="solr.SnowballPorterFilterFactory" language="Romanian"/>
      </analyzer>
    </fieldType>

    <!-- Russian -->
    <fieldType name="text_ru" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ru.txt" format="snowball" enablePositionIncrements="true"/>
        <filter class="solr.SnowballPorterFilterFactory" language="Russian"/>
        <!-- less aggressive: <filter class="solr.RussianLightStemFilterFactory"/> -->
      </analyzer>
    </fieldType>

    <!-- Swedish -->
    <fieldType name="text_sv" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_sv.txt" format="snowball" enablePositionIncrements="true"/>
        <filter class="solr.SnowballPorterFilterFactory" language="Swedish"/>
        <!-- less aggressive: <filter class="solr.SwedishLightStemFilterFactory"/> -->
      </analyzer>
    </fieldType>

    <!-- Thai -->
    <fieldType name="text_th" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.ThaiWordFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_th.txt" enablePositionIncrements="true"/>
      </analyzer>
    </fieldType>

    <!-- Turkish -->
    <fieldType name="text_tr" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.TurkishLowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="false" words="lang/stopwords_tr.txt" enablePositionIncrements="true"/>
        <filter class="solr.SnowballPorterFilterFactory" language="Turkish"/>
      </analyzer>
    </fieldType>

  </types>

  <!-- Similarity is the scoring routine for each document vs. a query.
       A custom Similarity or SimilarityFactory may be specified here, but
       the default is fine for most applications.
       For more info: http://wiki.apache.org/solr/SchemaXml#Similarity
  -->
  <!--
  <similarity class="com.example.solr.CustomSimilarityFactory">
    <str name="paramkey">param value</str>
  </similarity>
  -->
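  <!-- A hedged illustration of the placeholder above, using solr.BM25SimilarityFactory, a
       factory that ships with Solr 4.x. The k1 and b values are the commonly cited BM25
       defaults and are assumptions for illustration, not settings taken from this schema.
       Remove the surrounding comment markers to try it.
  <similarity class="solr.BM25SimilarityFactory">
    <float name="k1">1.2</float>
    <float name="b">0.75</float>
  </similarity>
  -->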

</schema>