Configuration
Adding a search index
OpenCms comes with a built-in search index for the online and offline projects. Since version 7, additional search indices can be created in the administration view. In earlier versions this had to be done by editing the XML configuration file opencms-search.xml. (Make a backup before touching these config files! OpenCms will not even start if one of them contains an error.)
An additional search index goes in the indexes section of opencms-search.xml and looks something like this:
<index>
    <name>myproject</name>
    <rebuild>auto</rebuild>
    <project>Online</project>
    <locale>en</locale>
    <sources>
        <source>source1</source>
    </sources>
</index>
You also need to create an index source to go along with it. It goes in the indexsources section of the same file, and its name is the one the index refers to in its <source> entry:
<indexsource>
    <name>source1</name>
    <indexer class="org.opencms.search.CmsVfsIndexer"/>
    <resources>
        <resource>/sites/mysite/</resource>
    </resources>
    <documenttypes-indexed>
        <name>xmlpage</name>
        <name>xmlcontent</name>
        <name>text</name>
        <name>pdf</name>
        <name>rtf</name>
        <name>html</name>
        <name>msword</name>
        <name>msexcel</name>
        <name>mspowerpoint</name>
        <name>image</name>
        <name>generic</name>
    </documenttypes-indexed>
</indexsource>
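To make the link between the two snippets explicit, the following abridged sketch shows them side by side. It assumes the wrapping elements are named <indexes> and <indexsources>, as the text above calls the sections; everything else in opencms-search.xml is omitted, the list of document types is shortened, and the comments are added for illustration only.

```xml
<!-- Abridged sketch of opencms-search.xml; unrelated parts of the file are omitted -->
<indexes>
    <index>
        <name>myproject</name>
        <rebuild>auto</rebuild>
        <project>Online</project>
        <locale>en</locale>
        <sources>
            <!-- refers to the index source defined below -->
            <source>source1</source>
        </sources>
    </index>
</indexes>
<indexsources>
    <indexsource>
        <!-- same name as the <source> entry of the index above -->
        <name>source1</name>
        <indexer class="org.opencms.search.CmsVfsIndexer"/>
        <resources>
            <resource>/sites/mysite/</resource>
        </resources>
        <documenttypes-indexed>
            <name>xmlpage</name>
            <name>xmlcontent</name>
        </documenttypes-indexed>
    </indexsource>
</indexsources>
```

If the two names do not agree, the index has no source to read from, so it is worth checking them first when a new index stays empty.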
Adding custom document types
Since OpenCms 7 you can add standard document types to your search index in the administration view. The standard types are as follows:
- generic (org.opencms.search.documents.CmsDocumentGeneric)
- html (org.opencms.search.documents.CmsDocumentHtml)
- image (org.opencms.search.documents.CmsDocumentGeneric)
- msexcel (org.opencms.search.documents.CmsDocumentMsExcel)
- mspowerpoint (org.opencms.search.documents.CmsDocumentMsPowerPoint)
- msword (org.opencms.search.documents.CmsDocumentMsWord)
- pdf (org.opencms.search.documents.CmsDocumentPdf)
- rtf (org.opencms.search.documents.CmsDocumentRtf)
- text (org.opencms.search.documents.CmsDocumentPlainText)
- xmlcontent (org.opencms.search.documents.CmsDocumentXmlContent)
- xmlpage (org.opencms.search.documents.CmsDocumentXmlPage)
If you want to add a custom document type, you first have to add it to the documenttypes section of opencms-search.xml. You may also have to create a CmsDocument class in Java for it, or find an existing one it can share. See Custom_File_and_Folder_Types for more information on how to create a custom document type. Here is an existing sample:
<documenttype>
    <name>rtf</name>
    <class>org.opencms.search.documents.CmsDocumentRtf</class>
    <mimetypes>
        <mimetype>text/rtf</mimetype>
        <mimetype>application/rtf</mimetype>
    </mimetypes>
    <resourcetypes>
        <resourcetype>binary</resourcetype>
        <resourcetype>plain</resourcetype>
    </resourcetypes>
</documenttype>
Say you have an xmlcontent news module with the resource type and document type news, and you want to add it to your index. Edit opencms-search.xml as follows: go to the <documenttypes> section, copy an existing <documenttype> node (in this case it is best to take the xmlcontent documenttype node) and modify it to fit your needs. Enter news into the <name> and <resourcetype> nodes:
<documenttype>
    <name>news</name>
    <class>org.opencms.search.documents.CmsDocumentXmlContent</class>
    <mimetypes/>
    <resourcetypes>
        <resourcetype>news</resourcetype>
    </resourcetypes>
</documenttype>
Once you have added this to opencms-search.xml (and restarted your servlet container), you can add your custom document type to your search index. Do this either in the OpenCms administration view or manually in the XML file.
To do the latter, go to the <indexsource> node of your index and add the news entry to its list of indexed document types:
<documenttypes-indexed>
    <name>news</name>
</documenttypes-indexed>
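Combined with the source1 example from the first section, the index source might then look like the sketch below. It assumes that news is simply appended to the existing <documenttypes-indexed> list rather than placed in a second list; the other types are shortened to two entries for illustration, and the comment is not part of the original configuration.

```xml
<indexsource>
    <name>source1</name>
    <indexer class="org.opencms.search.CmsVfsIndexer"/>
    <resources>
        <resource>/sites/mysite/</resource>
    </resources>
    <documenttypes-indexed>
        <name>xmlpage</name>
        <name>xmlcontent</name>
        <!-- custom document type defined in the <documenttypes> section -->
        <name>news</name>
    </documenttypes-indexed>
</indexsource>
```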
Writing specific data to the Lucene index
You can find more about this on the page "Writing only specific xml-element-data to the lucene index".