eFocus Lucene Web Search module

While updating the eFocus Lucene web search module, we ran into a few difficult changes between the old Sitecore 6 search and the new Sitecore 7 content search.

So what changed in this new version of the module? We gave it version number 2.0, which indicates that there were some major changes, including some breaking ones. For instance, we removed the old site crawler and replaced it with a new one based on content search. We also added a new feature: the autocomplete service! This service creates an autocomplete index after each rebuild of the search index; more about that later in this article.

And last but certainly not least, you can now use the SwitchOnRebuildLuceneIndex type for your site indexes. This index type has two index directories and switches between them after every rebuild. So when you start a rebuild, it runs in directory two while users can still search in directory one. After the rebuild the directories are switched, so the next rebuild runs in directory one while searches are served from directory two.
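The switching behaviour described above is essentially a double buffer. The C# sketch below illustrates that idea under this reading; it is not the code of Sitecore's SwitchOnRebuildLuceneIndex, and the class `SwitchingIndexFolders`, its members and the folder names are hypothetical.

```csharp
using System;

// Hypothetical sketch of the "switch on rebuild" idea: two index folders,
// one serving searches while the other is rebuilt, swapped after each rebuild.
public class SwitchingIndexFolders
{
    private readonly string[] _folders;
    private int _activeIndex; // position in _folders of the folder that serves searches

    public SwitchingIndexFolders(string primaryFolder, string secondaryFolder)
    {
        _folders = new[] { primaryFolder, secondaryFolder };
        _activeIndex = 0;
    }

    // Folder that searches are currently served from.
    public string SearchFolder => _folders[_activeIndex];

    // Folder the next rebuild writes into (always the inactive one).
    public string RebuildFolder => _folders[1 - _activeIndex];

    // Runs a rebuild into the inactive folder, then swaps the roles of the two folders.
    // Searches keep using SearchFolder until the rebuild action has returned.
    public void Rebuild(Action<string> rebuildInto)
    {
        if (rebuildInto == null) throw new ArgumentNullException(nameof(rebuildInto));
        rebuildInto(RebuildFolder);
        _activeIndex = 1 - _activeIndex; // reached only if the rebuild did not throw
    }
}
```

Because the swap only happens after the rebuild delegate returns, a failed rebuild leaves searches on the previous, still complete folder.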

To get this module working, you will need to add an index definition to your Sitecore configuration. For example:

<index id="corporate_crawledcontent" type="Sitecore.ContentSearch.LuceneProvider.SwitchOnRebuildLuceneIndex, Sitecore.ContentSearch.LuceneProvider">
	<param desc="name">$(id)</param>
	<param desc="folder">__corporate_crawledcontent</param>
	<!-- This initializes the index property store. The id has to be set to the index id -->
	<param desc="propertyStore" ref="contentSearch/databasePropertyStore" param1="$(id)" />
	<configuration ref="contentSearch/indexConfigurations/defaultLuceneIndexConfiguration" />
	<strategies hint="list:AddStrategy">
		<!-- NOTE: the order of these controls the execution order -->
		<strategy ref="contentSearch/indexUpdateStrategies/manual" />
	</strategies>
	<commitPolicyExecutor type="Sitecore.ContentSearch.CommitPolicyExecutor, Sitecore.ContentSearch">
		<policies hint="list:AddCommitPolicy">
			<policy type="Sitecore.ContentSearch.TimeIntervalCommitPolicy, Sitecore.ContentSearch" />
		</policies>
	</commitPolicyExecutor>
	<locations hint="list:AddCrawler">
		<crawler type="Efocus.Sitecore.LuceneWebSearch.NCrawlerProviderCrawler, Efocus.Sitecore.LuceneWebSearch">
			<Urls hint="list">
				<url>thieme.corporate.localhost.efocus.local</url>
			</Urls>
			<Triggers hint="list">
				<Trigger>efocus:updateindex:corporate_crawledcontent</Trigger>
			</Triggers>
			<Tags>crawled</Tags>
			<Boost>1</Boost>
			<AdhereToRobotRules>true</AdhereToRobotRules>
			<MaximumThreadCount>2</MaximumThreadCount>
			<RegexExcludeFilter>(&amp;sc_mode=preview|\.jpg|\.pdf|\.css|\.js|\.gif|\.jpeg|\.png|\.ico|^((?!corporate).)*$)</RegexExcludeFilter>
			<IndexFilters hint="raw:AddIndexFilter">
				<filter start="&lt;!--BEGIN-NOINDEX--&gt;" end="&lt;!--END-NOINDEX--&gt;" />
			</IndexFilters>
			<FollowFilters hint="raw:AddFollowFilter">
				<filter start="&lt;!--BEGIN-NOFOLLOW--&gt;" end="&lt;!--END-NOFOLLOW--&gt;" />
				<!-- remove <a rel="nofollow"> tags -->
				<!-- <a[^><]+?rel="[^"><"]*nofollow[^"><"]*"(.*?)> -->
				<filter startregex="&lt;a[^&gt;&lt;]+?rel=&quot;[^&quot;&gt;&lt;]*nofollow[^&quot;&gt;&lt;]*?&quot;" endregex="&gt;" />
				<!-- remove entire documents that have <meta name="robots" content="nofollow" /> (regex = <meta name="robots" content="[^"><"]*nofollow[^"><"]*?") -->
				<filter startregex="&lt;meta name=&quot;robots&quot; content=&quot;[^&quot;&gt;&lt;]*nofollow[^&quot;&gt;&lt;]*?&quot;" endregex="&lt;/html&gt;" />
			</FollowFilters>
		</crawler>
	</locations>
</index>
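The two kinds of filters in the config above can be illustrated with a small C# sketch. It assumes that a URL is skipped whenever the `RegexExcludeFilter` pattern matches anywhere in it, and that an index filter drops everything between its `start` and `end` markers before the page text is indexed. `CrawlerFilterSketch` and its methods are hypothetical names, not part of the module; note that `&amp;` in the config is just XML escaping for `&`.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helpers illustrating what the RegexExcludeFilter and the
// IndexFilters markers are meant to do. Not the module's NCrawlerProviderCrawler code.
public static class CrawlerFilterSketch
{
    // The RegexExcludeFilter value from the config, with &amp; decoded to &.
    public const string ExcludePattern =
        @"(&sc_mode=preview|\.jpg|\.pdf|\.css|\.js|\.gif|\.jpeg|\.png|\.ico|^((?!corporate).)*$)";

    // Assumption: a URL is skipped when the exclude pattern matches anywhere in it.
    public static bool IsExcluded(string url) => Regex.IsMatch(url, ExcludePattern);

    // Removes every region between a start and an end marker (markers included),
    // e.g. <!--BEGIN-NOINDEX--> ... <!--END-NOINDEX-->.
    public static string StripBetweenMarkers(string html, string start, string end)
    {
        // Lazy .*? stops at the first end marker; Singleline lets . match newlines.
        string pattern = Regex.Escape(start) + ".*?" + Regex.Escape(end);
        return Regex.Replace(html, pattern, string.Empty, RegexOptions.Singleline);
    }
}
```

Under this reading, any URL that does not contain "corporate" is excluded by the negative-lookahead alternative `^((?!corporate).)*$`, so the crawler stays on the corporate site.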

If you compare the config above to the old implementation, you will see some changes. As mentioned, we now use the SwitchOnRebuildLuceneIndex, which is set as the type on the first line. Secondly, you can now add strategies to the indexes. In this example we only added the manual index update strategy, so you will need to trigger a rebuild in the Sitecore control panel. And finally, we added a commit policy (TimeIntervalCommitPolicy), so that the index is committed at regular time intervals while it is being rebuilt.
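The idea behind a time-based commit policy can be sketched as follows. This is a hypothetical illustration, assuming the policy simply commits once a fixed interval has elapsed since the previous commit; `IntervalCommitSketch`, `ShouldCommit`, `Committed` and `SimulateRebuild` are made-up names, not the Sitecore.ContentSearch API, and the real TimeIntervalCommitPolicy may use different defaults and triggers.

```csharp
using System;

// Hypothetical time-based commit policy illustrating the idea behind
// TimeIntervalCommitPolicy. Not Sitecore's implementation.
public class IntervalCommitSketch
{
    private readonly TimeSpan _interval;
    private DateTime _lastCommit;

    public IntervalCommitSketch(TimeSpan interval, DateTime start)
    {
        _interval = interval;
        _lastCommit = start;
    }

    // True when at least one interval has passed since the last commit.
    public bool ShouldCommit(DateTime now) => now - _lastCommit >= _interval;

    // Records that a commit happened at the given time.
    public void Committed(DateTime now) => _lastCommit = now;

    // Simulates a rebuild that indexes one document per time step and returns
    // how many commits happen, including one final commit when the rebuild ends.
    public static int SimulateRebuild(int documents, TimeSpan perDocument, TimeSpan interval)
    {
        var clock = new DateTime(2000, 1, 1);
        var policy = new IntervalCommitSketch(interval, clock);
        int commits = 0;
        for (int i = 0; i < documents; i++)
        {
            clock += perDocument; // "index" one document
            if (policy.ShouldCommit(clock))
            {
                commits++;
                policy.Committed(clock);
            }
        }
        return commits + 1; // final commit at the end of the rebuild
    }
}
```

For example, ten documents at one second each with a three-second interval give intermediate commits at seconds 3, 6 and 9 plus the final one, four in total.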

Auto complete

As mentioned earlier, we added an autocomplete service. After a site's index is rebuilt, a second index is created from the freshly rebuilt one. This autocomplete index stores every distinct word used on the site.
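To make "every distinct word" concrete, here is a hypothetical C# sketch that collects the distinct words from a set of document texts. It assumes a word is a run of letters or digits and that terms are stored in lower case; the module's real tokenization may differ, and `AutoCompleteTermsSketch.BuildTerms` is a made-up name.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical sketch: collect the distinct words used in a set of indexed
// documents, i.e. the kind of data an autocomplete index holds.
public static class AutoCompleteTermsSketch
{
    // Assumption: a word is a run of Unicode letters or decimal digits.
    private static readonly Regex WordPattern = new Regex(@"[\p{L}\p{Nd}]+");

    public static List<string> BuildTerms(IEnumerable<string> documents)
    {
        // SortedSet removes duplicates and keeps terms in ordinal order.
        var terms = new SortedSet<string>(StringComparer.Ordinal);
        foreach (string text in documents)
        {
            foreach (Match m in WordPattern.Matches(text))
            {
                terms.Add(m.Value.ToLowerInvariant());
            }
        }
        return new List<string>(terms); // sorted, distinct, lower-cased
    }
}
```

Keeping the terms sorted makes prefix lookups cheap later on, because all words sharing a prefix end up next to each other.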

With the autocomplete service you can look up complete words in the autocomplete index for a partial term. Below is a quick example implementation:

using System.Linq;            // ToList()
using Sitecore.Configuration; // Settings
using Sitecore.IO;            // FileUtil
string indexesFolder = Settings.GetSetting("IndexFolder", FileUtil.MakePath(Settings.DataFolder, "/indexes"));
string autoCompleteFolder = indexesFolder + "/__corporate_crawledcontent_autocomplete";
AutoCompleteService autoCompleteService = new AutoCompleteService(autoCompleteFolder);
autoCompleteService.SearchAutoComplete(autoCompleteFolder);
var suggestTermsFor = autoCompleteService.SuggestTermsFor(term).ToList(); // term: the partial word typed by the user

You could, for example, use this in combination with jQuery autocomplete to show the completed words below a search box.
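If you use jQuery UI Autocomplete, for instance, setting its `source` option to a URL makes the widget send the typed text as a `term` query-string parameter and expect a JSON array of strings back. The matching step on the server can be sketched as a prefix lookup over a sorted term list. This is a hypothetical C# illustration, not the module's `SuggestTermsFor`; `SuggestSketch.Suggest` and its `max` parameter are made-up names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical prefix lookup over a sorted, distinct, lower-cased term list,
// e.g. to feed an autocomplete dropdown. Not the module's SuggestTermsFor.
public static class SuggestSketch
{
    public static List<string> Suggest(List<string> sortedTerms, string prefix, int max)
    {
        var result = new List<string>();
        if (string.IsNullOrEmpty(prefix) || max <= 0) return result;
        prefix = prefix.ToLowerInvariant();

        // BinarySearch returns the index of an exact match, or the bitwise
        // complement of the index of the first larger element.
        int i = sortedTerms.BinarySearch(prefix, StringComparer.Ordinal);
        if (i < 0) i = ~i;

        // In ordinal order, all terms sharing the prefix are contiguous from here.
        for (; i < sortedTerms.Count && result.Count < max; i++)
        {
            if (!sortedTerms[i].StartsWith(prefix, StringComparison.Ordinal)) break;
            result.Add(sortedTerms[i]);
        }
        return result;
    }
}
```

For the terms "search", "searching", "seat", "site" and "the", typing "Sea" yields "search", "searching" and "seat"; the `max` cap keeps the dropdown short.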

That’s all there is to know about the changes made in version 2.0. Got any questions? Just leave a comment below!
