Monday, March 16, 2015

Sitecore error with Lucene Thai Analyzer

ManagedPoolThread #1 2015:03:12 08:32:28 ERROR Exception
Exception: System.Reflection.TargetInvocationException
Message: Exception has been thrown by the target of an invocation.
Source: mscorlib
   at System.RuntimeMethodHandle.InvokeMethod(Object target, Object[] arguments, Signature sig, Boolean constructor)
   at System.Reflection.RuntimeMethodInfo.UnsafeInvokeInternal(Object obj, Object[] parameters, Object[] arguments)
   at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
   at System.Reflection.MethodBase.Invoke(Object obj, Object[] parameters)
   at (Object , Object[] )
   at Sitecore.Pipelines.CorePipeline.Run(PipelineArgs args)
   at Sitecore.Jobs.Job.ThreadEntry(Object state)

Nested Exception

Exception: System.NotSupportedException
Message: PORT ISSUES
Source: Lucene.Net.Contrib.Analyzers
   at Lucene.Net.Analysis.Th.ThaiAnalyzer.ReusableTokenStream(String fieldName, TextReader reader)
   at Lucene.Net.Index.DocInverterPerField.ProcessFields(IFieldable[] fields, Int32 count)
   at Lucene.Net.Index.DocFieldProcessorPerThread.ProcessDocument()
   at Lucene.Net.Index.DocumentsWriter.UpdateDocument(Document doc, Analyzer analyzer, Term delTerm)
   at Lucene.Net.Index.IndexWriter.UpdateDocument(Term term, Document doc, Analyzer analyzer)
   at Sitecore.ContentSearch.LuceneProvider.LuceneUpdateContext.UpdateDocument(Object itemToUpdate, Object criteriaForUpdate, IExecutionContext[] executionContexts)
   at Sitecore.ContentSearch.SitecoreItemCrawler.DoUpdate(IProviderUpdateContext context, SitecoreIndexableItem indexable)
   at Sitecore.ContentSearch.LuceneProvider.LuceneIndex.PerformUpdate(IEnumerable`1 indexableUniqueIds, IndexingOptions indexingOptions)

In a single day, we saw this error appear over 9000 times in a production environment.

From what I understand, since Sitecore 7.0 the default configuration maps all of the available Lucene.Net analyzers to execution contexts. They are configured under:
indexConfigurations > defaultLuceneIndexConfiguration > analyzer > param desc="map"
Depending on the context of the content being indexed or searched, Sitecore will (using reflection) work out which mapping to use. Here’s a great post explaining execution contexts: http://www.sitecore.net/learn/blogs/technical-blogs/sitecore-7-development-team/posts/2013/08/execution-contexts-explained.aspx

From what I can see in the Lucene.Net source, the Thai analyzer is broken (read: not implemented). The analyzer calls the ThaiWordFilter constructor with a token stream, and that constructor simply throws the exception we see. You can decompile Lucene.Net.Contrib.Analyzers.dll or look at the source at http://lucenenet.apache.org/.

public ThaiWordFilter(TokenStream input): base(input)
{
  throw new NotSupportedException("PORT ISSUES");
  //breaker = BreakIterator.getWordInstance(new Locale("th"));
  //termAtt = AddAttribute<TermAttribute>();
  //offsetAtt = AddAttribute<OffsetAttribute>();
}

Removing or commenting out the Thai analyzer mapEntry below from the execution context mappings in Sitecore.ContentSearch.Lucene.DefaultIndexConfiguration.config should make indexing and searching in th-TH fall back to the standard analyzer, which gets rid of the error in your log files.

<mapEntry type="Sitecore.ContentSearch.LuceneProvider.Analyzers.PerExecutionContextAnalyzerMapEntry, Sitecore.ContentSearch.LuceneProvider">
  <param hint="executionContext" type="Sitecore.ContentSearch.CultureExecutionContext, Sitecore.ContentSearch">
    <param hint="cultureInfo" type="System.Globalization.CultureInfo, mscorlib">
      <param hint="name">th-TH</param>
    </param>
  </param>
  <param desc="analyzer" type="Sitecore.ContentSearch.LuceneProvider.Analyzers.DefaultPerFieldAnalyzer, Sitecore.ContentSearch.LuceneProvider">
    <param desc="defaultAnalyzer" type="Lucene.Net.Analysis.Th.ThaiAnalyzer, Lucene.Net.Contrib.Analyzers">
      <param hint="version">Lucene_30</param>
    </param>
  </param>
</mapEntry>
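To make the fallback behavior concrete, here is a toy C# model of a per-culture analyzer map. It is only a sketch under stated assumptions: AnalyzerMapSketch, Resolve, DefaultAnalyzer and BrokenThaiAnalyzer are hypothetical names, not Sitecore.ContentSearch or Lucene.Net types, and the lookup rule (use the culture's entry if there is one, otherwise the default analyzer) is taken from the behavior described above rather than from Sitecore's own code.

```csharp
using System;
using System.Collections.Generic;

// Toy model only: every name here is hypothetical and is NOT part of the
// Sitecore.ContentSearch or Lucene.Net APIs.
public static class AnalyzerMapSketch
{
    // Stand-in for the default analyzer: naive whitespace tokenization.
    public static readonly Func<string, string[]> DefaultAnalyzer =
        text => text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

    // Stand-in for ThaiAnalyzer: throws just like the ThaiWordFilter constructor does.
    public static readonly Func<string, string[]> BrokenThaiAnalyzer =
        text => throw new NotSupportedException("PORT ISSUES");

    // Assumed lookup rule: use the culture's map entry if present, else the default.
    public static Func<string, string[]> Resolve(
        IReadOnlyDictionary<string, Func<string, string[]>> map, string cultureName)
    {
        return map.TryGetValue(cultureName, out var analyzer) ? analyzer : DefaultAnalyzer;
    }

    public static void Main()
    {
        // Map that still contains the th-TH entry (like the default config).
        var withThaiEntry = new Dictionary<string, Func<string, string[]>>
        {
            ["th-TH"] = BrokenThaiAnalyzer,
        };

        // Map with the th-TH entry removed or commented out.
        var withoutThaiEntry = new Dictionary<string, Func<string, string[]>>();

        try
        {
            Resolve(withThaiEntry, "th-TH")("some text");
        }
        catch (NotSupportedException ex)
        {
            Console.WriteLine($"With th-TH entry: {ex.Message}");
        }

        string[] tokens = Resolve(withoutThaiEntry, "th-TH")("some text");
        Console.WriteLine($"Without th-TH entry: {tokens.Length} tokens");
    }
}
```

Running it prints "With th-TH entry: PORT ISSUES" followed by "Without th-TH entry: 2 tokens": with the entry in place every th-TH item hits the throwing analyzer, and once it is gone the lookup quietly falls through to the default.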

If anyone has come across this before, I'd love to hear from you!


Update: Pavel Veller (@pveller) pointed out to me that this issue has been fixed in Sitecore 7.2 Update 3. From the release notes:
  • Thai Analyzer from Lucene.Net was not fully implemented and could sometimes throw Not Supported exceptions. The analyzer has been removed from the default Lucene index configuration. The default analyzer will be used instead. (420234)