Working with FAST Search for SharePoint and Multivalued Properties

Imagine the following scenario. You have some content in a database (or file share, or SharePoint site), and this content has metadata in which one specific field holds multiple values, such as the list of authors of a book, the contributors to a project, or the departments associated with an item. Your first question is: how do I configure this multivalued property (author, contributors, departments) to be crawled by FAST Search for SharePoint (FS4SP)?

After some thinking, you decide to return all of those values inside this one field, using a character such as a semi-colon to separate the individual values. You run a full crawl against this content, find the crawled property associated with this multivalued metadata, map it to a new managed property and expose it in a refiner. All good, right?

Well, not quite. When you look at your resulting refiner, this is what you see:

FS4SP Multivalued Refiners

Notice how, instead of considering each individual value in the property, FS4SP is considering the whole property as one big value, which results in the refiner counters being all off.

The issue here is that FS4SP doesn’t know that this is a multivalued property, as the semi-colon is not a separator that it recognizes for multivalued items. To be able to get FS4SP to recognize your multivalued property and display the refiners correctly, you will need to follow a few steps:

  1. Configure the Managed Property with the correct options
  2. Create a custom processing component to apply the correct multivalued character separator
  3. Configure the Pipeline Extensibility to call your custom processing component and re-crawl your content
  4. Troubleshooting (let’s hope this won’t be needed) :)

Configure the Managed Property with the correct options

The first thing you have to do is configure your multivalued Managed Property with the option MergeCrawledProperties set to true. You can do this through PowerShell using the Set-FASTSearchMetadataManagedProperty cmdlet, or you can do this through Central Administration, as shown below:

FS4SP MergeCrawledProperties setting

This is detailed in the MSDN documentation for the ManagedProperty Interface, where it defines:

MergeCrawledProperties

Specifies whether to include the contents of all crawled properties mapped to a managed property. If this setting is disabled, the value of the first non-empty crawled property is used as the contents of the managed property.

This property must also be set to True to include all values from a multivalued crawled property. If set to False, only the first value from a multivalued crawled property is mapped to the managed property.
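For the PowerShell route, a minimal sketch could look like the one below. It assumes you are running in the FS4SP PowerShell shell on an FS4SP server, and "departmenttest" is a hypothetical managed property name that you would replace with your own.

```powershell
# Assumption: run from the FAST Search Server 2010 for SharePoint shell.
# "departmenttest" is a hypothetical managed property name; substitute your own.
$mp = Get-FASTSearchMetadataManagedProperty -Name "departmenttest"

# Include all values from the mapped crawled properties, not just the first one
Set-FASTSearchMetadataManagedProperty -ManagedProperty $mp -MergeCrawledProperties $true
```

Re-running Get-FASTSearchMetadataManagedProperty for the same property afterwards should let you confirm that the setting was applied.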

Create a custom processing component to apply the correct multivalued character separator

As I mentioned above, the main issue with using the semi-colon as a separator is that FS4SP doesn’t recognize it as a multivalued separator. To handle this correctly, you must create a custom processing component (in C#, PowerShell, or any other language) that replaces the simple string separator (in this case the semi-colon) with the special multivalued separator that FS4SP recognizes (“\u2029”, the Unicode paragraph separator). The procedure for incorporating a custom processing component is detailed in this reference on MSDN.

In my specific case, I followed the great steps described by Mikael Svenson on how to use PowerShell to create quick-and-easy custom processing components. This proved to be a fast way to get my customization in place and test it. Still, you should do this only for prototyping, as Mikael describes, because there is a performance penalty associated with the use of PowerShell, so it is recommended that you “port the code over to e.g. C# when you are done testing your code”.

My final custom code (directly inspired by Mikael’s post) to replace the semi-colon separator with the proper multivalued separator is shown below:

function CreateXml()
{
    param ([string]$set, [string]$name, [int]$type, $value)

    $resultXml = New-Object xml
    $doc = $resultXml.CreateElement("Document")

    $crawledProperty = $resultXml.CreateElement("CrawledProperty")
    $propSet = $resultXml.CreateAttribute("propertySet")
    $propSet.innerText = $set
    $propName = $resultXml.CreateAttribute("propertyName")
    $propName.innerText = $name
    $varType = $resultXml.CreateAttribute("varType")
    $varType.innerText = $type

    $crawledProperty.Attributes.Append($propSet) > $null
    $crawledProperty.Attributes.Append($propName) > $null
    $crawledProperty.Attributes.Append($varType) > $null

    $crawledProperty.innerText = $value

    $doc.AppendChild($crawledProperty) > $null
    $resultXml.AppendChild($doc) > $null
    $xmlDecl = $resultXml.CreateXmlDeclaration("1.0", "UTF-8", "")
    $el = $resultXml.psbase.DocumentElement
    $resultXml.InsertBefore($xmlDecl, $el) > $null

    return $resultXml
}

function DoWork()
{
    param ([string]$inputFile, [string]$outputFile)    
    $propertyGroupIn = "00130329-0000-0130-c000-000000131346" # SharePoint Crawled Property Category
    $propertyNameIn = "ows_DepartmentTest" # property name
    $dataTypeIn = 31 # string

    $propertyGroupOut = "00130329-0000-0130-c000-000000131346" # SharePoint Crawled Property Category
    $propertyNameOut = "ows_DepartmentTest" # property name
    $dataTypeOut = 31 # string

    $xmldata = [xml](Get-Content $inputFile -Encoding UTF8)
    $node = $xmldata.Document.CrawledProperty | Where-Object {  $_.propertySet -eq $propertyGroupIn -and  $_.propertyName -eq $propertyNameIn -and $_.varType -eq $dataTypeIn }
    $data = $node.innerText

    [char]$multivaluedsep = 0x2029
    [char]$currentsep = ';'
    
    #Replace current separator (semi-colon) with special multivalued separator
    $data = $data.Replace($currentsep, $multivaluedsep)
    
    $resultXml = CreateXml $propertyGroupOut $propertyNameOut $dataTypeOut $data
    $resultXml.OuterXml | Out-File $outputFile -Encoding UTF8
    
    #Copy-Item $inputFile C:\Users\Administrator\AppData\LocalLow
}

# pass input and output file paths as arguments
DoWork $args[0] $args[1]

The first section to note (lines 34 through 40 of the script, the assignments from $propertyGroupIn through $dataTypeOut) defines the input crawled property that will contain the items with the semi-colon separator, as well as the output crawled property that will store the updated content with the correct multivalued separator. In my case, both properties are the same, since I simply want to do an in-place replacement.

The second section to note (lines 46 through 50) defines the current separator (semi-colon) and the multivalued separator (0x2029 in PowerShell), and then replaces the current separator with the multivalued one in the input crawled property string.
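If you want to check the replacement logic on its own before involving FS4SP, a small standalone sketch like the following can help. The sample value is made up, and the trimming of whitespace around each value is a variation I am adding here for illustration (the script above does a plain character replacement and does not trim).

```powershell
# Standalone check of the separator replacement, independent of FS4SP.
[char]$multivaluedsep = 0x2029              # separator FS4SP recognizes
$raw = "Sales; Marketing ;Engineering"      # hypothetical sample value

# Split on the semi-colon, trim stray whitespace, and drop empty entries
$values = $raw.Split(';') | ForEach-Object { $_.Trim() } | Where-Object { $_ -ne '' }

# Rejoin the cleaned values with the multivalued separator
$joined = $values -join $multivaluedsep

# Splitting on the multivalued separator should give back the individual values
$joined.Split($multivaluedsep)
```

Trimming matters if your source data has spaces after the semi-colons, since otherwise “Marketing” and “ Marketing” would likely show up as two different refiner entries.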

Configure the Pipeline Extensibility to call your custom processing component and re-crawl your content

The next important step is to tell FS4SP that you want to call your custom processing component during content processing. To do this you must configure the %FASTSearch%\etc\pipelineextensibility.xml configuration file. This is how this file looked on my system:

<!-- For permissions and the most current information about FAST Search Server 2010 for SharePoint configuration files, see the online documentation, (http://go.microsoft.com/fwlink/?LinkId=1632279). -->

<PipelineExtensibility>
	<Run command="C:\Windows\System32\WindowsPowerShell\v1.0\PowerShell.exe C:\FASTSearch\bin\multivalued.ps1 %(input)s %(output)s">
		<Input>      
			<CrawledProperty propertySet="00130329-0000-0130-c000-000000131346" varType="31" propertyName="ows_DepartmentTest"/>
		</Input>
		<Output>
			<CrawledProperty propertySet="00130329-0000-0130-c000-000000131346" varType="31" propertyName="ows_DepartmentTest"/>
		</Output>
	</Run>
</PipelineExtensibility>

As you can see above, all I’m doing is telling FS4SP to call my custom PowerShell script. The script receives as input the crawled property that contains the semi-colon-separated content, and returns the same crawled property as output, so that its contents are simply replaced with the updated value that uses the multivalued separator.

After saving this configuration file, the next step is to force your Document Processors to reload their configuration so they can be aware of this new content processing component, which you can accomplish by executing psctrl reset in a command prompt.

With all the pieces in place, you can start a re-crawl of your content and then test your refiner after the crawl is complete. If all goes well, your refiner should now look exactly like you wanted!

FS4SP Multivalued Refiner - Correct

Troubleshooting

My main warning is to pay close attention to the fact that the names of your input and output crawled properties (both in pipelineextensibility.xml and in the PowerShell script) are case-sensitive.

Many people have spent a very long time troubleshooting their code only to realize that it was a case-sensitive issue with the name of these properties. The best way I found to troubleshoot a new custom processing component is through these techniques:

  1. Investigate the contents of the input file sent to your custom code: as described in this post, the only path in the file system with full access for your custom code is the AppData\LocalLow directory for the account running the FAST Search Service. By uncommenting line 55 in the PowerShell script above (the commented-out Copy-Item line), a copy of the input file received by the script will be created in the AppData\LocalLow directory. By looking at the contents of the input file you can see what the input crawled property contains. If the input crawled property doesn’t contain any value, and you are sure that your document has that property, check for issues with case-sensitive property names.
  2. Validate the list of crawled properties received by FS4SP: you can accomplish this through the use of the optional processing stage FFDDumper.
  3. If both options 1 and 2 look ok, use the input file from step 1 to call your custom code directly and debug it to identify the error (you can debug the PowerShell script above using the ISE Editor)
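For step 3, if you don’t have a captured input file at hand yet, a hand-made one can stand in. The sketch below builds a sample file in the shape the script above expects (a Document element containing CrawledProperty elements) and repeats the script’s property lookup. The file name and the sample value are hypothetical.

```powershell
# Sketch: build a sample input file and repeat the script's property lookup.
# The file name and the sample value are hypothetical.
$inputFile = Join-Path ([System.IO.Path]::GetTempPath()) "mv-input.xml"

@'
<?xml version="1.0" encoding="UTF-8"?>
<Document>
  <CrawledProperty propertySet="00130329-0000-0130-c000-000000131346" propertyName="ows_DepartmentTest" varType="31">Sales;Marketing;Engineering</CrawledProperty>
</Document>
'@ | Out-File $inputFile -Encoding UTF8

# Same lookup as in DoWork(): match on property set, name and type
$xmldata = [xml](Get-Content $inputFile -Encoding UTF8)
$node = $xmldata.Document.CrawledProperty | Where-Object {
    $_.propertySet -eq "00130329-0000-0130-c000-000000131346" -and
    $_.propertyName -eq "ows_DepartmentTest" -and
    $_.varType -eq 31
}

if ($node) { "Found value: " + $node.InnerText } else { "Property not found in input file" }
```

Once the lookup finds the property, you can pass the same file to the script directly (for example, multivalued.ps1 followed by the input path and an output path of your choice) and inspect the output file it writes.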

And that’s it for today. Enjoy your coding and your multivalued properties! :)

About leonardocsouza

Mix together a passion for social media, search, recommendations, books, writing, movies, education, knowledge sharing plus a few other things and you get me as result :)

11 Responses to Working with FAST Search for SharePoint and Multivalued Properties

  1. Philip Helsel says:

    Nicely done Leo, Thanks for sharing.

  2. Zheng Yu Xi says:

    Save my time. Thank. Leo

  3. sri raghu ram says:

    excellent post..Thanks alot.

  4. rishidx says:

    Hi.. Thanks for this post. If there are two crawled properties (say prop1, prop2) then what changes should be done in the above PowerShell code?

    • In the case of having two crawled properties, you would need to duplicate the tasks of function DoWork() to replace the semicolon separator with the special multivalued separator for both properties (prop1 and prop2). You would then:

      a) map both crawled properties to the same managed property (if they are supposed to be mapped to the same one); or
      b) map each crawled property to a different managed property (in case they are not related to each other)

      Hope that helps!! 🙂

      –Leo

  5. Timothy says:

    Hi, thank you for this post. I’m new to BCS, and running into a similar problem to the one above. I created a BCS model to crawl an external program, where a record could be related to one or more clients.

    I tried handling this via a multivalued property of type List, but I’m finding that the refiners on the left are displaying all the possible combinations of the client / record relationships.

    For example

    Client A
    Client A;Client C
    Client C; Client D

    I would like it to yield

    Client A
    Client C
    Client D

    I’m using SharePoint 2010 (not FAST) and reading from a custom search provider which I’ve written myself.

    My question is, would it be better for me to change the property from List to string and simply return the values using a string.Join(char.Parse(‘\u2029’), clients.ToArray())? Would that work, giving me the result I’m looking for?

    All of the instructions I’ve been seeing surrounding this seem to be catered to FAST installations, what about SharePoint Search that comes with SharePoint out-of-box?

    • Hi Timothy!

      I’m sorry about the delay to answer your question. As you have pointed out, my outlined solution works only for FAST Search for SharePoint due to its document processing pipeline. I’m unaware of how you would do this for pure SharePoint Search, but I would expect it to be more of a matter of finding the proper multivalue separator (or format – e.g. array) to use when crawling a multivalued property through BCS. As with the FS4SP case, this usually gets tricky because the search engine expects a specific format for the multivalued property, but unfortunately this isn’t always documented well :-/

      I will let you know if I find something for SharePoint Search-only. Good luck on your quest!

      Best,
      Leo

  6. Timothy Dilbert says:

    How do you set “MergeCrawledProperties” to true? Are multivalued properties (i.e. an array or List mapped to a single managed property) supported by a non-FAST server, that is, the plain vanilla SharePoint Search Server?

  7. Pierre says:

    Wow, wish I had found this article then. Although, we were trying to do like Timothy with BCS and never made it work. We’re now migrating to 2013. How do we achieve the same thing in 2013? Thanks, great write-up!
