A very common source of confusion with FS4SP is the distinction between a Content Source and a Content Collection. If you just made a funny face while reading this, stick around and I’ll try to make things a little clearer for you.
A few months ago I saw the following thread on TechNet:
I have created a new content collection “news”. I am able to crawl content and feed it into that specific content collection, but when I do a query it always queries the default “sp” collection.
Is there a way to specifically tell my FAST site to query my “news” content collection?
When I looked at this question, I thought it would be a good idea to step back a little and define what a Content Collection is and what a Content Source is, since these are two different concepts that, depending on how you configure FS4SP, will have distinct and important roles to play. With that in mind, this was my response:
The first question I would ask you is this: what are you trying to achieve by having multiple content collections?
Note that FAST Search for SharePoint has two distinct things:
- Content Sources: these are logical groupings defined through Central Administration. New content sources are created to crawl distinct types of content or to use different crawl schedules, for example. By default, all content crawled through any of these content sources is sent to FAST and stored in the content collection “sp”;
- Content Collections: these are logical groupings defined through Windows PowerShell and implemented directly on FAST. You can define multiple content collections to facilitate maintenance when you use one of the FAST-specific connectors (http://technet.microsoft.com/en-us/library/ff383278.aspx#About_FS_specific). Having a separate content collection for the FAST Search database connector, for example, would allow you to clear the contents of just that content collection if needed.
The important thing is that no matter how you configure any of the above (content sources or content collections), they are just logical groupings of content. In the end it all goes to the same FAST index. Even more important, any queries by default will be executed against the entire index.
Now, if you do want to limit some of your queries to execute against just part of the content, you have two options:
- To filter by Content Source you can search against “contentsource”. For example, if you have a Content Source named “FAST Contoso” defined through Central Administration, you would be able to search against only this content with a query like this (KQL syntax; note the straight double quotes around the name, which are needed because it contains a space) -> contentsource:"FAST Contoso" <query term>
- To filter by Content Collection you can search against “meta.collection”. For example, if you created a new Content Collection named “news” using Windows PowerShell, you would be able to search against this content with a query like this (KQL syntax) -> meta.collection:news <query term>
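If you build these queries in code rather than typing them into the search box, the two filters above can be put together with a small helper. This is only a minimal sketch in Python: the function names `quote_kql_value` and `scoped_query` are made up for illustration and are not part of any FS4SP or SharePoint API. It assumes the values contain no double quotes of their own, and it only produces the KQL string; sending it to the query service is up to you.

```python
from typing import Optional


def quote_kql_value(value: str) -> str:
    """Wrap a value in straight double quotes when it contains whitespace.

    Hypothetical helper: assumes the value itself has no double quotes.
    """
    if any(ch.isspace() for ch in value):
        return '"' + value + '"'
    return value


def scoped_query(terms: str,
                 content_source: Optional[str] = None,
                 collection: Optional[str] = None) -> str:
    """Prefix the user's terms with the optional KQL filters described above."""
    parts = []
    if content_source is not None:
        # Restrict to one Content Source (defined in Central Administration).
        parts.append("contentsource:" + quote_kql_value(content_source))
    if collection is not None:
        # Restrict to one Content Collection (defined through PowerShell).
        parts.append("meta.collection:" + quote_kql_value(collection))
    parts.append(terms)
    return " ".join(parts)


print(scoped_query("budget", content_source="FAST Contoso"))
print(scoped_query("election", collection="news"))
```

The first call prints contentsource:"FAST Contoso" budget and the second prints meta.collection:news election, matching the two hand-written queries above. Leaving both filters out returns the terms unchanged, which corresponds to the default behavior of querying the entire index.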
As you can see, you have several options. How you use them will depend on your business needs.
All of this reminds me of another thread that I just saw related to how you can get the contentsource property of a document in the Pipeline Extensibility (say you want to apply a certain processing rule only to documents coming through a specific content source, for example). But this is a topic for another day.