Update: If you are interested in watching a presentation of the content below, you can download this video (200+ MB; right-click and select “Save target as...”), which was recorded during a webcast on 2011-07-27. My presentation starts at 6 min 20 s.
In previous posts I showed and explained a few architecture diagrams of search in SharePoint 2010 for both SharePoint Search and FAST Search for SharePoint, I shared my all-time-favorite resource on SharePoint Search Architecture and Scale for crawl and query, and (hopefully) helped you understand, scale and monitor Crawling / Processing / Indexing in FAST Search for SharePoint.
In this post I will try to convert most of that content into additional diagrams that should help you “see” how changes made for fault tolerance and/or performance affect your search architecture.
These are the architecture diagrams discussed in this post:
SharePoint Search
- Query Component (Fault Tolerance)
- Query Component (Performance)
- Property db (Performance)
- Query Processor (Fault Tolerance and Performance)
- Crawl Component (Fault Tolerance and Performance)
- Crawl Component and Crawl db (Performance)
FAST Search for SharePoint
- Content Processing (Fault Tolerance and Performance)
- Indexer (Fault Tolerance)
- Indexer (Performance)
- Indexer and Search (Fault Tolerance)
- Query Processing (Fault Tolerance)
SharePoint Search – Query Component (Fault Tolerance)
This diagram shows what your architecture looks like after you add a mirror Query Component for an existing Index Partition, which you do to provide fault tolerance for the lookup of matching items for full-text search queries against your index. The reasons are pretty simple (and detailed here): if one server goes down, the other can keep serving queries, and unless you configure the mirror as “failover only”, it will also share the load of incoming queries.
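To make the mirror behavior concrete, here is a minimal Python sketch of how queries could be routed among the mirrored Query Components of one Index Partition. It is a conceptual model, not SharePoint code: the names `QueryComponent` and `MirroredIndexPartition` are hypothetical, and round-robin selection is an assumption about how the load is shared. The `failover_only` flag stands in for the “failover only” option described above.

```python
from dataclasses import dataclass


@dataclass
class QueryComponent:
    name: str
    failover_only: bool = False  # only used when no regular mirror is online
    online: bool = True


class MirroredIndexPartition:
    """Hypothetical model of one Index Partition served by mirrored Query Components."""

    def __init__(self, components):
        self.components = components
        self._next = 0  # assumed round-robin position among eligible mirrors

    def pick(self):
        # Regular (non-failover-only) mirrors that are up share the query load.
        eligible = [c for c in self.components if c.online and not c.failover_only]
        if not eligible:
            # Fall back to any online mirror, including "failover only" ones.
            eligible = [c for c in self.components if c.online]
        if not eligible:
            raise RuntimeError("no Query Component is available for this Index Partition")
        chosen = eligible[self._next % len(eligible)]
        self._next += 1
        return chosen


qc1, qc2 = QueryComponent("QC1"), QueryComponent("QC2")
partition = MirroredIndexPartition([qc1, qc2])
print([partition.pick().name for _ in range(4)])  # ['QC1', 'QC2', 'QC1', 'QC2']
qc1.online = False
print(partition.pick().name)  # QC2
```

With both mirrors configured normally, the load alternates; marking one as `failover_only=True` keeps it idle until the other goes down.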
SharePoint Search – Query Component (Performance)
This diagram has only a subtle change from the previous one (marked in red), but it makes a big difference to your architecture: the additional Query Component serves a different Index Partition. Your content is now divided between the two Index Partitions, so if you have a total of 6 million indexed items, each Index Partition holds roughly 3 million of them. It also means that the Query Processor sends requests in parallel to both Query Components and, since each one has to search only half of the index (3 million out of 6 million items), they can return their results faster.
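The sketch below models this split in Python under two assumptions of mine, not documented SharePoint internals: each item is assigned to a partition by a stable hash of its ID, and each Index Partition is a plain dictionary scanned for a term. The hypothetical `query` function fans the request out to every partition in parallel and merges the hits, which is the fan-out described above.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def partition_for(item_id: str, num_partitions: int) -> int:
    # Assumption: a stable hash of the item ID decides which Index Partition holds the item.
    digest = hashlib.sha256(item_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def build_partitions(items: dict[str, str], num_partitions: int) -> list[dict[str, str]]:
    partitions: list[dict[str, str]] = [{} for _ in range(num_partitions)]
    for item_id, text in items.items():
        partitions[partition_for(item_id, num_partitions)][item_id] = text
    return partitions


def search_partition(partition: dict[str, str], term: str) -> list[str]:
    # Each Query Component only scans its own share of the items.
    return [item_id for item_id, text in partition.items() if term in text.lower().split()]


def query(partitions: list[dict[str, str]], term: str) -> list[str]:
    term = term.lower()
    # Fan the query out to all partitions in parallel, then merge the hits.
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        results = pool.map(lambda p: search_partition(p, term), partitions)
        return sorted(hit for hits in results for hit in hits)


items = {f"doc{i}": ("sharepoint search" if i % 3 == 0 else "fast search") for i in range(6000)}
partitions = build_partitions(items, 2)
print([len(p) for p in partitions])  # two sizes close to 3000 each
print(len(query(partitions, "SharePoint")))  # 2000
```

In CPython the thread pool only illustrates the parallel fan-out; because of the GIL, the pure-Python scan itself will not run faster.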
SharePoint Search – Property db (Performance)
Here things start to get interesting: there is not only a new Query Component/Index Partition, but also a new Property db (added items marked in red). If you have read this post (mentioned a dozen times by now), you know that to return search results the Query Processor needs to perform a lookup not only in the Index Partition but also in the Property db, to retrieve the metadata associated with the matching items. As your indexed content grows, for example to 20M items that you split across 2 Index Partitions to improve index lookup time, the Property db may become your bottleneck. One way to reduce the impact of a growing number of indexed items is to add a new Property db and assign a new Query Component/Index Partition to it. This way, each Index Partition/Property db combination has to store and serve search requests for only half of the total number of indexed items.
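The arithmetic behind this is simple enough to write down. This hypothetical helper sums, per Property db, the items whose metadata it must store and serve. It uses the 20M-item example from the text, assuming for illustration an even split into two 10M-item Index Partitions; the names `IP1`, `PropertyDB1`, etc. are made up.

```python
from collections import Counter


def property_db_load(partition_sizes: dict[str, int], partition_to_db: dict[str, str]) -> dict[str, int]:
    """Sum the items whose metadata each Property db must store and serve (illustrative only)."""
    load = Counter()
    for partition, size in partition_sizes.items():
        load[partition_to_db[partition]] += size
    return dict(load)


# Hypothetical 20M-item farm split evenly into two Index Partitions.
sizes = {"IP1": 10_000_000, "IP2": 10_000_000}

before = property_db_load(sizes, {"IP1": "PropertyDB1", "IP2": "PropertyDB1"})
after = property_db_load(sizes, {"IP1": "PropertyDB1", "IP2": "PropertyDB2"})
print(before)  # {'PropertyDB1': 20000000}
print(after)   # {'PropertyDB1': 10000000, 'PropertyDB2': 10000000}
```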
It is also important to note that all search-related databases (Property db, Search Admin db and Crawl db) can be made fault tolerant through database mirroring.
SharePoint Search – Query Processor (Fault Tolerance and Performance)
Even after you have scaled your Query Components, Index Partitions and Property dbs, another query-side component may require your attention: the Query Processor. This is the component that does the hard work of contacting the Query Components (to find items that match the query), the Property db (to get the metadata associated with those items) and the Search Admin db (to get security descriptors in order to apply security trimming to the results). By adding a new Query Processor (marked in red and described here), you divide the load of this work across multiple servers, which increases query performance and provides fault tolerance (if one goes down, the other can still handle queries).
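Here is a rough Python model of those three lookups. The in-memory dictionaries standing in for the Query Components, Property db and Search Admin db, and the group-based access check, are simplifying assumptions. Real security descriptors are far richer than a set of group names, and the real Query Processor's internal ordering is not modeled exactly.

```python
from dataclasses import dataclass


@dataclass
class SearchStores:
    # Hypothetical in-memory stand-ins; the real components are separate servers and databases.
    query_components: list[dict[str, set[str]]]  # one inverted index (term -> item IDs) per Index Partition
    property_db: dict[str, dict[str, str]]        # item ID -> metadata
    search_admin_db: dict[str, set[str]]          # item ID -> groups allowed to see it (simplified descriptor)


def process_query(stores: SearchStores, term: str, user_groups: set[str]) -> list[dict[str, str]]:
    # 1. Ask each Query Component for the items that match the query.
    matches: set[str] = set()
    for index in stores.query_components:
        matches |= index.get(term, set())
    # 2. Fetch the metadata for those items from the Property db.
    results = [{"id": item_id, **stores.property_db[item_id]} for item_id in sorted(matches)]
    # 3. Security-trim: keep only items whose descriptor grants access to one of the user's groups.
    return [r for r in results if stores.search_admin_db.get(r["id"], set()) & user_groups]


stores = SearchStores(
    query_components=[{"budget": {"doc1"}}, {"budget": {"doc2", "doc3"}}],
    property_db={
        "doc1": {"title": "Budget 2011"},
        "doc2": {"title": "HR budget"},
        "doc3": {"title": "Team budget"},
    },
    search_admin_db={"doc1": {"everyone"}, "doc2": {"hr"}, "doc3": {"everyone"}},
)
print([r["title"] for r in process_query(stores, "budget", {"everyone"})])  # ['Budget 2011', 'Team budget']
```

Every query touches all three stores, which is why spreading this work over more than one Query Processor helps.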
SharePoint Search – Crawl Component (Fault Tolerance and Performance)
Now let’s take a look at the other side of search: Crawling/Processing/Indexing. Notice the new Crawl Component added in the diagram above. What does it mean? Both Crawl Components will split the load of crawling the defined content sources, and both will keep pulling items from, and updating, the crawl queue stored in the Crawl db. For example, if a full crawl with one Crawl Component and one Crawl db was taking 4 days, adding another Crawl Component (assuming you have sufficient CPU/memory/IO/bandwidth/etc. resources) should bring the same full crawl down to around 2 days. And with two Crawl Components working from the same Crawl db, you also get fault tolerance in case one of them goes down.
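The sketch below models two Crawl Components draining one shared queue, plus the back-of-the-envelope estimate from the example. It is hypothetical: a Python `queue.Queue` stands in for the crawl queue kept in the Crawl db, threads stand in for Crawl Components, and the estimate assumes perfectly linear scaling, which real crawls rarely achieve.

```python
import queue
import threading


def crawl(urls: list[str], num_crawl_components: int) -> dict[str, str]:
    """Hypothetical model: several Crawl Components drain one shared crawl queue."""
    crawl_queue = queue.Queue()  # stand-in for the crawl queue kept in the Crawl db
    for url in urls:
        crawl_queue.put(url)
    crawled_by: dict[str, str] = {}
    lock = threading.Lock()

    def crawl_component(name: str) -> None:
        while True:
            try:
                url = crawl_queue.get_nowait()
            except queue.Empty:
                return  # queue drained, nothing left to crawl
            # A real Crawl Component would fetch and process the item here.
            with lock:
                crawled_by[url] = name

    components = [
        threading.Thread(target=crawl_component, args=(f"CC{n + 1}",))
        for n in range(num_crawl_components)
    ]
    for component in components:
        component.start()
    for component in components:
        component.join()
    return crawled_by


def estimated_full_crawl_days(baseline_days: float, num_crawl_components: int) -> float:
    # Assumes linear scaling: enough CPU, memory, I/O and bandwidth, and the Crawl db is not the bottleneck.
    return baseline_days / num_crawl_components


urls = [f"http://intranet/page{i}" for i in range(1000)]
result = crawl(urls, 2)
print(len(result))  # 1000
print(estimated_full_crawl_days(4, 2))  # 2.0
```

Because both workers pull from the same queue, if one of them stopped, the other would simply keep draining it, which is the fault-tolerance point made above.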
SharePoint Search – Crawl Component and Crawl db (Performance)
What happens when you keep adding Crawl Components to the same Crawl db? The db can easily become your bottleneck. One way to keep scaling out and improving crawl performance is to add another Crawl Component/Crawl db set, as shown in the diagram above. Distinct content sources (web applications, web sites, file shares, etc.) are then split between the two Crawl dbs, and the Crawl Components attached to each Crawl db have to handle (crawl/process/index) only part of the content, which spreads the load.
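As one way to picture the split, here is a sketch that assumes content is assigned per host (start address), with each new host going to the Crawl db currently holding the fewest items and optional manual mappings taking precedence. This balancing rule and the function name `assign_hosts_to_crawl_dbs` are assumptions for illustration; SharePoint's actual distribution logic may differ.

```python
from typing import Optional


def assign_hosts_to_crawl_dbs(
    hosts: list[tuple[str, int]],
    crawl_dbs: list[str],
    manual: Optional[dict[str, str]] = None,
) -> tuple[dict[str, str], dict[str, int]]:
    """Hypothetical rule: a host goes to the least-loaded Crawl db unless manually mapped."""
    manual = manual or {}
    load = {db: 0 for db in crawl_dbs}
    mapping: dict[str, str] = {}
    for host, item_count in hosts:
        # Manual mapping wins; otherwise pick the Crawl db with the fewest items (ties: first listed).
        db = manual.get(host) or min(load, key=load.get)
        mapping[host] = db
        load[db] += item_count
    return mapping, load


hosts = [
    ("http://intranet", 500_000),
    ("http://teams", 300_000),
    ("file://fileserver01", 250_000),
    ("http://extranet", 100_000),
]
mapping, load = assign_hosts_to_crawl_dbs(hosts, ["CrawlDB1", "CrawlDB2"])
print(load)  # {'CrawlDB1': 600000, 'CrawlDB2': 550000}
```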
There is a lot more to this, from how the content to be crawled is split among multiple Crawl dbs to how you can define this mapping manually (if you want to). All of this and more is detailed in this post.
FAST Search for SharePoint – Content Processing (Fault Tolerance and Performance)
Since we are starting with content processing, you may be asking “what about the crawling part of FAST Search?” The good news is that if you are using the FAST Content SSA to crawl your content, your crawling architecture looks pretty much like what we just saw for SharePoint Search above. The main difference is that the FAST Content SSA is tasked only with crawling, since processing and indexing are done in the FAST Search farm. Moving on to content processing, the first component that can be scaled out is the Content Distributor (shown above in red). This gives you only fault tolerance, since the FAST Content SSA connects and sends batches to one Content Distributor at a time, and switches to the other one only if it fails to submit batches to the “primary” Content Distributor (you must also configure the FAST Content SSA to list both Content Distributors).
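The primary/backup behavior can be sketched as follows. This is a hypothetical Python model, not the FAST Content SSA's real API: `ContentSSAFeeder` tries the Content Distributors in the order they are listed, sticks with the current one while it accepts batches, and moves to the next one only when a submission fails. Whether the real SSA later fails back to the primary is not covered in the text, so the sketch simply stays on the backup.

```python
class ContentDistributor:
    def __init__(self, name: str, online: bool = True):
        self.name = name
        self.online = online
        self.batches: list[str] = []

    def submit(self, batch: str) -> None:
        if not self.online:
            raise ConnectionError(f"{self.name} is unavailable")
        self.batches.append(batch)


class ContentSSAFeeder:
    """Hypothetical model of how the FAST Content SSA feeds batches (not the real API)."""

    def __init__(self, distributors: list[ContentDistributor]):
        self.distributors = distributors  # in the order listed in the SSA configuration
        self.current = 0  # start with the "primary" Content Distributor

    def submit(self, batch: str) -> str:
        for _ in range(len(self.distributors)):
            distributor = self.distributors[self.current]
            try:
                distributor.submit(batch)
                return distributor.name
            except ConnectionError:
                # Assumption: move on to the next listed Content Distributor and stay there.
                self.current = (self.current + 1) % len(self.distributors)
        raise RuntimeError("no Content Distributor accepted the batch")


cd1, cd2 = ContentDistributor("CD1"), ContentDistributor("CD2")
feeder = ContentSSAFeeder([cd1, cd2])
print([feeder.submit(f"batch{i}") for i in range(3)])  # ['CD1', 'CD1', 'CD1']
cd1.online = False
print(feeder.submit("batch3"))  # CD2
```

Note that the backup receives nothing while the primary is healthy, which is why this step adds fault tolerance but no extra throughput.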
As for Document Processors, you will definitely have more than one (a simple installation gives you 4 by default), which provides both fault tolerance (if one of them goes down) and performance (since they work in parallel). Also, if the “primary” Content Distributor goes down, the Document Processors are smart enough to switch to the other available Content Distributor.
FAST Search for SharePoint – Indexer (Fault Tolerance)
Remember the option to mirror an Index Partition in SharePoint Search to provide fault tolerance? FAST Search can do something similar, under a different name: the documentation refers to it as adding a backup indexer row. In this case both Indexers hold the same content, so if your primary Indexer goes down, the backup Indexer can be configured to become the new primary Indexer.
FAST Search for SharePoint – Indexer (Performance)
In the diagram above, instead of a backup Indexer for fault tolerance, a new Indexer column was added to increase the volume of content that your search farm can index. In this scenario your content is divided between the two Indexer columns (much like how we divided content into separate Index Partitions for SharePoint Search).
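To contrast rows and columns, this hypothetical Python model treats the FAST index as a grid: columns divide the documents (assumed here to be chosen by a stable hash of the document ID), and every row in a column stores a copy of that column's documents. `IndexerMatrix` is an illustrative name, not part of FAST Search.

```python
import hashlib


class IndexerMatrix:
    """Hypothetical model of FAST indexer columns (split content) and rows (copies)."""

    def __init__(self, columns: int, rows: int):
        self.columns = columns
        self.rows = rows
        # grid[row][column] holds document IDs; row 0 is the primary row, later rows are backups.
        self.grid = [[set() for _ in range(columns)] for _ in range(rows)]

    def column_for(self, doc_id: str) -> int:
        # Assumption: a stable hash of the document ID chooses the column.
        digest = hashlib.sha256(doc_id.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.columns

    def index(self, doc_id: str) -> None:
        column = self.column_for(doc_id)
        for row in range(self.rows):  # every row of that column stores the same document
            self.grid[row][column].add(doc_id)


docs = [f"doc{i}" for i in range(1000)]

backup_row = IndexerMatrix(columns=1, rows=2)   # fault tolerance: same content twice
two_columns = IndexerMatrix(columns=2, rows=1)  # capacity: content divided in two
for doc in docs:
    backup_row.index(doc)
    two_columns.index(doc)

print(backup_row.grid[0][0] == backup_row.grid[1][0])  # True
print([len(c) for c in two_columns.grid[0]])           # two counts adding up to 1000
```

Adding a row doubles storage without adding capacity; adding a column adds capacity without adding redundancy.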
FAST Search for SharePoint – Indexer and Search (Fault Tolerance)
Above is the diagram of a fairly common deployment of FAST Search for SharePoint with two servers, each configured with both Indexer and Search: one server is the primary Indexer and backup Search, and the other is the backup Indexer and primary Search. With just these two servers, you provide fault tolerance for both Indexer and Search.
FAST Search for SharePoint – Query Processing (Fault Tolerance)
In the diagram above, a second Query Processing server (with QRServer, QRProxy and FSA Worker components) was added to the FAST Search farm, and the FAST Query SSA was configured to list both servers. With this configuration, queries are sent to both servers in round-robin fashion, and if one server fails, the FAST Query SSA keeps sending queries only to the active one.
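Here is a minimal sketch of that dispatch logic, assuming simple round-robin over the servers listed in the SSA and skipping any that are down. `FastQuerySSA` and `QueryProcessingServer` are hypothetical names for illustration; the real FAST Query SSA's health checks and retry behavior are not modeled here.

```python
class QueryProcessingServer:
    def __init__(self, name: str, online: bool = True):
        self.name = name
        self.online = online


class FastQuerySSA:
    """Hypothetical model of round-robin dispatch with failover (not the real SSA code)."""

    def __init__(self, servers: list[QueryProcessingServer]):
        self.servers = servers  # as listed in the FAST Query SSA configuration
        self._next = 0

    def send(self, query: str) -> str:
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if server.online:
                return server.name  # a real SSA would forward `query` to this server
        raise RuntimeError("no Query Processing server is available")


qp1, qp2 = QueryProcessingServer("QP1"), QueryProcessingServer("QP2")
ssa = FastQuerySSA([qp1, qp2])
print([ssa.send("sharepoint") for _ in range(4)])  # ['QP1', 'QP2', 'QP1', 'QP2']
qp2.online = False
print([ssa.send("sharepoint") for _ in range(2)])  # ['QP1', 'QP1']
```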
There is a lot you can configure in both SharePoint Search and FAST Search for SharePoint to increase performance and/or provide fault tolerance for components of your search farm. The important thing is to understand what options are available for each platform and keep them in mind when you first design your search architecture as well as after your search project is in production, in case you need to scale out your deployment.