SharePoint Search and FAST Search for SharePoint Architecture Diagrams – Fault Tolerance and Performance

Update: If you would rather watch a presentation of the content below, you can download this video (200+ MB; right-click and select “Save target as..”), which was recorded during a webcast on 2011-07-27. My presentation starts at 6min20sec.

In previous posts I showed and explained a few architecture diagrams of search in SharePoint 2010, for both SharePoint Search and FAST Search for SharePoint; shared my all-time-favorite resource on SharePoint Search Architecture and Scale for crawl and query; and (hopefully) helped you understand, scale and monitor Crawling / Processing / Indexing in FAST Search for SharePoint.

What I will try to do in this post is convert most of that content into additional diagrams that should help you “see” how these changes related to fault tolerance and/or performance affect your search diagram.

These are the architecture diagrams discussed in this post:

SharePoint Search

FAST Search for SharePoint


SharePoint Search – Query Component (Fault Tolerance)

SharePoint 2010 - SharePoint Search Architecture Diagram - Query Component (Fault Tolerance)

In this diagram you see what your architecture would look like after you add a mirror Query Component for an existing Index Partition, which you do to provide fault tolerance for the lookup of items that match full-text search queries against your index. The reasons for doing so are pretty simple (and detailed here): if one server goes down, the other can keep serving queries, and unless you configure the mirror server as “failover only”, it will also share the load of incoming queries.
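To make the “failover only” behavior a bit more concrete, here is a minimal Python sketch. The class and function names are hypothetical (they are not part of any SharePoint API); the sketch only models which mirrors of one Index Partition would be eligible to receive queries.

```python
from dataclasses import dataclass


@dataclass
class QueryComponent:
    # Hypothetical model of one Query Component (a mirror of an Index Partition).
    name: str
    online: bool = True
    failover_only: bool = False


def eligible_components(mirrors):
    """Return the mirrors that should receive queries for one Index Partition."""
    online = [c for c in mirrors if c.online]
    regular = [c for c in online if not c.failover_only]
    # Failover-only mirrors serve queries only when no regular mirror is online.
    return regular if regular else online
```

With two regular mirrors, both are returned and share the load; with one mirror flagged as failover only, it shows up in the result only after the other one goes offline.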


SharePoint Search – Query Component (Performance)

SharePoint 2010 - SharePoint Search Architecture Diagram - Query Component (Performance)

In this diagram there is just a very subtle change from the previous one (marked in red), but it makes a lot of difference in your architecture: the additional Query Component has a different Index Partition. What this means is that now your content is divided between the two Index Partitions, so if for example you have a total of 6 million indexed items, then each Index Partition has 3 million items. This also means that your Query Processor will send requests in parallel to both Query Components and, since each one of them has to search against only half of the index (3 million out of 6 million total), they will be able to do this faster.

The supported number of indexed items is 100 million per search service application and 10 million for each Index Partition.
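The arithmetic behind these limits can be sketched in a few lines of Python. The function names are mine, and the result is only the minimum number of Index Partitions implied by the supported limits quoted above; as the 6 million item example shows, you may still choose more partitions purely for query performance.

```python
MAX_ITEMS_PER_PARTITION = 10_000_000      # limit quoted above
MAX_ITEMS_PER_SERVICE_APP = 100_000_000   # limit quoted above


def partitions_needed(total_items):
    """Minimum number of Index Partitions that stays within the supported limits."""
    if total_items > MAX_ITEMS_PER_SERVICE_APP:
        raise ValueError("more items than one search service application supports")
    # Integer ceiling division, never fewer than one partition.
    return max(1, -(-total_items // MAX_ITEMS_PER_PARTITION))


def items_per_partition(total_items, partitions):
    """Items each partition holds when content is divided evenly (rounded up)."""
    return -(-total_items // partitions)
```

For example, `items_per_partition(6_000_000, 2)` gives the 3 million items per partition mentioned above.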


SharePoint Search – Property db (Performance)

SharePoint 2010 - SharePoint Search Architecture Diagram - Property Db (Performance)

Here things start to get interesting, with not only a new Query Component/Index Partition, but also a new Property db (added items marked in red). If you read this post (mentioned a dozen times by now :)) you understand that in order to provide search results, the Query Processor needs to perform a lookup not only in the Index Partition but also in the Property db, to retrieve the metadata associated with the results found. As your indexed content grows, for example to 20M items that you then split across 2 Index Partitions to improve your index lookup time, your Property db may become the new bottleneck. A way to reduce the impact of the growing number of indexed items is to add a new Property db and assign a new Query Component/Index Partition to it. This way, each combination of Index Partition/Property db has to store and handle search requests for only half of the total number of indexed items.

It is also important to notice that all search-related databases (Property db, Search Admin db and Crawl db) can be configured for fault tolerance through the use of database mirroring.


SharePoint Search – Query Processor (Fault Tolerance and Performance)

SharePoint 2010 - SharePoint Search Architecture Diagram - Query Processor (Fault Tolerance and Performance)

Even after you have scaled your Query Components, your Index Partitions and your Property dbs, another query-side component that may require your attention is the Query Processor. This is the component that does the hard work of accessing the Query Component (to check which items match the query), the Property db (to get the metadata associated with those items) and the Search Admin db (to get the security descriptors needed to apply security trimming to the results). By adding a new Query Processor (marked in red and described here), you divide the load of this task across multiple servers, increasing your query performance and providing fault tolerance (if one goes down, the other can still handle queries).
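The three lookups performed by the Query Processor can be pictured with a small Python sketch. Everything here is a simplification for illustration: the function name and the plain dictionaries standing in for the Query Component, the Property db and the Search Admin db are assumptions, not how SharePoint stores or exposes this data.

```python
def process_query(term, index, property_db, security_db, user):
    """Toy model of the Query Processor's three lookups for one query term."""
    # 1. Query Component: ids of the items that match the query.
    matched_ids = index.get(term, [])
    # 2. Property db: metadata associated with those items.
    with_metadata = [(i, property_db[i]) for i in matched_ids if i in property_db]
    # 3. Search Admin db: security trimming, keep only what the user may see.
    return [meta for i, meta in with_metadata if user in security_db.get(i, set())]
```

Because every query goes through all three steps, this is the work that gets divided when a second Query Processor is added.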


SharePoint Search – Crawl Component (Fault Tolerance and Performance)

SharePoint 2010 - SharePoint Search Architecture Diagram - Crawl Component (Fault Tolerance and Performance)

Now let’s take a look at the other side of search: Crawling/Processing/Indexing. You will notice that a new Crawl Component was added in the diagram above. What does this mean? It means that both Crawl Components will split the load of crawling the defined content sources, and both will keep pulling from and updating the crawling queue stored in the Crawl db. For example, if your full crawl with one Crawl Component and one Crawl db was taking 4 days, then by adding another Crawl Component (and assuming you have sufficient CPU/Memory/IO/bandwidth/etc. resources) the same full crawl should be reduced to around 2 days. With two Crawl Components working from the same Crawl db, you also get fault tolerance in case one of them goes down.
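The 4 days to 2 days estimate above assumes that crawl time scales linearly with the number of Crawl Components. A back-of-the-envelope version in Python follows; the `efficiency` parameter is a hypothetical knob I added to represent how much each extra component actually contributes when resources are not ideal, and it is not a value SharePoint reports.

```python
def estimated_crawl_days(baseline_days, crawl_components, efficiency=1.0):
    """Rough full-crawl duration after scaling out from one Crawl Component.

    efficiency=1.0 means every extra component adds a full unit of throughput;
    lower values model contention on CPU, IO, bandwidth or the Crawl db.
    """
    if crawl_components < 1:
        raise ValueError("at least one Crawl Component is required")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in the range (0, 1]")
    speedup = 1 + (crawl_components - 1) * efficiency
    return baseline_days / speedup
```

With the defaults, `estimated_crawl_days(4, 2)` returns `2.0`, matching the example above; anything less than perfect efficiency lands somewhere between 2 and 4 days.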


SharePoint Search – Crawl Component and Crawl db (Performance)

SharePoint 2010 - SharePoint Search Architecture Diagram - Crawl Component and Crawl Db (Performance)

What happens when you start to add many Crawl Components to the same Crawl db? Well, the db can easily become your bottleneck. One way to keep scaling out and increasing your crawling performance is to add another Crawl Component/Crawl db pair, as shown in the diagram above. This way, distinct content sources (web applications, web sites, file shares, etc.) will be split between the two Crawl dbs, and their respective Crawl Components will have to handle (crawl/process/index) only part of the content.

There are a lot of things that go into this, from how content to be crawled is split among multiple Crawl dbs to how you can manually define this mapping yourself (if you want to). All of this and more are detailed in this post here.
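Just to give a feel for the idea of splitting hosts among Crawl dbs, here is a simplified Python sketch. The balancing rule (send a host to the Crawl db that currently has the fewest hosts) is an assumption made for illustration and is not necessarily the rule SharePoint applies; `manual_rules` is a hypothetical stand-in for a mapping you define yourself.

```python
def assign_hosts(hosts, crawl_dbs, manual_rules=None):
    """Distribute hosts across Crawl dbs, honoring manual mappings first."""
    manual_rules = manual_rules or {}
    assignment = {db: [] for db in crawl_dbs}
    for host in hosts:
        if host in manual_rules:
            # A manually defined mapping always wins.
            target = manual_rules[host]
        else:
            # Assumed rule: pick the Crawl db with the fewest hosts so far
            # (the first one listed wins ties).
            target = min(crawl_dbs, key=lambda db: len(assignment[db]))
        assignment[target].append(host)
    return assignment
```

Each Crawl Component then only works on the hosts assigned to its own Crawl db.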


FAST Search for SharePoint – Content Processing (Fault Tolerance and Performance)

SharePoint 2010 - FAST Search Architecture Diagram - Content Processing (Fault Tolerance and Performance)

Since we are starting with content processing, you may be asking “what about the crawling part of FAST Search?”. Well, the good news is that if you are using the FAST Content SSA to crawl your content, then your crawling architecture looks pretty much like what we just saw for SharePoint Search above. The main difference is that the FAST Content SSA will be tasked only with crawling, since processing and indexing will be done in the FAST Search farm. Speaking of content processing, the first component that can be scaled out is the Content Distributor (shown above in red). What this gives you is just fault tolerance, since the FAST Content SSA will connect and send batches to only one Content Distributor at a time, and will switch to the other one only if it fails to submit batches to the “primary” Content Distributor (you must also make sure to configure the FAST Content SSA to list both Content Distributors).

As for Document Processors, you will definitely have more than one (you get 4 of them by default in a simple installation), which gives you both fault tolerance (in case one of them goes down) and performance (since they work in parallel). Also, if the “primary” Content Distributor goes down, the Document Processors will be smart enough to switch to the other available Content Distributor.
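The “one Content Distributor at a time, switch only on failure” behavior can be sketched like this in Python. The function name and the `send` callback are hypothetical; the real FAST Content SSA handles this internally, and this is only a model of the ordering.

```python
def submit_batch(batch, distributors, send):
    """Submit a batch to the first Content Distributor that accepts it.

    distributors: names in the order they are listed in the configuration.
    send: hypothetical callable(distributor, batch) that raises ConnectionError
          when the distributor cannot be reached.
    Returns the name of the distributor that accepted the batch.
    """
    errors = []
    for distributor in distributors:
        try:
            send(distributor, batch)
            return distributor
        except ConnectionError as exc:
            # Fall through to the next distributor only on failure.
            errors.append(f"{distributor}: {exc}")
    raise ConnectionError("no Content Distributor accepted the batch: " + "; ".join(errors))
```

Note that listing a second Content Distributor does not spread the load here: while the first one is healthy, it receives every batch.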


FAST Search for SharePoint – Indexer (Fault Tolerance)

SharePoint 2010 - FAST Search Architecture Diagram - Indexer (Fault Tolerance)

Remember the option to mirror an Index Partition in SharePoint Search to provide fault tolerance? FAST Search can do something similar, but under a different name: the documentation refers to this process as adding a backup indexer row. In this case both Indexers will have the same content, which means that if your primary Indexer goes down, the backup Indexer can be configured to become the new primary Indexer.


FAST Search for SharePoint – Indexer (Performance)

SharePoint 2010 - FAST Search Architecture Diagram - Indexer (Performance)

In the diagram above, instead of adding a new backup Indexer for fault tolerance, a new Indexer column was added to increase the volume of indexed content that can be stored in your search farm. In this scenario your content will be divided between the two Indexer columns (very similar to how we divided the content into separate Index Partitions for SharePoint Search).

The official guideline is to have one Indexer column for each 15 million items to index.
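Putting the column guideline together with the backup row idea from the previous section, the layout can be pictured as a small grid. The Python sketch below uses names of my own, and treating each row/column cell as one node is a simplification for illustration only.

```python
ITEMS_PER_COLUMN = 15_000_000  # guideline quoted above


def columns_needed(total_items):
    """Indexer columns suggested by the one-column-per-15-million guideline."""
    # Integer ceiling division, never fewer than one column.
    return max(1, -(-total_items // ITEMS_PER_COLUMN))


def indexer_grid(total_items, backup_rows=0):
    """Return the row/column layout: row 0 is primary, extra rows are backups."""
    columns = columns_needed(total_items)
    rows = 1 + backup_rows
    return [[f"row{r}-col{c}" for c in range(columns)] for r in range(rows)]
```

For 30 million items with one backup row, `indexer_grid(30_000_000, backup_rows=1)` yields a 2 x 2 grid: columns add capacity, rows add copies of the same content.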


FAST Search for SharePoint – Indexer and Search (Fault Tolerance)

SharePoint 2010 - FAST Search Architecture Diagram - Indexer and Search (Fault Tolerance)

Above is the diagram of a somewhat common deployment of FAST Search for SharePoint, where you have two servers and each one is configured with a combination of Indexer and Search in a way that one server is the primary Indexer and backup Search, and the other server is backup Indexer and primary Search. In this way, with just your two servers you are providing fault tolerance for both Indexer and Search.


FAST Search for SharePoint – Query Processing (Fault Tolerance)

SharePoint 2010 - FAST Search Architecture Diagram - Query Processing (Fault Tolerance)

In the diagram above, a Query Processing server (with QRServer, QRProxy and FSA Worker components) was added to the FAST Search farm and properly configured in the FAST Query SSA by listing both servers in its setup. With this configuration, queries will be sent to both servers in a round-robin fashion, and if one of the servers fails, the FAST Query SSA will keep sending queries only to the server that is still active.
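Round robin with failover is easy to picture in code. This Python sketch is only a model of the behavior described above; the function names and the `is_alive` health check are hypothetical, not part of the FAST Query SSA.

```python
from itertools import cycle


def make_dispatcher(servers, is_alive):
    """Return a function that picks the next Query Processing server.

    servers: server names as listed in the configuration.
    is_alive: hypothetical callable(server) -> bool health check.
    """
    rotation = cycle(servers)

    def next_server():
        # Look at each server at most once per call, skipping failed ones.
        for _ in range(len(servers)):
            candidate = next(rotation)
            if is_alive(candidate):
                return candidate
        raise RuntimeError("no Query Processing server is available")

    return next_server
```

While both servers are up the calls alternate between them; once one fails, every call returns the remaining server until the failed one comes back.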

Conclusion

There is a lot you can configure in both SharePoint Search and FAST Search for SharePoint to increase performance and/or provide fault tolerance for components of your search farm. The important thing is to understand what options are available for each platform and keep them in mind when you first design your search architecture as well as after your search project is in production, in case you need to scale out your deployment.

About leonardocsouza

Mix together a passion for social media, search, recommendations, books, writing, movies, education, knowledge sharing plus a few other things and you get me as result :)

12 Responses to SharePoint Search and FAST Search for SharePoint Architecture Diagrams – Fault Tolerance and Performance

  1. Thanks for posting this. Saw your presentation at the NYC enterprise usergroup.

    https://twitter.com/msftfastsearch
    what’s your twitter?

    • Hi Victor!

      Thank you for the comment here and for the tweet!

      My twitter is @leonardocsouza, but to be sincere I haven’t been posting there very frequently.🙂

      Best,
      Leo

  4. Claudio says:

    Thank you for your post!
    How about the FAST Admin DB? What is the best way to make it Fault Tolerant?

    • You are most welcome, Claudio!

      I haven’t explored much in regards to fault tolerance for the SP DBs, since they support both SQL Server mirroring and clustering. I would expect the FAST Admin DB to follow the same rule. That would be my guess, as I haven’t tested it myself.

      Best,
      Leo

  6. Brian E. Williams says:

    Hi Leo!!! Configured my first Deployment.xml file with a three server FAST farm. With your various examples here, they make a lot of sense, but could you provide sample deployment.xml files? It is confusing with primary/secondary rows/columns. Thanks!!!

    • Hi Brian!

      That’s not the first time I’ve heard this request, so let me see if I can find some time to craft another post with deployment.xml examples of deployments across multiple servers.

      And congrats on your first deployment.xml file!🙂 If you have specific questions, feel free to send them over and I will take a look.

      Cheers,
      Leo

  7. slylock2 says:

    Hi Leo, I’ve recently been looking into implementing search fault tolerance, as shown in your diagram: https://searchunleashed.files.wordpress.com/2011/07/sharepoint-2010-fast-search-architecture-diagram-indexer-and-search-fault-tolerance.jpg

    How do you get a row with search(backup)?? In the documentation for deployment.xml, the search attribute has to be either true or false. There is no “backup” option.

    • Hi Matt!

      It’s been a while since I last checked this, but if I recall correctly, the search attribute should be set to “true”. Then the queries will be automatically balanced between the two search servers. I don’t believe you can have a “passive” search node that only starts to get queries when the other one goes down. At least that’s what I remember.

      Hope that helps!

      Best,
      Leo
