Learning roadmap for Search in SharePoint 2010 (including FAST Search for SharePoint) – Part 1: Search 101 and Architecture

If you want to learn about search in SharePoint 2010, the information is spread across so many sites and media formats that making sense of it all can be a daunting task. That’s why a lot of people come to our instructor-led trainings here at FAST University (the training division of FAST, which came to Microsoft through the FAST Search & Transfer acquisition in 2008). Still, even after class, students often ask me for additional material they can explore, sometimes to refresh the concepts and other times to deepen their knowledge.

For this purpose I created a OneNote notebook with a collection of my favorite reference links about FS4SP (and now also a bunch of links about SharePoint Search). Unfortunately, my OneNote notebook doesn’t seem to be enough anymore, now that it has grown to 11+ sections with dozens of links in each (!!).

With that in mind, this post starts a series of articles that, put together, are intended to provide a “roadmap” of the references (articles, posts, videos, etc.) that I would follow if I had to start learning about search in SharePoint 2010 (including both SharePoint Search and FAST Search). I hope it helps others find their way too.

The planned sections are the following:


Search 101

In a VERY simplistic way, a search engine has to perform the following main tasks:

  • Crawling: acquire content from wherever it may be located (web sites, intranet, file shares, email, databases, internal systems, etc.)
  • Processing: prepare this content to make it more “searchable”. Think of a Word document from which you want to extract the text, an image containing text that you want people to be able to search, or even a web page from which you want to extract the title, the body and maybe some of its HTML metadata tags.
  • Indexing: this is the magic sauce of search and what makes it different from just storing the content in a database and searching it with SQL statements. A search engine stores the content in a form optimized for later retrieval. We typically call this optimized version of the content the search index.
  • Searching: the best-known part of a search engine. You pass one or more query terms and the search engine returns results based on what is available in its search index.
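
To make the four tasks above concrete, here is a minimal sketch in Python. It is a toy, not how SharePoint Search or FAST Search is implemented: the `SOURCE` dictionary, the function names and the simple "match all terms" rule are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Hypothetical in-memory "content source"; a real crawler would read
# web sites, file shares, databases, and so on.
SOURCE = {
    "doc1": "<title>Search 101</title> Crawling acquires content.",
    "doc2": "Indexing builds the search index.",
    "doc3": "Searching returns results from the index.",
}

def crawl(source):
    """Crawling: yield (id, raw content) for every item in the source."""
    yield from source.items()

def process(raw):
    """Processing: strip markup and split the text into lowercase terms."""
    text = re.sub(r"<[^>]+>", " ", raw)
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(items):
    """Indexing: map each term to the set of item ids that contain it."""
    index = defaultdict(set)
    for item_id, raw in items:
        for term in process(raw):
            index[term].add(item_id)
    return index

def search(index, query):
    """Searching: return the ids of items that contain every query term."""
    terms = process(query)
    if not terms:
        return []
    hits = set.intersection(*(index.get(term, set()) for term in terms))
    return sorted(hits)

index = build_index(crawl(SOURCE))
print(search(index, "search index"))  # ['doc2']
```

The term-to-items mapping built in `build_index` is a simple inverted index, which is why a lookup does not need to scan every stored document the way a naive SQL text search would.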

If you are interested in exploring more on the subject you can check these resources:


Search Architecture in SharePoint 2010

To fully comprehend and make good use of search in SharePoint 2010, I highly recommend that you begin by understanding the concept of Service Applications. You can do this by checking these resources:

If you read the articles above, you now know that SharePoint 2010 has a Search Service Application, and this is what you will use to configure your search environment. Now let’s have a look at what the architecture of search in SharePoint 2010 (without FAST) looks like:

SharePoint 2010 - SharePoint Search Architecture Diagram

In a nutshell, this is what these components are:

  • Search Service Application Proxy: the proxy for this service application, as explained in the articles about Service Applications listed above.
  • Admin Component: responsible for applying all the changes you make to the system, such as adding new query components or new Crawl dbs.
  • Search Admin db: stores administrative information, such as the search topology, and also the security descriptors (ACLs).
  • Crawl db: stores crawl information, such as the crawl queue and the crawl log, and also social and anchor data.
  • Crawl Component: does the actual crawling of data sources, processes content, builds indexes and ships them to the Index Partition(s), and stores metadata in the Property db.
  • Property db: stores metadata for items in the search index.
  • Query Component: conducts full text queries against the search index stored in the Index Partition.
  • Query Processor: sends full text queries to the Query Component, grabs metadata from the Property db and applies security trimming to search results based on the ACL info stored in the Search Admin db.
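
As a rough illustration of the query-side flow described in the last three bullets, the sketch below models the Query Processor calling the Query Component, fetching metadata and then trimming results by ACL. All data, dictionary names and function names are hypothetical stand-ins, not actual SharePoint APIs or schemas.

```python
# Toy stand-ins for the stores named above; all data is hypothetical.
INDEX_PARTITION = {"budget": ["item1", "item2"], "roadmap": ["item3"]}
PROPERTY_DB = {
    "item1": {"title": "Budget 2010"},
    "item2": {"title": "Budget draft"},
    "item3": {"title": "Search roadmap"},
}
# ACL info, which the list above places in the Search Admin db.
SEARCH_ADMIN_DB_ACL = {
    "item1": {"alice", "bob"},
    "item2": {"alice"},
    "item3": {"bob"},
}

def query_component(term):
    """Full-text lookup against the index partition."""
    return INDEX_PARTITION.get(term.lower(), [])

def query_processor(term, user):
    """Send the query, fetch metadata, then security-trim the results."""
    item_ids = query_component(term)                        # 1. full-text query
    rows = [{"id": i, **PROPERTY_DB[i]} for i in item_ids]  # 2. add metadata
    return [                                                # 3. security trimming
        row for row in rows
        if user in SEARCH_ADMIN_DB_ACL.get(row["id"], set())
    ]

print(query_processor("budget", "alice"))  # both budget items
print(query_processor("budget", "bob"))    # only item1
```

The point of the sketch is the division of labor: the index only answers "which items match", while titles and permissions come from separate stores.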

In future posts we will explore in more detail each one of these components as well as point to places where you can get more information about them.

But as you may have noticed above, the diagram we just saw is only valid when you have a “pure” SharePoint 2010 Search environment (that is, an installation without FAST Search). What does the search architecture look like when you have FAST Search for SharePoint on top of regular SharePoint Search?

SharePoint 2010 - FAST Search Architecture Diagram - short version

Whoa! A bunch of new things appeared in this diagram. The first thing you probably noticed is that you now have not just one Search Service Application (SSA), but two! That’s right: when you want to use FAST Search for SharePoint, you must configure two SSAs for the communication between the SharePoint farm and the FAST Search farm.

Two farms as well? Yes, the first thing you must understand about your architecture when you have SP2010 and FS4SP together is that you will configure and add servers to each one of these farms independently. The bridge between these two farms will be precisely our two SSAs listed above, and here is my quick breakdown of them:

  • FAST Content SSA: responsible for crawling content and pushing this content to be processed/indexed in the FAST Search farm.
  • FAST Query SSA: responsible for receiving incoming queries from search applications and routing them to the appropriate search engine (SharePoint Search for People Search and FAST Search for all other Content-related queries). Also responsible for crawling people content.
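
The split of responsibilities between the two SSAs can be summarized in a few lines of Python. This is only a sketch of the rules stated in the two bullets above; the `"people"` and `"documents"` labels and both function names are made up for the example.

```python
def crawling_ssa(content_kind):
    """Return the SSA that crawls the given kind of content."""
    return "FAST Query SSA" if content_kind == "people" else "FAST Content SSA"

def query_engine(query_kind):
    """Return the engine the FAST Query SSA routes a query to."""
    return "SharePoint Search" if query_kind == "people" else "FAST Search"

for kind in ("people", "documents"):
    print(kind, "| crawled by:", crawling_ssa(kind), "| queried in:", query_engine(kind))
```

In other words, people data stays entirely on the SharePoint Search side, while every other kind of content is crawled by the FAST Content SSA and queried in the FAST Search farm.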

To understand more about these two SSAs, check this excellent explanation about them in the resource below:

Now that you understand a little bit more about the SSAs, let’s check the remaining components in the FS4SP architecture diagram, the ones in the FAST Search farm:

  • FAST-specific Connectors: additional connectors (beyond the ones available through the FAST Content SSA) that crawl data from multiple data sources and send it for processing. (Note: unless you explicitly configure one of these connectors to crawl your content, they will not be used at all.)
  • Item Processing: receives incoming batches of documents, processes these documents (to make them more easily searchable) and forwards them to Indexing.
  • Indexing: builds a search-optimized index of the processed content (the search index).
  • Query Processing: responsible for processing queries coming from the FAST Query SSA (e.g. modifying the query to add security filters based on the user that issued the request) and also for processing the results to be returned, performing activities such as the removal of duplicates.
  • FAST Search Authorization: contacted by Query Processing to return the security filters that should be added to the query.
  • Query Matching: performs the actual lookup in the search index to retrieve items that match the user’s query.
  • Administration: responsible for applying changes you make to the system, such as changes to the index schema.
  • FAST Search Admin db: stores information about the FAST Search environment, including the index schema.
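
The query path through the FAST Search farm can be sketched as below: Query Processing asks FAST Search Authorization for the user's security filter, Query Matching does the lookup with that filter applied, and Query Processing removes duplicates from the results. The data, the group-based filter and the checksum-based duplicate test are assumptions for illustration only, not the actual FS4SP mechanisms.

```python
# Hypothetical group memberships and index rows.
USER_GROUPS = {"alice": {"finance", "everyone"}, "bob": {"everyone"}}
SEARCH_INDEX = [
    {"id": "a", "text": "budget report", "allowed": "finance", "checksum": "x1"},
    {"id": "b", "text": "budget summary", "allowed": "everyone", "checksum": "x2"},
    {"id": "c", "text": "budget summary", "allowed": "everyone", "checksum": "x2"},
]

def authorization(user):
    """FAST Search Authorization stand-in: return the user's security filter."""
    return USER_GROUPS.get(user, set())

def query_matching(term, security_filter):
    """Look up items that contain the term and pass the security filter."""
    return [
        row for row in SEARCH_INDEX
        if term in row["text"].split() and row["allowed"] in security_filter
    ]

def query_processing(term, user):
    """Add the security filter to the query, then remove duplicate results."""
    matches = query_matching(term.lower(), authorization(user))
    seen, unique = set(), []
    for row in matches:
        if row["checksum"] not in seen:  # keep the first item of each duplicate set
            seen.add(row["checksum"])
            unique.append(row["id"])
    return unique

print(query_processing("budget", "alice"))  # ['a', 'b']
print(query_processing("budget", "bob"))    # ['b']
```

Note that in this flow the security filter travels with the query, so items the user cannot see never come back from the lookup in the first place.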

Does it look like a lot of components? Well, in fact they are not really components per se, but more like subsystems that perform a specific task (note how the SSAs don’t have all of their components listed, but are instead collapsed into a single entity).

The real components in a FAST Search farm are the ones listed in the diagram below, which shows each subsystem divided into its components:

SharePoint 2010 - FAST Search Architecture Diagram - detailed

As you can see, there are a LOT of components just in the FAST Search farm (beyond the ones in each SSA on the SharePoint farm). Don’t worry about memorizing all of them now, as we will cover each one of them in more detail in future posts.

 

If you managed to get all the way here in this post, congratulations! :)

If you feel like you still have so much to learn, the answer is “yes, you do”. But the important thing here is to understand the overall architecture of the system and how all these pieces fit together. If you manage to understand this, then you are on the right track to start going deeper into the role each one of these components performs in your search environment, which is the subject of the following posts in this series.

Makes sense? It doesn’t? Just let me know in the comments.


About leonardocsouza

Mix together a passion for social media, search, recommendations, books, writing, movies, education, knowledge sharing plus a few other things and you get me as result :)

9 Responses to Learning roadmap for Search in SharePoint 2010 (including FAST Search for SharePoint) – Part 1: Search 101 and Architecture

  3. Piyush says:

    Hi leonardocsouza,

    This is one of the most excellent posts that I have come across. You have explained the SharePoint 2010 Search architecture in a simple and clear way.
    Regarding FAST, I have one doubt about your article:
    After the second diagram, the article states the following:
    ■FAST Content SSA: responsible for crawling content and pushing this content to be processed/indexed in the FAST Search farm.
    ■FAST-specific Connectors: additional connectors (beyond the ones available through the FAST Content SSA) for crawling data from multiple data sources and send it for processing.

    Out of the above two, which one is actually crawling the content? The FAST Content SSA or the connector?

    Regards,

    Piyush

    • Thank you so much for the kind words, Piyush!

      I’m sorry if it wasn’t clear in the post; I will add a line there to clarify this point. With FAST Search you have the option of using either the FAST Content SSA or the FAST-specific Connectors. It all depends on your needs.

      If you just use the FAST Content SSA (by adding Content Sources through Central Administration), then the FAST-specific Connectors are not used at all and play no role in your farm.

      Those FAST-specific connectors are only used if you explicitly configure one of them (Database, Web Crawler, Lotus Notes). So, unless you directly configured one of the FAST-specific Connectors to crawl some content, all of your content will be crawled through the FAST Content SSA and then sent to the FAST farm just for Processing/Indexing.

      Hope that clarifies this point!

      Best,
      Leo
