If you want to learn about search in SharePoint 2010, there is so much information spread across so many sites and media formats that making sense of it all is a daunting task. That’s why a lot of people come to our instructor-led trainings here at FAST University (the training division of FAST that came to Microsoft through the FAST Search & Transfer acquisition in 2008). Still, even after class, students often ask me for additional material they can explore, sometimes to refresh the concepts and other times to deepen their knowledge.
For this purpose I created a OneNote notebook with a collection of my favorite reference links about FS4SP (and now also a bunch of links about SharePoint Search). Unfortunately, my OneNote notebook doesn’t seem to be enough anymore, now that it has grown to 11+ sections with dozens of links in each (!!).
With that in mind, this post starts a series of articles that, put together, are intended to provide a “roadmap” of the references (articles, posts, videos, etc.) that I would follow if I had to start learning about search in SharePoint 2010 (including both SharePoint Search and FAST Search). I hope it helps others find their way too.
The planned sections are the following:
- Search 101: general concepts of search, including crawling, processing, indexing and searching
- Search Architecture in SharePoint 2010: the overall architecture of search-related components in SharePoint 2010 alone or with FAST Search for SharePoint
- Planning and Scale (second post in the series)
- Installation / Deployment (second post in the series)
- Crawling (second post in the series)
- Processing (future post)
- Indexing (future post)
- Searching (future post)
Search 101
In VERY simplified terms, a search engine has to perform the following main tasks:
- Crawling: acquire content from wherever it may be located (web sites, intranet, file shares, email, databases, internal systems, etc.)
- Processing: prepare this content to make it more “searchable”. Think of a Word document from which you want to extract the text, an image containing text that you want people to be able to search, or a web page from which you want to extract the title, the body and maybe some of its HTML metadata tags.
- Indexing: this is the magic sauce of search and what makes it different from just storing the content in a database and searching it with SQL statements. A search engine stores the content in a form optimized for later retrieval. We typically call this optimized version of the content the search index.
- Searching: the best-known part of a search engine. You pass one or more query terms and the search engine returns results based on what is available in its search index.
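To make these four tasks a bit more concrete, here is a minimal sketch in Python of a toy search engine. Everything in it is an assumption made for illustration: the `SOURCES` dictionary stands in for real content sources, and the function names (`crawl`, `process`, `build_index`, `search`) are hypothetical. It is not how SharePoint Search or FAST Search are implemented, only a small picture of the idea.

```python
import re
from collections import defaultdict

# Hypothetical in-memory "content sources". A real crawler would fetch
# content from web sites, file shares, databases and so on.
SOURCES = {
    "doc1": "<html><title>Quarterly Report</title><body>Sales grew in Europe.</body></html>",
    "doc2": "<html><title>Travel Policy</title><body>Report travel expenses monthly.</body></html>",
}

def crawl(sources):
    """Crawling: acquire raw content from wherever it lives."""
    for doc_id, raw in sources.items():
        yield doc_id, raw

def process(raw):
    """Processing: strip markup and split the text into lowercase terms."""
    text = re.sub(r"<[^>]+>", " ", raw)
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(sources):
    """Indexing: map each term to the set of documents that contain it."""
    index = defaultdict(set)
    for doc_id, raw in crawl(sources):
        for term in process(raw):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Searching: return the documents that contain every query term."""
    terms = process(query)
    if not terms:
        return []
    matches = set.intersection(*(index.get(term, set()) for term in terms))
    return sorted(matches)

index = build_index(SOURCES)
print(search(index, "travel report"))
```

The term-to-documents mapping built by `build_index` is a very small inverted index. Answering a query becomes a lookup per term instead of a scan over every document, which is the difference from the “store it in a database and query it with SQL” approach mentioned above.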
If you are interested in exploring more on the subject you can check these resources:
- Wikipedia page on search engines
- Read more about the topic of enterprise search in the Professional Microsoft Search book
Search Architecture in SharePoint 2010
To fully comprehend and make good use of search in SharePoint 2010, I highly recommend that you begin by understanding the concept of Service Applications. You can do this by checking these resources:
- Series of videos and labs about Services Architecture on Channel 9 (Videos)
- SharePoint 2010: Service Applications Part One: Model Overview (Blog post)
- In a Nutshell: SharePoint 2010 Service Applications (Blog post)
- Services architecture planning (SharePoint Server 2010) (Technical Documentation)
- Services in SharePoint 2010 Products, available as XPS, PDF and Visio (great diagrams and awesome information all condensed into one file)
If you read the articles above, you now know that SharePoint 2010 has a Search Service Application and that this is what you will use to configure your search environment. Now let’s have a look at what the architecture of search in SharePoint 2010 (without FAST) looks like:
In a nutshell, this is what these components are:
- Search Service Application Proxy: the proxy for this service application, as explained in the articles about Service Applications listed above.
- Admin Component: responsible for applying all the changes you make to the system, such as adding new query components or new crawl dbs.
- Search Admin db: stores administrative information, such as the search topology, and also the security descriptors (ACLs).
- Crawl db: stores crawl information, such as the crawl queue and the crawl log, and also social and anchor data.
- Crawl Component: crawls information from data sources, processes the content, builds indexes and ships them to the Index Partition(s), and stores metadata in the Property db.
- Property db: stores metadata for items in the search index.
- Query Component: conducts full text queries against the search index stored in the Index Partition.
- Query Processor: sends full text queries to the Query Component, grabs metadata from the Property db and applies security trimming to search results based on ACL info stored in the Search Admin db.
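The query-side flow described in the list above can be sketched as a toy model in Python. All names and data here (`INDEX_PARTITION`, `PROPERTY_DB`, `SEARCH_ADMIN_DB_ACLS`, the users and titles) are invented for illustration and only mirror the roles of the components. This is not the actual SharePoint object model.

```python
# Toy stand-ins for the stores described above (all data is invented).
INDEX_PARTITION = {"budget": {1, 2, 3}, "plan": {2, 3}}  # term -> item ids
PROPERTY_DB = {
    1: {"title": "Budget 2010"},
    2: {"title": "Budget plan"},
    3: {"title": "HR budget plan"},
}
SEARCH_ADMIN_DB_ACLS = {1: {"alice", "bob"}, 2: {"alice"}, 3: {"bob"}}  # item id -> allowed users

def query_component(terms):
    """Full text lookup: ids of the items that contain every term."""
    postings = [INDEX_PARTITION.get(term, set()) for term in terms]
    return set.intersection(*postings) if postings else set()

def query_processor(query, user):
    """Send the query, attach metadata, then trim by ACL."""
    # 1. Send the full text query to the Query Component.
    item_ids = query_component(query.lower().split())
    # 2. Grab metadata for the matching items from the Property db.
    results = [{"id": i, **PROPERTY_DB[i]} for i in sorted(item_ids)]
    # 3. Security trimming: keep only the items this user may see.
    trimmed = [r for r in results if user in SEARCH_ADMIN_DB_ACLS.get(r["id"], set())]
    return [r["title"] for r in trimmed]

print(query_processor("budget plan", "alice"))
print(query_processor("budget", "bob"))
```

The point of the sketch is the order of the steps: the full text match knows nothing about the user, and trimming happens on the result set afterwards, so two users issuing the same query can get different results.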
In future posts we will explore in more detail each one of these components as well as point to places where you can get more information about them.
But as you may have noticed above, the diagram we just saw is only valid when you have a “pure” SharePoint 2010 Search environment (that is, an installation without FAST Search). What does the search architecture look like when you have FAST Search for SharePoint on top of regular SharePoint Search?
Whoa! A bunch of new things appeared in this diagram. The first thing you probably noticed is that you now have not one Search Service Application (SSA), but two! That’s right, when you want to use FAST Search for SharePoint you must configure two SSAs for the communication between the SharePoint farm and the FAST Search farm.
Two farms as well? Yes, the first thing you must understand about your architecture when you have SP2010 and FS4SP together is that you will configure and add servers to each one of these farms independently. The bridge between these two farms will be precisely our two SSAs listed above, and here is my quick breakdown of them:
- FAST Content SSA: responsible for crawling content and pushing this content to be processed/indexed in the FAST Search farm.
- FAST Query SSA: responsible for receiving incoming queries from search applications and routing them to the appropriate search engine (SharePoint Search for People Search and FAST Search for all other Content-related queries). Also responsible for crawling people content.
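The routing role of the FAST Query SSA can be pictured with a very small Python sketch. The function names and the `is_people_query` flag are hypothetical, chosen only to illustrate the split described above, and the two back-end functions simply return labelled placeholder strings.

```python
from typing import List

def sharepoint_people_search(query: str) -> List[str]:
    """Placeholder for the SharePoint Search engine, which serves People Search."""
    return [f"person result for '{query}' (SharePoint Search)"]

def fast_content_search(query: str) -> List[str]:
    """Placeholder for the FAST Search farm, which serves content queries."""
    return [f"content result for '{query}' (FAST Search)"]

def fast_query_ssa(query: str, is_people_query: bool) -> List[str]:
    """Route an incoming query to the engine that owns that kind of content."""
    backend = sharepoint_people_search if is_people_query else fast_content_search
    return backend(query)

print(fast_query_ssa("jane doe", is_people_query=True))
print(fast_query_ssa("budget plan", is_people_query=False))
```

How the real SSA decides that a query is a people query is not covered here. The sketch just assumes the caller already knows.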
To understand more about these two SSAs, check this excellent explanation about them in the resource below:
- The two types of Search Service Applications (in a SharePoint 2010 deployment with FAST Search Server)
Now that you understand a little bit more about the SSAs, let’s check the remaining components in the FS4SP architecture diagram, the ones in the FAST Search farm:
- FAST-specific Connectors: additional connectors (beyond the ones available through the FAST Content SSA) for crawling data from multiple data sources and sending it for processing. (Note: unless you explicitly configure one of these connectors to crawl your content, they will not be used at all.)
- Item Processing: receives incoming batches of documents, processes these documents (to make them more easily searchable) and forwards them to Indexing.
- Indexing: builds a search-optimized index of the processed content (the search index).
- Query Processing: responsible for processing queries coming from the FAST Query SSA (e.g. modifying the query to add security filters based on the user who issued the request) and also for processing the results to be returned, performing activities such as the removal of duplicates.
- FAST Search Authorization: contacted by Query Processing to return the security filters that should be added to the query.
- Query Matching: performs the actual lookup in the search index to retrieve items that match the user’s query.
- Administration: responsible for applying changes you make to the system, such as changes to the index schema.
- FAST Search Admin db: stores information about the FAST Search environment, including the index schema.
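To see how the query-side subsystems in this list fit together, here is a minimal Python sketch. The data, the group-based security filter and the `checksum` field used to spot duplicates are all assumptions made for the example. The real FAST Search subsystems are far more involved and are not implemented this way.

```python
# Invented sample data for illustration only.
USER_GROUPS = {"alice": {"finance"}, "bob": {"hr"}}
SEARCH_INDEX = [
    {"id": 1, "text": "budget plan 2010", "acl": {"finance"}, "checksum": "a1"},
    {"id": 2, "text": "budget plan 2010", "acl": {"finance"}, "checksum": "a1"},  # duplicate of item 1
    {"id": 3, "text": "hiring budget", "acl": {"hr"}, "checksum": "b2"},
]

def authorization(user):
    """FAST Search Authorization stand-in: the security filter for this user."""
    return USER_GROUPS.get(user, set())

def query_matching(terms, security_filter):
    """Query Matching stand-in: items that contain every term and pass the filter."""
    return [
        item for item in SEARCH_INDEX
        if all(term in item["text"].split() for term in terms)
        and item["acl"] & security_filter
    ]

def query_processing(query, user):
    """Query Processing stand-in: add security filters, match, then de-duplicate."""
    security_filter = authorization(user)          # ask Authorization for the filters
    hits = query_matching(query.lower().split(), security_filter)
    seen, results = set(), []
    for item in hits:                              # result processing: drop duplicates
        if item["checksum"] not in seen:
            seen.add(item["checksum"])
            results.append(item["id"])
    return results

print(query_processing("budget", "alice"))
print(query_processing("budget", "bob"))
```

Note that in this sketch the security filter travels with the query into the matching step, following the description of Query Processing above, and duplicate removal happens only on the way back.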
Does it look like a lot of components? Well, in fact they are not really components per se, but more like subsystems that perform a specific task (note how the SSAs don’t have all of their components listed, but are instead collapsed into a single entity).
The real components in a FAST Search farm are the ones listed in this diagram below that shows each subsystem now divided into its components:
As you can see, there are a LOT of components just in the FAST Search farm (beyond the ones in each SSA on the SharePoint farm). Don’t worry about memorizing all of them now, as we will cover each one of them in more detail in future posts.
If you managed to get all the way here in this post, congratulations!
If you feel like you still have so much to learn, the answer is “yes, you do”. But the important thing here is to understand the overall architecture of the system and how all these pieces fit together. If you manage to understand this, then you are on the right track to start going deeper into the role each one of these components performs in your search environment, which is the subject of the following posts in this series.
Does that make sense? If it doesn’t, just let me know in the comments.