Learning roadmap for Search in SharePoint 2010 (including FAST Search for SharePoint) – Part 2: Planning, Scale, Installation and Deployment, and Crawling

Did you enjoy the break since our last post in the series, where we finished with architecture diagrams for both SharePoint Search and FAST Search for SharePoint? Now let’s take a deeper look at some of those components, focusing on how to properly plan and scale search solutions. After that, we will cover installation and deployment, and then close with crawling. This should keep you entertained for a few days. :)

In case you want the full list of this roadmap, the planned sections (so far) are the following:


Planning and Scale

Ready to dig a little deeper into SharePoint Search? Then read these two out-of-this-world articles, which explain not only how the SharePoint Search architecture works but also how to scale it. Believe me, these two posts have saved me more times than I can count. Extra points for those working with FAST: almost everything related to the crawling components, including scaling, also applies to FS4SP:

The links above explain the SharePoint Search architecture; the next step is to expand your knowledge by looking at how the same concepts apply to FS4SP. Note that scaling the FAST Query SSA is mostly done for failover, since the heavy lifting at query time in FS4SP is executed in the FAST farm (not in the SharePoint farm):

If you made it this far, you now understand the crawling and query components that run in the SharePoint farm, for either SharePoint Search or FS4SP, so it is time for some deep reading of the product documentation. I know hardly anyone likes reading documentation, and I don’t either :), but the links below contain great nuggets of information that will help you design your FS4SP search solution and topology. The whole piece on performance and capacity management, testing, and recommendations under the “Plan search topology” section is definitely worth a look (trust me, it will save you valuable time later on):

Advanced Material on Planning, Design, High Availability

A scenario I am asked about fairly often is sharing the search service application across multiple SharePoint farms (a common discussion when you have dispersed SharePoint farms and want to provide a central search farm). If that caught your attention, first read the official documentation, then check the very detailed blog post with step-by-step instructions for setting this up for the User Profile Service Application and the Search Service Application. The same principles apply to both SharePoint Search and FS4SP, since in both cases you are publishing and consuming the SSAs on the SharePoint farm:


Installation / Deployment

First, review and understand the steps required to configure search in SharePoint 2010. This matters even if you will only work with FAST, as much of the general guidance here also applies to FAST:

Once you have finished the reading above, move on to the steps required to deploy FS4SP, as described in the official documentation:

Also, if you are planning to virtualize FS4SP, make sure to check the official recommendations here:


Crawling

First, learn the basics of configuring a new content source to crawl content in SharePoint 2010, since you will have to do this at some point. The best part? Most of what you learn here about defining content sources and crawl rules, and about starting and stopping crawls, also applies to FS4SP. The video linked below shows the sequence of events when you trigger a full crawl (the crawling part is the same for SharePoint Search and FS4SP, but the processing and indexing part is different in FS4SP):
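One detail that trips people up is crawl rule order. As I understand SharePoint 2010 behavior, crawl rules are evaluated in the order they are listed, and the first rule whose path matches a URL decides whether that URL is included or excluded; URLs that match no rule simply follow the content source’s normal behavior. The Python sketch below models only that ordering logic so you can reason about it. It is not SharePoint code: the rule list and URLs are hypothetical, and using `fnmatch` for the `*` wildcard is a simplification of the product’s own matching.

```python
from fnmatch import fnmatchcase
from typing import NamedTuple, Optional


class CrawlRule(NamedTuple):
    pattern: str   # path with * wildcards, e.g. "http://intranet/archive/*" (hypothetical)
    include: bool  # True = include matching URLs, False = exclude them


def evaluate(url: str, rules: list[CrawlRule]) -> Optional[bool]:
    """Return the decision of the FIRST matching rule, or None if no rule matches.

    Matching is case-sensitive here purely for simplicity.
    """
    for rule in rules:
        if fnmatchcase(url, rule.pattern):
            return rule.include
    return None  # no rule matched: default content source behavior applies


if __name__ == "__main__":
    rules = [
        CrawlRule("http://intranet/archive/public/*", include=True),
        CrawlRule("http://intranet/archive/*", include=False),
    ]
    print(evaluate("http://intranet/archive/public/report.docx", rules))  # True
    print(evaluate("http://intranet/archive/2009/old.docx", rules))       # False
    print(evaluate("http://intranet/news/today.aspx", rules))             # None
```

Swapping the two rules would exclude the public folder too, because the broader exclude rule would then match first; that is why the more specific rules usually go at the top of the list.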

For similar information but specific to FS4SP, this is the official documentation:

If you made it this far and still recall the FS4SP architecture diagram from the previous post, you probably noticed that FS4SP has a bunch of new components, each with its own function. As I mentioned above, when you use the FAST Content SSA to define content sources, the crawling piece of FS4SP works the same way as it does for SharePoint Search. Below is one of my previous posts explaining the crawling/processing/indexing flow in FS4SP:

Another difference in FS4SP is the ability to use the FAST Search specific connectors (Web content, Database content, Lotus Notes content). These connectors came from the earlier standalone version of the FAST product, and to those not initiated in FAST administration they may look a little strange (command-line utilities only? XML configuration files?). They are completely unknown to your SharePoint farm (they reside directly on the FAST farm, so SharePoint doesn’t even know they exist), which means a SharePoint administrator will not have access to them through Central Administration, so be aware of that. My recommendation is to always try the connectors available through Central Administration (FAST Content SSA) first, and turn to the FAST Search specific connectors only when you need functionality you can only get from them (such as support for Lotus Notes security through the FAST Search Lotus Notes connector):

Now that you understand how to crawl standard content with both SharePoint Search and FS4SP, it is time to learn how to bring in content from other external sources (beyond web sites, file shares, etc.). So do yourself a big favor and learn about Business Connectivity Services (BCS) in SharePoint 2010. To me this is one of THE most important pieces of technology in SharePoint for making search shine, as it integrates other sources in a company (databases, web services, whatever you want) and brings it all together inside SharePoint. The best part? It works seamlessly with both SharePoint Search and FS4SP. The post below has the most detailed explanation I have ever found on how to create basic External Content Types (to get content from a database, probably the most common scenario):
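To make the database scenario a bit more concrete: for an External Content Type to be crawlable, it typically needs at least a Finder (“Read List”) operation that enumerates items and a SpecificFinder (“Read Item”) operation that returns a single item by its identifier. Roughly speaking, the crawler enumerates with the first and fetches each item with the second. The sketch below only mimics that contract in Python against an in-memory SQLite table; it is not BCS code (BCS models are defined in SharePoint Designer or as XML and executed by SharePoint), and the `Customers` table and function names are hypothetical.

```python
import sqlite3


def make_sample_db() -> sqlite3.Connection:
    # Hypothetical external system: a tiny Customers table held in memory.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Customers (Id INTEGER PRIMARY KEY, Name TEXT, City TEXT)")
    conn.executemany(
        "INSERT INTO Customers (Id, Name, City) VALUES (?, ?, ?)",
        [(1, "Contoso", "Seattle"), (2, "Fabrikam", "Oslo")],
    )
    return conn


def read_list(conn: sqlite3.Connection) -> list[int]:
    # Analogue of a Finder ("Read List"): enumerate the identifiers to crawl.
    return [row[0] for row in conn.execute("SELECT Id FROM Customers ORDER BY Id")]


def read_item(conn: sqlite3.Connection, item_id: int):
    # Analogue of a SpecificFinder ("Read Item"): return one item's fields by identifier.
    row = conn.execute(
        "SELECT Id, Name, City FROM Customers WHERE Id = ?", (item_id,)
    ).fetchone()
    if row is None:
        return None
    return {"Id": row[0], "Name": row[1], "City": row[2]}


def crawl(conn: sqlite3.Connection) -> list[dict]:
    # Rough analogue of a crawl: enumerate first, then fetch each item individually.
    return [read_item(conn, item_id) for item_id in read_list(conn)]


if __name__ == "__main__":
    for item in crawl(make_sample_db()):
        print(item)
```

Keeping the two operations separate is what lets the crawler enumerate cheaply and then fetch full items one at a time.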

If you are looking for extra credit as a dedicated student, and you should be :), then go beyond BCS for search and explore the broader capabilities that BCS brings to SharePoint as a whole. Believe me, you won’t regret it.

Advanced Material on Crawling and Connectors

Through BCS you can also create your own connectors to link SharePoint with any external sources you want. The first post below is a great starting point on this, and is the exact post I first read to understand how this works:

This second reference is a small gem buried on MSDN that explains how to build something a lot of people want: a connector that aggregates metadata with an attached document and brings both together to be processed and indexed (for example, indexing a candidate’s information along with his or her resume, so that users can search both and get just one result). Powerful stuff.
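The key idea is that the metadata and the attachment end up in one indexed item rather than two. As far as I know, BCS exposes the attachment through a StreamAccessor operation that returns the document’s binary stream for an item; check the MSDN article for the actual model. The Python sketch below only illustrates the “one item, two sources” outcome: `Candidate`, `stream_accessor`, `to_index_item`, and the naive `search` are all hypothetical, and the resume is assumed to be plain text (real documents would go through an IFilter or other parser).

```python
from dataclasses import dataclass


@dataclass
class Candidate:  # hypothetical metadata record from the external system
    candidate_id: int
    name: str
    skills: str


def stream_accessor(resumes: dict[int, bytes], candidate_id: int) -> bytes:
    # Hypothetical analogue of a StreamAccessor: return the attachment's raw bytes.
    return resumes[candidate_id]


def to_index_item(candidate: Candidate, resume: bytes) -> dict:
    # Merge metadata and attachment content into ONE item, so a query that
    # matches either the metadata or the resume returns a single result.
    return {
        "id": candidate.candidate_id,
        "title": candidate.name,
        "skills": candidate.skills,
        "body": resume.decode("utf-8"),  # assumption: plain-text resume
    }


def search(items: list[dict], term: str) -> list[int]:
    # Naive substring search across all fields, just to show one hit per candidate.
    term = term.lower()
    return [it["id"] for it in items if any(term in str(v).lower() for v in it.values())]


if __name__ == "__main__":
    candidates = [Candidate(1, "Ana Silva", "SharePoint, FAST"),
                  Candidate(2, "Bruno Costa", "Oracle, ODBC")]
    resumes = {1: b"Ten years building enterprise search solutions.",
               2: b"Database administrator."}
    items = [to_index_item(c, stream_accessor(resumes, c.candidate_id)) for c in candidates]
    print(search(items, "enterprise search"))  # [1]
```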

Another frequently asked question is whether BCS can be used to crawl databases other than SQL Server. The article below explains how to do this for Oracle, and hints that you could do something similar for any other database that supports OLE DB or ODBC:


This should keep you busy for a while. And remember, if you just want a quick way to get a server to try some of the things above, you can always play around with one of the MSDN Virtual Labs, such as this one, which gives you a VM with both SharePoint 2010 and FS4SP.

Didn’t understand some of the material? Have other resources you want to share? By all means, feel free to comment below. :)


About leonardocsouza

Mix together a passion for social media, search, recommendations, books, writing, movies, education, knowledge sharing plus a few other things and you get me as result :)
This entry was posted in FS4SP, SP2010.

9 Responses to Learning roadmap for Search in SharePoint 2010 (including FAST Search for SharePoint) – Part 2: Planning, Scale, Installation and Deployment, and Crawling

  1. Pingback: Learning roadmap for Search in SharePoint 2010 (including FAST Search for SharePoint) – Part 1: Search 101 and Architecture | Search Unleashed

  2. Nancy says:

    Hi Leo,
    I wanted to know how to index the content of documents (PDF, Word) linked through hyperlinks placed in the SharePoint page body.
    I tried a full crawl, but only the URL is indexed as text, not the content of the document.
    Will it be indexed by default, or do I need to change some configuration or write custom code?

    • Hi Nancy,

      If I understood your question correctly, what you have is a SharePoint page that has a hyperlink to a document (word, pdf) and you would like to crawl and index not only the SharePoint page, but also the document linked through that URL, correct? I also assume this document resides somewhere else that is not in your SharePoint site (or that document would be crawled anyway through different means).

      If that’s the case, nothing out-of-the-box comes to mind to solve this. The only option I can think of is custom document processing code, if you are using FAST Search for SharePoint, but even then your code would have to handle the parsing/extraction of that document’s content itself, since the standard parsing step has already run by the time an item reaches the Pipeline Extensibility stage.

      Now the really important question is: what is the business scenario behind this request?

      By understanding the business scenario we may come up with different ways to approach this problem. Or at least we can try :)

      Best,
      Leo

  3. Paul Beck says:

    Nice series on FS4SP – pls keep them coming.
    I have linked to this post from my blog at: http://blog.sharepointsite.co.uk/2011/07/working-with-qr-server-in-fs4sp.html

  4. Nancy says:

    Thanks for your prompt response :)
    Yes Leo, we want to crawl and index the SharePoint page together with the document linked through that URL, which resides at some other location, and when searching the content of the document we want the SharePoint page URL to be displayed in the search results. Can you please elaborate on how to customize document processing or Pipeline Extensibility? Any blogs or TechNet references would help us.

    • You are most welcome, Nancy :)

      So, if you want to explore the route to implement your custom processing to do this (index the contents of this external linked document as part of the SharePoint page), the first thing to keep in mind is that you will need FAST Search for SharePoint (as SharePoint Search doesn’t allow you to apply custom processing).

      Now, here are two references that should help you get started:

      Pipeline Extensibility (Integrating an External Item Processing Component)
      http://msdn.microsoft.com/library/ff795801.aspx

      This other link below explains how to access “special” properties, including the one you are probably looking for (data):
      • url: The URL that is displayed when the item occurs in the query results.
      • data: The binary content of the source document encoded in base64.
      • body: The text extracted from the item by parsing the data property. The body is extracted by using an IFilter or other document parser.
      http://msdn.microsoft.com/en-us/library/ff795815.aspx
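As I read the Pipeline Extensibility documentation above, FAST runs your configured command with the path of an input XML file (the crawled properties you declared as input, as `<CrawledProperty>` elements inside a `<Document>` element) and the path of an output XML file where your code writes new crawled property values. The Python sketch below shows that shape for the first half of Nancy’s scenario: it decodes the base64 `data` property of the page and collects the PDF/Word links it contains. Treat it as a hedged sketch: verify the element names and the configuration in the MSDN article, note that stages are often written as .NET executables instead, and note that the output property name and GUID below are hypothetical placeholders you would replace with a crawled property you define. Downloading and parsing the linked documents themselves would still be up to your code.

```python
import base64
import re
import sys
import xml.etree.ElementTree as ET

# Hypothetical output crawled property: replace with one you define and
# declare in the <Output> section of your pipeline extensibility configuration.
OUT_PROPSET = "00000000-0000-0000-0000-000000000000"  # placeholder GUID
OUT_NAME = "linkeddocuments"
VT_LPWSTR = "31"  # varType 31 = string

# Naive pattern for href="...pdf|doc|docx" links in the raw page bytes.
LINK_RE = re.compile(rb'href="([^"]+\.(?:pdf|docx?))"', re.IGNORECASE)


def read_properties(path: str) -> dict[str, str]:
    """Map propertyName -> text for every CrawledProperty in the input file."""
    root = ET.parse(path).getroot()
    return {cp.get("propertyName"): (cp.text or "") for cp in root.iter("CrawledProperty")}


def find_document_links(data_b64) -> list[str]:
    """Decode the base64 'data' property and return the linked PDF/Word URLs."""
    raw = base64.b64decode(data_b64)
    return [m.decode("utf-8", "replace") for m in LINK_RE.findall(raw)]


def write_output(path: str, links: list[str]) -> None:
    """Write the links as a single string crawled property to the output file."""
    doc = ET.Element("Document")
    cp = ET.SubElement(doc, "CrawledProperty", propertySet=OUT_PROPSET,
                       propertyName=OUT_NAME, varType=VT_LPWSTR)
    cp.text = ";".join(links)
    ET.ElementTree(doc).write(path, encoding="utf-8", xml_declaration=True)


def main(argv: list[str]) -> int:
    # argv[1] = input file path, argv[2] = output file path (supplied by FAST).
    props = read_properties(argv[1])
    write_output(argv[2], find_document_links(props.get("data", "")))
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```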

      Good luck!

      Best,
      Leo

      • Nancy says:

        Hi Leo,
        While adding a custom pipeline stage, we get the following error in the crawl log:

        The FAST Search backend reported warnings when processing the item. ( Customer-supplied command failed: Process terminated abnormally: Unknown error (0x80131700) )

        We have given full control of the FASTSearch folder to the FAST administrator account and the crawling account. Kindly let us know what needs to be done in this regard.

      • Hi Nancy,

        To be able to help you with this we will need more information :)

        First, what are you trying to do in this custom application? Call a web service? Access something on disk? Either of these needs special care, as your custom code has access only to the AppData\LocalLow directory of the user running the FAST Search service.

        In any case, please take a look at this article here (http://techmikael.blogspot.com/2010/12/how-to-debug-and-log-fast-search.html) as it should help you in troubleshooting at which point in your code this error is happening.
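Because of that LocalLow restriction, a simple way to see where a stage fails is to log to a file in that folder. The linked article covers this for .NET; the Python sketch below shows the same idea under stated assumptions: the stage runs as the FAST Search service account, `USERPROFILE` points at that account’s profile, and the log file name and the `run_logged` helper are hypothetical.

```python
import os
from datetime import datetime, timezone


def locallow_dir(profile=None) -> str:
    """Return the AppData/LocalLow folder under the given (or current) user profile."""
    profile = profile or os.environ.get("USERPROFILE") or os.path.expanduser("~")
    return os.path.join(profile, "AppData", "LocalLow")


def log(message: str, profile=None, filename: str = "pipeline-stage.log") -> str:
    """Append a timestamped line to the log file and return its path."""
    directory = locallow_dir(profile)
    os.makedirs(directory, exist_ok=True)  # normally already exists on Windows
    path = os.path.join(directory, filename)
    stamp = datetime.now(timezone.utc).isoformat()
    with open(path, "a", encoding="utf-8") as f:
        f.write(f"{stamp} {message}\n")
    return path


def run_logged(func, *args, profile=None):
    """Run func(*args); log any exception, then re-raise so the process still fails visibly."""
    try:
        return func(*args)
    except Exception as exc:
        log(f"stage failed: {exc!r}", profile=profile)
        raise
```

Wrapping the body of your stage in `run_logged` leaves you a record of the actual exception, which is usually more telling than a generic “terminated abnormally” entry in the crawl log.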

        Hope that helps!

        Best,
        Leo
