Did you enjoy your break since our last post in the series, when we finished up with some architecture diagrams for both SharePoint Search and FAST Search for SharePoint? Now let’s have a deeper look into some of those components, focusing on some considerations to properly plan and scale search solutions. Following up, we will cover some installation and deployment topics and then close with crawling. This should be enough to keep you entertained for a few days.
In case you want the full list of this roadmap, the planned sections (so far) are the following:
- Search 101 (previous post)
- Search Architecture in SharePoint 2010 (previous post)
- Planning and Scale
- Installation / Deployment
- Crawling
- Processing (future post)
- Indexing (future post)
- Searching (future post)
Planning and Scale
Ready to dig a little deeper into SharePoint Search? Then read these two out-of-this-world articles that explain not only how the architecture of SharePoint Search works, but also how to scale it. Believe me, these two posts have saved me more times than I can count. Extra points for those working with FAST, as almost everything related to the crawling components, including scaling, also applies to FS4SP:
- Crawling – http://blogs.msdn.com/b/russmax/archive/2010/04/16/search-2010-architecture-and-scale-part-1-crawl.aspx
- Query – http://blogs.msdn.com/b/russmax/archive/2010/04/23/search-2010-architecture-and-scale-part-2-query.aspx
The links above explain the SharePoint Search architecture. As a next step, you can expand your knowledge by looking at how these same things apply to FS4SP. It is important to note that scaling the FAST Query SSA is mostly done for failover reasons, as the hard work done during query time for FS4SP is executed in the FAST farm (and not in the SharePoint farm):
- Multiple server deployment of the Content SSA (FAST Search Server 2010 for SharePoint) – http://technet.microsoft.com/en-us/library/ff599537.aspx
- Multiple server deployment of the Query SSA (FAST Search Server 2010 for SharePoint) – http://technet.microsoft.com/en-us/library/ff602184.aspx
Now, if you got this far, you understand the crawling and query components running in the SharePoint farm, either for SharePoint Search or for FS4SP, so it is time to do some deep reading into the product documentation. I know hardly anyone likes to read the documentation (I don’t like it either), but there are great nuggets of useful information in the links below that will allow you to understand more about how to design the search solution and topology with FS4SP. The whole piece on performance and capacity management/testing/recommendations under the “Plan search topology” section is definitely worth a look (trust me, it will save you valuable time later on):
- Plan the search solution (FAST Search Server 2010 for SharePoint) – http://technet.microsoft.com/en-us/library/ff383310.aspx
- Plan search topology (FAST Search Server 2010 for SharePoint) – http://technet.microsoft.com/en-us/library/ff599528.aspx
Advanced Material on Planning, Design, High Availability
A scenario that I am asked about fairly often is sharing the search service application across multiple SharePoint farms (something much discussed when you have dispersed SharePoint farms and want to provide a central Search farm). If that caught your attention, first read the official documentation, then check the very detailed blog post covering step-by-step instructions on how to set this up for the User Profile Service Application and Search Service Application. The same principles apply to both SharePoint Search and FS4SP (since you are publishing/consuming the SSAs on the SharePoint farm):
- Share service applications across farms – http://technet.microsoft.com/en-us/library/ff621100.aspx
- SharePoint Server 2010 Enterprise Service Application Publishing and Consuming Farms – http://www.kowalski.ms/2010/07/16/sharepoint-server-2010-enterprise-service-application-publishing-and-consuming-farms/
Installation / Deployment
First, review and understand the steps required to configure search in SharePoint 2010. Even for those who will only work with FAST, this still matters, as a lot of the overall guidance here also applies to FAST:
- Post-installation steps for search – http://technet.microsoft.com/en-us/library/ee808863.aspx
After you complete your reading above, you can go ahead and understand the steps required to deploy FS4SP from the official documentation:
- Deployment for FAST Search Server 2010 for SharePoint – http://technet.microsoft.com/en-us/library/ff381267.aspx
Also, if you are planning to virtualize FS4SP, make sure to check the official recommendations here:
- Recommendations: Virtualization (FAST Search Server 2010 for SharePoint) – http://technet.microsoft.com/en-us/library/gg702612.aspx
Crawling
First, learn the basics of configuring a new Content Source to crawl content in SharePoint 2010, since you will have to do this at some point. The best part? Most of what you learn here about defining content sources and crawl rules, and about starting and stopping crawls, is also valid for FS4SP. The video linked below shows the sequence of events when you trigger a full crawl (the part about crawling is the same for both SharePoint Search and FS4SP, but the part about processing and indexing is different in FS4SP):
- Manage crawling (SharePoint Server 2010) – http://technet.microsoft.com/en-us/library/ee792876.aspx
- SharePoint Server 2010 Full Crawl Sequence Demo – http://www.microsoft.com/resources/msdn/en-us/office/media/video/sharepointestc.html?uuid=f716a6eb-9b74-45fc-acab-a2909f80d2d9&from=mscomsharepoint
For similar information but specific to FS4SP, this is the official documentation:
- Manage crawling with the FAST Search Content SSA (FAST Search Server 2010 for SharePoint) – http://technet.microsoft.com/en-us/library/ff384288.aspx
If you got through here, but still manage to recall the FS4SP architecture diagram from the previous post, you probably noticed that in FS4SP there are a bunch of new components, each with their own function. As I mentioned above, the crawling piece of FS4SP when you use the FAST Content SSA to define content sources will work the same way as it does for SharePoint Search. Below is one of my previous posts trying to explain the crawling/processing/indexing flow in FS4SP:
Another difference in FS4SP is the ability to use one of the FAST Search specific connectors (Web content, Database content, Lotus Notes content). Those are the connectors that came from the previous standalone version of the FAST product, and to those uninitiated in FAST administration they may look a little strange (command line utilities only? xml configuration files?). These FAST Search specific connectors are completely unknown to your SharePoint farm (SharePoint basically doesn’t even know they exist, as they reside directly on the FAST farm), which means that a SharePoint administrator will not have access to them through Central Administration. My recommendation is that you always try to use the connectors through Central Administration (FAST Content SSA), and go to the FAST Search specific connectors only if you need a specific functionality that you can only get with them (such as the support for Lotus Notes security through the FAST Search Lotus Notes connector):
- Manage crawling with the FAST Search specific connectors (FAST Search Server 2010 for SharePoint) – http://technet.microsoft.com/en-us/library/ff383272.aspx
Now that you already understand how to crawl standard content with both SharePoint Search and FS4SP, it is time to understand how to bring content from other external sources (beyond Web Sites, File Shares, etc.). So do yourself a big favor and learn about Business Connectivity Services (BCS) in SharePoint 2010. To me this is one of THE most important pieces of technology in SharePoint that can really make search shine, as it integrates with other sources in a company (databases, web services, whatever-you-want) bringing all together inside SharePoint. The best part? It is a technology that works with both SharePoint Search and FS4SP seamlessly. The post below has the most detailed explanation I have ever found on how to create the basic External Content Types (to get content from a database, probably the most common scenario):
- Searching External Data in SharePoint 2010 Using Business Connectivity Services – http://blogs.msdn.com/b/ericwhite/archive/2010/04/28/searching-external-data-in-sharepoint-2010-using-business-connectivity-services.aspx
If you are looking for extra credit as an applied student (as you should), then you can not only learn about BCS for search, but also explore the broader capabilities that BCS brings to SharePoint overall, besides search. Believe me, you won’t regret this.
- BCS Overview Demo Part 1 of 3 – http://www.youtube.com/watch?v=82xzNsG0d5A
- BCS Overview Demo Part 2 of 3 – http://www.youtube.com/watch?v=QUBqpYxkOEo
- BCS Overview Demo Part 3 of 3 – http://www.youtube.com/watch?v=aC15uqL-V0o
Advanced Material on Crawling and Connectors
Through BCS you can also create your own connectors to link SharePoint with any external sources you want. The first post below is a great starting point on this, and is the exact post I first read to understand how this works:
- HOW TO: Create a Searchable SharePoint 2010 BDC .NET Assembly Connector Which Reads From A Flat File – http://www.toddbaginski.com/blog/archive/2009/11/05/how-to-create-a-searchable-sharepoint-2010-bdc-.net-assembly-connector-which-reads-from-a-flat-file.aspx
This second reference is a small gem buried on MSDN that explains how to create something that a lot of people want: a connector that aggregates metadata with an attached document and brings both together to be processed and indexed (such as indexing the metadata for a candidate along with his/her resume, allowing users to search for both and get just one result). Powerful stuff.
- Creating .NET Assemblies That Aggregate Data from Multiple External Systems for Business Connectivity Services in SharePoint Server 2010 – http://msdn.microsoft.com/en-us/library/ff728359.aspx
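To make the "metadata plus attached document, one result" idea concrete, here is a minimal conceptual sketch in Python. It is not the BCS .NET assembly approach from the article above; the `SearchItem` class, the `aggregate` function, and the sample data are all made up for illustration, assuming both sources share a candidate id.

```python
from dataclasses import dataclass


@dataclass
class SearchItem:
    """One indexable item: the metadata and the document text together."""
    candidate_id: int
    name: str
    skills: list
    resume_text: str


def aggregate(metadata_rows, resumes_by_id):
    """Join each metadata row with its attached document into a single item."""
    items = []
    for row in metadata_rows:
        items.append(SearchItem(
            candidate_id=row["id"],
            name=row["name"],
            skills=row["skills"],
            # A candidate without a resume still produces an item.
            resume_text=resumes_by_id.get(row["id"], ""),
        ))
    return items


# Made-up data standing in for the two external systems.
metadata = [{"id": 1, "name": "Ana", "skills": ["SharePoint"]}]
resumes = {1: "Ten years of enterprise search experience."}

items = aggregate(metadata, resumes)
print(items[0])
```

Because the item carries both the metadata fields and the resume text, a query matching either one would point to the same single result.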
Another frequently asked question is whether BCS can be used to crawl databases other than SQL Server. The article below explains how to do this for Oracle, and hints that you could do something similar for any other database supporting OLE DB or ODBC:
- How to: Connect to an Oracle Database Using Business Connectivity Services – http://msdn.microsoft.com/library/ff464424(office.14).aspx
This should keep you busy for a while. And remember that if you just want a quick way to get a server to try some of the things you read above, you can always play around with one of the MSDN Virtual Labs instances, such as this one here that will give you a VM with both SharePoint 2010 and FS4SP.
Didn’t understand some of the materials? Have other resources you want to share? By all means, feel free to comment below.
Hi Leo,
I wanted to know how to index the internal content of documents (PDF, Word) that are linked through hyperlinks placed in the SharePoint page body.
I tried a full crawl, but only the URL is getting indexed as text, not the content inside the document.
Will it be indexed by default, or do we need to change some configuration or write some custom code?
Hi Nancy,
If I understood your question correctly, what you have is a SharePoint page that has a hyperlink to a document (word, pdf) and you would like to crawl and index not only the SharePoint page, but also the document linked through that URL, correct? I also assume this document resides somewhere else that is not in your SharePoint site (or that document would be crawled anyway through different means).
If that’s the case, nothing out-of-the-box comes to mind to solve this, except some custom document processing code if you are using FAST Search for SharePoint, but even in this case you would still have to handle parsing/extraction of that document’s content, since this step would have been executed already at the point of the Pipeline Extensibility.
Now the really important question is: what is the business scenario behind this request?
By understanding the business scenario we may come up with different ways to approach this problem. Or at least we can try.
Best,
Leo
Nice series on FS4SP – pls keep them coming.
I have linked to this post from my blog at: http://blog.sharepointsite.co.uk/2011/07/working-with-qr-server-in-fs4sp.html
Thank you, Paul! And thank you for sharing the link to your blog.
Best,
Leo
Thanks for your prompt response :)
Yes Leo, we want to crawl and index the SharePoint page together with the document linked through that URL, which resides at some other location, and when searching in the content of the document we want the SharePoint page URL to be displayed in the search results. Can you please elaborate on how to customize document processing or Pipeline Extensibility? Any blogs or TechNet references would help us.
You are most welcome, Nancy.
So, if you want to explore the route to implement your custom processing to do this (index the contents of this external linked document as part of the SharePoint page), the first thing to keep in mind is that you will need FAST Search for SharePoint (as SharePoint Search doesn’t allow you to apply custom processing).
Now, here are two references that should help you get started:
Pipeline Extensibility (Integrating an External Item Processing Component)
http://msdn.microsoft.com/library/ff795801.aspx
This other link below explains how to access “special” properties, including the one you are probably looking for (data):
• url: The URL that is displayed when the item occurs in the query results.
• data: The binary content of the source document encoded in base64.
• body: The text extracted from the item by parsing the data property. The body is extracted by using an IFilter or other document parser.
http://msdn.microsoft.com/en-us/library/ff795815.aspx
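To give an idea of what handling these properties could look like, here is a minimal sketch in Python. It assumes the custom stage receives an XML document with one `CrawledProperty` element per property, identified by a `propertyName` attribute. Those element and attribute names, the helper functions, and the sample input are assumptions for illustration, so please check the exact format in the articles linked above.

```python
import base64
import xml.etree.ElementTree as ET


def read_crawled_properties(xml_text):
    """Return {propertyName: text} for every CrawledProperty element."""
    root = ET.fromstring(xml_text)
    return {
        el.get("propertyName"): (el.text or "")
        for el in root.iter("CrawledProperty")
    }


def decode_data_property(props):
    """Decode the base64 'data' property into the original document bytes."""
    return base64.b64decode(props.get("data", ""))


# Hypothetical input, shaped like what a custom stage might receive.
sample = (
    "<Document>"
    '<CrawledProperty propertyName="url">http://example.com/page.aspx</CrawledProperty>'
    '<CrawledProperty propertyName="data">'
    + base64.b64encode(b"hello pipeline").decode("ascii")
    + "</CrawledProperty>"
    "</Document>"
)

props = read_crawled_properties(sample)
raw_bytes = decode_data_property(props)
print(props["url"], len(raw_bytes))
```

Note that decoding `data` only gives you the raw bytes of the item. Extracting text from a linked Word or PDF document would still be up to your own code.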
Good luck!
Best,
Leo
Hi Leo,
While extending the custom stage pipeline the following error is obtained in the crawl log.
The FAST Search backend reported warnings when processing the item. ( Customer-supplied command failed: Process terminated abnormally: Unknown error (0x80131700) )
We have given full control of the FASTSearch folder to the FAST administrator account and the crawling account. Kindly let us know what needs to be done in this regard.
Hi Nancy,
To be able to help you with this we will need more information.
First, what are you trying to do in this custom application? Call a web service? Access something on disk? To do either one of these options you will need special care, as your custom code has access to only the AppData\LocalLow directory of the user running the FAST Search service.
In any case, please take a look at this article here (http://techmikael.blogspot.com/2010/12/how-to-debug-and-log-fast-search.html) as it should help you in troubleshooting at which point in your code this error is happening.
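As a rough illustration of that kind of troubleshooting, below is a minimal Python sketch that appends debug lines to a file under the LocalLow directory, so you can see how far the code got before failing. The file name `custom_stage_debug.log` and the helper functions are made up for this example, and it assumes the profile location can be read from the `USERPROFILE` environment variable (falling back to a temp folder when it is not set).

```python
import os
import tempfile
from datetime import datetime


def locallow_dir(profile_dir):
    """Build the LocalLow path under a given user profile directory."""
    return os.path.join(profile_dir, "AppData", "LocalLow")


def log_line(log_path, message):
    """Append one timestamped line to the log, creating folders as needed."""
    os.makedirs(os.path.dirname(log_path), exist_ok=True)
    with open(log_path, "a", encoding="utf-8") as handle:
        handle.write(f"{datetime.now().isoformat()} {message}\n")


# Assumption: USERPROFILE points at the profile of the account running the code.
# The temp-folder fallback only exists so the sketch runs anywhere.
profile = os.environ.get("USERPROFILE") or tempfile.mkdtemp()
log_path = os.path.join(locallow_dir(profile), "custom_stage_debug.log")

log_line(log_path, "stage started")
log_line(log_path, "about to read the input file")
```

The last line written before the crash tells you which step to look at first.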
Hope that helps!
Best,
Leo