<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>Search Unleashed</title>
	<atom:link href="http://searchunleashed.wordpress.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://searchunleashed.wordpress.com</link>
	<description>FS4SP, SP2010, search and more</description>
	<lastBuildDate>Fri, 24 May 2013 19:38:03 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='searchunleashed.wordpress.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://s2.wp.com/i/buttonw-com.png</url>
		<title>Search Unleashed</title>
		<link>http://searchunleashed.wordpress.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://searchunleashed.wordpress.com/osd.xml" title="Search Unleashed" />
	<atom:link rel='hub' href='http://searchunleashed.wordpress.com/?pushpress=hub'/>
		<item>
		<title>The 4 Essential Concepts You Need to Know To Use Any Search Engine Efficiently</title>
		<link>http://searchunleashed.wordpress.com/2012/12/28/the-4-essential-concepts-you-need-to-know-to-use-any-search-engine-efficiently/</link>
		<comments>http://searchunleashed.wordpress.com/2012/12/28/the-4-essential-concepts-you-need-to-know-to-use-any-search-engine-efficiently/#comments</comments>
		<pubDate>Fri, 28 Dec 2012 16:02:55 +0000</pubDate>
		<dc:creator>leonardocsouza</dc:creator>
				<category><![CDATA[cXense]]></category>
		<category><![CDATA[FS4SP]]></category>

		<guid isPermaLink="false">http://searchunleashed.wordpress.com/?p=197</guid>
		<description><![CDATA[When you go to your insert-search-application-name-here, enter a query, and hit the search button, what exactly are you searching on? One of the hardest things to do in IT (or in any field, really) is to sometimes take a step &#8230; <a href="http://searchunleashed.wordpress.com/2012/12/28/the-4-essential-concepts-you-need-to-know-to-use-any-search-engine-efficiently/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<blockquote><p>When you go to your <em>insert-search-application-name-here</em>, enter a query, and hit the search button, what <strong>exactly</strong> are you searching on?</p></blockquote>
<p>One of the hardest things to do in IT (or in any field, really) is to sometimes take a step back and look at the basics, at the foundational knowledge behind some things that we may use every day without necessarily understanding how they really work.</p>
<p>After realizing that, over the last few years, I&#8217;ve been having the same conversation with different customers and students to explain the same main concepts (both at FAST and now at cXense), I decided to write a little about these 4 essential concepts here:</p>
<ol>
<li><a href="#typeofquery">Type of query (AND, PHRASE, OR)</a></li>
<li><a href="#wheretosearch">Where to search (all fields, body, title, etc.)</a></li>
<li><a href="#fieldimportance">Field Importance</a></li>
<li><a href="#sorting">Sorting</a></li>
</ol>
<p><a name="typeofquery"></a></p>
<h2>Type of Query</h2>
<p>The first thing you have to think about when constructing your search interface is: how do I want the system to match the text/query specified by the user?</p>
<p>To answer this question you must understand the differences between the three search requests explained below.</p>
<blockquote><p>Note: the examples below are query-language-agnostic, so just replace them with the proper syntax for the search engine you are using. Even though the syntax may change, the concepts should remain the same.</p></blockquote>
<h3>AND query</h3>
<p><em>Example: venture AND capital</em></p>
<p>The query above will match only documents that contain all terms in the query. In this case, any document must contain both the term <em>venture</em> and the term <em>capital</em> in order to be returned. Those terms can be found together (e.g. &#8220;raised more venture capital money&#8230;&#8221;), separately (e.g. &#8220;is initiating a new venture with capital raised&#8230;&#8221;) or even in a different order (e.g. &#8220;for his new venture he raised capital from&#8230;&#8221;).</p>
<p>This is the most common operator used across search applications and also the default operator in many search platforms (e.g. FAST ESP, FS4SP, cX::search).</p>
<h3>PHRASE query</h3>
<p><em>Example: &#8220;venture capital&#8221;</em></p>
<p>The query above, in contrast to the AND query, will only return documents that contain this exact phrase. This means that a document with text like &#8220;raised more venture capital money&#8230;&#8221; will match, but a document with &#8220;is initiating a new venture with capital raised&#8230;&#8221; will not (because there is an extra term &#8211; <em>with</em> &#8211; in between the two required terms).</p>
<p>Search applications often use this operator behind the scenes whenever a user puts some text in between quotes in the search box. It&#8217;s very useful when the user is looking for an exact phrase.</p>
<h3>OR query</h3>
<p><em>Example: venture OR capital</em></p>
<p>This last query is the most open of all, as it will return documents that contain any of the terms in the query. With this query, a document only needs to have the term <em>venture</em> or <em>capital</em> to be returned, without the need for having both (as was the case with the AND and PHRASE queries). This means that a document with the text &#8220;he decided to venture down the hall&#8230;&#8221; will be returned, as well as a document with the text &#8220;Brasília is the federal capital of Brazil&#8221;.</p>
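<p>To make the three behaviours concrete, here is a minimal Python sketch that is not tied to any search engine. It checks the example texts above against the query using a naive word tokenizer. The function names and the tokenizer are assumptions made for illustration only; real engines also apply steps such as lemmatization and stop-word handling.</p>

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens (a naive tokenizer)."""
    return re.findall(r"\w+", text.lower())

def match_and(terms, text):
    """AND: every query term must appear somewhere in the text."""
    tokens = set(tokenize(text))
    return all(term in tokens for term in terms)

def match_or(terms, text):
    """OR: at least one query term must appear in the text."""
    tokens = set(tokenize(text))
    return any(term in tokens for term in terms)

def match_phrase(terms, text):
    """PHRASE: the terms must appear next to each other, in the same order."""
    tokens = tokenize(text)
    wanted = list(terms)
    size = len(wanted)
    return any(tokens[i:i + size] == wanted for i in range(len(tokens) - size + 1))

query = ["venture", "capital"]
docs = [
    "raised more venture capital money",
    "is initiating a new venture with capital raised",
    "he decided to venture down the hall",
]

for doc in docs:
    print(doc, "| AND:", match_and(query, doc),
          "| PHRASE:", match_phrase(query, doc),
          "| OR:", match_or(query, doc))
```

<p>Only the first text satisfies all three query types; the second matches AND and OR but not PHRASE, and the third matches OR alone.</p>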
<p><a name="wheretosearch"></a></p>
<h2>Where to search</h2>
<p>Now that you have decided what type of queries you want to execute (and, phrase, or), the next step is to decide where you want this search to occur. When asked &#8220;where do you want to search?&#8221; people usually reply with &#8220;everywhere, of course!&#8221;. Yet it is important to step back and think about whether that&#8217;s really what you want.</p>
<p>Imagine you go to your search application, type &#8220;financial systems&#8221; (with or without the quotes) and click the search button. What will happen then? Where in the document do you believe this query will try to find the terms <em>financial</em> and <em>systems</em>?</p>
<p>The answer to these questions depends heavily on which search technology you are using behind the scenes:</p>
<ul>
<li>in FAST ESP &#8211; this would be a query against the default composite field, which out-of-the-box would comprise fields such as body, title, url, keywords, etc.</li>
<li>in FAST Search for SharePoint &#8211; this would be a query against the fulltext index, which by default contains fields such as title, author, body, etc.</li>
<li>in cX::search &#8211; this would be a query against all searchable fields in the index</li>
</ul>
<p>In the case of cX::search, if you do not define exactly which fields should be searched on, by default the search will be executed against all the searchable fields in the index. This means that cX::search will look for the terms financial and systems in the fields title and body, but also in fields such as category, related_content, or even unitsInStock, which may not be exactly what you are looking for.</p>
<p>When I was teaching FAST Search for SharePoint, the main confusion for students was the fact that the default search was <strong>not</strong> across ALL fields, but instead just a subset of them, which meant that for every new managed property that you wanted to search by default (just by typing some terms in the search box, that is) you needed to make sure to add it to the fulltext index as well.</p>
<p>As you can see, even such a simple question can have very distinct answers depending on which search platform you are using, so the best way to avoid future problems is to first understand exactly how your specific search platform handles the <em>default</em> queries, and then use this knowledge to control exactly which fields you want to search on by default.</p>
<p>For cX::search, for example, this could be done by adding the desired list of fields before the query term:</p>
<p><code>?p_aq=query(title,body,description,tags,url,author:"financial systems", token-op=and)</code></p>
<p>In the example above we are being very clear about which fields should be used when looking for the query terms defined by the user, which makes it a lot easier to debug and answer questions like &#8220;why was this document returned in the results?&#8221;.</p>
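<p>The effect of restricting the searched fields can be sketched in a few lines of Python. This is only an illustration of the concept, not how cX::search works internally: the documents, the <code>matches</code> helper and its <code>fields</code> argument are all made up for this example.</p>

```python
import re

def tokenize(value):
    """Lowercase a field value and split it into word tokens."""
    return re.findall(r"\w+", str(value).lower())

def matches(doc, terms, fields=None):
    """Return True if every term occurs in at least one of the searched fields.

    fields=None mimics a 'search all searchable fields' default.
    """
    searched = list(doc.keys()) if fields is None else fields
    tokens = set()
    for name in searched:
        tokens.update(tokenize(doc.get(name, "")))
    return all(term in tokens for term in terms)

# Hypothetical documents, invented for this sketch.
docs = [
    {"title": "Financial systems overview", "body": "An introduction.",
     "category": "finance"},
    {"title": "Quarterly report", "body": "Sales figures.",
     "related_content": "financial systems"},
]
terms = ["financial", "systems"]

all_fields = [d["title"] for d in docs if matches(d, terms)]
title_body = [d["title"] for d in docs if matches(d, terms, fields=["title", "body"])]

print("all fields:    ", all_fields)
print("title and body:", title_body)
```

<p>When every field is searched, the second document is returned only because of its related_content field; with an explicit field list it drops out, which is usually easier to explain to users.</p>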
<p><a name="fieldimportance"></a></p>
<h2>Field Importance</h2>
<p>By now you should know how you want to search (and, phrase, or) and also where to search (title, body, etc.), so it&#8217;s time to decide which fields matter more to you among all the ones that were selected to be searched in the previous step. As a starting point, take a look at the document examples below:</p>
<p>Document 1<br />
Title: Market Research Findings &#8211; 2012<br />
Description: This document summarizes the findings from the 2012 market research study&#8230;<br />
Tags: research, 2012</p>
<p>Document 2<br />
Title: About the market crash of 1929<br />
Description: All the available research on the market crash of 1929&#8230;<br />
Tags: stock, market, 1929</p>
<p>Document 3<br />
Title: XYZ begins to explore new market<br />
Description: After a few years focused on research, company XYZ began exploring a new market&#8230;<br />
Tags: XYZ</p>
<p>And now consider the following query: <em>market AND research</em></p>
<p>Based on the sample query and documents above, which document would you expect to be ranked higher?</p>
<p>Most people would say Document 1 listed above should be ranked higher, and the reason is that users have been trained by search engines to expect, among other things, that anything found in the title of a document should have more relevance than something found somewhere in the body of the document. This is a very reasonable expectation, because we tend to accept that if someone went through the trouble of choosing specific terms to put in the title of a document, then those terms must be important.</p>
<p>So, depending on your search platform of choice, there are different ways for you to be explicit about what fields should have higher importance.</p>
<p>In cX::search, for example, the modified query would look like this:</p>
<p><code>?p_aq=query(title^5,tags^3,body:"market research", token-op=and)</code></p>
<p>The query above is defining that cX::search should:</p>
<ul>
<li>look for documents containing the terms <em>market</em> and <em>research</em>;</li>
<li>these terms must be found in the <em>title</em>, <em>tags</em> or <em>body</em> fields; and, even more importantly,</li>
<li>terms found in the <em>title</em> have 5 times (title^5) more importance than terms found in the <em>body</em> (the default field boost is 1); and</li>
<li>terms found in the <em>tags</em> have 3 times (tags^3) more importance than terms found in the <em>body</em>.</li>
</ul>
<p>In a similar fashion, FAST ESP has <a href="http://nesfast1.scot.nhs.uk:16089/help/configuration/references/r_esp_config_relevance_tuning_using_rank_profile.html">the composite-rank piece of the rank profile</a>, which allows you to define how much importance you want to give to each field that is part of a composite field.</p>
<p>In FAST Search for SharePoint, you also have some options available either through the UI or through PowerShell, which allow you to configure <a href="http://technet.microsoft.com/en-us/library/gg982954(v=office.14).aspx#BKMK_FullTextIndex">which importance level a managed property should belong to when mapped to a fulltext index</a>, as shown in the screenshot below:</p>
<p><img class="alignnone wp-image-199" alt="Fulltext Index Mapping" src="http://searchunleashed.files.wordpress.com/2012/12/fulltext-index-mapping.png?w=640" /></p>
<p>As you can see from the examples above, using field boosts (or any similar feature for the search platform you are using) gives you the flexibility to be very precise about which fields matter most according to your specific business rules.</p>
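<p>As a rough illustration of what field boosts do, the Python sketch below scores the three sample documents for <em>market AND research</em> using the boosts title^5, tags^3 and body^1. The scoring formula (boost multiplied by the number of term occurrences in each field) is a deliberately simple assumption, not the actual ranking model of cX::search or any other engine, and each document&#8217;s Description is treated here as its body field.</p>

```python
import re

# Field boosts mirroring title^5,tags^3,body (body keeps the default boost of 1).
BOOSTS = {"title": 5, "tags": 3, "body": 1}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())

def score(doc, terms):
    """Toy score: sum of boost * term occurrences over the boosted fields.

    Returns 0 when any term is missing from all fields (AND semantics).
    """
    tokens_by_field = {name: tokenize(doc.get(name, "")) for name in BOOSTS}
    seen = set().union(*tokens_by_field.values())
    if not all(term in seen for term in terms):
        return 0
    return sum(
        BOOSTS[name] * tokens.count(term)
        for name, tokens in tokens_by_field.items()
        for term in terms
    )

docs = [
    {"name": "Document 1",
     "title": "Market Research Findings - 2012",
     "body": "This document summarizes the findings from the 2012 market research study",
     "tags": "research, 2012"},
    {"name": "Document 2",
     "title": "About the market crash of 1929",
     "body": "All the available research on the market crash of 1929",
     "tags": "stock, market, 1929"},
    {"name": "Document 3",
     "title": "XYZ begins to explore new market",
     "body": "After a few years focused on research, company XYZ began exploring a new market",
     "tags": "XYZ"},
]
terms = ["market", "research"]

scores = {doc["name"]: score(doc, terms) for doc in docs}
ranked = sorted(scores, key=scores.get, reverse=True)

print(scores)
print(ranked)
```

<p>With these boosts the toy scores come out as 15, 10 and 7, so Document 1 ranks first, mainly because both terms appear in its title.</p>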
<p><a name="sorting"></a></p>
<h2>Sorting</h2>
<p>The last important piece of this puzzle of configuring basic relevance settings for your search application is to decide how results should be sorted before being returned. This is crucial because, in the end, this is what decides what results will be displayed on top.</p>
<p>Remember the previous example that used field boosts to define the importance of each field? Well, now take a look at the cX::search request below:</p>
<p><code>?p_aq=query(title^5,tags^3,body:"market research", token-op=and)&amp;p_sm=publication_date:desc</code></p>
<p>As you can see above, this query is explicitly requesting that results be sorted by <em>publication_date</em> in descending order. What this means is that any field boosts are completely ignored by the search engine. Yes, they are simply ignored, since we directly requested results to be sorted based on a date field, instead of the default sorting that is based on the ranking score.</p>
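<p>The difference between the two orderings can be sketched in Python as follows. The rank scores and publication dates are invented for illustration, and the sorting is done on a plain list rather than by a search engine.</p>

```python
from datetime import date

# Hypothetical results: rank_score stands in for the engine's relevance score,
# and the publication dates are made up for this sketch.
results = [
    {"title": "Market Research Findings - 2012", "rank_score": 15,
     "publication_date": date(2012, 3, 1)},
    {"title": "About the market crash of 1929", "rank_score": 10,
     "publication_date": date(2012, 11, 20)},
    {"title": "XYZ begins to explore new market", "rank_score": 7,
     "publication_date": date(2012, 12, 15)},
]

# Default behaviour: order by ranking score, where field boosts have an effect.
by_rank = sorted(results, key=lambda r: r["rank_score"], reverse=True)

# Explicit sort on a date field: the ranking score plays no part in the order.
by_date = sorted(results, key=lambda r: r["publication_date"], reverse=True)

print([r["title"] for r in by_rank])
print([r["title"] for r in by_date])
```

<p>In this made-up data the two orderings are exact opposites: the best-ranked document is also the oldest, so it ends up last once results are sorted by date.</p>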
<p>Sometimes this is exactly what you want, such as the case when the user has already drilled down to a subset of results and you want to allow him/her to just sort by price or average rating, for example (two options I often use when searching for products at Amazon).</p>
<p>A last option is to mix the two approaches, so that you still use the ranking score, but with extra boosts that take into consideration how recent a document is (or how many units it has sold, or what its average rating is, etc.). Those are more advanced options that we will discuss another day, but for now just keep in mind that yes, that&#8217;s also possible <img src='http://s0.wp.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://searchunleashed.wordpress.com/2012/12/28/the-4-essential-concepts-you-need-to-know-to-use-any-search-engine-efficiently/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/4a7e4080ca4973b3374e7fd04c515fec?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">leonardocsouza</media:title>
		</media:content>

		<media:content url="http://searchunleashed.files.wordpress.com/2012/12/fulltext-index-mapping.png" medium="image">
			<media:title type="html">Fulltext Index Mapping</media:title>
		</media:content>
	</item>
		<item>
		<title>How to get authenticated/secure results through the QRServer in FAST Search for SharePoint</title>
		<link>http://searchunleashed.wordpress.com/2012/04/21/how-to-query-authenticated-secure-results-qrserver-fast-search-for-sharepoint/</link>
		<comments>http://searchunleashed.wordpress.com/2012/04/21/how-to-query-authenticated-secure-results-qrserver-fast-search-for-sharepoint/#comments</comments>
		<pubDate>Sat, 21 Apr 2012 12:54:20 +0000</pubDate>
		<dc:creator>leonardocsouza</dc:creator>
				<category><![CDATA[FS4SP]]></category>

		<guid isPermaLink="false">http://searchunleashed.wordpress.com/?p=195</guid>
		<description><![CDATA[I received an email from an ex-student today that forced me to remember how to send an authenticated query to the QRServer in FAST Search for SharePoint. The reason for doing this is that when you issue a query through &#8230; <a href="http://searchunleashed.wordpress.com/2012/04/21/how-to-query-authenticated-secure-results-qrserver-fast-search-for-sharepoint/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>I received an email from an ex-student today that forced me to remember how to send an authenticated query to the QRServer in FAST Search for SharePoint.</p>
<p>The reason for doing this is that when you issue a query through the SharePoint UI, additional security parameters are sent to FAST along with the query. But when you go directly against the QRServer interface (accessible through <a href="http://localhost:13280" rel="nofollow">http://localhost:13280</a> directly on the server running the <em>query</em> component in the FAST farm), the queries typed in there are sent without any security parameters by default, which means you will not get back any results that require security permissions (such as all your crawled SharePoint content, for example).</p>
<p>I&#8217;ve sent instructions to students on how to get authenticated results from the QRServer many times in the past, and even <a href="http://blog.sharepointsite.co.uk/2011/07/working-with-qr-server-in-fs4sp.html">commented about it in this post here</a>, but I just realized I never posted this here on the blog, so I&#8217;m doing it now to make this information easier to find.</p>
<p>Below are the steps to get secure results through the QRServer without having to modify qtf-config.xml (leaving that file untouched is advisable):</p>
<p><em>Note: you will need to perform the steps below on a query server in your FAST farm.</em></p>
<ol>
<li>Edit %FASTSEARCH%\components\sam\worker\user_config.xml</li>
<li>Change:<br />
&lt;add name=&quot;AllowNonCleanUpClaimsCacheForTestingOnly&quot; value=&quot;false&quot; type=&quot;System.Boolean&quot; /&gt;<br />
To:<br />
&lt;add name=&quot;AllowNonCleanUpClaimsCacheForTestingOnly&quot; <strong>value=&quot;true&quot;</strong> type=&quot;System.Boolean&quot; /&gt;</li>
<li>To pick up your changes, open a command prompt window and restart the samworker:<br />
<code>nctrl restart samworker</code></li>
<li>Make sure the samworker is running. If it is not running, check your previous edits.<br />
<code>nctrl status</code></li>
<li>Execute a query through a search center in SharePoint and ensure results are returned. You will use the security credentials from this query to get secure results from the QRServer.</li>
<li>Navigate to %FASTSEARCH%\var\log\querylogs and open your latest query log (if the file is locked, make a copy of the file and open the copy).</li>
<li>Locate and copy this parameter: <em>&amp;qtf_securityfql:uid=&lt;token&gt;=</em> (the trailing equal sign should be included)</li>
<li>Navigate to the qrserver page: <a href="http://localhost:13280/" rel="nofollow">http://localhost:13280/</a></li>
<li>In the additional parameters text box add:<br />
<em>&amp;qtf_securityfql:uid=&lt;token&gt;=</em></li>
<li>Issue a query and ensure you get secure results back.</li>
</ol>
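<p>If you need to do this often, the step of locating the security parameter in the query log can be scripted. The Python sketch below shows the idea on a single, made-up log line; the exact layout of real query log entries is an assumption here (for instance, whether the parameter is URL-encoded), so adjust the pattern to what you actually see in your logs.</p>

```python
import re

# Matches the parameter name followed by everything up to the next '&' or
# whitespace, so the trailing equal sign of the token is kept.
SECURITY_PARAM = re.compile(r"qtf_securityfql:uid=[^&\s]+")

def extract_security_param(log_line):
    """Return '&qtf_securityfql:uid=...' from a log line, or None if absent."""
    found = SECURITY_PARAM.search(log_line)
    return "&" + found.group(0) if found else None

# Hypothetical log line; EXAMPLETOKEN= stands in for a real token.
sample = "/cgi-bin/search?query=sharepoint&qtf_securityfql:uid=EXAMPLETOKEN=&hits=10"

print(extract_security_param(sample))
```

<p>The returned string is in the form expected by the additional parameters text box on the qrserver page.</p>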
<p>Another way to get authenticated results (from outside the SharePoint UI) without modifying your system is to use the terrific <a title="FAST Search for SharePoint 2010 Query Logger" href="http://fs4splogger.codeplex.com/">FAST Search for SharePoint 2010 Query Logger</a> tool created by Mikael Svenson.</p>
<p>Enjoy! <img src='http://s0.wp.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://searchunleashed.wordpress.com/2012/04/21/how-to-query-authenticated-secure-results-qrserver-fast-search-for-sharepoint/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/4a7e4080ca4973b3374e7fd04c515fec?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">leonardocsouza</media:title>
		</media:content>
	</item>
		<item>
		<title>Goodbye Microsoft, Hello New Year and New Ventures</title>
		<link>http://searchunleashed.wordpress.com/2012/02/01/goodbye-microsoft-hello-new-year-and-new-ventures/</link>
		<comments>http://searchunleashed.wordpress.com/2012/02/01/goodbye-microsoft-hello-new-year-and-new-ventures/#comments</comments>
		<pubDate>Wed, 01 Feb 2012 17:46:54 +0000</pubDate>
		<dc:creator>leonardocsouza</dc:creator>
				<category><![CDATA[cXense]]></category>
		<category><![CDATA[cxense]]></category>
		<category><![CDATA[enterprise search]]></category>
		<category><![CDATA[microsoft]]></category>

		<guid isPermaLink="false">http://searchunleashed.wordpress.com/?p=187</guid>
		<description><![CDATA[&#8220;All changes, even the most longed for, have their melancholy; for what we leave behind us is a part of ourselves; we must die to one life before we can enter another.&#8221; ~ Anatole France This has pretty much been &#8230; <a href="http://searchunleashed.wordpress.com/2012/02/01/goodbye-microsoft-hello-new-year-and-new-ventures/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<blockquote><p>&#8220;All changes, even the most longed for, have their melancholy; for what we leave behind us is a part of ourselves; we must die to one life before we can enter another.&#8221; ~ Anatole France</p></blockquote>
<p>This has pretty much been my feeling for the past few weeks, as I gathered my things and organized everything for my departure from Microsoft. As of today (or rather, yesterday at midnight) I&#8217;m no longer working as a Senior Technical Instructor at Microsoft.</p>
<p>Combining my time at FAST, and then at Microsoft after the acquisition, I have been with the company for over 6 years. During this time I&#8217;ve made many friends, worked on challenging and exciting projects and also had a chance to spend a lot of time exploring, understanding and trying to help others understand (at least I hope so <img src='http://s0.wp.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> ) the enterprise search world. In short, it was a lot of fun and I learned a LOT. Which is why I decided to change and keep learning&#8230;</p>
<p>Starting today I&#8217;m joining some great friends over at <a href="http://www.cxense.com/">cXense</a> as their Director Customer Excellence in Americas (US, Canada and Latin America), which is a very fancy title to say that I will continue doing what I love to do most: help customers to be successful.</p>
<p>As for the future of the blog, it will get even better. I will keep trying to help as much as I can, responding to questions and comments, as well as posting about enterprise search, this time also including posts about other flavors of search (such as Solr/Lucene and cXsearch, which I can&#8217;t wait to learn more about!). I&#8217;ve also invited a few friends, who are still working with SharePoint/FAST Search on a daily basis, to guest post here now and then. Questions about any of these search technologies will be more than welcome, as always!</p>
<p>6 years ago I got a call inviting me to join FAST and, by accepting that offer, I had one of my best professional experiences to date. When <a href="http://www.cxense.com/leadership-team.html">the same people</a> called me again, this time with an invitation to join cXense, I had no choice but to say yes and look forward to many more adventures. <img src='http://s0.wp.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://searchunleashed.wordpress.com/2012/02/01/goodbye-microsoft-hello-new-year-and-new-ventures/feed/</wfw:commentRss>
		<slash:comments>22</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/4a7e4080ca4973b3374e7fd04c515fec?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">leonardocsouza</media:title>
		</media:content>
	</item>
		<item>
		<title>2011 in review &#8211; Annual Report by WordPress</title>
		<link>http://searchunleashed.wordpress.com/2012/01/02/2011-in-review-annual-report-by-wordpress/</link>
		<comments>http://searchunleashed.wordpress.com/2012/01/02/2011-in-review-annual-report-by-wordpress/#comments</comments>
		<pubDate>Mon, 02 Jan 2012 15:31:47 +0000</pubDate>
		<dc:creator>leonardocsouza</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://searchunleashed.wordpress.com/?p=181</guid>
		<description><![CDATA[It&#8217;s very interesting to look at the 2011 annual report for my blog that was put together by WordPress. Not surprisingly, the most read posts were the learning roadmaps. It seems like with so much information spread all over the &#8230; <a href="http://searchunleashed.wordpress.com/2012/01/02/2011-in-review-annual-report-by-wordpress/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>It&#8217;s very interesting to look at <a href="/2011/annual-report/">the 2011 annual report for my blog that was put together by WordPress</a>. Not surprisingly, the most read posts were the <i>learning roadmaps</i>. It seems like with so much information spread all over the place about SharePoint Search &amp; FAST Search for SharePoint, it becomes even more important to have a clear path to follow to learn more.</p>
<p>Now let&#8217;s get ready for 2012! <img src='http://s0.wp.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<blockquote><p>The WordPress.com stats helper monkeys prepared a 2011 annual report for this blog.</p>
<p>	<a href="/2011/annual-report/"><img src="http://www.wordpress.com/wp-content/mu-plugins/annual-reports/img/emailteaser.jpg" width="100%" alt="" /></a></p>
<p>Here&#8217;s an excerpt:</p>
<blockquote><p>The concert hall at the Sydney Opera House holds 2,700 people.  This blog was viewed about <strong>14,000</strong> times in 2011.  If it were a concert at Sydney Opera House, it would take about 5 sold-out performances for that many people to see it.</p></blockquote>
<p><a href="/2011/annual-report/">Click here to see the complete report.</a></p>
</blockquote>
]]></content:encoded>
			<wfw:commentRss>http://searchunleashed.wordpress.com/2012/01/02/2011-in-review-annual-report-by-wordpress/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/4a7e4080ca4973b3374e7fd04c515fec?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">leonardocsouza</media:title>
		</media:content>

		<media:content url="http://www.wordpress.com/wp-content/mu-plugins/annual-reports/img/emailteaser.jpg" medium="image" />
	</item>
		<item>
		<title>How &#8220;Remove Duplicate Results&#8221; works in FAST Search for SharePoint</title>
		<link>http://searchunleashed.wordpress.com/2011/12/08/how-remove-duplicate-results-works-in-fast-search-for-sharepoint/</link>
		<comments>http://searchunleashed.wordpress.com/2011/12/08/how-remove-duplicate-results-works-in-fast-search-for-sharepoint/#comments</comments>
		<pubDate>Thu, 08 Dec 2011 15:08:45 +0000</pubDate>
		<dc:creator>leonardocsouza</dc:creator>
				<category><![CDATA[FS4SP]]></category>

		<guid isPermaLink="false">https://searchunleashed.wordpress.com/?p=163</guid>
		<description><![CDATA[This is a question that I have received quite a few times, and this time I thought I would get some screenshots and detail the process a little bit in here so that other folks can take advantage of this &#8230; <a href="http://searchunleashed.wordpress.com/2011/12/08/how-remove-duplicate-results-works-in-fast-search-for-sharepoint/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>This is a question that I have received quite a few times, and this time I thought I would get some screenshots and detail the process a little bit in here so that other folks can take advantage of this info as well.</p>
<p>First of all, what is the “Remove Duplicate Results” feature? It is a feature that tells the search engine to collapse results that are perceived as duplicates (such as the same document located in different paths), so that only one instance of the document is returned in the search results (instead of showing the multiple copies to the end-user).</p>
<p>The setting that enables this feature (which is <em>on by default</em>) in a Search Center is available in the Search Core Results Web Part, under the section Result Query Options, as shown below:</p>
<p><img style="background-image:none;border-bottom:0;border-left:0;padding-left:0;padding-right:0;display:inline;border-top:0;border-right:0;padding-top:0;" title="Search Core Results Web Part - Remove Duplicate Results" border="0" alt="Search Core Results Web Part - Remove Duplicate Results" src="http://searchunleashed.files.wordpress.com/2011/12/image.png?w=230&#038;h=501" width="230" height="501"></p>
<p>Once this option is enabled, any duplicate items will be collapsed in the search results, as you can see in this example:</p>
<p><img style="background-image:none;border-bottom:0;border-left:0;padding-left:0;padding-right:0;display:inline;border-top:0;border-right:0;padding-top:0;" title="FAST Search for SharePoint Duplicate Results Removal" border="0" alt="FAST Search for SharePoint Duplicate Results Removal" src="http://searchunleashed.files.wordpress.com/2011/12/image1.png?w=601&#038;h=355" width="601" height="355"></p>
<p>And if you want to see all the duplicates (in order to delete one of them in the source repository, for example), all you have to do is click on the “Duplicates (2)” link highlighted above. This will execute another query, filtering results to display only the duplicates of the item you selected:</p>
<p><a href="http://searchunleashed.files.wordpress.com/2011/12/image2.png"><img style="background-image:none;border-bottom:0;border-left:0;padding-left:0;padding-right:0;display:inline;border-top:0;border-right:0;padding-top:0;" title="FAST Search for SharePoint Duplicate Removal Example" border="0" alt="FAST Search for SharePoint Duplicate Removal Example" src="http://searchunleashed.files.wordpress.com/2011/12/image_thumb.png?w=604&#038;h=220" width="604" height="220"></a></p>
<p>&nbsp;</p>
<p>Now let’s investigate how this feature works. To do this, we will go in reverse order (from search to processing) to understand all the pieces involved.</p>
<p>The first clue is that this is enabled/disabled at search time, so there must be some parameter being sent by the Search Center to FAST Search for SharePoint to indicate that duplicate removal is enabled. Taking a look at one of the full search requests in the querylogs (%FASTSearch%\var\log\querylogs\) we can confirm this:</p>
<p>/cgi-bin/search?hits=10&amp;resubmitflags=1&amp;rpf_navigation:hits=50&amp;query=sharepoint&amp;spell=suggest&amp;<strong><font color="#ff0000">collapsenum=1</font></strong>&amp;qtf_lemmatize=True…&amp;<strong><font color="#ff0000">collapseon=batvdocumentsignature</font></strong>&amp;type=kwall…</p>
<blockquote><p>Note: the querylog shown above has some query parameters removed so we can focus on the items that matter to duplicate removal.</p>
</blockquote>
<p>As you can see, there are two parameters sent to FAST Search for SharePoint indicating which property should be used for collapsing (<em>batvdocumentsignature</em>) and how many items should be kept after collapsing is performed (<em>1</em>). And if we want more information about these options, the <a href="http://msdn.microsoft.com/en-us/library/ff521593.aspx">MSDN documentation explains these two parameters used for duplicate removal</a> (the names differ because the querylog shows the internal query parameter names received by FAST Search for SharePoint):</p>
<blockquote><p>onproperty &#8211; Specifies the name of a non-default managed property to use as the basis for duplicate removal. The default value is the <strong>DocumentSignature</strong> managed property. The managed property must be of type Integer. By using a managed property that represents a grouping of items, you can use this feature for field collapsing.</p>
<p>keepcount &#8211; Specifies the number of items to keep for each set of duplicates. <strong>The default value is 1</strong>. It can be used for result collapsing use cases. If TrimDuplicates is based on a managed property that can be used as a group identifier (for example, a site ID), you can control how many results are returned for each group. The items returned are the items with the highest dynamic rank within each group.</p>
</blockquote>
<p>The last parameter that can be used with the Duplicate Removal feature is also described in the MSDN article, explaining what happened behind the scenes when we clicked the “Duplicates (2)” link to display all the duplicates for that item:</p>
<blockquote><p>includeid &#8211; Specifies the value associated with a collapse group, typically used when a user clicks the <strong>Duplicates (n)</strong> link of an item with duplicates. This value corresponds to the value of the fcoid managed property that is returned in query results.</p>
</blockquote>
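<p>To make these three parameters more concrete, here is a minimal Python sketch of what collapsing conceptually does. This is only an illustration of the behavior described in the MSDN quotes above, not how FAST Search for SharePoint implements it. The function names, the <em>rank</em> field and the sample hits are all made up for this example.</p>
<pre class="brush: python; title: ; notranslate">
from collections import defaultdict


def collapse(results, on_property='documentsignature', keep_count=1):
    # Group the hits by the value of the collapse property (onproperty).
    groups = defaultdict(list)
    for hit in results:
        groups[hit[on_property]].append(hit)

    kept = []
    for members in groups.values():
        # Keep only the keep_count best-ranked hits of each group (keepcount).
        members = sorted(members, key=lambda hit: hit['rank'], reverse=True)
        kept.extend(members[:keep_count])

    # Return the surviving hits in rank order, best first.
    return sorted(kept, key=lambda hit: hit['rank'], reverse=True)


def duplicates_of(results, include_id, on_property='documentsignature'):
    # Mimics the follow-up query behind the Duplicates (n) link (includeid).
    return [hit for hit in results if hit[on_property] == include_id]


# Hypothetical hits: the first two share the same signature value.
hits = [
    {'url': 'file://server/a/test.txt', 'rank': 10, 'documentsignature': 1},
    {'url': 'file://server/b/test.txt', 'rank': 20, 'documentsignature': 1},
    {'url': 'file://server/c/other.txt', 'rank': 5, 'documentsignature': 2},
]
print([hit['url'] for hit in collapse(hits)])
print(len(duplicates_of(hits, 1)))
</pre>
<p>With the default keep count of 1, only the better-ranked of the two hits that share a signature survives, and asking for the duplicates of that signature value returns both of them.</p>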
<p>Ok, so far we know that duplicate removal is enabled by default and is applied by collapsing results that have the same value for the managed property DocumentSignature. Let’s have a look at the settings for this managed property then:</p>
<p><a href="http://searchunleashed.files.wordpress.com/2011/12/image3.png"><img style="background-image:none;border-bottom:0;border-left:0;padding-left:0;padding-right:0;display:inline;border-top:0;border-right:0;padding-top:0;" title="image" border="0" alt="image" src="http://searchunleashed.files.wordpress.com/2011/12/image_thumb1.png?w=683&#038;h=393" width="683" height="393"></a></p>
<p>As you can see, the type of this managed property is <em>Integer</em> (which the MSDN article defined as a requirement) and it is also configured both as <em>Sortable</em> and <em>Queryable</em>. The peculiar thing is that when looking at the crawled properties mapped to this managed property we get nothing as a result, which indicates that the value for this property is most likely being computed by FAST Search for SharePoint during content processing.</p>
<p>So let’s take a look at some lower-level configuration files to track this down. We start with the mother file of all configurations related to the content processing pipeline: %FASTSearch%\etc\PipelineConfig.xml. This is a file that can’t be modified (<a href="http://technet.microsoft.com/en-us/library/ff354943.aspx">since it is not included here in the list of configuration files that can be modified</a>), but nothing prevents us from just looking at it. After opening this configuration file and searching for “documentsignature”, you will find the definition for the stage responsible for assigning the value to this property:</p>
<p><pre class="brush: xml; highlight: [4,5]; title: ; notranslate">
    &lt;processor name=&quot;DocumentSignature&quot; type=&quot;general&quot; hidden=&quot;0&quot;&gt;
      &lt;load module=&quot;processors.DuplicateId&quot; class=&quot;DuplicateId&quot;/&gt;
      &lt;config&gt;
       &lt;param name=&quot;Output&quot; value=&quot;documentsignature&quot; type=&quot;string&quot;/&gt;
       &lt;param name=&quot;Input&quot;  value=&quot;title:0:required body:1024:required documentsignaturecontribution&quot; type=&quot;string&quot;/&gt;
      &lt;/config&gt;
    &lt;/processor&gt;
</pre>
</p>
<p>The parameters that matter most to us are highlighted above:</p>
<ul>
<li><strong>Input</strong>: which properties will be used to calculate the document signature –&gt; <em>title</em> and the first 1024 bytes of <em>body</em> (as well as a property called <em>documentsignaturecontribution</em> that will also be used if it has any value)
<li><strong>Output</strong>: our dear <em>documentsignature</em> property </li>
</ul>
<p>And with this we get to the bottom of this duplicate removal feature, which means it is a good time to recap everything we found out:</p>
<ol>
<li>During content processing, for every item being processed, FAST Search for SharePoint will obtain the value of <em>title</em> and the first 1024 bytes of <em>body</em> for this item, and use it to compute a numerical checksum that will be used as a document signature. This checksum is stored in the property <em>documentsignature</em> for every item processed.
<li>During query time, whenever “Remove Duplicate Results” is enabled, the Search Center tells FAST Search for SharePoint to collapse results using the <em>documentsignature</em> property, effectively eliminating any duplicates for items that have the same <em>title+first-1024-bytes-of-body</em>.
<li>When a user clicks on the “Duplicates (n)” link next to an item that has duplicates, another query is submitted to FAST Search for SharePoint, passing as an additional parameter the value of the <em>fcoid</em> managed property for the item selected, which will be used to return all items that contain the same checksum (aka “the duplicates”). </li>
</ol>
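<p>The first step of this recap can be sketched in a few lines of Python. Please treat this as a rough model only: the post does not reveal which checksum algorithm FAST Search for SharePoint uses or how exactly it combines the inputs, so the SHA-1 hash, the separator byte and the function name below are my own assumptions.</p>
<pre class="brush: python; title: ; notranslate">
import hashlib

# From the Input parameter of the DocumentSignature stage (body:1024).
BODY_PREFIX_BYTES = 1024


def document_signature(title, body, contribution=''):
    # Only the first 1024 bytes of the body take part in the signature.
    body_prefix = body.encode('utf-8')[:BODY_PREFIX_BYTES]
    material = b'\x00'.join([
        title.encode('utf-8'),
        body_prefix,
        contribution.encode('utf-8'),
    ])
    # Assumption: SHA-1 stands in for the real (unknown) checksum algorithm.
    digest = hashlib.sha1(material).digest()
    # Reduce the digest to an integer, since the property is of type Integer.
    return int.from_bytes(digest[:8], 'big')


# Two bodies that only differ after the first 1024 bytes.
body_a = 'x' * 1024 + ' ending A'
body_b = 'x' * 1024 + ' ending B'
print(document_signature('test', body_a) == document_signature('test', body_b))
</pre>
<p>The comparison prints True: the two bodies only differ beyond the first 1024 bytes, so both items get the same signature and would be collapsed as duplicates. Passing a different third argument for each item (the <em>documentsignaturecontribution</em> input) would make the signatures differ.</p>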
<p>A few extra questions that may appear in your head after you read this:</p>
<ul>
<li>Is it possible to collapse results by anything other than <em>documentsignature</em> and use this feature for other types of collapsing (collapse by site, collapse by category, etc.)?
<ul>
<li>Answer: Yes, it is absolutely possible. All you will need is an Integer, Sortable, Queryable managed property containing the values you want to collapse by + a custom web part (or an extended Search Core Results Web Part) where you request this managed property to be used for collapsing and how many items should be preserved (<a href="http://msdn.microsoft.com/en-us/library/ff521593.aspx">as explained in this MSDN article linked before</a>) </li>
</ul>
<li>Can two items that are <em>not identical </em>be considered as duplicates?
<ul>
<li>Answer: Yep. As we saw above, only the <em>first 1024 bytes</em> of <em>body</em> are used for calculating the checksum, which means that any other differences these documents may have beyond the first 1024 bytes will not be considered for the purposes of duplicate removal. <em>(Note: roughly speaking, the <u>body</u> property will have just the <u>text</u> of the document, without any of the formatting)</em> </li>
</ul>
<li>Can I change how the default document signature is computed?
<ul>
<li>Answer: Yes, read the update to this post below. <strike>No, since this is done by one of FAST Search for SharePoint out-of-the-box stages. But, what you can do is calculate your own checksum using the Pipeline Extensibility and then use your custom property for duplicate removal.</strike> </li>
</ul>
<li>When you have duplicates and only one item is shown in the results, how does FS4SP decide which item to show?
<ul>
<li>Answer: The item shown is the item with the highest rank score among all the duplicates.</li>
</ul>
</li>
</ul>
<p>That’s it for today. If any other questions related to duplicate removal surface after this post is published, I will add them to the list above. <img style="border-style:none;" class="wlEmoticon wlEmoticon-smile" alt="Smile" src="http://searchunleashed.files.wordpress.com/2011/12/wlemoticon-smile.png?w=640"></p>
<p><strong>UPDATE (2011-12-09): Yes, you can influence how document signature is computed!</strong></p>
<p>After I posted this article yesterday I was still thinking about that <em>documentsignaturecontribution</em> property mentioned, as I had a feeling that there was a way to use it to influence how the document signature is computed. Well, today I found some time to test it and yes, it works! Here is how to do it.</p>
<p>What you have to do is create a new managed property with <strong>exactly</strong> the name <em>documentsignaturecontribution</em> and then map to this managed property any values that you also want to use for the checksum computation (as with other managed properties, to assign a value to this property you must map a crawled property to it).</p>
<p>You need a <strong>managed</strong> property because the DocumentSignature stage is executed after the mapping from crawled properties to managed properties, so FAST Search for SharePoint is looking for a managed property named <em>documentsignaturecontribution</em> to use as part of the checksum computation. When you create this managed property and assign it some value, FAST Search for SharePoint simply uses this, along with <em>title</em> and the <em>first 1024 bytes of body</em>, to calculate the checksum.</p>
<p>I followed the <a href="http://social.technet.microsoft.com/Forums/en-US/fastsharepoint/thread/025e643c-03d8-463c-87d9-ec57db8e8237">great idea from Mikael Svenson to create two text files filled with the same content just to force them to be perceived as duplicates by the system</a>. The key here was to create these two files with exactly the same name and content, but put them in different folders. This way I could guarantee that both items had the same <em>title</em> and <em>body</em>, which would result in them having the same checksum. This was confirmed after I crawled the folders with these items:</p>
<p><img style="background-image:none;border-bottom:0;border-left:0;padding-left:0;padding-right:0;display:inline;border-top:0;border-right:0;padding-top:0;" title="FAST Search for SharePoint - Duplicate Removal - Example of duplicates" border="0" alt="FAST Search for SharePoint - Duplicate Removal - Example of duplicates" src="http://searchunleashed.files.wordpress.com/2011/12/image4.png?w=664&#038;h=187" width="664" height="187"></p>
<p>Both items had the same checksum, which could be checked by looking at the property <em>fcoid </em>that was returned with the results:</p>
<blockquote><p>fcoid: 102360510986285564</p>
</blockquote>
<p>My next step was to create the managed property <em>documentsignaturecontribution</em> and map to it some value that would allow me to distinguish between the two files. In my case, that value was the <em>path </em>to the items, which were located in different folders. So, after creating my managed property <em>documentsignaturecontribution</em> of type Text, I mapped to it the same crawled properties that are mapped to the managed property <em>path</em>, just to make sure I would get the same value as that property:</p>
<p><a href="http://searchunleashed.files.wordpress.com/2011/12/image5.png" target="_blank"><img style="background-image:none;border-bottom:0;border-left:0;padding-left:0;padding-right:0;display:inline;border-top:0;border-right:0;padding-top:0;" title="FAST Search for SharePoint - Duplicate Removal - documentsignaturecontribution managed property" border="0" alt="FAST Search for SharePoint - Duplicate Removal - documentsignaturecontribution managed property" src="http://searchunleashed.files.wordpress.com/2011/12/image_thumb2.png?w=644&#038;h=474" width="644" height="474"></a></p>
<p>With this done, I just had to perform another full crawl to force SharePoint to process both of my text files again, and confirm that they were not perceived as duplicates anymore (since I was also using their path to compute the checksum):</p>
<p><a href="http://searchunleashed.files.wordpress.com/2011/12/image6.png"><img style="background-image:none;border-bottom:0;border-left:0;padding-left:0;padding-right:0;display:inline;border-top:0;border-right:0;padding-top:0;" title="image" border="0" alt="image" src="http://searchunleashed.files.wordpress.com/2011/12/image_thumb3.png?w=667&#038;h=292" width="667" height="292"></a></p>
<p>Another look into the <em>fcoid </em>property confirmed that both items did have different checksums now:</p>
<ul>
<li>file://demo2010a/c$/testleo/test.txt &#8211; <em>fcoid</em>: <font color="#ff0000">334483385708934799</font></li>
<li>file://demo2010a/c$/testleo/<font color="#ff0000"><strong>subdir/</strong></font>test.txt &#8211; <em>fcoid</em>: <font color="#ff0000">452732803969334619</font></li>
</ul>
<p>So what we learned with this is that you <strong>can </strong>influence how the document signature is computed by creating this managed property <em>documentsignaturecontribution</em> and mapping to it any value you want to be part of the checksum computation. And if you want to use the full body of the document to compute the checksum, you could accomplish this through the following steps:</p>
<ol>
<li>add to the Pipeline Extensibility a custom process that uses the <em>body </em>property as input and returns a checksum crawled property based on the full contents of the body as the output</li>
<li>map this checksum crawled property returned in the previous step to the managed property <em>documentsignaturecontribution</em> </li>
<li>re-crawl your content and celebrate when it works <img style="border-style:none;" class="wlEmoticon wlEmoticon-smile" alt="Smile" src="http://searchunleashed.files.wordpress.com/2011/12/wlemoticon-smile.png?w=640"></li>
</ol>
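<p>As a rough idea of what the custom process from the first step could look like, here is a minimal Python sketch. It rests on several assumptions. The <em>Document</em> and <em>CrawledProperty</em> element names and their attributes follow my reading of the Pipeline Extensibility schema linked below, so please verify them there. The output property set GUID and property name are placeholders. MD5 is simply the checksum I picked for the illustration.</p>
<pre class="brush: python; title: ; notranslate">
import hashlib
import xml.etree.ElementTree as ET

# Hypothetical identifiers for the output crawled property; choose your own.
OUTPUT_PROPERTY_SET = '00000000-0000-0000-0000-000000000000'
OUTPUT_PROPERTY_NAME = 'fullbodychecksum'


def build_output(input_xml):
    # Find the body crawled property in the input document.
    root = ET.fromstring(input_xml)
    body = ''
    for prop in root.iter('CrawledProperty'):
        if prop.get('propertyName') == 'body':
            body = prop.text or ''

    # Checksum over the full body, not just the first 1024 bytes.
    checksum = hashlib.md5(body.encode('utf-8')).hexdigest()

    # Emit a document carrying the checksum as a new crawled property.
    out_root = ET.Element('Document')
    out_prop = ET.SubElement(out_root, 'CrawledProperty', {
        'propertySet': OUTPUT_PROPERTY_SET,
        'propertyName': OUTPUT_PROPERTY_NAME,
        'varType': '31',
    })
    out_prop.text = checksum
    return ET.tostring(out_root, encoding='unicode')
</pre>
<p>A real process would read the input XML from one file and write the result to another, and the resulting crawled property is what you would map to <em>documentsignaturecontribution</em> in the second step.</p>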
<p>Additional info:</p>
<ul>
<li><a href="http://techmikael.blogspot.com/2011/03/prototyping-pipeline-stages-in.html">Mikael Svenson’s post on how to use PowerShell to create quick-and-easy custom processing components.</a></li>
<li><a href="http://msdn.microsoft.com/en-us/library/ff795815.aspx">Pipeline Extensibility Configuration Schema &#8211; CrawledProperty Element, which explains how to access the body property from Pipeline Extensibility custom process</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://searchunleashed.wordpress.com/2011/12/08/how-remove-duplicate-results-works-in-fast-search-for-sharepoint/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/4a7e4080ca4973b3374e7fd04c515fec?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">leonardocsouza</media:title>
		</media:content>

		<media:content url="http://searchunleashed.files.wordpress.com/2011/12/image.png" medium="image">
			<media:title type="html">Search Core Results Web Part - Remove Duplicate Results</media:title>
		</media:content>

		<media:content url="http://searchunleashed.files.wordpress.com/2011/12/image1.png" medium="image">
			<media:title type="html">FAST Search for SharePoint Duplicate Results Removal</media:title>
		</media:content>

		<media:content url="http://searchunleashed.files.wordpress.com/2011/12/image_thumb.png" medium="image">
			<media:title type="html">FAST Search for SharePoint Duplicate Removal Example</media:title>
		</media:content>

		<media:content url="http://searchunleashed.files.wordpress.com/2011/12/image_thumb1.png" medium="image">
			<media:title type="html">image</media:title>
		</media:content>

		<media:content url="http://searchunleashed.files.wordpress.com/2011/12/wlemoticon-smile.png" medium="image">
			<media:title type="html">Smile</media:title>
		</media:content>

		<media:content url="http://searchunleashed.files.wordpress.com/2011/12/image4.png" medium="image">
			<media:title type="html">FAST Search for SharePoint - Duplicate Removal - Example of duplicates</media:title>
		</media:content>

		<media:content url="http://searchunleashed.files.wordpress.com/2011/12/image_thumb2.png" medium="image">
			<media:title type="html">FAST Search for SharePoint - Duplicate Removal - documentsignaturecontribution managed property</media:title>
		</media:content>

		<media:content url="http://searchunleashed.files.wordpress.com/2011/12/image_thumb3.png" medium="image">
			<media:title type="html">image</media:title>
		</media:content>

		<media:content url="http://searchunleashed.files.wordpress.com/2011/12/wlemoticon-smile.png" medium="image">
			<media:title type="html">Smile</media:title>
		</media:content>
	</item>
		<item>
		<title>How to force an item to be removed from the index immediately with FAST Search for SharePoint</title>
		<link>http://searchunleashed.wordpress.com/2011/11/15/how-to-force-an-item-to-be-removed-from-the-index-immediately-with-fast-search-for-sharepoint/</link>
		<comments>http://searchunleashed.wordpress.com/2011/11/15/how-to-force-an-item-to-be-removed-from-the-index-immediately-with-fast-search-for-sharepoint/#comments</comments>
		<pubDate>Tue, 15 Nov 2011 21:24:31 +0000</pubDate>
		<dc:creator>leonardocsouza</dc:creator>
				<category><![CDATA[FS4SP]]></category>
		<category><![CDATA[delete]]></category>
		<category><![CDATA[docpush]]></category>

		<guid isPermaLink="false">https://searchunleashed.wordpress.com/?p=150</guid>
		<description><![CDATA[Very important reminder: the tip below will explain how to remove an item from the index, yet this does not prevent it from being picked up during the next crawl (full/incremental). In order to prevent the item from being crawled &#8230; <a href="http://searchunleashed.wordpress.com/2011/11/15/how-to-force-an-item-to-be-removed-from-the-index-immediately-with-fast-search-for-sharepoint/">Continue reading <span class="meta-nav">&#8594;</span></a><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=searchunleashed.wordpress.com&#038;blog=19239959&#038;post=150&#038;subd=searchunleashed&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<blockquote><p>Very important reminder: the tip below will explain how to remove an item from the index, yet this does not prevent it from being picked up during the next crawl (full/incremental). In order to prevent the item from being crawled again additional steps must be taken (such as creating a crawl rule for the item). Big thanks to <a href="http://techmikael.blogspot.com/">Mikael Svenson</a> for the reminder!</p></blockquote>
<p>Earlier today I posted an article on Microsoft Learning’s Born to Learn website, and I thought it would be an interesting article for the audience here as well. The article covers <a href="http://borntolearn.mslearn.net/btl/b/weblog/archive/2011/11/15/microsoft-learning-fast-university-how-to-force-an-item-to-be-removed-from-the-index-immediately-with-fast-search-for-sharepoint.aspx">how to force FAST Search for SharePoint to remove an item from the index immediately</a>, without having to wait until the next crawl.</p>
<p>In the article linked above I showed how to obtain the Item ID (or ssic://&lt;id&gt;) using the Crawl Logs, so here in this post I will show how to obtain it directly through the Search Center. This will also be a neat way to show you how to get the raw XML for the search results, which is very helpful when you are troubleshooting issues with your search results (such as to confirm that some property is being returned properly).</p>
<p>The first thing you need to do is execute a query that returns the item you want to remove from the index. Next you will need to customize the Search Core Results Web Part with this XSL (obtained from <a href="http://technet.microsoft.com/en-us/library/ms546985.aspx">this MSDN article on how to get XML search results</a>):</p>
<pre class="brush: xml; title: ; notranslate">
&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;
&lt;xsl:stylesheet version=&quot;1.0&quot; xmlns:xsl=&quot;http://www.w3.org/1999/XSL/Transform&quot;&gt;
&lt;xsl:output method=&quot;xml&quot; version=&quot;1.0&quot; encoding=&quot;UTF-8&quot; indent=&quot;yes&quot;/&gt;
&lt;xsl:template match=&quot;/&quot;&gt;
&lt;xmp&gt;&lt;xsl:copy-of select=&quot;*&quot;/&gt;&lt;/xmp&gt;
&lt;/xsl:template&gt;
&lt;/xsl:stylesheet&gt;
</pre>
<p>These are the detailed steps to configure the Search Core Results Web Part to use this XSL as well as return the additional property we need to run the command to remove the item from the index:</p>
<p>1) While editing the Search Core Results Web Part, uncheck the option &#8220;Use Location Visualization&#8221; under Display Properties</p>
<p><img style="background-image:none;border-bottom:0;border-left:0;padding-left:0;padding-right:0;display:inline;border-top:0;border-right:0;padding-top:0;" title="Use Location Visualization" border="0" alt="Use Location Visualization" src="http://searchunleashed.files.wordpress.com/2011/11/image.png?w=223&#038;h=133" width="223" height="133"></p>
<p>2) Click to open the XSL Editor and replace all the contents of the XSL entered there with the XSL listed above</p>
<p><img style="background-image:none;border-bottom:0;border-left:0;padding-left:0;padding-right:0;display:inline;border-top:0;border-right:0;padding-top:0;" title="XSL Editor" border="0" alt="XSL Editor" src="http://searchunleashed.files.wordpress.com/2011/11/image1.png?w=650&#038;h=550" width="650" height="550"></p>
<p>3) Edit the Fetched Properties parameter, adding the following entry right after the opening &lt;Columns&gt; tag (and before the &lt;/Columns&gt; tag):&nbsp; <strong>&lt;Column Name=&quot;contentid&quot;/&gt;</strong> </p>
<p>4) Confirm the changes to the web part, save the page and check the new output of your search results page showing the full XML with the additional properties you wanted</p>
<p><a href="http://searchunleashed.files.wordpress.com/2011/11/image2.png"><img style="background-image:none;border-bottom:0;border-left:0;padding-left:0;padding-right:0;display:inline;border-top:0;border-right:0;padding-top:0;" title="Search results" border="0" alt="Search results" src="http://searchunleashed.files.wordpress.com/2011/11/image_thumb.png?w=644&#038;h=458" width="644" height="458"></a></p>
<p>Now, with the <em>contentid</em> in hand we can execute the command to remove this item from the index:</p>
<blockquote><p>docpush -c sp -U -d <strong>ssic://4583</strong></p>
</blockquote>
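<p>If you need to remove more than a couple of items, you could script this last step. The Python sketch below takes the raw XML shown by the web part and builds one docpush command per result. It assumes that each result exposes the value in a <em>contentid</em> element in the ssic://&lt;id&gt; form, and the function name is made up for this example. The command itself is the same one shown above.</p>
<pre class="brush: python; title: ; notranslate">
import xml.etree.ElementTree as ET


def docpush_commands(results_xml, collection='sp'):
    # Collect every contentid element found in the raw results XML.
    root = ET.fromstring(results_xml)
    commands = []
    for element in root.iter('contentid'):
        content_id = (element.text or '').strip()
        # Only values in the ssic://id form are usable here.
        if content_id.startswith('ssic://'):
            commands.append('docpush -c ' + collection + ' -U -d ' + content_id)
    return commands
</pre>
<p>The function only builds the command strings, so you can review them before running anything against your index.</p>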
<p>And with that you have a way to remove items from the FAST Search for SharePoint index whenever you need. It does take <em>some </em>work, but at least now you have a way to accomplish that. <img style="border-style:none;" class="wlEmoticon wlEmoticon-smile" alt="Smile" src="http://searchunleashed.files.wordpress.com/2011/11/wlemoticon-smile.png?w=640"></p>
]]></content:encoded>
			<wfw:commentRss>http://searchunleashed.wordpress.com/2011/11/15/how-to-force-an-item-to-be-removed-from-the-index-immediately-with-fast-search-for-sharepoint/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/4a7e4080ca4973b3374e7fd04c515fec?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">leonardocsouza</media:title>
		</media:content>

		<media:content url="http://searchunleashed.files.wordpress.com/2011/11/image.png" medium="image">
			<media:title type="html">Use Location Visualization</media:title>
		</media:content>

		<media:content url="http://searchunleashed.files.wordpress.com/2011/11/image1.png" medium="image">
			<media:title type="html">XSL Editor</media:title>
		</media:content>

		<media:content url="http://searchunleashed.files.wordpress.com/2011/11/image_thumb.png" medium="image">
			<media:title type="html">Search results</media:title>
		</media:content>

		<media:content url="http://searchunleashed.files.wordpress.com/2011/11/wlemoticon-smile.png" medium="image">
			<media:title type="html">Smile</media:title>
		</media:content>
	</item>
		<item>
		<title>Understanding Crawled Properties, Managed Properties and Full-text Index &#8211; Part 1</title>
		<link>http://searchunleashed.wordpress.com/2011/08/24/understanding-crawled-properties-managed-properties-and-full-text-index-part-1/</link>
		<comments>http://searchunleashed.wordpress.com/2011/08/24/understanding-crawled-properties-managed-properties-and-full-text-index-part-1/#comments</comments>
		<pubDate>Wed, 24 Aug 2011 17:50:26 +0000</pubDate>
		<dc:creator>leonardocsouza</dc:creator>
				<category><![CDATA[FS4SP]]></category>

		<guid isPermaLink="false">https://searchunleashed.wordpress.com/2011/08/24/understanding-crawled-properties-managed-properties-and-full-text-index-part-1/</guid>
		<description><![CDATA[A while ago I received an email from a previous student asking my thoughts about an issue in his environment. He was crawling some content that had a metadata he wanted to use as the title for the items indexed, &#8230; <a href="http://searchunleashed.wordpress.com/2011/08/24/understanding-crawled-properties-managed-properties-and-full-text-index-part-1/">Continue reading <span class="meta-nav">&#8594;</span></a><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=searchunleashed.wordpress.com&#038;blog=19239959&#038;post=137&#038;subd=searchunleashed&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>A while ago I received an email from a previous student asking for my thoughts about an issue in his environment. He was crawling some content that had a metadata field he wanted to use as the title for the items indexed, but even after doing the proper mapping to the Title managed property he was still not able to get his custom title to be used and appear in the results. “What could be going on?”, he wondered.</p>
<p>My first thought after reading his email was “I think I know what the issue is”, followed closely by “Maybe I should look for another profession, since I was supposed to have taught this clearly during class”. And then I proceeded to explain to him what I will explain to you in this series of posts about Crawled Properties, Managed Properties, Full-text index, and how they all work together to make your search work.</p>
<p>To get to the bottom of this issue, we need to understand three concepts clearly (<em>ok, to tell you the truth you need to understand only two concepts for this particular issue, but the third one is also very important, so please play along <img style="border-style:none;" class="wlEmoticon wlEmoticon-smile" alt="Smile" src="http://searchunleashed.files.wordpress.com/2011/08/wlemoticon-smile1.png?w=640"></em>):</p>
<ul>
<li>Crawled Properties (this post – Part 1)
<li>Managed Properties (future post – Part 2)
<li>Full-text Index (future post – Part 3)</li>
</ul>
<h3>Crawled Properties</h3>
<p>As the <a href="http://office.microsoft.com/en-us/fast-search-server-help/property-management-HA010382016.aspx">official documentation</a> states: “Crawled properties are automatically extracted from crawled content and grouped by category based on the protocol handler or IFilter used.”</p>
<p>What this means is that crawled properties are metadata associated with your content (such as <em>title</em> and <em>url</em>), which can be found in two ways: during <em>crawling</em> of the content and during <em>content processing</em>. </p>
<h4>Crawled properties found during crawling</h4>
<p>When you crawl content from a SharePoint Document Library, each column in the library is metadata associated with the corresponding document, so each column ends up being exposed as a crawled property. For example, in the document library below, each column will be exposed as a crawled property, including my custom column <em>Department</em>:</p>
<p><img style="background-image:none;border-bottom:0;border-left:0;padding-left:0;padding-right:0;display:inline;border-top:0;border-right:0;padding-top:0;" title="Document Library with custom column" border="0" alt="Document Library with custom column" src="http://searchunleashed.files.wordpress.com/2011/08/document-library-with-custom-column.jpg?w=646&#038;h=138" width="646" height="138"></p>
<p>After my first crawl of this content, I can quickly check that my custom column was exposed as a crawled property either through PowerShell or through Central Administration. To do it through PowerShell, I can run this simple cmdlet:</p>
<blockquote><p>Get-FASTSearchMetadataCrawledProperty –Name ows_Department</p>
</blockquote>
<p><a href="http://searchunleashed.files.wordpress.com/2011/08/image3.png"><img style="background-image:none;border-bottom:0;border-left:0;padding-left:0;padding-right:0;display:inline;border-top:0;border-right:0;padding-top:0;" title="image" border="0" alt="image" src="http://searchunleashed.files.wordpress.com/2011/08/image_thumb3.png?w=652&#038;h=142" width="652" height="142"></a></p>
<p>In my case, it was easy to locate my property, since I already knew that all custom columns in a SharePoint document library are exposed as crawled properties with the prefix “ows_”, and they are also assigned to the category SharePoint. Now, what if I did not know this, or if this metadata was coming from somewhere else?</p>
<p>Then I could simply use the name of the column to do a wildcard search for it:</p>
<blockquote><p>Get-FASTSearchMetadataCrawledProperty -Name *department</p>
</blockquote>
<h4>Crawled properties found during content processing</h4>
<p>The second way crawled properties can be found is during content processing. One example is the set of metadata properties you can define for Office documents, such as the ones shown below for a Microsoft Word document:</p>
<p><a href="http://searchunleashed.files.wordpress.com/2011/08/word-document-properties.jpg"><img style="background-image:none;border-bottom:0;border-left:0;padding-left:0;padding-right:0;display:inline;border-top:0;border-right:0;padding-top:0;" title="Word document properties" border="0" alt="Word document properties" src="http://searchunleashed.files.wordpress.com/2011/08/word-document-properties_thumb.jpg?w=270&#038;h=429" width="270" height="429"></a></p>
<p>Each one of the properties shown above is also extracted during content processing and exposed as a crawled property. Take the property <em>Comments</em> shown above, for example. Any information stored in this property will be extracted and exposed through the crawled property <em>Office:6(Text)</em>.</p>
<p>Now you may be asking yourself: how am I supposed to know that a crawled property named <em>Office:6(Text)</em> will contain the values of the metadata property <em>Comments</em> from an Office document?</p>
<p>The only answer I have for that is <a href="http://blogs.technet.com/b/anneste/archive/tags/crawled+properties/">this brilliant series of posts from Anne Stenberg</a> where she reverse-engineers the out-of-the-box crawled properties and their mappings to figure out what metadata they represent. In my case, all I had to do was check <a href="http://blogs.technet.com/b/anneste/archive/2008/11/25/mystery-solved-crawled-properties-in-sharepoint-part-4.aspx">this post here</a> from her series that talks specifically about the crawled properties for Office documents.</p>
<p><em>Note: even though Anne’s articles mentioned above are all about SharePoint 2007 (MOSS 2007), so far they have been spot on every time I needed to find a specific crawled property in FAST Search for SharePoint.</em></p>
<h4>Closing thoughts</h4>
<p>Ok, so now we know that crawled properties are metadata found either through crawling or through content processing, and we also know how to identify specific crawled properties associated with our content. The real deal, though, will come from understanding how we can use these crawled properties during search, as you can see below, where I’m searching for the content I entered into the Comments property of my Word document:</p>
<p><img style="background-image:none;border-bottom:0;border-left:0;padding-left:0;padding-right:0;display:inline;border-top:0;border-right:0;padding-top:0;" title="Search for Comments" border="0" alt="Search for Comments" src="http://searchunleashed.files.wordpress.com/2011/08/search-for-comments.jpg?w=644&#038;h=231" width="644" height="231"></p>
<p>And this will be the subject of the next post in this series: Managed Properties. See you then! <img style="border-style:none;" class="wlEmoticon wlEmoticon-smile" alt="Smile" src="http://searchunleashed.files.wordpress.com/2011/08/wlemoticon-smile1.png?w=640"></p>
]]></content:encoded>
			<wfw:commentRss>http://searchunleashed.wordpress.com/2011/08/24/understanding-crawled-properties-managed-properties-and-full-text-index-part-1/feed/</wfw:commentRss>
		<slash:comments>20</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/4a7e4080ca4973b3374e7fd04c515fec?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">leonardocsouza</media:title>
		</media:content>

		<media:content url="http://searchunleashed.files.wordpress.com/2011/08/wlemoticon-smile1.png" medium="image">
			<media:title type="html">Smile</media:title>
		</media:content>

		<media:content url="http://searchunleashed.files.wordpress.com/2011/08/document-library-with-custom-column.jpg" medium="image">
			<media:title type="html">Document Library with custom column</media:title>
		</media:content>

		<media:content url="http://searchunleashed.files.wordpress.com/2011/08/image_thumb3.png" medium="image">
			<media:title type="html">image</media:title>
		</media:content>

		<media:content url="http://searchunleashed.files.wordpress.com/2011/08/word-document-properties_thumb.jpg" medium="image">
			<media:title type="html">Word document properties</media:title>
		</media:content>

		<media:content url="http://searchunleashed.files.wordpress.com/2011/08/search-for-comments.jpg" medium="image">
			<media:title type="html">Search for Comments</media:title>
		</media:content>

		<media:content url="http://searchunleashed.files.wordpress.com/2011/08/wlemoticon-smile1.png" medium="image">
			<media:title type="html">Smile</media:title>
		</media:content>
	</item>
		<item>
		<title>Working with FAST Search for SharePoint and Multivalued Properties</title>
		<link>http://searchunleashed.wordpress.com/2011/08/19/working-with-fast-search-for-sharepoint-and-multivalued-properties/</link>
		<comments>http://searchunleashed.wordpress.com/2011/08/19/working-with-fast-search-for-sharepoint-and-multivalued-properties/#comments</comments>
		<pubDate>Fri, 19 Aug 2011 04:11:20 +0000</pubDate>
		<dc:creator>leonardocsouza</dc:creator>
				<category><![CDATA[FS4SP]]></category>
		<category><![CDATA[pipeline extensibility]]></category>

		<guid isPermaLink="false">https://searchunleashed.wordpress.com/?p=113</guid>
		<description><![CDATA[Imagine the following scenario. You have some content in a database (or file share, or SharePoint site), and this content has some metadata that is comprised of multiple values for one specific field, such as a list of authors in &#8230; <a href="http://searchunleashed.wordpress.com/2011/08/19/working-with-fast-search-for-sharepoint-and-multivalued-properties/">Continue reading <span class="meta-nav">&#8594;</span></a><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=searchunleashed.wordpress.com&#038;blog=19239959&#038;post=113&#038;subd=searchunleashed&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>Imagine the following scenario. You have some content in a database (or file share, or SharePoint site), and this content has some metadata that consists of multiple values for one specific field, such as a list of authors of a book, a list of contributors to a project, or even a list of departments associated with an item. Your first question is: how do I configure this multivalued property (author, contributors, departments) to be crawled by FAST Search for SharePoint (FS4SP)?</p>
<p>After some thinking, you decide to return all of those values inside this one field, using a separator such as a semi-colon as a delimiter between each individual value. You run a full crawl against this content, find the crawled property associated with this multivalued metadata, map it to a new managed property and expose it in a refiner. All beautiful, correct?</p>
<p>Well, not quite. When you look at your resulting refiner, this is what you see:</p>
<p><img style="background-image:none;border-bottom:0;border-left:0;padding-left:0;padding-right:0;display:inline;border-top:0;border-right:0;padding-top:0;" title="FS4SP Multivalued Refiners" border="0" alt="FS4SP Multivalued Refiners" src="http://searchunleashed.files.wordpress.com/2011/08/image_thumb.png?w=168&#038;h=284" width="168" height="284"></p>
<p>Notice how, instead of considering each individual value in the property, FS4SP is considering the <em>whole</em> property as one big value, which results in the refiner counters being all off.</p>
<p>The issue here is that FS4SP doesn’t <em>know</em> that this is a multivalued property, as the semi-colon is not a separator that it recognizes for multivalued items. To be able to get FS4SP to recognize your multivalued property and display the refiners correctly, you will need to follow a few steps:</p>
<ol>
<li><a href="#configure-managed-property">Configure the Managed Property with the correct options</a></li>
<li><a href="#create-custom-processing-component">Create a custom processing component to apply the correct multivalued character separator</a></li>
<li><a href="#configure-pipeline-extensibility">Configure the Pipeline Extensibility to call your custom processing component and re-crawl your content</a></li>
<li><a href="#troubleshooting">Troubleshooting (let’s hope this won’t be needed <img style="border-style:none;" class="wlEmoticon wlEmoticon-smile" alt="Smile" src="http://searchunleashed.files.wordpress.com/2011/08/wlemoticon-smile.png?w=640">)</a></li>
</ol>
<p><a name="configure-managed-property"></a><br />
<h3>Configure the Managed Property with the correct options</h3>
<p>The first thing you have to do is configure your multivalued Managed Property with the option <em>MergeCrawledProperties</em> set to true. You can do this through PowerShell using the <em><a href="http://technet.microsoft.com/en-us/library/ff393811.aspx">Set-FASTSearchMetadataManagedProperty</a></em> cmdlet, or you can do this through Central Administration, as shown below:</p>
<p><img style="background-image:none;border-bottom:0;border-left:0;padding-left:0;padding-right:0;display:inline;border-top:0;border-right:0;padding-top:0;" title="FS4SP MergeCrawledProperties setting" border="0" alt="FS4SP MergeCrawledProperties setting" src="http://searchunleashed.files.wordpress.com/2011/08/image_thumb1.png?w=644&#038;h=185" width="644" height="185"></p>
<p>This is detailed in the <a href="http://msdn.microsoft.com/en-us/library/ff464344.aspx#schema_managed_property">MSDN documentation for the ManagedProperty Interface</a>, which defines the setting as follows:</p>
<blockquote><p><strong>MergeCrawledProperties</strong></p>
<p>Specifies whether to include the contents of all crawled properties mapped to a managed property. If this setting is disabled, the value of the first non-empty crawled property is used as the contents of the managed property.</p>
<p><strong>This property must also be set to</strong> True <strong>to include all values from a multivalued crawled property. If set to</strong> False<strong>, only the first value from a multivalued crawled property is mapped to the managed property.</strong> </p>
</blockquote>
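<p>For the PowerShell route, a minimal sketch could look like the one below. It assumes the managed property already exists, that the commands are run from the FS4SP shell, and that the property is named <em>departmenttest</em> (a made-up name for illustration). Check the cmdlet documentation linked above for the exact parameters available in your version:</p>
<pre class="brush: powershell; title: ; notranslate">
# Hypothetical managed property name; replace with your own
$name = 'departmenttest'

# Turn on MergeCrawledProperties so all values of a multivalued crawled property are kept
Set-FASTSearchMetadataManagedProperty -Name $name -MergeCrawledProperties $true

# Read the property back to confirm the setting
Get-FASTSearchMetadataManagedProperty -Name $name
</pre>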
<p><a name="create-custom-processing-component"></a><br />
<h3>Create a custom processing component to apply the correct multivalued character separator</h3>
<p>As I mentioned above, the main issue with the semi-colon character used as a separator is that FS4SP doesn’t recognize it as a multivalued separator. To handle this correctly, you must create a custom processing component (in C#, PowerShell, or any other language) that replaces the simple string separator (in this case the semi-colon) with the special multivalued separator that FS4SP recognizes (&#8220;\u2029&#8221;). The <a href="http://msdn.microsoft.com/library/ff795801.aspx">procedure to incorporate a custom processing component is detailed in this reference on MSDN</a>.</p>
<p>In my specific case, I followed the great steps described by <a href="http://techmikael.blogspot.com/2011/03/prototyping-pipeline-stages-in.html">Mikael Svenson on how to use PowerShell to create quick-and-easy custom processing components</a>. This let me get my customization in place and test it quickly. Still, you should do this only for prototyping, as Mikael describes, because there is a performance penalty associated with the use of PowerShell, so it is recommended that you “port the code over to e.g. C# when you are done testing your code”.</p>
<p>My final custom code (directly inspired by Mikael&#8217;s post) to replace the semi-colon separator with the proper multivalued separator is shown below:</p>
<p><pre class="brush: powershell; highlight: [34,35,36,37,38,39,40,46,47,48,49,50]; title: ; notranslate">
function CreateXml()
{
    param ([string]$set, [string]$name, [int]$type, $value)

    $resultXml = New-Object xml
    $doc = $resultXml.CreateElement(&quot;Document&quot;)

    $crawledProperty = $resultXml.CreateElement(&quot;CrawledProperty&quot;)
    $propSet = $resultXml.CreateAttribute(&quot;propertySet&quot;)
    $propSet.innerText = $set
    $propName = $resultXml.CreateAttribute(&quot;propertyName&quot;)
    $propName.innerText = $name
    $varType = $resultXml.CreateAttribute(&quot;varType&quot;)
    $varType.innerText = $type

    $crawledProperty.Attributes.Append($propSet) &gt; $null
    $crawledProperty.Attributes.Append($propName) &gt; $null
    $crawledProperty.Attributes.Append($varType) &gt; $null

    $crawledProperty.innerText = $value

    $doc.AppendChild($crawledProperty) &gt; $null
    $resultXml.AppendChild($doc) &gt; $null
    $xmlDecl = $resultXml.CreateXmlDeclaration(&quot;1.0&quot;, &quot;UTF-8&quot;, &quot;&quot;)
    $el = $resultXml.psbase.DocumentElement
    $resultXml.InsertBefore($xmlDecl, $el) &gt; $null

    return $resultXml
}

function DoWork()
{
    param ([string]$inputFile, [string]$outputFile)    
    $propertyGroupIn = &quot;00130329-0000-0130-c000-000000131346&quot; # SharePoint Crawled Property Category
    $propertyNameIn = &quot;ows_DepartmentTest&quot; # property name
    $dataTypeIn = 31 # string

    $propertyGroupOut = &quot;00130329-0000-0130-c000-000000131346&quot; # SharePoint Crawled Property Category
    $propertyNameOut = &quot;ows_DepartmentTest&quot; # property name
    $dataTypeOut = 31 # string

    $xmldata = [xml](Get-Content $inputFile -Encoding UTF8)
    $node = $xmldata.Document.CrawledProperty | Where-Object {  $_.propertySet -eq $propertyGroupIn -and  $_.propertyName -eq $propertyNameIn -and $_.varType -eq $dataTypeIn }
    $data = $node.innerText

    [char]$multivaluedsep = 0x2029
    [char]$currentsep = ';'
    
    #Replace current separator (semi-colon) with special multivalued separator
    $data = $data.Replace($currentsep, $multivaluedsep)
    
    $resultXml = CreateXml $propertyGroupOut $propertyNameOut $dataTypeOut $data
    $resultXml.OuterXml | Out-File $outputFile -Encoding UTF8
    
    #Copy-Item $inputFile C:\Users\Administrator\AppData\LocalLow
}

# pass input and output file paths as arguments
DoWork $args[0] $args[1]
</pre>
</p>
<p>The first highlighted section above (lines 34 through 40) shows the part of the script that defines the input crawled property, which contains the items with the semi-colon separator, as well as the output crawled property, which will store the updated content with the correct multivalued separator. In my case, both properties are the same, since I simply want to do an in-place replacement.</p>
<p>The second highlighted section (lines 46 through 50) shows the definitions for the current separator (semi-colon) and for the multivalued separator (0x2029 in PowerShell). The replacement itself is then applied to the input crawled property string.</p>
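<p>To make the effect of that replacement concrete, here is a minimal standalone sketch that can be run in any PowerShell console. The sample value <em>HR;Finance;IT</em> and the variables <em>$converted</em> and <em>$values</em> are made up for illustration and are not part of the script above:</p>
<pre class="brush: powershell; title: ; notranslate">
# Hypothetical sample value, as it might arrive from the content source
$data = 'HR;Finance;IT'

[char]$multivaluedsep = 0x2029   # U+2029, the multivalued separator named in this post
[char]$currentsep = ';'

# Same replacement as in DoWork
$converted = $data.Replace($currentsep, $multivaluedsep)

# Splitting on the new separator shows the individual values
$values = $converted.Split($multivaluedsep)
$values.Count    # 3
$values[1]       # Finance
</pre>
<p>The string still travels as a single crawled property value; only the character between the items changes.</p>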
<p><a name="configure-pipeline-extensibility"></a><br />
<h3>Configure the Pipeline Extensibility to call your custom processing component and re-crawl your content</h3>
<p>The next important step is to tell FS4SP that you want to call your custom processing component during content processing. To do this you must configure the <a href="http://msdn.microsoft.com/en-us/library/ff795801.aspx#pipeline-ext-custom">%FASTSearch%\etc\pipelineextensibility.xml configuration file</a>. This is how this file looked on my system:</p>
<p>
<pre class="brush: xml; title: ; notranslate">
&lt;!-- For permissions and the most current information about FAST Search Server 2010 for SharePoint configuration files, see the online documentation, (http://go.microsoft.com/fwlink/?LinkId=1632279). --&gt;

&lt;PipelineExtensibility&gt;
	&lt;Run command=&quot;C:\Windows\System32\WindowsPowerShell\v1.0\PowerShell.exe C:\FASTSearch\bin\multivalued.ps1 %(input)s %(output)s&quot;&gt;
		&lt;Input&gt;      
			&lt;CrawledProperty propertySet=&quot;00130329-0000-0130-c000-000000131346&quot; varType=&quot;31&quot; propertyName=&quot;ows_DepartmentTest&quot;/&gt;
		&lt;/Input&gt;
		&lt;Output&gt;
			&lt;CrawledProperty propertySet=&quot;00130329-0000-0130-c000-000000131346&quot; varType=&quot;31&quot; propertyName=&quot;ows_DepartmentTest&quot;/&gt;
		&lt;/Output&gt;
	&lt;/Run&gt;
&lt;/PipelineExtensibility&gt;
</pre>
</p>
<p>As you can see above, all I’m doing is telling FS4SP to call my custom PowerShell script, pass it the crawled property that contains the semi-colon-separated values as input, and take back the same crawled property as output, so that its contents are replaced with the updated value that uses the multivalued separator.</p>
<p>After saving this configuration file, the next step is to force your Document Processors to reload their configuration so they can be aware of this new content processing component, which you can accomplish by executing <em><a href="http://technet.microsoft.com/en-us/library/ee943506.aspx">psctrl reset</a></em> in a command prompt.</p>
<p>With all the pieces in place, you can start a re-crawl of your content and then test your refiner after the crawl is complete. If all goes well, your refiner should now look exactly like you wanted!</p>
<p><img style="background-image:none;border-bottom:0;border-left:0;padding-left:0;padding-right:0;display:inline;border-top:0;border-right:0;padding-top:0;" title="FS4SP Multivalued Refiner - Correct" border="0" alt="FS4SP Multivalued Refiner - Correct" src="http://searchunleashed.files.wordpress.com/2011/08/image_thumb2.png?w=157&#038;h=155" width="157" height="155"></p>
<p><a name="troubleshooting"></a><br />
<h3>Troubleshooting</h3>
<p>My main warning is to pay close attention to the fact that the <strong>names of your input and output crawled properties</strong> (both in the pipelineextensibility.xml and in the PowerShell script) <strong>are case-sensitive</strong>.</p>
<p>Many people have spent a very long time troubleshooting their code only to realize that it was a case-sensitive issue with the name of these properties. The best way I found to troubleshoot a new custom processing component is through these techniques:</p>
<ol>
<li>Investigate the contents of the input file sent to your custom code: as <a href="http://blogs.msdn.com/b/thomsven/archive/2010/09/23/debugging-and-tracing-fast-search-pipeline-extensibility-stages.aspx">described in this post</a>, the only path in the file system with full access for your custom code is the AppData\LocalLow directory for the account running the FAST Search Service. By uncommenting line 55 in the PowerShell script above, a copy of the input file received by the script will be created in the AppData\LocalLow directory. By looking at the contents of the input file you can see what the input crawled property contains. If the input crawled property doesn’t contain any value, and you are sure that your document has that property, check for issues with case-sensitive property names.</li>
<li>Validate the list of crawled properties received by FS4SP: you can accomplish this <a href="http://techmikael.blogspot.com/2011/03/using-ffddumper-to-log-items-in-custom.html">through the use of the optional processing stage FFDDumper</a>.</li>
<li>If both options 1 and 2 look ok, use the input file from step 1 to call your custom code directly and debug it to identify the error (<a href="http://blogs.msdn.com/b/powershell/archive/2009/01/19/debugging-powershell-script-using-the-ise-editor.aspx">you can debug the PowerShell script above using the ISE Editor</a>).</li>
</ol>
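<p>When checking a copied input file for casing problems, keep in mind that PowerShell’s <em>-eq</em> operator ignores case when comparing strings, so a mismatch is easier to spot with <em>-ceq</em>. The sketch below illustrates the idea under a few assumptions: it builds a small stand-in document in memory instead of reading a real input file, and the values and the variable <em>$expectedName</em> are hypothetical:</p>
<pre class="brush: powershell; title: ; notranslate">
# Build a small stand-in for the input file (hypothetical content)
$xmldata = New-Object xml
$doc = $xmldata.AppendChild($xmldata.CreateElement('Document'))
$cp = $doc.AppendChild($xmldata.CreateElement('CrawledProperty'))
$cp.SetAttribute('propertySet', '00130329-0000-0130-c000-000000131346')
$cp.SetAttribute('propertyName', 'ows_DepartmentTest')
$cp.SetAttribute('varType', '31')
$cp.InnerText = 'HR;Finance'

# Name as typed in the configuration, with the wrong casing on purpose
$expectedName = 'ows_departmenttest'

$report = foreach ($p in $xmldata.SelectNodes('/Document/CrawledProperty')) {
    $actual = $p.GetAttribute('propertyName')
    if ($actual -ceq $expectedName) {
        'exact match: ' + $actual
    } elseif ($actual -ieq $expectedName) {
        'case mismatch: file has ' + $actual
    }
}
$report   # case mismatch: file has ows_DepartmentTest
</pre>
<p>To run the same check against a real input file, load it with the <em>Get-Content</em> line from the script above instead of building the document by hand.</p>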
<p>And that&#8217;s it for today. Enjoy your coding and your multivalued properties! <img style="border-style:none;" class="wlEmoticon wlEmoticon-smile" alt="Smile" src="http://searchunleashed.files.wordpress.com/2011/08/wlemoticon-smile.png?w=640"></p>
]]></content:encoded>
			<wfw:commentRss>http://searchunleashed.wordpress.com/2011/08/19/working-with-fast-search-for-sharepoint-and-multivalued-properties/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/4a7e4080ca4973b3374e7fd04c515fec?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">leonardocsouza</media:title>
		</media:content>

		<media:content url="http://searchunleashed.files.wordpress.com/2011/08/image_thumb.png" medium="image">
			<media:title type="html">FS4SP Multivalued Refiners</media:title>
		</media:content>

		<media:content url="http://searchunleashed.files.wordpress.com/2011/08/wlemoticon-smile.png" medium="image">
			<media:title type="html">Smile</media:title>
		</media:content>

		<media:content url="http://searchunleashed.files.wordpress.com/2011/08/image_thumb1.png" medium="image">
			<media:title type="html">FS4SP MergeCrawledProperties setting</media:title>
		</media:content>

		<media:content url="http://searchunleashed.files.wordpress.com/2011/08/image_thumb2.png" medium="image">
			<media:title type="html">FS4SP Multivalued Refiner - Correct</media:title>
		</media:content>

		<media:content url="http://searchunleashed.files.wordpress.com/2011/08/wlemoticon-smile.png" medium="image">
			<media:title type="html">Smile</media:title>
		</media:content>
	</item>
		<item>
		<title>SharePoint Search and FAST Search for SharePoint Architecture Diagrams &#8211; Fault Tolerance and Performance</title>
		<link>http://searchunleashed.wordpress.com/2011/07/15/sharepoint-search-and-fast-search-for-sharepoint-architecture-diagrams-fault-tolerance-and-performance/</link>
		<comments>http://searchunleashed.wordpress.com/2011/07/15/sharepoint-search-and-fast-search-for-sharepoint-architecture-diagrams-fault-tolerance-and-performance/#comments</comments>
		<pubDate>Fri, 15 Jul 2011 14:05:56 +0000</pubDate>
		<dc:creator>leonardocsouza</dc:creator>
				<category><![CDATA[FS4SP]]></category>
		<category><![CDATA[SP2010]]></category>
		<category><![CDATA[architecture]]></category>
		<category><![CDATA[fault tolerance]]></category>
		<category><![CDATA[performance]]></category>

		<guid isPermaLink="false">https://searchunleashed.wordpress.com/2011/07/15/sharepoint-search-and-fast-search-for-sharepoint-architecture-diagrams-fault-tolerance-and-performance/</guid>
		<description><![CDATA[Update: For those interested in watching a presentation of this content below you can download (right-click and select “Save target as..”) and watch this video here (200+ MB) that was recorded during a webcast on 2011-07-27. My presentation starts at &#8230; <a href="http://searchunleashed.wordpress.com/2011/07/15/sharepoint-search-and-fast-search-for-sharepoint-architecture-diagrams-fault-tolerance-and-performance/">Continue reading <span class="meta-nav">&#8594;</span></a><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=searchunleashed.wordpress.com&#038;blog=19239959&#038;post=102&#038;subd=searchunleashed&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p><em>Update: For those interested in watching a presentation of this content below you can download (right-click and select “Save target as..”) and </em><a href="https://fast.omnisocial.mzinga.com/content/ss/eda339aac39b4c42a7f089442bfd62a8/_Samples/Webcast-ArchitectureOfSearch.wmv"><em>watch this video here</em></a><em> (200+ MB) that was recorded during a webcast on 2011-07-27. My presentation starts at 6min20sec.</em></p>
<p>In previous posts I showed and explained a few <a href="http://searchunleashed.wordpress.com/2011/05/10/learning-roadmap-for-search-in-sharepoint-2010-including-fast-search-for-sharepoint-part-1-search-101-and-architecture/">architecture diagrams of search in SharePoint 2010 for both SharePoint Search and FAST Search for SharePoint</a>, I shared my all-time-favorite resource on SharePoint Search Architecture and Scale <a href="http://blogs.msdn.com/b/russmax/archive/2010/04/16/search-2010-architecture-and-scale-part-1-crawl.aspx">for crawl</a> and <a href="http://blogs.msdn.com/b/russmax/archive/2010/04/23/search-2010-architecture-and-scale-part-2-query.aspx">query</a>, and (hopefully) helped you <a href="http://searchunleashed.wordpress.com/2011/03/16/understand-scale-and-monitor-crawling-processing-indexing-in-fast-search-for-sharepoint/">understand, scale and monitor Crawling / Processing / Indexing in FAST Search for SharePoint</a>.</p>
<p>What I will try to do in this post is convert most of that content into additional diagrams that should help you “see” how these changes related to fault tolerance and/or performance affect your search diagram.</p>
<p>These are the architecture diagrams discussed in this post:</p>
<p>SharePoint Search</p>
<ul>
<li><a href="#spsearch-querycomponent-ft">Query Component (Fault Tolerance)</a>
<li><a href="#spsearch-querycomponent-perf">Query Component (Performance)</a>
<li><a href="#spsearch-propertydb-perf">Property db (Performance)</a>
<li><a href="#spsearch-queryprocessor-ftperf">Query Processor (Fault Tolerance and Performance)</a>
<li><a href="#spsearch-crawlcomponent-ftperf">Crawl Component (Fault Tolerance and Performance)</a>
<li><a href="#spsearch-crawlcomponentcrawldb-perf">Crawl Component and Crawl db (Performance)</a></li>
</ul>
<p>FAST Search for SharePoint</p>
<ul>
<li><a href="#fs4sp-contentprocessing">Content Processing (Fault Tolerance and Performance)</a>
<li><a href="#fs4sp-indexer-ft">Indexer (Fault Tolerance)</a>
<li><a href="#fs4sp-indexer-perf">Indexer (Performance)</a>
<li><a href="#fs4sp-indexersearch-ft">Indexer and Search (Fault Tolerance)</a>
<li><a href="#fs4sp-queryprocessing-ft">Query Processing (Fault Tolerance)</a></li>
</ul>
<p><a name="spsearch-querycomponent-ft"></a><br />
<h3>SharePoint Search – Query Component (Fault Tolerance)</h3>
<p><a href="http://searchunleashed.files.wordpress.com/2011/07/sharepoint-2010-sharepoint-search-architecture-diagram-query-component-fault-tolerance.jpg"><img style="background-image:none;border-bottom:0;border-left:0;padding-left:0;padding-right:0;display:inline;border-top:0;border-right:0;padding-top:0;" title="SharePoint 2010 - SharePoint Search Architecture Diagram - Query Component (Fault Tolerance)" border="0" alt="SharePoint 2010 - SharePoint Search Architecture Diagram - Query Component (Fault Tolerance)" src="http://searchunleashed.files.wordpress.com/2011/07/sharepoint-2010-sharepoint-search-architecture-diagram-query-component-fault-tolerance_thumb.jpg?w=489&#038;h=465" width="489" height="465"></a></p>
<p>In this diagram you see what your architecture would look like after you add a new mirror Query Component for an existing Index Partition, which you do in order to provide fault tolerance for your lookup of matched items for full-text search queries against your index. The reasons for doing that are pretty simple (<a href="http://blogs.msdn.com/b/russmax/archive/2010/04/23/search-2010-architecture-and-scale-part-2-query.aspx">and detailed in here</a>): if one server goes down, the other can still keep serving queries, and unless you configure the mirror server as “failover only” it will also share the load of incoming queries.</p>
<p><a name="spsearch-querycomponent-perf"></a><br />
<h3>SharePoint Search – Query Component (Performance)</h3>
<p><a href="http://searchunleashed.files.wordpress.com/2011/07/sharepoint-2010-sharepoint-search-architecture-diagram-query-component-performance.jpg"><img style="background-image:none;border-bottom:0;border-left:0;padding-left:0;padding-right:0;display:inline;border-top:0;border-right:0;padding-top:0;" title="SharePoint 2010 - SharePoint Search Architecture Diagram - Query Component (Performance)" border="0" alt="SharePoint 2010 - SharePoint Search Architecture Diagram - Query Component (Performance)" src="http://searchunleashed.files.wordpress.com/2011/07/sharepoint-2010-sharepoint-search-architecture-diagram-query-component-performance_thumb.jpg?w=490&#038;h=466" width="490" height="466"></a></p>
<p>In this diagram there is just a very subtle change from the previous one (marked in red), but it makes a lot of difference in your architecture: the additional Query Component has a different Index Partition. What this means is that now your content is divided between the two Index Partitions, so if for example you have a total of 6 million indexed items, then each Index Partition has 3 million items. This also means that your Query Processor will send requests in parallel to both Query Components and, since each one of them has to search against only half of the index (3 million out of 6 million total), they will be able to do this faster.</p>
<p><a href="http://technet.microsoft.com/en-us/library/cc262787.aspx#Search">The supported number of indexed items is 100 million per search service application and 10 million for each Index Partition.</a></p>
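<p>As a back-of-the-envelope illustration of these numbers, the sketch below computes how many items each Index Partition would hold and the minimum number of partitions implied by the 10 million limit. The function name <em>Get-PartitionEstimate</em> and the assumption that items are spread evenly across partitions are mine, for illustration only:</p>
<pre class="brush: powershell; title: ; notranslate">
function Get-PartitionEstimate {
    param ([long]$totalItems, [int]$partitions)

    $maxPerPartition = 10000000   # supported items per Index Partition
    $maxPerSsa = 100000000        # supported items per search service application

    if ($totalItems -gt $maxPerSsa) {
        throw 'total exceeds the supported limit for one search service application'
    }

    # Assumes an even split of items across partitions
    New-Object PSObject -Property @{
        ItemsPerPartition = [math]::Ceiling([double]$totalItems / $partitions)
        MinimumPartitions = [math]::Ceiling([double]$totalItems / $maxPerPartition)
    }
}

$estimate = Get-PartitionEstimate 6000000 2
$estimate.ItemsPerPartition   # 3000000
$estimate.MinimumPartitions   # 1
</pre>
<p>With 6 million items, a single partition would still be within the supported limit, so the second partition in the diagram is there for query speed, not because the limit forces it.</p>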
<p><a name="spsearch-propertydb-perf"></a><br />
<h3>SharePoint Search – Property db (Performance) </h3>
<p><a href="http://searchunleashed.files.wordpress.com/2011/07/sharepoint-2010-sharepoint-search-architecture-diagram-property-db-performance.jpg"><img style="background-image:none;border-bottom:0;border-left:0;padding-left:0;padding-right:0;display:inline;border-top:0;border-right:0;padding-top:0;" title="SharePoint 2010 - SharePoint Search Architecture Diagram - Property Db (Performance)" border="0" alt="SharePoint 2010 - SharePoint Search Architecture Diagram - Property Db (Performance)" src="http://searchunleashed.files.wordpress.com/2011/07/sharepoint-2010-sharepoint-search-architecture-diagram-property-db-performance_thumb.jpg?w=637&#038;h=389" width="637" height="389"></a></p>
<p>Here things start to get interesting, with not only a new Query Component/Index Partition, but also a new Property db (added items marked in red). If you read <a href="http://blogs.msdn.com/b/russmax/archive/2010/04/23/search-2010-architecture-and-scale-part-2-query.aspx">this post</a> (mentioned a dozen times by now <img style="border-style:none;" class="wlEmoticon wlEmoticon-smile" alt="Smile" src="http://searchunleashed.files.wordpress.com/2011/07/wlemoticon-smile.png?w=640">) you understand that in order to provide search results, the Query Processor needs to perform a lookup not only in the Index Partition but also in the Property db in order to retrieve the metadata associated with the results found. When you start to increase your indexed content, for example by having 20M items that you then split across 2 Index Partitions to improve your index lookup time, it may happen that your Property db is now your bottleneck. One way to minimize the impact of a growing number of indexed items is to add a new Property db and assign a new Query Component/Index Partition to it. This way, each combination of Index Partition/Property db has to store and handle search requests for only half of the total number of indexed items.</p>
<p>It is also important to notice that all search-related databases (Property db, Search Admin db and Crawl db) can be configured for fault tolerance through the use of <a href="http://blogs.technet.com/b/wbaer/archive/2010/05/03/database-mirroring-in-sharepoint-2010.aspx">database mirroring</a>.</p>
<p><a name="spsearch-queryprocessor-ftperf"></a><br />
<h3>SharePoint Search – Query Processor (Fault Tolerance and Performance) </h3>
<p><a href="http://searchunleashed.files.wordpress.com/2011/07/sharepoint-2010-sharepoint-search-architecture-diagram-query-processor-fault-tolerance-and-p.jpg"><img style="background-image:none;border-bottom:0;border-left:0;padding-left:0;padding-right:0;display:inline;border-top:0;border-right:0;padding-top:0;" title="SharePoint 2010 - SharePoint Search Architecture Diagram - Query Processor (Fault Tolerance and Performance)" border="0" alt="SharePoint 2010 - SharePoint Search Architecture Diagram - Query Processor (Fault Tolerance and Performance)" src="http://searchunleashed.files.wordpress.com/2011/07/sharepoint-2010-sharepoint-search-architecture-diagram-query-processor-fault-tolerance-and-p1.jpg?w=631&#038;h=434" width="631" height="434"></a></p>
<p>Even after you have scaled your Query Components, your Index Partitions and your Property dbs, another query-side component that may require your attention is the Query Processor. This is the component that does the hard work of accessing the Query Component (to check items that match the query), the Property db (to get metadata associated with those items) and the Search Admin db (to get security descriptors in order to apply security trimming to the results). By adding a new Query Processor (marked in red and <a href="http://blogs.msdn.com/b/russmax/archive/2010/04/23/search-2010-architecture-and-scale-part-2-query.aspx">described here</a>), you divide the load of this task across multiple servers, increasing your query performance and providing fault tolerance (if one goes down, the other can still handle queries).</p>
<p><a name="spsearch-crawlcomponent-ftperf"></a><br />
<h3>SharePoint Search – Crawl Component (Fault Tolerance and Performance) </h3>
<p><a href="http://searchunleashed.files.wordpress.com/2011/07/sharepoint-2010-sharepoint-search-architecture-diagram-crawl-component-fault-tolerance-and-p.jpg"><img style="background-image:none;border-bottom:0;border-left:0;padding-left:0;padding-right:0;display:inline;border-top:0;border-right:0;padding-top:0;" title="SharePoint 2010 - SharePoint Search Architecture Diagram - Crawl Component (Fault Tolerance and Performance)" border="0" alt="SharePoint 2010 - SharePoint Search Architecture Diagram - Crawl Component (Fault Tolerance and Performance)" src="http://searchunleashed.files.wordpress.com/2011/07/sharepoint-2010-sharepoint-search-architecture-diagram-crawl-component-fault-tolerance-and-p1.jpg?w=636&#038;h=315" width="636" height="315"></a></p>
<p>Now let’s take a look at the other side of search: Crawling/Processing/Indexing. Notice the new Crawl Component that was added in the diagram above. What does this mean? It means that both Crawl Components will split the load of crawling the defined content sources, and both will keep pulling from and updating the crawling queue stored in the Crawl db. For example, if your full crawl with one Crawl Component and one Crawl db was taking 4 days, then by adding another Crawl Component (and assuming you have sufficient CPU/Memory/IO/bandwidth/etc. resources) the same full crawl should be reduced to around 2 days. With two Crawl Components working from the same Crawl db, you also get fault tolerance in case one of them goes down.</p>
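<p>To make the "shared queue" idea concrete, here is a toy sketch (not how SharePoint is implemented, just an idealized model). The function name <em>simulate_crawl</em> and the notion of a "round" are assumptions made for illustration: every Crawl Component takes one item per round from a single queue that stands in for the Crawl db.</p>

```python
from collections import deque


def simulate_crawl(items, component_count):
    """Drain one shared queue with several components (idealized model).

    Returns (rounds, per_component_counts). A "round" is a hypothetical
    unit of time in which each component crawls at most one item.
    """
    if component_count < 1:
        raise ValueError("at least one Crawl Component is required")
    queue = deque(items)              # stands in for the queue in the Crawl db
    crawled = [0] * component_count   # items handled by each component
    rounds = 0
    while queue:
        rounds += 1
        for i in range(component_count):
            if not queue:
                break
            queue.popleft()           # component i takes the next item
            crawled[i] += 1
    return rounds, crawled


# Eight items: one component needs 8 rounds, two components need 4.
print(simulate_crawl(range(8), 1))
print(simulate_crawl(range(8), 2))
```

<p>In this model the time is cut in half only because nothing else is limiting the crawl; in a real farm the Crawl db, the network or the content source itself may cap the gain.</p>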
<p><a name="spsearch-crawlcomponentcrawldb-perf"></a><br />
<h3>SharePoint Search – Crawl Component and Crawl db (Performance)</h3>
<p><a href="http://searchunleashed.files.wordpress.com/2011/07/sharepoint-2010-sharepoint-search-architecture-diagram-crawl-component-and-crawl-db-performa.jpg"><img style="background-image:none;border-bottom:0;border-left:0;padding-left:0;padding-right:0;display:inline;border-top:0;border-right:0;padding-top:0;" title="SharePoint 2010 - SharePoint Search Architecture Diagram - Crawl Component and Crawl Db (Performance)" border="0" alt="SharePoint 2010 - SharePoint Search Architecture Diagram - Crawl Component and Crawl Db (Performance)" src="http://searchunleashed.files.wordpress.com/2011/07/sharepoint-2010-sharepoint-search-architecture-diagram-crawl-component-and-crawl-db-performa1.jpg?w=639&#038;h=205" width="639" height="205"></a></p>
<p>What happens when you start to add many Crawl Components to the same Crawl db? Well, the db can easily become your bottleneck. One way to keep scaling out and increasing your crawling performance is through the use of an additional set of Crawl Component/Crawl db, as shown in the diagram above. In this way, distinct content sources (web applications, web sites, file shares, etc.) will be split among these two Crawl dbs, and their respective Crawl Components will have to handle (crawl/process/index) only part of the content, making it easier to deal with.</p>
<p>There are a lot of things that go into this, from how content to be crawled is split among multiple Crawl dbs to how you can manually define this mapping yourself (if you want to). All of this and more is detailed <a href="http://blogs.msdn.com/b/russmax/archive/2010/04/16/search-2010-architecture-and-scale-part-1-crawl.aspx">in this post</a>.</p>
<p><a name="fs4sp-contentprocessing"></a><br />
<h3>FAST Search for SharePoint – Content Processing (Fault Tolerance and Performance) </h3>
<p><a href="http://searchunleashed.files.wordpress.com/2011/07/sharepoint-2010-fast-search-architecture-diagram-content-processing-fault-tolerance-and-perf.jpg"><img style="background-image:none;border-bottom:0;border-left:0;padding-left:0;padding-right:0;display:inline;border-top:0;border-right:0;padding-top:0;" title="SharePoint 2010 - FAST Search Architecture Diagram - Content Processing (Fault Tolerance and Performance)" border="0" alt="SharePoint 2010 - FAST Search Architecture Diagram - Content Processing (Fault Tolerance and Performance)" src="http://searchunleashed.files.wordpress.com/2011/07/sharepoint-2010-fast-search-architecture-diagram-content-processing-fault-tolerance-and-perf1.jpg?w=625&#038;h=390" width="625" height="390"></a></p>
<p>Since we are starting with <em>content processing</em>, you may be asking “what about the <em>crawling</em> part of FAST Search?”. Well, the good news is that if you are using the FAST Content SSA to crawl your content, then your crawling architecture looks pretty much like what we just saw for SharePoint Search above. The main difference is that the FAST Content SSA will be tasked only with crawling, since processing and indexing will be done in the FAST Search farm. And talking about content processing, the first component that can be scaled out is the Content Distributor (as shown above in red). What this gives you is only fault tolerance, since the FAST Content SSA will connect and send batches to only one Content Distributor at a time, and will switch to the other one only if it fails to submit batches to the “primary” Content Distributor (you also must make sure to <a href="http://technet.microsoft.com/en-us/library/ff381261.aspx">configure the FAST Content SSA</a> listing both Content Distributors).</p>
<p>With regard to Document Processors, you will definitely have more than one (you get 4 of them by default in a simple installation), which gives you both fault tolerance (in case one of them goes down) and performance (since they work in parallel). Also, if the “primary” Content Distributor goes down, the Document Processors will be smart enough to switch to the other available Content Distributor.</p>
<p><a name="fs4sp-indexer-ft"></a><br />
<h3>Indexer (Fault Tolerance) </h3>
<p><a href="http://searchunleashed.files.wordpress.com/2011/07/sharepoint-2010-fast-search-architecture-diagram-indexer-fault-tolerance.jpg"><img style="background-image:none;border-bottom:0;border-left:0;padding-left:0;padding-right:0;display:inline;border-top:0;border-right:0;padding-top:0;" title="SharePoint 2010 - FAST Search Architecture Diagram - Indexer (Fault Tolerance)" border="0" alt="SharePoint 2010 - FAST Search Architecture Diagram - Indexer (Fault Tolerance)" src="http://searchunleashed.files.wordpress.com/2011/07/sharepoint-2010-fast-search-architecture-diagram-indexer-fault-tolerance_thumb.jpg?w=559&#038;h=536" width="559" height="536"></a></p>
<p>Remember the option to <a href="#spsearch-querycomponent-ft">mirror an Index Partition in SharePoint Search</a> to provide fault tolerance? FAST Search can do this in a similar way, but under a different name: the documentation refers to this process as <a href="http://technet.microsoft.com/en-us/library/gg482027.aspx">adding a backup indexer row</a>. In this case both Indexers will have the same content, which means that if your primary Indexer goes down, <a href="http://technet.microsoft.com/en-us/library/gg482021.aspx">the backup Indexer can be configured to become the new primary Indexer</a>.</p>
<p><a name="fs4sp-indexer-perf"></a><br />
<h3>Indexer (Performance) </h3>
<p><a href="http://searchunleashed.files.wordpress.com/2011/07/sharepoint-2010-fast-search-architecture-diagram-indexer-performance.jpg"><img style="background-image:none;border-bottom:0;border-left:0;padding-left:0;padding-right:0;display:inline;border-top:0;border-right:0;padding-top:0;" title="SharePoint 2010 - FAST Search Architecture Diagram - Indexer (Performance)" border="0" alt="SharePoint 2010 - FAST Search Architecture Diagram - Indexer (Performance)" src="http://searchunleashed.files.wordpress.com/2011/07/sharepoint-2010-fast-search-architecture-diagram-indexer-performance_thumb.jpg?w=557&#038;h=534" width="557" height="534"></a></p>
<p>In the diagram above, instead of adding a new backup Indexer for fault tolerance, <a href="http://technet.microsoft.com/en-us/library/gg482015.aspx">a new Indexer column was added</a> to increase the volume of indexed content that can be stored in your search farm. In this scenario your content will be divided among the two Indexer columns (very similar to how we <a href="#spsearch-querycomponent-perf">divided the content into separate Index Partitions for SharePoint Search</a>).</p>
<p><a href="http://technet.microsoft.com/en-us/library/gg702617.aspx">The official guideline is to have one Indexer column for each 15 million items to index</a>.</p>
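<p>As a quick worked example of that guideline, the sketch below rounds the item count up to whole columns. The function name <em>indexer_columns_needed</em> is made up for illustration, and the only input taken from the text is the figure of 15 million items per column; treat the result as a starting point for capacity planning, not a sizing tool.</p>

```python
# Guideline quoted above: one Indexer column per 15 million items.
ITEMS_PER_COLUMN = 15_000_000


def indexer_columns_needed(total_items, items_per_column=ITEMS_PER_COLUMN):
    """Return the smallest number of Indexer columns that fits total_items.

    Uses integer ceiling division and never returns less than one column,
    since even an empty farm has a single Indexer column.
    """
    if total_items < 0:
        raise ValueError("total_items cannot be negative")
    columns = (total_items + items_per_column - 1) // items_per_column
    return max(1, columns)


# 20 million items do not fit in one column, so two are needed.
print(indexer_columns_needed(20_000_000))
```

<p>The same rounding-up logic applies when content grows: crossing a multiple of 15 million items is the point at which another column comes into play.</p>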
<p><a name="fs4sp-indexersearch-ft"></a><br />
<h3>Indexer and Search (Fault Tolerance) </h3>
<p><a href="http://searchunleashed.files.wordpress.com/2011/07/sharepoint-2010-fast-search-architecture-diagram-indexer-and-search-fault-tolerance.jpg"><img style="background-image:none;border-bottom:0;border-left:0;padding-left:0;padding-right:0;display:inline;border-top:0;border-right:0;padding-top:0;" title="SharePoint 2010 - FAST Search Architecture Diagram - Indexer and Search (Fault Tolerance)" border="0" alt="SharePoint 2010 - FAST Search Architecture Diagram - Indexer and Search (Fault Tolerance)" src="http://searchunleashed.files.wordpress.com/2011/07/sharepoint-2010-fast-search-architecture-diagram-indexer-and-search-fault-tolerance_thumb.jpg?w=578&#038;h=657" width="578" height="657"></a></p>
<p>Above is the diagram of a somewhat common deployment of FAST Search for SharePoint, where you have two servers and each one is configured with a combination of Indexer and Search in a way that one server is the primary Indexer and backup Search, and the other server is backup Indexer and primary Search. In this way, with just your two servers you are providing fault tolerance for both Indexer and Search.</p>
<p><a name="fs4sp-queryprocessing-ft"></a><br />
<h3>Query Processing (Fault Tolerance)</h3>
<p><a href="http://searchunleashed.files.wordpress.com/2011/07/sharepoint-2010-fast-search-architecture-diagram-query-processing-fault-tolerance.jpg"><img style="background-image:none;border-bottom:0;border-left:0;padding-left:0;padding-right:0;display:inline;border-top:0;border-right:0;padding-top:0;" title="SharePoint 2010 - FAST Search Architecture Diagram - Query Processing (Fault Tolerance)" border="0" alt="SharePoint 2010 - FAST Search Architecture Diagram - Query Processing (Fault Tolerance)" src="http://searchunleashed.files.wordpress.com/2011/07/sharepoint-2010-fast-search-architecture-diagram-query-processing-fault-tolerance_thumb.jpg?w=579&#038;h=522" width="579" height="522"></a></p>
<p>In the diagram above, a Query Processing server (with QRServer, QRProxy and FSA Worker components) was added to the FAST Search farm and also <a href="http://technet.microsoft.com/en-us/library/ff381251.aspx">properly configured in the FAST Query SSA by listing both servers in its setup</a>. With this configuration, queries will be sent to both servers in a round robin fashion, and if one of the servers fails the FAST Query SSA will keep sending queries only to the active server.</p>
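<p>The round robin plus failover behavior can be pictured with a small sketch. This is only a model of the idea, not the FAST Query SSA's actual code; the class name <em>RoundRobinDispatcher</em>, its methods and the server names <em>qr1</em>/<em>qr2</em> are all hypothetical.</p>

```python
class RoundRobinDispatcher:
    """Rotate queries across servers, skipping any marked as down."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.active = set(self.servers)
        self._next = 0  # index of the next server in the rotation

    def mark_down(self, server):
        self.active.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.active.add(server)

    def pick(self):
        """Return the next active server, trying each one at most once."""
        for _ in range(len(self.servers)):
            candidate = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if candidate in self.active:
                return candidate
        raise RuntimeError("no active Query Processing server")


dispatcher = RoundRobinDispatcher(["qr1", "qr2"])
print([dispatcher.pick() for _ in range(4)])  # alternates between both servers
dispatcher.mark_down("qr1")
print([dispatcher.pick() for _ in range(3)])  # only the surviving server
```

<p>Note that with a single surviving server you keep answering queries but lose the load sharing, which is why this setup is listed under fault tolerance.</p>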
<h3>Conclusion</h3>
<p>There is a lot you can configure in both SharePoint Search and FAST Search for SharePoint to increase performance and/or provide fault tolerance for components of your search farm. The important thing is to understand what options are available for each platform and keep them in mind when you first design your search architecture as well as after your search project is in production, in case you need to scale out your deployment.</p>
]]></content:encoded>
			<wfw:commentRss>http://searchunleashed.wordpress.com/2011/07/15/sharepoint-search-and-fast-search-for-sharepoint-architecture-diagrams-fault-tolerance-and-performance/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/4a7e4080ca4973b3374e7fd04c515fec?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">leonardocsouza</media:title>
		</media:content>

		<media:content url="http://searchunleashed.files.wordpress.com/2011/07/sharepoint-2010-sharepoint-search-architecture-diagram-query-component-fault-tolerance_thumb.jpg" medium="image">
			<media:title type="html">SharePoint 2010 - SharePoint Search Architecture Diagram - Query Component (Fault Tolerance)</media:title>
		</media:content>

		<media:content url="http://searchunleashed.files.wordpress.com/2011/07/sharepoint-2010-sharepoint-search-architecture-diagram-query-component-performance_thumb.jpg" medium="image">
			<media:title type="html">SharePoint 2010 - SharePoint Search Architecture Diagram - Query Component (Performance)</media:title>
		</media:content>

		<media:content url="http://searchunleashed.files.wordpress.com/2011/07/sharepoint-2010-sharepoint-search-architecture-diagram-property-db-performance_thumb.jpg" medium="image">
			<media:title type="html">SharePoint 2010 - SharePoint Search Architecture Diagram - Property Db (Performance)</media:title>
		</media:content>

		<media:content url="http://searchunleashed.files.wordpress.com/2011/07/wlemoticon-smile.png" medium="image">
			<media:title type="html">Smile</media:title>
		</media:content>

		<media:content url="http://searchunleashed.files.wordpress.com/2011/07/sharepoint-2010-sharepoint-search-architecture-diagram-query-processor-fault-tolerance-and-p1.jpg" medium="image">
			<media:title type="html">SharePoint 2010 - SharePoint Search Architecture Diagram - Query Processor (Fault Tolerance and Performance)</media:title>
		</media:content>

		<media:content url="http://searchunleashed.files.wordpress.com/2011/07/sharepoint-2010-sharepoint-search-architecture-diagram-crawl-component-fault-tolerance-and-p1.jpg" medium="image">
			<media:title type="html">SharePoint 2010 - SharePoint Search Architecture Diagram - Crawl Component (Fault Tolerance and Performance)</media:title>
		</media:content>

		<media:content url="http://searchunleashed.files.wordpress.com/2011/07/sharepoint-2010-sharepoint-search-architecture-diagram-crawl-component-and-crawl-db-performa1.jpg" medium="image">
			<media:title type="html">SharePoint 2010 - SharePoint Search Architecture Diagram - Crawl Component and Crawl Db (Performance)</media:title>
		</media:content>

		<media:content url="http://searchunleashed.files.wordpress.com/2011/07/sharepoint-2010-fast-search-architecture-diagram-content-processing-fault-tolerance-and-perf1.jpg" medium="image">
			<media:title type="html">SharePoint 2010 - FAST Search Architecture Diagram - Content Processing (Fault Tolerance and Performance)</media:title>
		</media:content>

		<media:content url="http://searchunleashed.files.wordpress.com/2011/07/sharepoint-2010-fast-search-architecture-diagram-indexer-fault-tolerance_thumb.jpg" medium="image">
			<media:title type="html">SharePoint 2010 - FAST Search Architecture Diagram - Indexer (Fault Tolerance)</media:title>
		</media:content>

		<media:content url="http://searchunleashed.files.wordpress.com/2011/07/sharepoint-2010-fast-search-architecture-diagram-indexer-performance_thumb.jpg" medium="image">
			<media:title type="html">SharePoint 2010 - FAST Search Architecture Diagram - Indexer (Performance)</media:title>
		</media:content>

		<media:content url="http://searchunleashed.files.wordpress.com/2011/07/sharepoint-2010-fast-search-architecture-diagram-indexer-and-search-fault-tolerance_thumb.jpg" medium="image">
			<media:title type="html">SharePoint 2010 - FAST Search Architecture Diagram - Indexer and Search (Fault Tolerance)</media:title>
		</media:content>

		<media:content url="http://searchunleashed.files.wordpress.com/2011/07/sharepoint-2010-fast-search-architecture-diagram-query-processing-fault-tolerance_thumb.jpg" medium="image">
			<media:title type="html">SharePoint 2010 - FAST Search Architecture Diagram - Query Processing (Fault Tolerance)</media:title>
		</media:content>
	</item>
		<item>
		<title>Learning roadmap for Search in SharePoint 2010 (including FAST Search for SharePoint) &#8211; Part 2: Planning, Scale, Installation and Deployment, and Crawling</title>
		<link>http://searchunleashed.wordpress.com/2011/06/11/learning-roadmap-for-search-in-sharepoint-2010-including-fast-search-for-sharepoint-part-2-planning-scale-installation-and-deployment-and-crawling/</link>
		<comments>http://searchunleashed.wordpress.com/2011/06/11/learning-roadmap-for-search-in-sharepoint-2010-including-fast-search-for-sharepoint-part-2-planning-scale-installation-and-deployment-and-crawling/#comments</comments>
		<pubDate>Sun, 12 Jun 2011 01:10:28 +0000</pubDate>
		<dc:creator>leonardocsouza</dc:creator>
				<category><![CDATA[FS4SP]]></category>
		<category><![CDATA[SP2010]]></category>
		<category><![CDATA[Learning Roadmap]]></category>

		<guid isPermaLink="false">https://searchunleashed.wordpress.com/2011/06/11/learning-roadmap-for-search-in-sharepoint-2010-including-fast-search-for-sharepoint-part-2-planning-scale-installation-and-deployment-and-crawling/</guid>
		<description><![CDATA[Did you enjoy your break since our last post in the series, when we finished up with some architecture diagrams for both SharePoint Search and FAST Search for SharePoint? Now let’s have a deeper look into some of those components, &#8230; <a href="http://searchunleashed.wordpress.com/2011/06/11/learning-roadmap-for-search-in-sharepoint-2010-including-fast-search-for-sharepoint-part-2-planning-scale-installation-and-deployment-and-crawling/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>Did you enjoy your break since our <a href="http://searchunleashed.wordpress.com/2011/05/10/learning-roadmap-for-search-in-sharepoint-2010-including-fast-search-for-sharepoint-part-1-search-101-and-architecture/">last post in the series</a>, when we finished up with some architecture diagrams for both SharePoint Search and FAST Search for SharePoint? Now let’s have a deeper look into some of those components, focusing on some considerations to properly plan and scale search solutions. Following up, we will cover some installation and deployment topics and then close with crawling. This should be enough to keep you entertained for a few days. <img style="border-style:none;" class="wlEmoticon wlEmoticon-smile" alt="Smile" src="http://searchunleashed.files.wordpress.com/2011/06/wlemoticon-smile.png?w=640"></p>
<p>In case you want the full list of this roadmap, the planned sections (so far) are the following:</p>
<ul>
<li><a href="http://searchunleashed.wordpress.com/2011/05/10/learning-roadmap-for-search-in-sharepoint-2010-including-fast-search-for-sharepoint-part-1-search-101-and-architecture#search101">Search 101</a> (previous post)
<li><a href="http://searchunleashed.wordpress.com/2011/05/10/learning-roadmap-for-search-in-sharepoint-2010-including-fast-search-for-sharepoint-part-1-search-101-and-architecture#architecture">Search Architecture in SharePoint 2010</a> (previous post)
<li><a href="#planning">Planning and Scale</a>
<li><a href="#installation">Installation / Deployment</a>
<li><a href="#crawling">Crawling</a>
<li>Processing (future post)
<li>Indexing (future post)
<li>Searching (future post)</li>
</ul>
<p><a name="planning"></a><br />
<h3>Planning and Scale</h3>
<p>Ready to dig a little deeper into SharePoint Search? Then read these two out-of-this-world articles that explain not only how the architecture of SharePoint Search works, but also how to scale it. Believe me, these two posts have saved me more times than I can count. Extra points for those working with FAST, as <em>almost</em> everything related to the crawling components, including scaling, also applies to FS4SP:</p>
<ul>
<li>Crawling &#8211; <a href="http://blogs.msdn.com/b/russmax/archive/2010/04/16/search-2010-architecture-and-scale-part-1-crawl.aspx">http://blogs.msdn.com/b/russmax/archive/2010/04/16/search-2010-architecture-and-scale-part-1-crawl.aspx</a>
<li>Query &#8211; <a href="http://blogs.msdn.com/b/russmax/archive/2010/04/23/search-2010-architecture-and-scale-part-2-query.aspx">http://blogs.msdn.com/b/russmax/archive/2010/04/23/search-2010-architecture-and-scale-part-2-query.aspx</a></li>
</ul>
<p>The links above explained the SharePoint Search architecture. As a next step, you can expand your knowledge by looking at how these same things apply to FS4SP. It is important to note that scaling the FAST Query SSA is mostly done for failover reasons, as the hard work done during query time for FS4SP is executed in the FAST farm (and not in the SharePoint farm):</p>
<ul>
<li>Multiple server deployment of the Content SSA (FAST Search Server 2010 for SharePoint) &#8211; <a href="http://technet.microsoft.com/en-us/library/ff599537.aspx">http://technet.microsoft.com/en-us/library/ff599537.aspx</a>
<li>Multiple server deployment of the Query SSA (FAST Search Server 2010 for SharePoint) &#8211; <a href="http://technet.microsoft.com/en-us/library/ff602184.aspx">http://technet.microsoft.com/en-us/library/ff602184.aspx</a></li>
</ul>
<p>If you got this far, you understand the crawling and query components running in the SharePoint farm, either for SharePoint Search or for FS4SP, so it is time to do some deep reading in the product documentation. I know hardly anyone likes to read the documentation (I don’t like it either <img style="border-style:none;" class="wlEmoticon wlEmoticon-smile" alt="Smile" src="http://searchunleashed.files.wordpress.com/2011/06/wlemoticon-smile.png?w=640">), but there are great nuggets of useful information in the links below that will help you understand how to design the search solution and topology with FS4SP. The whole piece on performance and capacity management/testing/recommendations under the &#8220;Plan search topology&#8221; section is definitely worth a look (trust me, it will save you valuable time later on):</p>
<ul>
<li>Plan the search solution (FAST Search Server 2010 for SharePoint) &#8211; <a href="http://technet.microsoft.com/en-us/library/ff383310.aspx">http://technet.microsoft.com/en-us/library/ff383310.aspx</a>
<li>Plan search topology (FAST Search Server 2010 for SharePoint) &#8211; <a href="http://technet.microsoft.com/en-us/library/ff599528.aspx">http://technet.microsoft.com/en-us/library/ff599528.aspx</a></li>
</ul>
<p><em>Advanced Material on Planning, Design, High Availability</em>
<p>A scenario that I get asked about fairly often is sharing the search service application across multiple SharePoint farms (much discussed when you have dispersed SharePoint farms and want to provide a central Search farm). If that caught your attention, first read the official documentation, then check the very detailed blog post with step-by-step instructions on how to set this up for the User Profile Service Application and Search Service Application. The same principles apply to both SharePoint Search and FS4SP (since you are publishing/consuming the SSAs on the SharePoint farm):</p>
<ul>
<li>Share service applications across farms &#8211; <a href="http://technet.microsoft.com/en-us/library/ff621100.aspx">http://technet.microsoft.com/en-us/library/ff621100.aspx</a>
<li>SharePoint Server 2010 Enterprise Service Application Publishing and Consuming Farms &#8211; <a href="http://www.kowalski.ms/2010/07/16/sharepoint-server-2010-enterprise-service-application-publishing-and-consuming-farms/">http://www.kowalski.ms/2010/07/16/sharepoint-server-2010-enterprise-service-application-publishing-and-consuming-farms/</a></li>
</ul>
<p><a name="installation"></a><br />
<h3>Installation / Deployment</h3>
<p>First, review and understand the steps required to configure search in SharePoint 2010. Even for those that will only work with FAST, this still matters, as a lot of the overall guidance here will also apply to FAST:</p>
<ul>
<li>Post-installation steps for search &#8211; <a href="http://technet.microsoft.com/en-us/library/ee808863.aspx">http://technet.microsoft.com/en-us/library/ee808863.aspx</a></li>
</ul>
<p>After you complete your reading above, you can go ahead and understand the steps required to deploy FS4SP from the official documentation:
<ul>
<li>Deployment for FAST Search Server 2010 for SharePoint &#8211; <a href="http://technet.microsoft.com/en-us/library/ff381267.aspx">http://technet.microsoft.com/en-us/library/ff381267.aspx</a></li>
</ul>
<p>Also, if you are planning to virtualize FS4SP, make sure to check the official recommendations here:
<ul>
<li>Recommendations: Virtualization (FAST Search Server 2010 for SharePoint) &#8211; <a title="http://technet.microsoft.com/en-us/library/gg702612.aspx" href="http://technet.microsoft.com/en-us/library/gg702612.aspx">http://technet.microsoft.com/en-us/library/gg702612.aspx</a></li>
</ul>
<p><a name="crawling"></a><br />
<h3>Crawling</h3>
<p>First, learn the basics of configuring a new Content Source to crawl content in SharePoint 2010, since you will have to do this at some point. The best part? Most of what you learn here about defining content sources and crawl rules, and about starting and stopping crawls, is also valid for FS4SP. The video linked below shows the sequence of events when you trigger a full crawl (the part about crawling is the same for both SharePoint Search and FS4SP, but the part about processing and indexing is different in FS4SP):</p>
<ul>
<li>Manage crawling (SharePoint Server 2010) &#8211; <a href="http://technet.microsoft.com/en-us/library/ee792876.aspx">http://technet.microsoft.com/en-us/library/ee792876.aspx</a>
<li>SharePoint Server 2010 Full Crawl Sequence Demo &#8211; <a href="http://www.microsoft.com/resources/msdn/en-us/office/media/video/sharepointestc.html?uuid=f716a6eb-9b74-45fc-acab-a2909f80d2d9&amp;from=mscomsharepoint">http://www.microsoft.com/resources/msdn/en-us/office/media/video/sharepointestc.html?uuid=f716a6eb-9b74-45fc-acab-a2909f80d2d9&amp;from=mscomsharepoint</a></li>
</ul>
<p>For similar information but specific to FS4SP, this is the official documentation:
<ul>
<li>Manage crawling with the FAST Search Content SSA (FAST Search Server 2010 for SharePoint) &#8211; <a href="http://technet.microsoft.com/en-us/library/ff384288.aspx">http://technet.microsoft.com/en-us/library/ff384288.aspx</a></li>
</ul>
<p>If you got through here, but still manage to recall the FS4SP architecture diagram from the previous post, you probably noticed that in FS4SP there are a bunch of new components, each with their own function. As I mentioned above, the crawling piece of FS4SP when you use the FAST Content SSA to define content sources will work the same way as it does for SharePoint Search. Below is one of my previous posts trying to explain the crawling/processing/indexing flow in FS4SP:
<ul>
<li><a href="http://searchunleashed.wordpress.com/2011/03/16/understand-scale-and-monitor-crawling-processing-indexing-in-fast-search-for-sharepoint/">http://searchunleashed.wordpress.com/2011/03/16/understand-scale-and-monitor-crawling-processing-indexing-in-fast-search-for-sharepoint/</a></li>
</ul>
<p>Another difference in FS4SP is the ability to use one of the FAST Search specific connectors (Web content, Database content, Lotus Notes content). Those are the connectors that came from the previous standalone version of the FAST product, and to those not initiated in FAST administration they may look a little strange (command line utilities only? xml configuration files?). These FAST Search specific connectors are completely unknown to your SharePoint farm (SharePoint basically doesn&#8217;t even know they exist, as they reside directly on the FAST farm), which means that a SharePoint administrator will not have access to them through Central Administration, so you should be aware of that. My recommendation is that you always try to use the connectors through Central Administration (FAST Content SSA), and go to the FAST Search specific connectors only if you need a specific functionality that you can only get with them (such as the support for Lotus Notes security through the FAST Search Lotus Notes connector):
<ul>
<li>Manage crawling with the FAST Search specific connectors (FAST Search Server 2010 for SharePoint) &#8211; <a href="http://technet.microsoft.com/en-us/library/ff383272.aspx">http://technet.microsoft.com/en-us/library/ff383272.aspx</a></li>
</ul>
<p>Now that you understand how to crawl standard content with both SharePoint Search and FS4SP, it is time to understand how to bring in content from other external sources (beyond Web Sites, File Shares, etc.). So do yourself a big favor and learn about Business Connectivity Services (BCS) in SharePoint 2010. To me this is one of THE most important pieces of technology in SharePoint that can really make search shine, as it integrates with other sources in a company (databases, web services, whatever-you-want), bringing it all together inside SharePoint. The best part? It is a technology that works with both SharePoint Search and FS4SP seamlessly. The post below has the most detailed explanation I have ever found on how to create the basic External Content Types (to get content from a database, probably the most common scenario):
<ul>
<li>Searching External Data in SharePoint 2010 Using Business Connectivity Services &#8211; <a href="http://blogs.msdn.com/b/ericwhite/archive/2010/04/28/searching-external-data-in-sharepoint-2010-using-business-connectivity-services.aspx">http://blogs.msdn.com/b/ericwhite/archive/2010/04/28/searching-external-data-in-sharepoint-2010-using-business-connectivity-services.aspx</a></li>
</ul>
<p>If you are looking for extra credit as an applied student (as you should <img style="border-style:none;" class="wlEmoticon wlEmoticon-smile" alt="Smile" src="http://searchunleashed.files.wordpress.com/2011/06/wlemoticon-smile.png?w=640">), then don&#8217;t stop at BCS for search: explore the broader capabilities that BCS brings to SharePoint overall. Believe me, you won&#8217;t regret it.
<ul>
<li>BCS Overview Demo Part 1 of 3 &#8211; <a href="http://www.youtube.com/watch?v=82xzNsG0d5A">http://www.youtube.com/watch?v=82xzNsG0d5A</a></li>
<li>BCS Overview Demo Part 2 of 3 &#8211; <a href="http://www.youtube.com/watch?v=QUBqpYxkOEo">http://www.youtube.com/watch?v=QUBqpYxkOEo</a></li>
<li>BCS Overview Demo Part 3 of 3 &#8211; <a href="http://www.youtube.com/watch?v=aC15uqL-V0o">http://www.youtube.com/watch?v=aC15uqL-V0o</a></li>
</ul>
<p><em>Advanced Material on Crawling and Connectors</em>
<p>Through BCS you can also create your own connectors to link SharePoint with any external sources you want. The first post below is a great starting point on this, and is the exact post I first read to understand how this works:</p>
<ul>
<li>HOW TO: Create a Searchable SharePoint 2010 BDC .NET Assembly Connector Which Reads From A Flat File &#8211; <a href="http://www.toddbaginski.com/blog/archive/2009/11/05/how-to-create-a-searchable-sharepoint-2010-bdc-.net-assembly-connector-which-reads-from-a-flat-file.aspx">http://www.toddbaginski.com/blog/archive/2009/11/05/how-to-create-a-searchable-sharepoint-2010-bdc-.net-assembly-connector-which-reads-from-a-flat-file.aspx</a></li>
</ul>
<p>This second reference is a small gem buried on MSDN. It explains how to build something a lot of people ask for: a connector that aggregates metadata with an attached document and brings both together to be processed and indexed as one item (for example, indexing a candidate&#8217;s metadata along with his or her resume, so that users can search across both and get just one result). Powerful stuff.
<ul>
<li>Creating .NET Assemblies That Aggregate Data from Multiple External Systems for Business Connectivity Services in SharePoint Server 2010 &#8211; <a href="http://msdn.microsoft.com/en-us/library/ff728359.aspx">http://msdn.microsoft.com/en-us/library/ff728359.aspx</a></li>
</ul>
<p>Another frequently asked question is whether BCS can be used to crawl databases other than SQL Server. The article below explains how to do this for Oracle, and hints that you could do something similar for any other database that supports OLE DB or ODBC:
<ul>
<li>How to: Connect to an Oracle Database Using Business Connectivity Services &#8211; <a href="http://msdn.microsoft.com/library/ff464424(office.14).aspx">http://msdn.microsoft.com/library/ff464424(office.14).aspx</a></li>
</ul>
<p>This should keep you busy for a while. And remember that if you just want a quick way to get a server to try some of the things you read above, you can always play around with one of the MSDN Virtual Labs instances, such as <a href="http://go.microsoft.com/?linkid=9751769">this one here that will give you a VM with both SharePoint 2010 and FS4SP</a>.
<p>Didn’t understand some of the materials? Have other resources you want to share? By all means, feel free to comment below. <img style="border-style:none;" class="wlEmoticon wlEmoticon-smile" alt="Smile" src="http://searchunleashed.files.wordpress.com/2011/06/wlemoticon-smile.png?w=640"></p>
]]></content:encoded>
			<wfw:commentRss>http://searchunleashed.wordpress.com/2011/06/11/learning-roadmap-for-search-in-sharepoint-2010-including-fast-search-for-sharepoint-part-2-planning-scale-installation-and-deployment-and-crawling/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/4a7e4080ca4973b3374e7fd04c515fec?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">leonardocsouza</media:title>
		</media:content>

		<media:content url="http://searchunleashed.files.wordpress.com/2011/06/wlemoticon-smile.png" medium="image">
			<media:title type="html">Smile</media:title>
		</media:content>

	</item>
	</channel>
</rss>
