Understanding Crawled Properties, Managed Properties and Full-text Index – Part 1

A while ago I received an email from a former student asking for my thoughts on an issue in his environment. He was crawling content that had a metadata field he wanted to use as the title of the indexed items, but even after mapping it to the Title managed property, his custom title still was not being used in the results. “What could be going on?” he wondered.

My first thought after reading his email was “I think I know what the issue is”, followed closely by “Maybe I should look for another profession, since I was supposed to have taught this clearly during class”. Then I explained to him what I will explain to you in this series of posts: Crawled Properties, Managed Properties, the Full-text Index, and how they all work together to make your search work.

To get to the bottom of this issue, we need a clear understanding of three concepts (ok, truth be told, only two of them matter for this particular issue, but the third one is also very important, so please play along!):

  • Crawled Properties (this post – Part 1)
  • Managed Properties (future post – Part 2)
  • Full-text Index (future post – Part 3)

Crawled Properties

As the official documentation states: “Crawled properties are automatically extracted from crawled content and grouped by category based on the protocol handler or IFilter used.”

In other words, crawled properties are the metadata associated with your content (such as title and URL), and they can be discovered at two different moments: while the content is being crawled, and while it is being processed.

Crawled properties found during crawling

When you crawl content from a SharePoint Document Library, each column in the library is metadata associated with the corresponding document, so each one ends up exposed as a crawled property. For example, in the document library below, each column will be exposed as a crawled property, including my custom column Department:

Document Library with custom column

After my first crawl of this content, I can quickly check that my custom column was exposed as a crawled property either through PowerShell or through Central Administration. To do it through PowerShell, I can run this simple cmdlet:

Get-FASTSearchMetadataCrawledProperty -Name ows_Department


In my case, it was easy to locate my property, since I already knew that all custom columns in a SharePoint document library are exposed as crawled properties with the prefix “ows_”, and they are also assigned to the category SharePoint. Now, what if I did not know this, or if this metadata was coming from somewhere else?

Then I could simply use the name of the column to do a wildcard search for it:

Get-FASTSearchMetadataCrawledProperty -Name *department
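If you know a column's display name but not the exact crawled property name, you can combine the naming convention and the wildcard search above. The sketch below is plain PowerShell and does not call FS4SP at all. It only models the lookup. It assumes (1) that the crawled property is named "ows_" plus the column's internal name, in which SharePoint encodes a space as _x0020_ (other special characters get encoded too, but only spaces are handled here), and (2) that the -Name wildcard behaves like PowerShell's case-insensitive -like operator. Both helper functions and the sample property list are hypothetical.

```powershell
# Hypothetical helper: guess the crawled property name for a SharePoint column.
# Assumption: the name is "ows_" + the column's internal name, where SharePoint
# encodes each space as "_x0020_" (other special characters are not handled here).
function Get-ExpectedCrawledPropertyName {
    param([Parameter(Mandatory = $true)][string]$ColumnName)
    'ows_' + ($ColumnName -replace ' ', '_x0020_')
}

# Hypothetical helper: emulate a lookup such as "-Name *department" over a list of names.
# Assumption: matching works like -like (case-insensitive; '*' matches any run of characters).
function Find-CrawledPropertyName {
    param(
        [Parameter(Mandatory = $true)][string[]]$Names,
        [Parameter(Mandatory = $true)][string]$Pattern
    )
    $Names | Where-Object { $_ -like $Pattern }
}

# Sample data only; on a real system the names would come from Get-FASTSearchMetadataCrawledProperty.
$sampleNames = @('ows_Department', 'ows_Cost_x0020_Center', 'ows_Title', 'Office:6(Text)')

Get-ExpectedCrawledPropertyName -ColumnName 'Cost Center'             # ows_Cost_x0020_Center
Find-CrawledPropertyName -Names $sampleNames -Pattern '*department'  # ows_Department
Find-CrawledPropertyName -Names $sampleNames -Pattern 'department'   # no output: the ows_ prefix is not covered
```

The last call shows why the leading asterisk matters. A pattern without it has to match the whole name, prefix included.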

Crawled properties found during content processing

The second way crawled properties can be found is during content processing. One example is the set of metadata properties you can define for Office documents, such as the ones shown below for a Microsoft Word document:

Word document properties

Each of the properties shown above is extracted during content processing and exposed as a crawled property. Take the Comments property, for example: any information stored in it is extracted and exposed through the crawled property Office:6(Text).

Now you may be asking yourself: how am I supposed to know that a crawled property named Office:6(Text) will contain the values of the metadata property Comments from an Office document?

The only answer I have is this brilliant series of posts by Anne Stenberg, in which she reverse-engineers the out-of-the-box crawled properties and their mappings to figure out what metadata each one represents. In my case, all I had to do was check the post in her series that covers the crawled properties for Office documents.

Note: even though Anne’s articles mentioned above are all about SharePoint 2007 (MOSS 2007), so far they have been spot on every time I needed to find a specific crawled property in FAST Search for SharePoint.
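The basic Office summary fields seem to follow a pattern worth knowing about. The 6 in Office:6(Text) matches the ID that Comments has in the standard OLE SummaryInformation property set (PIDSI_COMMENTS = 6). The sketch below assumes the other summary fields follow the same numbering and use the same (Text) suffix. Only the Comments mapping is confirmed in this post, so check any generated name with Get-FASTSearchMetadataCrawledProperty before relying on it. The helper function is hypothetical.

```powershell
# IDs from the OLE SummaryInformation property set (the PIDSI_* constants).
# Assumption: FS4SP exposes each of these as a crawled property named "Office:<id>(Text)".
$summaryInfoIds = @{
    Title      = 2   # PIDSI_TITLE
    Subject    = 3   # PIDSI_SUBJECT
    Author     = 4   # PIDSI_AUTHOR
    Keywords   = 5   # PIDSI_KEYWORDS
    Comments   = 6   # PIDSI_COMMENTS (confirmed above as Office:6(Text))
    Template   = 7   # PIDSI_TEMPLATE
    LastAuthor = 8   # PIDSI_LASTAUTHOR
}

# Hypothetical helper: build the expected crawled property name for a summary field.
function Get-OfficeCrawledPropertyName {
    param([Parameter(Mandatory = $true)][string]$Field)
    if (-not $summaryInfoIds.ContainsKey($Field)) {
        throw "No SummaryInformation ID known for '$Field'"
    }
    'Office:{0}(Text)' -f $summaryInfoIds[$Field]
}

Get-OfficeCrawledPropertyName -Field Comments   # Office:6(Text)
Get-OfficeCrawledPropertyName -Field Title      # Office:2(Text), if the assumption holds
```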

Closing thoughts

Ok, so now we know that crawled properties are metadata found either through crawling or through content processing, and we also know how to identify specific crawled properties associated with our content. The real payoff, though, comes from understanding how we can use these crawled properties during search, as you can see below, where I’m searching for the contents I entered into the Comments property of my Word document:

Search for Comments

And this will be the subject of the next post in this series: Managed Properties. See you then! :)


About leonardocsouza

Mix together a passion for social media, search, recommendations, books, writing, movies, education, knowledge sharing plus a few other things and you get me as result :)
This entry was posted in FS4SP.

29 Responses to Understanding Crawled Properties, Managed Properties and Full-text Index – Part 1

  1. Christine says:

    Great post! Hoping that part 2 comes out soon.

  2. drakeja1 says:

    Good information… looking forward to the next part of the post!

  3. Ben Lloyd says:

    Great post,

    I have a problem that is driving me crazy and I wonder if you can help.

    My problem is that the items I tag are not getting values created in the SSA -> Metadata properties section (the ows_tax_… values).

    I can tag them and confirm that the hidden /Lists/TaxonomyHiddenList is being updated, but after a full crawl the mapped and crawled properties are not being created automatically (even Microsoft is having trouble helping me).

    All these actions are done via the UI as I am not a coder!

    On a different note, I am able to create a “normal” metadata property, map it, and then get it to display in search refinement as you have done above.

    • Hi Ben!

      Are you referring to the tags that you can attribute to an item under the “Tags & Note” board? If that’s the case, these tags are searchable only through SharePoint Search (FAST Search for SharePoint has a different architecture and does not crawl those tags).

      Now if you are having trouble locating the regular Crawled Properties for FAST Search for SharePoint, just note that they are located under FAST Query SSA -> FAST Search Administration -> Crawled property categories (notice how this is different from SP Search, where you can find the crawled properties through SSA -> Metadata properties).

      And if I completely missed the point in your questions, just let me know and I can try again :)

      Thanks,
      Leo

  4. manyag says:

    Thanks for the nice post. I also tried to write an explanation of Crawled and Managed properties at http://manish-sharepoint.blogspot.co.uk/2012/06/crawled-properties-and-managed.html

  5. Brian Tsai says:

    Hi
    Referring to your sample, is it possible to omit the prefix “doccomments” in the filter textbox?
    I would like to enter the keyword directly, i.e. just “I wonder if I comment” as the criteria.

    Thanks
    Brian

    • Yes, Brian. That is possible indeed. In this case you would need to include this content not only in a managed property but also in the FullTextIndex. It’s one of the things I wanted to explore in parts 2 and 3, but never had the time :-/

      If you are still looking for this info, please let me know and I can try to get a screenshot of how to configure this from some publicly available FS4SP installation (since I don’t even have a test installation available anymore).

      Best,
      Leo

  6. Ben says:

    Thanks! Would you kindly supply links for the 2nd and 3rd parts?

    • I would love to, Ben! The problem is that those two other parts didn’t get written while I was still working with FS4SP. Now the likelihood of them happening is even lower. But if you have any specific questions (or even broader questions), please feel free to post them here and I will be glad to help (or maybe summarize in a post). I *may* be able to still remember some of the FS4SP stuff :)

      Best,
      Leo

  7. Ben says:

    Thanks Leo. My immediate problem is FullTextSqlQuery returns results only for Administrator users not *ordinary* users as explained here: http://social.technet.microsoft.com/Forums/en-US/sharepoint2010programming/thread/2d2c02c2-44bc-4c0f-8309-02c5b7e6c893

    • Thank you for the additional info, Ben! Unfortunately I will not be able to help you with this, as I haven’t worked much (or, rather, “at all”) with the FullTextSqlQuery function in SharePoint Search. I hope someone picks up your question on TechNet, though!

      Good luck,
      Leo

  8. craigsizer says:

    Leo, I know you have not had time to write the next 2 parts of this article, but can you at least tell us the answer to the original problem, i.e. “even after doing the proper mapping to the Title managed property he was still not able to get his custom title to be used and appear in the results”? What’s the trick here? I have the same issue.
    Thanks

    • I’m so sorry about the delay in replying to you with the answer! (and even more ashamed now that you made me realize I put out a question but never answered :-/ )

      The “problem” here is that if you check the settings for the Title managed property, you will notice that under the “Mappings to crawled properties” section it is configured as “Include values from a single crawled property based on the order specified”. As documented in the MSDN article about Managed Properties, when this option is selected FS4SP fills the managed property using the value of only one of the crawled properties listed there. This means that if you simply add your own crawled property to the list, it goes to the bottom by default, and it will only be picked up if none of the crawled properties above it has a value.

      To make sure your crawled property is picked up to assign a value for the managed property Title, you will want to select it and move it up on this list under the section “Mappings to crawled properties”. If you put it all the way to the top, for example, your crawled property will be used for the Title whenever it has some value.

      I hope that answers your question!

      Best,
      Leo

  9. Pingback: Building a custom pipeline extensibility for FastSearch for SharePoint 2010 | Big Data Unleashed!

  10. Pingback: All Things of World

  11. Azhar says:

    Hi, this is a superb article… thanks, Leo!

    But I have a problem in my environment. The full crawl of the FAST content source is taking far too long, 30+ hours, and still not completing. Also, the crawled properties I expect to see in the mappings window are listed under “metadata properties” but not under “managed properties”, which is where I have to map the metadata to crawled properties… Any help would be really appreciated…

    • Thank you for the nice comments, Azhar! :)

      About your questions, the first thing I would inspect is the crawl that never completes. Has it ever completed before? If so, how long did it take? Did anything change recently?

      To better debug this crawl-that-never-ends, I would check the performance counters related to content feeding from SharePoint to FS4SP. You can find these performance counters on the server hosting the FAST Content SSA under “OSS Search FAST Content Plugin”; check all of the counters starting with Batches* (especially “Batches Failed” and “Batches Open”).

      Now about the second part of your question: I’m not sure I understood it correctly, but when you do a first crawl of a content source, its metadata will always show up first as “metadata properties” (if this is pure SharePoint Search) or “crawled properties” (if this is FAST Search for SharePoint), and then you need to map the metadata/crawled property to a managed property in order to use it.

      Hope that helps!

      Best,
      Leo

      • Azhar says:

        Hi Leo thanks for the quick response:)

        I want to tell you the background of my environment:
        FAST is hosted in one dev server(A) & We are crawling in another dev server(B).

        Coming to crawling – it has never completed :(

        I think that once the crawl of the FAST content source completes, I will be able to see the crawled properties in the FAST managed properties…

        And on my second question:
        There are two searches: 1) SharePoint and 2) FAST.
        Crawling in SharePoint completes in about 2-3 minutes, and I can see all the crawled properties in the metadata properties of SharePoint Search.

        And the metadata properties for FAST can be seen under FAST Query SSA -> managed properties, correct?

        But there I am not able to see any crawled properties…

      • Hi Azhar!

        If crawling is never completing, then there is something funky going on with your configuration (network, permissions, etc.). I would inspect those performance counters I mentioned before, to check if there are any batches actually being sent from the FAST Content SSA (which resides in a SharePoint server) to the FAST Search server.

        About the properties: first you need to identify the crawled properties you want in FS4SP, and then map them to a managed property. You can access the FAST-related crawled properties on the FAST Search Administration page, under Property Management, by clicking Crawled property categories. This article has a lot more detailed info on it: http://office.microsoft.com/en-us/fast-search-server-help/property-management-HA010382016.aspx

        Hope that helps! If not, just let me know :)

        Best,
        Leo

  12. Gurjit says:

    Dear Leo,
    I have a question I have not been able to find the answer to: does the crawler go through the views in SharePoint libraries to get to item-level content, or does it access the default view?

  13. Israel says:

    Dear Leo

    What about the other two posts?

    The first one is very nice, and for beginners the next two posts would be very useful.

    • Hi Israel!

      Thank you for the nice comment! It’s really a shame I never got a chance to write up the other posts in the series. It turns out I left Microsoft a little after this post, and I haven’t touched FS4SP ever since.

      I wrote this post precisely because I could never find anywhere a simple, straightforward explanation of how this worked.

      If you do have any questions or concepts you would like to understand better though, please feel free to post them here, and I can do my best to try to answer (I may still remember some things from FS4SP after these 2 years!).

      Best,
      Leo

  14. SD says:

    Hello,

    I was looking forward to parts 2 and 3 of this post. We are having exactly the same issue, where the search returns the first few words from a document in the search results instead of the title of the document (Word/Excel/PowerPoint).
    I have associated a bunch of crawled properties with the Title metadata property, but the search results still show the wrong information in the title field.
    Some blogs pointed to a registry change to fix this, which we initially made in Dev, and then we reset the index, but the issue wasn’t resolved immediately. Then I created the association between the crawled properties and the “Title” metadata property and ran a full crawl several times, and finally the search returns correct titles in the results. I am trying the same in QA right now and hopefully can do the same in Prod to resolve the issue. But I was reading this post to get a better understanding of why this happens and how to avoid it.

    Thanks.

    • Hi there!

      I’m happy to hear this post was useful to you! And yes, I wish I had finished this series before I stopped working with SharePoint Search. One of the most fun aspects of working with this technology was precisely helping to spread some deeper knowledge about how things worked under the hood of the platform.

      If you have other questions, please feel free to ask here. I haven’t been working with SharePoint/FAST for a while, but I may still have some memory of it left :)

      Best,
      Leo
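The reply to craigsizer above answers the Title question from the beginning of the post, and that answer comes down to a simple selection rule. The sketch below models it in plain PowerShell. It assumes that “Include values from a single crawled property based on the order specified” means “use the first crawled property in the list that has a value for this item”. The function and the crawled property names (BuiltInTitle, ows_CustomTitle) are hypothetical placeholders, not the actual mapping list of the Title managed property.

```powershell
# Hypothetical model of the "Include values from a single crawled property based on the order specified" option.
# $OrderedMappings: crawled property names, top to bottom, as listed under "Mappings to crawled properties".
# $ItemValues: crawled property values extracted for one indexed item.
function Resolve-ManagedPropertyValue {
    param(
        [Parameter(Mandatory = $true)][string[]]$OrderedMappings,
        [Parameter(Mandatory = $true)][hashtable]$ItemValues
    )
    foreach ($name in $OrderedMappings) {
        $value = $ItemValues[$name]
        if (-not [string]::IsNullOrEmpty($value)) {
            return $value   # the first crawled property with a value wins; the rest are ignored
        }
    }
    return $null            # no mapped crawled property had a value for this item
}

# One sample item: both the built-in source and the custom column have values.
$item = @{ 'BuiltInTitle' = 'Document1'; 'ows_CustomTitle' = 'Q3 Budget Review' }

# Custom crawled property appended at the bottom of the list: the built-in value is used.
Resolve-ManagedPropertyValue -OrderedMappings @('BuiltInTitle', 'ows_CustomTitle') -ItemValues $item   # Document1

# Custom crawled property moved to the top: its value is used.
Resolve-ManagedPropertyValue -OrderedMappings @('ows_CustomTitle', 'BuiltInTitle') -ItemValues $item   # Q3 Budget Review
```

Moving the custom property to the top of “Mappings to crawled properties” corresponds to the second call. It wins whenever it has a value, and the next property in the list serves only as a fallback when it is empty.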
