Understanding Crawled Properties, Managed Properties and Full-text Index – Part 1

A while ago I received an email from a former student asking for my thoughts on an issue in his environment. He was crawling content that had a metadata field he wanted to use as the title of the indexed items, but even after mapping it to the Title managed property, his custom title still was not being used in the results. “What could be going on?” he wondered.

My first thought after reading his email was “I think I know what the issue is”, followed closely by “Maybe I should look for another profession, since I was supposed to have taught this clearly in class”. I then explained to him what I will explain to you in this series of posts about crawled properties, managed properties, the full-text index, and how they all work together to make your search work.

To get to the bottom of this issue, we need to understand three concepts clearly (OK, to tell you the truth, you only need two of them for this particular issue, but the third one is also very important, so please play along):

  • Crawled Properties (this post – Part 1)
  • Managed Properties (future post – Part 2)
  • Full-text Index (future post – Part 3)

Crawled Properties

As the official documentation states: “Crawled properties are automatically extracted from crawled content and grouped by category based on the protocol handler or IFilter used.”

What this means is that crawled properties are metadata associated with your content (such as title and URL), and they can be found in two ways: during crawling of the content and during content processing.

Crawled properties found during crawling

When you crawl content from a SharePoint document library, each column in the library is metadata associated with the corresponding document, so each column ends up exposed as a crawled property. For example, in the document library below, every column will be exposed as a crawled property, including my custom column Department:

[Image: Document Library with custom column]

After my first crawl of this content, I can quickly check that my custom column was exposed as a crawled property either through PowerShell or through Central Administration. To do it through PowerShell, I can run this simple cmdlet:

Get-FASTSearchMetadataCrawledProperty -Name ows_Department

In my case, it was easy to locate my property, since I already knew that all custom columns in a SharePoint document library are exposed as crawled properties with the prefix “ows_”, and they are also assigned to the category SharePoint. Now, what if I did not know this, or if this metadata was coming from somewhere else?

Then I could simply use the name of the column to do a wildcard search for it:

Get-FASTSearchMetadataCrawledProperty -Name *department
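
When a wildcard returns more than one match, it helps to see which category each match belongs to. The following is only a sketch: it assumes you are running in the FS4SP shell, and that the returned objects expose properties named Name, CategoryName and VariantType (an assumption on my part, so confirm it by piping the cmdlet to Get-Member first).

```powershell
# Assumes the FS4SP shell, where Get-FASTSearchMetadataCrawledProperty is available.
# The property names Name, CategoryName and VariantType are assumptions;
# run "Get-FASTSearchMetadataCrawledProperty -Name *department | Get-Member" to confirm them.
Get-FASTSearchMetadataCrawledProperty -Name *department |
    Sort-Object CategoryName, Name |
    Format-Table Name, CategoryName, VariantType -AutoSize
```

A column coming from a SharePoint document library should show up here with the “ows_” prefix and the SharePoint category, as described above.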

Crawled properties found during content processing

The second way crawled properties can be found is during content processing. One example is the set of metadata properties you can define for Office documents, such as the ones shown below for a Microsoft Word document:

[Image: Word document properties]

Each one of the properties shown above is also extracted during content processing and exposed as a crawled property. Take the property Comments shown above, for example. Any information stored in this property will be extracted and exposed through the crawled property Office:6(Text).

Now you may be asking yourself: how am I supposed to know that a crawled property named Office:6(Text) will contain the values of the metadata property Comments from an Office document?

The only answer I have for that is a brilliant series of posts from Anne Stenberg, where she reverse-engineers the out-of-the-box crawled properties and their mappings to figure out what metadata they represent. In my case, all I had to do was check the post from her series that talks specifically about the crawled properties for Office documents.

Note: even though Anne’s articles mentioned above are all about SharePoint 2007 (MOSS 2007), so far they have been spot on every time I needed to find a specific crawled property in FAST Search for SharePoint.

Closing thoughts

OK, so now we know that crawled properties are metadata found either through crawling or through content processing, and we also know how to identify specific crawled properties associated with our content. The real deal, though, comes from understanding how we can use these crawled properties during search, as you can see below, where I’m searching for the text I entered into the Comments property of my Word document:

[Image: Search for Comments]

And this will be the subject of the next post in this series: Managed Properties. See you then! :)

About leonardocsouza

Mix together a passion for social media, search, recommendations, books, writing, movies, education, knowledge sharing plus a few other things and you get me as a result :)
This entry was posted in FS4SP.

20 Responses to Understanding Crawled Properties, Managed Properties and Full-text Index – Part 1

  1. Christine says:

    Great post! Hoping that part 2 comes out soon.

  2. drakeja1 says:

    Good information… looking forward to the next part of the post!

  3. Ben Lloyd says:

    Great post,

    I have a problem that is driving me crazy and I wonder if you can help.

    My problem is that any items that I tag are not having values created in the SSA->Metadata properties section. (the ows_tax_…values)

    I can tag them and confirm that the hidden /Lists/TaxonomyHiddenList is being updated, but after a full crawl the mapped and crawled properties are not being auto-created. (Even Microsoft is having trouble helping me.)

    All these actions are done via the UI as I am not a coder!

    On a different subject, I am able to create a “normal” metadata property, map it, and then get it to display in search refinement as you have done above.

    • Hi Ben!

      Are you referring to the tags that you can attribute to an item under the “Tags & Note” board? If that’s the case, these tags are searchable only through SharePoint Search (FAST Search for SharePoint has a different architecture and does not crawl those tags).

      Now, if you are having trouble locating the regular crawled properties for FAST Search for SharePoint, just note that they are located under FAST Query SSA -> FAST Search Administration -> Crawled property categories (notice how this is different from SP Search, where you can find the crawled properties through SSA -> Metadata properties).

      And if I completely missed the point in your questions, just let me know and I can try again :)

      Thanks,
      Leo

  4. manyag says:

    Thanks for the nice post. I also tried to put together an explanation of crawled and managed properties at http://manish-sharepoint.blogspot.co.uk/2012/06/crawled-properties-and-managed.html

  5. Brian Tsai says:

    Hi
    Referring to your sample, is it possible to omit the prefix “doccomments” in the filter textbox?
    What I would like is to enter the keyword directly, so that typing just “I wonder if I comment” is enough as the criteria.

    Thanks
    Brian

    • Yes, Brian. That is possible indeed. In this case you would need to add this content to be not only part of a managed property, but also part of the FullTextIndex. It’s one of the things I wanted to explore in parts 2 and 3, but never had the time :-/

      If you are still looking for this info, please let me know and I can try to get a screenshot of how to configure this from some publicly available FS4SP installation (since I don’t even have a test installation available anymore).

      Best,
      Leo

  6. Ben says:

    Thanks! Would you kindly supply links for the 2nd and 3rd parts?

    • I would love to, Ben! The problem is that those two other parts didn’t get written while I was still working with FS4SP. Now the likelihood of them happening is even lower. But if you have any specific questions (or even broader questions), please feel free to post them here and I will be glad to help (or maybe summarize in a post). I *may* be able to still remember some of the FS4SP stuff :)

      Best,
      Leo

  7. Ben says:

    Thanks Leo. My immediate problem is FullTextSqlQuery returns results only for Administrator users not *ordinary* users as explained here: http://social.technet.microsoft.com/Forums/en-US/sharepoint2010programming/thread/2d2c02c2-44bc-4c0f-8309-02c5b7e6c893

    • Thank you for the additional info, Ben! Unfortunately I will not be able to help you with this, as I haven’t played much (or, better said, “at all”) with the FullTextSqlQuery function in SharePoint Search. I hope someone picks up your question on TechNet though!

      Good luck,
      Leo

  8. craigsizer says:

    Leo, I know you have not had time to write the next 2 parts of this article, but can you at least tell us the answer to the original problem, i.e. “even after doing the proper mapping to the Title managed property he was still not able to get his custom title to be used and appear in the results”? What’s the trick here? I have the same issue.
    Thanks

    • I’m so sorry about the delay in replying to you with the answer! (and even more ashamed now that you made me realize I put out a question but never answered :-/ )

      The “problem” here is that if you check the settings for the managed property Title, you will notice that under the “Mappings to crawled properties” section this property is configured as “Include values from a single crawled property based on the order specified”. As documented in the MSDN article about managed properties, whenever this option is selected, FS4SP will fill the managed property using the value from only one of the crawled properties listed there. This means that if you just add your own crawled property to the list, by default it goes to the bottom, and therefore it will only be picked up if none of the crawled properties above it on the list has a value.

      To make sure your crawled property is picked up to assign a value for the managed property Title, you will want to select it and move it up on this list under the section “Mappings to crawled properties”. If you put it all the way to the top, for example, your crawled property will be used for the Title whenever it has some value.
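
The ordering rule described above can be sketched in plain PowerShell. This is not FS4SP code, just a small simulation of the behaviour as described: the function name Select-FirstMappedValue and the crawled property names ExistingTitleSource and ows_CustomTitle are hypothetical placeholders made up for illustration.

```powershell
# Hypothetical helper (not an FS4SP cmdlet): returns the value of the first
# crawled property in the ordered list that has a non-empty value for the item.
function Select-FirstMappedValue {
    param(
        [string[]]  $OrderedCrawledProperties,  # mapping list, top to bottom
        [hashtable] $ItemValues                 # crawled property name -> value
    )
    foreach ($name in $OrderedCrawledProperties) {
        $value = $ItemValues[$name]
        if (-not [string]::IsNullOrEmpty($value)) {
            return $value
        }
    }
    return $null
}

# Illustrative item: both names are placeholders, not real FS4SP crawled properties.
$item = @{
    'ExistingTitleSource' = 'Document1.docx'
    'ows_CustomTitle'     = 'Quarterly Sales Report'
}

# Custom property appended at the bottom of the list: the existing source wins.
$atBottom = Select-FirstMappedValue -OrderedCrawledProperties @('ExistingTitleSource', 'ows_CustomTitle') -ItemValues $item

# Custom property moved to the top of the list: it now provides the Title.
$atTop = Select-FirstMappedValue -OrderedCrawledProperties @('ows_CustomTitle', 'ExistingTitleSource') -ItemValues $item

$atBottom  # Document1.docx
$atTop     # Quarterly Sales Report
```

Note that moving the custom property to the top does not hide the other sources: if the custom value is empty for a given item, the loop falls through to the next crawled property on the list.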

      I hope that answers your question!

      Best,
      Leo
