July 20
Indexing files larger than 16 MB in SharePoint 2010

As with previous versions of SharePoint, SharePoint 2010 will not index the contents of files larger than 16 MB. There are a couple of reasons for this, such as the network usage of pulling large files across and the time it takes to break them apart. While the file itself isn't indexed, its metadata is. So you'll be able to find a 17 MB or larger file by searching for its name or its author, but you won't be able to find it by searching for words that exist in it.

With previous versions of SharePoint, the fix for this was to add a Registry key called "MaxDownloadSize" and set it to a number between 17 and 64. That tells the search engine to ignore the 16 MB limit and go ahead and index files all the way up to 64 MB in size. In SharePoint 2010 this has changed a bit. The indexer still doesn't download files larger than 16 MB, so that's the same. The way to fix it is different, though. Thanks to the invention of PowerShell, we can use that instead of getting our hands dirty in the Registry.

Here's the PowerShell code:

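# Get the search service application (assumes the farm has only one)
$s = Get-SPEnterpriseSearchServiceApplication
# Show the current limit in MB (16 by default)
$s.GetProperty("MaxDownloadSize")
# Raise the limit to 25 MB and save the change
$s.SetProperty("MaxDownloadSize",25)
$s.Update()
# Restart the search service so the new limit takes effect
Restart-Service osearch14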

 

In practice, GetProperty shows that the default value is still 16 MB, and that is easily changed to something like 25 MB. We also need to bounce the search service for the change to take effect. Then, after your next full crawl, the contents of files larger than 16 MB (and up to the new limit) will be indexed.
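If you don't want to wait for the next scheduled full crawl, here is a minimal sketch of starting one from PowerShell. It assumes a single search service application, like the code above, and the function name Start-FullCrawlIfIdle is made up for illustration. It only touches content sources that are currently idle and writes out the names of the ones it started. Treat it as a starting point and try it in a test farm first.

```powershell
# Hypothetical helper (not a SharePoint cmdlet): starts a full crawl on
# every content source that is idle and outputs the names it started.
function Start-FullCrawlIfIdle {
    # Assumes the farm has exactly one search service application
    $ssa = Get-SPEnterpriseSearchServiceApplication
    $sources = Get-SPEnterpriseSearchCrawlContentSource -SearchApplication $ssa

    foreach ($source in $sources) {
        # Skip sources that are already crawling or paused
        if ($source.CrawlStatus -eq 'Idle') {
            $source.StartFullCrawl()
            $source.Name
        }
    }
}

# Usage, from the SharePoint 2010 Management Shell:
# Start-FullCrawlIfIdle
```

Starting only idle sources avoids interfering with a crawl that is already running.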

How do you know if you have documents larger than 16 MB? Unfortunately that seems to have changed for the worse in SharePoint 2010. In SharePoint 2007, if the indexer came across a file larger than 16 MB it would throw a warning in the crawl log. SharePoint 2010 doesn't seem to do this. I haven't found a way to determine which files are skipped because they are larger than the current MaxDownloadSize setting. If anyone knows how to determine this, let me know.
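One rough workaround is to go looking for big files yourself. The sketch below walks the document libraries in a site collection and lists every file whose size is over a given limit. It does not read the crawl log, so it shows candidates by size and not what the crawler actually skipped. The function names and the example URL are made up for illustration, and on a large farm this walk could take a long time.

```powershell
# Hypothetical helper: $true when a file is larger than the limit (in MB).
function Test-ExceedsDownloadLimit {
    param([long]$LengthBytes, [int]$LimitMB = 16)
    return $LengthBytes -gt ($LimitMB * 1MB)
}

# Hypothetical helper: lists files over the limit in one site collection.
function Get-OversizedDocument {
    param([string]$SiteUrl, [int]$LimitMB = 16)

    $site = Get-SPSite $SiteUrl
    try {
        foreach ($web in $site.AllWebs) {
            try {
                foreach ($list in $web.Lists) {
                    # Only document libraries hold files
                    if ($list.BaseType -ne 'DocumentLibrary') { continue }
                    foreach ($item in $list.Items) {
                        $file = $item.File
                        if ($file -and (Test-ExceedsDownloadLimit -LengthBytes $file.Length -LimitMB $LimitMB)) {
                            New-Object PSObject -Property @{
                                Url    = $file.ServerRelativeUrl
                                SizeMB = [math]::Round($file.Length / 1MB, 1)
                            }
                        }
                    }
                }
            }
            finally { $web.Dispose() }   # release each web when done
        }
    }
    finally { $site.Dispose() }
}

# Usage (the URL is a placeholder):
# Get-OversizedDocument -SiteUrl 'http://intranet' -LimitMB 16
```

Pass the same number you put in MaxDownloadSize as -LimitMB to see which files would still be too large after the change.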

tk

Comments

Perfect timing!

Hot topic for me, thanks Todd. 
 on 7/21/2010 9:57 AM

The warning is still logged

Hi Todd,

I was following up on this, and found that (at least in the environment I was testing) the warning does still get recorded with the message "The file reached the maximum download limit. Check that the full text of the document can be meaningfully crawled."

 - Woody -
 on 7/21/2010 10:16 AM

Re: The warning is still logged

Hey Woody,
Is it logged in the crawl log? I swear it wasn't there when I checked.

tk
Todd O. Klindt on 7/22/2010 10:11 PM

Max size limit in SP 2010

Hi Tk,
As you said, the size limit in 2007 can be extended up to 64 MB. What about SP 2010? How far can the size limit be extended in SP 2010?

Thanks
Ani
 on 8/8/2010 7:44 PM

Time-outs?

First off, thanks Todd!

I was looking for this info but was only finding the solution for SP2007. There they also talk about time-outs occurring while crawling because of the increased maximum size of files. I currently have a setup with a limited number of pdf-files that's working great. I am almost ready to put this into production and enlarge the number of files from 100 now to a few thousand with a filesize ranging from 5MB to 50MB.

Did anyone working with larger content sources experience time-outs while crawling in SP2010?
 on 11/25/2010 10:11 AM

Time outs, etc.

Thanks Todd for this info, great to find the 2010 update to the 2007 registry hack.

We are running some tests with all kinds of large files to see how SharePoint (and FAST) behaves in these scenarios.

I'll come back in a while and update any findings we have ref: max limits (256 we think) plus time outs etc.

Todd - the errors are recorded for sure in the 2010 crawl logs but there seems to be different behaviour between a full crawl and an incremental crawl.  Again, I'll update here if we find anything worth reporting.

Best

Seb
twitter.com/sebmatthews
 on 2/5/2011 11:44 AM

Disc Spaces Requirements for 64 MB increase

Todd, we have heard of surprisingly high disc space requirements for SharePoint indexing compared to other document management products. For example, 4 TB of content = 3.3 TB of disc space for the index and the query, crawl and search admin databases. Will increasing this value to 64 MB have a noticeable impact on the disc space requirements for SharePoint indexing, and have you heard of a way to increase the value beyond 64 MB? Our organization has many PowerPoint presentations greater than 64 MB.
c3.jones@ngc.com
 on 10/11/2011 2:41 PM

Re: Disc Spaces Requirements for 64 MB increase

I think those numbers are very, very high. Usually the space needed for the index files and databases is 10% or less of the size of your content. I can't imagine 4 TB needing 3.3 TB for search.

If you bump the filesize to 64 MB you'll be indexing more files, so it will take more space. But think about those 63 MB PowerPoints. They have very few words in them. So while they add 63 MB to the size of your content, they don't take anywhere near 63 MB of space in search.

My advice is to do this in a test environment and bring over a content database or two from production and just see how much space it really takes. I think you'll find it's way less than you were led to believe.

tk
Todd O. Klindt on 10/12/2011 1:53 PM

Index is too large

My problem is that I only have 100GB allocated to the index. I can crawl all 7TB of my data and the total index is about 94 to 96GB. The search service works perfectly until the next crawl. Because incremental crawls are additive, and full crawls consume the needed space before eliminating the old full crawl, the search service will fail when the drive runs out of space.

Is there a way to trim the crawl by indexing less of each document? Any suggestions would help.

Thanks,
Greg
 on 1/16/2012 11:33 AM

Re: Index is too large

Greg,
I think you might be out of luck. You might be able to reset the Index and do a new crawl. I think the real solution is to get more drive space for your index. You could also split the Index Files across two index servers. That will buy you some time.

tk
Todd O. Klindt on 1/18/2012 9:49 PM