Preparing PDF Documents for Search
When deploying search technologies to client websites, I frequently get feedback from clients who aren't entirely happy with the way their search results are returned or ranked. Nine times out of ten, this is due entirely to inappropriate document properties, in particular the internal title of the document. Search technologies such as Google, Microsoft Index Server and Verity usually give the highest ranking to search terms found in the title of the document. They also typically use the value of this title when displaying search results. The title is easy enough to set in HTML documents, but what about Word and PDF documents?
A typical instance of the kind of problem I see all the time is a PDF document whose title is simply the original filename of the Quark document that was used to create it. Such filenames are rarely indicative of the content to be found, and these obscure titles do nothing to help establish the relevance of the documents.
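One way to spot this kind of title across a whole site is a simple check on the title string itself. The following is only a minimal sketch: the function name `looks_like_filename` and the list of extensions are my own assumptions for illustration, not part of any search product, and you would extend the list to match the authoring tools used on your own site.

```python
import re

# Assumed list of source-file extensions (hypothetical; extend for your site).
_SOURCE_EXTENSIONS = ("qxd", "qxp", "indd", "doc", "pdf")
_FILENAME_PATTERN = re.compile(
    r"\.(?:" + "|".join(_SOURCE_EXTENSIONS) + r")$", re.IGNORECASE
)


def looks_like_filename(title):
    """Return True if a document title is missing or looks like a leftover filename."""
    cleaned = (title or "").strip()
    if not cleaned:
        # A blank title is just as unhelpful in search results as a filename.
        return True
    # Flag titles that end in a known source-file extension.
    return bool(_FILENAME_PATTERN.search(cleaned))


# Example titles (made up for illustration).
for sample in ["AR2004_final_v3.qxd", "Annual Report 2004", ""]:
    print(repr(sample), looks_like_filename(sample))
```

A title flagged by this check is a candidate for rewriting by hand; the check cannot tell you what the title should be.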
The problem, however, is that Acrobat does not allow us to set a new title here (Acrobat is essentially a file viewing tool, not a file editing tool), so to do this you would typically use a PDF editing tool. As we are only interested in editing the title in this instance, there are fortunately a number of free utilities available in the public domain that will allow us to do this. For example, Bureausoft produce a utility called PDF Info, which you can download to help with this task.
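If you have many documents to treat, the title can also be set from a script. The sketch below is an alternative to the utility mentioned above, not a description of how it works. It assumes the third-party Python library pypdf is installed; the function names `title_metadata` and `set_pdf_title` and the file names in the usage comment are hypothetical. As written, it copies the pages and writes only the new title, so other document properties such as author are not carried over.

```python
from pathlib import Path


def title_metadata(title):
    """Build the document-information entry for a PDF title, rejecting blanks."""
    cleaned = " ".join(title.split())  # collapse stray whitespace
    if not cleaned:
        raise ValueError("title must not be empty")
    return {"/Title": cleaned}


def set_pdf_title(src, dst, title):
    """Write a copy of the PDF at src to dst with a new internal title."""
    # Assumption: pypdf is installed (pip install pypdf). Imported here so the
    # helper above can be used without it.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(str(src))
    writer = PdfWriter()
    for page in reader.pages:
        writer.add_page(page)
    writer.add_metadata(title_metadata(title))
    with open(Path(dst), "wb") as handle:
        writer.write(handle)


# Hypothetical usage:
# set_pdf_title("AR2004_final_v3.pdf", "annual-report-2004.pdf", "Annual Report 2004")
```

Writing to a new file rather than overwriting the original keeps an untouched copy in case the result needs checking.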
Treating your PDF documents in this way will make it easier for people to find the most relevant documents on your site, from both your internal search engine and external search engines.





