Breaking Down The SEO Of Page Segmentation
01.15.09
By David Harry

One area that is worth looking at for SEO in 2009 (for me at least) is page segmentation. This approach really isn't new; I came across papers as far back as 1997 and beyond. Unsurprisingly, most IR methods don't just appear overnight. The big three of search have each produced research papers and patents on it, dating back as far as 2003-04. It just seems to be gaining traction, and it is sensible as well.

Essentially, page segmentation is when a search engine breaks a given web page down into its component parts. It can then analyze the page and assign different relevance or importance scores to different regions of it. Methods include fixed-length page segmentation (FixedPS), DOM-based segmentation (DomPS), segmentation based on location and white space (vision-based, or VIPS) and even a combined approach (CombPS).

As with many IR methodologies, the aim is to improve the signal-to-noise ratio. In this case, by identifying the noisy segments, resources can hopefully be focused on the relevant areas of a web page. Furthermore, most people do tend to understand web pages in a segmented or structured view. When you arrived at this page, did you instinctively know where to find the main content? Were you aware of the common locations for navigation and other elements? Banner blindness? You get the idea.

Advantages of page segmentation

The main advantages are increased relevance and streamlined processing. Search engines hope to use page segmentation to gain a more fine-grained understanding of a given page's relevance, and also (theoretically) to handle multi-topic pages, whether or not those topics are semantically related.
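The DOM-based approach and the idea of isolating noisy segments can be sketched in a few lines of Python. This is a minimal illustration under clearly stated assumptions, not any search engine's actual method: a hypothetical set of block-level tags (`BLOCK_TAGS`) marks segment boundaries, any segment whose text recurs on at least half of a site's sampled pages is treated as boilerplate (the 0.5 threshold is invented), and the remaining segments are ranked by simple query-term overlap. All function names are hypothetical.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical boundary set: each of these tags starts a new segment (DomPS-style assumption).
BLOCK_TAGS = {"div", "p", "li", "ul", "ol", "nav", "header", "footer", "aside",
              "section", "article", "td", "h1", "h2", "h3", "h4", "h5", "h6"}


class Segmenter(HTMLParser):
    """Collects the text between block-level tags as separate segments.

    Simplification: <script>/<style> contents are not filtered out.
    """

    def __init__(self):
        super().__init__()
        self.segments = []
        self._buffer = []

    def _flush(self):
        # Collapse whitespace and store the pending text as one segment.
        text = " ".join(" ".join(self._buffer).split())
        if text:
            self.segments.append(text)
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        self._buffer.append(data)

    def close(self):
        super().close()
        self._flush()  # emit trailing text that no block tag closed


def segment_page(html):
    parser = Segmenter()
    parser.feed(html)
    parser.close()
    return parser.segments


def find_boilerplate(pages, threshold=0.5):
    """Segments appearing on at least `threshold` of a site's pages (hypothetical rule)."""
    counts = Counter()
    for html in pages:
        counts.update(set(segment_page(html)))  # count each segment once per page
    cutoff = threshold * len(pages)
    return {seg for seg, n in counts.items() if n >= cutoff}


def content_segments(html, boilerplate):
    """Keep only the segments not flagged as boilerplate."""
    return [seg for seg in segment_page(html) if seg not in boilerplate]


def _tokens(text):
    return set(re.findall(r"\w+", text.lower()))


def score_segments(segments, query):
    """Rank segments by the fraction of query terms each one contains."""
    terms = _tokens(query)
    if not terms:
        return [(0.0, seg) for seg in segments]
    scored = [(len(terms & _tokens(seg)) / len(terms), seg) for seg in segments]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Usage: three pages from one hypothetical site sharing nav and footer blocks.
NAV = "<nav><a href='/'>Home</a> <a href='/blog'>Blog</a></nav>"
FOOT = "<footer>Copyright Example Pizza Guide</footer>"
pages = [
    NAV + "<article><p>Stone oven pizza in Toronto: our top picks.</p>"
          "<p>Our favourite hiking trails near Ottawa.</p></article>" + FOOT,
    NAV + "<article><p>Best espresso bars downtown.</p></article>" + FOOT,
    NAV + "<article><p>Late-night noodle shops reviewed.</p></article>" + FOOT,
]
boilerplate = find_boilerplate(pages)
main = content_segments(pages[0], boilerplate)
ranked = score_segments(main, "stone oven pizza toronto")
print(main)
print([score for score, _ in ranked])  # [1.0, 0.0]
```

Scoring each segment separately is what lets the multi-topic first page match the pizza query on one block while the hiking block scores zero. Real engines presumably rely on much richer signals (visual gaps, position on the page, link density) than exact text repetition and term overlap; this only shows the shape of the idea.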
The second advantage, processing and resource management, could be achieved by defining site templates so that only the relevant parts of a page are crawled and indexed, not the boilerplate elements. While there are a few ways of going about it, what's important here is that such systems are sensible not only from a relevance perspective but could also help with crawling and indexing resource management. One has to imagine that new ideas at the big three will be tempered by a volatile economy. Once a template has been established, indexing a site on a regular basis could be far easier on a search engine (and on the site owner as well). Just have a little 'template bot' crawl a few pages now and again to ensure the profile is unchanged... but I'm rambling now.

Another implementation (as noted in the Google patent) could involve pages that contain a number of geographic listings. A search for 'stone oven pizza, Toronto' could produce better results, as larger listings of pizza shops in Toronto could be segmented and digested using more granular parameters than normal. "The text associated with the smallest hierarchical level surrounding a business listing may be associated with that business listing" (Google patent: Document segmentation based on visual gaps).

Segmenting the page

I shan't trouble you with the nuts and bolts (links later, as always), but the approaches range from code analysis (DOM) to vision-based methods. The main idea is to establish the common (boilerplate) segments of a web page. From there, the systems can be tuned to even more granular levels to find an optimal rate (playing with the dials).

Continue reading this article.

About the Author: David Harry is the President of Reliable SEO and has been building and marketing websites since 1998. He can be found writing about search and internet marketing on the Fire Horse Trail and is the author of the SEO Handbook series.
http://www.reliable-seo.com
http://www.huomah.com
http://www.the-seo-handbook
-- SEOArticles is an iEntry, Inc. publication -- iEntry, Inc. 2549 Richmond Rd. Lexington KY, 40509. 2009 iEntry, Inc. All Rights Reserved.