The Boundaries of Search Engine Technology:
"No smart engineer would ever build a search engine that requires websites to follow certain rules or principles in order to be ranked or indexed. Anyone with half a brain would want a system that can crawl through any architecture, parse any amount of complex or imperfect code, and still find a way to return the most relevant results, not the ones that have been 'optimized' by unlicensed search marketing experts."
The leading search engines all operate on the same basic principles. Automated search bots crawl the web, follow links, and index content in massive databases. They accomplish this with impressive artificial intelligence, but current search technology is not all-powerful. Numerous technical limitations cause significant problems in both inclusion and rankings.
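To make the crawl-follow-index loop concrete, here is a minimal sketch of a breadth-first crawler in Python using only the standard library. The seed URL, the page limit, and the idea of storing raw HTML as the "index" are simplifying assumptions for illustration, not how a production engine actually works.

```python
import urllib.request
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collects href values from anchor tags on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(seed_url, max_pages=10):
    """Breadth-first crawl: fetch a page, index it, queue its links."""
    seen, queue, index = {seed_url}, deque([seed_url]), {}
    while queue and len(index) < max_pages:
        url = queue.popleft()
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                html = resp.read().decode("utf-8", errors="replace")
        except Exception:
            continue  # unreachable or blocked pages never enter the index
        index[url] = html  # a real engine would tokenize and store terms, not raw HTML
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            absolute = urljoin(url, link)
            if absolute.startswith("http") and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return index
```

Note that anything the crawler cannot reach by following links, or cannot fetch at all, simply never appears in the index; the limitations listed below all follow from this.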
Problems Crawling and Indexing:
- Online forms: Search engines are not good at completing online forms (such as a login), so any content behind them may remain hidden.
- Duplicate pages: Websites using a CMS (Content Management System) often create duplicate versions of the same page; this is a major problem for search engines looking for completely original content (a fingerprinting sketch follows this list).
- Blocked in the code: Errors in a website's crawling directives (robots.txt) may lead to blocking search engines entirely (a robots.txt sketch also follows this list).
- Poor link structures: If a website's link structure is not understandable to the search engines, they may not reach all of the site's content; or, if it is crawled, the minimally exposed content may be deemed unimportant by the engine's index.
- Non-text content: Although the engines are getting better at reading non-HTML text, content in rich media formats remains difficult for search engines to parse. This includes text in Flash files, images, photos, video, audio, and plug-in content.
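For the duplicate-pages point, this sketch shows one way a crawler might notice that several CMS-generated URLs serve identical content, by fingerprinting page bodies with a hash. The URLs, page bodies, and the SHA-1 fingerprinting approach are illustrative assumptions, not a description of any engine's actual deduplication.

```python
import hashlib

# Hypothetical CMS URLs that all serve the same article body.
pages = {
    "https://www.example.com/widgets?sessionid=123": "<html>Blue widget guide...</html>",
    "https://www.example.com/widgets?print=1":       "<html>Blue widget guide...</html>",
    "https://www.example.com/widgets":               "<html>Blue widget guide...</html>",
}

# A crawler that fingerprints page bodies sees one document, not three,
# and must decide which URL (if any) deserves to be indexed and ranked.
fingerprints = {}
for url, body in pages.items():
    digest = hashlib.sha1(body.encode("utf-8")).hexdigest()
    fingerprints.setdefault(digest, []).append(url)

for digest, urls in fingerprints.items():
    print(f"{len(urls)} URLs share content fingerprint {digest[:12]}: {urls}")
```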
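As a small illustration of the "blocked in the code" point, the following snippet uses Python's standard urllib.robotparser to show how a single stray Disallow rule shuts out every compliant crawler. The example.com URL and the directives are invented for this example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one mistyped directive blocks the entire site.
rules = RobotFileParser()
rules.parse([
    "User-agent: *",
    "Disallow: /",   # intended to block one directory, but hides everything
])

# Every well-behaved crawler, including search engine bots, will now skip the site.
print(rules.can_fetch("Googlebot", "https://www.example.com/products/"))  # False
```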