Analysis of how Baidu's spider crawls a site and how to improve its crawl frequency

Because the amount of information on the Internet is huge, no single policy can dictate which content should be crawled first, so a variety of priority crawl strategies have to be built. The main methods are: depth first, breadth first, PR priority and backlink priority. In my long experience, PR priority is the one I run into most often.
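To make the difference between these orderings concrete, here is a minimal sketch. It is not Baidu's actual implementation: the link graph `LINKS` and the importance values in `SCORE` (standing in for PR) are made-up assumptions, and backlink priority would work the same way with a different score.

```python
import heapq
from collections import deque

# Hypothetical link graph: page -> pages it links to.
LINKS = {
    "/": ["/a", "/b"],
    "/a": ["/a1", "/a2"],
    "/b": ["/b1"],
    "/a1": [], "/a2": [], "/b1": [],
}
# Hypothetical importance scores standing in for PR; higher is crawled sooner.
SCORE = {"/": 1.0, "/a": 0.2, "/b": 0.9, "/a1": 0.1, "/a2": 0.3, "/b1": 0.8}


def breadth_first(start):
    """Crawl level by level using a FIFO queue."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        page = queue.popleft()
        order.append(page)
        for nxt in LINKS[page]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order


def depth_first(start):
    """Follow one chain of links to the end before backing up (LIFO stack)."""
    seen, order, stack = set(), [], [start]
    while stack:
        page = stack.pop()
        if page in seen:
            continue
        seen.add(page)
        order.append(page)
        # Push links reversed so the first link on the page is explored first.
        stack.extend(reversed(LINKS[page]))
    return order


def score_first(start):
    """Always crawl the highest-scored discovered page next (priority queue)."""
    seen, order = {start}, []
    heap = [(-SCORE[start], start)]  # negate: heapq is a min-heap
    while heap:
        _, page = heapq.heappop(heap)
        order.append(page)
        for nxt in LINKS[page]:
            if nxt not in seen:
                seen.add(nxt)
                heapq.heappush(heap, (-SCORE[nxt], nxt))
    return order
```

With this toy graph, breadth first visits both top-level pages before any deeper page, depth first finishes everything under `/a` before touching `/b`, and the score-based order reaches `/b` and `/b1` early because of their higher assumed scores.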

SEO practitioners are very fond of Baidu's spider, because of Baidu's position among domestic PC and mobile search engines. Naturally they hope the spider will crawl their site more: only when more pages are crawled can more pages be indexed, and rankings and traffic improve. Baidu's spider is named Baiduspider.

Three, how to improve Baidu's crawl frequency

5, Crawling of cheating information

4, Acquiring data that cannot be crawled


When Baidu's spider crawls the Internet, it wants to obtain more information, and more accurate information. To do so it sets rules that make maximum use of its bandwidth and resources while keeping the load it puts on the crawled site to a minimum.
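One common way a crawler limits the pressure on a site is to enforce a minimum delay between requests to the same host. The sketch below illustrates that general idea only; the class name `HostThrottle` and the `min_delay` parameter are hypothetical, and nothing here is taken from Baidu's real scheduler.

```python
from urllib.parse import urlsplit


class HostThrottle:
    """Tracks the earliest time each host may be requested again."""

    def __init__(self, min_delay):
        self.min_delay = min_delay  # seconds between requests to one host
        self.next_allowed = {}      # host -> earliest allowed request time

    def wait_time(self, url, now):
        """Return how long to wait before fetching url, and reserve that slot."""
        host = urlsplit(url).netloc
        start = max(self.next_allowed.get(host, now), now)
        self.next_allowed[host] = start + self.min_delay
        return start - now
```

The current time is passed in as `now` rather than read from the clock, which keeps the sketch easy to test; a real crawler would pass `time.monotonic()` and sleep for the returned duration. Requests to different hosts never delay each other, so bandwidth is still used fully across sites.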

Two, protocols involved in the Baidu spider's crawling process

1, Crawl friendliness toward the site


3, robots protocol: the robots.txt file is the first file Baidu's spider requests when it visits a site. It tells the spider which pages may be crawled and which may not.
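As an illustration of how such rules are read, the sketch below parses a small robots.txt with Python's standard `urllib.robotparser`. The rules, the paths and the `example.com` host are invented for the example; only the Baiduspider user-agent name comes from this article.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: Baiduspider may crawl everything except /private/,
# every other crawler is blocked from the whole site.
ROBOTS_TXT = """\
User-agent: Baiduspider
Disallow: /private/

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path, agent="Baiduspider"):
    """Return True if the given user agent may fetch the path."""
    return parser.can_fetch(agent, "http://example.com" + path)
```

Here `allowed("/news/1.html")` is permitted for Baiduspider, `allowed("/private/report.html")` is not, and any other agent is refused everywhere by the `*` group.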

Various problems can prevent Baidu's spider from crawling some of the information on the Internet. For these cases Baidu has opened up manual data submission.


2, URL redirect


The above describes some of the design behind Baidu's crawl strategy. There are more strategies internally, but we have no way of knowing them.

2, HTTPS protocol: Baidu has now adopted HTTPS across its whole site. This protocol is more secure.

Crawled pages are often low quality or have link problems, so Baidu introduced the Green Luo and Pomegranate filtering algorithms. It is said that it also uses some other internal detection methods, but these have not been disclosed.


3, Rational use of Baidu spider crawling

The amount of data on the Internet is very large and involves a great many links, and during crawling a page link may be redirected for various reasons. This requires Baidu's spider to recognize URL redirection.
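A rough sketch of what recognizing a redirect involves: follow each `Location` target, resolve relative targets against the current URL, and stop on loops or overly long chains. The table `REDIRECTS`, its URLs and the `max_hops` limit are assumptions made for the example; a real crawler would read the status code and `Location` header from live HTTP responses instead.

```python
from urllib.parse import urljoin

# Hypothetical redirect table: URL -> (status code, Location header value).
REDIRECTS = {
    "http://example.com/old": (301, "/new"),
    "http://example.com/new": (302, "https://example.com/final"),
    "http://example.com/loop-a": (301, "/loop-b"),
    "http://example.com/loop-b": (301, "/loop-a"),
}


def resolve(url, max_hops=5):
    """Follow redirects and return (final_url, chain of URLs visited).

    Raises ValueError on a redirect loop or when the chain is too long.
    """
    chain = [url]
    while url in REDIRECTS:
        if len(chain) > max_hops:
            raise ValueError("too many redirects")
        _status, location = REDIRECTS[url]
        url = urljoin(url, location)  # Location may be relative
        if url in chain:
            raise ValueError("redirect loop")
        chain.append(url)
    return url, chain
```

For site owners the practical point is to keep redirect chains short and free of loops, so that the spider reaches the final page in as few hops as possible.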

One, Baidu spider crawl rules

Here I will share how Baidu's spider crawl strategy is developed, starting from the basics.

1, HTTP protocol: Hypertext Transfer Protocol

