A few days ago, Love Shanghai Webmaster College published a case study whose details are particularly worth webmasters' attention.
If a website optimizes its pages for crawlers, keep each page within 128K. Otherwise the crawler truncates the overlong page, the captured portion may not contain the main content, and the page is identified as an empty or short page and not indexed.
Love Shanghai: a page longer than 128K will be indexed poorly, or not at all
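A quick way to see whether a page is at risk is to measure its raw size. The sketch below is only an illustration: it assumes that "128K" means 128 * 1024 bytes of encoded HTML, which the case does not state explicitly, and the function names are made up for this example.

```python
# Assumption: "128K" is read as 128 KiB of raw HTML bytes.
LIMIT_BYTES = 128 * 1024


def page_size_report(html: str, limit: int = LIMIT_BYTES, encoding: str = "utf-8") -> dict:
    """Report the encoded size of a page and how much would be cut off."""
    raw = html.encode(encoding)
    return {
        "size_bytes": len(raw),
        "over_limit": len(raw) > limit,
        "bytes_cut_off": max(0, len(raw) - limit),
    }


def simulate_truncation(html: str, limit: int = LIMIT_BYTES, encoding: str = "utf-8") -> str:
    """Return the part of the page a truncating crawler would keep (hypothetical model)."""
    raw = html.encode(encoding)
    # errors="ignore" drops a multi-byte character that was split by the cut.
    return raw[:limit].decode(encoding, errors="ignore")


if __name__ == "__main__":
    page = "<html><body>" + "x" * (200 * 1024) + "</body></html>"
    print(page_size_report(page))
```

Running this against the HTML your server actually returns to a crawler gives a rough idea of how close each template is to the limit.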
Original address: 贵族宝贝lusongsong贵族宝贝/blog/post/8966.html
Love Shanghai engineers suggest:
In the case, a page optimized for crawling embedded the binary content of its images directly in the HTML. This made the page too long, 164K in size, and as a result its content was not indexed by Love Shanghai.
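To find out how much of a page is taken up by embedded images, you can look for base64 `data:` URIs in the HTML. This is a rough sketch, assuming the images were inlined as `data:image/...;base64,...` URIs (the case only says the binary content was written into the HTML); the regular expression and function names are illustrative.

```python
import re

# Matches inline images such as: data:image/png;base64,iVBORw0KGgo...
DATA_URI_RE = re.compile(r"data:image/[a-zA-Z0-9.+-]+;base64,[A-Za-z0-9+/=]+")


def inline_image_chars(html: str) -> int:
    """Total number of characters occupied by base64 image data URIs."""
    return sum(len(match.group(0)) for match in DATA_URI_RE.finditer(html))


def inline_image_share(html: str) -> float:
    """Fraction of the page (by characters) that is inline image data."""
    if not html:
        return 0.0
    return inline_image_chars(html) / len(html)


if __name__ == "__main__":
    payload = "A" * (160 * 1024)  # stand-in for a large encoded image
    page = '<p>Article text</p><img src="data:image/png;base64,' + payload + '">'
    print(inline_image_chars(page), round(inline_image_share(page), 3))
```

If most of the page turns out to be image data, serving the images as separate files referenced by URL should bring the HTML back under the limit.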
Source: Lou loose blog. You are welcome to share.
1. Sites are advised not to use JS to generate the main content. If the JS rendering fails, the page content is likely to be read incorrectly, or the page may not be crawlable at all.
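One way to check whether the main content depends on JS is to look at what text is present in the raw HTML before any script runs. The sketch below uses Python's standard `html.parser` and skips everything inside `script` and `style` tags. It is a simplified model of a crawler that does not execute JS, not a description of how any particular crawler works, and the class and function names are made up for this example.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text outside <script> and <style>, without running any JS."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self._skip_depth == 0 and text:
            self.chunks.append(text)


def static_text(html: str) -> str:
    """Return the text a non-JS reader would find in the raw HTML."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


if __name__ == "__main__":
    js_page = '<div id="app"></div><script>render("Main article text");</script>'
    print(repr(static_text(js_page)))
```

If `static_text` returns little or nothing for a page that looks full in the browser, the main content is being generated by JS.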
This suggests a possible technical limitation in Love Shanghai: if a page exceeds 128K, the crawler cannot capture it in full and the page is not indexed. If your pages are too long, try deleting some of the less important information to make sure the content gets indexed.
3. When optimizing for crawlers, put the main content at the front of the page so that truncation does not cut it off before it is fully captured.
2. If the site optimizes pages for crawlers, keep the page length within 128K; do not make pages too long.
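The two suggestions about page length and content position can be checked together: find where the main content sits in the raw HTML and see whether it ends before the cutoff. This is a minimal sketch under the same assumption that "128K" means 128 * 1024 bytes; the `marker` argument is a hypothetical string you choose yourself, for example the full text of the main article block.

```python
# Assumption: the crawler keeps only the first 128 KiB of the raw HTML.
LIMIT_BYTES = 128 * 1024


def main_content_offset(html: str, marker: str, encoding: str = "utf-8") -> int:
    """Byte offset at which `marker` starts, or -1 if it is not in the page."""
    return html.encode(encoding).find(marker.encode(encoding))


def survives_truncation(html: str, marker: str, limit: int = LIMIT_BYTES,
                        encoding: str = "utf-8") -> bool:
    """True if `marker` lies entirely within the first `limit` bytes."""
    marker_bytes = marker.encode(encoding)
    start = html.encode(encoding).find(marker_bytes)
    return start != -1 and start + len(marker_bytes) <= limit


if __name__ == "__main__":
    article = "<article>Main content</article>"
    filler = "<!-- navigation, scripts, sidebars -->" * 5000
    print(survives_truncation(article + filler, article))
    print(survives_truncation(filler + article, article))
```

In this model, the same article survives when it comes before the filler and is lost when it comes after it, which is the reason for putting the main content first.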