Fluid Dynamics Search Engine
The amount of time it takes to index your documents depends on several factors. Here are some suggestions for speeding up the index process.
Use command-line indexing
The command-line indexing process is about 20 times faster than the web-based indexing process. It is also much more stable, since it sidesteps the CGI resource limits imposed on the web-based indexing process.
See "How to use FDSE from the command line" and "How to automatically rebuild the index" for examples.
The command-line interface is only available to those who have telnet or console access to their web server.
Note that this refers only to the preferred setup for the administrator who is rebuilding the index. Visitors to your search pages can use any browser, and any browser will still work for rebuilding the index, just not as efficiently.
Use "file system discovery" rather than "web crawler discovery"
When indexing web sites which are located on the same physical web server as the FDSE script, you can use "file system discovery" instead of "web crawler discovery". Accessing files directly via the file system is faster and more efficient than going over the network.
See Admin Page => Manage Realms => Create New Realm to review all the different options available for each realm, including the discovery method.
Optimizing Pages Per Batch - Web Crawler Realms:
The General Setting "Crawler: Max Pages Per Batch" controls the maximum number of documents that will be processed before the live index file is updated. Updating the live index file is a time-consuming process due to the search-and-replace nature of the update, and also because the indexing process needs to wait for all search processes to finish reading before it is allowed to update. Thus, maximizing the documents per batch increases the efficiency of the overall process.
The General Setting "Timeout" is similar to "Max Pages Per Batch" but limits the clock time of the batch, rather than the document count. Experiment with setting each value very high.
Note that, in between writing to the live index file, the indexed documents are stored in memory. If the pages per batch is very high - like more than a few hundred - then server memory may be used up and the process will fail or page to disk, which will be very slow. Thus, the total documents per batch should not be increased without limit.
Optimizing Pages Per Batch - File System Discovery Realms:
File System Discovery realms always write directly to a temp file while rebuilding, rather than to the live index file, and so they do not share the slow update problems found in crawler realms. Also, because they write directly, the memory consumption is much lower. The General Setting "Timeout" is used to throttle the indexing across multiple CGI processes to prevent a web server time-out. Setting the time-out to a high value will save some time, since there is an automatic sleep of 15 seconds between each process.
Use "Revisit Old" instead of "Rebuild"
The "Rebuild" command will re-index every document in the index.
The "Revisit Old" command is more selective.
For realms that use the file system, the "Revisit Old" command will only re-index documents that have been updated since the index was last built. Also, any new documents will be indexed, and documents that no longer exist will be removed from the index.
For realms that use the crawler ("website - web crawler", "open", and "file-fed" realms), the "Revisit Old" command will only re-index web pages that have not been visited in the last 30 days *. That command will also index any web page that is in the queue waiting to be indexed but has never been indexed before. So, if you are building a website realm for the first time and then the indexing process fails, you should continue using the "Revisit Old" command. That way you will still get all the new links as they are discovered but you won't overwrite the work you've accomplished so far.
* The actual number of days used to decide whether a document is "old" is governed by the General Setting "Crawler: Days Til Refresh" which has a default value of 30. When recovering from a failing rebuild, it might be helpful to set this value to 1.
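The "Revisit Old" logic for file-system realms can be sketched in a few lines. The following is an illustration of the idea only, not FDSE's actual code; the function and variable names are invented. It compares each file's modification time against the time recorded at the last build, and classifies files as changed, new, or deleted:

```python
import os

def plan_revisit(doc_dir, index):
    """Sketch of a "Revisit Old"-style pass for a file-system realm.

    `index` maps file path -> modification time recorded at the last
    build. Returns (to_reindex, to_add, to_remove).
    """
    # Snapshot the current state of the document tree.
    current = {}
    for root, _dirs, files in os.walk(doc_dir):
        for name in files:
            path = os.path.join(root, name)
            current[path] = os.path.getmtime(path)

    # Changed since the last build: re-index.
    to_reindex = [p for p, mtime in current.items()
                  if p in index and mtime > index[p]]
    # Never seen before: index for the first time.
    to_add = [p for p in current if p not in index]
    # Recorded in the index but gone from disk: remove.
    to_remove = [p for p in index if p not in current]
    return to_reindex, to_add, to_remove
```

Because only the changed and new files are re-read, a nightly "Revisit Old" pass on a mostly static site touches a small fraction of the documents that a full "Rebuild" would.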
Selecting a Realm Type:
Indexing with the File System Crawler is very fast compared to the Web Crawler. Use "Website Realms - File System Crawler" to maximize the speed of indexing local sites. The File System Crawler can also detect which files have changed when rebuilding the index, allowing it to index only updated files (this algorithm is used with the "Revisit Old" command). The Web Crawler must always re-index every file in the realm.
Reduce the total number of documents indexed:
Any document that is not likely to be useful to visitors should be removed from the index. From the Admin Page, choose "Review" to see which documents are currently indexed. Documents which are not useful can be permanently removed by choosing "Delete" and then selecting option 2, "add to forbidden sites list".
The "Forbid Pages" filter rule and the robots.txt file are more efficient at blocking pages than the Robots META tag, since parsing the META tag requires the crawler to first access and parse the document.
Optimizing Document Size:
The General Setting "Max Characters: File" allows you to determine the maximum number of bytes read from any document. Keeping this setting at a low value, like 64000 or 32000, will save time during the index process at the expense of some accuracy in searches.
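The effect of such a cap is simply to stop reading each document after a fixed amount of text. A minimal sketch (the function name is invented for illustration, and this is not FDSE's implementation):

```python
def read_head(path, max_chars=32000):
    """Read at most `max_chars` characters of a document, mirroring
    the effect of a "Max Characters: File" style cap: indexing is
    faster, but text past the cutoff is invisible to searches."""
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        return f.read(max_chars)
```

Anything past the cutoff never enters the index, which is why a very low cap trades search accuracy for indexing speed.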
Optimizing Realm Architecture:
Realms are used to group web pages together for indexing purposes. When possible, it is best to group pages based on the necessary frequency of re-indexing. For example, if you have a single web site with 10,000 documents, of which 1,000 change daily, you could create two realms covering each group. Then create a task for daily re-indexing only the smaller realm.
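Such a task could be scheduled with cron. The script path and arguments below are placeholders, not FDSE's actual command-line syntax; consult the command-line indexing documentation for the real invocation:

```
# Hypothetical crontab entry: re-index only the small, fast-changing
# realm every night at 02:00. Path and arguments are placeholders.
0 2 * * * cd /var/www/cgi-bin/fdse && perl search.pl "Realm=news" "Action=revisitold"
```

The large, mostly static realm can then be rebuilt far less often, for example weekly.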
Trade-Offs Between Index and Search:
In many cases, investing extra time while indexing can save time while searching. In every case where this trade-off is available, FDSE has taken it, since indexing is only done once every day or so, but searches are performed thousands of times each day. For example, having a large set of "Ignore Words" slows down the index process because it has to parse each word from each document. However, the result is a much smaller, more quickly searched index file, and so all resulting searches are faster, and the overall CPU utilization of the web server will be minimized.
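Index-time stop-word removal can be sketched as follows. The word list here is illustrative only; FDSE's "Ignore Words" setting plays the same role:

```python
# Illustrative ignore-word list; in FDSE this comes from the
# "Ignore Words" General Setting.
IGNORE_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in"}

def index_terms(text):
    """Lowercase, split, and drop ignore words, so the stored index
    is smaller and every later search scans fewer entries."""
    return [w for w in text.lower().split() if w not in IGNORE_WORDS]
```

For example, index_terms("The speed of the crawler") keeps only "speed" and "crawler": the extra work is paid once at index time, and every subsequent search benefits from the smaller index.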
The following features will use more resources during indexing to save time during searching:
Google Indexing Problems:
This article offers tips and advice for solving common Google indexing and crawling problems, including partial website indexing. We offer a separate page on solving Google Supplemental Results.
Firstly, it's common for a low-ranking website to be included in the Google index without actually being visible in the search engine results pages (SERPs).
You can check whether your website is in fact indexed and cached by Google, using a simple query command. To accomplish this, type site:www.mydomain.com in a Google search window, replacing mydomain with your registered domain name.
If Google returns a message stating: "sorry no information is available for the URL www.mydomain.com" then none of your website pages are Google indexed and you may have a Google indexing or crawling problem.
Sometimes Google indexing problems are the result of little more than robots.txt file errors. This is a text file that sits in the root directory of your web server, telling search engine robots what they should exclude when indexing a website. A robots.txt error can sometimes prevent Googlebot (Google's search spider) from crawling your website altogether.
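As an illustration, a minimal, well-formed robots.txt might look like this (the directory names are invented for the example):

```
# robots.txt -- must live at the web root,
# e.g. http://www.mydomain.com/robots.txt
User-agent: *
Disallow: /cgi-bin/
Disallow: /private/
```

By contrast, a stray "Disallow: /" under "User-agent: *" would block Googlebot, and every other compliant crawler, from the entire site, which is exactly the kind of error that makes a whole domain disappear from the index.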
The Google site:www.mydomain.com query returns a list of all web pages in your domain which are indexed and cached in the Google index. If no web pages are indexed, this is often due to the domain being new or recently launched, without enough quality backlinks to make it into the index. To solve a Google indexing problem (including partial indexing), first check for website navigation problems which would prevent Google crawling your website.
If no website navigation problem is found, we recommend getting more quality links to your website from other websites, and considering submitting a Google Sitemap to inform Google about your website hierarchy and how often your content is updated. This will help influence how often Googlebot crawls your website, helping Google keep a fresh cache of your recently updated page URLs.
Even a few additional links from other websites pointed to your domain can help increase the crawl rate and frequency of Googlebot visits, ensuring that your website is deep crawled more regularly.
It's worth mentioning that Google operates a smart-crawling system so it will notice when extensive page updates are made to a site as it interrogates and utilises the web server responses. Matt Cutts did an interesting video on how Google crawls sites and we'd recommend taking 5 minutes to watch it: Matt Cutts Googlebot crawl method video.
For more help and advice on acquiring additional inbound links for your website, read our informative link building and Google SEO strategies articles.
If, on the other hand, Googlebot is visiting your site too often, the crawl rate can be manually reduced from the Google Webmaster Console (Webmaster Tools). Unfortunately, Webmaster Tools does not allow upward adjustment where a site is infrequently crawled - you really need to get more inbound links (backlinks) and to update your website content frequently to encourage that.
More backlinks will significantly help to get more of your page URLs Google cached and fully indexed.
Google Big Daddy Update
Following the "Big Daddy" Google infrastructure update in Spring 2006, the crawling rate of websites is now heavily influenced by the number and quality of backlinks the site has acquired. For this reason, it is not unusual for a website with few inbound links to receive one or fewer Googlebot deep crawls per month.
Since Google's Big Daddy update, many websites have developed website indexing and Googlebot crawling rate problems. After Big Daddy, Google is also indexing fewer web pages, particularly on recently launched website domains and low quality sites. This has affected the backlink count for many sites, which previously relied on low quality directory links amongst other sources.
Partial Google indexing is now common for websites with few inbound links. This frequently results in only the top level domain (homepage) being Google crawled and included in the Google index, with other internal pages being partially indexed or not indexed at all.
The well documented Big Daddy update problems have now been resolved, but some newer domains, and even some older, trusted sites, are still left with significant numbers of partially indexed pages in the Google index. These pages would have shown up as Google Supplemental Results until the labelling of such pages was removed in the summer of 2007.
For more expert help in solving these website indexing issues read our Supplemental Results page or contact us for expert advice.
Meta tags: title, keywords, description
The meta tags are a very important part of the HTML code of your web page. They are read by the search engines but are not displayed as a part of your web page design. Usually they include a concise summary of the web page content and you should include your relevant keywords in them. Most meta tags are included within the 'header' code of a website. The most important tags are the title, description, keywords and robots tags.
How to optimize meta tags?
The title tag and the meta description and keywords tags should include keywords relevant to the content of the web page they describe. Besides that, you should consider the length and the order of the characters/words included in each of the meta tags. Note that the search engine robots read from left to right and those words that come first are more important than those that come towards the end of the page.
With the help of the SiteGround special meta tag analyzing tool you can see how well you have optimized your meta tags. To learn how you can improve them, read below!
Title tag
It could be said that the title is one of the most important factors for successful search engine optimization of your website. Located within the <head> section, right above the Description and Keywords tags, it provides summarized information about your website. Besides that, the title is what appears on the search engine results page (SERP).
The title tag should be between 10 and 60 characters. This is not a law but a guideline - a few more characters is not a problem. You won't get penalized for a longer title tag, but the search engine will simply ignore the extra part.
Meta Description tag
The description tag should be written in such way that it will show what information your website contains or what your website is about. Write short and clear sentences that will not confuse your visitors.
The description tag should be less than 200 characters. The meta description tag is also very important for the SEO of your page. It matters most to the prospective visitor looking at the search engine results page - this tag is often displayed there and helps you distinguish your site from the others in the list.
Meta Keywords tag
Lately, the meta keywords tag has become the least important tag for the search engines, especially Google. However, it is an easy way to reinforce your most important keywords. We recommend using it, as we believe it may help the SEO process, especially if you follow the rules mentioned below.
The keyword tags should contain between 4 and 10 keywords. They should be listed with commas and should correspond to the major search phrases you are targeting. Every word in this tag should appear somewhere in the body, or you might get penalized for irrelevance. No single word should appear more than twice, or it may be considered spam.
Meta Robots tag
This tag helps you to specify the way your website will be crawled by the search engine. There are 4 types of Meta Robots Tag:
Index, Follow - The search engine robots will start crawling your website from the main/index page and then will continue to the rest of the pages.
Index, NoFollow - The search engine robots will start crawling your website from the main/index page and then will NOT continue to the rest of the pages.
NoIndex, Follow - The search engine robots will skip the main/index page, but will crawl the rest of the pages.
NoIndex, NoFollow - None of your pages will be crawled by the robot and your website will not be indexed by the search engines.
If you want to be sure that all robots will crawl your website, we advise you to add an "Index, Follow" meta robots tag. Please note that most search engine crawlers will index your pages starting from the index page and continuing to the rest of the pages, even if you do not have a robots tag. So if you wish a page not to be crawled, or to be crawled differently, use the appropriate robots tag.
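Putting the four tags together, a page header might look like this (the site name, description, and keywords are invented for illustration, and follow the length guidelines above):

```html
<head>
  <!-- Title: roughly 10-60 characters -->
  <title>Acme Widgets - Hand-Made Blue Widgets and Accessories</title>
  <!-- Description: under 200 characters, shown on the SERP -->
  <meta name="description" content="Acme Widgets sells hand-made blue widgets, widget stands and repair kits, with free delivery on orders over $50.">
  <!-- Keywords: 4-10 comma-separated phrases -->
  <meta name="keywords" content="blue widgets, widget shop, handmade gifts, desk accessories">
  <!-- Tell crawlers to index this page and follow its links -->
  <meta name="robots" content="index, follow">
</head>
```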
You can edit your meta tags through the File Manager in the cPanel of your hosting account. You need to edit the file of each web page; the file contains the HTML code of the page.