Saturday, February 23, 2013

How Google Closed Source Wanted To Change The Destiny Of Humanity By Changing Your DNA Information During 2003


Change In The DNA 3: What Happened To My Site On Google?

As I explained earlier this week, a significant change to Google's ranking algorithm has caused some web sites to lose top positions for some search terms. The outcry from affected site owners has been unprecedented, in my opinion. In this article, I'll take a Q&A-style approach to examine many of the issues and questions that have arisen from the change. Also be sure to see other articles about the recent changes on the Florida Google Dance Resources page.
Please note that the longer version of this article for Search Engine Watch members also covers these questions:
  • Does Google favor some large sites like Amazon because of partnerships it has?
  • Does being in AdSense help sites rank better?
  • Can I sue Google for being dropped?

Q. My page no longer comes up tops at Google for a particular search term. Why not?

Google, like all search engines, uses a system called an algorithm to rank the web pages it knows about. All search engines make periodic changes to their ranking algorithms in an effort to improve the results they show searchers. These changes can cause pages to rise or fall in rank. Small changes may produce little ranking differences, while large changes may have a dramatic impact.

Google made a change to its algorithm at the end of last month. This fact is obvious to any educated search observer, plus Google itself confirms it. The change has caused many people to report that some of their pages fell in ranking. These pages no longer please Google's algorithm as much as in the past.

If your page has suddenly dropped after being top ranked for a relatively long period of time (at least two or three months), then it's likely that your page is one of those no longer pleasing the new Google algorithm. Running what's called the filter test may help confirm this for you, at least in the short term.
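The article doesn't spell out the filter test's mechanics, so for context: the form most often described in forums at the time (this is a reconstruction of community folklore, not anything Google documented) was to rerun a query with a nonsense term excluded, since excluded-term queries still appeared to be served by the older ranking system. A sketch of building the two query URLs:

```python
from urllib.parse import urlencode

def google_query_url(query: str) -> str:
    """Build a plain Google results URL for a query."""
    return "http://www.google.com/search?" + urlencode({"q": query})

def filter_test_url(query: str, nonsense: str = "asdfqwerty") -> str:
    """Build the 'filter test' variant: the same query with a nonsense
    term excluded. Excluded-term queries were reportedly still handled
    by the old ranking system, so comparing the two result sets hinted
    at whether a page had been affected by the new algorithm."""
    return google_query_url("%s -%s" % (query, nonsense))

print(google_query_url("christmas present idea"))
print(filter_test_url("christmas present idea"))
```

Comparing where a page ranks in each result set is the whole test; the nonsense term itself is arbitrary.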

Keep in mind that while many pages dropped in rank, many pages also consequently rose. However, those who dropped are more likely to complain about this in public forums than those who've benefited from the move. That's one reason why you may hear that "everyone" has lost ranking. In reality, for any page that's been dropped, another page has gained. In fact, WebmasterWorld is even featuring a thread with some comments from those who feel the change has helped them.

Q. Why does/did running the filter test bring my site back into the top results?

My belief is that Google, for the first time, has been using two significantly different algorithms at the same time. The "new" system has been used for many queries since the change, but some queries were still handled by the "old" system. More and more queries now appear to be processed by the new system, suggesting that the old algorithm is being phased out entirely.

Why would Google run two different systems? My ideas are covered more in the Speculation On Google Changes article for Search Engine Watch members. The short answer is that I think the new system requires much more processing power than the old one. If so, then Google probably applied it initially to "easy" queries, such as those that didn't involve the exclusion or "subtraction" of terms.

Why are more and more "hard" queries now going through the new system? It could be that Google was testing out the new system on easier queries and then planned to slowly unleash it on everything.
Alternatively, Google may have intended to run two algorithms all along but is being forced to abandon that plan because of the furor as site owners who've lost rankings use the filter test to see what they consider to be "old" Google.

Since it was discovered, the filter test has been used by hundreds, if not thousands of webmasters. These queries are processor intensive. They also have created an embarrassing situation Google has never faced before, where anyone can compare what looks to be "old" versus "new" results to show how the old results are better. Sometimes the new results might be better, of course -- but it's the mistakes in relevancy that get the most attention. They can be used as proof that new Google is worse than old Google.

As a result, Google may have ultimately decided that it needs to bring all queries into the new system -- if only to plug a "hole" it may have never anticipated opening into how it works internally.

Google won't confirm if it has been using two algorithms simultaneously. I can only tell you I've spoken with them at length about the recent changes, and that they've reviewed the article you're reading now.

Whether you choose to believe my speculation or instead the idea that Google has employed some type of "filter" makes almost no difference. The end result is the same. For some queries, there are now dramatic differences from what "old" Google was showing.

Q. Has Google done this to force people to buy ads?

Some feel Google has dropped their sites to make them buy ads. In the short term, purchasing ads will be the only way they can be found. For some, it may even be the only long-term solution. In either case, it means more money for Google.

However, there's also plenty of evidence of people who, despite being advertisers, lost their "free" top rankings. There are also people who've never run ads that continue to rank well. This makes it difficult for anyone to conclusively say that this change was ad driven.

Google completely denies charges it's trying to boost ad sales. The company says the algorithm change was done as part of its continual efforts to improve results. Google has always said that there is no connection between paying for an ad and getting listed in its "free" results.

In my view, there are far easier ways that Google could boost ad revenue uptake without doing sneaky, behind-the-scene actions -- which is why I tend to believe this is not why the change happened.

For instance, Google could make the first five links on a page -- rather than the first two links -- be paid ads for certain queries. They might also make this happen for terms determined to be commercial in orientation and offer up a defense that they've determined the commercial intent of the query is strong enough to justify this.

Q. Is there REALLY no connection with ads and free listings at Google?

In terms of boosting rankings, yes, I believe this doesn't happen at Google. Neither does Andrew Goodman, in his recent article about the Google changes. Other serious observers of search engines I know also doubt this, though certainly not all. Those in the "I believe" camp feel Google would simply risk too much in the long-term for any short-term gains it might get.

In terms of listing support, buying ads may be helpful. Some who spend a lot on paid listings at Google have reported success in getting their ad reps to pass along problems about their entirely separate free listings to Google's engineering department for investigation.

To some degree, this is like a backdoor for fast support. Those who aren't spending with Google's AdWords program have no such speedy solution to getting an answer back. Google has continually rejected suggestions that it should offer a "listing support" or paid inclusion program, saying it fears this might be seen as establishing a link between payment and its free results. For a deeper exploration of this, see my article for Search Engine Watch members from last year, Coping With Listing Problems At Google.

For the record, Google flatly denies that those who are advertising get more access. The company says it takes feedback from many sources, and every report is assessed for how it might have an impact on search quality.

Indeed, it's important to note that Google does provide another backdoor that plenty of non-advertisers have made use of. This is the WebmasterWorld forum site, where public and private messages to "GoogleGuy," a Google employee monitoring discussions, have been acted upon.

Google also turns up at various search engine conferences, such as the Search Engine Strategies show produced by Search Engine Watch that begins in Chicago on Tuesday. Google provides assistance to those with questions at these types of conferences, as well.

Google also offers a front door in the form of email addresses it publishes. Yes, you should expect a canned response to many queries. However, people do get some more personal investigation, as well.
It's also crucial to make the HUGE distinction between listing support and rank boosting. Investigating why a page may not be listed at all (rather than ranking well) is an appropriate activity for Google or any search engine. Boosting the rank of a particular page in return for payment, and not disclosing this, is not acceptable.

Q. Does Google have a "dictionary" of "money terms" it uses to decide when to filter out some web sites?

This theory has emerged as people have run the filter test and discovered that for some queries, Google will show many more changes than for others. The Scroogle hit list provides a long look at examples like this. It reflects 24 hours worth of queries various people have tried at Scroogle to see if they've declined in the new ranking algorithm. Terms that had many changes are at the top of the list.

For example, earlier this week the Scroogle hit list showed that the top 99 of 100 results in a search for christmas present idea at Google were different under the new algorithm compared to the old. That's not entirely accurate, as explained more in my previous article. But overall, it's close enough. For that query, things have radically changed. The same was true for terms such as diet pill and poker gambling, both of which could be considered highly commercial in nature.

That's where the idea of "money terms" comes from. Sites aiming to rank well for these terms may be expecting to make money. Some believe Google has thus decided to filter out some of these sites -- particularly the ones showing an intent to optimize their pages for Google and which are not major commercial entities -- and force them into buying ads.

It's a compelling theory. However, there are also commercial terms that showed little change, such as christmas time, books, sharp ringtones and games. The hit list is also compiled by those who are checking their own terms. As you might expect, that means it will be heavily skewed toward commercial queries. If a bunch of librarians entered a mass of non-commercial terms, there might have been some dramatic changes seen for that class of queries, as well.

In fact, a search for 1 2 3 4 5 6 7 8 was on the Scroogle hit list, someone obviously trying to test what happens with non-commercial searches. It came up with a score of 36 dropped pages. That's high enough to make you think that phrase might be in the "money list" dictionary, yet nothing about it appears commercial in nature.

There's no doubt the new algorithm does seem to have impacted many commercial queries very hard, in terms of the amount of change that's been seen. However, this seems more a consequence of how the new algorithm works rather than it coming into play only for certain terms. In other words, new criteria on how much links should count, whether to count particular links, when to count anchor text more (text in a hyperlink) and even what's considered spam probably have more impact on commercially-oriented queries.
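To make the kind of reweighting described above concrete, here's a toy scoring model -- the sites, counts and weights are all invented for illustration and have nothing to do with Google's actual factors -- showing how merely discounting anchor-text matches can reorder results:

```python
# Toy link-scoring model: each page has a count of inbound links and a
# count of inbound links whose anchor text matches the query. All
# numbers and weights are invented; they only illustrate how changing
# the weight on anchor text can reshuffle a ranking.
pages = {
    "hand-optimized-shop.example": {"links": 300, "anchor_matches": 300},
    "big-retailer.example":        {"links": 500, "anchor_matches": 50},
    "info-site.example":           {"links": 120, "anchor_matches": 20},
}

def score(page, anchor_weight):
    p = pages[page]
    return p["links"] + anchor_weight * p["anchor_matches"]

def ranking(anchor_weight):
    """Return page names ordered from highest to lowest score."""
    return sorted(pages, key=lambda p: score(p, anchor_weight), reverse=True)

print(ranking(20))   # anchor text counts heavily: the optimized site leads
print(ranking(0.5))  # anchor text discounted: the big retailer leads
```

Nothing about the optimized site changed between the two runs; only the weighting did -- which matches the experience of sites that dropped without having altered anything.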
It is possible that Google is also making use of its AdWords data. It wouldn't be difficult to examine which terms generate a lot of earnings and use that data to make a list or even to feed the new algorithm.
For its part, Google won't confirm whether it is using some type of list or not.

In the end, whether there's a predefined list of terms or this is something happening just as a consequence of the new algorithm is moot. The final result is the same -- many sites that did well in the past are no longer ranking so highly, leaving many feeling as if they've been targeted.

Q. How can Google be allowed to hurt my business in this way?

There's no end of people complaining how they're losing business because Google is no longer sending them traffic for free. The painful lesson to be learned is that it's foolish to assume that any search engine will deliver traffic for free.

Back before we had paid listings, one of my top search engine optimization tips was not to depend solely on search engines. They have always been fickle creatures. Today's cries about Google and lost traffic are certainly the worst I've ever heard. But I can remember similar complaints being made about other major search engines in the past, when algorithm changes have happened. WebmasterWorld even has a good thread going where people are sharing past memories of this.

We do have paid listings today, of course. That means you can now depend on search engines solely for traffic -- but only if you are prepared to buy ads.

As for free listings, these are the search engine world's equivalent of PR. No newspaper is forced to run favorable stories constantly about particular businesses. It runs the stories it decides to run, with the angles it determines to be appropriate. Free listings at search engines are the same. The search engines can, will and have in the past ranked sites by whatever criteria they determine to be best. That includes all of the major search engines, not just Google.

To me, the main reason Google's changes are so painful is because of the huge amount of reach it has. Google provides results to three of the four most popular search sites on the web: Google, AOL and Yahoo. No other search engine has ever had this much range. Go back in time, and if you were dropped by AltaVista, you might still continue to get plenty of free traffic from other major search engines such as Excite or Infoseek. No one player powered so many other important search engines, nor were typical web sites potentially left so vulnerable to losing traffic.

The good news for those who've seen drops on Google is that its reach is about to be curtailed. By the middle of January, it will be Yahoo-owned Inktomi results that are the main "free" listings used by MSN. Sometime early next year, if not sooner, I'd also expect Yahoo to finally stop using Google for its free results and instead switch over to Inktomi listings.

When these changes happen, Google will suddenly be reduced from having about three quarters of the search pie to instead controlling about half. That means a drop on Google won't hurt as much.
Inktomi will have most of the other half of that pie. Perhaps that will be better for some who were recently dropped in ranking at Google. However, it's possible they'll find problems with Inktomi, as well.
In the past, I've heard people complain that paid inclusion content with Inktomi gets boosted or that crawling seems curtailed to force them into paid inclusion programs. Those complaints have diminished primarily because Inktomi's importance has diminished. Indeed, when Inktomi changed its algorithm in October, there were some negative impacts on site owners that surfaced. However, those concerns were hardly a ripple compared to the tidal wave of concern over Google. Once Inktomi's importance returns, so, most likely, will a focus on any perceived injustices by Inktomi.

Q. I heard Google's dropping pages that show signs of search engine optimization. Do I need to deoptimize my web pages?

If you absolutely know you are doing something that's on the edge of spam -- invisible text, hidden links or other things that Google specifically warns about -- yes, I would change these.
Aside from that, I'd be careful about altering stuff that you honestly believe is what Google and other search engines want. In particular, I would continue to do these main things:
  • Have a good, descriptive HTML title tag that reflects the two or three key search phrases you want your page to be found for.
  • Have good, descriptive body copy that makes use of the phrases you want to be found for in an appropriate manner.
  • Seek out links from other web sites that are appropriate to you in content.
Should you start removing H1 text around copy? Drop comment tags that are loaded with keywords? Cease doing other specific things you've heard might help with search engines? If you put these there only because you thought it helped with search engines, then perhaps. It wasn't natural to do this, and Google potentially could seek out such indicators to determine you have an overly optimized page.

I almost hesitate to write the above. That's because I'm fearful many people will assume that some innocent things they may have done are hurting them on Google. I really don't feel that many people have dropped because Google is suddenly penalizing them. Instead, I think it's more a case that Google has done a major reweighting of factors it uses, in particular how it analyzes link text. In fact, that's exactly what Google says. Most changes people are seeing are due to new ranking factors, not because someone has suddenly been seen to spam the service, the company tells me.

Should you start asking sites to delink to you, or to drop the terms you want to be found for from the anchor text of those links? Some have suggested this. If these sites have naturally linked to you, I wouldn't bother. Links to you shouldn't hurt. In fact, the biggest reason for a lot of these changes is likely that links are simply being counted in an entirely new way -- and some links just may not count for as much.

Should you not link out to people? Linking out is fine in my view and should only hurt you if you are linking to perhaps "bad" sites such as porn content. Do that, and you could be associated with that content.
It's also a good time for me to repeat my three golden rules of link building:

  1. Get links from web pages that are read by the audience you want.
  2. Buy links if visitors that come solely from the links will justify the cost.
  3. Link to sites because you want your visitors to know about them.
None of these rules involve linking for purely search engine reasons -- and so doing them should keep you on the right path in terms of getting appropriate links, I feel.

Q. Does the filter test indicate that I've spammed Google?

No, just because your site no longer ranks so highly on Google does not necessarily mean that you've spammed Google. Instead, it most likely means that some of the many factors Google uses to rank web pages have been adjusted -- and you no longer do so well against these. In other words, you haven't done anything wrong. It's simply that the scoring criteria have changed.

Think about it like a test. Let's say that in this test, people were judged best primarily on how they answered a written question, but multiple choice and verbal portions of the test also counted. Now the criteria have changed. The verbal portion counts for more, and you might be weaker in this area. That means someone stronger might do better in the test. You aren't doing worse because of any attempt to "cheat" but simply because the criteria are different.
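That analogy can be sketched in a few lines of code -- the sections, scores and weights below are invented purely for illustration:

```python
# Two test-takers with fixed section scores; only the grading weights
# change between the "old" and "new" criteria.
scores = {
    "you":     {"written": 90, "multiple_choice": 80, "verbal": 50},
    "someone": {"written": 75, "multiple_choice": 70, "verbal": 95},
}

def total(person, weights):
    """Weighted sum of one person's section scores."""
    return sum(weights[s] * scores[person][s] for s in weights)

old_weights = {"written": 0.6, "multiple_choice": 0.3, "verbal": 0.1}
new_weights = {"written": 0.3, "multiple_choice": 0.3, "verbal": 0.4}

for label, weights in [("old", old_weights), ("new", new_weights)]:
    winner = max(scores, key=lambda p: total(p, weights))
    print(label, "criteria winner:", winner)
```

Neither test-taker's answers changed -- only the weights did, and the winner flipped. That is the shape of what a ranking reweighting does to pages.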

Q. Does this mean Google no longer uses the PageRank algorithm?

Google has never used the PageRank algorithm alone to rank web pages. PageRank is simply a component of the overall algorithm, a system Google uses to measure how important a page is based on links to it. It has always -- ALWAYS -- been the case that the context of links to the page was also considered, as well as the content on the page itself.
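For readers curious what PageRank by itself computes, here's a minimal power-iteration sketch over a toy four-page link graph. It follows the formulation in Brin and Page's published description, not Google's production system, and the graph and damping factor here are illustrative:

```python
# Minimal PageRank power iteration over a tiny link graph.
links = {              # page -> pages it links to
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}
pages = list(links)
d = 0.85               # damping factor from the published formulation
pr = {p: 1.0 / len(pages) for p in pages}

for _ in range(50):    # iterate until the scores settle
    new = {}
    for p in pages:
        # Each page passes its score evenly along its outbound links.
        inbound = sum(pr[q] / len(links[q]) for q in pages if p in links[q])
        new[p] = (1 - d) / len(pages) + d * inbound
    pr = new

print(sorted(pr, key=pr.get, reverse=True))  # most important page first
```

Page "c" ends up most important because every other page funnels link weight toward it, while "d" has no inbound links at all. Note this score says nothing about what any page is about -- which is exactly why it can only ever be one component of a ranking.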

Unfortunately, some who write about Google have called its entire ranking system PageRank, and Google itself sometimes makes this mistake, as seen on its webmaster's information page:

The method by which we find pages and rank them as search results is determined by the PageRank technology developed by our founders, Larry Page and Sergey Brin.
In reality, the page describing Google's technology more accurately puts PageRank at the "heart" of the overall system, rather than giving the system that overall name.

By the way, PageRank has never been the factor that beats all others. It has been and continues to be the case that a page with low PageRank might get ranked higher than another page. Search for books, and if you have the PageRank meter switched on in the Google Toolbar, you'll see how the third-ranked Online Books Page with a PageRank of 8 comes above O'Reilly, even though O'Reilly has a PageRank of 9. That's just one quick example, but I've seen others exactly like this in the past, and you can see plenty first-hand by checking yourself.

Q. I thought the Google Dance was over, that the massive monthly update of pages had been replaced by a consistent crawl?

To some degree, the Google Dance had diminished. Historically, the Google Dance has been the time every month when Google updated its web servers with new web pages. That naturally produced changes in the rankings and so was closely monitored. Sometimes, an algorithm change would also be pushed out. That could produce a much more chaotic dance.

Since June, life has been mercifully quiet on the dance front. Google has been moving to refresh more of its database on a constant basis, rather than once per month. That's resulted in small changes spread out over time.

Google says that continual updates are still happening. The dance came back not because of a return to updating all of its servers at once but rather because of pushing out a new ranking system.

Q. If we remove our shopping cart, could that help us get back on Google, even though we'd be booted off Froogle?

This question coincidentally came in just after I saw Google implement Froogle links in its search results for the first time. Talk about timing!

No, removing your shopping cart really shouldn't have an impact on your regular Google web page rankings. Lots of sites have shopping carts. It's perfectly normal to have them.

As you also note, having an online shopping service means you have data to feed Google's shopping search engine Froogle. And Froogle's now hit Google in a big way. If Froogle has matches to a query, then Froogle links may be shown above web page matches at Google.

It works similarly to the way you may get news headlines. Search for iraq, and you'll see headlines appear above the regular web listings, next to the word "News." If you search for a product, then you may see similar links appear listing product information from Froogle, next to the words "Product Search."

Google unveiled the new feature late Friday, and it's to be rolled out over this weekend, the company tells me. A formal announcement is planned for next week, and Search Engine Watch will bring you more about this.

In the meantime, anyone who's been dropped by Google in its regular web search results should seize upon Froogle as a potential free solution to getting back in. Froogle accepts product feeds for free -- see its Information For Merchants page for more. And since Froogle listings are now integrated into Google's pages, it means you can perhaps regain visibility this way.

Q. Can you get on a soapbox about all these Google changes?

Sure. Let me start with something one of my readers emailed:
I truly believe that Google has done us wrong. We worked hard to play by the rules, and Google shot us in the back of the head.

That comment is typical of many you see in the forums. Many people are mystified as to why they are suddenly no longer deemed good enough by Google, especially if they had been doing well for a long period of time and feel they played by the "rules."

Yes, free listings aren't guaranteed. Yes, search engines can do what they want. Yes, it's foolish for anyone to have built a business around getting what are essentially free business phone calls via Google.

None of that helps the people feeling lost about what to do next. Many have been dropped but may see sites similar to theirs still making it in. That suggests there's a hope of being listed, if they only understood what to do. So what should they do? Or what shouldn't they be doing?

My advice is unchanged -- do the basic, simple things that have historically helped with search engines. Have good titles. Have good content. Build good links. Don't try to highly-engineer pages that you think will please a search engine's algorithm. Focus instead on building the best site you can for your visitors, offering content that goes beyond just selling but which also offers information, and I feel you should succeed.

Want some more advice along these lines? Brett Tabke has an excellent short guide of steps to take for ranking better with Google, though I think the tips are valid for any search engine. Note that when GoogleGuy was recently asked in a WebmasterWorld members discussion what people should do to get back in Google's good graces, he pointed people at these tips.

I Did That -- And Look At How It Hasn't Helped!
Unfortunately, some believe they've followed these types of tips already. Indeed, one of the nice things about Google's growth over the past three years is that it has rewarded webmasters who have good content. As they've learned this, we've seen a real shift away from people feeling they need to use what are often dubbed "black hat" techniques such as targeted doorway pages, multiple mirror sites and cloaking.

That's why it's so alarming to see the sudden reversal. Some people who believe they've been "white hat" now feel Google's abandoned them. Perhaps some have not been as white hat as they thought, but plenty are. Many good web sites have lost positions on Google, and now their owners may think they need to turn to aggressive tactics. This thread at WebmasterWorld is only one of several that show comments along these lines.

Maybe the aggressive techniques will work, and maybe not. But my concern is really reserved for the mom-and-pop style operations that often have no real idea what "aggressive" means. To them, aggressive means that they think they need to place H1 tags around everything, or that every ALT tag should be filled with keywords, or that they should use the useless meta revisit tag because somewhere, somehow, they heard this was what you need to do.

More Openness From Google
One thing that would help is for Google to open up more. It has a new ranking system, obviously. It should be trumpeting this fact and outlining generally what some of these new mystery "signals" are that it is using to help determine page quality and context.

Google can provide some additional details about how it is ranking pages in a way that wouldn't give away trade secrets to competitors nor necessarily give some site owners a better ability to manipulate its listings. Doing so would make the company look less secretive. It might also help explain some of the logic about why sites have been dropped. That would help readers like this:

What really concerns me right now is that there doesn't appear to be any rhyme or reason as to why some sites have a good ranking and what we could do to improve our rankings.

Maybe Google has decided that it makes more sense to provide informational pages on certain topics, because otherwise its listings look the same as ads (see the honeymoon case study for an example of this).
If so, that's fine. It can defend this as helping users, ensuring they have a variety of results. But at least the declaration that it is doing so will let site owners understand that they may need to create compelling informational content, not sales literature. They may also realize that they simply are not going to get back free listings, for some terms. With that understanding, they can move on to ads or other non-search promotional efforts.

Searchers Want To Know, Too
Google doesn't just need to explain what's going on to help webmasters and marketers. Most important, some of Google's searchers want to know how it works behind the scenes.

Google has set itself up almost as a Consumer Reports of web pages, effectively evaluating pages on behalf of its searchers. But Consumer Reports publishes its testing criteria, so that readers can be informed about how decisions are made. It's essential that Google -- that any search engine -- be forthcoming in the same manner.

To its credit, Google has given out much information. There's a huge amount published for webmasters, and even more is shared through forums and conferences. But if Google is now doing things beyond on-the-page text analysis and link analysis that it has publicly discussed, it needs to share this so searchers themselves can be more informed about how decisions are reached.

Right now, some of these searchers are reading news reports that a search for miserable failure brings up US president George W. Bush's biography as the top result. They'll want to understand why. Is Google calling Bush a miserable failure? Is this an example of Google's "honest and objective way to find high-quality websites with information relevant to your search," as its technology page describes?

The answer to both questions is no. Google Bombing has made that biography come up first, and those doing the bombing have no "objective" intentions behind it. They think Bush is a failure, and they are using Google as a means to broadcast that view.

Does this mean Google is a miserable failure as a search engine? No. Ideally, Google should have caught such an overt attempt to influence its rankings, and it's notable that this got past even its new ranking system. However, Google is not perfect, nor will it ever be. Fortunately, searchers seeing a listing like that can understand why it came up if they understand a bit about how link analysis works. That helps them better evaluate the information they've received.

Now go search for christmas at Google. I bet plenty of searchers are wondering -- as is my colleague Gary Price of ResourceShelf, who reported this to me -- why Marylaine Block's web site is ranked sixth for christmas out of 36 million possible web pages.

Block's not sure herself. Links may have something to do with it, but so might some of these new "signals" about page quality and content of which Google cannot speak. Since Google's not talking, we can't understand -- and, crucially, can't forgive -- when it makes mistakes.

Marketer Reality Check
Having dumped on Google, it's also important that webmasters and marketers understand that Google is never going to outline exactly how it works. No popular search engine will ever do this, because the volume of successful spam that would result would bring the search engine to its knees.
Marketers also have to recognize that Google and other search engines will continue altering their ranking systems, just as they always have done -- and that listings will change, sometimes dramatically, as a result.
Whether Google and the others discuss openly how they work or not, people eventually discover new ways to be successful with spam. That has to be fought.

More important, the nature of search keeps changing. Links were a useful "signal" to use and one that gave the relevancy of web crawling a new lease on life several years ago. Now linking is different. Blogs link in a way that didn't exist when Google launched. Reciprocal linking and link selling is much more sophisticated and often designed to take Google and search engines into account. These are just two reasons why the methods of analyzing links have to change.

It's also a certain fact that the most popular and lucrative real estate on a search engine is not going to continue to use web crawling as its first source of data. It simply makes more sense to go with specialized data sources when these are available. Web search's destiny is to be backfill for when these other forms of data fail to find matches.

Free traffic from web listings will inevitably decline as search engines make use of specialized data sources through invisible tabs. It won't go away entirely, and there's always going to be a need to understand "search engine PR" to influence free results. But smart marketers will realize that they need to look beyond web search to stay ahead.

If Google dropped you, Froogle just got a promotion as a new way to get back in. So, too, will other opportunities come up. The downside is, unlike Google -- or even Froogle -- they'll likely cost money. Smart businesses will realize they need to budget for this, just as they budget for advertising and to obtain leads in the real world. It's the rare and exceptional company that can get by on PR alone -- even the UK's popular Pizza Express chain had to diversify into advertising.



Change In The DNA 4: Google - Update "Cassandra" is here

  1. Brendon Scott Senior SEO at Weboptimiser
    11 April 2003 11:57am
    Well, Google started updating late on the 10th April (UK time), and you can see the beginnings of the new database at www2. Looks like multiple links from the same site are being hit this time round, so we may see some cleaner results this time, as the "Damn the crosslinking spam filters! Full speed ahead!" crowd might take a real knock this month.

    As promised, Google are including previously banned sites which have mended their wicked ways, and are requesting that anyone else who wishes to be reconsidered send an e-mail with a subject of "Reinclusion request", or similar, along with a confessional ("Receive my confession, O most loving and gracious GoogleGuy...") saying what you did and how you've cleaned up. If you are going to do it, do it quickly, before the next deep crawl starts, for maximum chance of reinstatement.

    I personally would be a bit economical with the truth here; these requests are one of Google's best sources of information on how people spam them and exploit the more glaring weaknesses in their algorithm, and it wouldn't do for them to be able to find and fix ALL of them ;)

    Also, GoogleGuy dropped a BIG HINT :
    He is actively requesting spam reports on hidden text and hidden links. If you are using these techniques, I would SERIOUSLY look at ways to remove/replace them BEFORE the deep crawl comes around in a few days' time. (DeepBot comes from the 216.* block; don't confuse it with the FreshBot out of the 64.* block. Both report themselves as GoogleBot, but it's Ms. DeepBot who you need to court most.)

    So, there we have it, Google update "Cassandra", April 2003. Luck to you all
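An editorial aside: the DeepBot/FreshBot hint above boils down to checking which IP block a "GoogleBot" hit in your access log comes from. Here's a minimal sketch of that check in Python. The sample log lines and the classify() helper are illustrative assumptions, not any real tool, and the 216.*/64.* split is simply the rule of thumb quoted in the post above.

```python
# Classify "Googlebot" hits in a combined-format access log by IP block:
# per the post, DeepBot crawled from 216.* and FreshBot from 64.*.

SAMPLE_LOG = [
    '216.239.46.20 - - [11/Apr/2003:09:12:01 +0100] "GET / HTTP/1.0" 200 5120 "-" "Googlebot/2.1"',
    '64.68.82.55 - - [11/Apr/2003:09:13:44 +0100] "GET /news.html HTTP/1.0" 200 2048 "-" "Googlebot/2.1"',
    '10.0.0.7 - - [11/Apr/2003:09:14:02 +0100] "GET / HTTP/1.0" 200 5120 "-" "Mozilla/4.0"',
]

def classify(line):
    """Label a combined-format log line as deepbot, freshbot, or other."""
    ip = line.split()[0]  # client IP is the first field
    if "Googlebot" not in line:
        return "other"
    if ip.startswith("216."):
        return "deepbot"
    if ip.startswith("64."):
        return "freshbot"
    return "other"

counts = {}
for line in SAMPLE_LOG:
    label = classify(line)
    counts[label] = counts.get(label, 0) + 1

print(counts)  # -> {'deepbot': 1, 'freshbot': 1, 'other': 1}
```

A real check would read the live log file and match the user-agent field more strictly, but the principle is the same: the name alone doesn't tell you which crawler visited; the IP block does.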




     12:38 am on May 16, 2003 (gmt 0)

    Earlier this week, albert started a thread called “Understanding Dominic.” In that thread, he compiled a list of all of GoogleGuy’s recent comments concerning update Dominic. The excerpts were a big help for many of our members. However, the following discussion quickly ended up much like the various threads that albert originally had to dig through in order to compile his list.
    Trying to wade through all the various update threads has clearly caused some frustration for many of our members. So in order to try and alleviate some of that frustration, I thought we might try a new version of albert’s original post. This version includes all of albert’s original excerpts. It will also be updated periodically. However, we are going to keep this thread “read only.”
    Anyone interested in following GoogleGuy’s contributions to the current update discussions can bookmark or flag this thread. If/when we post updates, this thread will appear on the active list.
    If you have any questions or comments relating to anything in this thread, I would greatly appreciate it if you post them in one of our current update threads.
    Update dominic – Part 9
    Serious Google update analysis thread

    SJ started testing a new index with a slightly different build of backlinks.
    GoogleGuy msg #44 May 5

    rfgdxm1, every index has to pass a really stringent battery of tests before it becomes visible. SEOs might notice a slightly different build of backlinks, but things like that could be balanced by factors that improve search quality more in other areas. The other thing to bear in mind is that it's easy to re-sync something like backlinks or spam snapshots once you're convinced that an algorithm or method is an improvement.
    SJ results will show up at other data centers soon
    GoogleGuy msg #107 May 5

    Critter, it wouldn't surprise me to see SJ results start to show up at other data centers soon.
    Test of new method with a known base of backlinks, bringing in more up-to-date backlink and spam info later on
    GoogleGuy msg #134 May 5

    Traveler, good question. From the first few posts of that 500+ thread, several people mentioned that they have some very new results in SJ. It's natural that we would test new methods by using a known base of backlinks, but that shouldn't be discouraging to people--backlinks are the sort of data that Google could bring back in over a relatively short time frame. And the same thing goes for known snapshots of spam--that can be brought in fairly quickly as well. SEOs notice whether a backlink comes from two months ago or one month ago, but typical users would care more about fresher pages.
    SJ index is not old
    GoogleGuy msg #146 May 5

    Critter, the SJ index isn't an older index. You can verify that by doing a topical query such as SARS. The results are more fresh in SJ than they are in our regular index.
    About backlinks from forums
    GoogleGuy msg #160 May 5

    Critter, if it's the site listed in your profile, it looks like you only have 5-6 domains that link to your site. A few of those are forum links that might not have made it into the base of backlinks. Getting links from places like the Open Directory Project would help, for example.
    About guestbook links
    GoogleGuy msg 165 May 5

    Much more likely that those guestbook links just aren't given weight now, rfgdxm1.
    Backlinks and spam snapshots will be added later
    GoogleGuy msg 178 May 5

    mcavic, I think I did say that newer backlinks and spam snapshots would be pending to be applied over time. Or at least I tried to. :)
    SJ data will show up at other data centers first, new data / filters after that
    GoogleGuy msg #298 May 5

    albert, what you said, except I wouldn't be surprised to see SJ show up at other data centers first, and then to start applying the newer data/filters after that.
    Fewer backlinks for all sites
    GoogleGuy msg #108 May 6

    Don't be alarmed if the number of reported backlinks goes down. That's actually to be expected in the update. Most of it affects all sites uniformly, so it comes out in the wash as being equal. The better way to measure it is how your rankings/traffic change.
    SJ results will show up at other data centers
    GoogleGuy msg #43 May 6

    What rfgdxm1 said. I think you'll see SJ results appear at more data centers over time.
    Nothing new.
    GoogleGuy msg #11 May 8

    I'm still hanging around. There's not that much new info to convey, but I'm here.
    A few more backlinks were added
    GoogleGuy msg #197 May 10

    I think we added a few more backlinks in yesterday. I'm assuming people have read HitProf's thread on backlinks too? rfgdxm1, sorry to hear that you don't like the SJ index. I also checked your ingredient theory in your spam report. People had suggested that a long time ago at the GooglePlex, but that's not the primary addition for SJ.
    It was only a minor update (so far)
    GoogleGuy msg 203 May 10

    Twas a minor update in backlinks, MyWifeSays. I still expect SJ results to be seen at more data centers first.
    sj/fi index will shift to other data centers. After more backlinks and other data
    GoogleGuy msg #52 May 14

    webdev, I think almost all of these questions have been answered several hundred threads ago. As late as this morning, I posted saying that I expected sj/fi index data to make its way to other data centers and to various sites. Once that data appears more broadly, we'll gradually be pulling in more backlinks and applying other data.
    sj/fi data centers have been approved
    GoogleGuy msg #59 May 14

    steve128, the sj/fi data centers have been tested and approved. What I said several hundred posts ago was that you can expect an index like that to show up at more (and possibly all) data centers in the future.
    More pages and backlinks to be added
    GoogleGuy msg #73 May 14

    Ltribe, I expect more pages and backlinks to be brought in with time.
    Help spread the word ...
    GoogleGuy msg#86 May 14

    Maybe I'll just set things up to auto-post every 50 posts or so. :)
    Please help spread the word so people know what to expect and don't worry too much.
    SJ and/or FI - and other data centers
    GoogleGuy msg #377 May 14

    I wouldn't draw huge distinctions between sj and fi. When I say "I expect the sj index to spread to other data centers," that could be sj or fi.
    SJ and FI similar, but emphasizing different things like topicality - same with different data centers
    Google Guy in this thread msg #16 May 14

    There's a lot of backlinks on the web. :) It will take some time to bring them all in. To clarify a couple things:
    - it helps to think of sj and fi as similar. It's better conceptually to think of them as cut from the same cloth.
    - chiyo, every data center has different machine characteristics. So similar/identical indices might look slightly different at different data centers. This goes back to the point above. Don't think of it as if we build a different index with a different theme for different customers. Our partners get the same scoring/data that we use. That said, the global index that we build can emphasize different things more, such as topicality or more diverse file types.
    So sj/fi are different in several ways. I would expect that difference to spread to other data centers. Then things will resume moving forward.
    Hope that helps,
    Different nature of SJ and FI is expected to spread to more data centers
    Google Guy in this thread msg #28 May 14

    trillianjedi, I'm saying that sj and fi are of a different nature than previous indices, and that I expect that different nature to spread to more data centers (ex, anyone? :)
    I've been using "sj" to denote this different nature, but I appreciate the chance to clarify.
    Measuring the size/usefulness of an index
    GoogleGuy msg 4 May 14

    There are lots of different ways to measure the size/usefulness of an index. Nice job to Allergic for noticing something that most people usually don't. :)
    Timeframe to bring in backlinks - and: should we watch sj/fi
    GoogleGuy msg 43 May 14

    x_m, you could always do that yourself by just not querying sj/fi. :)
    trillianjedi, just to clarify on the other point, I mentioned that backlinks could be brought in on a relatively short timeframe, but remember that we are talking about terabytes of data here--the web is a big chunk of data. I think I also replied to someone else at one point that bringing in those backlinks wasn't the sort of thing that could happen in a day or two. Hope that sheds more light on things


     9:45 pm on May 17, 2003 (gmt 0)

    Backlink reduction due to improved estimation of link counts
    GoogleGuy msg #7 May 9

    HitProf, I really like your insights. I think our newer systems do a much better job of estimating link counts.
    -dc entered the game
    GoogleGuy msg #2 May 17

    Good deal. Sounds like things are on schedule.
    Recent indexing changes - Google gives no guarantee for static search results
    GoogleGuy msg #3 May 17

    Hollywood, we never promise that we're going to return the same static set of results for any length of time for any query. The only thing we promise is that we're going to try to return the best results that we can. I would not base your business deals on the assumption that Google search results will remain static for long periods of time. That's just not something that we promise.
    Bringing in backlinks is no overnight thing
    GoogleGuy msg #2 May 17

    Bringing in backlinks and other data will be a gradual process over time. It won't be an overnight thing once all the data centers have the sj-type index. Once that index is everywhere, I'm looking forward to us bringing in fresher data--but I want to set expectations that it will take some time.
    Timeframe is more than days, less than months. - Calm down, and remember last September
    GoogleGuy msg #16 May 17

    I would say more than days, less than months. That's just my personal take. A lot of people are paying attention to every microdetail at this point. I would say that stepping back a level of detail would give better insight and less stress. Suppose you're on a long bus trip. If you scrutinize the road for every bump, pebble, or sharp turn, you're going to be more stressed than you need to be. If people are newer to WebmasterWorld, I'd recommend going back to last September, when we improved our scoring. If you re-read those threads, you'll see the same sorts of reactions that you see now. People claimed the sky was falling. You saw 5-10 people making pretty alarmist claims as loudly as possible. People suggested that scoring changes were a secret attempt to boost AdWords sales. The imminent destruction of Google was predicted several times. A few personal attacks on GoogleGuy took place. :) We got through that, and re-reading those threads will give you a different perspective on these more recent threads. Take that, plus the fact that I've said 5-6 (9-10?) times that we'll be bringing in better filters and fresher data, and most people should feel better, I hope.
    We still use the scoring improvements from last September, but you don't hear people worrying about it now. I'm looking forward to doing a similar post for our next major improvement, whenever that comes: "You know, if you go back to May, you'll see several webmasters were worried about that change too. Would you believe that there were over 4000+ posts from people who were anxious about that change? ..."
    Anyway, each webmaster is free to do whatever they want, of course. I do think that people would feel less stressed if they took a step (or two) back though. I think you've seen that somewhat; many of the senior members of WebmasterWorld haven't been doing tons of posts in the dominic discussions.
    Spam showing up right now will be short-lived
    GoogleGuy msg #24 May 17

    tigger, sometimes weekends are easier for making it over here. :)
    Hey nutsandbolts, let's try something new. If you have feedback about this index, do a spam report with your nickname, mention webmasterworld, but also include "dominic" in the comments. If people have general feedback or specific examples about this index, that's the best way to get the examples to us. I'm not worried about spammers that were banned and are back for a brief time--that will be short-lived. But if you have comments about searches that seem better/worse (preferably not just searches for your own site), send a spam report with "dominic" somewhere in the comments. We'll read what people have to say when they can give specific searches.
    [edited by: Marcia at 3:39 pm (utc) on May 19, 2003]
    [edit reason] Modified nesting for formatting. [/edit]

     3:35 pm on May 19, 2003 (gmt 0)

    Look at the overall picture. - Do you show well at sj/cw/fi and the like? - Spam will disappear again
    GoogleGuy msg 39 May 17

    Kirby, I'm not worried about spammers that have been handled before and appear to be back. Most of those will be gone again. But I do want to hear spam reports like "this type of search seems to work better or worse with the new system."
    worker, if I were playing Yoda like Alphawolf suggests, I'd say: worry less about PR on the toolbar and more about rankings. And less about rankings for high-profile phrases and more about overall rankings. And less about rankings and more about traffic. And less about traffic and more about conversions. Maybe that doesn't sound much like Yoda though. I think the shorter answer would be that if you're showing well at datacenters like sj/cw/fi, I wouldn't worry much about what the PR says.
    I just wanted to say thanks to the other posters who are taking a step back and looking at the overall picture. Please help to remind people that the longer-term view will keep folks less stressed and more productive. :)
    About micro-level views, larger landscapes, and a post showing deep understanding
    GoogleGuy msg #54 May 17

    teeceo, when bringing up a new system, you want to work from a known base of data. I fully expect that after that, we'll be working to bring in newer sites. And your anchortext, rfgdxm1. ;) FWIW, BigDave has (in my experience reading his posts--I don't know who he is) a deep understanding of Google's workings and perspective. I would give his comments as much consideration as you would mine.
    mrdch, I understand why webmasters may be anxious--mainly my post was meant to bring out a few points that people may have missed. WebmasterWorld is great for the micro-level view ("I dropped from #4 to #16, but only in cw!"), but not always for the larger landscape.
    On-page factors: Well optimized individual pages bring traffic from different important key phrases
    GoogleGuy msg #78 May 17 (referring to msg #71)

    annej, I really wish every webmaster would do the log analysis that you just did. :)
    Still a lot of work left to do
    GoogleGuy msg #109 May 18

    Stefan, there was a lot of work at Google behind these changes. We're trying to make the transition as gentle as possible, but there's still a lot of work left to do.
    Specific feedback about improved or worsened searches welcome
    Google Guy msg #181 May 18
    (Remark: quote shortened)

    This is just my take, but I don't think it helps anybody to have folks call other people dancing monkeys, or post claiming "B£*LLSH!T" in all caps, or virtually jump out of windows. If people have constructive comments for this index, I gave a method back in msg #24 of giving us specific searches or types of searches that you consider good or bad. I just checked, and I don't mind telling you that so far it's a single digit number of reports. I'm guessing that number will go up at least some after this reminder post :) but if you have specific searches or suggestions to pass on, that's probably the best way to get them to us.
    Current status of update
    GoogleGuy msg #205 May 18
    (Remark: read accurately, also posts referred to)

    Sure, Anon27 and deanril. I think the plan will be
    1) deploy the new index/system across all data centers
    2) begin pulling in more data (i.e. newer backlinks, pages, and spam updates)
    3) once new data is into the system, begin pulling in new algorithms that have been waiting in the wings
    I believe the current status is that we're around step 1.5 or thereabouts; something like 7 or so data centers have the new index/system. I expect the current pace of switching data centers to continue about as it has been. I would expect step two to occur over roughly the same timeframe as a typical index cycle (thus the "more than weeks, less than months" comment). Step 3 is longer-term and ongoing, but I'm really excited about what we'll be able to do to improve quality across the board.
    Hope that helps,
    (Staza, that should help with your questions: yes, and I believe so to #1 and #2.)
    [edited by: Marcia at 3:44 pm (utc) on May 19, 2003]
    [edit reason] Formatting of nested elements. [/edit]

     8:31 pm on May 21, 2003 (gmt 0)

    About hidden text
    GoogleGuy msg #22 May 19

    Thanks for the feedback. If people want to mention specific searches/sites, they can do a spam report and put "dominic" in the comments somewhere--and hopefully mention your nickname.
    Just today, I've fielded questions from two sites that had hidden text and were taken out for 30 days. One site owner knew that they had had hidden text and one didn't. If you had hidden text on your pages and you think that might be the problem, you'll want to double-check that all the hidden text is gone. The first set of hidden-text penalties will be expiring over the next week or two. Sounds like that probably isn't an issue, but it's something to bear in mind.
    Don't gun for #1 at "that major keyword"
    GoogleGuy msg #36 May 20

    paynt, feel free to use that. I think I'm paraphrasing something Danny Sullivan would say when I talk like that. :) Just the idea that paying more attention to users, what they're finding and what they want, rather than gunning for #1 on that "major keyword" is one thing that demonstrates a wise SEO. :)
    TLD / language, give specific comments to Google
    GoogleGuy msg #31 May 20

    Hey Napoleon, I think I've already commented on our SJ index at length, and given people some idea of what to expect. I believe the change for non-U.S. IP addresses is geared to allow users to separate language from TLD, and I believe that it was tested for several weeks on one TLD, with us watching for feedback. Given that it's a UI change on the home page, the amount of negative feedback has been very low--a good sign. I know that the UI team is interested in the best way to present international users with a choice of which TLD to use and which language to use. If you write in with a subject line such as "UI for TLD/language," I'm sure it will reach open ears on the UI team--they want feedback on how to make things smoother for everyone. I'll be curious to read suggestions for how to make that UI better myself. As far as things like expired domains, changes such as SJ are exactly the sort of thing that will allow us to pull in better algorithms that can take full advantage of that data.
    Anyway, I hope that addresses some of your questions. We're always looking for feedback on what we can do better. In another thread, I gave a way to give specific feedback to Google about the SJ index: do a spam report with "dominic" in the comments. I've since reminded people about that method a couple times. Yet the number of reports via that form has been less than even the number of posts on this thread. After 3000+ posts about this index on WebmasterWorld, we're sitting at around 30 concrete suggestions about what's bad/good about the SJ index. By that measure, SJ is definitely an improvement over past indices.
    More changes to come
    GoogleGuy msg #42 May 20 re msg #37 (shortened)

    NovaW, I would certainly expect to see more changes like the ones you mentioned, which is a good thing.
    links=PR0 like red herring
    GoogleGuy msg #45 May 20

    Napoleon, the links = PR0 sounds like a red herring, but I'll be happy to root around and read it. :)
    Keep on doing what's right
    GoogleGuy msg #11 May 20

    Oh well. People knock on Google sometimes. If you just keep doing what you know is right, things seem to work out just fine. :)
    About link pages
    GoogleGuy msg 158 May 20

    Sorry, that's what I was trying to say earlier. :) Links pages don't get PR0, and we index them.
    Funny one about link farms
    GoogleGuy msg 161 May 20

    They're mostly located in the Midwest, I think. Lots of irrigation and sunshine to raise those little links. The links need time to mature, or else they aren't as good. Once the links are harvested, they're shipped throughout the world to different companies that make HTML editors, and those companies can feed the links right in.
    If you drive across the U.S., you can still see link farmers from time to time.
    Links will be 'organic'
    GoogleGuy msg #170 May 20

    walthamstow, just remember that you want all-organic links. :)

     9:28 pm on May 23, 2003 (gmt 0)

    Phase 2 Begins
    Status: index is at all data centers. Subtle spam filter comes soon. Fresh data coming in next days. Timeframe: more than weeks, less than months.
    GoogleGuy msg #53 May 23

    Some fresh data might be incorporated next-day, but I wouldn't expect everything freshbot found yesterday to make it in a day. There's a lot of data that needs to be fetched and cross-checked--I would expect the full data to show up more on the timeframe of what you would expect from a crawl/index. Step 1 is done (index is at all data centers). I know that one subtle spam filter is going in soon, but Napoleon, I would start counting on the weeks-but-not-months comment beginning from the time that the index switchover finished. Again, just trying to give webmasters information so they have the right expectations: more than weeks, less than months.
    But I'm glad that people have noticed that freshie has been crawling deeply in addition to normal freshie duties.
    Timeframe again. And changes expected for country language / TLD / redirection next week.
    GoogleGuy msg #58 May 23

    I think we're on the same wavelength, Napoleon.
    P.S. About the questions you raised about country language/tld/redirection, I think we've got some changes scheduled soon (next week) that should bring it in line with what many users expect again. Just wanted to let you know that that was coming too.
    Traditional update(s) expected for a little while longer. After, things will be more gradual.
    GoogleGuy msg #65 May 23

    crobb305, some things will be filtering in sooner (I know a few more autospam filters will happen earlier), and I wouldn't be surprised to see fluctuations in backlinks and pages, but I would hold on to the idea of an update that brings in more data for a little while longer. In time, I do think things will be more gradual. However, we're still in the transition period for this system, so I wouldn't be surprised to see a traditional update for a little while longer. Hope that helps.

     10:42 pm on May 23, 2003 (gmt 0)

    At least another traditional update expected
    GoogleGuy msg #81 May 23

    Sure, reneewood. I would expect at least another update of the form where the crawl/index cycle finishes and then data centers are updated in the traditional dance.
    Next update: what will/should be brought in. Freshbot looks like Deepbot.
    GoogleGuy msg #88 May 23

    dnbjason, I've learned never to make promises, but I would say the odds are good. :) rfgdxm1, more anchor text, backlinks, spam filters, etc. should be brought in by the next update. I think someone else remarked that freshbot is looking more like deepbot these days.
    Definitely expect newer data affecting ranking
    GoogleGuy msg #91 May 23

    I would definitely expect newer data that we bring in to affect rankings.

     10:30 pm on May 25, 2003 (gmt 0)

    About toolbar, status of index, and backlinks
    GoogleGuy msg #43 May 23

    MurphyDog and notsleepy, I wouldn't worry. I've seen at least a couple grey bars for well-known sites, so I'm asking around about that. I wouldn't worry about the toolbar display during this transition. shaoye, the index includes many new pages and new backlinks, but also includes an older snapshot of backlinks for now as well.
    What exactly should happen with next update
    GoogleGuy msg #102 May 23

    Let me clarify. The next update should bring in more backlinks, data, etc.
    Those dang prepositions. :)
    Fluctuations before next update
    GoogleGuy msg #109 May 23

    rfgdxm1, I wouldn't go as far as saying that SERPs aren't going to change at all until the next update. Some amount of fluctuations might occur before then (e.g. a couple of small spam filters, or some fresher data).
    Older data for cross-verification and worst case backup
    GoogleGuy msg #113 May 23

    TheAutarch, that data can be used for cross-verification. It's also there as a worst-case back-up, but I don't think we'll need to use it.
    Some new data such as spam filters come before next index, larger set of data comes with next index
    GoogleGuy msg 120 May 23

    jojojo, some data will be brought in, such as spam filters. A larger set of data will be brought in with the next index.

     8:05 pm on Jun 10, 2003 (gmt 0)

    Thanks for the killer post, WebG.

    Change In The DNA 6: Google Update Esmeralda

    By kpaul in Op-Ed
    Tue Jun 24, 2003 at 07:34:57 AM EST
    Tags: Internet (all tags)

    Google is dancing to a different tune this month. Some think the beat is from war drums playing in anticipation of a Microsoft search product. Following the naming convention started at WebmasterWorld (Google updates are given names like hurricanes), the Esmeralda Google Update is currently winding down.

    For those of you who aren't regular Google-watchers, the Google dance (update) refers to a time roughly once a month when Google updates its index of the web with info from the last month's crawl of websites. During the update, the results Google displays differ across the various servers in the Google cluster; hence the term "Google dance."
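    Watchers put rough numbers on the dance by comparing the top results a query returned from different servers in the cluster (the www2/www3 test servers, for example). Here's a toy sketch of that comparison; the domain names and the dance_score() helper are made up for illustration, not any tool the forum actually used.

```python
# Quantify "the dance": how much do two servers' top results for the
# same query disagree? 0.0 means identical sets, 1.0 means no overlap.

def dance_score(results_a, results_b):
    """Fraction of top results the two servers disagree on."""
    overlap = len(set(results_a) & set(results_b))
    return 1 - overlap / max(len(results_a), len(results_b))

# Hypothetical top-4 lists for one query from two data centers.
old_index = ["site-a.com", "site-b.com", "site-c.com", "site-d.com"]
new_index = ["site-a.com", "site-e.com", "site-c.com", "site-f.com"]

print(dance_score(old_index, new_index))  # -> 0.5
```

    Tracking a score like this for a basket of queries over several days is essentially what the forum posters were doing by hand when they reported that an update was "settling down."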

    Google has been undergoing a lot of big changes during the last couple months, though, and things have been quite odd to say the least. A lot of people seem to think Google is moving toward a rolling update instead of a re-ranking of the entire web once a month. The weirdness started in May. The monthly update (Dominic) was late - very late. The Deepcrawler, which spidered sites deeply once a month for the big update, hadn't been seen in the logs of anyone who watched for it. Very strange. Forums across the 'net experienced a flurry of activity as people tried to figure out what was going on at the Googleplex.
    To add to the mysteriousness, Freshbot began to act a little more like the Deepcrawler. Traditionally, Freshbot came from a different IP range and added pages to the index immediately as opposed to once a month like Deepbot did. Freshbot's purpose was to add 'freshness' to the search results in between monthly updates.

    Did the late monthly update and disappearance of Deepbot signal the birth of a new bot? Names like deepfreshBot and FredBot were posted around the 'net as people pondered the new activity in their logs.

    GoogleGuy (a Google employee who posts anonymously at WebmasterWorld - yes, it's been verified) confirmed that a big change was in the works and that people should be patient as Google evolved. He said the process would take "more than weeks, less than months."
    Cryptic, but to be honest, I'm happy to get any first-hand, 'unofficial' info from them. During Dominic, SERPs (search engine result pages) were all over the place. Some pointed back to the now infamous September '02 update that was talked about in Wired and on that other site.
    The doomsayers always appear during an update, though, predicting the downfall of the 'obviously-evil-because-they-don't-list-my-site' Google. The conspiracy theorists come out in droves as well.
    One theory offered to the world was that Google had run out of unique IDs because they used four bytes instead of five or more when designing their database infrastructure. GoogleGuy mentioned that a Google employee fell out of his chair laughing when he heard that.

    Beyond the anger and conspiracies, though, some took GoogleGuy's messages and tried to make sense of what was going on. (This whole marketing thing Google has going on with GoogleGuy may be a story in itself.)

    Anyway, although no one from Google has officially confirmed it, they're apparently moving away from a once-a-month crawl/update cycle to a more continuous updating process. As the web grows, this is becoming a necessity to have the freshest results possible. They're also perfecting (as much as they can) automated spam detectors to help clean up the SERPs.

    For Google to stay on top (as they edge closer to an eventual IPO), these are worthy tasks. They need to innovate. They need to stay two steps ahead of their closest competitors, even if it means a little collateral damage in the SERPs for a little while. At this time, in my humble opinion, the only SE that comes close to rivaling Google is FAST's AllTheWeb.

    Yahoo is most likely going to dump Google as a provider of their search engine results at some point this year, though. And Microsoft is also showing an interest in the search engine market. With two giants like that (not to mention FAST becoming a giant in its own right), it's becoming more and more important for Google to maintain their lead.
    As I said in the intro, the Esmeralda update appears to be settling down somewhat and should stabilize sometime in the next few days. From there, SEOs (search engine optimizers) will again look at how Google ranks the web and how best to optimize for it.
    The bigger question, though, is what it means for the Internet at large. Personally, I think it's an effort by Google to maintain their dominance in the search engine game. And Google has always been a good neighbor in cyberspace, so this is not necessarily a bad thing.

    Sure, some may snort and chuckle when Microsoft and search engines are mentioned in the same sentence, but they have a *lot* of money. In our society this means they don't necessarily need the best product to have the most market share. Just take a look at Internet Explorer and its dominance over Netscape and other browsers.
    The thing is, if Google can remember what they were like when they were still the little guys, they have a fighting chance at fending off Microsoft and the others. This latest major change to their algorithm and crawling methods is, I think, a move by Google to sprint ahead of the competition before it's too late.


Change In The DNA 7: Explaining algorithm updates and data refreshes

December 23, 2006
A thread on WMW started Dec. 20th asking whether there was an update, so I’m taking a break from wrapping presents for an ultra-quick answer: no, there wasn’t.

To answer in more detail, let’s review the definitions. You may want to review this post or re-watch this video (session #8 from my videos). I’ll try to summarize the gist in very few words though:

Algorithm update: Typically yields changes in the search results on the larger end of the spectrum. Algorithms can change at any time, but noticeable changes tend to be less frequent.

Data refresh: When data is refreshed within an existing algorithm. Changes are typically toward the less-impactful end of the spectrum, and are often so small that people don’t even notice. One of the smallest types of data refreshes is an:

Index update: When new indexing data is pushed out to data centers. From the summer of 2000 to the summer of 2003, index updates tended to happen about once a month. The resulting changes were called the Google Dance. The Google Dance occurred over the course of 6-8 days because each data center in turn had to be taken out of rotation and loaded with an entirely new web index, and that took time. In the summer of 2003 (the Google Dance called “Update Fritz”), Google switched to an index that was incrementally updated every day (or faster). Instead of a monolithic monthly event, Google would refresh some of its index pretty much every day, which generated much smaller day-to-day changes that some people called everflux.
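The contrast above - a monolithic swap versus a daily trickle - can be sketched in a few lines of Python. This is purely illustrative (the data-center names, dictionaries, and functions are invented for this sketch; nothing here reflects Google's actual systems):

```python
# Hypothetical sketch contrasting the two update styles described above:
# a monolithic "Google Dance" that loads a whole new index into each data
# center in turn, versus incremental "everflux" refreshes that touch only
# a small slice of the index at a time.

def dance_rollout(datacenters, new_index):
    """Monolithic update: each data center is taken out of rotation,
    reloaded with the entire new index, then brought back online."""
    for dc in datacenters:
        dc["in_rotation"] = False        # searchers stop hitting this DC
        dc["index"] = dict(new_index)    # load the full replacement index
        dc["in_rotation"] = True         # back in rotation
    # Until every DC finishes, different searchers see different results --
    # the visible churn people called the Google Dance.

def everflux_refresh(datacenters, changed_pages):
    """Incremental update: push only the changed pages to every data
    center, so day-to-day differences stay small."""
    for dc in datacenters:
        dc["index"].update(changed_pages)

dcs = [{"name": f"dc{i}", "in_rotation": True, "index": {"a.com": 1}}
       for i in range(3)]
dance_rollout(dcs, {"a.com": 2, "b.com": 1})   # monthly-style full swap
everflux_refresh(dcs, {"c.com": 1})            # daily-style small refresh
```

The design point is simply that the incremental path never takes a whole copy of the index offline, which is why its changes are barely noticeable.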

Over the years, Google’s indexing has been streamlined, to the point where most regular people don’t even notice the index updating. As a result, the terms “everflux,” “Google Dance,” and “index update” are hardly ever used anymore (or they’re used incorrectly :) ). Instead, most SEOs talk about algorithm updates or data updates/refreshes. Most data refreshes are index updates, although occasionally a data refresh will happen outside of the day-to-day index updates. For example, updated backlinks and PageRanks are made visible every 3-4 months.

Okay, here’s a pop quiz to see if you’ve been paying attention:
Q: True or false: an index update is a type of data refresh.

A: Of course an index update is a type of data refresh! Pay attention, I just said that 2-3 paragraphs ago. :) Don’t get hung up on “update” vs. “refresh” since they’re basically the same thing. There are algorithms, and there's the data that the algorithms work on. A large part of changing data is our index being updated.

I know for a fact that there haven’t been any major algorithm updates to our scoring in the last few days, and I believe the only data refreshes have been normal (index updates). So what are the people on WMW talking about? Here’s my best MEGO guess. Go re-watch this video. Listen to the part about “data refreshes on June 27th, July 27th, and August 17th 2006.” Somewhere on the web (can’t remember where, and it’s Christmas weekend and after midnight, so I’m not super-motivated to hunt down where I said it) in the last few months, I said to expect those (roughly monthly) updates to become more of a daily thing. That data refresh became more frequent (roughly daily instead of every 3-4 weeks or so) well over a month ago. My best guess is that any changes people are seeing are because that particular data is being refreshed more frequently.


Change In The DNA 8: Search Engine Size Wars & Google's Supplemental Results

A longer, more detailed version of this article is
available to Search Engine Watch members.
Click here to learn more about becoming a member
Ah, summer. Time to play on the beach, head out on vacation and if you're a search engine, announce to the world that you've got the largest index.

Around this time last year, AllTheWeb kicked off a round of "who's biggest" by claiming the largest index size. Now it's happened again, when AllTheWeb said last month that its index had increased to 3.2 billion documents, toppling the leader, Google.

Google took only days to respond, quietly but deliberately notching up the number of web pages listed on its home page that it claims to index. Like the McDonald's signs of old that were gradually increased to show how many customers had been served, Google went from 3.1 billion to 3.3 billion web pages indexed.

You might expect the response to this to be a yawn. Actually, no. Instead, I'm filled with Andrew Goodman-style rage (and that's a compliment to Andrew) that the search engine size wars may erupt once again. In terms of documents indexed, Google and AllTheWeb are now essentially tied for biggest -- and hey, so is Inktomi. So what? Knowing this still gives you no idea which is actually better in terms of relevancy.

Size figures have long been used as a surrogate for the missing relevancy figures that the search engine industry as a whole has failed to provide. Size figures are also a bad surrogate, because more pages in no way guarantees better results.

How Big Is Your Haystack?

There's a haystack analogy I often use to explain this, the idea that size doesn't equal relevancy. If you want to find a needle in the haystack, then you need to search through the entire haystack, right? And if the web is a haystack, then a search engine that looks through only part of it may miss the portion with the needle!
That sounds convincing, but the reality is more like this. The web is a haystack, and even if a search engine has every straw, you'll never find the needle if the haystack is dumped over your head. That's what happens when the focus is solely on size, with relevancy ranking a secondary concern. A search engine with good relevancy is like a person equipped with a powerful magnet -- you'll find the needle without digging through the entire haystack because it will be pulled to the surface.

Google's Supplemental Index

I especially hate when the periodic size wars erupt because examining the latest claims takes time away from other more important things to write about. In fact, it was a great relief to have my associate editor Chris Sherman cover this story initially in SearchDay last week (Google to Overture: Mine's Bigger). But I'm returning to it because of a twist in the current game: Google's new "supplemental results."

What are supplemental results? At the same time Google posted new size figures, it also unveiled a new, separate index of pages that it will query if it fails to find good matches within its main web index. For obscure or unusual queries, you may see some results appear from this index. They'll be flagged as "Supplemental Result" next to the URL and date that Google shows for the listing.

Google's How To Interpret Your Search Results page illustrates this, but how about some real-life examples you can try? Here are some provided by Google to show when supplemental results might kick in:
  • "St. Andrews United Methodist Church" Homewood, IL
  • "nalanda residential junior college" alumni
  • "illegal access error" jdk 1.2b4
  • supercilious supernovas

Two Web Page Indexes Not Better Than One

Using a supplemental index may be new for Google, but it's old to the search engine industry. Inktomi did the same thing in the past, rolling out what became known as the small "Best Of The Web" and larger "Rest Of The Web" indexes in June 2000.

It was a terrible, terrible system. Horrible. As a search expert, you never seemed to know which of Inktomi's partners was hitting all of its information or only the popular Best Of The Web index. As for consumers, well, forget it -- they had no clue.

It also doesn't sound reassuring to say, "we'll check the good stuff first, then the other stuff only if we need to." What if some good stuff for whatever reason is in the second index? That's a fear some searchers had in the past -- and it will remain with Google's revival of this system.

Why not simply expand the existing Google index, rather than go to a two-tier approach?

"The supplemental is simply a new Google experiment. As you know we're always trying new and different ways to provide high quality search results," said Google spokesperson Nate Tyler.

OK, it's new, it's experimental -- but Google also says there are currently no plans to eventually integrate it into the main index.

Deconstructing The Size Hot Dog

Much as I hate to, yeah, let's talk about what's in the numbers that are quoted. The figures you hear are self-reported, unaudited and don't come with a list of ingredients about what's inside them. Consider the hot dog metaphor. It looks like it's full of meat, but if you analyze it, it could be there's a lot of water and filler making it appear plump.

Let's deconstruct Google's figure, since it has the biggest self-reported number, at the moment. The Google home page now reports "searching 3,307,998,701 web pages." What's inside that hot dog?

First, "web pages" actually includes some things that aren't web pages, such as Word documents, PDF files and even text documents. It would be more accurate to say "3.3 billion documents indexed" or "3.3 billion text documents indexed," because that's what we're really talking about.

Next, not all of those 3.3 billion documents have actually been indexed. There are some documents that Google has never actually indexed. It may list these in search results based on links it has seen to the documents. The links give Google some very rough idea of what a page may be about.

For example, try a search for pontneddfechan, a little village in South Wales where my mother-in-law lives. You should see in the top results a listing with no title at all. That's what Google calls a partially indexed page. It would be fairer to say it's an unindexed page, since in reality, it hasn't actually been indexed.

What chunk of the 3.3 billion has really been indexed? Google's checking on that for me. They don't always provide an answer to this particular question, however. Last time I got a figure was in June 2002. Then, 75 percent of the 2 billion pages Google listed as "searching" on its home page had actually been indexed. If that percentage holds true today, then the number of documents Google actually has indexed might be closer to 2.5 billion, rather than the 3.3 billion claimed.
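The estimate above is simple back-of-the-envelope arithmetic, easy to verify. Note that the 75 percent figure comes from June 2002, so applying it to today's total is the article's own speculation, reproduced as-is here:

```python
# Back-of-the-envelope check of the estimate above: apply the last known
# fully-indexed share (75%, from June 2002) to the current claimed total.

claimed_total = 3_307_998_701   # "searching 3,307,998,701 web pages"
fully_indexed_share = 0.75      # last figure Google provided, June 2002

estimated_indexed = claimed_total * fully_indexed_share
print(f"Estimated fully indexed: {estimated_indexed / 1e9:.2f} billion")
# About 2.48 billion -- hence "closer to 2.5 billion" than 3.3 billion.
```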

But wait! The supplemental index has yet to be counted. Sorry, we can't count it, as Google isn't saying how big it is. Certainly it adds to Google's overall figure, but how much is a mystery.

Let's mix in some more complications. For HTML documents, Google only indexes the first 101K that it reads. Given this, some long documents may not be totally indexed -- so do they count as "whole" documents in the overall figure? FYI, Google says only a small minority of documents are over this size.
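The 101K cutoff is easy to picture as a simple truncation. A minimal sketch (the function name is invented for illustration; Google's actual indexer is of course far more involved):

```python
# Sketch of the truncation described above: only the first 101K of an HTML
# document is indexed, so any text past that boundary is invisible to
# queries even though the "document" counts toward the size figure.

LIMIT = 101 * 1024  # 101K, per the article

def indexable_text(html: str) -> str:
    """Return the portion of a document that would actually be indexed."""
    return html[:LIMIT]
```

So a 500K page would count as one document in the headline number while roughly four-fifths of it was never indexed at all.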

Auditing Sizes

OK, we've raised a lot of questions about what's in Google's size figure. There are even more we could ask -- and the same questions should be directed at the other search engines, as well. AllTheWeb's 3.2 billion figure may include some pages only known by seeing links and might include some duplicates, for example. But instead of asking questions, why not just test or audit the figures ourselves?
That's exactly what Greg Notess of Search Engine Showdown is especially known for; his last test was done in December. His test involves searching for single-word queries, then examining each result that appears -- a time-consuming task. But it's a necessary one, since the counts from search engines have often not been trustworthy. You can expect Greg will probably take a swing at these new figures in the near future -- and we'll certainly report on his findings.

Grow, But Be Relevant, Too

I'm certainly not against index sizes growing. I do find self-reported figures to also be useful, at least as a means of figuring out who is approximately near each other. Maybe Google is slightly larger than AllTheWeb or maybe AllTheWeb just squeaks past Google -- the more important point is that both are without a doubt well above a small service like Gigablast, which has only 200 million pages indexed.

However, that's not to say that a little service like Gigablast isn't relevant. It may very well be, for certain queries. Indeed, Google gained many converts back when it launched with a much smaller index than the established major players. It was Google's greater relevancy -- the ability to find the needle in the haystack, rather than bury you in straw -- that was the important factor. And so if the latest size wars should continue, look beyond the numbers listed at the bottom of the various search engine home pages and consider instead the key question: is the search engine finding what you want?

By the way, the baby of the current major search engine line-up, Teoma, did some growing up last month. The service moved from 500 million to 1.5 billion documents indexed.

Paul Gardi, vice president of search for Ask Jeeves, which owns Teoma, wants to grow even more. He adds that Teoma is focused mainly on English language content at the moment -- so the perceived smaller size of Teoma may not be an issue for English speakers. Subtract non-English language pages from Teoma's competitors, and the size differences may be much less.

"Comparatively speaking, I would argue that we are very close to Google's size in English," Gardi said.


