
How Does Google Work?

by www.millionpixelsadvert.com


I. What is Google?


     The Internet is no longer reserved for scientists. Nowadays millions of users add content to billions of web pages every day. It has become a fast-growing medium with over 1,173,109,925 users. There are more than 100 million registered domain names and about 8 billion pages. Information has become the most valuable possession. But how, among so many different pages, do you find what you are interested in? Without search engines it would be an extremely hard task. Fortunately search engines, such as Google, help users find what they are looking for quickly and easily.
     Search engines are tools that use built-in mechanisms and algorithms to crawl the internet, collect data, classify and catalog the information, and present it in a logical and user-friendly way. Thanks to their work, those 8 billion pages are narrowed down to only the ones related to the search term entered by the user.
     Search engines can be divided into author-controlled, editor-controlled and user-controlled. Google and AltaVista belong to the first group, as they build their rankings from the keywords found on indexed websites. The second group is represented by Yahoo! and LookSmart, which add sites to catalogs that form a tree structure. The user-controlled group is represented by Direct Hit; it rates each page by the number of visitors it receives.
     Every website owner wants the site to be seen and visited by many people. To achieve that, the site has to be visible in search engine results for keywords related to it. But the position at which it appears is very important. It is not enough just to “be” in search engines; you have to “be high in the results”. The best position is among the first 3 results for a keyword. Why is it so important? Most users consider the first search engine results page (SERP) the most valuable and relevant to their query. They will look through this page to find the most suitable site to visit.
     Internet users rarely look at the second and third SERP, as they assume those results are less relevant and less important. This shows how seriously users treat search engines and how much they trust their algorithms. Most people consider search engines a reliable, trustworthy and impartial source of information about websites. That’s why no site owner should forget about search engines: they are a great source of visitors and potential clients for a website.
     The best known and most widely recognized search engine is Google. It is a minimalist search engine that revolutionized the internet. The company was founded in 1998 by Sergey Brin and Lawrence Page, two very talented computer scientists who developed the prototype of the Google search engine at Stanford University.
     Most internet users have no idea how complex the technology behind Google is. The search engine is able to search its indexes about 1,000 times per second, returning results for different search terms. Google is based on a network of thousands of low-cost computers and can therefore carry out fast parallel processing. Some calculations suggest that, combined, these machines would make Google the owner of the biggest computing center in the world!
     Google’s physical structure consists of clusters of computers, known as server farms, located all around the world. They run Linux-based systems with GFS, the Google File System.
Some key facts about Google:
· About 4 billion indexed pages, averaging 10 kB each (4 TB of data overall)
· 6 TB of hard drive space
· 104 languages, including “Klingon”, the language used by characters in Star Trek
· Around 30 clusters
· 80,000 machines
· 300 GHz of processing power
· Up to 2,000 computers in one cluster
· 160 CPUs
· One cluster holds 1 PB of data, which is a million gigabytes
· No outage since 2000, when a main switch broke down
· 160 GB of RAM drive
· 800 computer scientists employed, 200 of whom have PhDs

     Google is now more than just a search engine. The company has many different projects, such as:
· Google Desktop - search application that indexes e-mails, documents, music, photos, chats, Web history and other files. It allows the installation of Google Gadgets.
· Google Earth - virtual globe that overlays satellite imagery, aerial photography and GIS data on a 3D globe.
· Picasa - photo organization and editing application, providing photo library options and simple effects.
· Google AdSense - advertising program for website owners. Adverts generate revenue on either a per-click or a per-thousand-ads-displayed basis, and the adverts shown come from AdWords users, selected according to relevance.
· AdWords - Google's flagship advertising product and main source of revenue. AdWords offers pay-per-click (PPC) advertising and site-targeted advertising for both text and banner ads.
· Blogger - weblog publishing tool. With little technical knowledge, users can create custom, hosted blogs with features such as photo publishing, comments, group blogs, blogger profiles and mobile-based posting.
· YouTube - popular free video sharing website that lets users upload, view and share video clips. In October 2006, Google, Inc. announced that it had reached a deal to acquire the company for US$1.65 billion in Google stock. The deal closed on 13 November 2006.
· Google Maps - mapping service that indexes streets and satellite imagery, providing driving directions and local business search.
· Google Analytics - traffic statistics generator for defined websites, with strong AdWords integration. Webmasters can optimize their ad campaigns based on the statistics provided. Analytics is based on the Urchin software, and the new version released in May 2007 integrates improvements based on Measure Map.



II. Google File System.

     Google File System (GFS) is a file system developed by Sanjay Ghemawat, Howard Gobioff and Shun-Tak Leung at Google to handle the rapid growth of the company’s infrastructure and to suit its needs. The company was looking for a way to improve the stability and reliability of its services, and the only reasonable way to do that was to build its own file system, as no existing system could handle a massive number of requests across a huge number of servers.
     GFS supports a gigantic user population on very cheap computer equipment that tends to break down quite often. From the beginning, Google used cheap computers running Linux to meet the need for storage space for information gathered by web crawlers (and by other services, such as Gmail). This caused problems with reliability and efficiency, so the company had to create reliable software to run on unreliable hardware. It was a complete change in the economics of IT companies.
     The system was designed to be fully scalable, flexible and easy to expand. It also doesn’t remove or overwrite data, but rather appends it: adding new data is an easier and faster way of storing it than updating and deleting old data. The system is also designed to withstand numerous hardware failures and human errors. That’s why each file in the system has three copies (the default; files in high demand get even more) on separate servers, so that if one server goes down, two backup servers are ready to take the request and complete the search. And all of that happens in milliseconds.
     Files managed by GFS range from hundreds of megabytes to several gigabytes, as it is better to operate on a few huge files than to handle millions of small ones. To manage them efficiently, GFS stores data in chunks of 64 megabytes each. Chunks are similar to clusters or sectors in a regular file system: the smallest unit of data that the system handles.
For example, to store 128 megabytes of data GFS will use 2 chunks. But what about files smaller than 64 megabytes, say 20 megabytes? Fortunately there are so few files that small that Google doesn’t optimize for them. Typical files occupy multiple chunks.
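The chunk arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name is made up, and it counts chunks only, assuming a small file still occupies one chunk entry (it says nothing about how much disk space that chunk really uses).

```python
import math

CHUNK_SIZE_MB = 64  # chunk size quoted in the text


def chunks_needed(file_size_mb: float) -> int:
    """Return how many fixed-size chunks a file of the given size spans."""
    if file_size_mb <= 0:
        return 0
    # Round up: a partly filled last chunk still counts as a chunk.
    return math.ceil(file_size_mb / CHUNK_SIZE_MB)


print(chunks_needed(128))  # 2, the example from the text
print(chunks_needed(20))   # 1
print(chunks_needed(130))  # 3
```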
     GFS consists of a master server and chunk servers. The master server holds metadata, such as file names, the location of files on the chunk servers, and their sizes. When a file is requested, the master server directs the request to the proper chunk server, where the file is accessed.
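To make the division of labor concrete, here is a toy model of a master that keeps only metadata and answers "which chunk servers hold this chunk?". It is a rough sketch, not Google's implementation: the class, the method names, the handle format and the round-robin placement are all invented for illustration. The three replicas per chunk follow the default mentioned earlier.

```python
class Master:
    """Toy metadata server: knows where chunks live, stores no file data."""

    def __init__(self, chunkservers, replicas=3):
        if len(chunkservers) < replicas:
            raise ValueError("need at least as many chunk servers as replicas")
        self.servers = list(chunkservers)
        self.replicas = replicas
        self.files = {}      # file name -> list of chunk handles
        self.locations = {}  # chunk handle -> chunk servers holding a copy
        self._next = 0       # rotating start position for placement

    def create(self, name, num_chunks):
        """Register a file and place each chunk on `replicas` distinct servers."""
        handles = []
        for i in range(num_chunks):
            handle = f"{name}#{i}"  # hypothetical handle format
            start = self._next % len(self.servers)
            self.locations[handle] = [
                self.servers[(start + k) % len(self.servers)]
                for k in range(self.replicas)
            ]
            self._next += 1
            handles.append(handle)
        self.files[name] = handles

    def lookup(self, name, chunk_index, down=()):
        """Return the chunk handle and the replicas that are still reachable."""
        handle = self.files[name][chunk_index]
        live = [s for s in self.locations[handle] if s not in down]
        if not live:
            raise LookupError(f"no live replica for {handle}")
        return handle, live


master = Master(["cs1", "cs2", "cs3", "cs4"])
master.create("crawl.dat", num_chunks=2)
print(master.lookup("crawl.dat", 0))                # all three replicas
print(master.lookup("crawl.dat", 0, down={"cs1"}))  # two replicas remain
```

The client would then read the data directly from one of the returned chunk servers, so the master never becomes a bottleneck for file contents.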

III. Google PageRank™ system.

     The source of Google’s success is its page ranking system, Google PageRank™, named after Larry Page. It is a mathematical algorithm that calculates the importance of a website from many on-site and off-site factors.
     Basically, a website with more inbound links will appear higher on the search engine results page (SERP) than the same website with fewer sites linking to it. The idea is taken from the world of science, where the importance of a publication is measured by the number of times it is cited in journals. Google treats a link to a website as a kind of vote for that site and gives it some significance. But the weight and quality of the linking website also count. A government website is considered more important than a personal home page, because a government has the power and the right to publish official announcements, changes to the law, new tax systems and so on. A personal home page can only refer and link to that information.
     That’s why a link from a site with greater credibility and importance weighs more than several or even dozens of links from unknown pages with little or no importance. Using this system to prioritize sites on the internet seems logical.
     Google rates websites on a scale from 0 to 10, where 0 is a site with no importance and 10 is a well-known, credible website with many visitors and high traffic (very few websites have a PageRank™ of 10).

     By adding new features to its algorithms, Google gives higher importance to websites that have been on the internet longer. Such sites should naturally have a number of inbound links from other websites and a steady number of visitors. To close the gap, new websites get higher link weights than old sites: new sites have few or no inbound links, and Google tries to even things out this way. New sites can therefore reach high listings in search engine results and a high PageRank™ much more easily in the early phases of their existence. With time they usually lose their high positions, and only then is it possible to determine what weight the website really has.
     It is not a good idea to buy a lot of links from high PageRank™ websites to newly created web pages. Google assumes that such pages don’t have content as rich and developed as sites that have existed longer. This is a common mistake in website optimization. If a link to a newly published website on a new domain is added to numerous web directories and websites with high PageRank™, the site may land in the sandbox. Do not confuse the sandbox with a ban, which is removal from the indexes. The sandbox means a lowered PageRank™ and a lower position in the SERP, but not removal from indexing. Such a site might still appear in the SERP, but it will lose its high positions for some or all keywords. The site will continue to be visited by Google web crawlers, so don’t worry. After a while (from several weeks to several months) it will leave the sandbox and be restored to the search engine indexes with full rights.
     Although Google Inc. holds the trademark rights to PageRank™, the patent rights belong to Stanford University, where Larry Page and Sergey Brin, the founders of Google, studied.

     Google itself describes PageRank™ as:
“PageRank relies on the uniquely democratic nature of the web by using its vast link structure as an indicator of an individual page's value. In essence, Google interprets a link from page A to page B as a vote, by page A, for page B. But, Google looks at considerably more than the sheer volume of votes, or links a page receives; for example, it also analyzes the page that casts the vote. Votes cast by pages that are themselves "important" weigh more heavily and help to make other pages "important." Using these and other factors, Google provides its views on pages' relative importance.”
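The "votes weighted by the importance of the voter" idea can be illustrated with the textbook iterative form of PageRank. This is a simplified sketch, not Google's production algorithm: the damping value 0.85 is the commonly cited default rather than something stated in this article, the page names are invented, and the raw scores it produces are probabilities, not the 0 to 10 scale described above.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by repeated redistribution of rank along links.

    `links` maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page keeps a small baseline share regardless of links.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its "vote" evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its rank evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical mini-web: two small blogs and a home page link to a .gov site.
links = {
    "gov": ["home"],
    "home": ["gov"],
    "blog1": ["gov"],
    "blog2": ["gov"],
}
for page, score in sorted(pagerank(links).items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.3f}")
```

In this tiny example the home page ends up far ahead of the two blogs although it has only one inbound link, because that single link comes from the most important page.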

IV. Google web crawlers.

     There are different types of search engines that use different methods of searching Internet resources.
     Robots read only the links located on websites already found, and from them build a tree-like link hierarchy. Spiders read the whole content of a website: the title, links, the document text and the information inside meta tags. There are also metacrawlers, which include Meta Search and Smarter Meta Search engines. A Meta Search engine sends the search term entered by the user to the indexes of several search engines simultaneously and returns results based on what they find. Such engines don’t have their own index servers and do not actively search the internet; they use the existing indexes of other search engines. Smarter Meta Search engines use the same method with one difference: they apply linguistic and collective analysis to produce even more accurate search results than Meta Search engines.
     Search engines do not index everything that is available on the internet, though. They do not index:
· Multimedia files - mp3, mpeg, avi, jpg, gif, png
· Documents that require a password, e.g. mailboxes or an intranet (a company’s internal network)
· Sites that have been excluded from indexing by their authors using a robots.txt file or a robots meta tag
     All the websites that exist but are excluded from search engine indexes for various reasons form the so-called “invisible network” or “invisible internet”. This invisible internet is three times bigger than all the sites that make up the visible internet, because companies and government institutions understandably do not want to share their data. That’s why they exclude their content from indexes or hide it inside an intranet.
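As an illustration of the robots.txt exclusion mentioned in the list above, the snippet below uses Python's standard urllib.robotparser module to check whether a crawler may fetch a URL. The robots.txt content, the bot name "ExampleBot" and the example.com URLs are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt that hides one directory from all crawlers.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(url: str, user_agent: str = "ExampleBot") -> bool:
    """Return True if the rules above let this crawler fetch the URL."""
    return parser.can_fetch(user_agent, url)


print(allowed("http://example.com/index.html"))           # True
print(allowed("http://example.com/private/report.html"))  # False
```

A well-behaved crawler performs this check before every request, which is how excluded pages end up in the "invisible internet".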

     Google calls its spider “Googlebot”. It is divided into Freshbot and Deepbot. The task of Freshbot is to find new, fresh content on a website, which is why it may visit the same site several times a day. Deepbot, on the other hand, is responsible for deep website crawling. Its main purpose is to build a full picture of your website’s content, navigation system and all the inbound and outbound links. If you see a change in search engine results, it means Deepbot was active.
     In Google the search process is divided into three parts. First, Googlebot crawls the internet in search of changes to existing websites and of new websites. It works like a standard web browser: it sends a request to a server for a specific page, then saves the page and sends it to the index servers. It can request thousands of different pages simultaneously, but it deliberately slows down to avoid crashing web servers or crowding out requests from real users to the same server. Googlebot finds sites in two ways: through submissions to the add URL form (www.google.com/addurl.html) or by following links while crawling the internet.
     When Google fetches a page, it collects all the links on it and adds them to a “visit soon” URL list. That way it can cover a wide area of the internet in a short time and make the search process faster. But this also causes problems. Google has to examine the “visit soon” URL list for duplicate URLs and delete them to avoid visiting the same site too often.
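A “visit soon” list with duplicate removal can be sketched as a queue plus a set of URLs already seen. This is a minimal illustration under simple assumptions: the class and method names are invented, and the only normalization applied is dropping the "#fragment" part of a URL, whereas a real crawler would do much more.

```python
from collections import deque
from urllib.parse import urldefrag


class VisitSoonList:
    """First-in, first-out list of URLs to crawl that ignores duplicates."""

    def __init__(self):
        self._queue = deque()
        self._seen = set()

    def add(self, url: str) -> bool:
        """Queue a URL; return False if it was already queued or visited."""
        url, _fragment = urldefrag(url.strip())  # "#top" does not change the page
        if url in self._seen:
            return False
        self._seen.add(url)
        self._queue.append(url)
        return True

    def next_url(self):
        """Return the next URL to visit, or None when the list is empty."""
        return self._queue.popleft() if self._queue else None


todo = VisitSoonList()
print(todo.add("http://example.com/a"))      # True
print(todo.add("http://example.com/a#top"))  # False, same page
print(todo.add("http://example.com/b"))      # True
print(todo.next_url())                       # http://example.com/a
```

Checking the set before queuing is cheaper than scanning the list for duplicates afterwards, which matters when the list holds millions of addresses.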
     To keep the indexes up to date, Google re-crawls sites on a regular basis. For a newspaper site or a highly visited portal it can be daily, for stock quotes much more frequently, and for other pages once or several times a month.

     Googlebot then sends the full text it finds on sites to the Google indexer, a set of indexing database servers. They store the text sorted alphabetically by term (a keyword or phrase). Attached to each term is a list of the documents in which it appears and its position within each, so it is easy to locate the right document for a given user query. To eliminate unimportant words and make the search process faster, the Google indexer ignores the most frequent words in each language. These words, called “stop words” (such as is, on, or, the, at, in, how, why), don’t make the search any more precise, so they can be ignored.
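The structure described above, a list of terms each pointing to the documents and positions where it occurs, is usually called an inverted index. Here is a small sketch of one; the stop word set is just the sample from the text, and the tokenizer (lowercase letters and digits only) is a simplifying assumption.

```python
import re
from collections import defaultdict

# The sample stop words listed in the text; real lists are much longer.
STOP_WORDS = {"is", "on", "or", "the", "at", "in", "how", "why"}


def build_index(documents):
    """Map each term to a list of (document id, word position) pairs."""
    index = defaultdict(list)
    for doc_id, text in documents.items():
        words = re.findall(r"[a-z0-9]+", text.lower())
        for position, word in enumerate(words):
            if word in STOP_WORDS:
                continue  # stop words are not indexed
            index[word].append((doc_id, position))
    # Sort alphabetically by term, as the text describes.
    return dict(sorted(index.items()))


docs = {1: "How the crawler works", 2: "The crawler is fast"}
for term, postings in build_index(docs).items():
    print(term, postings)
# crawler [(1, 2), (2, 1)]
# fast [(2, 3)]
# works [(1, 3)]
```

Answering a query then becomes a dictionary lookup instead of a scan through every stored page.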
     When a user enters a search term into Google, the request is sent to the indexing servers to find out whether the term exists in the database. At this point it is important to realize that, because of the amount of data Google holds, it would be too difficult to store all the information on one indexing server. That’s why Google uses many separate servers, each holding part of the data. The query is therefore sent to different servers simultaneously, and if the term exists in the database Google generates the 1,000 most relevant results based on more than 100 factors (such as PageRank™, meta tags, the age of a website, its traffic and many more).
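Sending one query to many index servers at once and merging their answers can be sketched as follows. Everything here is a stand-in: each "server" is just a Python dictionary, the single relevance score per document replaces the 100+ factors mentioned above, and the function names are invented.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def search_shard(shard, term):
    """Look up a term on one index server; returns (doc id, score) pairs."""
    return shard.get(term, [])


def search(shards, term, limit=1000):
    """Query all shards in parallel and keep the `limit` best-scoring hits."""
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        partial_results = pool.map(lambda shard: search_shard(shard, term), shards)
        hits = [hit for part in partial_results for hit in part]
    return heapq.nlargest(limit, hits, key=lambda hit: hit[1])


# Three hypothetical index servers, each holding part of the data.
shards = [
    {"google": [("d1", 0.9), ("d2", 0.2)]},
    {"google": [("d3", 0.5)]},
    {"linux": [("d4", 0.7)]},
]
print(search(shards, "google", limit=2))  # [('d1', 0.9), ('d3', 0.5)]
print(search(shards, "missing"))          # []
```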
     At the same time a special number, the Document-ID, is attached to each document. The Document-ID is then sent to file servers, where a title and description of the website are added based on its meta tags. If there are no meta tags, the title and description are generated automatically from the site’s content. Here, too, many file servers work simultaneously.
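The fallback described above (use the meta tags if present, otherwise derive the title and description from the page content) might look roughly like this with Python's standard html.parser module. The class name, the function name and the length limits of 60 and 160 characters are arbitrary choices for the sketch, not values used by Google.

```python
from html.parser import HTMLParser


class SnippetParser(HTMLParser):
    """Collects the <title>, the meta description and the visible text."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self.text_parts = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif data.strip():
            self.text_parts.append(data.strip())


def make_snippet(html, title_len=60, description_len=160):
    """Return (title, description), falling back to the page text."""
    parser = SnippetParser()
    parser.feed(html)
    parser.close()
    fallback = " ".join(parser.text_parts)
    title = parser.title.strip() or fallback[:title_len]
    description = parser.description.strip() or fallback[:description_len]
    return title, description


with_meta = ('<html><head><title>Pixel Ads</title>'
             '<meta name="description" content="Buy pixels."></head>'
             '<body><p>Welcome</p></body></html>')
without_meta = "<html><body><p>Welcome to the site</p></body></html>"
print(make_snippet(with_meta))     # ('Pixel Ads', 'Buy pixels.')
print(make_snippet(without_meta))  # ('Welcome to the site', 'Welcome to the site')
```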
     The last stage is adding advertisements matching the search term, taken from the ad servers, to the search results. The ad servers keep information about advertisers and campaigns, and they determine which adverts should be shown on the search results page. These ad servers are the main source of income for Google: they bring in 98% of the search engine’s overall revenue.
     All the information is put together and displayed in the user’s web browser as a dynamically generated page. And all of that is done in seconds.


References

I think you now have an idea of how big Google really is. If you still have doubts, visit this page:
http://en.wikipedia.org/wiki/List_of_Google_products
It’s a comprehensive list of all Google projects with short descriptions. I’m sure you will find many interesting things there.

Very good, more technical articles about the Google File System can be found here:
http://storagemojo.com/?page_id=152
http://www.baselinemag.com/article2/0,1397,1985047,00.asp
http://labs.google.com/papers/gfs-sosp2003.pdf (PDF file)

If you want to read more about how the Google PageRank™ algorithm works, visit http://en.wikipedia.org/wiki/PageRank, where you will find a great article about the technology that PageRank™ relies on and was built on.
Google’s full explanation of its ranking system can be found at this address:
http://www.google.com/technology/

For more information about how Google works, take a look at these websites:
http://www.googleguide.com/google_works.html
http://www.portfolio.com/interactive-features/2007/08/google
http://www.portfolio.com/culture-lifestyle/goods/gadgets/2007/08/13/How-Google-Works
http://en.wikipedia.org/wiki/Index_%28search_engine%29

And last, but not least: what does Google think about SEO, and what does it suggest?
http://www.google.com/support/webmasters/bin/answer.py?answer=35291
