Getting indexed by Google can be a pain, but ranking highly for specific keywords seems to be the nut that few web developers without SEO (search engine optimization) experience can crack.
Today we're going to give you a primer on the basics of search engine optimization -- many of these are techniques we use every day to optimize our websites and stay ahead of our competitors.
For example, let's talk about www.devedit.com -- DevEdit is our WYSIWYG HTML editing component that drops into browser-based applications.
The problem is, there are a LOT of WYSIWYG HTML editors, but how can we get DevEdit to appear in Google's top 10 rankings? Well, let's see. Trying to optimize for the keyword "HTML" alone would be a tough task, as it's too general. There are HTML editors, HTML tutorials, HTML articles, etc.
We need to be more specific, which means:
- Targeting a more suitable market that is looking for a content editing solution
- Competing with fewer websites targeting the same keywords
- Optimizing for keywords that people actually use when performing searches
For example, if you're optimizing for a web development site and you're located in Sydney, Australia, use keywords such as "web development Sydney" or "web development services Australia".
To find out how many websites are competing with your keywords -- either intentionally or not -- simply do a search on Google and note down how many results are returned. In our case, for "online html editor", we're competing with 9,080,000 sites. The more sites that are competing for your keywords, the harder it will be to get on the front page.
Alternatively, to get a rough indication of how many people are actually searching for the keywords you want to optimize your site for, use the Overture search suggestion tool. It's not exact, and doesn't measure Google searches, but it does give a very good estimate.
The Overture search suggestion tool will also provide you with a list of similar keywords, based on the keywords you enter. This can be a great way to find other keywords to optimize your site for.
As a rough guideline, try to optimize every page on your site for a different search phrase. Each search phrase should contain 2 to 3 highly targeted keywords.
A domain name that contains your keywords, such as http://www.web-development-sydney.com, will generally rank higher than http://www.companyname.com, assuming the two sites have identical keywords and page content.
For some of us, keywords in the domain name look too unprofessional, and we've already registered our domain, so it’s too late to change. An alternative -- and also a useful tactic -- is to add your keywords into the names of your pages, such as
http://www.companyname.com/web-development-services.html
Your title tag is as important as your domain name. Using keywords in your title tag can improve your Google ranking significantly. Balancing professionalism with keyword density in the title tag, however, is sometimes a little more difficult.
Going back to our example of a web development company earlier, a good title tag would be:
<title>"Company name provides professional affordable web development services in Sydney Australia."</title>
Usually, the closer to the front of your title tag the keywords are placed, the better.
The Google ranking algorithm dictates that if you're using a <h1> tag, then the text inside this tag must be more important than the content on the rest of the page. Here's a quick example:
<h1>Google sees this text as more important</h1>
<p>... than this text</p>
By default, H1 tags aren't the prettiest in terms of formatting, so using a CSS style to override the default look is usually a good idea:
H1 { color: blue; font-family: Verdana; font-size: 16px }
Sprinkling keywords throughout your page content can also improve your site's keyword density. Keyword density simply means the ratio of optimized keywords to the rest of the content on your page. It is usually expressed as a percentage, and should be between 7% and 10% for each page on your site.
Don't overdo the keyword density, but don't overlook it either. A good example would be:
Before:
- Company name provides web design and site management services to our clients.
After:
- Company name provides web development services to the Sydney region in Australia
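To make the keyword density idea concrete, here is a minimal Python sketch. It assumes density means the share of all words on the page that are target keywords, counts single words only, and uses a hypothetical keyword list; it is an illustration, not Google's own formula.

```python
import re
from typing import Iterable


def keyword_density(text: str, keywords: Iterable[str]) -> float:
    """Return the percentage of words in `text` that are target keywords."""
    # Assumption: a "word" is a run of letters, digits or apostrophes.
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    targets = {keyword.lower() for keyword in keywords}
    hits = sum(1 for word in words if word in targets)
    return 100.0 * hits / len(words)


# The two sentences from the example above; the keyword list is hypothetical.
before = "Company name provides web design and site management services to our clients."
after = "Company name provides web development services to the Sydney region in Australia"
keywords = ["web", "development", "sydney", "australia"]

density_before = keyword_density(before, keywords)
density_after = keyword_density(after, keywords)
```

With this keyword list the second sentence scores higher than the first, because four of its twelve words are targets rather than one. A single sentence is far too short to compare against the 7% to 10% guideline, which applies to a whole page.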
And this leads us to the toughest part of the Google SEO process -- back-links. Back links are links from other websites directly to yours. The general principle is that the more back links you have, the higher your pages will be ranked, as your website must be good if so many other sites are linking back to it.
If you run a web development company, then adding a simple link to the bottom of each of your clients' websites, such as:
<a href="http://www.yoursite.com">Web development by Company Name</a>
(with your clients' permission, of course) can help boost your back links, which will help boost your ranking position in searches.
Submitting your site to dmoz.org, Yahoo! and other directories is also an important step in increasing the number of sites linking back to yours. Do remember, however, that setting up back links takes time. I would recommend emailing 5-10 websites every day to request back-links or partnership links, keeping in mind that the sites contacted should be relevant but not competitive. For example, if you sell chocolate, partnering with a company that sells roses may be a good idea. Within a couple of weeks, you should have a good 100 or so sites happily linking back to yours!
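Keywords: "url rewrite" mod_rewrite isapi rewrite path_info iis "search engine friendly"
Summary: Admittedly, rewriting dynamic page links into static-looking links is the safest and most stable way to optimize for search engines.
In addition, as content on the Internet grows at an astonishing rate, search engines become ever more important. If a site wants to be well covered by search engines, a Search Engine Friendly design matters just as much as a User Friendly one. The more of a site's pages make it into a search engine, the greater the chance that users will find it through different keywords. An article surveying Google's algorithm mentions that the number of pages Google has indexed for a site also has some influence on PageRank. Because Google emphasizes the relatively static part of the web (its index of dynamic pages is comparatively small), static pages with relatively fixed link addresses are better suited to being indexed by Google (no wonder the mailing-list archives of many large sites and date-based blog archives are so easy to find). For this reason, many articles on search-engine-oriented URL design (URI Pretty) describe mechanisms for making dynamic page parameters look like static pages:
For example, you can turn:
http://phpunixman.sourceforge.net/index.php?mode=man&parameter=ls
into: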
http://phpunixman.sourceforge.net/index.php/man/ls
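There are two main ways to implement this:
Passing the URI as parameters: URL REWRITE
The simplest approach is URL conversion based on the URL rewriting (Rewrite) modules available in various web servers:
This way, with almost no changes to the application, a link such as news.asp?id=234 can be mapped to news/234.html, which looks just like a static link from the outside. Apache has a module for this (not enabled by default), mod_rewrite; URL REWRITE is powerful enough to fill a book.
When I need to map news.asp?id=234 to news/234.html, all I have to set is: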
RewriteRule /news/(\d+)\.html /news\.asp\?id=$1 [N,I]
This maps a request such as /news/234.html to /news.asp?id=234: when a request for /news/234.html arrives, the web server forwards the actual request to /news.asp?id=234.
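As a rough illustration of what such a rule does, the mapping can be sketched in Python with the standard `re` module. The function name `rewrite_news` is hypothetical, and the sketch assumes the I flag in the rule means case-insensitive matching; it only mimics the pattern substitution, not the web server's request handling.

```python
import re

# Mirrors the pattern /news/(\d+)\.html from the rule above.
# Assumption: the I flag stands for case-insensitive matching.
NEWS_RULE = re.compile(r"/news/(\d+)\.html", re.IGNORECASE)


def rewrite_news(path: str) -> str:
    """Map /news/<digits>.html to /news.asp?id=<digits>; leave other paths alone."""
    match = NEWS_RULE.fullmatch(path)
    if match is None:
        # Paths that do not fit the pattern are passed through unchanged.
        return path
    return f"/news.asp?id={match.group(1)}"


rewritten = rewrite_news("/news/234.html")
untouched = rewrite_news("/news/abc.html")
```

Because the pattern only accepts digits, a request such as /news/abc.html never reaches the dynamic page with a malformed id.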
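IIS also has corresponding rewrite modules, such as ISAPI REWRITE and IIS REWRITE. Their syntax is based on regular expressions as well, so the configuration is almost identical to that of Apache's mod_rewrite:
For a simple application, the rule could be: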
RewriteRule /news/(\d+)\.html /news/news\.php\?id=$1 [N,I]
This maps http://www.chedong.com/news/234.html to http://www.chedong.com/news/news.php?id=234
A more general expression can map the parameters of every dynamic page, so that http://www.myhost.com/foo.php?a=A&b=B&c=C
is presented as http://www.myhost.com/foo.php/a/A/b/B/c/C:
RewriteRule (.*?\.php)(\?[^/]*)?/([^/]*)/([^/]*)(.+?)? $1(?2$2&:\?)$3=$4?5$5: [N,I]
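The effect of this general rule, turning name/value path segments back into a query string, can be sketched in Python. The function `path_pairs_to_query` is a hypothetical helper, not part of any rewrite module; it assumes plain, unencoded path segments and ignores a trailing segment that has no value.

```python
from urllib.parse import urlencode


def path_pairs_to_query(url_path: str, script_suffix: str = ".php") -> str:
    """Turn /foo.php/a/A/b/B into /foo.php?a=A&b=B."""
    head, separator, tail = url_path.partition(script_suffix + "/")
    if not separator:
        # No extra path after the script name: nothing to rewrite.
        return url_path
    segments = [segment for segment in tail.split("/") if segment]
    # Pair segments as (name, value); an unpaired last segment is dropped.
    pairs = list(zip(segments[0::2], segments[1::2]))
    if not pairs:
        return head + script_suffix
    return f"{head}{script_suffix}?{urlencode(pairs)}"


dynamic_url = path_pairs_to_query("/foo.php/a/A/b/B/c/C")
```

The sketch processes all pairs in one pass, whereas the rule above appears to convert one pair per iteration and repeat.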
Below is a sample Apache mod_rewrite configuration for phpBB:
RewriteEngine On
RewriteRule /forum/topic_(.+)\.html$ /forum/viewtopic.php?t=$1 [L]
RewriteRule /forum/forum_(.+)\.html$ /forum/viewforum.php?f=$1 [L]
RewriteRule /forum/user_(.+)\.html$ /forum/profile.php?mode=viewprofile&u=$1 [L]
With these settings, the original dynamic pages can be reached through links such as topic_1234.html, forum_2.html, and user_34.html.
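mod_rewrite and ISAPI Rewrite are largely compatible, but there are some differences. For example, ISAPI Rewrite requires "?" to be escaped as "\?" while mod_rewrite does not, and ISAPI Rewrite supports "\d+" (digits only) while mod_rewrite in Apache 1.3, which uses POSIX regular expressions, does not.
URL REWRITE has some further benefits:
- Hiding the back-end implementation: this is very useful when migrating the back-end platform. When moving from ASP to Java, front-end users do not notice any change in the back-end application;
- Simplifying data validation: a parameter pattern such as (\d+) effectively constrains the format, and even the number of digits, of a numeric value;
For example, when we need to migrate the application from news.asp?id=234 to news.php?query=234, the front end can stay as news/234.html throughout. Separating the implementation from its front-end presentation keeps URLs stable, and mod_rewrite can even forward requests to other back-end servers.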
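URL beautification based on PATH_INFO
Another way to beautify URLs is based on PATH_INFO:
PATH_INFO is part of the CGI 1.1 standard. The "/value_1/value_2" you often see following a CGI script is the PATH_INFO parameter:
For example, in http://phpunixman.sourceforge.net/index.php/man/ls: $PATH_INFO = "/man/ls"
Because PATH_INFO is a CGI standard, PHP, Servlets, and others all support it. Servlets, for instance, have the request.getPathInfo() method.
Note: for /myapp/servlet/Hello/foo, getPathInfo() returns /foo, whereas for /myapp/dir/hello.jsp/foo, getPathInfo() returns /hello.jsp; from this you can also see that a JSP is really just a PATH_INFO parameter of a Servlet. ASP does not support PATH_INFO.
An example of PATH_INFO-based parameter parsing in PHP:
// Note: parameters are split on "/" and the first element is empty; this extracts the two parameters $param1 and $param2 from /param1/param2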
if ( isset($_SERVER["PATH_INFO"]) ) {
list($nothing, $param1, $param2) = explode('/', $_SERVER["PATH_INFO"]);
}
How to hide the application, for example the .php extension:
Configure Apache like this:
<FilesMatch "^app_name$">
ForceType application/x-httpd-php
</FilesMatch>
How to look more like a static page: app_name/my/app.html
When parsing the PATH_INFO parameters, simply cut the final five characters, ".html", off the last parameter.
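The same parsing, including the ".html" truncation, can be sketched in Python for comparison with the PHP snippet. The function name `parse_path_info` is hypothetical, and the input is assumed to be the raw PATH_INFO string as the server provides it.

```python
def parse_path_info(path_info: str, suffix: str = ".html") -> list:
    """Split a PATH_INFO string such as "/my/app.html" into ["my", "app"]."""
    # The leading "/" produces an empty first element, so drop empty segments.
    params = [segment for segment in path_info.split("/") if segment]
    # Cut the static-looking suffix off the last parameter, if present.
    if params and params[-1].endswith(suffix):
        params[-1] = params[-1][: -len(suffix)]
    return params


plain_params = parse_path_info("/man/ls")
static_looking_params = parse_path_info("/my/app.html")
```

Returning a list rather than a fixed pair of variables avoids errors when a request carries fewer segments than expected.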
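Note: Apache 2 does not allow PATH_INFO by default; you need to set AcceptPathInfo on
PATH_INFO is often the only option, especially for users on virtual hosting who have no permission to install and configure mod_rewrite.
OK, so from now on when you see a page like http://www.example.com/article/234 you will know it may be a dynamic page generated by the PHP program article/show.php?id=234. Many sites appear to have lots of static directories, but they may well publish all their content with just one or two programs. Many WikiWiki systems use this mechanism: the whole system is one simple wiki program, and what look like directories are really query results that the application produces by taking the trailing part of the address as parameters.
Retrofitting an existing dynamic publishing system with a solution based on mod_rewrite/PATH_INFO plus a cache server can also greatly reduce the cost of upgrading a legacy system to a new content management system, and it makes the content easier for search engines to index.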
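Appendix: how to support PATH_INFO with PHP on IIS
Installation notes for PHP in ISAPI mode: only php-4.2.3-Win32 was tested successfully
Unpack directory
================
php-4.2.3-Win32.zip c:\php
PHP.INI initialization file
===========================
Copy c:\php\php.ini-dist to c:\winnt\php.ini
Configure file associations
===========================
Configure the file associations as described in install.txt
Runtime library files
=====================
Copy c:\php\php4ts.dll to c:\winnt\system32\php4ts.dll
After running with this setup, you will find that PHP maps PATH_INFO onto the physical path: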
Warning: Unknown(C:\CheDong\Downloads\ariadne\www\test.php\path): failed to create stream: No such file or directory in Unknown on line 0
Warning: Unknown(): Failed opening 'C:\CheDong\Downloads\ariadne\www\test.php\path' for inclusion (include_path='.;c:\php4\pear') in Unknown on line 0
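Install the ariadne patch
=========================
Stop the IIS service:
net stop iisadmin
Download ftp://ftp.muze.nl/pub/ariadne/win/iis/php-4.2.3/php4isapi.dll
and use it to overwrite the original c:\php\sapi\php4isapi.dll
Notes:
ariadne is a content publishing system based on PATH_INFO.
PATH_INFO in CGI mode has been fixed in PHP 4.3.2 RC2; a normal installation is enough.
References:
URL Rewrite documentation:
ISAPI REWRITE documentation
ISAPI REWRITE download for IIS (free)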
http://httpd.apache.org/docs/mod/mod_rewrite.html
http://httpd.apache.org/docs-2.0/mod/mod_rewrite.html
Search-engine-friendly URL design
http://www.sitepoint.com/article/485
Perhaps this URL was originally article.php?id=485
An open-source content management system based on PATH_INFO
http://typo3.com/
An explanation of Google's PageRank algorithm:
http://pr.efactory.de/
Original source: <a href="http://www.chedong.com/tech/google_url.html">http://www.chedong.com/tech/google_url.html</a>
IIS Rewrite
Product Summary
IISRewrite is a rule-based rewriting engine that allows a webmaster to manipulate URLs on the fly in IIS.
URLs are rewritten before IIS has handed over the request to be processed, so requests for HTML files, graphics, program files, and even entire directory structures can be rewritten before they are passed to ASP scripts for processing.
IISRewrite was written to solve some practical problems that are nearly impossible to solve with IIS and ASP. It solves the compatibility issues when doing dynamic downloads with ASP, it allows portions of dynamic sites to be indexed by search engines as if they were static HTML files, and it can provide a way to customize web sites based on the client's browser type without the use of JavaScript.
IISRewrite is a stripped-down implementation of Apache's mod_rewrite module for IIS. Webmasters who have used Apache's mod_rewrite in the past will find that much of the configuration and functionality is the same.
IISRewrite is compatible with Microsoft's ISAPI specification and has been tested on Windows NT Server 4.0 running IIS 4 and Windows 2000 Server running IIS 5.
IISRewrite was featured in the February 2002 edition of Microsoft's MSDN Magazine.
On today’s Internet, database driven or dynamic sites are very popular. Unfortunately the easiest way to pass information between your pages is with a query string. In case you don’t know what a query string is, it's a string of information tacked onto the end of a URL after a question mark.
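For readers who have not met query strings before, this short Python sketch pulls one apart with the standard `urllib.parse` module. The URL is a made-up example.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical URL of a database-driven page.
url = "http://www.example.com/news.asp?id=234&lang=en"

# Everything after the question mark is the query string.
query_string = urlsplit(url).query

# parse_qs turns it into a dict that maps each name to a list of values.
params = parse_qs(query_string)
```

Here `query_string` is `id=234&lang=en`, and `params` maps `id` to `["234"]` and `lang` to `["en"]`.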
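A robots.txt file restricts access to your site by the search engine robots (bots) that crawl the web. These bots are automated, and before they access a page they check whether a robots.txt file exists that blocks them from certain pages.
How do I create a robots.txt file?
You can create this file in any text editor. It should be an ASCII-encoded text file, not an HTML file. The filename should be lowercase.
Syntax
The simplest robots.txt file uses two rules:
- User-Agent: the robot the following rule applies to
- Disallow: the pages you want to block
These two lines are considered a single entry in the file. You can include as many entries as you want. You can include multiple Disallow lines and multiple User-Agents in one entry.
What should be listed on the User-Agent line?
A user-agent is a specific search engine robot. The Web Robots Database lists many common bots. You can set an entry to apply to a specific bot (by listing its name) or to all bots (by listing an asterisk). An entry that applies to all bots looks like this:
User-Agent: *
Google uses several different bots (user-agents). The bot used for web search is Googlebot. Other bots such as Googlebot-Mobile and Googlebot-Image follow the rules you set up for Googlebot, but you can also set up additional rules for these specific bots.
What should be listed on the Disallow line?
The Disallow line lists the pages you want to block. You can list a specific URL or a URL pattern. The entry should begin with a forward slash (/).
- To block the entire site, use a forward slash.
Disallow: /
- To block a directory and everything in it, follow the directory name with a forward slash.
Disallow: /private_directory/
- To block a page, list the page.
Disallow: /private_file.html
URLs are case-sensitive. For instance, Disallow: /private_file.html would block http://www.example.com/private_file.html, but would allow http://www.example.com/Private_File.html.
You need a robots.txt file only if your site includes content that you don't want search engines to index. If you want search engines to index everything on your site, you don't need a robots.txt file (not even an empty one).
Example: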
------------------------------------------------------------------------------------------------------------------------------------------
#
# robots.txt for NetMao Movie
# Version 2.0.x
#
User-agent: *
Disallow: /admin/
Disallow: /inc/
Disallow: /html/
Disallow: /templates/
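One way to check how a file like this is interpreted is Python's standard `urllib.robotparser` module. The sketch below feeds it the rules from the example; the host www.example.com and the helper name `allowed` are assumptions for illustration, and real crawlers may interpret edge cases differently.

```python
from urllib.robotparser import RobotFileParser

# The rules from the example above, without the comment header.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /inc/
Disallow: /html/
Disallow: /templates/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(url: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may fetch `url` under the rules above."""
    return parser.can_fetch(agent, url)


admin_ok = allowed("http://www.example.com/admin/login.php")
home_ok = allowed("http://www.example.com/index.html")
# Paths are compared case-sensitively, so /Admin/ is not covered by /admin/.
mixed_case_ok = allowed("http://www.example.com/Admin/login.php")
```

In this sketch the admin page is blocked, while the home page and the differently capitalized path are allowed.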
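Webmaster Help Center - Why doesn't my site show up for specific keywords?
We do not manually assign keywords to sites, and webmasters cannot submit preferred keywords for which their site should appear.
The best way to make sure your site returns for your chosen keywords is to include those keywords in your pages. Our crawler analyzes the content of pages in the index to determine the search queries they are most relevant to. If you create an informative site that describes your topic clearly and accurately, your site is likely to return for your chosen keywords.
Keep in mind that your site may return for your chosen keywords without appearing in the first few pages of results. To check whether this is the case, search your site using our site: operator together with the keyword, for example [ site:Google.com doodle]. This search returns the following results: http://www.google.com/search?hl=en&q=site%3AGoogle.com+doodle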
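The search URL above is simply the site: query with URL encoding applied. A small Python sketch of building it follows; the function name `site_search_url` is hypothetical.

```python
from urllib.parse import urlencode


def site_search_url(site: str, keywords: str) -> str:
    """Build a Google search URL that restricts results to one site."""
    query = f"site:{site} {keywords}"
    # urlencode escapes ":" as %3A and turns the space into "+".
    return "http://www.google.com/search?" + urlencode({"hl": "en", "q": query})


doodle_url = site_search_url("Google.com", "doodle")
```

For the inputs shown, `doodle_url` matches the example URL exactly.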
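In general, webmasters can improve their site's ranking by increasing the number of high-quality sites that link to their pages. You can find more about how Google ranks pages at http://www.google.com/technology/index.html. We also always recommend reviewing our guidelines on creating and maintaining a Google-friendly site.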