
Client-Server overview

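Now that you know the purpose and potential benefits of server-side programming, we're going to look in detail at what happens when a server receives a "dynamic request" from a browser. Because most server-side code handles requests and responses in much the same way, this will help you understand what you need to do when writing most of your own code.

Prerequisites: Basic computer literacy; a basic understanding of what a web server is.
Objective: To understand client-server interactions in a dynamic website, and in particular what operations the server-side code needs to perform.

The discussion up to this point contains no real code, because we haven't yet chosen a web framework to write it in! It is still very relevant, though, because the behavior described must be implemented by your server-side code, whichever programming language and web framework you choose.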

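Web servers and HTTP (a primer)

Web browsers communicate with web servers using the HyperText Transfer Protocol (HTTP). When you click a link on a web page, submit a form, or run a search, the browser sends an HTTP request to the server.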

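The request includes:

  • A URL identifying the target server and resource (e.g. an HTML file, a particular data point on the server, or a tool to run).
  • A method that defines the required action (for example, to get a file or to upload some data). The different methods/verbs and their associated actions are listed below:
    • GET: Get a specific resource (e.g. an HTML file containing information about a product, or a list of products).
    • POST: Create a new resource (e.g. add a new article to a wiki, add a new record to a database).
    • HEAD: Get the metadata about a specific resource without getting the body as GET would. You might, for example, use a HEAD request to find out the last time a resource was updated, and then only use the (more "expensive") GET request to download the resource if it has changed.
    • PUT: Update an existing resource (or create a new one if it doesn't exist).
    • DELETE: Delete the specified resource.
    • TRACE, OPTIONS, CONNECT, PATCH: These verbs are for less common tasks, so we won't cover them here.
  • Additional information can be encoded with the request (for example, HTML form data). Information can be encoded as:
    • URL parameters: GET requests encode data in the URL sent to the server by adding name/value pairs onto the end of it, for example http://mysite.com?name=Fred&age=11. A question mark (?) always separates the rest of the URL from the URL parameters, an equals sign (=) separates each name from its associated value, and an ampersand (&) separates each pair. URL parameters are inherently "insecure" because users can change them and then resubmit. As a result, URL parameters/GET requests are not used for requests that update data on the server.
    • POST data: POST requests add new resources, the data for which is encoded within the request body.
    • Client-side cookies: Cookies contain session data about the client, which the server can use to determine the user's login status and permissions to access resources.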
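As a rough illustration of the URL parameter format described in the list above, the following Python sketch uses the standard library's urllib.parse module to take the example URL apart and to build a query string from name/value pairs. The variable names are chosen only for this example.

```python
from urllib.parse import urlsplit, parse_qs, urlencode

# The example URL from the list above.
url = "http://mysite.com?name=Fred&age=11"

parts = urlsplit(url)
# Everything after "?" is the query string: "name=Fred&age=11".
params = parse_qs(parts.query)

# parse_qs maps each name to a list, because a name may appear more than once.
name = params["name"][0]
age = int(params["age"][0])

# Going the other way: encode name/value pairs into a query string.
query = urlencode({"name": "Fred", "age": 11})
```

Note that every value arrives as text; the server-side code decides how to convert and validate it.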

Web servers wait for client request messages, process them when they arrive, and reply to the web browser with an HTTP response message. The response contains an HTTP Response status code indicating whether or not the request succeeded (e.g. "200 OK" for success, "404 Not Found" if the resource cannot be found, "403 Forbidden" if the user isn't authorised to see the resource, etc.). The body of a successful response to a GET request would contain the requested resource.
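The status codes mentioned above are available in Python's standard library as the http.HTTPStatus enumeration. The sketch below looks up the reason phrase for a code; the describe() helper and its "success" wording are made up for this illustration.

```python
from http import HTTPStatus

def describe(code: int) -> str:
    """Return text such as '404 Not Found (not a success)' for a status code."""
    status = HTTPStatus(code)
    # Codes in the 2xx range indicate that the request succeeded.
    outcome = "success" if 200 <= code < 300 else "not a success"
    return f"{code} {status.phrase} ({outcome})"

for code in (200, 403, 404):
    print(describe(code))
```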

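When an HTML page is returned, it is rendered by the web browser. As part of processing, the browser may discover links to other resources (e.g. an HTML page usually references JavaScript and CSS files), and will send separate HTTP requests to download these files.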
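To give a feel for how a browser might discover those extra resources, here is a minimal sketch using Python's html.parser module. It only looks at stylesheet links and script tags, and the sample page and the ResourceFinder class are invented for this example; a real browser does far more.

```python
from html.parser import HTMLParser

class ResourceFinder(HTMLParser):
    """Collect URLs of stylesheets and scripts referenced by a page."""

    def __init__(self):
        super().__init__()
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and attrs.get("rel") == "stylesheet" and "href" in attrs:
            self.resources.append(attrs["href"])
        elif tag == "script" and "src" in attrs:
            self.resources.append(attrs["src"])

# A hypothetical page, made up for this example.
page = """<html><head>
<link rel="stylesheet" href="/static/site.css">
<script src="/static/app.js"></script>
</head><body><p>Hello</p></body></html>"""

finder = ResourceFinder()
finder.feed(page)
# Each collected URL would be fetched with its own HTTP GET request.
```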

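Both static and dynamic websites (discussed in the following sections) use exactly the same communication protocol/patterns.

GET request/response example

You can make a simple GET request by clicking on a link or searching on a site (like a search engine homepage). For example, the HTTP request that is sent when you perform a search on MDN for the term "client server overview" will look a lot like the text shown below (it will not be identical, because parts of the message depend on your browser and setup).

The format of HTTP messages is defined in a "web standard" (RFC 7230). You don't need to know this level of detail, but at least now you know where this all came from!

The request

Each line of the request contains information about it. The first part is called the header, and contains useful information about the request, in the same way that an HTML head contains useful information about an HTML document (but not the actual content itself, which is in the body):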

GET https://developer.mozilla.org/en-US/search?q=client+server+overview&topic=apps&topic=html&topic=css&topic=js&topic=api&topic=webdev HTTP/1.1
Host: developer.mozilla.org
Connection: keep-alive
Pragma: no-cache
Cache-Control: no-cache
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Referer: https://developer.mozilla.org/en-US/
Accept-Encoding: gzip, deflate, sdch, br
Accept-Charset: ISO-8859-1,UTF-8;q=0.7,*;q=0.7
Accept-Language: en-US,en;q=0.8,es;q=0.6
Cookie: sessionid=6ynxs23n521lu21b1t136rhbv7ezngie; csrftoken=zIPUJsAZv6pcgCBJSCj1zU6pQZbfMUAT; dwf_section_edit=False; dwf_sg_task_completion=False; _gat=1; _ga=GA1.2.1688886003.1471911953; ffo=true

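The first and second lines contain most of the information we talked about above:

  • The type of request (GET).
  • The target resource URL (/en-US/search).
  • The URL parameters (q=client+server+overview&topic=apps&topic=html&topic=css&topic=js&topic=api&topic=webdev).
  • The target/host website (developer.mozilla.org).
  • The end of the first line also includes a short string identifying the specific protocol version (HTTP/1.1).

The final line contains information about the client-side cookies. You can see in this case that the cookie includes an ID for managing sessions (Cookie: sessionid=6ynxs23n521lu21b1t136rhbv7ezngie; ...).

The remaining lines contain information about the browser used and the sort of responses it can handle. For example, you can see here that:

  • My browser (User-Agent) is Chrome 52 on Windows 10 (Chrome/52.0.2743.116). The leading Mozilla/5.0 token is sent by most browsers for historical compatibility and does not by itself identify Firefox.
  • It can accept gzip compressed information (Accept-Encoding: gzip).
  • It can accept the specified set of characters (Accept-Charset: ISO-8859-1,UTF-8;q=0.7,*;q=0.7) and languages (Accept-Language: en-US,en;q=0.8,es;q=0.6).
  • The Referer line indicates the address of the web page that contained the link to this resource (i.e. the origin of the request, https://developer.mozilla.org/en-US/).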

HTTP requests can also have a body, but it is empty in this case.
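The structure described above (a request line, header lines, a blank line, then an optional body) can be sketched in a few lines of Python. The parse_request() helper below is a simplified illustration written for this article, not a complete HTTP parser, and the raw request is a shortened copy of the one shown earlier.

```python
from urllib.parse import urlsplit, parse_qs

# A shortened copy of the request shown above (only a few headers kept).
raw_request = (
    "GET https://developer.mozilla.org/en-US/search"
    "?q=client+server+overview&topic=apps&topic=html HTTP/1.1\r\n"
    "Host: developer.mozilla.org\r\n"
    "Accept-Encoding: gzip, deflate, sdch, br\r\n"
    "Cookie: sessionid=6ynxs23n521lu21b1t136rhbv7ezngie; ffo=true\r\n"
    "\r\n"
)

def parse_request(raw):
    """Split a raw HTTP/1.1 request into method, target, version, headers, body."""
    # A blank line separates the header section from the body.
    head, _, body = raw.partition("\r\n\r\n")
    request_line, *header_lines = head.split("\r\n")
    method, target, version = request_line.split(" ")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lower case.
        headers[name.strip().lower()] = value.strip()
    return method, target, version, headers, body

method, target, version, headers, body = parse_request(raw_request)
url = urlsplit(target)
params = parse_qs(url.query)  # "+" in a query string decodes to a space
```

A web framework normally does this parsing for you and hands your code an object with the method, path, parameters, headers, and cookies already separated.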

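The response

The first part of the response for this request is shown below. The header contains information like the following:

  • The first line includes the response code 200 OK, which tells us that the request succeeded.
  • We can see that the response is text/html formatted (Content-Type).
  • We can also see that it uses the UTF-8 character set (Content-Type: text/html; charset=utf-8).
  • The header also tells us how big it is (Content-Length: 41823).

At the end of the message we see the body content, which contains the actual HTML returned by the request.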

HTTP/1.1 200 OK
Server: Apache
X-Backend-Server: developer1.webapp.scl3.mozilla.com
Vary: Accept,Cookie, Accept-Encoding
Content-Type: text/html; charset=utf-8
Date: Wed, 07 Sep 2016 00:11:31 GMT
Keep-Alive: timeout=5, max=999
Connection: Keep-Alive
X-Frame-Options: DENY
Allow: GET
X-Cache-Info: caching
Content-Length: 41823

<!DOCTYPE html>
<html lang="en-US" dir="ltr" class="redesign no-js"  data-ffo-opensanslight=false data-ffo-opensans=false >
<head prefix="og: http://ogp.me/ns#">
  <meta charset="utf-8">
  <meta http-equiv="X-UA-Compatible" content="IE=Edge">
  <script>(function(d) { d.className = d.className.replace(/\bno-js/, ''); })(document.documentElement);</script>
  ...

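The remainder of the response header includes information about the response (e.g. when it was generated), about the server, and about how the server expects the browser to handle the page (e.g. the X-Frame-Options: DENY line tells the browser not to allow this page to be embedded in an <iframe> in another site).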
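As a sketch of how a client might read these values, the Python snippet below splits off the status line and then parses the header fields with the standard library's email parser, which understands the same "Name: value" layout. The response text is a shortened copy of the one above.

```python
from email.parser import Parser

# A shortened copy of the response above; a blank line separates
# the header from the body.
raw_response = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html; charset=utf-8\r\n"
    "X-Frame-Options: DENY\r\n"
    "Content-Length: 41823\r\n"
    "\r\n"
    "<!DOCTYPE html>..."
)

# The first line carries the protocol version, status code, and reason phrase.
status_line, _, rest = raw_response.partition("\r\n")
version, code, reason = status_line.split(" ", 2)

# Parse the remaining header fields.
message = Parser().parsestr(rest)
content_type = message.get_content_type()    # media type without parameters
charset = message.get_content_charset()      # the charset parameter, if any
length = int(message["Content-Length"])
```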

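POST request/response example

An HTTP POST is made when you submit a form containing information to be saved on the server.

The request

The text below shows the HTTP request made when a user submits new profile details on this site. The format of the request is almost the same as the GET request example shown previously, though the first line identifies this request as a POST.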

POST https://developer.mozilla.org/en-US/profiles/hamishwillee/edit HTTP/1.1
Host: developer.mozilla.org
Connection: keep-alive
Content-Length: 432
Pragma: no-cache
Cache-Control: no-cache
Origin: https://developer.mozilla.org
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36
Content-Type: application/x-www-form-urlencoded
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Referer: https://developer.mozilla.org/en-US/profiles/hamishwillee/edit
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.8,es;q=0.6
Cookie: sessionid=6ynxs23n521lu21b1t136rhbv7ezngie; _gat=1; csrftoken=zIPUJsAZv6pcgCBJSCj1zU6pQZbfMUAT; dwf_section_edit=False; dwf_sg_task_completion=False; _ga=GA1.2.1688886003.1471911953; ffo=true

csrfmiddlewaretoken=zIPUJsAZv6pcgCBJSCj1zU6pQZbfMUAT&user-username=hamishwillee&user-fullname=Hamish+Willee&user-title=&user-organization=&user-location=Australia&user-locale=en-US&user-timezone=Australia%2FMelbourne&user-irc_nickname=&user-interests=&user-expertise=&user-twitter_url=&user-stackoverflow_url=&user-linkedin_url=&user-mozillians_url=&user-facebook_url=

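The main difference is that the URL doesn't have any parameters. As you can see, the information from the form is encoded in the body of the request (for example, the new user full name is set using: &user-fullname=Hamish+Willee).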
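The body of this request uses the application/x-www-form-urlencoded format named in its Content-Type header, which is the same name/value encoding used for URL parameters. As a small sketch, Python's urllib.parse.parse_qs can decode it; only a few of the fields from the request above are reproduced here.

```python
from urllib.parse import parse_qs

# A few fields copied from the request body above.
body = (
    "user-username=hamishwillee&user-fullname=Hamish+Willee"
    "&user-title=&user-location=Australia"
    "&user-timezone=Australia%2FMelbourne"
)

# keep_blank_values=True keeps fields that were submitted empty,
# such as user-title.
form = parse_qs(body, keep_blank_values=True)

# Each name maps to a list of values; take the first for single-valued fields.
fields = {name: values[0] for name, values in form.items()}
```

Here "+" decodes to a space and "%2F" decodes to "/", so the server sees the values exactly as the user typed them.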

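The response

The response to the request is shown below. The status code "302 FOUND" tells the browser that the server has received the submitted request, and that the browser must issue a second HTTP request to load the page given in the Location field. The information is otherwise similar to that for the response to a GET request.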

HTTP/1.1 302 FOUND
Server: Apache
X-Backend-Server: developer3.webapp.scl3.mozilla.com
Vary: Cookie
Vary: Accept-Encoding
Content-Type: text/html; charset=utf-8
Date: Wed, 07 Sep 2016 00:38:13 GMT
Location: https://developer.mozilla.org/en-US/profiles/hamishwillee
Keep-Alive: timeout=5, max=1000
Connection: Keep-Alive
X-Frame-Options: DENY
X-Cache-Info: not cacheable; request wasn't a GET or HEAD
Content-Length: 0
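The redirect behaviour described above can be sketched as a small decision function: given a status code and the response headers, work out which URL (if any) the client should request next. The next_url() helper and the set of codes it checks are assumptions made for this illustration, not a full description of how browsers handle redirects.

```python
from urllib.parse import urljoin

# Common redirect status codes (an illustrative selection).
REDIRECT_CODES = {301, 302, 303, 307, 308}

def next_url(request_url, status_code, headers):
    """Return the URL to request next, or None if no redirect is needed."""
    location = headers.get("Location")
    if status_code in REDIRECT_CODES and location:
        # Location may be relative, so resolve it against the request URL.
        return urljoin(request_url, location)
    return None

follow_up = next_url(
    "https://developer.mozilla.org/en-US/profiles/hamishwillee/edit",
    302,
    {"Location": "https://developer.mozilla.org/en-US/profiles/hamishwillee"},
)
```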

Note: The HTTP responses and requests shown in these examples were captured using the Fiddler application, but you can get similar information using web sniffers (e.g. http://web-sniffer.net/) or using browser extensions like HttpFox. You can try this yourself. Use any of the linked tools, and then navigate through a site and edit profile information to see the different requests and responses. Most modern browsers also have tools that monitor network requests (for example, the Network Monitor tool in Firefox).

Static sites

A static site is one that returns the same hard coded content from the server whenever a particular resource is requested. So for example if you have a page about a product at /static/myproduct1.html , this same page will be returned to every user. If you add another similar product to your site you will need to add another page (e.g. myproduct2.html) and so on. This can start to get really inefficient — what happens when you get to thousands of product pages? You would repeat a lot of code across each page (the basic page template, structure, etc.), and if you wanted to change anything about the page structure — like add a new "related products" section for example — then you'd have to change every page individually. 

Note: Static sites are excellent when you have a small number of pages and you want to send the same content to every user. However they can have a significant cost to maintain as the number of pages becomes larger.

Let's recap on how this works, by looking again at the static site architecture diagram we looked at in the last article.

A simplified diagram of a static web server.

When a user wants to navigate to a page, the browser sends an HTTP GET request specifying the URL of its HTML page. The server retrieves the requested document from its file system and returns an HTTP response containing the document and an HTTP Response status code of "200 OK" (indicating success). The server might return a different status code, for example "404 Not Found" if the file is not present on the server, or "301 Moved Permanently" if the file exists but has been redirected to a different location.

The server for a static site will only ever need to process GET requests, because the server doesn't store any modifiable data. It also doesn't change its responses based on HTTP Request data (e.g. URL parameters or cookies). 

Understanding how static sites work is nevertheless useful when learning server-side programming, because dynamic sites handle requests for static files (CSS, JavaScript, static images, etc.) in exactly the same way.
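A static file server's core job can be sketched in a few lines: map the URL path onto a file under a site root, and return either the file with "200 OK" or "404 Not Found". The serve_static() function and the temporary directory layout below are invented for this example; production servers also handle content types, caching, redirects, and much more.

```python
import tempfile
from pathlib import Path

def serve_static(root: Path, url_path: str):
    """Return (status, body) for a GET request to a static site rooted at root."""
    root = root.resolve()
    candidate = (root / url_path.lstrip("/")).resolve()
    # Refuse paths that escape the site root (e.g. "/../secret.txt").
    if root != candidate and root not in candidate.parents:
        return "404 Not Found", b""
    if not candidate.is_file():
        return "404 Not Found", b""
    return "200 OK", candidate.read_bytes()

# Build a throwaway site with one hard-coded page, then request two URLs.
with tempfile.TemporaryDirectory() as tmp:
    site = Path(tmp)
    (site / "static").mkdir()
    (site / "static" / "myproduct1.html").write_text("<h1>Product 1</h1>")
    found = serve_static(site, "/static/myproduct1.html")
    missing = serve_static(site, "/static/myproduct2.html")
```

Every user who asks for /static/myproduct1.html gets the same bytes back, which is exactly what makes the site "static".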

Dynamic sites

A dynamic site is one that can generate and return content based on the specific request URL and data (rather than always returning the same hard-coded file for a particular URL). Using the example of a product site, the server would store product "data" in a database rather than individual HTML files. When receiving an HTTP GET Request for a product, the server determines the product ID, fetches the data from the database, and then constructs the HTML page for the response by inserting the data into an HTML template. This has major advantages over a static site:

Using a database allows the product information to be stored efficiently in an easily extensible, modifiable, and searchable way.

Using HTML templates makes it very easy to change the HTML structure, because this only needs to be done in one place, in a single template, and not across potentially thousands of static pages.
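To make the contrast with a static site concrete, here is a minimal sketch in which one template serves every product page. A plain dictionary stands in for the database, and the product data, PAGE template, and product_page() function are all hypothetical.

```python
from html import escape
from string import Template

# Stand-in for a database table: product ID -> product data (hypothetical).
PRODUCTS = {
    1: {"name": "Kettle", "price": "19.99"},
    2: {"name": "Toaster", "price": "24.50"},
}

# One template shared by every product page.
PAGE = Template("<html><body><h1>$name</h1><p>Price: $price</p></body></html>")

def product_page(product_id):
    """Return (status_code, html) for a request for one product."""
    product = PRODUCTS.get(product_id)
    if product is None:
        return 404, "<html><body><h1>Not found</h1></body></html>"
    # Escape the data so that it cannot inject markup into the page.
    safe = {key: escape(value) for key, value in product.items()}
    return 200, PAGE.substitute(safe)

status, html = product_page(1)
```

Adding a product means adding a row of data, and changing the page layout means editing the single template.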

Anatomy of a dynamic request

This section provides a step-by-step overview of the "dynamic" HTTP request and response cycle, building on what we looked at in the last article with much more detail. In order to "keep things real" we'll use the context of a sports-team manager website where a coach can select their team name and team size in an HTML form and get back a suggested "best lineup" for their next game. 

The diagram below shows the main elements of the "team coach" website, along with numbered labels for the sequence of operations when the coach accesses their "best team" list. The parts of the site that make it dynamic are the Web Application (this is how we will refer to the server-side code that processes HTTP requests and returns HTTP responses), the Database, which contains information about players, teams, coaches and their relationships, and the HTML Templates.

This is a diagram of a simple web server with step numbers for each step of the client-server interaction.

After the coach submits the form with the team name and number of players, the sequence of operations is:

  1. The web browser creates an HTTP GET request to the server using the base URL for the resource (/best) and encoding the team and player number either as URL parameters (e.g. /best?team=my_team_name&show=11) or as part of the URL pattern (e.g. /best/my_team_name/11/). A GET request is used because the request is only fetching data (not modifying data).
  2. The Web Server detects that the request is "dynamic" and forwards it to the Web Application for processing (the web server determines how to handle different URLs based on pattern matching rules defined in its configuration).
  3. The Web Application identifies that the intention of the request is to get the "best team list" based on the URL (/best/) and finds out the required team name and number of players from the URL. The Web Application then gets the required information from the database (using additional "internal" parameters to define which players are "best", and possibly also getting the identity of the logged in coach from a client-side cookie).
  4. The web application dynamically creates an HTML page by putting the data (from the database) into placeholders inside an HTML template.
  5. The Web Application returns the generated HTML to the web browser (via the Web Server), along with an HTTP status code of 200 ("success"). If anything prevents the HTML being returned then the Web Application will return another code — for example "404" to indicate that the team does not exist.
  6. The Web Browser will then start to process the returned HTML, sending separate requests to get any other CSS or JavaScript files that it references (see step 7).
  7. The Web Server loads static files from the file system and returns them to the browser directly (again, correct file handling is based on configuration rules and URL pattern matching).
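Steps 2 to 5 above can be sketched with nothing but the Python standard library. In this illustration an in-memory SQLite database stands in for the site's database, and the table layout, the "rating" column used to decide who is "best", the template, and the function names are all assumptions made for the example rather than part of any real framework.

```python
import sqlite3
from html import escape
from string import Template
from urllib.parse import urlsplit, parse_qs

# Hypothetical schema and data standing in for the site's database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE player (name TEXT, team TEXT, rating INTEGER)")
db.executemany(
    "INSERT INTO player VALUES (?, ?, ?)",
    [("Ana", "my_team_name", 9), ("Ben", "my_team_name", 7), ("Cy", "my_team_name", 8)],
)

PAGE = Template("<html><body><h1>Best lineup: $team</h1><ol>$items</ol></body></html>")

def best_view(params):
    """Steps 3 to 5: read the parameters, query the database, fill the template."""
    team = params.get("team", [""])[0]
    try:
        show = int(params.get("show", ["11"])[0])
    except ValueError:
        return 400, "<html><body><h1>Bad request</h1></body></html>"
    rows = db.execute(
        "SELECT name FROM player WHERE team = ? ORDER BY rating DESC LIMIT ?",
        (team, show),
    ).fetchall()
    if not rows:
        return 404, "<html><body><h1>Team not found</h1></body></html>"
    items = "".join(f"<li>{escape(name)}</li>" for (name,) in rows)
    return 200, PAGE.substitute(team=escape(team), items=items)

def handle_get(url):
    """Step 2: decide how to handle the request based on its URL path."""
    parts = urlsplit(url)
    if parts.path.rstrip("/") == "/best":
        return best_view(parse_qs(parts.query))
    return 404, "<html><body><h1>Not found</h1></body></html>"

status, html = handle_get("/best?team=my_team_name&show=2")
```

The query uses "?" placeholders rather than string formatting so that values taken from the URL are never interpreted as SQL.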

An operation to update a record in the database would be handled similarly, except that like any database update, the HTTP request from the browser should be encoded as a POST request. 

Doing other work

A Web Application's job is to receive HTTP requests and return HTTP responses. While interacting with a database to get or update information is a very common task, the code may do other things at the same time, or not interact with a database at all.

A good example of an additional task that a Web Application might perform would be sending an email to users to confirm their registration with the site. The site might also perform logging or other operations. 
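As a sketch of the registration email example, the snippet below builds a confirmation message with Python's email.message.EmailMessage. It deliberately stops short of sending anything; the addresses, subject line, and build_confirmation() helper are hypothetical.

```python
from email.message import EmailMessage

def build_confirmation(address, username):
    """Build (but do not send) a registration confirmation email."""
    message = EmailMessage()
    message["From"] = "noreply@example.com"  # hypothetical sender address
    message["To"] = address
    message["Subject"] = "Please confirm your registration"
    message.set_content(f"Hello {username},\n\nThanks for registering.\n")
    return message

message = build_confirmation("coach@example.com", "coach1")
```

Actually delivering the message would need a mail server, for example through the standard library's smtplib module, and is usually done outside the request/response cycle so that the user is not kept waiting.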

Returning something other than HTML

Server-side website code does not have to return HTML snippets/files in the response. It can instead dynamically create and return other types of files (text, PDF, CSV, etc.) or even data (JSON, XML, etc.).

The idea of returning data to a web browser so that it can dynamically update its own content (AJAX) has been around for quite a while. More recently "Single-page apps" have become popular, where the whole website is written with a single HTML file that is dynamically updated when needed. Websites created using this style of application push a lot of computational cost from the server to the web browser, and can result in websites that appear to behave a lot more like native apps (highly responsive, etc.).
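The sketch below shows the same data being returned as JSON and as CSV instead of HTML. Each function returns a content type together with the body, since the response's Content-Type header is how the browser learns what it has received. The player data and function names are made up for this example.

```python
import csv
import io
import json

# Hypothetical data that server-side code might have fetched from a database.
players = [{"name": "Ana", "rating": 9}, {"name": "Cy", "rating": 8}]

def as_json(rows):
    """Return (content_type, body) with the rows serialized as JSON."""
    return "application/json", json.dumps(rows)

def as_csv(rows):
    """Return (content_type, body) with the rows serialized as CSV."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["name", "rating"])
    writer.writeheader()
    writer.writerows(rows)
    return "text/csv", buffer.getvalue()

json_type, json_body = as_json(players)
csv_type, csv_body = as_csv(players)
```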

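Web frameworks simplify server-side web programming

Server-side web frameworks make writing code to handle the operations described above much easier.

One of the most important things they provide is a simple mechanism for mapping the URLs of different resources/pages to specific handler functions. This makes it easier to keep the code associated with each type of resource separate. It also helps with maintenance, because you can change the URL used to deliver a particular feature in one place, without having to change the handler function.

For example, consider the following Django (Python) code, which maps two URL patterns to two view functions. The first pattern ensures that an HTTP request with a resource URL of /best will be passed to a function named index() in the views module. A request that has the pattern "/best/junior" will instead be passed to the junior() view function.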

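# file: best/urls.py
#
# re_path() replaces the older url() helper from django.conf.urls,
# which was removed in Django 4.0; both take a regular expression.
from django.urls import re_path
from . import views

urlpatterns = [
    # example: /best/
    re_path(r'^$', views.index),
    # example: /best/junior/
    re_path(r'^junior/$', views.junior),
]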

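Note: The first parameter of each URL-mapping call above may look a bit odd (e.g. r'^junior/$') because it uses a pattern matching technique called "regular expressions" (RegEx, or RE). You don't need to know how regular expressions work at this point, other than that they allow us to match patterns in the URL (rather than the hard-coded values above) and use them as parameters in our view functions. As an example, a really simple RegEx might say "match a single uppercase letter, followed by between 4 and 7 lower case letters."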
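For the curious, the patterns mentioned in the note can be tried out directly with Python's re module. The third pattern, with its named group team_name, is a hypothetical example of how part of a URL can be captured so that it could be passed to a view function.

```python
import re

# "A single uppercase letter followed by between 4 and 7 lower case letters."
word = re.compile(r"[A-Z][a-z]{4,7}")

# The URL pattern from the example: "^" anchors the start, "$" the end.
junior = re.compile(r"^junior/$")

# A pattern with a named group (hypothetical) that captures part of the URL.
team = re.compile(r"^(?P<team_name>[a-z_]+)/$")
match = team.match("my_team_name/")
captured = match.group("team_name")
```

Here word.fullmatch("Hello") succeeds while word.fullmatch("Hi") does not, and junior only matches the exact text "junior/".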

The web framework also makes it easy for a view function to fetch information from the database. The structure of our data is defined in models, which are Python classes that define the fields to be stored in the underlying database. If we have a model named Team with a field of "team_type" then we can use a simple query syntax to get back all teams that have a particular type.

The example below gets a list of all teams that have the exact (case sensitive) team_type of "junior" — note the format: field name (team_type) followed by double underscore, and then the type of match to use (in this case exact). There are many other types of matches and we can daisy chain them. We can also control the order and the number of results returned. 

#best/views.py
from django.shortcuts import render
from .models import Team
def junior(request):
    list_teams = Team.objects.filter(team_type__exact="junior")
    context = {'list': list_teams}
    return render(request, 'best/index.html', context)

After the junior() function gets the list of junior teams, it calls the render() function, passing the original HttpRequest, an HTML template, and a "context" object defining the information to be included in the template. The render() function is a convenience function that generates HTML using a context and an HTML template, and returns it in an HttpResponse object.
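To show roughly what the framework saves you from writing, the sketch below performs a comparable lookup by hand with Python's sqlite3 module. The table and its rows are hypothetical, and the SQL is only an approximation of what a query like filter(team_type__exact="junior") asks the database for; the statement Django actually generates depends on the model and the database backend.

```python
import sqlite3

# Hypothetical table mirroring a Team model with a team_type field.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE team (name TEXT, team_type TEXT)")
db.executemany(
    "INSERT INTO team VALUES (?, ?)",
    [("Lions", "junior"), ("Tigers", "senior"), ("Cubs", "junior")],
)

# An exact match on team_type, written directly in SQL.
rows = db.execute(
    "SELECT name FROM team WHERE team_type = ? ORDER BY name", ("junior",)
).fetchall()
list_teams = [name for (name,) in rows]
```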

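Obviously, web frameworks can help you with a lot of other tasks. We discuss many more benefits, and some popular web framework choices, in the next article.

Summary

At this point you should have a good overview of the operations that server-side code has to perform, and know some of the ways in which a server-side web framework can make this easier.

In a following module we'll help you choose the best web framework for your first site.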
