
Sending form data

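In most cases, the purpose of an HTML form is to send data to a server. The server processes the data and sends a response back to the user. This sounds simple, but you must always make sure that the data cannot harm the server or cause trouble for your users.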

Where does the data go?

About client/server architecture

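The Web is based on a very basic client/server architecture that can be summarized as follows: a client (usually a web browser) sends a request to a server (most of the time a web server such as Apache, Nginx, IIS, or Tomcat) using the HTTP protocol. The server answers the request using the same protocol.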

A basic schema of the Web client/server architecture

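On the client side, an HTML form is a user-friendly way to send the user's data to the server: it lets the user fill in many separate pieces of data and submit them together.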

On the client side: defining how to send the data

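The <form> element defines how the data will be sent to the server. When the user presses the "send" button, the form generates a request to the server. The two attributes to keep in mind are action and method.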

The action attribute

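This attribute defines where the data gets sent. Its value must be a valid URL (absolute or relative). If this attribute isn't provided, the data is sent to the URL of the page containing the form.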

Examples

In this example, the data is sent to http://foo.com:

<form action="http://foo.com">

Here, the data is sent to the same server that hosts the form's page, but to a different URL on that server:

<form action="/somewhere_else">

When the attribute is omitted, as below, the data is sent to the page that contains the form:

<form>

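Many older pages use the following notation, with #, to indicate that the data should be sent to the page that contains the form. This was necessary before HTML5, but it is no longer needed.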

<form action="#">
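The resolution rules in these examples can be sketched in Python with urllib.parse.urljoin from the standard library. This is a minimal sketch, assuming a hypothetical page URL such as http://foo.com/forms/page.html and a hypothetical helper named resolve_action; browsers follow the HTML and URL specifications, which cover more edge cases than this.

```python
from urllib.parse import urljoin


def resolve_action(page_url, action=None):
    """Return the URL a form submission targets, given the page URL and the action value."""
    # A missing or empty action means "submit to the URL of the page itself".
    if not action:
        return page_url
    # Relative values such as "/somewhere_else" are resolved against the page URL;
    # absolute values such as "http://foo.com" are used as they are.
    return urljoin(page_url, action)


print(resolve_action("http://foo.com/forms/page.html", "/somewhere_else"))
```

With the hypothetical page URL above, "/somewhere_else" resolves to http://foo.com/somewhere_else, while a plain relative value such as "other" resolves to http://foo.com/forms/other.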

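Note: You can give action a URL that uses the HTTPS (secure HTTP) protocol. When you do this, the data is encrypted along with the rest of the request, even if the page containing the form is served over plain HTTP. Conversely, if the form is on an HTTPS page but the action URL uses plain HTTP, browsers display a security warning each time the user tries to send data, because the data will not be encrypted.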
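The situation the note warns about can be expressed as a small check. This is a sketch only: is_insecure_submission is a hypothetical helper, and real browsers apply broader mixed-content rules than this single comparison of URL schemes.

```python
from urllib.parse import urljoin, urlsplit


def is_insecure_submission(page_url, action=""):
    """True if a form on an HTTPS page would send its data over plain HTTP."""
    # Resolve relative action values against the page URL first
    target = urljoin(page_url, action)
    page_is_secure = urlsplit(page_url).scheme == "https"
    target_is_plain = urlsplit(target).scheme == "http"
    return page_is_secure and target_is_plain


print(is_insecure_submission("https://example.com/form", "http://foo.com"))
```

A form on an HTTP page that posts to an HTTPS URL is not flagged here, which matches the note: the request itself is still encrypted.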

The method attribute

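This attribute defines how the data is sent. The HTTP protocol provides several ways to perform a request; HTML form data can be sent in at least two ways: the GET method and the POST method.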

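To understand the difference between these two methods, let's step back and look at how HTTP works. Each time you want to reach a resource on the Web, the browser sends a request to a URL. An HTTP request consists of two parts: a header, which contains a set of metadata about the browser's capabilities, and a body, which can contain the information the server needs to process the request.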

The GET method

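The GET method is the method the browser uses to ask the server to send back a given resource. It's like going to a sports shop to buy a paddle:

Example:

Me to the salesperson: I want to buy this Double Happiness table-tennis paddle.

Browser to server: I want to GET this resource.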

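In this case, the browser sends an empty body. Because the body is empty, if a form is sent using this method, the data is appended to the URL as a query string: a ? followed by name=value pairs separated by &.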
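The way the data gets appended can be sketched with urllib.parse.urlencode from Python's standard library. The helper name build_get_url is hypothetical, and this is an approximation: browsers use the application/x-www-form-urlencoded serializer from the HTML and URL specifications, which percent-encodes a few characters (such as ~ and *) differently from urlencode.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def build_get_url(action, fields):
    """Put form fields into the action URL's query string, as a GET submission does."""
    scheme, netloc, path, _old_query, fragment = urlsplit(action)
    # The form data replaces any query string already present in the action URL.
    # urlencode escapes reserved characters and encodes spaces as "+".
    query = urlencode(fields)
    return urlunsplit((scheme, netloc, path or "/", query, fragment))


print(build_get_url("http://foo.com", [("say", "Hi"), ("to", "Mom")]))
```

For the form in the next example, this produces http://foo.com/?say=Hi&to=Mom, which matches the request line of the GET request.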

Consider the following form:

<form action="http://foo.com" method="get">
  <input name="say" value="Hi">
  <input name="to" value="Mom">
  <button>Send my greetings</button>
</form>

The HTTP request looks like this:

GET /?say=Hi&to=Mom HTTP/1.1
Host: foo.com

The POST method

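The POST method is a little different. It's the method the browser uses to talk to the server when asking for a response that takes into account the data provided in the body of the HTTP request. If a form is sent using this method, the data is placed in the body of the HTTP request instead of the URL.

Example

Consider the following form (almost the same as the previous one):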

<form action="http://foo.com" method="post">
  <input name="say" value="Hi">
  <input name="to" value="Mom">
  <button>Send my greetings</button>
</form>

Because the data is sent with the POST method, the HTTP request looks like this:

POST / HTTP/1.1
Host: foo.com
Content-Type: application/x-www-form-urlencoded
Content-Length: 13

say=Hi&to=Mom

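The Content-Length header indicates the size of the body, and the Content-Type header indicates the type of resource sent to the server. We'll discuss these headers later.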
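To make the POST request concrete, here is a sketch that assembles the same raw bytes in Python. The helper name build_post_request is hypothetical; in practice the browser (or an HTTP client library) builds the request for you. Note the empty line (CRLF CRLF) that separates the headers from the body, and that Content-Length counts the bytes of the body: say=Hi&to=Mom is 13 bytes long.

```python
from urllib.parse import urlencode


def build_post_request(host, path, fields):
    """Return the raw bytes of an HTTP/1.1 POST request carrying url-encoded form data."""
    # urlencode percent-encodes everything outside ASCII, so the body is pure ASCII
    body = urlencode(fields).encode("ascii")
    head = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"  # empty line: end of the headers, start of the body
    )
    return head.encode("ascii") + body


print(build_post_request("foo.com", "/", [("say", "Hi"), ("to", "Mom")]).decode("ascii"))
```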

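Of course, HTTP requests are never displayed directly to the user (if you want to see them, you need a tool such as the Firefox Web Console or the Chrome Developer Tools). The only thing shown to the user is the URL that was requested. So with a GET request, the user sees the data in the URL bar, but with a POST request, they don't. This matters for two reasons:

  1. If you need to send a password (or any other sensitive piece of data), never use the GET method, or the password will appear in the URL bar.
  2. If you need to send a large amount of data, use the POST method, because some browsers limit the length of URLs.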

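On the server side: retrieving the data

Whichever HTTP method you choose, the server receives a string that it parses into a list of key/value pairs in order to get the data. How you access this list depends on the server platform and on any framework you use with it.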
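In Python, the standard library's urllib.parse.parse_qsl performs this parsing for application/x-www-form-urlencoded data. A minimal sketch; parse_form_data is a hypothetical wrapper name. The same function works on a GET query string (the part after ? in /?say=Hi&to=Mom) and on a POST body such as say=Hi&to=Mom.

```python
from urllib.parse import parse_qsl


def parse_form_data(raw):
    """Split an application/x-www-form-urlencoded string into (name, value) pairs."""
    # parse_qsl decodes %XX escapes (as UTF-8) and turns "+" back into spaces.
    # keep_blank_values=True keeps fields the user left empty, e.g. "note=".
    # A list of pairs, unlike a dict, keeps repeated names such as several checkboxes.
    return parse_qsl(raw, keep_blank_values=True)


print(parse_form_data("say=Hi&to=Mom"))
```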

Example: Raw PHP

PHP offers some global objects to access the data. Assuming you've used the POST method, the following example just takes the data and displays it to the user. Of course, what you do with the data is up to you. You might display it, store it in a database, send it by email, or process it in some other way.

<?php
  // The global $_POST variable allows you to access the data sent with the POST method
  // To access the data sent with the GET method, you can use $_GET
  $say = htmlspecialchars($_POST['say']);
  $to  = htmlspecialchars($_POST['to']);
  echo  $say, ' ', $to;

This example displays a page with the data we sent. In our example from before, the output would be:

Hi Mom

Example: Raw Python

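This example uses Python to do the same thing: display the provided data on a web page. It uses the cgi module from Python's standard library to access the form data. Note that the cgi and cgitb modules were deprecated in Python 3.11 and removed in Python 3.13, so this script only runs on older Python versions.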

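#!/usr/bin/env python
import html
import cgi
import cgitb
cgitb.enable()                    # show detailed tracebacks, for troubleshooting

print("Content-Type: text/html")  # HTTP header to say HTML is following
print()                           # blank line, end of headers

form = cgi.FieldStorage()
# getfirst() returns the first value of a field, or the default if the field is missing
say = html.escape(form.getfirst("say", ""))
to  = html.escape(form.getfirst("to", ""))
print(say, to)                    # print() already puts one space between its arguments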

The result is the same as with PHP:

Hi Mom
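Because the cgi module is gone from current Python versions, here is a sketch of the same page written as a WSGI application, using only the standard library (urllib.parse.parse_qs and html.escape). The function name application follows the WSGI convention; the rest is a minimal illustration, not a production server: it assumes a UTF-8, url-encoded POST body and does no validation beyond escaping.

```python
import html
from urllib.parse import parse_qs


def application(environ, start_response):
    """A WSGI app that echoes the 'say' and 'to' fields of a POSTed form."""
    # CONTENT_LENGTH may be missing or empty; treat that as an empty body
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
    except ValueError:
        length = 0
    body = environ["wsgi.input"].read(length).decode("utf-8")
    form = parse_qs(body)
    # parse_qs maps each name to a list of values; take the first, or "" if missing
    say = html.escape(form.get("say", [""])[0])
    to = html.escape(form.get("to", [""])[0])
    output = f"{say} {to}".encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(output))),
    ])
    return [output]
```

To try it locally, you could pass application to wsgiref.simple_server.make_server and point the form's action at that server; with the form data from before, the response body is again Hi Mom.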

Other languages and frameworks

There are many other server-side technologies you can use for form handling, including Perl, Java, .Net, Ruby, etc. Just pick the one you like best. That said, it's uncommon to use these technologies directly, because handling forms by hand can be tricky. It's more common to use one of the many frameworks that make handling forms easier.

It's worth noting that even using these frameworks, working with forms isn't necessarily easy. But it's much better, and will save you a lot of time.

A special case: sending files

Files are a special case with HTML forms. They are binary data (or are treated as such), whereas all other form data is text. Because HTTP is a text protocol, there are special requirements for handling binary data.

The enctype attribute

This attribute lets you specify the value of the Content-Type HTTP header. This header is very important because it tells the server what kind of data is being sent. By default, its value is application/x-www-form-urlencoded. In human terms, this means: "This is form data that has been encoded into URL form."

But if you want to send files, you need to do two things:

  • Set the method attribute to POST because file content can't be put inside a URL parameter using a form.
  • Set the value of enctype to multipart/form-data because the data will be split into multiple parts: one for each file, plus one for each of the other form fields that may be sent along with them.

For example:

<form method="post" enctype="multipart/form-data">
  <input type="file" name="myFile">
  <button>Send the file</button>
</form>
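To show what "split into multiple parts" means on the wire, here is a minimal sketch that builds a multipart/form-data body by hand. The boundary string, the helper name encode_multipart, and the sample file hello.txt are hypothetical; browsers generate a random boundary, and this sketch does not escape quotes or line breaks in field names or file names, which a real implementation must handle.

```python
def encode_multipart(fields, files, boundary="----ExampleBoundary7MA4YWxk"):
    """Build a multipart/form-data body.

    fields: dict of text field name -> str value
    files:  dict of field name -> (filename, content bytes, MIME type)
    Returns (body bytes, value for the Content-Type header).
    """
    delimiter = f"--{boundary}".encode("ascii")
    lines = []
    for name, value in fields.items():
        lines.append(delimiter)
        lines.append(f'Content-Disposition: form-data; name="{name}"'.encode("utf-8"))
        lines.append(b"")  # empty line separates the part's headers from its content
        lines.append(value.encode("utf-8"))
    for name, (filename, content, mime_type) in files.items():
        lines.append(delimiter)
        lines.append(
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"'.encode("utf-8")
        )
        lines.append(f"Content-Type: {mime_type}".encode("ascii"))
        lines.append(b"")
        lines.append(content)  # raw bytes: file content is not URL-encoded
    lines.append(delimiter + b"--")  # the closing delimiter ends the body
    lines.append(b"")
    body = b"\r\n".join(lines)
    return body, f"multipart/form-data; boundary={boundary}"


body, content_type = encode_multipart(
    {"note": "hi"}, {"myFile": ("hello.txt", b"Hello\x00World", "text/plain")}
)
```

Each part carries its own headers, so the file bytes (even a zero byte) can travel unchanged, which is why file uploads need this encoding rather than the URL-encoded one.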

Note: Some browsers support the multiple attribute on the <input> element in order to send more than one file with only one input element. How the server handles those files really depends on the technology used on the server. As mentioned previously, using a framework will make your life a lot easier.

Warning: Many servers are configured with a size limit for files and HTTP requests in order to prevent abuse. It's important to check this limit with the server administrator before sending a file.

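Security concerns

Each time you send data to a server, you need to consider security. HTML forms are one of the most common targets for attacks on servers. The problem doesn't come from the HTML forms themselves; it comes from the way the server handles the data.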

Common security flaws

Depending on what you're doing, there are some very well-known security issues:

XSS and CSRF

Cross-Site Scripting (XSS) and Cross-Site Request Forgery (CSRF) are common types of attacks that occur when you display data sent by a user back to the user or to another user.

XSS lets attackers inject client-side script into Web pages viewed by other users. A cross-site scripting vulnerability may be used by attackers to bypass access controls such as the same origin policy. The effect of these attacks may range from a petty nuisance to a significant security risk.

CSRF attacks are similar to XSS attacks in that they can start the same way, by injecting client-side script into web pages, but their target is different. CSRF attackers try to escalate privileges to those of a higher-privileged user (such as a site administrator) to perform an action they shouldn't be able to do (for example, sending data to an untrusted user).

XSS attacks exploit the trust a user has for a web site, while CSRF attacks exploit the trust a web site has for its users.

To prevent these attacks, you should always check the data a user sends to your server and, if you need to display it, avoid displaying HTML content exactly as the user provided it. Instead, process the user-provided data so that you never display it verbatim. Almost all frameworks on the market today implement a minimal filter that removes the HTML <script>, <iframe> and <object> elements from data sent by any user. This helps to mitigate the risk, but doesn't necessarily eradicate it.
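One common way to avoid displaying user data verbatim is to escape it before inserting it into HTML. Here is a minimal sketch using html.escape from Python's standard library; render_comment is a hypothetical helper. This escaping only fits HTML text and quoted attribute values; data placed inside JavaScript, CSS, or URLs needs encoding suited to that context.

```python
import html


def render_comment(user_text):
    """Return an HTML fragment that shows user-provided text as text, not as markup."""
    # html.escape turns &, < and > into entities; with quote=True (the default)
    # it also escapes " and ', so the result is safe inside quoted attributes too.
    return "<p>" + html.escape(user_text, quote=True) + "</p>"


print(render_comment('<script>alert("hi")</script>'))
```

The browser then shows the script tag as literal text instead of running it.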

SQL injection

SQL injection is a type of attack that tries to perform actions on a database used by the target web site. This typically involves sending an SQL query in the hope that the server will execute it (often when the application server tries to store the data). This is actually one of the main attack vectors against web sites.

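The consequences can be terrible, ranging from data loss to an attacker gaining access to a whole infrastructure through privilege escalation. This is a very serious threat, and you should never store data sent by a user without proper handling. On a PHP/MySQL infrastructure, for example, use prepared statements (parameterized queries) through PDO or mysqli rather than building SQL strings by hand; the older mysql_real_escape_string() function was removed, along with the rest of the mysql extension, in PHP 7.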
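A widely used defense is to keep user data out of the SQL text entirely with parameterized queries. Here is a minimal sketch using Python's built-in sqlite3 module and an in-memory database; the table, its columns, and the save_greeting helper are hypothetical. Other database drivers use different placeholder styles, but the idea is the same.

```python
import sqlite3


def save_greeting(conn, say, to):
    """Store a greeting with placeholders, so user input never becomes part of the SQL."""
    # The ? placeholders are filled in by the database driver; the values are treated
    # strictly as data, even if they contain quotes or SQL keywords.
    conn.execute("INSERT INTO greetings (say, recipient) VALUES (?, ?)", (say, to))
    conn.commit()


# Demo with a throwaway in-memory database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE greetings (say TEXT, recipient TEXT)")

malicious = "Mom'); DROP TABLE greetings; --"
save_greeting(conn, "Hi", malicious)
# The table still exists, and the hostile input was stored as plain text
rows = conn.execute("SELECT say, recipient FROM greetings").fetchall()
```

Building the same statement with string formatting (for example f"... VALUES ('{say}', '{to}')") is exactly what lets injected SQL through.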

HTTP header injection and email injection

These kinds of attacks can occur when your application builds HTTP headers or emails based on the data input by a user on a form. These won't directly damage your server or affect your users, but they are an open door to deeper problems such as session hijacking or phishing attacks.

These attacks are mostly silent, and can turn your server into a zombie.
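Header and email injection usually works by slipping a line break into a value, which starts a new header line (for example an extra Bcc: line in an email). A minimal defensive sketch is to refuse such values before using them in a header; safe_header_value is a hypothetical helper, and many libraries perform similar checks themselves.

```python
def safe_header_value(value):
    """Return value unchanged if it is safe to use in a single header line."""
    # Header lines end with CRLF; a CR or LF inside user input could smuggle
    # in extra headers, so reject the value instead of trying to clean it.
    if "\r" in value or "\n" in value:
        raise ValueError("header value must not contain CR or LF characters")
    return value


subject = safe_header_value("Hi Mom")
```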

Be paranoid: Never trust your users

So, how do you fight these threats? This is a topic far beyond this guide, but there are a few rules to keep in mind. The most important rule is: never ever trust your users, including yourself; even a trusted user could have been hijacked.

All data that comes to your server must be checked and sanitized. Always. No exception.

  • Escape potentially dangerous characters. The specific characters you should be cautious with vary depending on the context in which the data is used and the server platform you employ, but all server-side languages have functions for this.
  • Limit the incoming amount of data to allow only what's necessary.
  • Sandbox uploaded files (store them on a different server and allow access to the file only through a different subdomain or even better through a fully different domain name).
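The first two rules can be combined in a small server-side check. This is a minimal sketch under stated assumptions: the size limit, the allowed field names and their maximum lengths, and the read_and_check_form helper are all hypothetical, and a real server should also reject an oversized Content-Length before reading the body at all.

```python
from urllib.parse import parse_qsl

MAX_BODY_BYTES = 10_000                   # hypothetical limit for the whole request body
ALLOWED_FIELDS = {"say": 100, "to": 100}  # hypothetical field names -> maximum length


def read_and_check_form(raw_body: bytes) -> dict:
    """Parse a url-encoded body, keeping only expected fields of acceptable size."""
    if len(raw_body) > MAX_BODY_BYTES:
        raise ValueError("request body too large")
    # Invalid UTF-8 raises UnicodeDecodeError, which is a subclass of ValueError
    pairs = parse_qsl(raw_body.decode("utf-8"), keep_blank_values=True)
    form = {}
    for name, value in pairs:
        if name not in ALLOWED_FIELDS:
            continue  # ignore anything the form never asked for
        if len(value) > ALLOWED_FIELDS[name]:
            raise ValueError(f"field {name!r} is too long")
        form[name] = value
    return form
```

Values that pass this check still need escaping for the place they end up (HTML, SQL, headers), since limiting size and names does not make their content harmless.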

You should avoid most of these problems if you follow these three rules, but it's always a good idea to get a security review performed by a competent third party. Don't assume that you've seen all the possible problems.

Conclusion

As you can see, sending form data is easy, but securing an application can be tricky. Just remember that a front-end developer is not the one who should define the security model of the data. Yes, as we'll see, it's possible to perform client-side data validation, but the server can't trust this validation because it has no way of knowing what really happens on the client side.

