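I have been working on web crawlers for a while and use GET and POST requests all the time. GET requests are the least trouble, and the common POST variants, form submission and JSON submission, are also fairly easy. I thought I understood TCP and HTTP thoroughly, until I realized I did not actually know how HTTP file upload works. I wanted to get to the bottom of it, so I looked it up online. The mechanism is described in RFC 1867, and my notes follow.
File upload is submitted with the HTTP POST method.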
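The Content-Type request header of a POST typically takes one of a few values, such as application/x-www-form-urlencoded (form submission), application/json (JSON submission), and multipart/form-data (file upload).
POST data in multipart/form-data format looks like this: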
- ------WebKitFormBoundarycz5DOEJKqu7XXB7k
- Content-Disposition: form-data; name="_csrf"
-
- NgnTBmqX7F9HqIjxufqrM4MCr-Szxtw3SISaHY4Sl-O3XnZys1SMHY2L2MB_INRebu0fWuj6tmXlQAqM8GdIKw==
- ------WebKitFormBoundarycz5DOEJKqu7XXB7k
- Content-Disposition: form-data; name="Product[product_no]"
-
- H312985401
- ------WebKitFormBoundarycz5DOEJKqu7XXB7k
- Content-Disposition: form-data; name="Product[name]"
-
- 女式连帽针织开衫
- ------WebKitFormBoundarycz5DOEJKqu7XXB7k
- Content-Disposition: form-data; name="Product[price]"
-
- 539
- ------WebKitFormBoundarycz5DOEJKqu7XXB7k
- Content-Disposition: form-data; name="Product[describe]"
-
-
- ------WebKitFormBoundarycz5DOEJKqu7XXB7k--
-
- ------WebKitFormBoundaryzBpJfpFKA7eYQx6h
- Content-Disposition: form-data; name="file"; filename="rust-lang.png"
- Content-Type: image/png
-
-
- ------WebKitFormBoundaryzBpJfpFKA7eYQx6h
- Content-Disposition: form-data; name="token"
-
- 171a2fe2c5be7ad5772957b48dc50c41
- ------WebKitFormBoundaryzBpJfpFKA7eYQx6h
- Content-Disposition: form-data; name="type"
-
- format
- ------WebKitFormBoundaryzBpJfpFKA7eYQx6h--
-
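multipart/form-data was first proposed in RFC 1867: Form-based File Upload in HTML [1].
RFC 1867 also explains why a new type was added instead of reusing the existing application/x-www-form-urlencoded: that type is inefficient for sending large amounts of binary data or data containing non-ASCII characters. With application/x-www-form-urlencoded, the form data is URL-encoded before being sent to the backend, and a binary file cannot sensibly be encoded along with it. That is why multipart/form-data was introduced.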
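To make the overhead concrete, here is a minimal sketch that percent-encodes a few bytes the way application/x-www-form-urlencoded would. The helper name encodedLen is hypothetical and only used for illustration; the sample input is the 8-byte PNG file signature.

```go
package main

import (
	"fmt"
	"net/url"
)

// encodedLen returns how many bytes raw occupies after percent-encoding,
// the encoding used by application/x-www-form-urlencoded.
// (Hypothetical helper, introduced only for this illustration.)
func encodedLen(raw []byte) int {
	return len(url.QueryEscape(string(raw)))
}

func main() {
	// The first 8 bytes of every PNG file.
	png := []byte{0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A}
	fmt.Printf("%d raw bytes -> %d encoded bytes: %s\n",
		len(png), encodedLen(png), url.QueryEscape(string(png)))

	// A single CJK character is 3 bytes in UTF-8 and 9 bytes once encoded.
	fmt.Println(encodedLen([]byte("女")))
}
```

Every byte outside the small set of unreserved ASCII characters becomes three bytes, so the 8-byte signature grows to 18 bytes. multipart/form-data avoids this by sending the bytes of each part as they are.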
The following is taken from RFC 1867: Form-based File Upload in HTML [2], section 6 (Examples):
- Content-type: multipart/form-data, boundary=AaB03x
-
- --AaB03x
- content-disposition: form-data; name="field1"
-
- Joe Blow
- --AaB03x
- content-disposition: form-data; name="pics"; filename="file1.txt"
- Content-Type: text/plain
-
- ... contents of file1.txt ...
- --AaB03x--
-
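Besides declaring multipart/form-data, the HTTP Content-Type request header also specifies a boundary (the delimiter string),
which separates the individual parameter values and marks the end of the POST data.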
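As a small sketch of how a receiver can recover the boundary, the Go standard library's mime.ParseMediaType splits a Content-Type value into the media type and its parameters. The helper name boundaryOf is hypothetical; the header value reuses the boundary from the browser capture earlier in this post.

```go
package main

import (
	"fmt"
	"mime"
)

// boundaryOf extracts the boundary parameter from a Content-Type header value.
// (Hypothetical helper, introduced only for this illustration.)
func boundaryOf(contentType string) (string, error) {
	mediaType, params, err := mime.ParseMediaType(contentType)
	if err != nil {
		return "", err
	}
	if mediaType != "multipart/form-data" {
		return "", fmt.Errorf("unexpected media type %q", mediaType)
	}
	boundary, ok := params["boundary"]
	if !ok {
		return "", fmt.Errorf("no boundary parameter in %q", contentType)
	}
	return boundary, nil
}

func main() {
	header := "multipart/form-data; boundary=----WebKitFormBoundaryzBpJfpFKA7eYQx6h"
	boundary, err := boundaryOf(header)
	if err != nil {
		panic(err)
	}
	fmt.Println("boundary: ", boundary)
	// Inside the body, every delimiter line is "--" followed by the boundary.
	fmt.Println("delimiter:", "--"+boundary)
}
```

This also explains the dash count in the capture: the header carries four leading dashes, and the body lines show six because of the extra "--" prefix.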
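The body of a multipart/form-data POST request contains multiple field parameters. Each field begins with a line made of -- followed by the boundary, and an uploaded file is simply one such field. For example: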
- --AaB03x
- content-disposition: form-data; name="pics"; filename="file1.txt"
- Content-Type: text/plain
-
- ... contents of file1.txt ...
-
--AaB03x marks the start of this field and, at the same time, the end of the previous one. The actual field content follows.
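As the example shows, a file-upload field carries four pieces of information: the field name (name), the file name (filename), the file's content type (Content-Type), and the file contents.
All four belong to a single form-data field. An empty line (\r\n) must separate the file's content type from the file contents.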
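To see these pieces from the receiving side, the sketch below feeds the RFC 1867 example body to Go's mime/multipart reader and prints what each part carries. The names partInfo and parseParts are hypothetical and exist only for this illustration.

```go
package main

import (
	"fmt"
	"io"
	"mime/multipart"
	"strings"
)

// partInfo holds the information carried by one form-data part.
// (Hypothetical type, introduced only for this illustration.)
type partInfo struct {
	Name, FileName, ContentType, Content string
}

// parseParts walks a multipart/form-data body and collects every part.
func parseParts(body, boundary string) ([]partInfo, error) {
	r := multipart.NewReader(strings.NewReader(body), boundary)
	var parts []partInfo
	for {
		p, err := r.NextPart()
		if err == io.EOF {
			return parts, nil // closing delimiter reached
		}
		if err != nil {
			return nil, err
		}
		content, err := io.ReadAll(p)
		if err != nil {
			return nil, err
		}
		parts = append(parts, partInfo{
			Name:        p.FormName(),
			FileName:    p.FileName(),
			ContentType: p.Header.Get("Content-Type"),
			Content:     string(content),
		})
	}
}

func main() {
	// The RFC 1867 example body, with CRLF line endings.
	body := strings.Join([]string{
		"--AaB03x",
		`Content-Disposition: form-data; name="field1"`,
		"",
		"Joe Blow",
		"--AaB03x",
		`Content-Disposition: form-data; name="pics"; filename="file1.txt"`,
		"Content-Type: text/plain",
		"",
		"... contents of file1.txt ...",
		"--AaB03x--",
		"",
	}, "\r\n")

	parts, err := parseParts(body, "AaB03x")
	if err != nil {
		panic(err)
	}
	for _, p := range parts {
		fmt.Printf("%+v\n", p)
	}
}
```

The plain text field comes back with an empty FileName and ContentType, while the file field has all four values filled in. The CRLF in front of each delimiter belongs to the delimiter, so it is not part of the content.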
- const path = require('path');
- const fs = require('fs');
- const http = require('http');
-
- // Boundary string; it must not occur anywhere in the uploaded data.
- const boundaryKey = '-------------------------461591080941622511336662';
- const request = http.request({
-   method: 'POST',
-   host: 'localhost',
-   port: 7787,
-   path: '/files',
-   headers: {
-     // Announce the boundary in the request header.
-     'Content-Type': 'multipart/form-data; boundary=' + boundaryKey,
-     'Connection': 'keep-alive'
-   }
- });
- // Write the part headers, ending with the empty line that precedes the content.
- request.write(
-   `--${boundaryKey}\r\nContent-Disposition: form-data; name="file"; filename="1.png"\r\nContent-Type: image/png\r\n\r\n`
- );
- // Stream the file contents; keep the request open for the closing delimiter.
- const fileStream = fs.createReadStream(path.join(__dirname, '../1.png'));
- fileStream.pipe(request, { end: false });
- fileStream.on('end', function () {
-   // Write the closing delimiter and finish the request.
-   request.end('\r\n--' + boundaryKey + '--' + '\r\n');
- });
- request.on('response', function (res) {
-   console.log(res.statusCode);
- });
- // Report connection failures instead of crashing on an unhandled 'error' event.
- request.on('error', function (err) {
-   console.error(err);
- });
-
- package main
-
- import (
-     "bytes"
-     "io"
-     "log"
-     "net/http"
-     "os"
-     "path/filepath"
-     "sync"
-     "time"
- )
-
- var wc sync.WaitGroup
-
- // SendData uploads the file at filePath to url as multipart/form-data.
- func SendData(c *http.Client, url string, method string, filePath string) {
-     defer wc.Done()
-
-     if c == nil {
-         log.Println("client is nil")
-         return
-     }
-     switch method {
-     case http.MethodPost:
-         // Any string works as the boundary, as long as it never occurs in the data.
-         boundary := "ASSDFWDFBFWEFWWDF"
-         data, err := os.ReadFile(filePath)
-         if err != nil {
-             log.Println("read file:", err)
-             return
-         }
-
-         // Delimiters and header lines end with CRLF, not a bare LF.
-         var body bytes.Buffer
-         body.WriteString("--" + boundary + "\r\n")
-         body.WriteString("Content-Disposition: form-data; name=\"userfile\"; filename=\"" + filepath.Base(filePath) + "\"\r\n")
-         body.WriteString("Content-Type: application/octet-stream\r\n\r\n")
-         body.Write(data)
-         body.WriteString("\r\n--" + boundary + "\r\n")
-         body.WriteString("Content-Disposition: form-data; name=\"text\"; filename=\"1.txt\"\r\n\r\n")
-         body.WriteString("data=ali\r\n")
-         body.WriteString("--" + boundary + "--\r\n")
-
-         req, err := http.NewRequest(method, url, &body)
-         if err != nil {
-             log.Println("build request:", err)
-             return
-         }
-         req.Header.Set("Content-Type", "multipart/form-data; boundary="+boundary)
-         resp, err := c.Do(req)
-         if err != nil {
-             log.Println("send request:", err)
-             return
-         }
-         defer resp.Body.Close()
-
-         content, err := io.ReadAll(resp.Body)
-         if err != nil {
-             log.Println("read response:", err)
-             return
-         }
-         log.Println("get response: " + string(content))
-     case http.MethodGet:
-         // TODO: get data from server
-     }
- }
-
- func main() {
-     client := &http.Client{
-         Timeout: 3 * time.Second,
-     }
-     postImgPath := "1.png"
-     method := http.MethodPost
-     url := "http://127.0.0.1:8000/postdata"
-
-     wc.Add(1)
-     go SendData(client, url, method, postImgPath)
-     wc.Wait()
- }
-
POST request body:
- --boundary    // delimiter line
- Content-Disposition: form-data; name="userfile"; filename="1.png"
- Content-Type: application/octet-stream
-
- ... contents of 1.png ...
- --boundary
- Content-Disposition: form-data; name="text"; filename="username"
-
- name=Tom
- --boundary--
-
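A body of this shape does not have to be concatenated by hand. As a sketch, the Go standard library's mime/multipart package can generate the delimiters, the part headers, and a random boundary. The helper name buildUpload is hypothetical; the field names and URL are reused from the example above, and the request is only constructed here, not sent.

```go
package main

import (
	"bytes"
	"fmt"
	"mime/multipart"
	"net/http"
)

// buildUpload assembles a multipart/form-data body with one file part and one
// text field, and returns it together with the matching Content-Type value.
// (Hypothetical helper, introduced only for this illustration.)
func buildUpload(fileField, fileName string, fileData []byte, textField, textValue string) (*bytes.Buffer, string, error) {
	body := &bytes.Buffer{}
	w := multipart.NewWriter(body)

	// CreateFormFile writes the delimiter and the part headers:
	// Content-Disposition with name and filename, plus
	// Content-Type: application/octet-stream.
	fw, err := w.CreateFormFile(fileField, fileName)
	if err != nil {
		return nil, "", err
	}
	if _, err := fw.Write(fileData); err != nil {
		return nil, "", err
	}

	// WriteField adds an ordinary field without a filename.
	if err := w.WriteField(textField, textValue); err != nil {
		return nil, "", err
	}

	// Close writes the closing delimiter "--boundary--".
	if err := w.Close(); err != nil {
		return nil, "", err
	}
	return body, w.FormDataContentType(), nil
}

func main() {
	body, contentType, err := buildUpload("userfile", "1.png", []byte("fake png bytes"), "text", "name=Tom")
	if err != nil {
		panic(err)
	}
	req, err := http.NewRequest(http.MethodPost, "http://127.0.0.1:8000/postdata", body)
	if err != nil {
		panic(err)
	}
	req.Header.Set("Content-Type", contentType)

	fmt.Println(req.Header.Get("Content-Type"))
	fmt.Print(body.String())
}
```

Taking the header value from FormDataContentType keeps the boundary in the header and in the body in sync, which is easy to get wrong when both are typed by hand.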
Golang server receiving the uploaded file:
- package main
-
- import (
-     "fmt"
-     "io"
-     "log"
-     "net/http"
-     "os"
-     "path/filepath"
-     "time"
- )
-
- // DownloadFile saves a file uploaded by the client to the local disk.
- func DownloadFile(w http.ResponseWriter, r *http.Request) {
-     switch r.Method {
-     case http.MethodGet:
-         fmt.Println("GET")
-         w.Write([]byte("hi, get successful"))
-     case http.MethodPost:
-         fmt.Println("POST")
-         // FormFile parses the multipart body on first use and returns the
-         // file contents together with the part's header.
-         imgFile, header, err := r.FormFile("userfile")
-         if err != nil {
-             http.Error(w, err.Error(), http.StatusBadRequest)
-             return
-         }
-         defer imgFile.Close()
-
-         // List every uploaded file as fieldName:fileName.
-         for k, v := range r.MultipartForm.File {
-             for _, vv := range v {
-                 fmt.Println(k + ":" + vv.Filename)
-             }
-         }
-
-         // Keep only the base name so the file is created in the working directory.
-         imgName := filepath.Base(header.Filename)
-         saveFile, err := os.Create(imgName)
-         if err != nil {
-             http.Error(w, err.Error(), http.StatusInternalServerError)
-             return
-         }
-         defer saveFile.Close()
-
-         // Save the uploaded contents.
-         if _, err := io.Copy(saveFile, imgFile); err != nil {
-             http.Error(w, err.Error(), http.StatusInternalServerError)
-             return
-         }
-         w.Write([]byte("successfully saved"))
-     default:
-         fmt.Println("default")
-         http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
-     }
- }
-
- func main() {
-     mux := http.NewServeMux()
-     mux.HandleFunc("/postdata", DownloadFile)
-     server := &http.Server{
-         Addr:         "127.0.0.1:8000",
-         ReadTimeout:  2 * time.Second,
-         WriteTimeout: 2 * time.Second,
-         Handler:      mux,
-     }
-     log.Fatal(server.ListenAndServe())
- }
-
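RFC (Request For Comments) is a series of numbered documents that covers almost all the important written material about the Internet; the basic Internet communication protocols are all specified in detail in RFCs. Publication of the RFC series is currently sponsored by the Internet Society (ISOC). For anyone who wants to become a networking expert, the RFCs are among the most important and most frequently consulted references, which is why they are often called the bible of networking knowledge.
HTTP/1.1 was defined by RFC 2616, which has since been superseded, first by RFC 7230 through RFC 7235 and later by RFC 9110 and RFC 9112.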