有些网站会根据 http request 中的 user-agent header 的值返回不同的response,例如 http://wttr.in 会根据就会根据 user-agent 是否为 curl 来决定是返回带图片的 HTML,还是字符拼接图案的文本。
一开始我以为修改 url package 中的 user-agent 就是直接把相应的 header 内容加到 url-request-extra-headers 中就行了,事实证明我还是太天真了,这样做的后果是会产生两个 user-agent header...
- (let ((url-debug t)
- (url-request-extra-headers '(("User-Agent" . "curl/7.78.0"))))
- (kill-buffer (url-retrieve-synchronously "http://wttr.in"))
- (with-current-buffer "*URL-DEBUG*"
- (keep-lines "^User-Agent" (point-min) (point-max))
- (buffer-substring-no-properties (point-min) (point-max))))
-
- User-Agent: URL/Emacs Emacs/28.0.50 (X11; x86_64-pc-linux-gnu)
- User-Agent: curl/7.78.0
-
在翻阅了 url manual 之后才知道,原来 url 专门有个变量用来控制 user-agent:
- url-user-agent is a variable defined in ‘url-vars.el’.
-
- Its value is ‘default’
-
- You can customize this variable.
- This variable was introduced, or its default value was changed, in
- version 26.1 of Emacs.
- Probably introduced at or before Emacs version 25.1.
-
- User Agent used by the URL package for HTTP/HTTPS requests.
- Should be one of:
- * A string (not including the "User-Agent:" prefix)
- * A function of no arguments, returning a string
- * ‘default’ (to compute a value according to ‘url-privacy-level’)
- * nil (to omit the User-Agent header entirely)
-
所以修改 user-agent header 的正确方法是修改 url-user-agent 这个变量的值:
- (let ((url-debug t)
- (url-user-agent "curl/7.78.0"))
- (kill-buffer (url-retrieve-synchronously "http://wttr.in"))
- (with-current-buffer "*URL-DEBUG*"
- (keep-lines "^User-Agent" (point-min) (point-max))
- (buffer-substring-no-properties (point-min) (point-max))))
-
- User-Agent: curl/7.78.0