Ticket #946 (defect)

Opened 1 year ago

Last modified 1 year ago

Problem with encoded text in multipart/form-data

Status: closed (fixed)

Reported by: mail@timgolden.me.uk
Assigned to: fumanchu
Priority: high
Milestone: 3.2
Component: CherryPy code
Keywords:
Cc:

The code below is a very slightly modified version of the CherryPy Page 0 tutorial that presents a multipart/form-data form. It has only one field, a textarea. In reality I also have a file upload and other fields, but they aren't needed to reproduce the problem. I enter non-ASCII characters into the textarea and submit. Although I have set every encoding header I can think of, I get a CherryPy traceback, reproduced below. This only occurs with multipart/form-data forms: a conventional form with no enctype behaves OK. Below are the CherryPy code, the traceback, and the headers (courtesy of the LiveHeaders add-on) that Firefox sends with the request. (IE behaves the same way, FWIW.)

I'm only inserting one character into the form: U+201C, the infamous left quotation mark. The POST header seems to be encoding it correctly as UTF-8 (that's "\xe2\x80\x9c" you can see there) but cherrypy doesn't seem to have any way of picking that up.
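A minimal sketch of the same failure outside CherryPy, assuming Python 3 syntax. It only shows that the three bytes in the POST body are the UTF-8 form of U+201C and that decoding them as ASCII raises the error seen in the traceback; the variable names are illustrative.

```python
# U+201C LEFT DOUBLE QUOTATION MARK, the single character typed into the form.
text = "\u201c"

# The browser sends it as three UTF-8 bytes.
encoded = text.encode("utf-8")
print(repr(encoded))  # b'\xe2\x80\x9c'

# Decoding those bytes as ASCII fails on the very first byte (0xe2).
try:
    encoded.decode("ascii")
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

print(error_message)
```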

I'm running against the current Svn HEAD (r2470). Christian Wyglendowski confirmed on the mailing list that the code works ok in 3.1.2, producing the expected:

{'text': '\xe2\x80\x9c', 'submit': 'Create'}

Code:

import cherrypy

FORM = """
<form method="POST" enctype="multipart/form-data" accept-charset="utf-8">
  <textarea name="text" value=""></textarea>
  <p>
    <input type="submit" id="create" name="submit" value="Create" />
    <input type="submit" id="cancel" name="submit" value="Cancel" />
  </p>
</form>
"""

class HelloWorld:
    def index(self, **kwargs):
        # Show the submitted fields if there are any; otherwise show the form.
        if kwargs:
            return repr(kwargs)
        else:
            return FORM
    index.exposed = True

# Encode responses as UTF-8.
cherrypy.config.update(
    {
        "global": {
            "tools.encode.on": True,
            "tools.encode.encoding": "utf-8",
        },
    }
)
cherrypy.quickstart(HelloWorld())

TRACEBACK:

Traceback (most recent call last):
  File "c:\work_in_progress\cherrypy\cherrypy\_cprequest.py", line 646, in respond
    self.body.process()
  File "c:\work_in_progress\cherrypy\cherrypy\_cpreqbody.py", line 595, in process
    super(RequestBody, self).process()
  File "c:\work_in_progress\cherrypy\cherrypy\_cpreqbody.py", line 281, in process
    proc(self)
  File "c:\work_in_progress\cherrypy\cherrypy\_cpreqbody.py", line 82, in process_multipart_form_data
    process_multipart(entity)
  File "c:\work_in_progress\cherrypy\cherrypy\_cpreqbody.py", line 76, in process_multipart
    part.process()
  File "c:\work_in_progress\cherrypy\cherrypy\_cpreqbody.py", line 279, in process
    self.default_proc()
  File "c:\work_in_progress\cherrypy\cherrypy\_cpreqbody.py", line 398, in default_proc
    result = self.read_lines_to_boundary()
  File "c:\work_in_progress\cherrypy\cherrypy\_cpreqbody.py", line 387, in read_lines_to_boundary
    result = result.decode(self.encoding)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 0: ordinal not in range(128)

HEADERS:

POST / HTTP/1.1
Host: localhost:8080
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.1.1) Gecko/20090715 Firefox/3.5.1 (.NET CLR 3.5.30729)
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-gb,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer: http://localhost:8080/
Content-Type: multipart/form-data; boundary=---------------------------16405529922056
Content-Length: 246
-----------------------------16405529922056
Content-Disposition: form-data; name="text"

“
-----------------------------16405529922056
Content-Disposition: form-data; name="submit"

Create
-----------------------------16405529922056--

HTTP/1.x 500 Internal Server Error
Date: Tue, 28 Jul 2009 09:33:26 GMT
Content-Length: 1886
Content-Type: text/html;charset=utf-8
Server: CherryPy/3.2.0
----------------------------------------------------------

Change History

08/05/09 00:10:48: Modified by fumanchu

  • priority changed from normal to high.
  • milestone set to 3.2.

08/08/09 11:27:47: Modified by fumanchu

  • status changed from new to assigned.

This worked in 3.1 because tools.decode used a universal "default_encoding" argument, set to 'utf-8'.

This does *not* work in 3.2 because we now actually follow the MIME spec, which says the default charset is US-ASCII.

The fix would involve allowing cherrypy.request.body.default_encoding to trickle down to multipart parts.
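A rough sketch of what "trickling down" could look like, assuming Python 3. This is not CherryPy's code: decode_part, inherited_default and MIME_DEFAULT are hypothetical names. A part that declares no charset falls back to the enclosing body's default, and only then to the MIME default of US-ASCII.

```python
from email.message import Message

# Default charset the MIME spec assigns to text parts that declare none.
MIME_DEFAULT = "us-ascii"


def decode_part(payload, part_content_type=None, inherited_default=None):
    """Decode one multipart part (hypothetical helper, not CherryPy API).

    Charset precedence: the part's own Content-Type charset, then the
    default inherited from the enclosing request body, then US-ASCII.
    """
    msg = Message()
    if part_content_type is not None:
        msg["Content-Type"] = part_content_type
    charset = msg.get_content_charset() or inherited_default or MIME_DEFAULT
    return payload.decode(charset)


payload = b"\xe2\x80\x9c"  # the bytes posted in this ticket

# No charset anywhere: falls back to US-ASCII and fails, as in 3.2.
try:
    decode_part(payload)
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# With the body-level default passed down, the part decodes.
inherited = decode_part(payload, inherited_default="utf-8")
print(strict_failed, inherited == "\u201c")
```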

08/09/09 20:35:48: Modified by fumanchu

Fixed in trunk in [2495]. Needs port to python3.

I replaced the 3 Entity attributes (force_encoding, encoding, default_encoding) with a single list: "attempt_charsets". Benefits:

  1. Multiple charsets can be attempted.
  2. The encoding declared in the Content-Type request header (if any) can be both preceded by app-specified charsets, and also followed by them or by framework defaults.
  3. The 'decode' Tool from 3.1 has been reinstated with a backward-compatible API. It and any user tools which wish to modify request entity parsing or decoding can run at the 'before_request_body' hook.
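The idea behind a list of charsets to attempt can be sketched roughly as follows, assuming Python 3. The helper names, their signatures, and the example charsets are illustrative assumptions, not the code committed in [2495].

```python
def build_attempt_charsets(app_before=(), declared=None, fallbacks=()):
    """Order candidates: app-specified charsets, then the declared one
    (if any), then fallbacks. Duplicates are dropped, order is kept."""
    ordered = []
    candidates = list(app_before)
    if declared is not None:
        candidates.append(declared)
    candidates.extend(fallbacks)
    for charset in candidates:
        if charset.lower() not in [c.lower() for c in ordered]:
            ordered.append(charset)
    return ordered


def decode_first_match(payload, attempt_charsets):
    """Return (text, charset) for the first charset that decodes cleanly."""
    for charset in attempt_charsets:
        try:
            return payload.decode(charset), charset
        except UnicodeDecodeError:
            continue
    raise ValueError("none of %r could decode the payload" % (attempt_charsets,))


# Example values only: try US-ASCII first, then fall back to UTF-8.
charsets = build_attempt_charsets(declared=None, fallbacks=["us-ascii", "utf-8"])
text, used = decode_first_match(b"\xe2\x80\x9c", charsets)
print(used, text == "\u201c")
```

Order matters in such a list: a single-byte charset such as ISO-8859-1 accepts any byte sequence, so anything placed after it would never be tried.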

08/10/09 19:03:38: Modified by fumanchu

  • status changed from assigned to closed.
  • resolution set to fixed.

python3 upgraded in [2496].
