JimDabell 6 days ago

That’s not quite the whole story. Appendix C of the XHTML 1.0 specification provides HTML compatibility guidelines:

> This appendix summarizes design guidelines for authors who wish their XHTML documents to render on existing HTML user agents.

https://www.w3.org/TR/xhtml1/#guidelines

And RFC 2854, which defines the text/html media type, explicitly states this is permissible to label as text/html:

> The text/html media type is now defined by W3C Recommendations; the latest published version is [HTML401]. In addition, [XHTML1] defines a profile of use of XHTML which is compatible with HTML 4.01 and which may also be labeled as text/html.

https://datatracker.ietf.org/doc/html/rfc2854#section-2

However even browsers that support XHTML rendering use their HTML parser for XHTML 1.0 documents served as text/html, even though they should really be parsing them as XHTML 1.0.

But yes, that extra slash means something entirely different to the SGML formulation of HTML (HTML 2.0 to HTML 4.01). HTML5 ditched SGML though, so SHORTTAG NET is no longer a thing.

1
currysausage 6 days ago

I believe the sentence from the RFC:

[XHTML1] defines a profile of use of XHTML which is compatible with HTML 4.01

is technically incorrect. While the XHTML 1 compatibility profile was compatible with HTML 4 as implemented by major browsers, that wasn't actually HTML 4. HTML 4 is based on SGML, while what was implemented was a combination of HTML 4 semantics with the tagsoup parsing rules that browsers organically developed. These rules were only later formalized as part of HTML 5.

The compatibility guidelines do recommend a space between <br and />, but (at least according to https://validator.w3.org/ in HTML 4 mode) this doesn't change anything about <br /> being a NET-enabling start-tag <br /, followed by a greather-than sign.

Enter this:

  <h1>Hello<br />world</h1>
and select "Validate HTML fragment", "HTML 4.01", and "Show Outline". This is the result:

  [H1] Hello>world
(Obviously nitpicking, but that's my point: the nitpickers can be out-nitpicked.)

JimDabell 6 days ago

Haha yes. Appendix C gave compatibility guidelines, but you are right that doesn’t actually result in documents that could be parsed by a parser that implemented SHORTTAG NET.

Elsewhere in the thread, I posted an example of SHORTTAG NET being removed from a browser to enable parsing of XHTML documents:

https://github.com/emacsmirror/w3/commit/68af7c107dcbe194e30...

Nevertheless, the text/html RFC explicitly condones Appendix C, so despite it not being fully reflective of reality, it’s still permissible to use text/html to label XHTML 1.0 documents that follow Appendix C :D