Add a simple markdown parser for formatting rustc --explain

Currently, the output of `rustc --explain foo` displays the raw markdown in a pager. This is acceptable, but using actual formatting makes it easier to understand. This patch consists of three major components: 1. A markdown parser. This is an extremely simple non-backtracking recursive implementation that requires normalization of the final token stream 2. A utility to write the token stream to an output buffer 3. Configuration within rustc_driver_impl to invoke this combination for `--explain`. Like the current implementation, it first attempts to print to a pager with a fallback colorized terminal, and standard print as a last resort. If color is disabled, or if the output does not support it, or if printing with color fails, it will write the raw markdown (which matches current behavior). Pagers known to support color are: `less` (with `-r`), `bat` (aka `catbat`), and `delta`. The markdown parser does not support the entire markdown specification, but should support the following with reasonable accuracy: - Headings, including formatting - Comments - Code, inline and fenced block (no indented block) - Strong, emphasis, and strikethrough formatted text - Links, anchor, inline, and reference-style - Horizontal rules - Unordered and ordered list items, including formatting This parser and writer should be reusable by other systems if ever needed.
2022-12-19 12:09:40 -06:00
parent 8aed93d912
commit 6a1c10bd85
15 changed files with 1408 additions and 19 deletions
--- a/compiler/rustc_errors/src/markdown/tests/input.md
+++ b/compiler/rustc_errors/src/markdown/tests/input.md
@@ -0,0 +1,50 @@
+# H1 Heading [with a link][remote-link]
+
+H1 content: **some words in bold** and `so does inline code`
+
+## H2 Heading
+
+H2 content: _some words in italic_
+
+### H3 Heading
+
+H3 content: ~~strikethrough~~ text
+
+#### H4 Heading
+
+H4 content: A [simple link](https://docs.rs) and a [remote-link].
+
+---
+
+A section break was above. We can also do paragraph breaks:
+
+(new paragraph) and unordered lists:
+
+- Item 1 in `code`
+- Item 2 in _italics_
+
+Or ordered:
+
+1. Item 1 in **bold**
+2. Item 2 with some long lines that should wrap: Lorem ipsum dolor sit amet,
+   consectetur adipiscing elit. Aenean ac mattis nunc. Phasellus elit quam,
+   pulvinar ac risus in, dictum vehicula turpis. Vestibulum neque est, accumsan
+   in cursus sit amet, dictum a nunc. Suspendisse aliquet, lorem eu eleifend
+   accumsan, magna neque sodales nisi, a aliquet lectus leo eu sem.
+
+---
+
+## Code
+
+Both `inline code` and code blocks are supported:
+
+```rust
+/// A rust enum
+#[derive(Debug, PartialEq, Clone)]
+enum Foo {
+    /// Start of line
+    Bar
+}
+```
+
+[remote-link]: http://docs.rs
--- a/compiler/rustc_errors/src/markdown/tests/output.stdout
+++ b/compiler/rustc_errors/src/markdown/tests/output.stdout
@@ -0,0 +1,35 @@
+[0m[0m[1m[4m[38;5;14mH1 Heading [0m[0m[1m[4m[38;5;14m]8;;http://docs.rs\with a link]8;;\[0m[0m[1m[4m[38;5;14m[0m
+[0mH1 content: [0m[0m[1msome words in bold[0m and [0m[0m[2mso does inline code[0m
+
+[0m[0m[4m[38;5;14mH2 Heading[0m[0m[4m[38;5;14m[0m
+[0mH2 content: [0m[0m[3msome words in italic[0m
+
+[0m[0m[3m[38;5;14mH3 Heading[0m[0m[3m[38;5;14m[0m
+[0mH3 content: [0m[0m[9mstrikethrough[0m text[0m
+
+[0m[0m[3m[4m[36mH4 Heading[0m[0m[3m[4m[36m[0m
+[0mH4 content: A [0m]8;;https://docs.rs\simple link]8;;\[0m and a [0m]8;;http://docs.rs\remote-link]8;;\[0m.[0m
+[0m--------------------------------------------------------------------------------------------------------------------------------------------[0m
+[0mA section break was above. We can also do paragraph breaks:[0m
+
+[0m(new paragraph) and unordered lists:[0m
+
+[0m*   [0mItem 1 in [0m[0m[2mcode[0m[0m[0m
+[0m*   [0mItem 2 in [0m[0m[3mitalics[0m[0m[0m
+
+[0mOr ordered:[0m
+
+[0m1.  [0mItem 1 in [0m[0m[1mbold[0m[0m[0m
+[0m2.  [0mItem 2 with some long lines that should wrap: Lorem ipsum dolor sit amet,[0m consectetur adipiscing elit. Aenean ac mattis nunc. Phasellus
+    elit quam,[0m pulvinar ac risus in, dictum vehicula turpis. Vestibulum neque est, accumsan[0m in cursus sit amet, dictum a nunc. Suspendisse
+    aliquet, lorem eu eleifend[0m accumsan, magna neque sodales nisi, a aliquet lectus leo eu sem.[0m[0m[0m
+[0m--------------------------------------------------------------------------------------------------------------------------------------------[0m
+[0m[0m[4m[38;5;14mCode[0m[0m[4m[38;5;14m[0m
+[0mBoth [0m[0m[2minline code[0m and code blocks are supported:[0m
+
+[0m[0m[2m/// A rust enum
+#[derive(Debug, PartialEq, Clone)]
+enum Foo {
+    /// Start of line
+    Bar
+}[0m[0m
--- a/compiler/rustc_errors/src/markdown/tests/parse.rs
+++ b/compiler/rustc_errors/src/markdown/tests/parse.rs
@@ -0,0 +1,312 @@
+use super::*;
+use ParseOpt as PO;
+
+#[test]
+fn test_parse_simple() {
+    let buf = "**abcd** rest";
+    let (t, r) = parse_simple_pat(buf.as_bytes(), STG, STG, PO::None, MdTree::Strong).unwrap();
+    assert_eq!(t, MdTree::Strong("abcd"));
+    assert_eq!(r, b" rest");
+
+    // Escaping should fail
+    let buf = r"**abcd\** rest";
+    let res = parse_simple_pat(buf.as_bytes(), STG, STG, PO::None, MdTree::Strong);
+    assert!(res.is_none());
+}
+
+#[test]
+fn test_parse_comment() {
+    let opt = PO::TrimNoEsc;
+    let buf = "<!-- foobar! -->rest";
+    let (t, r) = parse_simple_pat(buf.as_bytes(), CMT_S, CMT_E, opt, MdTree::Comment).unwrap();
+    assert_eq!(t, MdTree::Comment("foobar!"));
+    assert_eq!(r, b"rest");
+
+    let buf = r"<!-- foobar! \-->rest";
+    let (t, r) = parse_simple_pat(buf.as_bytes(), CMT_S, CMT_E, opt, MdTree::Comment).unwrap();
+    assert_eq!(t, MdTree::Comment(r"foobar! \"));
+    assert_eq!(r, b"rest");
+}
+
+#[test]
+fn test_parse_heading() {
+    let buf1 = "# Top level\nrest";
+    let (t, r) = parse_heading(buf1.as_bytes()).unwrap();
+    assert_eq!(t, MdTree::Heading(1, vec![MdTree::PlainText("Top level")].into()));
+    assert_eq!(r, b"\nrest");
+
+    let buf1 = "# Empty";
+    let (t, r) = parse_heading(buf1.as_bytes()).unwrap();
+    assert_eq!(t, MdTree::Heading(1, vec![MdTree::PlainText("Empty")].into()));
+    assert_eq!(r, b"");
+
+    // Combo
+    let buf2 = "### Top `level` _woo_\nrest";
+    let (t, r) = parse_heading(buf2.as_bytes()).unwrap();
+    assert_eq!(
+        t,
+        MdTree::Heading(
+            3,
+            vec![
+                MdTree::PlainText("Top "),
+                MdTree::CodeInline("level"),
+                MdTree::PlainText(" "),
+                MdTree::Emphasis("woo"),
+            ]
+            .into()
+        )
+    );
+    assert_eq!(r, b"\nrest");
+}
+
+#[test]
+fn test_parse_code_inline() {
+    let buf1 = "`abcd` rest";
+    let (t, r) = parse_codeinline(buf1.as_bytes()).unwrap();
+    assert_eq!(t, MdTree::CodeInline("abcd"));
+    assert_eq!(r, b" rest");
+
+    // extra backticks, newline
+    let buf2 = "```ab\ncd``` rest";
+    let (t, r) = parse_codeinline(buf2.as_bytes()).unwrap();
+    assert_eq!(t, MdTree::CodeInline("ab\ncd"));
+    assert_eq!(r, b" rest");
+
+    // test no escaping
+    let buf3 = r"`abcd\` rest";
+    let (t, r) = parse_codeinline(buf3.as_bytes()).unwrap();
+    assert_eq!(t, MdTree::CodeInline(r"abcd\"));
+    assert_eq!(r, b" rest");
+}
+
+#[test]
+fn test_parse_code_block() {
+    let buf1 = "```rust\ncode\ncode\n```\nleftovers";
+    let (t, r) = parse_codeblock(buf1.as_bytes());
+    assert_eq!(t, MdTree::CodeBlock { txt: "code\ncode", lang: Some("rust") });
+    assert_eq!(r, b"\nleftovers");
+
+    let buf2 = "`````\ncode\ncode````\n`````\nleftovers";
+    let (t, r) = parse_codeblock(buf2.as_bytes());
+    assert_eq!(t, MdTree::CodeBlock { txt: "code\ncode````", lang: None });
+    assert_eq!(r, b"\nleftovers");
+}
+
+#[test]
+fn test_parse_link() {
+    let simple = "[see here](docs.rs) other";
+    let (t, r) = parse_any_link(simple.as_bytes(), false).unwrap();
+    assert_eq!(t, MdTree::Link { disp: "see here", link: "docs.rs" });
+    assert_eq!(r, b" other");
+
+    let simple_toplevel = "[see here](docs.rs) other";
+    let (t, r) = parse_any_link(simple_toplevel.as_bytes(), true).unwrap();
+    assert_eq!(t, MdTree::Link { disp: "see here", link: "docs.rs" });
+    assert_eq!(r, b" other");
+
+    let reference = "[see here] other";
+    let (t, r) = parse_any_link(reference.as_bytes(), true).unwrap();
+    assert_eq!(t, MdTree::RefLink { disp: "see here", id: None });
+    assert_eq!(r, b" other");
+
+    let reference_full = "[see here][docs-rs] other";
+    let (t, r) = parse_any_link(reference_full.as_bytes(), false).unwrap();
+    assert_eq!(t, MdTree::RefLink { disp: "see here", id: Some("docs-rs") });
+    assert_eq!(r, b" other");
+
+    let reference_def = "[see here]: docs.rs\nother";
+    let (t, r) = parse_any_link(reference_def.as_bytes(), true).unwrap();
+    assert_eq!(t, MdTree::LinkDef { id: "see here", link: "docs.rs" });
+    assert_eq!(r, b"\nother");
+}
+
+const IND1: &str = r"test standard
+    ind
+    ind2
+not ind";
+const IND2: &str = r"test end of stream
+  1
+  2
+";
+const IND3: &str = r"test empty lines
+  1
+  2
+
+not ind";
+
+#[test]
+fn test_indented_section() {
+    let (t, r) = get_indented_section(IND1.as_bytes());
+    assert_eq!(str::from_utf8(t).unwrap(), "test standard\n    ind\n    ind2");
+    assert_eq!(str::from_utf8(r).unwrap(), "\nnot ind");
+
+    let (txt, rest) = get_indented_section(IND2.as_bytes());
+    assert_eq!(str::from_utf8(txt).unwrap(), "test end of stream\n  1\n  2");
+    assert_eq!(str::from_utf8(rest).unwrap(), "\n");
+
+    let (txt, rest) = get_indented_section(IND3.as_bytes());
+    assert_eq!(str::from_utf8(txt).unwrap(), "test empty lines\n  1\n  2");
+    assert_eq!(str::from_utf8(rest).unwrap(), "\n\nnot ind");
+}
+
+const HBT: &str = r"# Heading
+
+content";
+
+#[test]
+fn test_heading_breaks() {
+    let expected = vec![
+        MdTree::Heading(1, vec![MdTree::PlainText("Heading")].into()),
+        MdTree::PlainText("content"),
+    ]
+    .into();
+    let res = entrypoint(HBT);
+    assert_eq!(res, expected);
+}
+
+const NL1: &str = r"start
+
+end";
+const NL2: &str = r"start
+
+
+end";
+const NL3: &str = r"start
+
+
+
+end";
+
+#[test]
+fn test_newline_breaks() {
+    let expected =
+        vec![MdTree::PlainText("start"), MdTree::ParagraphBreak, MdTree::PlainText("end")].into();
+    for (idx, check) in [NL1, NL2, NL3].iter().enumerate() {
+        let res = entrypoint(check);
+        assert_eq!(res, expected, "failed {idx}");
+    }
+}
+
+const WRAP: &str = "plain _italics
+italics_";
+
+#[test]
+fn test_wrap_pattern() {
+    let expected = vec![
+        MdTree::PlainText("plain "),
+        MdTree::Emphasis("italics"),
+        MdTree::Emphasis(" "),
+        MdTree::Emphasis("italics"),
+    ]
+    .into();
+    let res = entrypoint(WRAP);
+    assert_eq!(res, expected);
+}
+
+const WRAP_NOTXT: &str = r"_italics_
+**bold**";
+
+#[test]
+fn test_wrap_notxt() {
+    let expected =
+        vec![MdTree::Emphasis("italics"), MdTree::PlainText(" "), MdTree::Strong("bold")].into();
+    let res = entrypoint(WRAP_NOTXT);
+    assert_eq!(res, expected);
+}
+
+const MIXED_LIST: &str = r"start
+- _italics item_
+<!-- comment -->
+- **bold item**
+  second line [link1](foobar1)
+  third line [link2][link-foo]
+-   :crab:
+    extra indent
+end
+[link-foo]: foobar2
+";
+
+#[test]
+fn test_list() {
+    let expected = vec![
+        MdTree::PlainText("start"),
+        MdTree::ParagraphBreak,
+        MdTree::UnorderedListItem(vec![MdTree::Emphasis("italics item")].into()),
+        MdTree::LineBreak,
+        MdTree::UnorderedListItem(
+            vec![
+                MdTree::Strong("bold item"),
+                MdTree::PlainText(" second line "),
+                MdTree::Link { disp: "link1", link: "foobar1" },
+                MdTree::PlainText(" third line "),
+                MdTree::Link { disp: "link2", link: "foobar2" },
+            ]
+            .into(),
+        ),
+        MdTree::LineBreak,
+        MdTree::UnorderedListItem(
+            vec![MdTree::PlainText("🦀"), MdTree::PlainText(" extra indent")].into(),
+        ),
+        MdTree::ParagraphBreak,
+        MdTree::PlainText("end"),
+    ]
+    .into();
+    let res = entrypoint(MIXED_LIST);
+    assert_eq!(res, expected);
+}
+
+const SMOOSHED: &str = r#"
+start
+### heading
+1. ordered item
+```rust
+println!("Hello, world!");
+```
+`inline`
+``end``
+"#;
+
+#[test]
+fn test_without_breaks() {
+    let expected = vec![
+        MdTree::PlainText("start"),
+        MdTree::ParagraphBreak,
+        MdTree::Heading(3, vec![MdTree::PlainText("heading")].into()),
+        MdTree::OrderedListItem(1, vec![MdTree::PlainText("ordered item")].into()),
+        MdTree::ParagraphBreak,
+        MdTree::CodeBlock { txt: r#"println!("Hello, world!");"#, lang: Some("rust") },
+        MdTree::ParagraphBreak,
+        MdTree::CodeInline("inline"),
+        MdTree::PlainText(" "),
+        MdTree::CodeInline("end"),
+    ]
+    .into();
+    let res = entrypoint(SMOOSHED);
+    assert_eq!(res, expected);
+}
+
+const CODE_STARTLINE: &str = r#"
+start
+`code`
+middle
+`more code`
+end
+"#;
+
+#[test]
+fn test_code_at_start() {
+    let expected = vec![
+        MdTree::PlainText("start"),
+        MdTree::PlainText(" "),
+        MdTree::CodeInline("code"),
+        MdTree::PlainText(" "),
+        MdTree::PlainText("middle"),
+        MdTree::PlainText(" "),
+        MdTree::CodeInline("more code"),
+        MdTree::PlainText(" "),
+        MdTree::PlainText("end"),
+    ]
+    .into();
+    let res = entrypoint(CODE_STARTLINE);
+    assert_eq!(res, expected);
+}
--- a/compiler/rustc_errors/src/markdown/tests/term.rs
+++ b/compiler/rustc_errors/src/markdown/tests/term.rs
@@ -0,0 +1,90 @@
+use std::io::BufWriter;
+use std::path::PathBuf;
+use termcolor::{BufferWriter, ColorChoice};
+
+use super::*;
+use crate::markdown::MdStream;
+
+const INPUT: &str = include_str!("input.md");
+const OUTPUT_PATH: &[&str] = &[env!("CARGO_MANIFEST_DIR"), "src","markdown","tests","output.stdout"];
+
+const TEST_WIDTH: usize = 80;
+
+// We try to make some words long to create corner cases
+const TXT: &str = r"Lorem ipsum dolor sit amet, consecteturadipiscingelit.
+Fusce-id-urna-sollicitudin, pharetra nisl nec, lobortis tellus. In at
+metus hendrerit, tincidunteratvel, ultrices turpis. Curabitur_risus_sapien,
+porta-sed-nunc-sed, ultricesposuerelacus. Sed porttitor quis
+dolor non venenatis. Aliquam ut. ";
+
+const WRAPPED: &str = r"Lorem ipsum dolor sit amet, consecteturadipiscingelit. Fusce-id-urna-
+sollicitudin, pharetra nisl nec, lobortis tellus. In at metus hendrerit,
+tincidunteratvel, ultrices turpis. Curabitur_risus_sapien, porta-sed-nunc-sed,
+ultricesposuerelacus. Sed porttitor quis dolor non venenatis. Aliquam ut. Lorem
+    ipsum dolor sit amet, consecteturadipiscingelit. Fusce-id-urna-
+    sollicitudin, pharetra nisl nec, lobortis tellus. In at metus hendrerit,
+    tincidunteratvel, ultrices turpis. Curabitur_risus_sapien, porta-sed-nunc-
+    sed, ultricesposuerelacus. Sed porttitor quis dolor non venenatis. Aliquam
+    ut. Sample link lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet,
+consecteturadipiscingelit. Fusce-id-urna-sollicitudin, pharetra nisl nec,
+lobortis tellus. In at metus hendrerit, tincidunteratvel, ultrices turpis.
+Curabitur_risus_sapien, porta-sed-nunc-sed, ultricesposuerelacus. Sed porttitor
+quis dolor non venenatis. Aliquam ut. ";
+
+#[test]
+fn test_wrapping_write() {
+    WIDTH.with(|w| w.set(TEST_WIDTH));
+    let mut buf = BufWriter::new(Vec::new());
+    let txt = TXT.replace("-\n","-").replace("_\n","_").replace('\n', " ").replace("    ", "");
+    write_wrapping(&mut buf, &txt, 0, None).unwrap();
+    write_wrapping(&mut buf, &txt, 4, None).unwrap();
+    write_wrapping(
+        &mut buf,
+        "Sample link lorem ipsum dolor sit amet. ",
+        4,
+        Some("link-address-placeholder"),
+    )
+    .unwrap();
+    write_wrapping(&mut buf, &txt, 0, None).unwrap();
+    let out = String::from_utf8(buf.into_inner().unwrap()).unwrap();
+    let out = out
+        .replace("\x1b\\", "")
+        .replace('\x1b', "")
+        .replace("]8;;", "")
+        .replace("link-address-placeholder", "");
+
+    for line in out.lines() {
+        assert!(line.len() <= TEST_WIDTH, "line length\n'{line}'")
+    }
+
+    assert_eq!(out, WRAPPED);
+}
+
+#[test]
+fn test_output() {
+    // Capture `--bless` when run via ./x
+    let bless = std::env::var("RUSTC_BLESS").unwrap_or_default() == "1";
+    let ast = MdStream::parse_str(INPUT);
+    let bufwtr = BufferWriter::stderr(ColorChoice::Always);
+    let mut buffer = bufwtr.buffer();
+    ast.write_termcolor_buf(&mut buffer).unwrap();
+
+    let mut blessed = PathBuf::new();
+    blessed.extend(OUTPUT_PATH);
+
+    if bless {
+        std::fs::write(&blessed, buffer.into_inner()).unwrap();
+        eprintln!("blessed output at {}", blessed.display());
+    } else {
+        let output = buffer.into_inner();
+        if std::fs::read(blessed).unwrap() != output {
+            // hack: I don't know any way to write bytes to the captured stdout
+            // that cargo test uses
+            let mut out = std::io::stdout();
+            out.write_all(b"\n\nMarkdown output did not match. Expected:\n").unwrap();
+            out.write_all(&output).unwrap();
+            out.write_all(b"\n\n").unwrap();
+            panic!("markdown output mismatch");
+        }
+    }
+}