fix(telegram): render Markdown with Telegram formatting #1624

Closed
elibosley wants to merge 5 commits into caronc:master from elibosley:codex/fix-telegram-paragraph-spacing

@elibosley elibosley commented May 26, 2026

Summary

  • Preserves paragraph breaks when Markdown input is converted to Telegram HTML payloads.
  • Converts standard Markdown/HTML into Telegram Markdown v1 or MarkdownV2 for Telegram Markdown targets.
  • Adds Telegram-specific MarkdownV2 escaping for regular text, link destinations, and code/pre blocks.
  • Adds regression coverage for Markdown paragraphs, strikethrough, emphasis, HTML links, fenced code, inline code, HTML input, and renderer edge cases.

Root Cause

Telegram's Markdown modes are not generic Markdown. The previous path either passed standard Markdown through mostly unchanged for Telegram Markdown targets, or relied on generic HTML cleanup for Telegram HTML targets. That collapsed intentional paragraph breaks, lost Telegram-supported formatting in some HTML-to-Markdown cases, and escaped MarkdownV2 content without enough context for links and code blocks. Live Telegram client testing also showed that escaping literal [ in Markdown v1 text can display a visible backslash, so Markdown v1 text escaping is kept narrower than MarkdownV2.
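
The context-dependent MarkdownV2 escaping described above can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: the function name `escape_mdv2` and the `context` parameter are hypothetical, and the character sets follow the escaping rules in the Telegram Bot API documentation (the backslash is included in the text set as an assumption, so that literal backslashes survive).

```python
import re

# Characters that must be escaped in regular MarkdownV2 text.
# The backslash is added here as an assumption so literal backslashes survive.
_MDV2_TEXT = re.compile(r"([_*\[\]()~`>#+\-=|{}.!\\])")
# Inside code/pre entities only the backtick and backslash are escaped.
_MDV2_CODE = re.compile(r"([`\\])")
# Inside the (...) destination of an inline link only ')' and backslash are escaped.
_MDV2_LINK = re.compile(r"([)\\])")

_PATTERNS = {"text": _MDV2_TEXT, "code": _MDV2_CODE, "link": _MDV2_LINK}


def escape_mdv2(value, context="text"):
    """Prefix every character that is special in `context` with a backslash."""
    return _PATTERNS[context].sub(r"\\\1", value)


print(escape_mdv2("1+1=2!"))                     # regular text
print(escape_mdv2("x`y", "code"))                # code/pre content
print(escape_mdv2("http://e.com/a(b)", "link"))  # link destination
```

Escaping a link destination with the text rules would also escape every `.` and `-` in the URL, which is the kind of over-escaping the root cause refers to.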

Implementation Notes

  • Adds a small Telegram-specific HTMLParser renderer that emits Telegram Markdown v1/MarkdownV2.
  • Documents that the renderer is intentionally scoped to Telegram parse-mode rules and is not a replacement for a future global html_to_markdown() converter.
  • Uses Python Markdown with fenced-code support as the standard Markdown input path, with ~~strike~~ normalized for Telegram rendering.
  • Adds a per-plugin conversion hook so Telegram can receive original Markdown/HTML and perform target-specific conversion without changing generic conversion behavior for other plugins.
  • Keeps the paragraph-spacing fix from the original draft PR, now as part of the broader Telegram formatting backport from the Go port investigation in unraid/apprise-go#48 ("BUG: Formating Does Not Work for Telegram Service / Email Service").
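
To make the renderer idea concrete, here is a minimal sketch of an `HTMLParser` subclass that emits MarkdownV2. It is not the renderer from this PR: the class name `TelegramV2Sketch`, the helper `html_to_telegram_v2`, and the small tag table are hypothetical, and it handles only a handful of tags.

```python
import re
from html.parser import HTMLParser

# Escape sets per context (text, code, link destination), per Telegram's MarkdownV2 rules.
_TEXT = re.compile(r"([_*\[\]()~`>#+\-=|{}.!\\])")
_CODE = re.compile(r"([`\\])")
_LINK = re.compile(r"([)\\])")

# HTML tag -> MarkdownV2 delimiter (illustrative subset only).
_MARKS = {"b": "*", "strong": "*", "i": "_", "em": "_",
          "s": "~", "del": "~", "code": "`"}


class TelegramV2Sketch(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []          # rendered fragments
        self.in_code = False   # selects the escape set for text data
        self.hrefs = []        # pending link destinations

    def handle_starttag(self, tag, attrs):
        if tag in _MARKS:
            self.out.append(_MARKS[tag])
            self.in_code = self.in_code or tag == "code"
        elif tag == "a":
            self.hrefs.append(dict(attrs).get("href") or "")
            self.out.append("[")
        elif tag == "br":
            self.out.append("\n")   # hard line break: single newline

    def handle_endtag(self, tag):
        if tag in _MARKS:
            self.out.append(_MARKS[tag])
            if tag == "code":
                self.in_code = False
        elif tag == "a" and self.hrefs:
            self.out.append("](" + _LINK.sub(r"\\\1", self.hrefs.pop()) + ")")
        elif tag == "p":
            self.out.append("\n\n")  # paragraph closure: blank line

    def handle_data(self, data):
        pattern = _CODE if self.in_code else _TEXT
        self.out.append(pattern.sub(r"\\\1", data))


def html_to_telegram_v2(html):
    parser = TelegramV2Sketch()
    parser.feed(html)
    parser.close()
    return "".join(parser.out).strip()


print(html_to_telegram_v2(
    '<p>Hi <b>there</b>.</p>'
    '<p>See <a href="https://example.com/a(b)">docs</a> and <code>x_y</code></p>'
))
```

Unmatched end tags such as a stray `</a>` are ignored here because the `hrefs` stack is empty, which mirrors the kind of edge case the later coverage commit mentions.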

Validation

  • uv run --with ruff ruff check .
  • uv run --with ruff ruff format --check .
  • uv run --with pytest --with pytest-mock pytest tests/test_plugin_telegram.py tests/test_api.py -q
  • Focused diff coverage for changed production code: 100.0% line coverage, 100.0% branch coverage, 100.0% combined line+branch coverage.
  • Live Telegram Bot API/client smoke test sent HTML paragraph spacing, Markdown v1, MarkdownV2, HTML-to-MarkdownV2 link, and text-to-MarkdownV2 escaping messages. Follow-up live test verified the Markdown v1 bracket escape fix and a clean docs link.

Additional Check

  • uv run --with pytest --with pytest-mock pytest -q reported 1349 passed, 265 skipped, 14 warnings before I interrupted a hanging teardown/keyboard-interrupt cleanup path; the focused suites above completed normally.

Purpose of the change:
- Preserve paragraph spacing when Markdown input is converted to Telegram HTML payloads.

How behavior was before:
- Telegram HTML cleanup collapsed all repeated newlines to a single newline.
- Markdown paragraph boundaries were lost, so hard line breaks and paragraphs became indistinguishable.

Why that was a problem:
- Messages with intentional blank lines appeared compressed in Telegram.
- Metadata-style lines could be hard to scan because paragraph spacing from Markdown was not retained.

What the new change accomplishes:
- Keeps paragraph breaks for Markdown-origin Telegram HTML payloads.
- Preserves legacy single-newline cleanup for raw HTML and text-origin content.
- Adds regression coverage for Markdown paragraph spacing.

How it works:
- Makes Telegram HTML line-break cleanup format-sensitive.
- Uses a double CRLF for Markdown paragraph closures, keeps single CRLF for title wrappers and legacy HTML/text paths, and trims a trailing extra paragraph break.
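
A rough sketch of the format-sensitive cleanup described in this commit message, assuming a hypothetical helper `cleanup_paragraphs` with a `from_markdown` flag (neither name comes from the PR):

```python
import re


def cleanup_paragraphs(html, from_markdown):
    """Turn <p>/<br> markup into CRLF line breaks for a Telegram HTML payload."""
    # A paragraph closure becomes a blank line only for Markdown-origin content.
    closer = "\r\n\r\n" if from_markdown else "\r\n"
    text = re.sub(r"\s*<p>\s*", "", html)
    text = re.sub(r"\s*</p>\s*", closer, text)
    # Hard line breaks always stay a single CRLF.
    text = re.sub(r"\s*<br\s*/?>\s*", "\r\n", text)
    if not from_markdown:
        # Legacy path: collapse any run of newlines to a single CRLF.
        text = re.sub(r"(\r?\n)+", "\r\n", text)
    # Trim the trailing paragraph break left by the last </p>.
    return text.rstrip("\r\n")


sample = "<p>one</p>\n<p>two<br />three</p>"
print(repr(cleanup_paragraphs(sample, from_markdown=True)))
print(repr(cleanup_paragraphs(sample, from_markdown=False)))
```

With `from_markdown=True` the two paragraphs stay separated by a blank line, while the `<br />` inside the second paragraph remains a single line break.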
@elibosley elibosley force-pushed the codex/fix-telegram-paragraph-spacing branch from 94c8291 to 4fd9ff2 May 26, 2026 15:06
Purpose of the change:
- Backport the broader Telegram Markdown rendering fixes from apprise-go into the upstream Telegram plugin.
- Keep the existing paragraph-spacing PR as a single combined Telegram formatting change.

How behavior was before:
- Telegram Markdown targets received standard Markdown mostly unchanged.
- MarkdownV2 escaping was not context-aware for code blocks, links, and converted HTML.
- HTML input could be converted generically before Telegram had a chance to preserve Telegram-supported formatting.

Why that was a problem:
- Telegram's Markdown and MarkdownV2 parse modes are not compatible with generic Markdown syntax.
- Valid standard Markdown could render incorrectly or be rejected by Telegram for bad escaping.
- Paragraph, link, and code formatting could be lost or over-escaped.

What the new change accomplishes:
- Converts standard Markdown and HTML into Telegram Markdown v1 or MarkdownV2 for Telegram Markdown targets.
- Preserves paragraph breaks while keeping hard line breaks as single newlines.
- Applies MarkdownV2 escaping by text, link, and code/pre context.

How it works:
- Adds a Telegram-specific HTMLParser renderer for Telegram Markdown output.
- Uses Markdown rendering with fenced-code support as the standard Markdown input path.
- Adds a per-plugin conversion hook so Telegram can bypass generic Markdown conversion while other plugins keep existing behavior.
- Expands Telegram tests for standard Markdown, MarkdownV2, links, code, HTML input, and paragraph spacing.
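
The per-plugin conversion hook could look something like the sketch below. Every name here (`handles_own_conversion`, `notify`, `generic_convert`, the plugin classes) is hypothetical and chosen for illustration; the tag-stripping function is only a stand-in for a shared converter, not Apprise's actual conversion code.

```python
import re


class Plugin:
    # Hypothetical opt-in flag: True means "pass me the original body untouched".
    handles_own_conversion = False

    def __init__(self):
        self.received = None

    def send(self, body, body_format):
        # Record what the plugin was handed so the dispatch can be inspected.
        self.received = (body, body_format)


class GenericPlugin(Plugin):
    pass


class TelegramLike(Plugin):
    handles_own_conversion = True


def generic_convert(body):
    # Stand-in for a shared HTML-to-text conversion: simply drop tags.
    return re.sub(r"<[^>]+>", "", body)


def notify(plugin, body, body_format):
    if plugin.handles_own_conversion:
        # The plugin applies its own target-specific rendering inside send().
        plugin.send(body, body_format)
    else:
        # Every other plugin keeps the existing shared conversion path.
        plugin.send(generic_convert(body), "text")


telegram, other = TelegramLike(), GenericPlugin()
notify(telegram, "<b>hi</b>", "html")
notify(other, "<b>hi</b>", "html")
print(telegram.received, other.received)
```

Keeping the flag on the plugin class means the dispatcher needs no Telegram-specific branch, and plugins that do not opt in see no behavior change.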
@elibosley elibosley changed the title from "fix(telegram): preserve markdown paragraph spacing" to "fix(telegram): render Markdown with Telegram formatting" May 26, 2026
elibosley added 2 commits May 26, 2026 11:35
Purpose of the change:
- Raise coverage for the new Telegram Markdown renderer to 100 percent across changed executable lines and branches.

How behavior was before:
- Main Telegram Markdown behavior was covered, but defensive helper no-ops and malformed/self-closing HTML parser branches were not executed.

Why that was a problem:
- The new renderer carried uncovered edge paths around empty builders, single trailing newlines, <br>, <li>, self-closing tags, and unmatched end tags.

What the new change accomplishes:
- Adds focused tests for the remaining Telegram renderer edge cases.
- Confirms the new code reaches 100 percent diff line and branch coverage under the focused coverage run.

How it works:
- Exercises the private Telegram renderer helpers and parser entry points directly from the Telegram plugin tests.
- Keeps the patch test-only and leaves production behavior unchanged.
Purpose of the change:
- Fix a live Telegram client rendering issue found during end-to-end testing.

How behavior was before:
- Telegram Markdown v1 escaped literal '[' characters in rendered text.
- Telegram accepted the payload but displayed the backslash in the client.

Why that was a problem:
- Markdown v1 notification labels such as [2/5 Markdown v1 conversion] showed a visible leading backslash.
- The API-level test passed, but the client rendering was visibly wrong.

What the new change accomplishes:
- Stops escaping literal '[' characters in regular Telegram Markdown v1 text.
- Preserves existing MarkdownV2 escaping and all other Markdown v1 escaping.

How it works:
- Narrows the Markdown v1 text escape regex to backslash, asterisk, underscore, and backtick.
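
The narrowed Markdown v1 escape might look like this. The pattern is reconstructed from the description above (backslash, asterisk, underscore, backtick); the names `_MD_V1_TEXT` and `escape_md_v1` are assumptions, not identifiers from the PR.

```python
import re

# Assumed pattern: escape only backslash, asterisk, underscore, and backtick.
# '[' is deliberately left alone so labels such as "[2/5 ...]" render cleanly.
_MD_V1_TEXT = re.compile(r"([\\*_`])")


def escape_md_v1(text):
    """Backslash-escape the characters that are special in Markdown v1 text."""
    return _MD_V1_TEXT.sub(r"\\\1", text)


print(escape_md_v1("[2/5 Markdown v1 conversion]"))
print(escape_md_v1("snake_case *x*"))
```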
@caronc

caronc commented May 26, 2026

Owner

I feel some of these changes you've put forward really mimic a more general solution brought forth here in an incomplete PR: #932.

The idea of the PR was to add support for HTML -> Markdown and leverage the translation tools already built (for all services)... But it was never completed by the individual who started it. Right now, when the (Apprise) input is HTML and the output is markdown, the html_to_text() (in conversion.py) translator is still used; see here vs an ideal (and unwritten) html_to_markdown().

Your PR seems to convert html_to__very_restrictive_and_custom_telegram_markdown(). Not sure if this could be split apart? I could see a global conversion that does HTML to markdown as best as it can (aligning with all rules), and then Telegram handling its unique edge cases if self.notify_format == NotifyFormat.MARKDOWN ?

I appreciate you trying to forward port a change you made on our Apprise-Go adaptation; I just need time to digest what is being submitted.

@codecov

codecov Bot commented May 26, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (b463f23) to head (3fd943f).
⚠️ Report is 9 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##            master     #1624    +/-   ##
==========================================
  Coverage   100.00%   100.00%            
==========================================
  Files          210       212     +2     
  Lines        29834     30150   +316     
  Branches      4973      5029    +56     
==========================================
+ Hits         29834     30150   +316     
| Files with missing lines | Coverage Δ |
|---|---|
| apprise/apprise.py | 100.00% <100.00%> (ø) |
| apprise/plugins/telegram.py | 100.00% <100.00%> (ø) |

... and 5 files with indirect coverage changes

- Purpose: address PR feedback asking whether the Telegram formatter should be split from a future general HTML-to-Markdown converter.
- Before: the Telegram HTML renderer and conversion hook worked, but their scope was implicit and looked like a plugin-local replacement for a broader converter.
- Problem: reviewers had to infer why Telegram bypasses the global conversion path for Markdown targets.
- Change: document that the HTML renderer is intentionally Telegram-specific and that the conversion hook keeps original content so send() can apply Telegram parse-mode escaping.
- Validation: run ruff check, ruff format --check, and focused Telegram/API pytest coverage.
@caronc

caronc commented Jun 16, 2026

Owner

I feel #1638 correctly handles this PR, as this one is too focused on 1 service and not all of them. Thanks for the PR regardless! 🙏

@caronc caronc closed this Jun 16, 2026
@elibosley

elibosley commented Jun 16, 2026 via email

Author
