Mateen Kiani
Published on Tue Aug 05 2025·3 min read
HTML is the backbone of web content, but Markdown offers simplicity and readability. Developers often need to extract HTML from templates, emails, or web scrapers and turn it into Markdown for docs or static site generators. Yet, converting every tag, link, or image can be a headache without the right approach.
"A small library can save huge time when dealing with conversions."
This guide walks through popular libraries and custom methods to transform HTML into clean Markdown. Ready to see how to turn a chunk of HTML into Markdown in just a few lines of Python?
When starting out, pick a library that fits your needs. This guide focuses on markdownify; html2text is another popular choice. Each option comes with pros and cons in terms of flexibility, dependencies, and output style.
First, install the library:
```shell
pip install markdownify
```
Then run a basic conversion:
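```python
from markdownify import markdownify as md

html = '''<h1>Welcome</h1><p>This is a <strong>test</strong> of HTML to markdown.</p>'''

# heading_style='ATX' produces '#' headings; without it, markdownify
# uses the underlined (setext) style for <h1> and <h2>
markdown = md(html, heading_style='ATX')
print(markdown)
```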
The output looks like:
```markdown
# Welcome

This is a **test** of HTML to markdown.
```
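By default, markdownify renders `<h1>` and `<h2>` in the underlined (setext) style, so pass `heading_style='ATX'` when you want `#` headings. You can tweak how other tags map to Markdown with further options or custom rules.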
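To make the difference between the two heading styles concrete, here is a small sketch using a hypothetical helper, `render_heading` (it is not part of markdownify). In Markdown, underlined (setext) syntax exists only for the first two heading levels, so deeper headings fall back to `#` markers.

```python
def render_heading(text: str, level: int, style: str = "atx") -> str:
    """Render one heading in ATX ('#') or underlined (setext) style.

    Hypothetical helper for illustration only; not a markdownify API.
    """
    level = max(1, min(6, level))  # Markdown defines h1..h6 only
    if style == "underlined" and level <= 2:
        # Setext headings exist only for h1 ('=') and h2 ('-')
        underline = ("=" if level == 1 else "-") * len(text)
        return f"{text}\n{underline}"
    # ATX works at every level; underlined h3..h6 fall back to it
    return f"{'#' * level} {text}"


print(render_heading("Welcome", 1))
print(render_heading("Welcome", 1, "underlined"))
```

The first call prints `# Welcome`; the second prints `Welcome` followed by a line of `=` characters.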
Most converters handle links and images by default, but you can adjust them:
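```python
from markdownify import markdownify as md

html = '<ul><li><a href="https://example.com">Docs</a></li></ul><img src="logo.png" alt="Logo">'

# markdownify takes its options as keyword arguments
opts = {
    'heading_style': 'ATX',  # '#'-style headings
    'bullets': '-',          # use '-' as the list marker at every level
    'strip': ['img'],        # strip <img> tags (only their text content is kept)
}
markdown = md(html, **opts)
print(markdown)
```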
Tip: To customize image syntax, catch `<img>` tags with BeautifulSoup and build the Markdown strings manually.
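The tip above uses BeautifulSoup; if you would rather avoid the dependency, the standard library's `html.parser` can catch `<img>` tags just as well. This is a minimal sketch under that substitution; the `ImageCollector` class and `images_to_markdown` function are hypothetical names.

```python
from html.parser import HTMLParser


class ImageCollector(HTMLParser):
    """Collect <img> tags and turn each one into Markdown image syntax."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attributes = dict(attrs)
            # Missing attributes (or valueless ones, reported as None) become ""
            alt = attributes.get("alt") or ""
            src = attributes.get("src") or ""
            self.images.append(f"![{alt}]({src})")


def images_to_markdown(html):
    parser = ImageCollector()
    parser.feed(html)
    parser.close()
    return parser.images


print(images_to_markdown('<p><img src="logo.png" alt="Logo"><img src="x.gif"></p>'))
```

Self-closing tags such as `<img ... />` also work, because `HTMLParser` routes them through `handle_starttag` by default.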
When libraries fall short, parse HTML yourself:
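```python
from bs4 import BeautifulSoup


def html_to_md(html):
    """Convert a small subset of HTML (h1, p, a, img) to Markdown."""
    soup = BeautifulSoup(html, 'html.parser')
    md_lines = []
    # find_all visits matching tags in document order. Note: an <a> or <img>
    # nested inside a <p> is emitted separately as well as in the paragraph text.
    for el in soup.find_all(['h1', 'p', 'a', 'img']):
        if el.name == 'h1':
            md_lines.append(f'# {el.get_text(strip=True)}')
        elif el.name == 'p':
            md_lines.append(el.get_text())
        elif el.name == 'a':
            href = el.get('href', '')
            md_lines.append(f'[{el.get_text()}]({href})')
        elif el.name == 'img':
            alt = el.get('alt', '')
            src = el.get('src', '')
            md_lines.append(f'![{alt}]({src})')
    # A blank line between blocks keeps them as separate Markdown paragraphs
    return '\n\n'.join(md_lines)
```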
Here, you can either append to a string or build a list of lines and join them at the end, as this function does.
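To compare the two approaches mentioned above, here is a short sketch with two hypothetical helpers, `join_blocks` and `concat_blocks`. Both produce the same text; collecting pieces in a list and calling `str.join` once is the usual idiom, since repeated `+=` may copy the growing string on each step.

```python
def join_blocks(blocks):
    """Collect Markdown blocks in a list, then join them once."""
    md_lines = []
    for block in blocks:
        md_lines.append(block)
    # A blank line between blocks makes them separate Markdown paragraphs
    return "\n\n".join(md_lines)


def concat_blocks(blocks):
    """Build the same text by appending to a string."""
    out = ""
    for i, block in enumerate(blocks):
        if i:  # add a separator before every block except the first
            out += "\n\n"
        out += block
    return out


blocks = ["# Welcome", "This is a **test**."]
print(join_blocks(blocks) == concat_blocks(blocks))
```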
Once you have your Markdown, write it out:
```python
md_content = html_to_md(html)

with open('output.md', 'w', encoding='utf-8') as f:
    f.write(md_content)
```
For line-by-line writing, see the guide on writing to a file line by line in Python.
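As a minimal sketch of line-by-line output, the hypothetical helper `write_markdown_lines` below loops over a list of lines and writes each one with an explicit newline.

```python
def write_markdown_lines(lines, path):
    """Write Markdown one line at a time (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as f:
        for line in lines:
            # write() adds no newline of its own, so append one per line
            f.write(line + "\n")


write_markdown_lines(["# Welcome", "", "This is a **test**."], "output.md")

with open("output.md", encoding="utf-8") as f:
    print(f.read())
```

`f.writelines(line + "\n" for line in lines)` does the same in a single call; like `write()`, `writelines()` does not insert newlines for you.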
For tables, convert `<table>` tags to pipe syntax. To convert many files at once, iterate over them with `glob` or `os.walk`.
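Converting `<table>` tags to pipe syntax can be sketched with the standard library's `html.parser`. This assumes simple tables (no `rowspan` or `colspan`) and treats the first row as the header, since Markdown pipe tables require one; `TableToMarkdown` and `table_to_markdown` are hypothetical names.

```python
from html.parser import HTMLParser


class TableToMarkdown(HTMLParser):
    """Collect cells of simple <table> markup and emit a pipe table."""

    def __init__(self):
        super().__init__()
        self.rows = []     # each row is a list of cell strings
        self._cell = None  # text fragments of the cell being read, if any

    def _finish_cell(self):
        if self._cell is not None:
            text = " ".join("".join(self._cell).split())  # collapse whitespace
            self.rows[-1].append(text.replace("|", "\\|"))  # escape pipes
            self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._finish_cell()
            self.rows.append([])
        elif tag in ("td", "th"):
            self._finish_cell()  # tolerate an omitted </td>
            if not self.rows:    # tolerate a cell outside any <tr>
                self.rows.append([])
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th", "tr"):
            self._finish_cell()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def to_markdown(self):
        rows = [row for row in self.rows if row]  # drop empty rows
        if not rows:
            return ""
        width = max(len(row) for row in rows)
        rows = [row + [""] * (width - len(row)) for row in rows]  # pad short rows
        lines = ["| " + " | ".join(rows[0]) + " |",
                 "| " + " | ".join(["---"] * width) + " |"]
        lines += ["| " + " | ".join(row) + " |" for row in rows[1:]]
        return "\n".join(lines)


def table_to_markdown(html):
    parser = TableToMarkdown()
    parser.feed(html)
    parser.close()
    return parser.to_markdown()


sample_html = """
<table>
  <tr><th>Library</th><th>Install</th></tr>
  <tr><td>markdownify</td><td>pip install markdownify</td></tr>
</table>
"""
print(table_to_markdown(sample_html))
```

The sample prints a two-column pipe table with a `| --- | --- |` separator under the header row.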
Converting HTML to Markdown in Python can be quick and reliable with the right tools. For most tasks, libraries like markdownify or html2text handle the heavy lifting. When you need full control, BeautifulSoup offers a flexible way to parse and rebuild content. Try these methods in your next project to keep your documentation DRY and easy to maintain.