Plugin System (Plugins)
New sites, text processors, and clients can be added (or built-in ones overridden) through plugins.
Plugins are enabled from a local directory, and you can choose whether they may override the built-in implementations.
Enabling in settings.toml
[plugins]
# Enable the local plugin directory
enable_local_plugins = true
# Allow local plugins to override built-in implementations
override_builtins = false
# Path of the local plugin directory (optional)
local_plugins_path = "./novel_plugins"
Directory layout
.
├─ settings.toml
└─ novel_plugins
   ├─ sites
   │  └─ ciweimao          # example: the site key (site_key), i.e. the value of --site
   │     ├─ fetcher.py     # required: the fetcher (session) class (e.g. CiweimaoFetcher)
   │     ├─ parser.py      # required: the parser class (e.g. CiweimaoParser)
   │     ├─ searcher.py    # optional: enables in-site search
   │     └─ client.py      # optional: the generic CommonClient is used if absent
   └─ processors
      └─ processor_a.py    # custom text processor, referenced via processors.name in settings.toml
The <site_key> in sites/<site_key>/ is the value passed to --site on the command line, for example:
novel-cli download --site ciweimao 123456
Adding a site
A site usually consists of two required components: a Fetcher (fetches pages) and a Parser (parses them).
Optional components: a Searcher (in-site search) and a Client (overrides the default download/export flow).
1. Fetcher (fetching)
- Responsibilities: log in, fetch the book info page and the chapter pages (either may span multiple pages)
- Recommended base class: BaseFetcher, or the higher-level GenericFetcher
- Registration: use the decorator @registrar.register_fetcher()
Fetcher protocol (abridged)
class FetcherProtocol(Protocol):
    site_name: str

    async def login(
        self,
        username: str = "",
        password: str = "",
        cookies: dict[str, str] | None = None,
        attempt: int = 1,
        **kwargs: Any,
    ) -> bool: ...

    async def fetch_book_info(self, book_id: str, **kwargs: Any) -> list[str]: ...

    async def fetch_chapter_content(self, book_id: str, chapter_id: str, **kwargs: Any) -> list[str]: ...

    @property
    def is_logged_in(self) -> bool: ...

    @property
    def login_fields(self) -> list[LoginField]: ...
Sites that require a login can provide interactive fields:
@dataclass
class LoginField:
    name: str
    label: str
    type: Literal["text", "password", "cookie"]
    required: bool
    default: str = ""
    placeholder: str = ""
    description: str = ""
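For example, a site that needs a login could expose a single cookie prompt through its login_fields property. The snippet below is only a sketch: the field values are illustrative, and the import location of LoginField (novel_downloader.schemas) is an assumption, not something this page specifies.

from novel_downloader.schemas import LoginField  # assumed import location

# One interactive field asking the user for a logged-in cookie string
# (illustrative values; adapt name/label/description to the site).
cookie_field = LoginField(
    name="cookies",
    label="Cookie",
    type="cookie",
    required=True,
    placeholder="paste the Cookie header of a logged-in browser session",
    description="used to fetch chapters that require an account",
)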
Subclassing BaseFetcher (simple sites with custom URLs)
from typing import Any

from novel_downloader.plugins.base.fetcher import BaseFetcher
from novel_downloader.plugins.registry import registrar


@registrar.register_fetcher()
class B520Fetcher(BaseFetcher):
    site_name: str = "b520"

    BOOK_INFO_URL = "http://www.b520.cc/{book_id}/"
    CHAPTER_URL = "http://www.b520.cc/{book_id}/{chapter_id}.html"

    async def fetch_book_info(self, book_id: str, **kwargs: Any) -> list[str]:
        url = self.BOOK_INFO_URL.format(book_id=book_id)
        return [await self.fetch(url, headers={"Referer": "http://www.b520.cc/"})]

    async def fetch_chapter_content(self, book_id: str, chapter_id: str, **kwargs: Any) -> list[str]:
        url = self.CHAPTER_URL.format(book_id=book_id, chapter_id=chapter_id)
        return [await self.fetch(url, headers={"Referer": "http://www.b520.cc/"}, encoding="gbk")]
Subclassing GenericFetcher (URL templates / built-in pagination support)
Single-page info + single-page chapters
from novel_downloader.plugins.base.fetcher import GenericFetcher
from novel_downloader.plugins.registry import registrar


@registrar.register_fetcher()
class BiquyueduFetcher(GenericFetcher):
    site_name: str = "biquyuedu"

    BOOK_INFO_URL = "https://biquyuedu.com/novel/{book_id}.html"
    CHAPTER_URL = "https://biquyuedu.com/novel/{book_id}/{chapter_id}.html"
Multi-page info / chapters
from novel_downloader.plugins.base.fetcher import GenericFetcher
from novel_downloader.plugins.registry import registrar


@registrar.register_fetcher()
class Biquge5Fetcher(GenericFetcher):
    site_name: str = "biquge5"

    BASE_URL = "https://www.biquge5.com"
    USE_PAGINATED_INFO = True
    USE_PAGINATED_CHAPTER = True

    @classmethod
    def relative_info_url(cls, book_id: str, idx: int) -> str:
        return f"/{book_id}/index_{idx}.html" if idx > 1 else f"/{book_id}/"

    @classmethod
    def relative_chapter_url(cls, book_id: str, chapter_id: str, idx: int) -> str:
        return f"/{book_id}/{chapter_id}_{idx}.html" if idx > 1 else f"/{book_id}/{chapter_id}.html"
Separate info and catalog pages
from novel_downloader.plugins.base.fetcher import GenericFetcher
from novel_downloader.plugins.registry import registrar


@registrar.register_fetcher()
class I25zwFetcher(GenericFetcher):
    site_name: str = "i25zw"

    HAS_SEPARATE_CATALOG = True
    BOOK_INFO_URL = "https://www.i25zw.com/book/{book_id}.html"
    BOOK_CATALOG_URL = "https://www.i25zw.com/{book_id}/"
    CHAPTER_URL = "https://www.i25zw.com/{book_id}/{chapter_id}.html"
GenericFetcher also provides shared logic such as the pagination hook should_continue_pagination(...) and relative-URL joining, which makes new sites quick to adapt.
2. Parser (parsing)
- Responsibilities: parse the list of raw HTML pages into book metadata (including the volume/chapter list) and chapter text
- Recommended base class: BaseParser
- Registration: @registrar.register_parser()
Parser protocol (abridged)
class ParserProtocol(Protocol):
    def parse_book_info(
        self,
        raw_pages: list[str],
        **kwargs: Any,
    ) -> BookInfoDict | None: ...

    def parse_chapter_content(
        self,
        raw_pages: list[str],
        chapter_id: str,
        **kwargs: Any,
    ) -> ChapterDict | None: ...
Data structures:
class ChapterDict(TypedDict):
    id: str
    title: str
    content: str
    extra: dict[str, Any]


class ChapterInfoDict(TypedDict):
    title: str
    url: str
    chapterId: str
    accessible: NotRequired[bool]


class VolumeInfoDict(TypedDict):
    volume_name: str
    chapters: list[ChapterInfoDict]


class BookInfoDict(TypedDict):
    book_name: str
    author: str
    cover_url: str
    update_time: str
    summary: str
    extra: dict[str, Any]
    volumes: list[VolumeInfoDict]
    tags: NotRequired[list[str]]
    word_count: NotRequired[str]
    serial_status: NotRequired[str]
    summary_brief: NotRequired[str]
Parser example
from typing import Any

from lxml import html

from novel_downloader.plugins.base.parser import BaseParser
from novel_downloader.plugins.registry import registrar
from novel_downloader.schemas import BookInfoDict, ChapterDict, ChapterInfoDict, VolumeInfoDict


@registrar.register_parser()
class AaatxtParser(BaseParser):
    site_name: str = "aaatxt"

    def parse_book_info(self, raw_pages: list[str], **kwargs: Any) -> BookInfoDict | None:
        if not raw_pages:
            return None
        tree = html.fromstring(raw_pages[0])

        book_name = self._first_str(tree.xpath("//div[@class='xiazai']/h1/text()"))
        author = self._first_str(tree.xpath("//span[@id='author']/a/text()"))
        cover_url = self._first_str(tree.xpath("//div[@id='txtbook']//div[@class='fm']//img/@src"))
        update_time = self._first_str(
            tree.xpath("//div[@id='txtbook']//li[contains(text(), '上传日期')]/text()"),
            replaces=[("上传日期:", "")],
        )
        summary = self._first_str(tree.xpath("//div[@id='jj']//p/text()"))
        download_url = self._first_str(tree.xpath("//div[@id='down']//li[@class='bd']//a/@href"))

        chapters: list[ChapterInfoDict] = []
        for a in tree.xpath("//div[@id='ml']//ol/li/a"):
            url = a.get("href", "").strip()
            chapter_id = url.split("/")[-1].replace(".html", "")
            title = a.text_content().strip()
            chapters.append({"title": title, "url": url, "chapterId": chapter_id})

        volumes: list[VolumeInfoDict] = [{"volume_name": "正文", "chapters": chapters}]

        return {
            "book_name": book_name,
            "author": author,
            "cover_url": cover_url,
            "update_time": update_time,
            "summary": summary,
            "volumes": volumes,
            "extra": {"download_url": download_url},
        }

    def parse_chapter_content(self, raw_pages: list[str], chapter_id: str, **kwargs: Any) -> ChapterDict | None:
        if not raw_pages:
            return None
        tree = html.fromstring(raw_pages[0])

        raw_title = self._first_str(tree.xpath("//div[@id='content']//h1/text()"))
        title = raw_title.split("-", 1)[-1].strip()

        paragraphs = []
        for txt in tree.xpath("//div[@class='chapter']//text()"):
            line = txt.strip()
            if not line or self._is_ad_line(txt):
                continue
            paragraphs.append(line)
        if not paragraphs:
            return None

        content = "\n".join(paragraphs)
        return {"id": chapter_id, "title": title, "content": content, "extra": {"site": self.site_name}}
3. Searcher (optional)
- Responsibilities: in-site search, returning a list of SearchResult
- Recommended base class: BaseSearcher
- Registration: @registrar.register_searcher()
Searcher example
import logging

from lxml import html

from novel_downloader.plugins.base.searcher import BaseSearcher
from novel_downloader.plugins.registry import registrar
from novel_downloader.schemas import SearchResult

logger = logging.getLogger(__name__)


@registrar.register_searcher()
class B520Searcher(BaseSearcher):
    site_name = "b520"
    priority = 30
    BASE_URL = "http://www.b520.cc/"
    SEARCH_URL = "http://www.b520.cc/modules/article/search.php"

    async def _fetch_html(self, keyword: str) -> str:
        try:
            async with self.session.get(
                self.SEARCH_URL,
                params={"searchkey": keyword},
                headers={"Referer": "http://www.b520.cc/"},
            ) as resp:
                resp.raise_for_status()
                return await self._response_to_str(resp)
        except Exception:
            logger.error("Failed to fetch HTML for keyword '%s' from '%s'", keyword, self.SEARCH_URL)
            return ""

    def _parse_html(self, html_str: str, limit: int | None = None) -> list[SearchResult]:
        doc = html.fromstring(html_str)
        rows = doc.xpath('//table[@class="grid"]//tr[position()>1]')
        results: list[SearchResult] = []

        for idx, row in enumerate(rows):
            href = self._first_str(row.xpath(".//td[1]/a[1]/@href"))
            if not href:
                continue
            if limit is not None and idx >= limit:
                break

            book_id = href.strip("/").split("/")[-1]
            book_url = self._abs_url(href)
            title = self._first_str(row.xpath(".//td[1]/a[1]/text()"))
            latest_chapter = self._first_str(row.xpath(".//td[2]/a[1]/text()")) or "-"
            author = self._first_str(row.xpath(".//td[3]//text()"))
            word_count = self._first_str(row.xpath(".//td[4]//text()"))
            update_date = self._first_str(row.xpath(".//td[5]//text()"))

            results.append(
                SearchResult(
                    site=self.site_name,
                    book_id=book_id,
                    book_url=book_url,
                    cover_url="",
                    title=title,
                    author=author,
                    latest_chapter=latest_chapter,
                    update_date=update_date,
                    word_count=word_count,
                    priority=self.priority + idx,
                )
            )
        return results
Adding a text processor (Processors)
- Responsibilities: transform the book metadata and chapters before export
- Protocol: ProcessorProtocol
- Registration: @registrar.register_processor()
class ProcessorProtocol(Protocol):
    def __init__(self, config: dict[str, Any]) -> None: ...

    def process_book_info(self, book_info: BookInfoDict) -> BookInfoDict: ...

    def process_chapter(self, chapter: ChapterDict) -> ChapterDict: ...
How configuration is passed
In settings.toml:
[[general.processors]]
name = "cleaner"
overwrite = false
config_1 = true
config_2 = 2
config_3 = "three"
then the processor's constructor __init__(config) receives:
{
    "config_1": True,
    "config_2": 2,
    "config_3": "three",
}
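To make this concrete, below is a minimal sketch of a local processor. The class name, the file location (novel_plugins/processors/processor_a.py, matching the directory layout above) and the strip_suffix config key are illustrative choices, not part of the project API:

from typing import Any

from novel_downloader.plugins.registry import registrar
from novel_downloader.schemas import BookInfoDict, ChapterDict


@registrar.register_processor()
class ProcessorA:
    """Example processor that strips a configurable suffix from chapter titles."""

    def __init__(self, config: dict[str, Any]) -> None:
        # Receives the extra keys of the [[general.processors]] entry,
        # e.g. {"strip_suffix": " - somesite"} (hypothetical key).
        self._suffix: str = config.get("strip_suffix", "")

    def process_book_info(self, book_info: BookInfoDict) -> BookInfoDict:
        # This example leaves the book metadata untouched.
        return book_info

    def process_chapter(self, chapter: ChapterDict) -> ChapterDict:
        if self._suffix and chapter["title"].endswith(self._suffix):
            chapter["title"] = chapter["title"][: -len(self._suffix)]
        return chapter

With local plugins enabled, this file would be referenced as name = "processor_a" under [[general.processors]].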
Documentation for the built-in processors can be found in docs/3-settings-schema.md#processors-配置; they include cleaner (regex replacements), zh_convert (OpenCC Simplified/Traditional conversion), and corrector (error correction via pycorrector; results depend on the model).
Registration mechanism
Plugins are registered with decorators; registration happens automatically when the module is imported, with no manual registry table or manifest to maintain:
- Site fetcher (session): @registrar.register_fetcher()
- Site parser: @registrar.register_parser()
- Site searcher: @registrar.register_searcher()
- Site client: @registrar.register_client()
- Text processor: @registrar.register_processor()
1. How keys are determined
The registry automatically derives a unique key from the plugin module's path and uses it to locate the plugin at runtime.
Site plugins (Fetcher / Parser / Client / Searcher)
The key comes from this part of the module path:
sites.<site_key>.<kind>
For example, given the following layout:
novel_plugins/
└── sites/
    └── ciweimao/
        ├── fetcher.py
        ├── parser.py
        └── searcher.py
the corresponding site key is:
ciweimao
and the key can be used directly on the command line:
novel-cli download --site ciweimao 123456
Site keys are always converted to lowercase automatically; if a key would start with a digit, prefix it with n (e.g. 123abc -> n123abc).
Text processors (Processor)
The key is derived from whatever follows processors. in the module path.
For example, the module:
novel_downloader.plugins.processors.cleaner
automatically yields the key:
cleaner
which can be referenced directly in the configuration file:
[[general.processors]]
name = "cleaner"
If the processor lives in a subpackage, for example:
novel_plugins/processors/text/zh_convert.py
then the key becomes:
text.zh_convert
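Such a key is referenced in the configuration file the same way as any other processor:

[[general.processors]]
name = "text.zh_convert"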
2. Search/load order and overriding
The registry tries to import modules namespace by namespace (lazily):
- Built-in namespace: novel_downloader.plugins
- (Optional) local namespace: novel_plugins by default, controlled by the [plugins] section of settings.toml
In the configuration file:
- enable_local_plugins = true enables local plugins
- local_plugins_path = "./novel_plugins" sets the plugin directory
- when override_builtins = true, the local namespace is searched before the built-in one and can override implementations with the same key
- Site modules are imported dynamically as {namespace}.sites.<site_key>.<kind>
- Processor modules are imported dynamically as {namespace}.processors.<name>
3. Lookup and fallback behaviour
- get_client(site): falls back to the generic implementation (CommonClient) when no site-specific client is found
- get_fetcher(site) / get_parser(site): raise ValueError("Unsupported site") when nothing is found
- get_processor(name): raises ValueError("Unsupported processor") when nothing is found
- get_searcher_class(site): raises ValueError("Unsupported site") when nothing is found
- get_searcher_classes(): when no site is specified, tries to load the searcher module of every known site and returns the ones that are available
All registrations take effect when the module is imported, so make sure the plugin files can be imported by Python (correct paths, no syntax errors).