To understand something well, I find it's easiest to program it. I did this often in college, building libraries for FSAs, a logic solver, compilers...
I wanted to start blogging, and I had heard about a number of tools in the space like Jekyll, Hugo, Zola..., but I was never motivated enough and thought I had nothing interesting to tell. Lately though, I feel that I have developed some niche software that could be useful to people, and I want to share more about it (like my neovim-flake for NixOS, but that is a topic for another time).
I find the easiest way to motivate myself is to do something new, and making a static site generator seemed like a great & quick project to get hooked on blogging (though I often build tools and never use them...).
So before starting, here are the few features I was sure I wanted:
- No JavaScript. I don't like writing JavaScript; when I have to do frontend work I tend to write Rust with something like Yew. Besides, a static website should really not need JavaScript.
- Syntax highlighting for code. This is a programming blog, so it should feature pretty code. This could be a problem for two reasons: I don't want JavaScript, and I use niche languages like `Nix`.
- Pretty automatic building of articles. I just want to write markdown in a file and have it rendered and picked up everywhere, without having to maintain a list of articles or anything like that.
- Tools to automatically refresh the articles while I am writing them.
So now let's dig into how to build everything on that list.
That was the easiest part: you just have to find a library that converts markdown to HTML and you're pretty much done.
As I like building in Rust, I settled on `pulldown-cmark`: it's easy to use, you just have to call two functions, one to parse your markdown and one to render it. As I knew I wanted to customize the HTML rendering, the next step was to copy the `html` module from `pulldown-cmark` into my own crate, adapt a few imports, and voilà.
I had never done something like this before, so I did not really know how to go about it. The easiest way would have been to import a JS library and be done with it, but that was not an option.
I first tried the `syntect` crate. It did exactly what I wanted but had a huge drawback: it does not support the Nix language out of the box. I could maybe add support with a Sublime Text syntax file, but few people seem interested in writing one, and I don't use Sublime Text.
Since syntect relies on Sublime Text syntaxes, I thought about using tree-sitter instead. My editor of choice is Neovim, and it supports tree-sitter for highlighting code.
There are several tree-sitter parsers on crates.io, and even more that are only published as git repositories; importantly, there are parsers for all the languages I am currently interested in. There is even a library for generating prettied-up HTML (with inline CSS): tree-sitter-highlight.
The only problem is that the default queries bundled with the parsers are not as detailed as I would like, and more importantly tree-sitter-highlight does not tell you how to highlight. It only tells you where to highlight, by calling a callback with the current node.
So I needed to fix both of these problems, and I am not really the right person for choosing color themes (damn you, color-blindness!). My idea was to piggy-back on the Neovim infrastructure: the nvim-treesitter plugin defines a number of extensive queries, and it maps those queries onto Vim highlight groups.
So my first step was to write a build script that would include those queries in a library. This can be done with something like this:
```rust
use std::{env, io::Write, path::Path};

fn main() -> std::io::Result<()> {
    // Queries are taken from a checkout of nvim-treesitter
    let queries = concat!(env!("NVIM_TREESITTER"), "/queries");
    let mut out_file = std::fs::OpenOptions::new()
        .create(true)
        .truncate(true)
        .write(true)
        .open(Path::new(&env::var("OUT_DIR").unwrap()).join("ts_config.rs"))?;
    write!(
        out_file,
        r#"
use once_cell::sync::Lazy;
use tree_sitter_highlight::HighlightConfiguration;

type HiCfg = Lazy<HighlightConfiguration>;
"#
    )?;
    for language in &["rust", "json"] {
        write!(
            out_file,
            r#"pub static CONFIG_{language}: HiCfg = Lazy::new(|| {{
    HighlightConfiguration::new(
        tree_sitter_{language}::language(),
        include_str!("{queries}/{language}/highlights.scm"),
        include_str!("{queries}/{language}/injections.scm"),
        include_str!("{queries}/{language}/locals.scm"),
    )
    .unwrap()
}});
"#
        )?;
    }
    Ok(())
}
```
This will create a `ts_config.rs` file that we can include in our `lib.rs` in order to have access to the different configurations. The only prerequisite is to add each language we want to support to our `Cargo.toml`.
Note: there are some edge cases not handled here: not all languages have all three queries, and we don't really want a different static for each language, because we want to choose them dynamically. Check Verin's source for the complete version. Most of the following code examples are simplified in some way, and this applies to the rest of the post.
Now that we have an extensive set of queries and an easy way to support more languages, we can move to the next part: theming the syntax tree. Because I used the same queries as nvim-treesitter, I could check how it maps the different tree-sitter nodes to Vim highlight groups. I could then take a Neovim theme that I enjoy and map its colors to the highlight groups.
In practice this means something like this:
```rust
use std::collections::HashMap;

use once_cell::sync::Lazy;

pub const NODES: &[&str] = &[
    "annotation",
    "attribute",
    // many values
    "type.definition",
];

static THEME: Lazy<HashMap<&'static str, &'static str>> = Lazy::new(|| {
    let mut map = HashMap::new();
    map.insert("annotation", "#d183e8");
    map.insert("attribute", "#74b2ff");
    // many values
    map.insert("type.definition", "#36c692");
    map
});
```
In practice the same few colors are used a lot, so instead of inserting the raw string in the map we can insert a `static` that contains it.
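For example (the names and colors here are just illustrative):

```rust
use std::collections::HashMap;

// A small palette of shared statics instead of repeated raw hex strings.
static PURPLE: &str = "#d183e8";
static BLUE: &str = "#74b2ff";
static GREEN: &str = "#36c692";

fn theme() -> HashMap<&'static str, &'static str> {
    let mut map = HashMap::new();
    map.insert("annotation", PURPLE);
    map.insert("attribute", BLUE);
    map.insert("type.definition", GREEN);
    map
}

fn main() {
    let theme = theme();
    assert_eq!(theme["annotation"], "#d183e8");
    println!("{} highlight groups themed", theme.len());
}
```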
It is then pretty easy to highlight some code: we just need to find the correct configuration in our `ts_config.rs`, apply the set of nodes using `HighlightConfiguration::configure`, and set up the HTML highlighter with a callback that fetches the correct color from the `THEME` map. This is implemented in the ts-highlight-html helper crate.
Now that we have solved the syntax highlighting problem, we just need to parse our markdown files and wrap them in some HTML.
In order to be able to define a common theme for all my articles, I wanted to wrap each article inside a template. I used liquid because the API is nice and not too magic.
Each article needs to define some amount of metadata that can't really be expressed in markdown, so I followed a similar path to pandoc and embedded the metadata at the top. (Note: pandoc is awesome, please use it if you just need to convert some markdown to HTML/PDF/a Jira issue...)
Verin will look for the marker `/~` in the document, and everything before it will be interpreted as a TOML document. For example, the metadata for this document is:
```toml
title = "I wanted to blog a bit"
date = "10/09/2022"
page = "article"
summary = """\
I wanted to blog, but I knew I wanted some features that could be complex:
- No javascript
- Automatic discovery of my notes
- Highlighting for niche languages like nix.
And I wanted to build my static site generator too! This is how I made it.
"""
```
The format of the date is configurable, for you ISO-8601 enthusiasts.
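Splitting a post on that marker is straightforward; a sketch (the real `parse_metadata` also parses the TOML):

```rust
// Split a post on the `/~` marker: everything before it is the TOML
// front matter, everything after is the markdown body.
fn parse_metadata(source: &str) -> (Option<&str>, &str) {
    match source.split_once("/~") {
        Some((meta, body)) => (Some(meta.trim()), body.trim_start()),
        None => (None, source),
    }
}

fn main() {
    let post = "title = \"Hello\"\n/~\n# My post";
    let (meta, body) = parse_metadata(post);
    assert_eq!(meta, Some("title = \"Hello\""));
    assert_eq!(body, "# My post");
    println!("meta: {meta:?}");
}
```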
The most important field in this header is `page`. It allows Verin to automatically look for a liquid template matching that name, and then pass the rest of the metadata, along with the rendered HTML for the markdown, to the template.
There must always be one special template, the `index.liquid` template, which is used to render the `index.html` file. When we render an article we store some information so that the index template has access to a list of articles, in order to present them however it wants.
We now have all the information to write the main function of Verin:
```rust
fn render_website(input: PathBuf, output: PathBuf) -> Result<()> {
    // This is some global information for the blog, like its name and the date format
    let config = load_config(&input)?;

    let mut templates = HashMap::new();
    for entry in glob(&input.as_path().join("**/*.liquid").to_string_lossy())? {
        let entry = entry?;
        let template = liquid::ParserBuilder::with_stdlib()
            .build()?
            .parse_file(&entry)?;
        templates.insert(
            entry
                .file_stem()
                .unwrap()
                .to_str()
                .ok_or(eyre::eyre!("Template name should be valid UTF-8"))?
                .to_owned(),
            template,
        );
    }

    let mut articles = Vec::new();
    for entry in glob(&input.as_path().join("**/*.md").to_string_lossy())? {
        let entry = entry?;
        let (metadata, body) = parse_metadata(&entry)?;
        let html = render_markdown(&body)?;
        // Fetch the correct template & render to file
        write_to_file(&templates, &html, &metadata, &output)?;
        // Store the article for indexing
        add_article(&mut articles, &metadata);
    }

    // Render the index with all the articles
    render_index(&templates, &articles)
}
```
The last step to make it as enjoyable as possible is to write some tooling to handle building the posts.
Because Verin is pretty much made for my own use, I am going to take a source-first approach: the Rust code for Verin and the markdown & templates for the posts sit in the same workspace.
This allows me to use the xtask framework to define commands for my tooling.
This one is the easiest. I just need to find the root of the workspace, and then I can take `$root/posts` as input and write to `target/html`.
In order to find the root of the workspace you can use the `CARGO_MANIFEST_DIR` environment variable and go up some ancestors with `Path::new(...).ancestors().nth(1)`.
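As a sketch (the `workspace_root` helper is hypothetical; the real code would feed it `env!("CARGO_MANIFEST_DIR")`):

```rust
use std::path::{Path, PathBuf};

// Go one level up from the xtask crate's manifest directory to reach
// the workspace root (this assumes the xtask crate sits directly
// inside the workspace).
fn workspace_root(manifest_dir: &str) -> PathBuf {
    Path::new(manifest_dir)
        .ancestors()
        .nth(1)
        .unwrap_or_else(|| Path::new(manifest_dir))
        .to_path_buf()
}

fn main() {
    let root = workspace_root("/workspace/xtask");
    assert_eq!(root, Path::new("/workspace"));
    println!("posts dir: {}", root.join("posts").display());
}
```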
Then you can just execute `env!("CARGO") xtask build {root}/posts {root}/target/html`. I suggest using a crate like duct to reduce the boilerplate, as recommended in the xtask description.
This one is a bit more complex because we want to refresh a page in the browser. We can go with the following plan:
- Create a server listening for websocket connections
- When a refresh is needed, send a message over the websocket to trigger some JS that refreshes the page
- To detect the need for a refresh, use cargo-watch and send a message over TCP to the websocket server
But wait! Didn't we say no JS in our generated HTML? Well yes, in production. So we are going to introduce two types of builds in Verin: debug and release. In debug mode we add this tiny bit of JavaScript:
```javascript
let ws = new WebSocket("ws://localhost:4111");
ws.onopen = function (_) {
    console.log("WS started");
};
ws.onmessage = function (_) {
    console.log("REFRESH");
    window.location.reload();
};
ws.onerror = function (error) {
    console.log(`[error] WS error: ${error.message}`);
};
```
We can define the command `xtask refresh-server` that will do the following:
- Start a `TcpListener` in a new thread that listens for websockets. Whenever it receives a new connection, it spawns a thread waiting for refresh demands on a `Bus` from the bus crate. When a refresh is requested, it writes to the websocket.
- Start a `TcpListener` in the main thread that listens for new refresh requests. Whenever a connection is accepted, it sends a value through the `Bus`.
We then define `xtask refresh-request`, which runs `cargo xtask build --debug` followed by a connection to the refresh server.
Finally we can define the `cargo xtask watch` command that will do the following:
- Run `cargo xtask refresh-server`
- Run `cargo watch -x "xtask refresh-request"`
So whenever there is a change in either the Verin source or a post, we re-build everything and refresh the page. Neat!
With the tooling we defined, it's super easy to set up a CI workflow that builds & publishes on each push.
I self-host a server with Gitea, so I looked into the simplest CI options and found drone to be pretty simple to use.
I have the following pipeline:
```yaml
kind: pipeline
type: docker
name: default

steps:
  - name: cloning nvim-treesitter
    image: alpine/git
    commands:
      - git clone https://github.com/nvim-treesitter/nvim-treesitter
  - name: generating website
    image: rust:1.63
    commands:
      - NVIM_TREESITTER=$(realpath nvim-treesitter) cargo xtask build
  - name: publish built pages
    image: drillster/drone-rsync
    settings:
      hosts: ["selfhost.server"]
      port: 10022
      user: traxys
      key:
        from_secret: ssh_key
      source: ./target/release/html/
      target: /path/to/blog-render/
```
That builds & publishes the website to a local repository on each push. In the future I will think about how to automatically publish the content of the website on some event (tags maybe?) to my hosting solution.
Because I have not used Verin a lot I don't have many pain points yet, but here are a few:
- Extend markdown with custom syntax for the notes in this article
- Refresh & scroll back to a more useful spot in watch mode