I wanted to blog a bit

10 September 2022


About blogging & motivation

To understand something well, I find it's easier to program it. I often did this in college, building a library for finite-state automata, a logic solver, compilers...

I wanted to start blogging, and I had heard about a number of tools in the space like Jekyll, Hugo, Zola..., but I was never motivated enough & thought I had nothing interesting to tell. But I feel that I have developed some niche software that could be useful to people, and I want to share more about it (like my neovim-flake for NixOS, but that is a topic for another time).

I find the easiest way to motivate myself is to do something new, and making a static site generator seemed like a great & quick project to get hooked on blogging (though I often build tools and never use them...).

So before starting, here are the few features I was sure I wanted:

- No javascript in the generated pages
- Automatic discovery of my notes
- Syntax highlighting, including for niche languages like Nix

So now let's dig into how to make everything in that list.

Generating HTML from markdown

That's the part that was the easiest to do: you just have to find a library that converts markdown to HTML and you're pretty much done.

As I like building in Rust, I settled on pulldown-cmark. It's easy to use: you just have to call two functions, one to parse your markdown and one to render it.

As I knew that I wanted to customize the HTML rendering, the next step was to copy the html module from pulldown-cmark into my own crate, adapt a few imports, and voilà.

Handling Syntax Highlighting

I had never done something like this before, so I did not really know how to go about it. The easiest way would have been to import a JS library and be done with it, but this was not an option.

I first tried the syntect crate; it did exactly what I wanted but had a huge drawback: it did not support the Nix language out of the box. I could maybe add support for it using a Sublime Text syntax file, but it seems few people are interested in writing one, and I don't use Sublime Text.

Having seen that syntect relied on Sublime Text grammars, I thought about using tree-sitter instead. My editor of choice is neovim, and it supports tree-sitter for highlighting code.

There are several tree-sitter parsers on crates.io, and a few more that are only published as git repositories, and importantly there are parsers for all the languages I am currently interested in. There is even a library for generating prettied-up (with inline CSS) HTML: tree-sitter-highlight.

The only problem is that the default queries bundled with the parsers were not as detailed as I wanted, and more importantly tree-sitter-highlight does not tell you how to highlight. It only tells you where to highlight, by calling a callback with the current node.

So I was going to need to fix both of these problems, and I am not really the right person to choose color themes (damn you color-blindness!). My thought was to piggy-back on the neovim infrastructure: the nvim-treesitter plugin defines a number of extensive queries, and it maps those queries onto Vim highlight groups.

So my first step was to write a build script that would include those queries in a library. This can be done with something like this:

use std::{env, fs::OpenOptions, io::Write, path::Path};

fn main() -> std::io::Result<()> {
    let queries = concat!(env!("NVIM_TREESITTER"), "/queries");
    let mut out_file = OpenOptions::new()
        .create(true)
        .truncate(true)
        .write(true)
        .open(Path::new(&env::var("OUT_DIR").unwrap()).join("ts_config.rs"))?;

    write!(
        out_file,
        r#"
        use once_cell::sync::Lazy;
        use tree_sitter_highlight::HighlightConfiguration;

        type HiCfg = Lazy<HighlightConfiguration>;
        "#
    )?;

    for language in &["rust", "json"] {
        write!(
            out_file,
            r#"pub static CONFIG_{language}: HiCfg = Lazy::new(|| {{
                HighlightConfiguration::new(
                    tree_sitter_{language}::language(),
                    include_str!("{queries}/{language}/highlights.scm"),
                    include_str!("{queries}/{language}/injections.scm"),
                    include_str!("{queries}/{language}/locals.scm"),
                )
                .unwrap()
            }});"#
        )?;
    }

    Ok(())
}

This will create a ts_config.rs file that we can just include! in our lib.rs in order to have access to the different configurations. The only prerequisite is to add each language we want to support to our Cargo.toml.

Note: there are some edge cases not handled here: not all languages have all queries, and we don't really want a different static for each language because we want to choose them dynamically. Check Verin's source for the complete example. Most of the following code examples will be simplified in some way, so this applies to the rest of the post.

Theming our AST

Now that we have an extensive set of queries, with an easy way to support more languages, we can start on the next part: theming the syntax tree. Because I used the same queries as nvim-treesitter, I could check how it maps the different tree-sitter nodes to Vim highlight groups. I can then take a neovim theme that I enjoy and map its colors to the highlight groups.

In practice this means something like this:

use std::collections::HashMap;

use once_cell::sync::Lazy;

pub const NODES: &[&str] = &[
    "annotation",
    "attribute",
    // many values
    "type.definition",
];

static THEME: Lazy<HashMap<&'static str, &'static str>> = Lazy::new(|| {
    let mut map = HashMap::new();
    map.insert("annotation", "#d183e8");
    map.insert("attribute", "#74b2ff");
    // many values
    map.insert("type.definition", "#36c692");
    map
});

In practice the same few colors are used a lot, so instead of inserting the raw string into the map we can insert a static that contains it.
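A minimal sketch of that deduplication, reusing two color values from the snippet above (the constant names `GREEN` and `BLUE` and the `theme` helper are my own, not Verin's actual identifiers):

```rust
use std::collections::HashMap;

// Shared color constants instead of repeating the raw hex strings.
const GREEN: &str = "#36c692";
const BLUE: &str = "#74b2ff";

fn theme() -> HashMap<&'static str, &'static str> {
    let mut map = HashMap::new();
    map.insert("attribute", BLUE);
    map.insert("type", GREEN);
    // The same color is reused for several related nodes.
    map.insert("type.definition", GREEN);
    map
}

fn main() {
    let theme = theme();
    println!("type.definition is rendered in {}", theme["type.definition"]);
}
```

Changing a constant then recolors every node that uses it, which makes tweaking the palette much less tedious.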

It is then pretty easy to highlight some code: we just need to find the correct configuration in our ts_config.rs, apply the set of nodes using HighlightConfiguration::configure, and set up the HTML highlighter with a callback that fetches the correct color from the THEME map. This is implemented in the ts-highlight-html helper crate.

Now that we have solved the syntax highlighting problem, we just need to parse our markdown files and wrap them in some HTML.

HTML templates

In order to be able to define a common theme for all my articles, I wanted to be able to wrap each article inside a template. I used liquid because the API is nice and not too magic.

Each article is going to need to define some amount of metadata that can't really be expressed in markdown, so I followed a similar path to pandoc and embedded some metadata at the top (note: pandoc is awesome, please use it if you just need to convert some markdown to HTML/PDF/jira issues...).

Verin will look for the marker /~ in the document, and everything before it will be interpreted as a TOML document. For example the metadata for this document is:

title = "I wanted to blog a bit"
date = "10/09/2022"
page = "article"
summary = """\
I wanted to blog, but I knew I wanted some features that could be complex:
- No javascript
- Automatic discovery of my notes
- Highlighting for niche languages like nix.

And I wanted to build my static site generator too! This is how I made it.
"""
The format of the date is configurable, for you ISO-8601 enthusiasts.
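The marker lookup itself is a one-liner in Rust. Here is a sketch (the helper name `split_metadata` is mine; Verin's real code also reports a proper error when the marker is missing and feeds the header to a TOML parser):

```rust
/// Everything before the `/~` marker is the TOML header,
/// everything after it is the markdown body.
fn split_metadata(source: &str) -> Option<(&str, &str)> {
    source.split_once("/~")
}

fn main() {
    let doc = "title = \"I wanted to blog a bit\"\n/~\n# Hello\n";
    let (meta, body) = split_metadata(doc).expect("missing /~ marker");
    // `meta` would go to the TOML parser, `body` to the markdown renderer.
    println!("meta: {meta:?}");
    println!("body: {body:?}");
}
```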

The most important information in this header is the page. It allows Verin to automatically look for a liquid template that matches the name, and then pass to the template the rest of the metadata along with the rendered HTML for the markdown.

Index page

There is always one special template, the index.liquid template, which is used to render the index.html file. When we render an article we can store some information so that the index template has access to a list of articles, in order to present them however it wants.

We now have all the information to write the main function of Verin:

fn render_website(input: PathBuf, output: PathBuf) -> Result<()> {
    // This is some global information for the blog, like its name and the format of the dates
    let config = load_config(&input)?;
    let mut templates = HashMap::new();

    for entry in glob(&input.as_path().join("**/*.liquid").to_string_lossy())? {
        let entry = entry?;

        let template = liquid::ParserBuilder::with_stdlib()
            .build()?
            .parse_file(&entry)?;

        templates.insert(
            entry
                .file_stem()
                .unwrap()
                .to_str()
                .ok_or(eyre::eyre!("Template name should be valid UTF-8"))?
                .to_owned(),
            template,
        );
    }

    let mut articles = Vec::new();

    for entry in glob(&input.as_path().join("**/*.md").to_string_lossy())? {
        let entry = entry?;
        let (metadata, body) = parse_metadata(entry);
        let html = render_markdown(body)?;

        // Fetch the correct template & render to file
        write_to_file(&templates, &html, &metadata, &output)?;

        // Store articles for indexing
        add_article(&mut articles, &metadata);
    }

    // Render the index with all the articles
    render_index(&templates, &articles)
}

Tooling

The last step to make it as enjoyable as possible is to write some tooling to handle building the posts.

Because Verin is made to be used pretty much only by myself, I am going to take a source-first approach. This means that the Rust code for Verin and the markdown & templates for the posts are going to sit in the same workspace.
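Concretely, the workspace might look something like this (a hypothetical layout; the exact crate and directory names are assumptions):

```text
.
├── Cargo.toml   # workspace manifest
├── verin/       # the static site generator itself
├── xtask/       # tooling commands (build, watch, ...)
└── posts/       # the markdown articles & liquid templates
```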

This allows me to use the xtask framework to define commands for my tooling.

Building: cargo xtask build

This one is the easiest. I just need to find the root of the workspace and then I can take $root/posts and write to target/html.

In order to find the root of the workspace you can use the CARGO_MANIFEST_DIR environment variable and go up a few ancestors with Path::new(...).ancestors().nth(1).
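As a sketch, assuming the xtask crate lives in `$root/xtask` (the `workspace_root` helper name and the example path are mine, for illustration):

```rust
use std::path::Path;

/// The xtask crate lives in `$root/xtask`, so the workspace root is one
/// ancestor up from its `CARGO_MANIFEST_DIR`.
fn workspace_root(xtask_manifest_dir: &Path) -> &Path {
    // `ancestors()` yields the path itself first, then each parent in turn.
    xtask_manifest_dir
        .ancestors()
        .nth(1)
        .expect("the xtask crate should not sit at the filesystem root")
}

fn main() {
    // In the real xtask this would be `Path::new(env!("CARGO_MANIFEST_DIR"))`;
    // the path below is purely illustrative.
    let root = workspace_root(Path::new("/home/me/blog/xtask"));
    println!("workspace root: {}", root.display());
}
```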

Then you can just execute env!("CARGO") xtask build {root}/posts {root}/target/html. I recommend using a crate like duct to reduce the boilerplate, as recommended in the xtask description.

Watching for changes: cargo xtask watch

This one is a bit more complex because we want to refresh the page in the browser. We can go with the following plan:

- Watch the Verin sources & the posts for changes
- Re-build the whole website on each change
- Tell the browser to reload the page, over a websocket

But wait! Didn't we say no JS in our generated HTML? Well yes, in production. So we are going to introduce two types of builds in Verin: debug or not. In debug mode we add this tiny bit of javascript:

let ws = new WebSocket("ws://localhost:4111");
ws.onopen = function (_) {
  console.log("WS started");
};

ws.onmessage = function (_) {
  console.log("REFRESH");
  window.location = window.location;
};

ws.onerror = function (error) {
  console.log(`[error] WS error: ${error.message}`);
};
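On the Rust side, gating the script on the build type can be sketched like this (the `RELOAD_SCRIPT` constant and `finalize_page` function are hypothetical names, and the embedded script is a condensed variant of the snippet above, not Verin's actual code):

```rust
// Condensed live-reload client, injected only in debug builds.
const RELOAD_SCRIPT: &str = concat!(
    "<script>",
    r#"let ws = new WebSocket("ws://localhost:4111");"#,
    "ws.onmessage = () => window.location.reload();",
    "</script>"
);

fn finalize_page(body_html: &str, debug: bool) -> String {
    if debug {
        // Debug builds get the websocket client appended...
        format!("{body_html}{RELOAD_SCRIPT}")
    } else {
        // ...production builds stay javascript-free.
        body_html.to_owned()
    }
}

fn main() {
    println!("{}", finalize_page("<p>hi</p>", true));
}
```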

We can define the command xtask refresh-server that will do the following:

- Listen for WebSocket connections on localhost:4111
- Keep track of the pages that connect to it
- When a refresh request comes in, send a message to every connected page

We then define xtask refresh-request, which runs cargo xtask build --debug followed by a connection to the refresh-server.

Finally we can define the cargo xtask watch command that will do the following:

- Start the refresh-server in the background
- Watch the Verin source & the posts for changes
- On each change, run cargo xtask refresh-request

So whenever we have a change in either the Verin source or a post we re-build everything and we refresh the page, neat!

Continuous Integration

With the tooling we defined it's super easy to create a CI workflow that builds & publishes on each push. I self-host a server with Gitea, so I looked into the simplest CI options and found Drone to be pretty simple to use.

I have the following pipeline:

kind: pipeline
type: docker
name: default

steps:
  - name: cloning nvim-treesitter
    image: alpine/git
    commands:
      - git clone https://github.com/nvim-treesitter/nvim-treesitter

  - name: generating website
    image: rust:1.63
    commands:
      - NVIM_TREESITTER=$(realpath nvim-treesitter) cargo xtask build

  - name: publish built pages
    image: drillster/drone-rsync
    settings:
      hosts: ["selfhost.server"]
      port: 10022
      user: traxys
      key:
        from_secret: ssh_key
      source: ./target/release/html/
      target: /path/to/blog-render/

That allows us to build & publish the website to a local repository on each push. In the future I will think about how to automatically publish the content of the website on some event (tags maybe?) to my hosting solution.

Future steps

Because I have not used Verin a lot yet, I don't have many pain points, but I have the following: