Setting up Devops Weekly newsletter’s archive
I am a subscriber of the Devops Weekly newsletter. A few weeks ago I was wondering whether it had a web-based archive. It would have been more convenient for me to read it on the web than in my mailer. I asked its author whether it had any public archive, and also whether he would mind if I created one. He confirmed that there was none, and he gave his approval.
I have been a subscriber since 2015, so I have a few hundred issues. Unfortunately not all of them, but enough to start building something meaningful.
My goal was to build something simple, based on technologies that I already knew or was reasonably confident I could learn quickly. I wanted to minimize the costs, but I also wanted something stable.
I had previously used GitHub Pages with Jekyll. GitHub Pages offers free git-backed hosting, so it implicitly comes with version tracking and backups. Also, anybody can open pull requests against a public repository, so others could easily contribute later.
I thought it best to keep the project open anyway, so I registered a new GitHub organisation and created a repository for the project. GitHub Pages automatically recognises a repository named <account>.github.io and starts to serve its content under the same domain: devopsweeklyarchive.github.io in our case.
Jekyll is a Ruby-based static site generator and templating engine. At least from this project's point of view, these are its most relevant features. It also has some nice free templates. I had previously used Minimal Mistakes, which is one of the most popular. It can be used in a project in several ways, but the starter template seemed to be the simplest, so I recreated the project repository from this template.
Configuring Minimal Mistakes is quite straightforward. It mainly means filling in its _config.yml, which is quite well documented.
Also, I had to delete the example posts from the _posts folder.
Essentially this is all you need to build an empty site.
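To test it locally, you run Jekyll's local server from the project folder; with a Bundler-based setup like this one, the command is typically bundle exec jekyll serve.
Its output shows the address where the website is served: http://127.0.0.1:4000.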
My local Ruby installation was not in the best shape, but reinstalling the related modules fixed its problems.
The next step was to convert the mails into Markdown files. I gathered them into a single mailbox first and ended up using a set of scripts:
split_big_mbox_to_individuals.sh: as its name says, this one just splits the big mailbox into single files, one per issue, with formail.
mbox_to_original.sh: strips the unneeded bits from the mails. Most of the mail headers and the mailing list footers are quite pointless on the web. It also names the files based on the issue number extracted from the Subject line.
original_to_post.sh: is responsible for the main Markdown conversion.
With these scripts, I could populate the _posts folder with the “main” content.
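The original scripts are shell scripts built around formail and are not reproduced here. As a rough illustration of the same pipeline, here is a minimal Python sketch using only the standard library. It is not the author's code: the Subject pattern ("Issue #123"), the post title, and the issue-based file name are assumptions made for the example. What it does rely on is the Jekyll convention that posts are named YYYY-MM-DD-title.md and start with a YAML front matter block.

```python
import email.utils
import mailbox
import re
from pathlib import Path

# Assumption: the Subject line contains something like "Issue #123".
ISSUE_RE = re.compile(r"Issue\s*#?(\d+)", re.IGNORECASE)


def plain_text_body(msg):
    """Return the first text/plain part of a mail as a string."""
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True)
            charset = part.get_content_charset() or "utf-8"
            return payload.decode(charset, errors="replace")
    return ""


def message_to_post(msg):
    """Build (file name, Markdown text) for one newsletter issue."""
    match = ISSUE_RE.search(msg["Subject"] or "")
    if match is None:
        raise ValueError("no issue number found in the Subject line")
    issue = int(match.group(1))
    date = email.utils.parsedate_to_datetime(msg["Date"]).date()
    # Hypothetical naming scheme; only the date prefix is required by Jekyll.
    filename = f"{date.isoformat()}-issue-{issue}.md"
    front_matter = (
        "---\n"
        f'title: "Devops Weekly #{issue}"\n'
        f"date: {date.isoformat()}\n"
        "---\n\n"
    )
    return filename, front_matter + plain_text_body(msg).strip() + "\n"


def mbox_to_posts(mbox_path, posts_dir):
    """Write one Markdown post per mail in the mailbox, return the file names."""
    posts_dir = Path(posts_dir)
    posts_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for msg in mailbox.mbox(mbox_path):
        filename, text = message_to_post(msg)
        (posts_dir / filename).write_text(text, encoding="utf-8")
        written.append(filename)
    return written
```

Calling mbox_to_posts with the path of the gathered mailbox and the _posts folder would fill the folder in one go. Stripping the mailing list footers, which the second script takes care of, is left out of this sketch.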
I also added a simple about page.
I uploaded this version since it had the main parts in place already.
Then I tweaked it a bit more.
I did not like the long URL https://devopsweeklyarchive.github.io/, so I bought a slightly shorter one: https://devopsweeklyarchive.com/. Fortunately, GitHub Pages supports such alternative names; only a CNAME file has to be provided.
GitHub Pages now offers HTTPS for domains with alternative names, but I also wanted to add Cloudflare to the picture. You can say it is overkill, but this is a hobby project and I wanted to play a bit with that as well. Also, if you can have DDoS protection and a worldwide CDN for free, then why not? Anyway, it was just as simple to configure as a few years ago. Originally I wanted to buy the domain from them as well, but currently they offer transfers only. Although this page sounds quite attractive.
And a few more cherries on top of the cake.
I wanted to add the basic Google management tools, so I registered for Analytics and Search Console. Connecting them with the Minimal Mistakes template was very simple; it took just a few lines in the configuration.
I wanted to create a site map as well since the site structure is very basic, but when I checked, Minimal Mistakes already generated that for me.
The last piece was to make the site search a bit smarter. Minimal Mistakes came with a default search engine, Lunr, but it searched only in the first 50 words and it was not very sophisticated.
So I chose to enable Algolia. Fortunately, it is fully supported by both Minimal Mistakes and Jekyll.
And then I reached the project’s most puzzling problem.
Why do these 300 pages generate 10000+ objects in Algolia?
I needed the help of Algolia's support, and I had to dig a bit into its internals, to understand that every <p> tag is indexed separately so that paragraphs can be searched individually. They describe this process in their documentation.
Also, the original newsletter was organised to have two line breaks between the “body” of an article and the source link of the article, which meant that Jekyll generated two <p> tags from them.
Understanding this took a while, but then removing one of the line breaks was quite simple.
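To make the effect concrete, here is a small Python sketch. It is an illustration, not the code used for the archive, and it assumes that the source link is a bare URL on its own line. It counts the blank-line-separated blocks, each of which Markdown renders as its own <p>, and shows how removing the blank line before each link halves that count.

```python
import re


def count_paragraphs(markdown_text):
    """Count blank-line-separated blocks; each becomes one <p> in the HTML."""
    blocks = re.split(r"\n\s*\n", markdown_text.strip())
    return len([block for block in blocks if block.strip()])


def attach_source_links(markdown_text):
    """Drop the blank line before a line that holds only a URL.

    With a single line break left, the link stays in the same paragraph
    as the article text above it.
    """
    pattern = r"\n[ \t]*\n(?=https?://\S+[ \t]*(?:\n|$))"
    return re.sub(pattern, "\n", markdown_text)


# Made-up sample in the assumed layout: text, blank line, link.
sample = (
    "First article summary.\n"
    "\n"
    "http://example.com/a\n"
    "\n"
    "Second article summary.\n"
    "\n"
    "http://example.com/b\n"
)

before = count_paragraphs(sample)
after = count_paragraphs(attach_source_links(sample))
print(before, after)
```

On this sample, the two articles count as four paragraphs before the change and two after it, and the number of indexed records shrinks in the same proportion.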
Another tricky question was how to integrate the indexing step into a static system. Luckily, they have a nice solution to this problem: you can configure Travis to execute some steps triggered by Git pushes. Travis is a much more capable CI system than this, but solving this particular problem does not need its other capabilities. Essentially, I only needed to copy their example Travis file to the new repository. So now we have smart, typo-resistant search functionality.
Finally, the Devops weekly newsletter got its public archive.