HN Tags
Jul 28, 2025
python
programming
llm
ollama
Alright, all the cool kids are doing cool things with LLMs. Now’s my time to make a mess with them.
Yay, it’s yet another clone of Hacker News? No, not quite that.
Categories
I found myself wondering the other day just what proportion of stories on Hacker News are currently about AI now that LLMs are the flavour of the month. It would be handy if the stories came neatly filed into categories in some way so that I could get a feel for that.
As tends to be the way, I ended up adding this to my ludicrously long list of “things I should play around with at some point, possibly, not today, obviously, but soon.”
It’s also traditional for me to snaffle some kind of suitable domain name as and when these urges take me. I think it’s part of the procrastination cycle; buying a domain name feels like progress towards the goal, but is incredibly low effort, and thus feeling like I’m making progress I wander off and get attracted to the next shiny thing. Hmm.
As it happened, though, I was also casting around for a project to get my extremely rudimentary (like virtually non-existent) Python skills up to some slightly less broken level - and I have some time off in which to do it. Yay?
So … I did the thing; I now have a little Python project that does the following:
- Sucks down the top 30 stories and comments from the Hacker News Firebase feed
- For each story it…
  - Asks an LLM to roughly categorise the story
- With the stories categorised, it then generates an html copy of the HN front page with category tags
- For each tag there’s a corresponding category page with the list of stories
- The results are pushed up to an S3 bucket where they are made available via a Cloudfront distribution
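As a rough sketch of the fetch and group steps above (not the project's actual code): the two Firebase endpoints are the public Hacker News API, while the function names, the `fetch` parameter, and the `"category"` key are made up for illustration.

```python
import json
from collections import defaultdict
from urllib.request import urlopen

# Public Hacker News Firebase API.
HN_API = "https://hacker-news.firebaseio.com/v0"


def fetch_json(url):
    """Fetch a URL and decode its JSON body."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


def top_stories(limit=30, fetch=fetch_json):
    """Return up to `limit` front-page items as dicts.

    `fetch` is injectable (an assumption of this sketch) so the function
    can be exercised without touching the network.
    """
    story_ids = fetch(f"{HN_API}/topstories.json")[:limit]
    items = (fetch(f"{HN_API}/item/{story_id}.json") for story_id in story_ids)
    # The API returns null for items that no longer exist, so skip those.
    return [item for item in items if item]


def comment_url(story):
    """Link back to the real Hacker News comment page for a story."""
    return f"https://news.ycombinator.com/item?id={story['id']}"


def group_by_category(stories):
    """Map each category to its stories, ready for one page per tag.

    Assumes each story dict has gained a hypothetical "category" key
    during the classification step.
    """
    pages = defaultdict(list)
    for story in stories:
        pages[story.get("category", "Uncategorised")].append(story)
    return dict(pages)
```

Each story is fetched separately, which is one reason the result is a rough view rather than a true snapshot.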
This runs hourly, and the retrieval of the stories and comments and the classification are not atomic, so this is not strictly speaking a snapshot of the HN front page, but it does give you a rough view of which stories (with categories) were up there within the last hour or so.
Incidentally, my goal here is mostly personal entertainment - I don’t want to deprive HN of any traffic, so all the story and comment links go directly to the corresponding real Hacker News comment page.
Where does this run
I have a terror of accidentally running up some vast AWS bill, so except for the static web hosting everything is running on a tiny Intel NUC box running Debian on my home network. Plus it’s going to be a lot harder for some ne’er-do-well to hack, as it just sits on my network and occasionally pushes stuff up to the AWS bucket before triggering a CloudFront invalidation. Of course, now I’ve made this public, I suppose I’m just asking for someone to figure out a cunning ploy via a Hacker News comment. Anyway, this mighty beast of a server has an Intel i5-8259U CPU and 8 GB of memory.

The Intel NUC (the top one)
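The push-then-invalidate step mentioned above might look something like this with boto3. This is a sketch under assumptions: the function names and arguments are hypothetical, and nothing here is taken from the project itself. Only `upload_file` and `create_invalidation` are real boto3 calls.

```python
import time


def invalidation_batch(paths, reference=None):
    """Build the InvalidationBatch structure CloudFront expects.

    CallerReference must be unique per request; a timestamp is one
    simple way to get that.
    """
    return {
        "Paths": {"Quantity": len(paths), "Items": list(paths)},
        "CallerReference": reference or str(time.time()),
    }


def publish(local_path, bucket, key, distribution_id):
    """Upload one generated page, then ask CloudFront to drop its cached copy.

    All four arguments are placeholders for illustration.
    """
    import boto3  # third-party; imported lazily so the helper above works without it

    boto3.client("s3").upload_file(
        local_path, bucket, key, ExtraArgs={"ContentType": "text/html"}
    )
    boto3.client("cloudfront").create_invalidation(
        DistributionId=distribution_id,
        InvalidationBatch=invalidation_batch([f"/{key}"]),
    )
```

Because the box only ever makes outbound calls like these, nothing on the home network needs to accept incoming connections.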
Although I have a decent-ish Nvidia 3090 graphics card in another machine on my home network, it behaves so much like a space heater when booted that I don’t want it up and running 24/7. The Intel NUC turns out to be adequate to run the Qwen2.5:1.5b model under Ollama.
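For a feel of what asking a local Ollama model for a category can look like, here is a minimal sketch against Ollama's `/api/generate` endpoint. The prompt wording, the function names, and the injectable `post` parameter are all assumptions for illustration; the project's real prompt and code may be quite different. The lowercase `qwen2.5:1.5b` tag is assumed to be how the model is named locally.

```python
import json
from urllib.request import Request, urlopen

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local address
MODEL = "qwen2.5:1.5b"  # assumed local tag for the model named in the post


def build_prompt(title):
    """Hypothetical prompt; not the wording the project actually uses."""
    return (
        "Give one short category for this Hacker News story. "
        "Reply with the category only.\n"
        f"Title: {title}"
    )


def post_json(url, payload):
    """POST a JSON payload and decode the JSON reply."""
    request = Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(request, timeout=120) as response:
        return json.load(response)


def categorise(title, post=post_json):
    """Ask the model for a category and return it as a stripped string."""
    # stream=False makes Ollama return one JSON object instead of chunks.
    payload = {"model": MODEL, "prompt": build_prompt(title), "stream": False}
    reply = post(OLLAMA_URL, payload)
    return reply["response"].strip()
```

The generous timeout is deliberate: a small CPU-only box can take a while per story.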
Note also that the “running-in-my-home-office” nature of this means that there are zero guarantees for uptime on the site; any time I want to move the box, if I’m on vacation, if the fan gets irritating, anything like that and I’ll either turn it off or reduce the frequency with which it runs.
Limitations
I think the biggest limitation is that it doesn’t really work all that well. It does do the classification of the stories, and the classifications are usually quite reasonable, but it’s hard to get it to do something consistent that I fully agree with.
In particular, my raison d’être of figuring out which are the AI stories didn’t really work out that well, because the model tends to be a bit more wordy and drops things I would think of as being just “AI” into finer groups such as “AI Development”, “AI IDEs”, “LLM tooling”, and so on.
I have a few possible approaches in mind to try to squash that down to something sane, but for now I’ll just run with it - they’re not totally crazy distinctions at least.
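One possible way to squash those finer groups down, offered purely as an illustration and not necessarily one of the approaches the post has in mind, is a post-processing pass over the model's labels. The word list and the `squash` name are assumptions.

```python
import re

# Hypothetical list of words that mark a label as "really just AI".
AI_WORDS = {"ai", "llm", "llms"}


def squash(category):
    """Collapse any label containing an AI-ish word into plain "AI".

    Matching on whole words avoids false hits on labels such as
    "Retail" or "Email" that merely contain the letters "ai".
    """
    words = set(re.findall(r"[a-z0-9]+", category.lower()))
    return "AI" if words & AI_WORDS else category.strip()
```

A fixed word list is crude and will miss labels like “Machine Learning”, but it is cheap to run after the fact and leaves every other category untouched.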
After
The reason I was wondering about categories in the first place was that I was trying to create a plausible list of bandwagons that the IT industry has leapt aboard over the last 25 years or so. Things like LLMs are just the latest in a long line that included things like minicomputers, microprocessors, Multimedia CD ROMs, Bitcoin, NFTs, Internet of Things, and so on. That’s not to cast aspersions on these things necessarily; there’s no doubt at all that “the web” and “ecommerce” were in that list and they clearly had a lasting impact.
I thought it would be interesting to validate my plausible list by seeing what categories were highly visible to the Hacker News crowd during its reign. This will no doubt be a bit off base, with things like that Erlang incident that we don’t talk about, but it ought to give me a feel for whether I’m completely deluded with the topics in my list.
So at some point, when hntags is categorising to my satisfaction, that article will probably be based on the same logic that I’m using here.
Don’t hold your breath though. Oooh, I saw a shiny thing…
Python
So how am I doing with the Python stuff? Well, my code’s not going to win any awards, but I’m getting a better understanding of Python syntax, have a bit of an understanding of what the idiomatic way to do stuff is (list comprehensions are quite cool), and I’m sort of getting to grips with the packaging mechanisms. I wouldn’t want to write any commercial code just yet, but I feel like I’m getting there. uv seems quite nice, for what it’s worth.
Resources
The main resource for this project is the GitHub repository and its README. That includes a section on Weaknesses, Fear, Uncertainty, Doubt which is essentially a stream-of-consciousness dump on where this is and where it’s going.