What I learned in my 4 years of open source development.
Posted on 2024-10-05 19:03:05
The Story Behind Simplex: A Journey in Open-Source Development
I am the developer behind the open-source Discord bot called Simplex. In this article, I’ll walk you through why the project started, some of the decisions made along the way, and the issues and bug fixes we ran into. I started Simplex when I was 14, and at the time of writing it has 84K users across 500 servers. I hope you enjoy reading about the journey as much as I have enjoyed living it.
How It Began
I started Simplex 3 years ago (15 October 2021). Back then, it wasn’t even called Simplex; it was the tutorial bot for Michael Media Group, where I was making videos teaching Python. It’s unclear where the tutorial bot ended and Simplex began; the first GitHub commit is dated 25 October 2021. The bot was a testing ground for ideas that I could then make tutorials about. However, the tutorials never came; I just kept packing ideas into the testing bot.
The original goal was to help people build something themselves instead of paying large monthly subscriptions for bots like Mee6. Once the bot was public, a big motivation became replacing all of the paid tools my friends used with one open-source bot that cared about the community.
When I was starting out, I thought all of these fees were arbitrary: the costs seemed ridiculously high for no reason. Once my bot started to get some traction, I realised some form of income was needed to cover the running costs, so I set up a Buy Me a Coffee page. I also realised, though, that some bots (like Mee6) have no justification for all the hidden charges they impose on end users. Bots like that upset me, as their goal is clearly to take as much money as possible from younger people who don’t understand what they are paying for.
Moving Simplex to the Cloud
Simplex started out on discord.py, hosted locally on my computer. That worked fine: only a few of my mates and I used the bot, and I was the first to wake up and the last to go to bed.
As the bot grew to maybe 20 or 30 servers, I decided that leaving my PC on was not a good solution, so I moved it to the cheapest DigitalOcean plan available. I was already running a root node for a little-known cryptocurrency, which I had set up a year earlier when I was 14. Later, I moved to Hetzner because it offered more RAM for less money, and I’m still using it today.
The Tech Stuff
When starting, I had to choose between discord.js and discord.py. I had used Python for small tasks and automation before, whereas the most I had managed in JS was printing "Hello World", so Python was the clear choice. discord.py was relatively easy to use, and I used it for the first six months of the bot until it was discontinued, which left me with two options: Pycord or Hikari.
I chose Pycord because it is very similar to discord.py (in fact, Pycord is a fork of discord.py). Around this time, I also got help from a programmer called Sid, who made a few commits to create the popular leaving system we have (and who also accidentally deleted all of our user data for the second time). (After some convincing, I’ve added this story.)
Sid has popped up every now and then with useful commits and feedback. Around this time we also had a boom in people inviting the bot: it blew up within a week, which led to issues with data going missing (even without Sid deleting it).
The Database Journey
At first, I used JSON files as the database. Looking back, that was a terrible idea and never should have happened, but at the time I was a 14-year-old who had never had a project that needed to store data long-term. JSON seemed easy to use. The problem was that if two servers tried to update the JSON file at the same time, each would read the file, change its own copy, and write the whole thing back, so one server’s changes could silently overwrite the other’s.
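Here is a minimal sketch of that lost-update problem, using a hypothetical counts.json file that maps guild IDs to counts (not Simplex’s real layout). Two threads stand in for two servers saving at the same moment, and a threading.Barrier forces the unlucky timing so the result is reproducible: both read the file before either writes, so the second write wipes out the first.

```python
import json
import os
import tempfile
import threading

# Hypothetical store for a counting bot: {guild_id: current_count}.
path = os.path.join(tempfile.mkdtemp(), "counts.json")
with open(path, "w") as f:
    json.dump({}, f)

# Makes both "servers" finish reading before either one writes,
# reproducing the unlucky timing deterministically.
both_have_read = threading.Barrier(2)


def save_count(guild_id: str, count: int) -> None:
    # 1. Read the whole file into memory.
    with open(path) as f:
        data = json.load(f)
    both_have_read.wait()
    # 2. Change one entry in this thread's private copy.
    data[guild_id] = count
    # 3. Write the whole file back, clobbering whatever the other writer saved.
    with open(path, "w") as f:
        json.dump(data, f)


threads = [
    threading.Thread(target=save_count, args=("111", 1)),
    threading.Thread(target=save_count, args=("222", 2)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

with open(path) as f:
    final = json.load(f)
print(final)  # only one of the two guilds survives
```

Whichever thread writes last wins, and the other guild’s count is gone without any error being raised.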
This, of course, was not wanted, so I learnt SQLite3, which I would highly recommend because it is simple and reliable. I am sure some of you can see the issue here. If you can’t, no worries: it took me several months to realise it. SQLite3 is blocking, not asynchronous (article 1: blocking vs non-blocking). While a query runs, it holds up the bot’s event loop, so no other command can be handled until it finishes. This could have slowed down the whole bot and its response times if the counting system became any more popular.
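A minimal sketch of why a blocking call hurts an asyncio bot, assuming time.sleep(0.2) as a stand-in for a slow sqlite3 query and a heartbeat coroutine as a stand-in for the bot’s other work. asyncio.to_thread (Python 3.9+) is used here as one standard-library way to move blocking work off the event loop; it is not necessarily what Simplex does.

```python
import asyncio
import time


def slow_query() -> None:
    # Stand-in for a slow, blocking sqlite3 call (assumed to take 0.2 s).
    time.sleep(0.2)


async def heartbeat(ticks: list) -> None:
    # Represents everything else the bot should keep doing, e.g. other commands.
    for _ in range(4):
        ticks.append(time.perf_counter())
        await asyncio.sleep(0.05)


async def longest_stall(blocking: bool) -> float:
    ticks: list = []
    task = asyncio.create_task(heartbeat(ticks))
    await asyncio.sleep(0)  # let the heartbeat record its first tick
    if blocking:
        slow_query()  # runs on the event loop thread and freezes it
    else:
        await asyncio.to_thread(slow_query)  # runs on a worker thread instead
    await task
    # The biggest gap between ticks is how long other work was held up.
    return max(later - earlier for earlier, later in zip(ticks, ticks[1:]))


blocked = asyncio.run(longest_stall(blocking=True))
responsive = asyncio.run(longest_stall(blocking=False))
print(f"blocking call:     other work stalled for {blocked:.2f} s")
print(f"asyncio.to_thread: other work stalled for {responsive:.2f} s")
```

With the blocking call, the heartbeat misses its turn for the full 0.2 s; with the worker thread, it keeps ticking roughly every 0.05 s.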
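Those blocking queries led me to rewrite all of our database code to use aiosqlite. aiosqlite still uses SQLite, just like SQLite3; in fact, it is built on top of Python’s standard sqlite3 module, so existing database files keep working. The difference is that aiosqlite runs each connection’s queries on a separate background thread and exposes them as coroutines you can await, so the bot keeps responding while a query runs.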
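To make that design concrete, here is a hypothetical, standard-library-only class that mimics the idea (it is not aiosqlite’s actual API, which is used via aiosqlite.connect): a single worker thread owns the sqlite3 connection, and each query is awaited through loop.run_in_executor so the event loop never blocks. The table and guild ID are made up for the example.

```python
import asyncio
import sqlite3
from concurrent.futures import ThreadPoolExecutor


class TinyAsyncSQLite:
    """Hypothetical illustration of the aiosqlite idea: one background thread
    owns the sqlite3 connection, and every call is awaited from the event loop."""

    def __init__(self, path: str) -> None:
        self._path = path
        # A single worker thread, so the connection is only ever used by it.
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._conn = None

    async def _run(self, fn, *args):
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(self._pool, fn, *args)

    async def connect(self) -> None:
        # Create the connection inside the worker thread.
        self._conn = await self._run(sqlite3.connect, self._path)

    async def execute(self, sql: str, params: tuple = ()) -> list:
        def work():
            rows = self._conn.execute(sql, params).fetchall()
            self._conn.commit()  # commit after every statement to keep the sketch simple
            return rows

        return await self._run(work)

    async def close(self) -> None:
        await self._run(self._conn.close)
        self._pool.shutdown()


async def main() -> list:
    db = TinyAsyncSQLite(":memory:")
    await db.connect()
    await db.execute("CREATE TABLE counts (guild_id INTEGER PRIMARY KEY, count INTEGER)")
    await db.execute("INSERT INTO counts VALUES (?, ?)", (111, 41))
    await db.execute("UPDATE counts SET count = count + 1 WHERE guild_id = ?", (111,))
    rows = await db.execute("SELECT count FROM counts WHERE guild_id = ?", (111,))
    await db.close()
    return rows


rows = asyncio.run(main())
print(rows)  # [(42,)]
```

Pinning the connection to one thread matters because, by default, Python’s sqlite3 refuses to let a connection be used from a thread other than the one that created it.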
Challenges with Counting
1. Evaluation Issues
Evaluation was an issue that kept rearing its ugly head. To start out, I was just using Python’s built-in eval() function, until I saw someone on Discord saying “eval is evil” and did some Googling. It turns out I had created a security vulnerability: any Python expression posted in a counting channel would be executed, which is enough to run arbitrary code.
Luckily, this was noticed very quickly, before anyone exploited it. My first thought was to filter the input down to just digits and the characters - + ! * / < >. However, this had two issues:
1. It would no longer evaluate binary, hexadecimal, etc. (This wasn’t a major issue, as I could reimplement them.)
2. The bot could be crashed by evaluating large numbers.
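A minimal sketch of that first character filter, assuming the whitelist is exactly the characters listed above (the helper name passes_filter is hypothetical). It shows both issues: hexadecimal is rejected because it needs letters, while an exponent tower sails through because ** is just two allowed * characters.

```python
ALLOWED = set("0123456789-+!*/<>")  # the characters listed above


def passes_filter(message: str) -> bool:
    # Hypothetical first fix: only let through messages made of allowed characters.
    return bool(message) and set(message) <= ALLOWED


print(passes_filter("__import__('os')"))  # False: letters and quotes are rejected
print(passes_filter("0x1F"))              # False: hex needs "x" (issue 1)
print(passes_filter("9**9**9"))           # True: passes, yet far too big to compute (issue 2)
```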
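It turns out that if you run eval("10 ** 10 ** 10") in Python, the program effectively crashes: Python tries to compute a number with ten billion digits, which can hang the bot for a very long time or exhaust its memory. (Fun fact: I found out that a few public bots have this issue, and I tried to reach out to some of them to let them know.) After experimenting with a few solutions, I decided to use a library to evaluate expressions safely and stop these huge-number (overflow) problems.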
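The post doesn’t name the library Simplex switched to, so the following is a swapped-in technique rather than Simplex’s code: a minimal sketch that parses the message with Python’s ast module and evaluates only numbers and arithmetic operators. The limits MAX_EXPONENT and MAX_ABS_VALUE, and the helper parse_count, are hypothetical; the key idea is that an oversized power is rejected before Python starts computing it.

```python
import ast
import operator

# Hypothetical limits; tune them for your own bot.
MAX_EXPONENT = 1000
MAX_ABS_VALUE = 10 ** 100

BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.FloorDiv: operator.floordiv,
    ast.Mod: operator.mod,
    ast.Pow: operator.pow,
}
UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def _check_size(value):
    # Reject complex results (e.g. (-8) ** 0.5) and anything far too large.
    if isinstance(value, complex) or abs(value) > MAX_ABS_VALUE:
        raise ValueError("number out of range")
    return value


def _visit(node):
    if isinstance(node, ast.Expression):
        return _visit(node.body)
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return _check_size(node.value)
    if isinstance(node, ast.UnaryOp) and type(node.op) in UNARY_OPS:
        return UNARY_OPS[type(node.op)](_visit(node.operand))
    if isinstance(node, ast.BinOp) and type(node.op) in BINARY_OPS:
        left, right = _visit(node.left), _visit(node.right)
        # Refuse huge powers *before* Python tries to compute them.
        if isinstance(node.op, ast.Pow) and abs(right) > MAX_EXPONENT:
            raise ValueError("exponent too large")
        try:
            result = BINARY_OPS[type(node.op)](left, right)
        except (ZeroDivisionError, OverflowError) as exc:
            raise ValueError(str(exc)) from exc
        return _check_size(result)
    # Names, calls, attributes, strings, etc. are never evaluated.
    raise ValueError(f"unsupported syntax: {type(node).__name__}")


def parse_count(message: str):
    """Return the message's numeric value, or None if it should be ignored."""
    try:
        return _visit(ast.parse(message, mode="eval"))
    except (SyntaxError, ValueError, RecursionError):
        return None


print(parse_count("2 ** 10"))           # 1024
print(parse_count("10 ** 10 ** 10"))    # None (rejected instantly)
print(parse_count("__import__('os')"))  # None
```

Anything that is not plain arithmetic, such as names, function calls, or attribute access, is rejected instead of run, which closes the hole that eval() opened.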
2. Base69
The last big thing with counting was base69. I had a computer science lesson about base 8 (octal) and hexadecimal, and a friend of mine said, “You know, it’d be funny if there were a way to use base69.” Half an hour later, during my lunch break, I had published a Python package, and that evening I decided to add it to the Simplex counting system.
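A base-69 converter is ordinary positional notation with 69 symbols: encoding repeatedly divides by 69 and keeps the remainders, and decoding multiplies back up digit by digit. The alphabet below (digits, lowercase, uppercase, then seven symbols) is an assumption; the real base69 package may use different symbols or ordering.

```python
import string

# Hypothetical 69-symbol alphabet; the real base69 package may differ.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase + "_@#$%&?"
assert len(ALPHABET) == 69
BASE = len(ALPHABET)


def encode(n: int) -> str:
    """Convert a non-negative integer to base 69 by repeated division."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, remainder = divmod(n, BASE)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def decode(text: str) -> int:
    """Convert a base-69 string back to an integer (Horner's method).
    Raises ValueError if a character is not in the alphabet."""
    n = 0
    for char in text:
        n = n * BASE + ALPHABET.index(char)
    return n


print(encode(69))    # 10
print(encode(2024))  # tn
print(decode("tn"))  # 2024
```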
Modular Design
Most people look at the number of files and databases in Simplex and ask two things:
1. “Why have you made so many database files when they could all be tables in one database file?”
2. “Why are there so many Python files?”
Each Python file is a Cog. As the discord.py docs put it, cogs let you “organize a collection of commands, listeners, and some state into one class”.
There are a few reasons for the modular design:
- Debugging: Sometimes it’s hard to work out exactly what is crashing the bot or where all the storage is going. Because cogs can be loaded and unloaded while the bot is running, I can switch off one suspect cog at a time without taking the whole bot down.
- Development: When working on my laptop, I don’t have copies of all the databases, but because each cog has its own database file, I can work on one cog and its corresponding database without a lot of setup.
- Open Source: This allows people to just download a cog file from GitHub and know it will work.
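A minimal sketch of the one-cog, one-database-file layout, using only the standard library. The class and method names (Cog, Bot, add_cog, remove_cog) echo discord.py, but this is a hypothetical model: in discord.py and Pycord the real mechanism is the commands.Cog class plus extensions loaded with Bot.load_extension and unloaded with Bot.unload_extension. The Counting and Welcome cogs and their tables are made up for the example.

```python
import os
import sqlite3
import tempfile


class Cog:
    """Hypothetical stand-in for a discord.py cog that owns one database file."""

    def __init__(self, data_dir: str) -> None:
        self.name = type(self).__name__.lower()
        self.db_path = os.path.join(data_dir, f"{self.name}.db")
        self.db = None

    def load(self) -> None:
        # Each cog opens (or creates) only its own database file.
        self.db = sqlite3.connect(self.db_path)
        self.create_tables()

    def unload(self) -> None:
        self.db.close()
        self.db = None

    def create_tables(self) -> None:
        raise NotImplementedError


class Counting(Cog):
    def create_tables(self) -> None:
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS counts (guild_id INTEGER PRIMARY KEY, count INTEGER)"
        )


class Welcome(Cog):
    def create_tables(self) -> None:
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS channels (guild_id INTEGER PRIMARY KEY, channel_id INTEGER)"
        )


class Bot:
    def __init__(self) -> None:
        self.cogs = {}

    def add_cog(self, cog: Cog) -> None:
        cog.load()
        self.cogs[cog.name] = cog

    def remove_cog(self, name: str) -> None:
        # Unloading one cog leaves every other cog, and its data, untouched.
        self.cogs.pop(name).unload()


data_dir = tempfile.mkdtemp()
bot = Bot()
bot.add_cog(Counting(data_dir))
bot.add_cog(Welcome(data_dir))
bot.remove_cog("welcome")  # e.g. while hunting down a bug
print(sorted(bot.cogs))    # ['counting']
```

Because each cog only knows about its own file, copying one cog and its database to another machine is enough to work on it in isolation.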
The Missing Data
We had two instances of data loss very early on. There were maybe a handful of servers at the time, and they were very understanding: they got that we were two 14-year-olds posting code on the internet.
What Went Wrong?
While working on the code, some blank database files got pushed to GitHub. When I was told that it was all clear, I ran a command I had added to the bot that pulls down the latest files and restarts the bot. Unfortunately, Sid’s blank testing databases were still in the repository, so they replaced our live data.
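One safeguard against this kind of accident (not something the post says Simplex does) is to snapshot every live database before a pull-and-restart command touches anything. A minimal sketch, assuming the live .db files sit in one folder; backup_databases is a hypothetical helper built on sqlite3’s Connection.backup method (Python 3.7+).

```python
import os
import sqlite3
import tempfile
import time


def backup_databases(data_dir: str, backup_root: str) -> list:
    """Hypothetical pre-deploy step: snapshot every live .db file into a timestamped folder."""
    target_dir = os.path.join(backup_root, time.strftime("%Y%m%d-%H%M%S"))
    os.makedirs(target_dir, exist_ok=True)
    copied = []
    for filename in sorted(os.listdir(data_dir)):
        if not filename.endswith(".db"):
            continue
        source = sqlite3.connect(os.path.join(data_dir, filename))
        target = sqlite3.connect(os.path.join(target_dir, filename))
        with target:
            # SQLite's online backup API copies a consistent snapshot of the database.
            source.backup(target)
        target.close()
        source.close()
        copied.append(os.path.join(target_dir, filename))
    return copied


# Usage: a throwaway "live" database standing in for the bot's real data.
live_dir = tempfile.mkdtemp()
db = sqlite3.connect(os.path.join(live_dir, "counting.db"))
db.execute("CREATE TABLE counts (guild_id INTEGER PRIMARY KEY, count INTEGER)")
db.execute("INSERT INTO counts VALUES (111, 42)")
db.commit()
db.close()

backups = backup_databases(live_dir, tempfile.mkdtemp())
copy = sqlite3.connect(backups[0])
rows = copy.execute("SELECT count FROM counts").fetchall()
copy.close()
print(rows)  # [(42,)]
```

If a pull then overwrites the live files, the timestamped copies can be restored by hand.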
What Did I Learn?
Key Lessons:
- Managing open source is more than just writing code.
- Preparing for success is just as important as preparing for failure.
- I learnt Python deeply, discovering its quirks and power.
If you enjoyed this post, feel free to join the Discord or Reddit to get updates and talk about the articles, or follow the RSS feed.