What I have learnt in 4 years of open-source development.

Posted on 2024-10-05 19:03:05

I am the maker and developer behind Simplex, an open-source Discord bot. In this article, I want to walk you through why the project started, some of the decisions made along the way, and the issues and bug fixes we ran into. I started Simplex when I was 14, and at the time of writing it has 84K users across 500 servers. I hope you enjoy reading about my journey as much as I have enjoyed living it.

I started Simplex 3 years ago (15 October 2021). Back then it wasn't even called Simplex; it was the michael media group tutorial bot, which I used while making videos teaching Python. It's unclear exactly where the tutorial bot ended and Simplex began, but the first GitHub commit is dated 25 October 2021. Simplex was meant to be a testing ground for ideas that I could then turn into tutorials. However, the tutorials never came; I just kept packing more and more ideas into this testing bot. The original goal was to help people build something themselves instead of paying large monthly subscriptions for bots like MEE6. Once the bot was published, a big driving factor was replacing all of the paid tools my friends used with one open-source bot that cared about the community. When I was starting out, I used to think all of the fees were arbitrary: the costs were ridiculously high for no reason. Once my bot started to get some traction, I realised some form of income was needed to cover the running costs, so I set up a Buy Me a Coffee page. I also realised that some bots (MEE6) have no justification for all the hidden charges they throw at end users. Bots like that upset me, as their goal is clearly to take as much money as possible from younger people who don't understand what they are paying for.

Simplex started out running on discord.py and hosted locally on my computer. That worked fine: I woke up before anyone else and went to bed last, and only a few of my mates and I used the bot. As the bot grew to maybe 20 or 30 servers, I decided that leaving my PC on was not a good solution, so I moved it to the cheapest DigitalOcean plan. I was already running a root node for a little-known cryptocurrency, which I had set up a year beforehand when I was 14. Later, I moved over to enter because it offered more RAM for a lower price, and I'm still using it today.

The tech stuff

When starting out, I had to choose between discord.js and discord.py. I had used Python for small tasks and automation before, whereas with JS I had only managed to print Hello World, so Python was the clear choice. discord.py was relatively easy to use, and I used it for the first six months of the bot until it was deprecated, which left me with two options: Pycord or Hikari. I chose Pycord because it has a lot in common with discord.py (in fact, Pycord is a fork of discord.py).

It was also around this time that I got help from a programmer called Sid, who made a few commits that created the popular leaving system we have. (He also accidentally deleted all of our user data, for the second time. After some convincing, I've added this story.) Sid has popped up every now and then with a useful commit and feedback. Around the same time, we had a boom of people inviting the bot. It blew up within a week, which led to data going missing without Sid deleting anything.

We were using JSON files as our database. Looking back, that was a terrible idea and never should have happened, but at the time I was a 14-year-old who had never had a project that needed to store data long-term. JSON seemed easy to use. What neither of us realised when testing locally was that if two different servers tried to write to the JSON file at the same time, one server's update could overwrite the other's. That was obviously not wanted, so I learnt SQLite3, which I would highly recommend because of how simple and reliable it is.

I am sure some of you can see the issue here. If you can't, no worries; it took me multiple months to realise it. Python's sqlite3 module is blocking, not asynchronous (see article 1, blocking vs non-blocking). Every database call could stall the bot's event loop, which risked slowing down response times if counting got any more popular, so I rewrote all of our database calls to use aiosqlite. aiosqlite still stores data in an SQLite database, just like sqlite3 does; it is simply a different library that wraps the same SQLite software in an asyncio-friendly interface.
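To make the JSON overwrite problem and the blocking-sqlite3 problem concrete, here is a minimal, standard-library-only sketch. It is not Simplex's code (Simplex uses aiosqlite), and the `counting` table, `init_db` and `increment` are hypothetical names for illustration. The increment runs inside one SQLite transaction (`BEGIN IMMEDIATE`), so two servers counting at once cannot overwrite each other the way two concurrent JSON read-modify-write cycles can, and the blocking `sqlite3` work is pushed onto a worker thread with `asyncio.to_thread` so the event loop stays free to answer other messages.

```python
import asyncio
import os
import sqlite3
import tempfile

# Hypothetical schema: one running count per Discord server (guild).
SCHEMA = """
CREATE TABLE IF NOT EXISTS counting (
    guild_id INTEGER PRIMARY KEY,
    current  INTEGER NOT NULL
)
"""


def init_db(path: str) -> None:
    """Create the database file and table if they do not exist yet."""
    conn = sqlite3.connect(path)
    try:
        conn.execute(SCHEMA)
        conn.commit()
    finally:
        conn.close()


def _increment_blocking(path: str, guild_id: int) -> int:
    """Blocking sqlite3 work; meant to run on a worker thread, not the event loop."""
    # isolation_level=None lets us issue BEGIN/COMMIT ourselves;
    # timeout makes a second writer wait for the lock instead of failing at once.
    conn = sqlite3.connect(path, timeout=10.0, isolation_level=None)
    try:
        conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
        try:
            conn.execute(
                "INSERT OR IGNORE INTO counting (guild_id, current) VALUES (?, 0)",
                (guild_id,),
            )
            conn.execute(
                "UPDATE counting SET current = current + 1 WHERE guild_id = ?",
                (guild_id,),
            )
            (value,) = conn.execute(
                "SELECT current FROM counting WHERE guild_id = ?", (guild_id,)
            ).fetchone()
            conn.execute("COMMIT")
        except BaseException:
            if conn.in_transaction:
                conn.execute("ROLLBACK")
            raise
        return value
    finally:
        conn.close()


async def increment(path: str, guild_id: int) -> int:
    """Async wrapper: the event loop keeps handling other events meanwhile."""
    return await asyncio.to_thread(_increment_blocking, path, guild_id)


async def demo() -> None:
    path = os.path.join(tempfile.mkdtemp(), "counting.db")
    init_db(path)
    # Twenty "simultaneous" counts for the same server; none are lost.
    results = await asyncio.gather(*(increment(path, 1234) for _ in range(20)))
    print(max(results))  # 20


if __name__ == "__main__":
    asyncio.run(demo())
```

Opening a fresh connection per call keeps each worker thread on its own connection, which sidesteps sqlite3's same-thread check; aiosqlite instead keeps one dedicated thread per connection, but the idea of keeping blocking I/O off the event loop is the same.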

Sadly, data going missing was not the only issue facing counting. Evaluation was an issue that kept rearing its ugly head. To start with, I was just using Python's built-in eval() function, until I saw someone on Discord saying "eval is evil" and did some googling. It turned out I had created a security vulnerability: any valid Python code posted in a counting channel would be executed. Luckily, this was noticed incredibly quickly, before anyone exploited it. My first thought was to filter the input down to just digits and - + ! * / < >. However, this had two issues. The first was that it would no longer evaluate binary, hexadecimal, etc. This was not a major issue, as I could reimplement them. The bigger issue was that the bot could be crashed by evaluating very large numbers. It turns out that if you run eval("10 ** 10 ** 10") in Python, it will hang or crash the program: exponentiation is right-associative, so Python tries to build 10 ** 10000000000, an integer with over ten billion digits. (Fun fact: I learnt that this is an issue with a few public bots, and I did try to reach out to a few of them to let them know.) After messing around with a few solutions, I decided to use a library to make evaluation safe and to stop overflow errors. The last big thing with counting was base69. I had a computer science lesson about base 8 and hexadecimal, and a friend of mine said, "You know what would be funny? If there was a way to use base69." Half an hour later I had published a Python package during my lunch break, and that evening I decided to implement it into the Simplex counting system. First, the bot checks whether the input is base69; if that fails, it goes into the simpcalc system before updating the database. I think that sums up the counting system.
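For anyone facing the same problem, here is a minimal sketch of one common alternative to eval(): parse the message with Python's ast module, allow only number literals and a whitelist of arithmetic operators, and refuse oversized exponents and results before they get expensive. This is not simpcalc's implementation, and the limits `MAX_INPUT_LENGTH`, `MAX_EXPONENT` and `MAX_ABS_VALUE` are arbitrary values picked for illustration. A handy side effect is that Python's own parser already understands binary and hexadecimal literals such as 0b101 and 0x1F.

```python
import ast
import operator
from typing import Union

Number = Union[int, float]

# Illustrative limits; tune them for your own bot.
MAX_INPUT_LENGTH = 200
MAX_EXPONENT = 100
MAX_ABS_VALUE = 10 ** 30

# Whitelisted operators: anything not listed here is rejected.
BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.FloorDiv: operator.floordiv,
    ast.Mod: operator.mod,
    ast.Pow: operator.pow,
}
UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_eval(expression: str) -> Number:
    """Evaluate a plain arithmetic expression without ever calling eval()."""
    if len(expression) > MAX_INPUT_LENGTH:
        raise ValueError("expression too long")
    tree = ast.parse(expression.strip(), mode="eval")  # raises SyntaxError on junk
    return _evaluate(tree.body)


def _evaluate(node: ast.AST) -> Number:
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        value = node.value  # bool and complex literals are deliberately excluded
    elif isinstance(node, ast.UnaryOp) and type(node.op) in UNARY_OPS:
        value = UNARY_OPS[type(node.op)](_evaluate(node.operand))
    elif isinstance(node, ast.BinOp) and type(node.op) in BINARY_OPS:
        left = _evaluate(node.left)
        right = _evaluate(node.right)
        # Check the exponent *before* computing, so 10 ** 10 ** 10 never runs.
        if isinstance(node.op, ast.Pow) and abs(right) > MAX_EXPONENT:
            raise ValueError("exponent too large")
        value = BINARY_OPS[type(node.op)](left, right)
    else:
        # Names, calls, attribute access, etc. are rejected, never executed.
        raise ValueError(f"unsupported syntax: {type(node).__name__}")
    if isinstance(value, complex) or abs(value) > MAX_ABS_VALUE:
        raise ValueError("result out of range")
    return value
```

In a bot, the call would be wrapped in `try`/`except (ValueError, SyntaxError, ZeroDivisionError, OverflowError)`, and any failure treated as "not a valid count".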

A big selling point for Simplex was unlimited free RSS feeds. I did not expect so many issues with them (besides me forgetting to add a remove button). Unlimited RSS feeds mean a lot of traffic, and a lot of traffic means rate limits. I had two issues with rate limits. The first was with Discord: only a certain number of requests can be sent to Discord (50 requests per second), and at the time of writing we have just over 600 RSS feeds. Sending out the feed updates used to grind the bot to a halt for an hour; during that time, responses to messages were delayed by up to 3 seconds, and one day a spike hit 13 seconds, so this had to be dealt with. Now we fetch 5 RSS feeds, then wait a second before fetching the next 5. While this may seem like overkill, it drastically reduced the latency of more important commands.

Another issue is users not understanding what an RSS feed is. People add any old website URL and expect something to happen. URLs don't take up much storage, so you might wonder what the problem is. The problem is that we still have to try to download the data from that URL and then parse what we think is an XML file. This takes time, bandwidth and processing power that could be used for something else. Once a month, we now go through and remove every RSS feed whose "LastArticle" column is set to NULL, and DM the server owner.

The last issue, which has now been fixed, was the layout of RSS feeds. Just when you think you have found every edge case and every possible layout, someone adds an RSS feed with a slightly different layout that the parser cannot read. After a lot of tweaking the reader, I managed to get it working for all standard layouts.
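The "5 feeds, then a one-second pause" schedule can be sketched as a small asyncio helper. This is a minimal illustration rather than Simplex's code: it assumes the five fetches in a batch run concurrently, and `run_in_batches` and `fake_fetch` are hypothetical names, with `fake_fetch` standing in for a real HTTP request and XML parse.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence, TypeVar, Union

T = TypeVar("T")
R = TypeVar("R")


async def run_in_batches(
    items: Sequence[T],
    worker: Callable[[T], Awaitable[R]],
    batch_size: int = 5,
    pause_seconds: float = 1.0,
) -> List[Union[R, BaseException]]:
    """Run `worker` over `items` a few at a time, sleeping between batches.

    Results come back in the same order as `items`. A failing item yields its
    exception instead of aborting the whole run.
    """
    results: List[Union[R, BaseException]] = []
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        # The items in one batch run concurrently...
        results.extend(
            await asyncio.gather(*(worker(item) for item in batch), return_exceptions=True)
        )
        # ...then pause, so outgoing requests (and the CPU time spent parsing)
        # are spread out instead of arriving in one burst.
        if start + batch_size < len(items):
            await asyncio.sleep(pause_seconds)
    return results


async def demo() -> None:
    async def fake_fetch(url: str) -> str:
        await asyncio.sleep(0.01)  # stand-in for an HTTP request
        if "broken" in url:
            raise ValueError("not an RSS feed")
        return f"parsed {url}"

    urls = [f"https://example.com/feed{i}.xml" for i in range(11)]
    urls.append("https://example.com/broken")
    results = await run_in_batches(urls, fake_fetch, batch_size=5, pause_seconds=0.1)
    print(sum(isinstance(r, Exception) for r in results))  # 1


if __name__ == "__main__":
    asyncio.run(demo())
```

With just over 600 feeds and batches of 5, a full pass is about 120 batches, so with one-second pauses it takes at least two minutes, rather than firing everything at Discord at once. Using `return_exceptions=True` means a single URL that isn't really an RSS feed does not stop the rest of the run.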

Simplex modular design

Most people who look at the number of files and databases in Simplex ask two things. The most common question is why we have so many database files when they could all be tables in one database file. The other is about the number of Python files we have. Each Python file is a cog. Cogs are "a collection of commands, listeners, and some state into one class" (discord.py docs). There are a few reasons for the modular design. The first is debugging: sometimes it's hard to work out exactly what is crashing the bot or where all the storage is going. Cogs can be loaded and unloaded while the bot is running, so if there are any issues I can unload a cog and see whether the bot starts running more smoothly. The second is that when working on my laptop I don't have copies of all the databases, so I can work on one cog and its corresponding database without a lot of set-up. The final reason is probably the most important one to me: it lets people download a cog file from GitHub and drop it in, knowing it will just work. The cog creates the database file it needs, without the hassle of setting up tables in a database they might already have. When I first started learning how code worked, it was mostly by tinkering with other people's projects and trying to implement what I learnt in my own. Simplex has always been about giving back to the open-source community. I wanted it to be as easy as possible for anyone, no matter their skill level, to tinker with and work on their own version of Simplex.
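The "each cog creates the database it needs" idea can be sketched with nothing but the standard library. This is a minimal illustration, not Simplex's actual code: `open_cog_database`, the `databases` folder and the `feeds` schema are hypothetical (only the `LastArticle` column name comes from this article). In a Pycord cog, something like this would be called from the cog's `__init__`, so the cog works the moment someone drops the file into their bot; the Pycord parts are left out so the sketch runs on its own.

```python
import sqlite3
from pathlib import Path

# Hypothetical folder where every cog keeps its own database file.
DATA_DIR = Path("databases")

# Hypothetical schema for an RSS cog; CREATE TABLE IF NOT EXISTS makes it safe to rerun.
RSS_SCHEMA = """
CREATE TABLE IF NOT EXISTS feeds (
    guild_id    INTEGER NOT NULL,
    channel_id  INTEGER NOT NULL,
    url         TEXT    NOT NULL,
    LastArticle TEXT
);
"""


def open_cog_database(cog_name: str, schema: str, data_dir: Path = DATA_DIR) -> sqlite3.Connection:
    """Open (creating if needed) the SQLite file that belongs to a single cog."""
    data_dir.mkdir(parents=True, exist_ok=True)
    conn = sqlite3.connect(data_dir / f"{cog_name}.db")
    conn.executescript(schema)  # only creates tables that are missing
    return conn
```

Because the schema only creates missing tables, opening the same cog's database again leaves existing data untouched, and unloading one cog never touches another cog's file.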

THE MISSING DATA

So, we lost data twice, and very easily. There were maybe a handful of servers at the time, and they were very understanding: they got that we were two 14-year-olds posting code on the internet. I want to make it clear that no one is at fault, and since the incidents we have put systems in place to stop it from happening again. Here is what happened. We were launching a new feature. I want to say it was leaving; it was definitely around the time we added leaving. While we were working on the code, some blank database files got pushed to Git. When I was told it was all clear, I ran a command I had added to the bot that pulled down the latest files and restarted the bot (I added it to save time with SSH, and because SSH was blocked at my school). The issue was that the testing databases from Sid's machine were in that push. The little script was set up to reset to HEAD if there were any issues (sometimes I'd tinker with the status message directly on the server), so the blank test databases replaced the live ones. Luckily for us, no one really minded. We had a backup from a week beforehand, and at that point we didn't have features with a lot of changing data.

So what have we done to prevent this? The first and biggest change is a .gitignore file, which means database files no longer get uploaded to GitHub. This is the most important change. The rest of the changes are in our workflows. We now make a copy of all the data before any big change, and I generate the database files on the server itself with a set-up script. Finally, there's the timing of updates: I aim to push all updates at 6 am UK time on a Sunday. Most people seem to be asleep then, so if something breaks it affects the fewest users. Normally I can roll back to the previous backup in 5 minutes, but it is nice knowing I have an hour or two before anyone notices. Since then, we have never lost data due to a mistake of our own.
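The "copy all the data before any big change" step is easy to script. Below is a minimal sketch, assuming the bot keeps one SQLite file per module in a single folder (as in the modular design described earlier); the folder names and the function name `backup_databases` are hypothetical. It uses sqlite3's `Connection.backup()` rather than a plain file copy, because the backup API produces a consistent snapshot even if the bot is writing to the database at the time.

```python
import sqlite3
from datetime import datetime, timezone
from pathlib import Path


def backup_databases(data_dir: Path, backup_root: Path) -> Path:
    """Copy every *.db file in data_dir into a new timestamped folder under backup_root."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d_%H-%M-%S")
    target_dir = backup_root / stamp
    target_dir.mkdir(parents=True)  # fails loudly rather than overwrite an existing backup
    for db_path in sorted(data_dir.glob("*.db")):
        source = sqlite3.connect(db_path)
        destination = sqlite3.connect(target_dir / db_path.name)
        try:
            # Online backup API: copies a consistent snapshot of the database.
            source.backup(destination)
        finally:
            destination.close()
            source.close()
    return target_dir
```

Calling something like `backup_databases(Path("databases"), Path("backups"))` right before a Sunday-morning deploy gives a timestamped folder to roll back to if the update goes wrong.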

What did I learn

I think there are a few lessons here. The main and probably most obvious one is how to be both a part of an open-source community and a manager of one. I did not realise how much more there was to open source besides writing code. I learnt how to communicate updates and changes in frameworks, and how to get people to believe in the project. A project the size of Simplex cannot survive without people's trust. Let's face it, Simplex wouldn't exist without you lot using it, recommending changes and code, or donating.

I also learnt that preparing for success is just as important as preparing for failure. I was never expecting anyone to use Simplex when I started. If we're being honest, this was my first personal project, and more than once I had issues that could have been prevented if I had expected success. I would have used more reliable tools, and made sure I had enough bandwidth and storage on the server to start with.

Probably the most obvious lesson of all: I learnt Python. While I could write Python beforehand, it was only ever basic projects or a small API here and there. It was only once I started working on a project the size of Simplex that I really understood what Python could do, along with all of its little quirks. Python can be such a powerful language, with some incredibly interesting little features. I don't think I'll ever discover them all, and that is one of the reasons I have now started this blog: so I can share everything I learn about Python.

I have never tried to write something like this before, so feedback would be brilliant.

If you enjoyed the blog, feel free to join the Discord or Reddit to get updates and talk about the articles, or follow the RSS feed.

Support articles

article 1 (blocking vs non-blocking)
