How four characters caused 579 errors
When I started work this morning, my goal was to eliminate errors from our server log. We have a couple functions that occasionally throw errors when given weird input. Typically they don’t matter because no data gets messed up and the user doesn’t see anything.
However, they are problematic because they pollute the logs, making it harder to find real errors. After looking at them every day for a while, I decided to get rid of them. Unfortunately, in doing so I managed to introduce a real error. It didn’t affect any users, but it did affect our data.
So I started combing through the logs and looking for errors. Most
of them were easy to fix. Throwing in an
block usually got the job done. Other times preconditions that used
to be true weren’t true any more.
if 'family_id' in request.GET: family = Family.objects(id=request.GET['family_id']).first() if family: request.session['family_id'] = family.id
Generally speaking, calling
objects(...).first() instead of
objects.get(...) is a good way to avoid handling
MultipleObjectsReturned errors. However, the parameters have
to be valid, otherwise it’ll return a
I saw a bunch of
ValidationErrors in the logs, and this was the
culprit. Turns out that
family_id was sometimes set to
which is an invalid ID. I knew it was an easy fix, so I changed it
try: family = Family.objects(id=request.GET.get('family_id')) except (ValidationError, Family.DoesNotExist): pass if family is not None: request.session['family_id'] = getattr(family, 'id', None)
That took care of the
ValidationErrors, but I introduced two more
subtle bugs. So subtle, in fact, that our unit tests missed them.
I pushed my commit and went on with my day.
A couple hours later, Cody let me know that the logs were exploding with errors. The messages were cryptic, but they started around the time I pushed that commit. My heart sunk. I immediately started thinking about the time I accidentally deleted all our data and hoped it wasn’t that bad.
I looked over my commits. (GitHub’s compare view helped
immensely.) At first, nothing seemed amiss. I finally got it down
to three possible commits and scrutinized every bit of them. That’s
when I saw it: I wrote
Family.objects(...) instead of
Family.objects.get(...). That missing
.get meant I was returning
QuerySet instead of a
That wouldn’t have been so bad if it wasn’t for the other change I
made further down. Instead of using
family.id, which would’ve
AttributeError when called on a
QuerySet, I used
getattr(family, 'id', None), which silently returns
I quickly changed the offending code and pushed a new commit. The final working code looked like this:
try: family = Family.objects.get(id=request.GET.get('family_id')) except (ValidationError, Family.DoesNotExist): pass if family is not None: request.session['family_id'] = family.id
During the six hours that the code was missing that very important
.get, it managed to cause 579 errors. Luckily they didn’t affect
any users and only had a minimal impact on our data.
After slamming my head on the desk for a couple minutes, I wrote a handful of test cases to make sure this problem never happened again.