While most online tutorials describe InfluxDB as a very good time series database, we discovered that there are some serious questions you should consider before selecting a time series database for your next IoT project.
Here are some of our findings, along with questions we should have asked, from when we researched and selected a time series database for a new IoT system design:
- Designing a stable database engine suitable for production typically takes 10+ years of user feedback, experience from smaller-scale testing, and continuous improvement. Several of the “high ranking” databases have not been around for that long.
- MySQL and Oracle have stable database engines by now, but they are not time series databases.
- InfluxDB has had its database engine rewritten several times within a few years. This may be a hint that its developers are still struggling to converge on a stable database design.
- We found several very worrying statements in bug reports related to InfluxDB, such as “please help, our production database is suddenly corrupt”, “the database has crashed, the backup is also corrupt. There must be something fundamentally wrong with storage”, “we have discontinued functions iusedthisextensivelyandthereisnowayaroundit() in the next release due to a complete rewrite of the database engine”, etc.
- Some databases have very inefficient compression algorithms. They will compress, but reading from a compressed table carries such a large performance penalty that you are forced to use uncompressed tables.
- Some databases do not support writing to compressed tables (!). You have to decompress before writing, which takes a lot of time. There is also no API to check whether the table you are about to write to is compressed, so you have to resort to parsing log messages before writing. Yes, really.
- Even if a database shows up high in a ranking, it may only be because it has the highest growth rate. That does not tell you much. If you look closer, you may discover that all the big players already use more conservative and well-proven time series database systems. The fact that people reading tutorials download and install InfluxDB does not mean that Google, Facebook, Amazon, etc. use it. It does not mean that it is the best time series database available.
- Some databases claim to support continuous aggregates. However, did you benchmark and test properly that the functions used for reading from the aggregates are stable, do not require 100% CPU, and do not crash your database engine? You may be in for a surprise.
- Some time series databases do not support regular data types and regular SQL syntax. This means you will need TWO databases: one for configuration data and one for time series.
- Some time series databases from the cloud vendors become extremely expensive as soon as you go into production. Did you actually check the cost? If not, you may be in for a BIG surprise.
- Even plotting time series in a web browser may be too slow. Did you check if your time series database can decimate and deliver data to your front end in real time?
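
To make the last point concrete, here is a minimal sketch of what "decimating" a series for plotting can look like. It is only an illustration under our own assumptions: the function name `decimate_min_max` and the bucket count are hypothetical, and it runs on the application side in Python. A database with built-in downsampling would ideally do the equivalent on the server, before the data is sent to the browser.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (timestamp, value)


def decimate_min_max(points: List[Point], buckets: int) -> List[Point]:
    """Reduce a time-ordered series to at most 2 * buckets points.

    Each bucket covers an equal share of the input samples. Only the
    minimum and maximum of each bucket are kept, so short spikes stay
    visible in the plot even after heavy reduction.
    """
    if buckets <= 0:
        raise ValueError("buckets must be positive")
    if len(points) <= 2 * buckets:
        return list(points)  # already small enough, nothing to reduce

    n = len(points)
    result: List[Point] = []
    for b in range(buckets):
        # Integer arithmetic splits the samples into near-equal chunks.
        chunk = points[b * n // buckets:(b + 1) * n // buckets]
        lo = min(chunk, key=lambda p: p[1])
        hi = max(chunk, key=lambda p: p[1])
        # Emit the extremes in time order; emit one point if they coincide.
        result.extend(sorted({lo, hi}))
    return result
```

With a plot a few hundred pixels wide, a call like `decimate_min_max(points, 500)` caps the payload at 1000 points no matter how many raw samples the query range contains. The question to ask of your database is whether it can produce such a reduced series fast enough for interactive zooming and panning.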