Why Docker was a natural choice for TellusR
TellusR started out as a simple plugin, and Solr plugins are simple jar files. So why are we using Docker containers in TellusR?
Some plugin jar files are pretty small, since compiled JVM code isn't all that big after all; others are considerably larger because they bundle their dependencies. Either way, a Solr plugin can be a single jar file, downloaded from somewhere and dropped into the correct folder on your server. After a quick Solr restart, the plugin is running. In the beginning, TellusR was that easy.
But of course, even though every project starts out with a small amount of code that compiles to an equally small amount of bytecode, projects inevitably grow. In the beginning, TellusR could give an overview of your Solr search speed, the number of searches, the number of zero-hit searches and similar metrics. None of this is very complicated. But we soon became interested in refining the way Solr builds its list of keywords for different documents, in hooking into the autocomplete suggester, in adding serious NLP to the search mechanisms to improve the search itself, and so on. At that point we found that we needed more than just the JVM to make everything work well.
The first of those needs was Python, which we turned to for its great NLP libraries. We still could have shipped everything in a single jar file: our Python NLP scripts could be packed into the jar, then extracted and run from there. But Python scripts need a runtime environment, an interpreter, so we would have to make sure that a usable version of Python, specifically Python 3, is installed on the machine where TellusR runs. How would users feel about having to install Python 3 in order to run a Solr plugin on the JVM? And how could we keep that simple?
Second, we needed to store data. The statistics that TellusR aggregates from your Solr searches are best kept in a database so they survive a machine reboot. Should we write a lot of code to accommodate whatever database the user might have on his or her server? And if there was no database on the machine, should we install one? How would that affect the installation process?
A third problem was on our own end. As the TellusR codebase grew, so did the number of developers involved. Some of us use Linux machines, others work on Macs. Getting file paths right across different operating systems raises the same difficulties as the Python problem. This is equally true for end users: we want TellusR to work on all machines, be they running Windows, BSD or other Unix variants. So how could we make sure that the files we need at runtime are found on the machine running TellusR?
The very simple answer to these worries is in the title of this article. Docker lets us distribute container images that bundle the full environment TellusR needs (an OS userland, the Python interpreter and whatever else the code depends on), not only the code that is supposed to run. Images built on slim base images take up little more space than the application itself needs, so the resulting download isn't much bigger than a jar file would be. The alternative would be a far more complicated installation procedure for TellusR. With Docker, our install script simply downloads three small components: a jar file (the «actual» Solr plugin), a Docker image for TellusR/NLP, and a Docker image for TellusR/Central. And as a nice little extra, Docker's CLI output during installation is elegant and easy to read, so since our install script shows it, we look very fancy!
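The three-component install described above can be sketched as a short Python dry run. This is not TellusR's actual install script: the jar URL, the image names (`example/tellusr-nlp`, `example/tellusr-central`), the container names and the Solr lib directory are all hypothetical placeholders. The function only builds the command lines (fetching the jar with `curl`, then `docker pull` and `docker run -d` for each container) and prints them, so it is safe to run anywhere.

```python
"""Dry-run sketch of a three-component, TellusR-style install.

All names below (jar URL, image names, container names, Solr lib dir)
are hypothetical placeholders, not TellusR's real values.
"""
from pathlib import Path
from typing import List

# Hypothetical image names; the real TellusR images may be named differently.
NLP_IMAGE = "example/tellusr-nlp:latest"
CENTRAL_IMAGE = "example/tellusr-central:latest"


def plan_install(jar_url: str, solr_lib_dir: Path) -> List[List[str]]:
    """Return, in order, the commands an install script could run.

    Nothing is executed here; the caller decides whether to print or run them.
    """
    # Take the file name from the last URL segment, e.g. "tellusr-plugin.jar".
    jar_name = jar_url.rsplit("/", 1)[-1]
    jar_target = solr_lib_dir / jar_name

    return [
        # 1. The "actual" Solr plugin: a plain jar dropped into Solr's lib folder.
        ["curl", "-fL", "-o", str(jar_target), jar_url],
        # 2. The NLP container brings its own Python 3 runtime and libraries.
        ["docker", "pull", NLP_IMAGE],
        # 3. The central container, with its own environment and fixed file paths.
        ["docker", "pull", CENTRAL_IMAGE],
        # Start both containers detached (hypothetical container names).
        ["docker", "run", "-d", "--name", "tellusr-nlp", NLP_IMAGE],
        ["docker", "run", "-d", "--name", "tellusr-central", CENTRAL_IMAGE],
    ]


if __name__ == "__main__":
    plan = plan_install("https://example.com/tellusr-plugin.jar", Path("/opt/solr/lib"))
    for cmd in plan:
        print(" ".join(cmd))
```

After the jar is in place, Solr still needs a restart, as with any plugin; that step is left out of the sketch because how Solr is started (system service, script or container) differs between servers.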
Happy docking your TellusR installation!