Maximizing Laziness 1
Do you know what I think about most while working?
Why am I working?
You see, my job isn’t about creating wonderful pieces of art. My job isn’t about solving the world’s problems. My job isn’t about entertaining people. My job is about finishing my PhD.
Quit the whining and take me to the tutorial.
If you are thinking about doing a PhD for any other reason than having a PhD, reconsider. Actually, no I’m just kidding. We just don’t like people like you. With your god damn ambition and your barbie doll outlook on life. And your happiness.
But I digress.
A PhD such as mine involves two kinds of activities:
1. Ask insightful scientific questions that drive the discovery of amazing new phenomena that may change the world. Try to answer the questions. You don’t need to be a scientist to do the former, and the latter is impossible (or at least somewhat unlikely). Some people think being in academia has made me a cynic. I can’t reasonably dispute this[1].
2. Running 10,000 simulations. Luckily for me, I’m not an experimentalist. Unluckily for me, I’m a computationalist[2]. And being a computationalist involves running countless simulations using weird esoteric software that was written by other computationalists. Or if you’re really stupid, software that you wrote yourself. And this means writing 10,000 input files.
At the moment these are Gromacs input files. This is what my normal day looks like:
1. Write input files 1 to 29. Use a template file because I’m smart. Write changes to each file manually because I’m not smart.
2. Launch simulations 3, 7, 13 on server3c
3. Launch simulations 5, 8, 1 on server1a
4. Procrastinate
5. Launch simulations 2, 9 on server13
6. All simulations crash because of wrong input
7. Shoot self with gun …
8. Gun doesn’t fire because of wrong input
9. :’(
10. Manually go through 29 input files and fix mistake
11. goto 2
Running 10,000 simulations means writing 10,000 input files. So when I’m working, I’m writing 10,000 input files. Do you know what I think about most while I’m working?
When you procrastinate: self intoxicate, generously lubricate, deeply aspirate
And automate![3]
Here, I’ll show you how to automate the creation of 10,000 Gromacs .mdp files. The same method can be used to create any kind of input file that is plain text, and it can also be used to generate any sort of repetitive code.
We’ll be using a template engine. What is a template engine, you may ask? I’m not entirely sure. But we can use one to automate the creation of many input files. And that is all that matters for now.
There are a few out there, but we’ll be using Ibis because it’s lightweight, it’s written in Python, it’s in the public domain, and its website is so gosh darn sexy.
It was written for creating static websites lazily (mine is one of these). But it should work the same for .mdp files, or any kind of text files.
First we should create a template. Here is a Gromacs template file for some simulations. I want to run 100 simulations, with temperatures ranging from 200 K to 300 K. My .mdp file template looks like this:
; template.mdp
; ...
; Crap I don't care about right now
; ...
ref_t = {{temp}} ; This is template markup. Note the double curly braces.
; ...
; More crap I don't care about right now
; ...
gen_temp = {{temp}} ; what we can do it twice!?!?
{{temp}} will hold the value for the temperature, and we’ll script creating the files with Python and Ibis. The name inside the braces, temp, is a key in a dictionary that we give Ibis, and Ibis replaces every {{temp}} with the value stored under that key. Of course, we will be sneaky and change that value on the fly.
Let’s start.
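Here is a rough sketch of the substitution step on its own. To keep it runnable without installing anything, it uses a tiny regex-based stand-in rather than Ibis itself; the render helper and the inline two-line template are made up for illustration. With Ibis installed, I believe the equivalent is ibis.Template(text).render(values), but check its documentation before trusting me.

```python
import re

# A two-line stand-in for template.mdp, inlined so the sketch runs on its own.
TEMPLATE = "ref_t = {{temp}}\ngen_temp = {{temp}}\n"


def render(template_text, values):
    """Replace each {{key}} in template_text with str(values[key])."""
    def lookup(match):
        # Raises KeyError if the template names a key the dictionary lacks.
        return str(values[match.group(1)])
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", lookup, template_text)


# The dictionary key is the name between the braces; one value fills both spots.
print(render(TEMPLATE, {"temp": 200}))
# ref_t = 200
# gen_temp = 200
```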
Now to generate the files. Kiss your SSD goodbye.
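What follows is a sketch of the generating loop, not the original script. It assumes 1 K steps from 200 K up to but not including 300 K (which gives exactly 100 files), a made-up naming scheme input_<temp>.mdp, and the same regex stand-in for the template engine instead of Ibis. The write_inputs helper and the mdp_out directory are hypothetical names.

```python
import re
from pathlib import Path

# Cut-down template.mdp, inlined so the sketch runs on its own.
TEMPLATE = "ref_t = {{temp}}\ngen_temp = {{temp}}\n"


def render(template_text, values):
    """Stand-in for the template engine: fill each {{key}} from values."""
    return re.sub(r"\{\{\s*(\w+)\s*\}\}",
                  lambda m: str(values[m.group(1)]), template_text)


def write_inputs(template_text, out_dir, temps=range(200, 300)):
    """Write one .mdp file per temperature and return the paths written."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for temp in temps:
        path = out_dir / f"input_{temp}.mdp"  # hypothetical naming scheme
        path.write_text(render(template_text, {"temp": temp}))
        paths.append(path)
    return paths


if __name__ == "__main__":
    written = write_inputs(TEMPLATE, "mdp_out")
    print(f"wrote {len(written)} files")
```

Run as a script, this should leave input_200.mdp through input_299.mdp in mdp_out.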
Each with the proper value of temperature.
“But wait!”, you say. “I am Smarty McNotLazyPants. I don’t need this. Why not just do a plain string replace?”
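Smarty presumably has something like the following in mind. This is a minimal sketch using Python’s built-in str.replace; the fill_temperature name and the inline template are invented for illustration.

```python
# Cut-down template.mdp, inlined so the sketch runs on its own.
TEMPLATE = "ref_t = {{temp}}\ngen_temp = {{temp}}\n"


def fill_temperature(template_text, temp):
    """Swap every literal {{temp}} for the given temperature."""
    return template_text.replace("{{temp}}", str(temp))


# One filled-in input per temperature, keyed by temperature in kelvin.
inputs = {temp: fill_temperature(TEMPLATE, temp) for temp in range(200, 300)}
print(inputs[250])
# ref_t = 250
# gen_temp = 250
```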
Well, I don’t know. You could just do that I suppose. But then you’re not Maximizing Laziness. And with a template engine you can do some more interesting things.
Let’s make things a little more complicated. I have a protein that I want to simulate in a variety of conditions. I’m going to do this because I don’t know what is going on. And maybe something interesting will happen if I run enough simulations. What else am I going to do?
I want to run simulations at a variety of temperatures and pressures, with the Berendsen barostat, itself with a number of different parameters, which I will tweak if the pressure is higher than 1.2 bar. I suspect that the most interesting case is at 285 K, so I will write the trajectory more frequently at that temperature and run the simulation with a smaller time step. I could just use a string replace (or even sed, shudder), but this way the script is slightly easier to read. And I can generate all the input files with one template file and one script.
Again, the template input file (I’ve cut the irrelevant bits out):
dt = {{dt}}
nstxtcout = {{out_freq}}
ref_t = {{temp}}
ref_p = {{pressure}}
gen_vel = yes
gen_temp = {{temp}}
compressibility = {{comp}}
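A sketch of what a script like awesome.py might look like, with the conditions applied in Python before the values reach the template. Only the 285 K, 1.5 bar values (dt 0.01, nstxtcout 5000, compressibility 5e-05) come from the output shown in this post; the defaults for every other case, the 5 K temperature grid, and the helper names are placeholders I made up. It also uses a regex stand-in for the template engine rather than Ibis.

```python
import itertools
import re
from pathlib import Path

TEMPLATE = """\
dt = {{dt}}
nstxtcout = {{out_freq}}
ref_t = {{temp}}
ref_p = {{pressure}}
gen_vel = yes
gen_temp = {{temp}}
compressibility = {{comp}}
"""


def render(template_text, values):
    """Stand-in for the template engine: fill each {{key}} from values."""
    return re.sub(r"\{\{\s*(\w+)\s*\}\}",
                  lambda m: str(values[m.group(1)]), template_text)


def parameters(temp, pressure):
    """Build the dictionary for one simulation."""
    interesting = temp == 285  # the case that gets finer sampling
    return {
        "temp": temp,
        "pressure": pressure,
        "dt": 0.01 if interesting else 0.02,           # 0.02 is a placeholder
        "out_freq": 5000 if interesting else 50000,    # 50000 is a placeholder
        "comp": 5e-05 if pressure > 1.2 else 4.5e-05,  # 4.5e-05 is a placeholder
    }


def write_inputs(out_dir, temps=range(200, 301, 5), pressures=(0.5, 1.0, 1.5)):
    """Write one .mdp file per (temperature, pressure) pair."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for temp, pressure in itertools.product(temps, pressures):
        path = out_dir / f"input_{temp}_{pressure}.mdp"
        path.write_text(render(TEMPLATE, parameters(temp, pressure)))
        paths.append(path)
    return paths


if __name__ == "__main__":
    write_inputs(".")
```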
$ python awesome.py
$ ls -v input*.mdp
input_200_0.5.mdp
input_200_1.0.mdp
input_200_1.5.mdp
...
$ cat input_285_1.5.mdp
dt = 0.01
nstxtcout = 5000
ref_t = 285
ref_p = 1.5
gen_vel = yes
gen_temp = 285
compressibility = 5e-05
Needless to say, the files you generate are still subject to the rules of the program you intend to use them with. Don’t blame me if something breaks.
The most interesting thing about this is that, with a combination of different templates and Ibis template markup, we can write fairly complicated scripts that produce a diverse set of input files. And because the whole thing is in Python, it is possible to automate the setup and running of many different simulations related to a certain system. But be careful, too much automation makes hair grow on your palms.
Ibis is extremely powerful and you can do all kinds of amazing things with it. It even supports complicated programming done in the template itself. Though I suspect I won’t be needing the complicated bits. I recommend visiting its beautiful website. Drink plenty of fluids.
For some resources on learning Python, see A Whirlwind Tour of Python and, especially for data scientists (basically all scientists), Python Data Science Handbook. These books are free, open source and focused on data processing and analysis. Why are there so many online resources for unimportant bullshit? WTF is Django? By the way, Django itself ships with a template engine.
[1] This sentence is very useful if you ever write a paper that gets to peer review.
[2] Noun: someone who runs computational experiments (simulations). Was that too hard, Oxford dictionary? How dare you call me a Psychologist? Ew.
[3] It took me only 90 minutes to come up with this rhyming joke. Good thing I did that at my office, where I don’t have anything important to do.