Maximizing Laziness 1
Do you know what I think about most while working?
Why am I working?
You see, my job isn’t about creating wonderful pieces of art. My job isn’t about solving the world’s problems. My job isn’t about entertaining people. My job is about finishing my PhD.
Quit the whining and take me to the tutorial.
If you are thinking about doing a PhD for any reason other than having a PhD, reconsider. Actually, no, I’m just kidding. We just don’t like people like you. With your god damn ambition and your barbie doll outlook on life. And your happiness.
But I digress.
A PhD such as mine involves two kinds of activities:
- Ask insightful scientific questions that drive the discovery of amazing new phenomena that may change the world. Try to answer the questions. You don’t need to be a scientist to do the former, and the latter is impossible (or at least somewhat unlikely). Some people think being in academia has made me a cynic. I can’t reasonably dispute this.[1]
- Running 10,000 simulations. Luckily for me, I’m not an experimentalist. Unluckily for me, I’m a computationalist.[2] And being a computationalist involves running countless simulations using weird esoteric software that was written by other computationalists. Or if you’re really stupid, software that you wrote yourself. And this means writing 10,000 input files.
At the moment these are Gromacs input files. This is what my normal day looks like:
- Write input file 1 to 29. Use a template file because I’m smart. Write changes to each file manually because I’m not smart.
- Launch simulation 3,7,13 on server3c
- Launch simulation 5,8,1 on server1a
- Procrastinate
- Launch simulation 2,9 on server13
- All simulations crash because of wrong input
- Shoot self with gun …
- Gun doesn’t fire because of wrong input
- :’(
- Manually go through 29 input files and fix mistake
- goto 2
Running 10,000 simulations means writing 10,000 input files. So when I’m working, I’m writing 10,000 input files. Do you know what I think about most while I’m working?
When you procrastinate: self intoxicate, generously lubricate, deeply aspirate
And automate![3]
Here, I’ll show you how to automate the creation of 10,000 Gromacs .mdp files. Though this method can be used to create any kind of input file that is plain text. And it can also be used to generate any sort of repetitive code.
We’ll be using a template engine. What is a template engine, you may ask? I’m not entirely sure. But we can use one to automate the creation of many input files. And that is all that matters for now.
There are a few out there, but we’ll be using Ibis because it’s lightweight, it’s written in Python, it’s in the public domain, and its website is so gosh darn sexy.
It was written for creating static websites lazily (mine is one of these). But it should work the same for .mdp files, or any kind of text files.
First we should create a template. Here is a Gromacs template file for some simulations. I want to run 101 simulations, one for every kelvin from 200 K to 300 K. My .mdp file template looks like this:
; template.mdp
; ...
; Crap I don't care about right now
; ...
ref_t = {{temp}} ; This is template markup. Note the double curly braces.
; ...
; More crap I don't care about right now
; ...
gen_temp = {{temp}} ; what we can do it twice!?!?
{{temp}} will hold the value for the temperature. And we’ll script creating the files with Python and Ibis. The name inside the braces, temp, is a key in a dictionary that we will give Ibis. Ibis will then replace every {{temp}} with the value stored under that key. Of course, we will be sneaky and change the value stored under the key on the fly.
Let’s start.
$ pip install ibis
$ mkdir input_files
$ cp template.mdp input_files/
$ cd input_files
# run.py
import ibis
# read the template file as a string into an ibis template
with open("template.mdp", "r") as tempFile:
    template = ibis.Template(tempFile.read())
# now for the fun bit:
for t in range(200, 301):
    d = {"temp": t}  # replace {{temp}} with the value in the variable t
    with open("temp_{}.mdp".format(t), "w") as outFile:
        outFile.write(template.render(d))
Now to generate the files. Kiss your SSD goodbye.
$ python run.py
$ ls -v temp_*
temp_200.mdp temp_215.mdp temp_230.mdp temp_245.mdp temp_260.mdp temp_275.mdp temp_290.mdp
temp_201.mdp temp_216.mdp temp_231.mdp temp_246.mdp temp_261.mdp temp_276.mdp temp_291.mdp
temp_202.mdp temp_217.mdp temp_232.mdp temp_247.mdp temp_262.mdp temp_277.mdp temp_292.mdp
temp_203.mdp temp_218.mdp temp_233.mdp temp_248.mdp temp_263.mdp temp_278.mdp temp_293.mdp
temp_204.mdp temp_219.mdp temp_234.mdp temp_249.mdp temp_264.mdp temp_279.mdp temp_294.mdp
temp_205.mdp temp_220.mdp temp_235.mdp temp_250.mdp temp_265.mdp temp_280.mdp temp_295.mdp
temp_206.mdp temp_221.mdp temp_236.mdp temp_251.mdp temp_266.mdp temp_281.mdp temp_296.mdp
temp_207.mdp temp_222.mdp temp_237.mdp temp_252.mdp temp_267.mdp temp_282.mdp temp_297.mdp
temp_208.mdp temp_223.mdp temp_238.mdp temp_253.mdp temp_268.mdp temp_283.mdp temp_298.mdp
temp_209.mdp temp_224.mdp temp_239.mdp temp_254.mdp temp_269.mdp temp_284.mdp temp_299.mdp
temp_210.mdp temp_225.mdp temp_240.mdp temp_255.mdp temp_270.mdp temp_285.mdp temp_300.mdp
temp_211.mdp temp_226.mdp temp_241.mdp temp_256.mdp temp_271.mdp temp_286.mdp
temp_212.mdp temp_227.mdp temp_242.mdp temp_257.mdp temp_272.mdp temp_287.mdp
temp_213.mdp temp_228.mdp temp_243.mdp temp_258.mdp temp_273.mdp temp_288.mdp
temp_214.mdp temp_229.mdp temp_244.mdp temp_259.mdp temp_274.mdp temp_289.mdp
Each with the proper value of temperature.
$ diff -y -d --suppress-common-lines temp_200.mdp temp_201.mdp
ref_t = 200 ; note the double curly braces. | ref_t = 201 ; note the double curly braces.
gen_temp = 200 ; what we can do it twice!?!? | gen_temp = 201 ; what we can do it twi
“But wait!”, you say. I am Smarty McNotLazyPants. I don’t need this. Why not just do
template.replace("{{temp}}", str(t))
Well, I don’t know. You could just do that I suppose. But then you’re not Maximizing Laziness. And with a template engine you can do some more interesting things.
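For the record, here is roughly what the Smarty McNotLazyPants route could look like with nothing but the standard library. This is only a sketch: the helper name render_with_replace and the two-line stand-in template are made up for illustration, and the real template.mdp has many more settings.

```python
# Plain-string alternative: no template engine, only str.replace.
def render_with_replace(template_text, values):
    """Replace every {{key}} in template_text with str(values[key])."""
    rendered = template_text
    for key, value in values.items():
        rendered = rendered.replace("{{" + key + "}}", str(value))
    return rendered


# A two-line stand-in for template.mdp (hypothetical, for illustration only).
template_text = "ref_t = {{temp}}\ngen_temp = {{temp}}\n"

# One rendered string per temperature, keyed by the file name it would get.
rendered_files = {
    "temp_{}.mdp".format(t): render_with_replace(template_text, {"temp": t})
    for t in range(200, 301)
}

print(rendered_files["temp_200.mdp"])
```

It works, but it only matches the exact text {{temp}}, so any variation in spacing inside the braces would be silently missed, and anything fancier than a straight substitution has to be written by hand.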
Let’s make things a little more complicated. I have a protein that I want to simulate in a variety of conditions. I’m going to do this because I don’t know what is going on. And maybe something interesting will happen if I run enough simulations. What else am I going to do?
I want to run simulations at a variety of temperatures and pressures, with the Berendsen barostat, which has a number of parameters of its own that I will tweak if the pressure is higher than 1.2 bar. I suspect that the most interesting case is at 285 K, so I will write the trajectory more frequently at that temperature and run the simulation with a smaller time step. I could just use a string replace (or even sed, shudder), but this way the script is slightly easier to read. And I can generate all the input files with one template file and one script.
Again, the template input file (I’ve cut the irrelevant bits out):
dt = {{dt}}
nstxtcout = {{out_freq}}
ref_t = {{temp}}
ref_p = {{pressure}}
gen_vel = yes
gen_temp = {{temp}}
compressibility = {{comp}}
# awesome.py
import ibis
def SomeFunction(*args):
    # placeholder: returns a fixed compressibility whatever its arguments are
    return 5e-5
# these parameters are for a Coarse-Grained simulation
pressures = [0.5, 1.0, 1.5]  # in bar
temperatures = range(200, 301)  # in K
out_freq1 = 25000
out_freq2 = 5000
dt1 = 0.02
dt2 = 0.01
comp1 = 4.5e-5
# read the template file as a string into an ibis template
with open("template.mdp", "r") as tempFile:
    template = ibis.Template(tempFile.read())
# now for the fun bit:
for t in temperatures:
    for p in pressures:
        paramDict = {"temp": t, "pressure": p, "dt": dt1,
                     "out_freq": out_freq1, "comp": comp1}
        if t == 285:
            # the interesting case: write output more often, smaller time step
            paramDict["out_freq"] = out_freq2
            paramDict["dt"] = dt2
        if p >= 1.2:
            # I can let someone else make this decision.
            paramDict["comp"] = SomeFunction(paramDict["temp"], paramDict["dt"], p)
        with open("input_{}_{}.mdp".format(t, p), "w") as outFile:
            outFile.write(template.render(paramDict))
$ python awesome.py
$ ls -v input*.mdp
input_200_0.5.mdp
input_200_1.0.mdp
input_200_1.5.mdp
...
$ cat input_285_1.5.mdp
dt = 0.01
nstxtcout = 5000
ref_t = 285
ref_p = 1.5
gen_vel = yes
gen_temp = 285
compressibility = 5e-05
Needless to say, the files you generate are still subject to the rules of the program you intend to use them with. Don’t blame me if something breaks.
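Since "all simulations crash because of wrong input" is a regular feature of my day, a cheap sanity check on the generated text might be worth the effort. The sketch below assumes that a typo in a dictionary key leaves behind either an empty value or some unrendered {{...}} markup, depending on how the template engine treats unknown names. The function name find_problems is made up, and the check knows nothing about what Gromacs actually accepts.

```python
import re

# Matches "key = value" with an optional "; comment" tail, as in .mdp files.
SETTING = re.compile(r"^\s*([\w-]+)\s*=\s*([^;]*)")


def find_problems(mdp_text):
    """Return (line number, description) pairs for suspicious lines."""
    problems = []
    for number, line in enumerate(mdp_text.splitlines(), start=1):
        if "{{" in line or "}}" in line:
            problems.append((number, "unrendered markup"))
            continue
        match = SETTING.match(line)
        if match and not match.group(2).strip():
            problems.append((number, "empty value for " + match.group(1)))
    return problems


good = "dt = 0.01\nref_t = 285 ; fine\n"
bad = "dt = 0.01\nref_t =  ; oops\ngen_temp = {{tmp}}\n"
print(find_problems(good))
print(find_problems(bad))
```

Some settings can legitimately be left empty, so treat the result as a hint about where to look, not as a verdict.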
The most interesting thing about this is that with a combination of different templates and Ibis template markup, we can write fairly complicated scripts that produce a diverse set of input files. And because the whole thing is in Python, it is possible to automate the setup and running of many different simulations related to a certain system. But be careful, too much automation makes hair grow on your palms.
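As a taste of the "setup and running" part, here is a sketch that builds one gmx grompp command per generated input file. It only builds the commands and does not run anything. The structure and topology file names (conf.gro, topol.top) are assumptions for illustration, as is the helper name grompp_command; substitute whatever your system actually uses.

```python
from pathlib import Path


def grompp_command(mdp_path, structure="conf.gro", topology="topol.top"):
    """Build the argument list for preprocessing one .mdp file."""
    mdp_path = Path(mdp_path)
    tpr_path = mdp_path.with_suffix(".tpr")  # input_285_1.5.mdp -> input_285_1.5.tpr
    return ["gmx", "grompp",
            "-f", str(mdp_path),   # simulation parameters
            "-c", structure,       # starting structure (assumed file name)
            "-p", topology,        # topology (assumed file name)
            "-o", str(tpr_path)]   # run input file to be written


# A small subset of the temperature/pressure grid, to keep the output short.
commands = [grompp_command("input_{}_{}.mdp".format(t, p))
            for t in (200, 285)
            for p in (0.5, 1.5)]

for command in commands:
    print(" ".join(command))
```

Each list could then be handed to subprocess.run(command, check=True) on a machine where Gromacs is installed, so that a failed preprocessing step stops the script instead of being discovered three servers later.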
Ibis is extremely powerful and you can do all kinds of amazing things with it. It even supports complicated programming done in the template itself. Though I suspect I won’t be needing the complicated bits. I recommend visiting its beautiful website. Drink plenty of fluids.
For some resources on learning Python, see A Whirlwind Tour of Python and, especially for data scientists (basically all scientists), the Python Data Science Handbook. These books are free, open source and focused on data processing and analysis. Why are there so many online resources for unimportant bullshit? WTF is Django? By the way, Django is itself built with a template engine.
[1] This sentence is very useful if you ever write a paper that gets to peer review. ↩
[2] Noun: Someone who runs computational experiments (simulations). Was that too hard, Oxford dictionary? How dare you call me a Psychologist? Ew. ↩
[3] It took me only 90 minutes to come up with this rhyming joke. Good thing I did that at my office, where I don’t have anything important to do. ↩