Firstly, I really appreciate Yihui’s post about me.
Today I want to share some tips with people in academia about how to reduce repeated work in research. You might treat this as an extension of Jeff Leek’s excellent book: How to be a Modern Scientist.
What counts as repeated work in research? A lot! Unless your experiments are all in silico, though, I will focus on the parts that come after you finish your experiments: repeated work in data analysis, writing, and presentation.
Doing the same thing multiple times wastes your time. It also introduces variance when you, or anyone else, try to reproduce your results. If we can't find a robot to do it, the next best choice is code. An important principle in research is that everything you do should be fully reproducible by other people, and by yourself. Whenever you do something research-related, make sure other people can always repeat it from the recipe you leave behind. Always treat your future self as ‘other people’: such code or text files will save you a lot of effort later.
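To make the "recipe" idea concrete, here is a minimal sketch of what an analysis written as code instead of clicks might look like. Everything in it is a hypothetical placeholder (the `SETTINGS` dictionary, the `trim` parameter, the sample names and values); the point is only that all settings live in one place, the same steps run on every dataset, and the settings are saved alongside the results so the run can be repeated.

```python
import json
import statistics

# All settings live in one place so the recipe is explicit and repeatable.
# The parameter name and value are hypothetical placeholders.
SETTINGS = {"trim": 1}


def analyze(values, trim):
    """Drop the `trim` smallest and `trim` largest values, then return the mean."""
    if 2 * trim >= len(values):
        raise ValueError("trim is too large for this many values")
    ordered = sorted(values)
    kept = ordered[trim:len(ordered) - trim]
    return statistics.mean(kept)


def run(datasets, settings=SETTINGS):
    """Apply the same analysis, with the same settings, to every dataset."""
    results = {name: analyze(vals, settings["trim"]) for name, vals in datasets.items()}
    # Keep the settings next to the results so others (and future you) can rerun it.
    return {"settings": settings, "results": results}


if __name__ == "__main__":
    data = {"sample_a": [1.0, 2.0, 3.0, 100.0], "sample_b": [2.0, 2.5, 3.5, 4.0]}
    print(json.dumps(run(data), indent=2))
```

Rerunning the script on the same input gives the same output, which is exactly what a series of GUI clicks can't promise.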
- Use a graphical user interface (GUI)?
Nooooooooo! No one can remember which buttons were clicked and which were not. If possible, record your operations with macros, and make sure those macros can be shared without requiring expensive software. If you have to use a GUI, avoid interactive operations that could affect your results.
- Version control?
Yes! Always put your files under version control and describe each change in a comment. Use words that human beings can understand; otherwise, prepare a code book for yourself. Trust me, you will need that code book more than anyone else. Git and GitHub are a good start.
Yes! Create your own RSS list and keep track of new papers passively. Always start with the abstract and end with a note that links to the original paper. Organize the notes as a book on your research topic. Once a new paper has become a note in your knowledge system or book, you can readily use it. Don't let the authors' framing guide you; keep your own system updated, as I did for metabolomics here.
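As a rough illustration of turning a feed into note stubs (abstract first, link to the original paper last), here is a minimal Python sketch using only the standard library. It assumes an RSS 2.0 feed where each `<item>` has `title`, `description` (used as the abstract), and `link`; real journal feeds vary, and Atom feeds use different tags. The sample feed is made up.

```python
import xml.etree.ElementTree as ET

# A made-up RSS 2.0 feed; real feeds may use different or extra fields.
SAMPLE_FEED = """<rss version="2.0"><channel>
<title>Example journal feed</title>
<item>
  <title>Example paper</title>
  <description>Abstract text goes here.</description>
  <link>https://example.org/paper-1</link>
</item>
</channel></rss>"""


def rss_to_notes(rss_xml):
    """Turn each <item> into a note: title, then abstract, then a link to the source."""
    root = ET.fromstring(rss_xml)
    notes = []
    for item in root.iter("item"):  # only <item> elements, not the channel title
        title = (item.findtext("title") or "").strip()
        abstract = (item.findtext("description") or "").strip()
        link = (item.findtext("link") or "").strip()
        notes.append(f"## {title}\n\n{abstract}\n\nSource: {link}\n")
    return notes


if __name__ == "__main__":
    for note in rss_to_notes(SAMPLE_FEED):
        print(note)
```

The notes are plain Markdown text, so they can go straight into the book for your topic and be kept under version control like everything else.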
- Plot/process similar data with the same settings?
- Make it clear?
- Market your ideas?
Share the source files behind your ideas as a GitHub repo; this will help your ideas spread like memes. You should also consider preprint servers such as arXiv, bioRxiv, and ChemRxiv to share your ideas in physics, life science, and chemistry, respectively.
- Share your ideas on social networking sites (SNS)?
- A bunch of functions on a similar topic?
- Ideas on a similar topic?
- Reproduce the whole environment for data analysis?
Use a Docker or Rocker image to pack all the software and dependencies you need, and as little else as possible, into a Linux image, then share it on Docker Hub. You can also distribute your raw data and scripts with your Docker image and put the links in your publications. That way, anyone can validate your results with the same settings. I also made one for metabolomics studies.
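A Docker image is the most complete option. If you also want a plain-text record of the exact package versions your scripts ran with (for example, a file kept in the repository next to your scripts), a small Python sketch like the one below could produce it. This is a lighter complement I am swapping in for illustration, not a substitute for the image, and the package names you pass in are your own.

```python
import platform
from importlib import metadata


def pin_versions(packages):
    """Return 'name==version' lines for installed packages, headed by the Python version."""
    lines = [f"# Python {platform.python_version()}"]
    for name in packages:
        try:
            lines.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            # Flag missing packages instead of silently skipping them.
            lines.append(f"# {name}: not installed")
    return lines


if __name__ == "__main__":
    # Replace this list with the packages your analysis actually imports.
    print("\n".join(pin_versions(["pip"])))
```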
- Share the data?
- Literature management?
Zotero or fulltext. However, as I showed here, all you need for literature management is the DOI. Just build your own knowledge system and organize the literature by topic. Such a system saves you a lot of time when you slot references into the sections of your papers, because you have already done that sorting at the very beginning.
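Since the DOI is the only key you really need, a topic outline can be as simple as a mapping from topic to DOI links. Here is a minimal Python sketch under a few assumptions: the DOIs come in mixed forms (`doi:` prefixes, doi.org URLs, trailing punctuation), they are compared case-insensitively, and the topics and DOIs below are hypothetical placeholders, not real papers.

```python
import re
from collections import defaultdict

# A DOI starts with "10.", a numeric registrant code, a slash, then a suffix.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/\S+")


def normalize_doi(text):
    """Extract a bare, lowercased DOI from strings like 'doi:10...' or a doi.org URL."""
    match = DOI_PATTERN.search(text)
    if match is None:
        raise ValueError(f"no DOI found in {text!r}")
    return match.group(0).rstrip(".,;").lower()


def build_outline(entries):
    """Group (topic, DOI) pairs by topic as https://doi.org links, dropping duplicates."""
    outline = defaultdict(list)
    for topic, raw in entries:
        link = f"https://doi.org/{normalize_doi(raw)}"
        if link not in outline[topic]:
            outline[topic].append(link)
    return dict(outline)


if __name__ == "__main__":
    # Hypothetical topics and placeholder DOIs.
    entries = [
        ("Data preprocessing", "doi:10.5555/example.1"),
        ("Data preprocessing", "https://doi.org/10.5555/EXAMPLE.1"),
        ("Statistical analysis", "10.5555/example.2."),
    ]
    for topic, links in build_outline(entries).items():
        print(topic, links)
```

When you start writing, each topic maps onto a section of your paper, and the links already point at the original sources.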
- I can’t remember those tips
Thanks again, Yihui! Actually, you developed most of the packages.