How to use jupyter notebooks efficiently?


#1

Dear ESDL Team and Users,

I wanted to ask some general questions. I struggle to understand how to use jupyter notebooks efficiently.

My plan was to migrate an already existing project to the ESDL framework to make use of the datasets provided there. The project consists of multiple self-written python functions distributed over many files. Usually, I use PyCharm as my IDE which works quite well.
However, now I am trying to understand how to handle this setup with jupyter notebooks.
If I put everything in one ipynb file, the file becomes quite large and possibly confusing. For the moment, I keep one main function in a Jupyter notebook, and the rest of the functions live in Python files which I import into the ipynb file.
However, I am not convinced that this is the best way. Furthermore, in the end we should provide a ‘fully reproducible workflow as a jupyter notebook’ and the workflow should of course be readable and reproducible.

So my question is how to develop bigger projects with many functions in jupyter notebooks?
Is it better to have all the code in one ipynb file, even if the file gets quite large?
How do the other Early Adopters handle their code?

Another, more JupyterLab-related question: is there a way to debug my code?

Best regards,
Laura


#2

Hey Laura (and all),

I personally think this is a really good question. I’ve struggled with this when dealing with Jupyter notebooks and JupyterLab in general.

When I’m constrained to just JupyterLab, I like to work with the notebook, the text editor and the terminal together, which is basically what you do. It’s not my favourite setup, as I prefer IDEs (VSCode is the one I like to use). When I’m developing in a Jupyter notebook and notice that I keep reusing a function, I create a .py file for it and move it there, with really good comments, examples of how to use it, and maybe some tests if I feel like it. That keeps my analysis notebooks much lighter. I got this strategy from the ‘Reproducible Data Analysis in Jupyter’ series by Dr. Jake VanderPlas, if anyone else is interested. He covers many aspects such as Python packaging, unit testing, refactoring and debugging. Overall, this workflow has worked pretty well for me in the past, and I don’t think it takes away from reproducibility as long as I don’t skip too many steps.
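A minimal sketch of what such a side file could look like. The file name helpers.py and the function normalise are made up for illustration; the docstring carries the usage example, and the doctest at the bottom is one lightweight way to get "some testing".

```python
"""helpers.py (hypothetical file name): functions moved out of the notebook."""


def normalise(values):
    """Scale a sequence of numbers linearly to the range [0, 1].

    Example
    -------
    >>> normalise([2, 4, 6])
    [0.0, 0.5, 1.0]
    """
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # All values are identical, so there is no range to scale over.
        return [0.0 for _ in values]
    return [(v - lowest) / (highest - lowest) for v in values]


if __name__ == "__main__":
    # Running `python helpers.py` checks the examples in the docstrings.
    import doctest

    doctest.testmod()
```

In the notebook, this would then only need `from helpers import normalise`.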

But yeah, I would like to pivot off of what Laura said: does having a jupyter notebook with a python package on the side count as a ‘fully reproducible workflow as a jupyter notebook’?

Cheers,
Emmanuel


#3

Hi Laura,

Maybe you could be more specific about what you do. In theory(!), the xarray environment should provide you with the tools to keep your Python code to a minimum. However, there might be more complex models that need dedicated software to be developed, e.g. a Primary Production Model.

Hence, a couple of remarks on notebooks: You are right, the lab is not a full IDE. It is a publishing tool. If you have heavy Python code, e.g. a complex model, I would indeed recommend developing the software in an IDE like PyCharm and installing it in your hub environment, as you have done. Organise your code on GitHub to stay reproducible. Once your code is up, you can access the Python files using the lab’s text editor, which does syntax highlighting. Notebooks also offer debugging (see below). Hence, adjustments are possible without moving code back and forth.

Another note on reproducibility: The sentence ‘fully reproducible workflow as a jupyter notebook’ is probably a bit confusing. Reproducible does not mean presenting every line of code you develop in a notebook. The notebook should focus on presenting your research. Simply speaking: one needs to be able to install your software (e.g. from GitHub) and push Run All. As a result, one should get a readable scientific document. You can even add your notebook to your GitHub repo. GitHub will render the document.

To answer your questions explicitly:

  • So my question is how to develop bigger projects with many functions in jupyter notebooks?

Use PyCharm or the like locally to develop and test your code.

  • Is it better to have all the code in one ipynb file, even if the file gets quite large?

No.

  • How do the other Early Adopters handle their code?

Use GitHub to organise your code. In the hub, use the terminal to pull your code from the repo and import it into your notebook. If you want to be more sophisticated, have a look at setuptools.
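As a rough sketch of the simple variant (no setuptools): once the repo has been cloned or pulled in the hub terminal, the first notebook cell can put the repo directory on Python's import path. The helper name add_repo_to_path, the directory ~/my-project and the module my_model are placeholders, not anything provided by the hub.

```python
import sys
from pathlib import Path


def add_repo_to_path(repo_dir):
    """Make the modules in a cloned repository importable from the notebook."""
    repo_path = str(Path(repo_dir).expanduser().resolve())
    if repo_path not in sys.path:
        # Insert at the front so the repo's modules are found first.
        sys.path.insert(0, repo_path)
    return repo_path


# Hypothetical usage in the first notebook cell, after `git clone` or
# `git pull` in the hub terminal:
# add_repo_to_path("~/my-project")
# import my_model
```

If the project is packaged with setuptools instead, an editable install (`pip install -e .` run inside the repo) should make this path handling unnecessary.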

  • Another more JupyterLab related question is if there is a possibility to debug my code?

Yes, you can. The notebook includes pdb. Have a look at this article: debugging-jupyter-notebooks
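A small sketch of how pdb can be used from a notebook cell. The function names here are invented for the example. The wrapper opens the debugger at the frame where an exception was raised, so you can inspect variables there; type `q` to leave the debugger.

```python
import pdb


def mean_of_ratios(numerators, denominators):
    """Average of the element-wise ratios; raises if a denominator is zero."""
    ratios = [n / d for n, d in zip(numerators, denominators)]
    return sum(ratios) / len(ratios)


def run_with_debugger(func, *args):
    """Call func(*args) and open pdb at the failing frame if it raises."""
    try:
        return func(*args)
    except Exception:
        # Interactive: starts a post-mortem session on the current exception.
        pdb.post_mortem()
        raise


# Hypothetical usage in a notebook cell (the zero triggers the debugger):
# run_with_debugger(mean_of_ratios, [1, 2], [2, 0])
```

Two alternatives: put `pdb.set_trace()` on the line where you want execution to pause, or run the IPython magic `%debug` in a new cell right after a cell has failed.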

I hope this helps

Helge