## Customization checklist

Take the following steps to fully customize Maneage for your research project. After finishing the list, be sure to run ./project configure and ./project make to check that everything works correctly. If you notice anything missing or incorrect (probably a change that has not been explained here), please let us know so we can correct it.

As described above, the concept of reproducibility (during a project) heavily relies on version control. Currently Maneage uses Git as its main version control system. If you are not already familiar with Git, please read the first three chapters of the ProGit book, which provide a wonderful practical understanding of the basics. You can read the later chapters as you get more advanced in later stages of your work.

## First custom commit

1. Get this repository and its history (if you don't already have it): Arguably the easiest way to start is to clone Maneage and prepare for your customizations as shown below. After cloning, first rename the default origin remote to origin-maneage, to specify that this is Maneage's remote server. This will allow you to use the conventional origin name for your own project's remote, as shown in the next steps. Second, create and switch to the conventional master branch, where you will later commit your project's work.

git clone https://git.maneage.org/project.git    # Clone/copy the project and its history.
mv project my-project                            # Change the name to your project's name.
cd my-project                                    # Go into the cloned directory.
git remote rename origin origin-maneage          # Rename current/only remote to "origin-maneage".
git checkout -b master                           # Create and enter your own "master" branch.
pwd                                              # Just to confirm where you are.

2. Prepare to build project: The ./project configure command of the next step will build the different software packages within the "build" directory (that you will specify). Nothing else on your system will be touched. However, since it takes a long time, it is useful to see what is being built at every instant (it's almost impossible to tell from the torrent of commands that are produced!). So open another terminal on your desktop and navigate to the same project directory that you cloned (the output of the last command above). Then run the following command. Once every second, this command will just print the date (possibly followed by a non-existent directory notice). But as soon as the next step starts building software, you'll see the names of the software packages get printed as they are being built. Once a package is installed in the project build directory, its name will no longer be printed. Again, don't worry, nothing will be installed outside the build directory.

# On another terminal (go to top project source directory, last command above)
./project --check-config
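
To give a feeling for what such a monitoring command does, here is a minimal sketch of the same idea: print the date and the contents of a directory once per second. It is not the actual implementation of --check-config. The directory and the package name are hypothetical stand-ins, and the loop stops after three iterations so it can be run as a short demo.

```shell
#!/bin/sh
# Hypothetical stand-in for the directory where software is temporarily built.
watch_dir="$(mktemp -d)/build-tmp"

# Simulate a build that starts about one second from now.
( sleep 1; mkdir -p "$watch_dir/example-package-1.0" ) &

# Print the date and the directory's contents once per second. Before the
# directory exists, 'ls' only prints a "no such directory" notice.
watch_log=$(
    i=0
    while [ "$i" -lt 3 ]; do
        date
        ls "$watch_dir" 2>&1
        sleep 1
        i=$((i + 1))
    done
)
wait
printf '%s\n' "$watch_log"
```

In this demo the first iteration only shows the date and the notice, while the last one also lists the hypothetical package name.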

3. Test Maneage: Before making any changes, it is important to test Maneage and see if everything works properly with the commands below. If there is any problem in the ./project configure or ./project make steps, please contact us to fix the problem before continuing. Since building the dependencies during configuration can take a long time, you can take the next few steps (editing the files) while it's working (they don't affect the configuration). After ./project make is finished, open paper.pdf. If it looks fine, you are ready to start customizing Maneage for your project. But before that, clean all the extra Maneage outputs with ./project make clean as shown below.

./project configure     # Build the project's software environment (can take an hour or so).
./project make          # Do the processing and build paper (just a simple demo).
# Open 'paper.pdf' and see if everything is ok.
./project make clean    # Clean the demo outputs before you start customizing.

4. Set up the remote: You can use any hosting facility that supports Git to keep an online copy of your project's version-controlled history. We recommend GitLab because it is more ethical (although not perfect), and later you can also host GitLab on your own server. Anyway, create an account in your favorite hosting facility (if you don't already have one), and define a new project there. Please make sure the newly created project is empty (some services offer to include a README in a new project, which is bad in this scenario because it will not allow you to push to it). The service will give you a URL (usually starting with git@ and ending in .git); put this URL in place of XXXXXXXXXX in the first command below. With the second command, "push" your master branch to your origin remote, and (with the --set-upstream option) set them to track/follow each other. However, the maneage branch is currently tracking/following your origin-maneage remote (automatically set when you cloned Maneage). So when pushing the maneage branch to your origin remote, you shouldn't use --set-upstream. With the last command, you can check which local and remote branches are tracking each other.

git remote add origin XXXXXXXXXX        # Newly created repo is now called 'origin'.
git push --set-upstream origin master   # Push 'master' branch to 'origin' (with tracking).
git push origin maneage                 # Push 'maneage' branch to 'origin' (no tracking).
git branch -vv                          # Check which branches track which remotes.
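
If you would like to see how the tracking ends up without touching a real server, the sketch below replays the cloning and remote setup with two local bare repositories. The directory names (upstream.git, mine.git, seed) and the demo identity are assumptions made only for this illustration; in a real project you would use the Maneage URL and your own hosting URL instead.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
cd "$work"

# Local stand-ins for the two servers (hypothetical names).
git init -q --bare upstream.git    # Plays the role of Maneage's server.
git init -q --bare mine.git        # Plays the role of your new, empty project.

# Give the stand-in upstream one commit on a 'maneage' branch.
git init -q seed
cd seed
git config user.name "Demo User"
git config user.email demo@example.com
echo "demo" > README
git add README
git commit -q -m "Initial commit"
git branch -M maneage
git push -q "$work/upstream.git" maneage
cd "$work"

# Same sequence as in the checklist, against the stand-ins.
git clone -q -b maneage "$work/upstream.git" my-project
cd my-project
git remote rename origin origin-maneage
git checkout -q -b master
git remote add origin "$work/mine.git"
git push -q --set-upstream origin master
git push -q origin maneage

# List each local branch with the remote branch it tracks.
tracking=$(git for-each-ref --format='%(refname:short) %(upstream:short)' refs/heads)
printf '%s\n' "$tracking"
```

In this demo, maneage is listed next to origin-maneage/maneage and master next to origin/master, which is the arrangement this step aims for.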

5. Title, short description and author: The title and basic information of your project's output PDF paper should be added in paper.tex. You should see the relevant place in the preamble (prior to \begin{document}). After you are done, run the ./project make command again to see your changes in the final PDF, and make sure that your changes don't cause a crash in LaTeX. Of course, if you use a different LaTeX package/style for managing the title and authors (in particular a specific journal's style), please feel free to use your own methods after finishing this checklist and doing your first commit.

6. Delete dummy parts: Maneage contains some parts that are only for the initial/test run, mainly as a demonstration of important steps, which you can use as a reference in your own project. But they are not for any real analysis, so you should remove these parts as described below:

• paper.tex: 1) Delete the text of the abstract (from \includeabstract{ to \vspace{0.25cm}) and write your own (a single sentence can be enough for now; you can complete it later). 2) Add some keywords under it in the keywords part. 3) Delete everything between %% Start of main body. and %% End of main body.. 4) Remove the notice in the "Acknowledgments" section (in \new{}) and acknowledge your funding sources (this can also be done later). Just don't delete the existing acknowledgment statement: Maneage is possible thanks to funding from several grants. Since Maneage is being used in your work, it is necessary to acknowledge them in your work also.

• reproduce/analysis/make/top-make.mk: Delete the delete-me line in the makesrc definition. Just make sure there is no empty line between the download \ and verify \ lines (they should be directly under each other).

• reproduce/analysis/make/verify.mk: In the final recipe, under the commented line Verify TeX macros, remove the full line that contains delete-me, and set the value of s in the line for download to XXXXX (any temporary string; you'll fix it at the end of your project, when it's complete).

• Delete all delete-me* files in the following directories:

rm tex/src/delete-me*
rm reproduce/analysis/make/delete-me*
rm reproduce/analysis/config/delete-me*

• Disable verification of outputs by removing the yes from reproduce/analysis/config/verify-outputs.conf. Later, when you are ready to submit your paper, or publish the dataset, activate verification and make the proper corrections in this file (described under the "Other basic customizations" section below). This is a critical step and only takes a few minutes when your project is finished. So DON'T FORGET to activate it at the end.

• Re-make the project (after cleaning) to check that you haven't introduced any errors.

./project make clean
./project make

7. Don't merge some files in future updates: As described below, you can later update your infrastructure (for example to fix bugs) by merging your master branch with maneage. For files that you have created in your own branch, there will be no problem. However, if you modify an existing Maneage file for your project, the next time it's updated on maneage you'll have an annoying conflict. The commands below show how to avoid this future problem. With them, you can configure Git to ignore the changes in maneage for some of the files you have already edited and deleted above (and will edit below). Note that only the first echo command has a > (to write over the file); the rest have >> (to append to it). If you want to keep any other set of files from being imported from Maneage into your project's branch, you can follow a similar strategy. We recommend only doing it when you encounter the same conflict in more than one merge and there is no other change in that file. Also, don't add core Maneage Makefiles, otherwise Maneage can break on the next run.

echo "paper.tex merge=ours" > .gitattributes
echo "tex/src/delete-me.mk merge=ours" >> .gitattributes
echo "tex/src/delete-me-demo.mk merge=ours" >> .gitattributes
echo "reproduce/analysis/make/delete-me.mk merge=ours" >> .gitattributes
echo "reproduce/software/config/TARGETS.conf merge=ours" >> .gitattributes
echo "reproduce/analysis/config/delete-me-num.conf merge=ours" >> .gitattributes
git add .gitattributes
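
One detail that is easy to miss: to our understanding, Git treats the name after merge= as a merge driver, and a driver called ours is not built in, so it has to be defined once in the repository's configuration (the Pro Git book does this with merge.ours.driver set to true). This is an addition to the commands above, not something this checklist states. The sketch below shows the effect in a throw-away repository; the file contents and the demo identity are hypothetical.

```shell
#!/bin/sh
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Demo User"
git config user.email demo@example.com

# Define the 'ours' driver: the command 'true' leaves our version untouched
# and reports success.
git config merge.ours.driver true

# A stand-in 'maneage' branch with a demo paper.tex.
echo "Maneage's text" > paper.tex
git add paper.tex
git commit -q -m "Demo paper"
git branch -M maneage

# Your branch: protect paper.tex and customize it.
git checkout -q -b master
echo "paper.tex merge=ours" > .gitattributes
echo "My project's text" > paper.tex
git add .gitattributes paper.tex
git commit -q -m "Customize paper"

# Maneage later changes the same file.
git checkout -q maneage
echo "Maneage's updated text" > paper.tex
git commit -q -a -m "Update demo paper"

# Merging does not conflict, and your version is kept.
git checkout -q master
git merge -q -m "Merge maneage" maneage
merged=$(cat paper.tex)
printf '%s\n' "$merged"
```

Since this setting lives in the repository's local configuration (not in the tracked files), it would have to be repeated in every fresh clone of your project.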

8. Copyright and License notice: It is necessary that all the "copyright-able" files in your project (those larger than 10 lines) have a copyright and license notice. Please take a moment to look at several existing files to see a few examples. The copyright notice is usually close to the start of the file: it is the line starting with Copyright (C) and containing a year and the author's name (like the examples below). The License notice is a short description of the copyright license, usually one or two paragraphs with a URL to the full license. Don't forget to add these two notices to any new file you add in your project (you can just copy-and-paste). When you modify an existing Maneage file (which already has the notices), just add a copyright notice in your name under the existing one(s), like the line with capital letters below. To start with, add this line with your name and email address to paper.tex, tex/src/preamble-header.tex, reproduce/analysis/make/top-make.mk, and generally, all the files you modified in the previous step.

Copyright (C) 2018-2021 Existing Name <existing@email.address>
Copyright (C) 2021 YOUR NAME <YOUR@EMAIL.ADDRESS>
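
To spot files that still lack a notice, something like the following sketch can help. It lists files longer than 10 lines that do not contain the string Copyright (C). The demo directory and file names are made up for this illustration; in your project you would loop over your own source files instead.

```shell
#!/bin/sh
# Hypothetical demo directory with three small files.
demo=$(mktemp -d)
{
    echo "# Copyright (C) 2021 YOUR NAME <YOUR@EMAIL.ADDRESS>"
    printf 'line %s\n' 1 2 3 4 5 6 7 8 9 10 11 12
} > "$demo/with-notice.mk"
printf 'line %s\n' 1 2 3 4 5 6 7 8 9 10 11 12 > "$demo/without-notice.mk"
printf 'line %s\n' 1 2 3 > "$demo/short.conf"

# Collect files longer than 10 lines that have no copyright line.
missing=$(
    for f in "$demo"/*; do
        [ -f "$f" ] || continue
        lines=$(( $(wc -l < "$f") ))
        if [ "$lines" -gt 10 ] && ! grep -qF 'Copyright (C)' "$f"; then
            basename "$f"
        fi
    done
)
printf 'Files missing a notice: %s\n' "${missing:-none}"
```

Only the long file without the notice is reported; the short file is skipped because it is below the 10-line threshold.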

9. Configure Git for the first time: If this is the first time you are running Git on this system, then you have to configure it with some basic information (ignore this step if you have already done it). Git will include your name and e-mail address in each commit. You can also specify your favorite text editor for writing the commit message (emacs, vim, nano, etc.).

git config --global user.name "YourName YourSurname"
git config --global user.email your-email@example.com
git config --global core.editor nano

10. Your first commit: You have already made some small and basic changes in the steps above and you are in your project's master branch. So, you can officially make your first commit in your project's history and push it. But before that, you need to make sure that there are no problems in the project. It is a good habit to always re-build the project before a commit, to be sure it works as expected.

git status                 # See which files you have changed.
git diff                   # Check the lines you have added/changed.
./project make             # Make sure everything builds successfully.
git add -u                 # Put all tracked changes in staging area.
git status                 # Make sure everything is fine.
git diff --cached          # Confirm all the changes that will be committed.
git commit                 # Your first commit: put a good description!
git push                   # Push your commit to your remote.

11. Start your exciting research: You are now ready to add flesh and blood to this raw skeleton by further modifying and adding your exciting research steps. You can use the "published works" section in the introduction (above) as some fully working models to learn from. Also, don't hesitate to contact us if you have any questions.

## Other basic customizations

• High-level software: Maneage installs all the software that your project needs. You can specify which software your project needs in reproduce/software/config/TARGETS.conf. The necessary software is divided into two classes: 1) programs or libraries (usually written in C/C++) which are run directly by the operating system. 2) Python modules/libraries that are run within Python. By default TARGETS.conf only has GNU Astronomy Utilities (Gnuastro) as one scientific program and Astropy as one scientific Python module. Both have many dependencies which will be installed into your project during the configuration step. To see a list of the software that is currently ready to be built in Maneage, see reproduce/software/config/versions.conf (which also has their versions); the comments in TARGETS.conf describe how to use the software name from versions.conf. Currently the raw pipeline just uses Gnuastro to make the demonstration plots. Therefore if you don't need Gnuastro, go through the analysis steps in reproduce/analysis and remove all its use cases (clearly marked).

• Input dataset: The input datasets are managed through the reproduce/analysis/config/INPUTS.conf file. It is best to gather all the information regarding all the input datasets into this one central file. To ensure that the proper dataset is being downloaded and used by the project, it is also recommended to get an MD5 checksum of the file and include that in INPUTS.conf so the project can check it automatically. The preparation/downloading of the input datasets is done in reproduce/analysis/make/download.mk. Have a look there to see how these values are to be used. This information about the input datasets is also used in the initial configure script (to inform the users), so also modify that file. You can find all occurrences of the demo dataset with the command below and replace them with your own input dataset.

grep -ir wfpc2 ./*
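
As an illustration of the checksum comparison (not Maneage's actual download rule), the sketch below computes a file's MD5 checksum with md5sum and compares it with a recorded value. The function name check_md5 and the demo file are hypothetical, and md5sum is assumed to be available (it is part of GNU Coreutils).

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the file's MD5 matches the expected one.
check_md5 () {
    computed=$(md5sum "$1" | awk '{print $1}')
    if [ "$computed" = "$2" ]; then
        echo "$1: checksum OK"
    else
        echo "$1: expected $2 but calculated $computed" >&2
        return 1
    fi
}

# Demo input: a file containing the single line "hello".
input=$(mktemp)
echo "hello" > "$input"
recorded=b1946ac92492d2347c6235b8d2611184   # MD5 of "hello" plus a newline.

check_md5 "$input" "$recorded" && intact=yes || intact=no

# Any change to the file changes its checksum, so the check now fails.
echo "unexpected change" >> "$input"
check_md5 "$input" "$recorded" 2>/dev/null && modified=no || modified=yes
```

To get the value to record for your own dataset, you can run md5sum on a copy of the file that you trust.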

• README.md: Correct all the XXXXX placeholders (name of your project, your own name, address of your project's online/remote repository, link to download dependencies, etc.). Generally, read over the text and update it where necessary to fit your project. Don't forget that this is the first file that is displayed on your online repository, and it is also the first file your colleagues will be drawn to read. Therefore, make it as easy as possible for them to get started. Also check and update this file one last time when you are ready to publish your project's paper/source.

• Verify outputs: During the initial customization checklist, you disabled verification. This is natural because during the project you need to make changes all the time and it's a waste of time to enable verification every time. But at significant moments of the project (for example before submission to a journal, or publication) it is necessary. When you activate verification, before building the paper, all the specified datasets will be compared with their respective checksums, and if any file's checksum is different from the one recorded in the project, the build will stop and print the problematic file along with its expected and calculated checksums. First set the value of the verify-outputs variable in reproduce/analysis/config/verify-outputs.conf to yes. Then go to reproduce/analysis/make/verify.mk. The verification of all the files is done in just one recipe. First the files that go into the plots/figures are checked, then the LaTeX macros. Validation of the former (inputs to plots/figures) should be done manually. If it's the first time you are doing this, you can see two examples in the dummy steps (with delete-me; you can use them as a reference if you like). These two examples should be removed before you can run the project. For the latter, you just have to update the checksums. The important thing to consider is that a simple checksum can be problematic because some file generators print their run-time date in the file (for example as commented lines in a text table). For checking text files, this Makefile already has the function verify-txt-no-comments-leading-space. As the name suggests, it will remove comment lines and empty lines before calculating the MD5 checksum. For FITS formats (common in astronomy), fortunately there is a DATASUM definition which will return the checksum independent of the headers. You can use the provided function(s), or define one for your special formats.
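
To illustrate why comment lines are stripped before the checksum is calculated, here is a minimal sketch in the same spirit. It is not Maneage's verify-txt-no-comments-leading-space function; the helper name and the two demo tables are hypothetical, and md5sum is assumed to be installed.

```shell
#!/bin/sh
# Hypothetical helper: MD5 of a text file after removing leading white space,
# comment lines (starting with '#') and empty lines.
checksum_no_comments () {
    sed -e 's/^[[:space:]]*//' -e '/^#/d' -e '/^$/d' "$1" \
        | md5sum | awk '{print $1}'
}

# Two tables with the same data, but different creation dates in a comment.
first=$(mktemp)
second=$(mktemp)
printf '# Created on 2021-01-01\n1 2.5\n2 3.5\n' > "$first"
printf '# Created on 2021-06-30\n\n1 2.5\n2 3.5\n' > "$second"

raw_first=$(md5sum < "$first" | awk '{print $1}')
raw_second=$(md5sum < "$second" | awk '{print $1}')
clean_first=$(checksum_no_comments "$first")
clean_second=$(checksum_no_comments "$second")

[ "$raw_first" != "$raw_second" ] && echo "Raw checksums differ."
[ "$clean_first" = "$clean_second" ] && echo "Checksums without comments match."
```

The raw checksums differ only because of the date in the comment line, while the cleaned checksums agree because the data rows are identical.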

• Feedback: As you use Maneage you will notice many things that if implemented from the start would have been very useful for your work. This can be in the actual scripting and architecture of Maneage, or useful implementation and usage tips, like those below. In any case, please share your thoughts and suggestions with us, so we can add them here for everyone's benefit.

• Re-preparation: Automatic preparation is only run the first time the project is run on a system; to re-do the preparation you have to use the option below. Here is the reason for this: when it's necessary, the preparation process can be slow and will unnecessarily slow down the whole project while the project is under development (when the focus is on the analysis that is done after preparation). Because of this, preparation will be done automatically only the first time that the project is run (when .build/software/preparation-done.mk doesn't exist). After the preparation process completes once, future runs of ./project make will not do the preparation process anymore (will not call top-prepare.mk). They will only call top-make.mk for the analysis. To manually invoke the preparation process after the first attempt, run ./project make with the --prepare-redo option, or delete the special file above.

./project make --prepare-redo
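
The logic described in this item can be sketched as follows. This is only an illustration of the idea (a marker file decides whether preparation runs), not the code of the ./project script; the function name and the temporary directory are hypothetical.

```shell
#!/bin/sh
# Hypothetical stand-in for the build directory and its marker file.
bdir=$(mktemp -d)
marker="$bdir/preparation-done.mk"
log=""

project_make () {
    # Prepare when explicitly requested, or when the marker is missing.
    if [ "${1:-}" = "--prepare-redo" ] || [ ! -f "$marker" ]; then
        log="${log:+$log }prepare"
        : > "$marker"
    fi
    log="${log:+$log }analysis"
}

project_make                  # First run: preparation, then analysis.
project_make                  # Later runs: analysis only.
project_make --prepare-redo   # Forced: preparation again, then analysis.
rm "$marker"
project_make                  # Marker deleted: preparation runs again.
echo "$log"
```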

• Pre-publication: add notice on reproducibility: Add a notice somewhere prominent on the first page of your paper, informing the reader that your research is fully reproducible. For example at the end of the abstract, or under the keywords with a title like "reproducible paper". This will encourage readers to publish their own works in this manner and will also help spread the word.