Scripts are documents containing commands written in the syntax of whatever software you are using. When you run a script, the software reads and executes the commands in the order in which they appear.
You will write scripts with commands that instruct your software to carry out all the computations involved in your project: processing the data to prepare it for analysis, generating output for the Data Appendix, and conducting the analysis that produces the results you present in your report. Writing these scripts will constitute the majority of the work you do with your data.
When you have completed your project, anyone interested in your study will be able to reproduce all the computations underlying your results simply by running the scripts.
What if I think it would be easier to do some of the data processing or analysis "by hand" instead of with scripts?
For some steps in your data processing and analysis, it might appear easier to use Excel or some other interactive tool than to figure out the syntax for the commands you would need to write in a script to accomplish the same task.
As just one example: Suppose you want to combine data from two sources into a single file, but don't know the commands you would need to write for your software to accomplish that. And suppose you know an alternative way of accomplishing that task that does not involve writing commands in a script--maybe it would be easy for you to use Excel to copy the data from one file and paste it into the other, or maybe you know how to do that using drop-down menus in a GUI for your software. In cases like this, you may be tempted to do the job with Excel, drop-down menus, or some other interactive tool you are familiar with.
But it is essential to resist that temptation. The key to ensuring the reproducibility of your project is writing scripts that execute all the steps of data processing and analysis required to generate the results you present in your report. Anything you do with drop-down menus, Excel, or any other interactive tool--anything for which you do not write commands in your scripts--will not be reproduced when you run your scripts.
So even if you think it would be easier to do something with an interactive tool like Excel or drop-down menus, you should take the time necessary to figure out how to write the commands that accomplish the task, and write them in a script.
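As a rough illustration of the scripted alternative to copying and pasting, the sketch below stacks the rows of two CSV files into a single file using Python's standard csv module. The function names are hypothetical, and the sketch assumes both files have exactly the same column names; the commands you actually need will depend on your software and your data.

```python
import csv


def read_rows(path):
    """Return the column names and the data rows of one CSV file."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        rows = list(reader)
        return reader.fieldnames, rows


def combine_csv_files(first_path, second_path, output_path):
    """Stack the rows of two CSV files and save them as a single file."""
    first_columns, first_rows = read_rows(first_path)
    second_columns, second_rows = read_rows(second_path)
    # Assumption of this sketch: both files share the same columns.
    if first_columns != second_columns:
        raise ValueError("The two files do not have the same columns.")
    with open(output_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=first_columns)
        writer.writeheader()
        writer.writerows(first_rows + second_rows)
    # Report how many data rows were written.
    return len(first_rows) + len(second_rows)
```

A call such as `combine_csv_files("Data/InputData/survey_a.csv", "Data/InputData/survey_b.csv", "Data/AnalysisData/combined.csv")` (file names and folder layout assumed for illustration) repeats exactly the same combination every time the script is run, which a manual copy and paste cannot guarantee.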
Contents of the Scripts/ Folder
Subfolders
- ProcessingScripts/: The commands in these scripts transform your Input Data Files into your Analysis Data Files.
- DataAppendixScripts/: The commands in these scripts produce the figures, tables, and descriptive statistics you present in your Data Appendix.
- AnalysisScripts/: The commands in these scripts generate the results you present in the Report you write for your project.
Document
- The master script: A single script that reproduces the Results of your project by executing all the other scripts, in the correct order.
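To make the idea of a master script concrete, here is a minimal sketch in Python. The script file names are hypothetical, the paths assume the Project/ folder is the working directory, and `runpy.run_path` is just one of several ways to execute another script; your software will have its own command for this.

```python
import runpy

# Hypothetical script names, listed in the order they must run.
# The paths are relative to the Project/ folder.
SCRIPTS_IN_ORDER = [
    "Scripts/ProcessingScripts/processing.py",
    "Scripts/DataAppendixScripts/data_appendix.py",
    "Scripts/AnalysisScripts/analysis.py",
]


def run_scripts(script_paths):
    """Execute each script in turn, stopping at the first one that fails."""
    for path in script_paths:
        print(f"Running {path}")
        # run_path executes the file as if it had been launched directly.
        runpy.run_path(path, run_name="__main__")


# A real master script would end with the call:
# run_scripts(SCRIPTS_IN_ORDER)
```

Keeping the order in a single list means that anyone reproducing the results only has to run this one file.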
Guidelines for Writing Scripts
The general guidelines presented on this page are the key to ensuring that all the work you do with your data is reproducible. You should follow these general guidelines in every script you write.
Additional guidelines that apply specifically to your processing scripts, Data Appendix scripts, analysis scripts, and the master script are provided on their respective pages.
Target folders
At various points in your scripts, you will need to write commands that tell your software to open existing files. Similarly, at some points you will need to write commands that save new files.
The folder containing an existing file you want to open, or in which you want to save a new file, is called the target folder.
When you write a command that opens an existing file or saves a new file, you need to specify the location of the target folder.
For example...
When you write a command to open an Input Data File, the target folder is the InputData/ folder, because that is where your Input Data Files are stored. Likewise, when you write a command to open an Analysis Data File, the target folder is the AnalysisData/ folder.
The same holds for saving. A command that saves an Analysis Data File should specify AnalysisData/ as the target folder, and a command that saves a figure or table for your Data Appendix should specify DataAppendixOutput/, because that is where the tables and figures used in the Data Appendix are kept.
Directory paths
You specify the location of the target folder by writing a directory path.
A directory path tells your software what path it must follow through your hierarchy of folders and subfolders to reach a target folder.
There are two types of directory paths. The difference between them is the starting place of the path that leads to the target folder:
- An absolute directory path tells the software how to navigate to the target folder starting from the root directory of the computer you are working on.
- A relative directory path tells the software how to navigate to the target folder starting from whatever folder is currently designated as the working directory for your software.
For example...
ABSOLUTE DIRECTORY PATHS
If the root directory on your computer is the C: drive, an absolute directory path to your AnalysisData/ folder might look like this:
>C:/Users/cnguyen/Documents/StatsClass/Project/Data/AnalysisData/
and an absolute directory path to your Results/ folder might look like this:
>C:/Users/cnguyen/Documents/StatsClass/Project/Output/Results/
RELATIVE DIRECTORY PATHS
If the working directory has been set to your Project/ folder, a relative directory path to your AnalysisData/ folder might look like this:
>Data/AnalysisData/
and a relative directory path to the Results/ subfolder of your Output/ folder might look like this:
>Output/Results/
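The distinction can also be checked in code. The sketch below uses Python's standard pathlib module to classify the two kinds of path shown in the examples above; the file name `analysis_data.csv` is hypothetical, and the pure path classes are used only so the example behaves the same on any operating system.

```python
from pathlib import PurePosixPath, PureWindowsPath

# The absolute path from the example above starts at the root of the C: drive.
absolute_path = PureWindowsPath(
    "C:/Users/cnguyen/Documents/StatsClass/Project/Data/AnalysisData/"
)

# The relative path starts at the working directory (here, Project/).
relative_path = PurePosixPath("Data/AnalysisData/")

print(absolute_path.is_absolute())  # True
print(relative_path.is_absolute())  # False

# Appending a file name to the target folder gives the location of one file.
# "analysis_data.csv" is a hypothetical file name.
file_path = relative_path / "analysis_data.csv"
print(file_path)  # Data/AnalysisData/analysis_data.csv
```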
Always use relative directory paths
Use a relative directory path whenever you need to specify the location of a target folder.
Do not use any absolute directory paths in your scripts.
Why use only relative directory paths?
Using relative directory paths makes it possible to write scripts that will run on anyone's computer (provided the working directory is set correctly).
If a script contains an absolute directory path that starts at the root directory of some particular computer, it will fail on any other computer whose folders are not arranged in exactly the same way (for example, one with a different user name).
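A small sketch may help show why this works. Both Project/ locations below are hypothetical (the second user and folder are invented for illustration); the point is only that one relative path, written once, leads to the right folder from either starting place.

```python
from pathlib import PurePosixPath, PureWindowsPath

# One relative path, written once in a script.
relative_path = "Output/Results"

# Two hypothetical locations of the Project/ folder on different computers.
project_on_windows = PureWindowsPath("C:/Users/cnguyen/Documents/StatsClass/Project")
project_on_linux = PurePosixPath("/home/asmith/Replication/Project")

# Starting from each working directory, the same relative path leads to the
# Results/ folder inside that computer's copy of the project.
print((project_on_windows / relative_path).as_posix())
print(project_on_linux / relative_path)
```

An absolute path copied from the first computer would point to a user folder that does not exist on the second, so a script containing it would fail there.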
A convention for designating the working directory
We recommend you adopt a simple convention:
Whenever you are working on or executing any of your scripts, designate your Project/ folder as the working directory.
Since relative directory paths give directions to the target folder starting at the working directory, you must keep track of which folder is designated as the working directory when you are writing or executing a script.
This can be confusing, because different kinds of software have different algorithms for choosing which folder is designated as the working directory when they are launched; sometimes the same software will behave differently depending on the operating system and other factors.
If you follow the convention we recommend, there will never be any ambiguity about which folder should be designated as the working directory at any point in the data processing or analysis.
Read about how this convention works
Following this convention has several practical implications:
- As you write your scripts, whenever you need to specify a relative directory path, remember that the starting place for the path should be the Project/ folder (since if you adopt the suggested convention, the Project/ folder will always be the working directory).
- Every time you start a new work session, you must make sure the Project/ folder is in fact set as the working directory.
- First, immediately after you launch your software, check to see what folder your software has designated as the working directory.
- If the working directory is set to something other than the Project/ folder, change the working directory to the Project/ folder.
- Unlike the work you do with your data, all of which must be executed by commands you write in scripts, you may do the steps of checking and changing the working directory at the beginning of a session "by hand"--e.g., using a drop-down menu, or executing a "change directory" command interactively.
- Because checking the working directory and, if necessary, changing it to the Project/ folder are not done by commands in a script, anyone who wants to run your scripts (including you) must first carry out these steps "by hand".
- After making sure the Project/ folder has been designated as the working directory, do not change the working directory again: do not change it "by hand", and do not write any "change directory" commands in your scripts. Remember, the convention is that the working directory should be set to the Project/ folder at all times. So make sure the Project/ folder is designated as the working directory when you begin work, and then don't change it again.
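The check at the start of a session can be sketched as follows in Python. The list of expected subfolders is an assumption based on the folder names used as examples on this page, and `looks_like_project_folder` is a hypothetical helper; note that the sketch only reports the working directory and never changes it.

```python
from pathlib import Path

# Folders expected at the top level of Project/. This list is an assumption
# based on the paths used as examples on this page; adjust it to your project.
EXPECTED_SUBFOLDERS = ["Data", "Output", "Scripts"]


def looks_like_project_folder(folder):
    """Return True if every expected subfolder exists inside `folder`."""
    folder = Path(folder)
    return all((folder / name).is_dir() for name in EXPECTED_SUBFOLDERS)


# Report the current working directory without changing it.
working_directory = Path.cwd()
print(f"Working directory: {working_directory}")
if not looks_like_project_folder(working_directory):
    print("This does not look like the Project/ folder; set it before running any scripts.")
```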
Do I have to choose the Project/ folder as the working directory?
No.
But if you don't choose the Project/ folder as the working directory, you must choose one of the folders inside your Project/ folder. (The folder you choose does not have to be in the top level of the Project/ folder; you may choose a folder at any level of the hierarchy of folders in your Project/ folder.)
The reason for this is that it should be possible for someone else to copy your Project/ folder and everything inside it onto their own computer, and then run your scripts.
- Once a user has copied the Project/ folder and everything inside it onto their own computer, it will be possible for them to designate the working directory as the Project/ folder or any of the folders contained in the Project/ folder.
- If you choose a folder outside of the Project/ folder as the working directory, that folder will not be copied onto any other user's computer, and it will not be possible for them to designate a folder that does not exist on their computer as the working directory.
Comments
Comments are plain-English notes in your scripts that explain what the commands you have written accomplish.
Throughout all your scripts, you should include copious comments explaining what each command or sequence of commands does and why it is there.
Headers
Headers are comments that appear at the beginning of a script and provide information that anyone working with or executing the script should be aware of.
In every script you write, you should begin with a header that reminds the user of two essential things:
- Before the script is executed, the entire Project/ folder, with all of its contents intact, should be copied onto the computer or workspace where the scripts will be executed.
- When the script is executed, the working directory for the software should be set to the Project/ folder (or whatever folder you chose to designate as the working directory when you wrote the script).
Why are these reminders important?
These reminders are the key to ensuring that the relative directory paths you write in your scripts function properly.
- Since the relative directory paths in your scripts tell your software how to navigate through the hierarchy of folders stored in your Project/ folder, the structure of the folders on the user's computer must match the structure of the Project/ folder.
- And since the relative directory paths all start at the Project/ folder (or whatever other subfolder you choose to designate as the working directory), that is where the working directory needs to be set.
It may also be valuable to include other types of information in a header.
What other types of information?
Other types of information commonly presented in headers include:
- A plain-English description of the purpose of the script and what it accomplishes.
- A list of existing files the script will need to access, and a list of new files the script creates and saves.
- Details about the software needed to run the script (e.g., versions of the program and add-on packages).
- The date the script was last modified, and the name of the person who made the modifications.
You should use your judgment about which, if any, of these or any other types of additional information would be useful.
Your instructor may also give guidelines about additional information to include in the headers of your scripts.
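As one possible layout, a header for a Python script might look like the sketch below. Every specific in it (purpose, file names, software details) is a hypothetical placeholder, and the two path definitions after the header are included only to show the relative paths that the two reminders protect.

```python
# ---------------------------------------------------------------------------
# BEFORE RUNNING THIS SCRIPT:
#   1. Copy the entire Project/ folder, with all of its contents intact, onto
#      the computer or workspace where the script will be executed.
#   2. Set the working directory to the Project/ folder.
#
# Purpose (hypothetical): combine two input files into one analysis file.
# Reads (hypothetical):   Data/InputData/survey_a.csv
#                         Data/InputData/survey_b.csv
# Saves (hypothetical):   Data/AnalysisData/combined.csv
# Software:               Python 3, standard library only.
# Last modified:          (date and name of the person who made the changes)
# ---------------------------------------------------------------------------

from pathlib import Path

# Relative paths used by the rest of the script; both start at Project/.
# The location of InputData/ inside Data/ is an assumption of this sketch.
INPUT_DATA = Path("Data/InputData")
ANALYSIS_DATA = Path("Data/AnalysisData")
```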
Preliminary commands: Clearing memory and choosing settings
Immediately after the header, and before the commands that carry out the main tasks of the script, it is often useful to write some preliminary commands to make sure your workspace and software are ready to go.
Some examples of preliminary commands that can be useful in many situations are described below. Which of these are useful to you will depend on the kind of software you are using and your preferences about how you like to work--so use your judgment and be selective.
Clearing memory
You may wish to write a command that tells your software to clear any data or settings in active memory.
Remember, though, that when you clear memory you may lose any changes you have made since the last save. So if your script includes a preliminary command that clears memory, be sure not to execute it if you have any data or settings in memory that you do not want to lose.
Settings for your software
After clearing data and settings, you may want to specify certain settings you want when you run your scripts. There is a wide range of settings you might want to specify. Some common examples include which character is used as a delimiter to indicate the end of a command, how output is displayed, and how much memory your operating system allocates to your software.
If you want these settings to be something other than the ones your software uses by default, it is often convenient to write commands at the beginning of your scripts that specify your choices.
Now you are ready to start the real work
After you have:
- Made sure the Project/ folder (or whichever subfolder you chose) is designated as the working directory,
- Written the header at the beginning of the script, and
- Written any preliminary commands to clear memory and/or specify the settings you desire,
you can begin writing the commands that carry out the tasks the script is intended to accomplish.
The pages on Processing Scripts, the Data Appendix Script, and Analysis Scripts provide guidance for writing each of those kinds of scripts.