install.packages("swirl")
library("swirl")
swirl()
lab 1 - command line and R
We will learn to use the command line by doing. Follow the instructions below.
command line
read the article that explains why it is a good idea to learn to use the command line Five reasons why researchers should learn to love the command line
Answer the questions showing that you did read the paper link to questions. You have multiple attempts.
follow the instructions on the paper’s github README page copied below for your convenience.
If you are on a Mac or Linux, open your Terminal (or similar) application. On Windows, you’ll need something like MobaXterm.
From the command prompt, clone this GitHub repository: git clone https://github.com/jperkel/nature_bash.git.
Enter the newly created directory: cd nature_bash.
Make the scripts executable: chmod +x *.sh.
Run the scripts in numeric order, being sure to prepend each file name with a period and slash, eg: ./01_init.sh). This tells the shell where to find these scripts (i.e., the current directory) when that location is not included in your file search PATH variable.
You don’t have to type the complete filename. Once you type enough of the filename to be uniquely recognized, you can use Bash’s ‘autocomplete’ feature to fill in the rest of the name for you. Try it: type ./01 and then hit the TAB key to complete the command.
There are six scripts in total:
01_init.sh: Creates a temporary directory (nature_tmpdir) beneath the current directory and fills it with dummy files
02_fix_filenames.sh: We created 217 files with date-stamped names using the pattern: DD-MM-YYYY, e.g., datafile-01-01-2020.txt, datafile-02-01-2020.txt, etc. This script uses the sed command and a for-loop to rename them using the standard YYYYMMDD pattern.
03_process_data.sh: We created a four dummy data files, such as might be output from a spectrophotometer (3 files of “readings” and 1 “background” file). This script pulls those data into a single data file, averages the readings, subtracts the background, and divides by 60 to give a per-second value. The result is a spreadsheet, which is displayed on screen.
04_geo.sh: This script downloads a compressed gene expression datafile from the NCBI Gene Expression Omnibus database, extracts it, and searches for a gene name given on the command line. Try it with the Cactin gene: ./04_geo.sh Cactin.
05_filter_geo.sh: This script uses sed and awk to filter the GEO datafile from 04_geo.sh for records with a p-value < 0.05 and an ensembl_gene_id value other than NA.
99_clean_all.sh: Deletes all temporary files.
Feel free to use chatGPT or similar to explain the scripts if you are not familiar with these commands.
R introduction
You are probably familiar with R. If not, start learning with swirl
Command line basics
For the bare minimum list of commands check out this link