8 Linux and the Command Line
8.1 Introduction to UNIX and its siblings
- UNIX
- Originally developed at AT&T Bell Labs circa 1970. Has experienced a long, multi-branched evolutionary path
- POSIX (Portable Operating System Interface)
- a set of specifications of what an OS needs to qualify as “a Unix”, to enhance interoperability among all the “Unix” variants
8.1.1 Various Unices
- OS X
- is a Unix!
- Linux
- is not fully POSIX-compliant, but certainly can be regarded as functionally Unix
8.1.2 Some Unix hallmarks
- Supports multiple users and multiple processes
- Highly modular: many small tools that do one thing well, and can be combined
- Culture of text files and streams
- Primary OS on HPC (High Performance Computing Systems)
- Main OS on which the Internet was built
8.2 The Command Line Interface (CLI)
The CLI provides a direct way to interact with the Operating System, by typing in commands.
8.2.1 Why the CLI is worth learning
- Typically much more extensive access to features, commands, options
- Command statements can be written down, saved (scripts!)
- Easier automation
- Much “cheaper” to do work on a remote system (no need to transmit all the graphical stuff over the network)
8.2.2 Connecting to a remote server via ssh
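From Git Bash (MS Windows) or the Terminal (Mac), type ssh followed by the host name of the server.
You will be prompted for your password; if you do not give a username, ssh assumes the one you use on your local computer.
You can also put your username in front of the host name, in the form username@hostname.
In this case, you will only be asked for your password, as you have already specified which user you want to connect as.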
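As a minimal sketch of the two forms described above, the block below prints the commands instead of opening a connection, so it runs without network access. The host name aurora.example.org and the username jdoe are placeholders (assumptions, not values from this course); substitute your own.

```shell
# Placeholder values (assumptions): replace with your server and account.
remote_host="aurora.example.org"
remote_user="jdoe"

# Form 1: host name only; ssh falls back to your local username.
cmd_host_only="ssh ${remote_host}"
# Form 2: username given explicitly in front of the host name.
cmd_with_user="ssh ${remote_user}@${remote_host}"

# Dry run: show the commands you would type at the prompt.
echo "$cmd_host_only"
echo "$cmd_with_user"
```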
You can also use the terminal from RStudio!
8.4 General command syntax
$ command [options] [arguments]
where command must be an executable file on your PATH
* echo $PATH
and options can usually take two forms
* short form: -a
* long form: --all
You can combine the options:
What do these options do?
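As one possible illustration (ls is an assumed example, since the original command is not shown here), the sketch below lists a scratch directory with separate and with combined short options. With ls, -l selects the long format, -a includes hidden dot-files, and -h prints sizes in human-readable units.

```shell
# Work in a throwaway directory so the listing is predictable.
demo_dir=$(mktemp -d)
touch "$demo_dir/visible.txt" "$demo_dir/.hidden"

# Separate short options...
separate=$(ls -l -a -h "$demo_dir")
# ...and the same options combined behind a single dash.
combined=$(ls -lah "$demo_dir")

echo "$combined"

# Note: the long form --all is a GNU extension; macOS/BSD ls only accepts -a.
rm -rf "$demo_dir"
```

Both calls produce the same listing; combining short options only saves typing.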
8.4.1 find
Show me my Rmarkdown files!
Which files are larger than 1GB?
With more details about the files:
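The three questions above could be answered along these lines. This is a sketch run against a scratch directory with made-up file names; the size suffix G is understood by GNU and BSD find but is not part of POSIX.

```shell
demo_dir=$(mktemp -d)
touch "$demo_dir/report.Rmd" "$demo_dir/notes.Rmd" "$demo_dir/data.csv"

# Show me my R Markdown files (quote the pattern so the shell does not expand it).
rmd_files=$(find "$demo_dir" -name '*.Rmd' | sort)
echo "$rmd_files"

# Which files are larger than 1 GB? (none in this scratch directory)
big_files=$(find "$demo_dir" -type f -size +1G)
echo "large files: ${big_files:-none}"

# With more details about the files: run ls -lh on every match.
find "$demo_dir" -type f -size +1G -exec ls -lh {} \;

rm -rf "$demo_dir"
```

In everyday use you would replace "$demo_dir" with a real starting point, such as . for the current directory or ~ for your home directory.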
8.5 Getting things done
8.5.1 Some useful, special commands using the Control key
- Cancel (abort) a command: Ctrl-c
- Stop (suspend) a command: Ctrl-z
- Ctrl-z can be used to suspend a process, then move it to the background with bg
8.5.2 Process management
- Like Windows Task Manager, OSX Activity Monitor: top, ps, jobs (hit q to get out!)
- kill to delete an unwanted job or process
- Foreground and background: &
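A minimal sketch of starting a background process and removing it again, using sleep as a stand-in for a long-running job. At an interactive prompt you would typically use jobs and kill %1; the special variable $! (the process ID of the most recent background command) is used here so that the sketch also works inside a script.

```shell
# Start a long-running command in the background with &.
sleep 300 &
bg_pid=$!

# ps shows the process while it is running.
ps -p "$bg_pid"

# Terminate it, then wait so the shell reaps the finished process.
kill "$bg_pid"
wait "$bg_pid" 2>/dev/null || true

# kill -0 sends no signal; it only tests whether the process still exists.
if kill -0 "$bg_pid" 2>/dev/null; then
    still_running=yes
else
    still_running=no
fi
echo "still running: $still_running"
```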
8.5.3 What about “space”
- How much storage is available on this system?
df -h
- How much storage am “I” using overall?
du -hs <folder>
- How much storage am “I” using, by subdirectory?
du -h <folder>
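A short sketch of the same commands pointed at a scratch folder (the folder and file names are made up for illustration). Reported sizes depend on the filesystem, so no particular output is shown.

```shell
demo_dir=$(mktemp -d)
mkdir "$demo_dir/raw" "$demo_dir/plots"
# Create a file of about 100 kB so there is something to measure.
head -c 100000 /dev/zero > "$demo_dir/raw/zeros.bin"

# Free and used space on the filesystem that holds the folder.
df -h "$demo_dir"

# One summary line for the whole folder (-s), human-readable sizes (-h).
total=$(du -hs "$demo_dir")
echo "$total"

# One line per subdirectory, with the folder itself listed last.
per_dir=$(du -h "$demo_dir")
echo "$per_dir"

rm -rf "$demo_dir"
```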
8.6 Uploading Files
You have several options for uploading files to the server. Some, like the RStudio interface, are more convenient if you have only a few files; others, like dedicated software and (you guessed it) the CLI, are better suited to uploading a lot of files at once :)
8.6.1 RStudio
You can only upload one file at a time (you can zip a folder to trick it).
8.6.2 sFTP Software
An efficient protocol for uploading files is FTP (File Transfer Protocol). The s in sFTP stands for secure: the transfer is encrypted. Any software supporting these protocols can be used to transfer files.
We recommend the following free software:
8.6.3 scp
The scp command is another convenient way to transfer a single file or directory using the CLI. You can run it from Aurora or from your local computer. Here is the basic syntax:
scp </source/path> <hostname:/path/to/destination/>
Here is an example of uploading the file 10min-loop.R to Aurora from my laptop. The destination directory on Aurora is /home/brun/github_com/NCEAS/nceas-training/materials/files:
If you want to upload an entire folder, you can add the -r option to the command. The general syntax is:
Here is an example uploading all the images in the myplot folder:
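The two uploads described above would take roughly the following shape. This is a dry-run sketch: it prints the commands rather than running them, so it works without a server. The host name aurora.example.org is a placeholder, and using the same destination directory for the folder upload is an assumption; neither is confirmed by the text.

```shell
# Placeholder host name (assumption): replace with the real address of Aurora.
remote_host="aurora.example.org"
dest_dir="/home/brun/github_com/NCEAS/nceas-training/materials/files"

# Single file: scp </source/path> <hostname:/path/to/destination/>
upload_file="scp 10min-loop.R ${remote_host}:${dest_dir}/"

# Whole folder: add -r to copy the directory recursively.
upload_folder="scp -r myplot ${remote_host}:${dest_dir}/"

# Dry run: print the commands you would type on your laptop.
echo "$upload_file"
echo "$upload_folder"
```

If your username on the server differs from the local one, it goes in front of the host name, as with ssh: username@hostname:/path.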
8.7 Advanced Topics:
8.7.1 Unix systems are multi-user
- Who else is logged into this machine?
who
- Who is logged into “this shell”?
whoami
8.7.2 A sampling of simple commands for dealing with files
- wc: count lines, words, and/or characters
- diff: compare two files for differences
- sort: sort lines in a file
- uniq: report or filter out repeated lines in a file
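A small sketch of these four commands on two made-up files. Note that uniq only collapses adjacent duplicate lines, which is why it is usually paired with sort.

```shell
demo_dir=$(mktemp -d)
printf 'pine\noak\npine\nfir\n' > "$demo_dir/site_a.txt"
printf 'pine\noak\nfir\n' > "$demo_dir/site_b.txt"

# wc -l counts lines (wc -w counts words, wc -c counts bytes).
line_count=$(wc -l < "$demo_dir/site_a.txt")
echo "lines in site_a.txt:" $line_count

# sort orders the lines; uniq then drops the repeated "pine".
species=$(sort "$demo_dir/site_a.txt" | uniq)
echo "$species"

# diff prints the differing lines and returns a non-zero status when files differ.
if diff "$demo_dir/site_a.txt" "$demo_dir/site_b.txt"; then
    files_differ=no
else
    files_differ=yes
fi
echo "files differ: $files_differ"

rm -rf "$demo_dir"
```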
8.7.3 All files have permissions and ownership
- Change permissions:
chmod
- Change ownership:
chown
List files showing ownership and permissions:
ls -l
schild@aurora:~/postdoc-training/data$ ls -l
total 1136
-rw----r-- 1 schild scientist 1062050 May 29 2007 AT_85_to_89.csv
-rwxrwxr-x 1 schild scientist 16200 Jun 26 11:20 env.csv
-rwxr-xr-x 1 schild scientist 23358 Jun 26 11:20 locale.csv
-rwxrwx--- 1 schild scientist 7543 Jun 26 11:20 refrens.csv
-rwx------ 1 schild scientist 46653 Jun 26 11:20 sample.csv
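To illustrate how chmod changes the first column of such a listing, here is a minimal sketch on a scratch file (the file name is made up). The ten characters are the file type followed by read/write/execute flags for the owner, the group, and everyone else.

```shell
demo_dir=$(mktemp -d)
touch "$demo_dir/sample.csv"

# Numeric mode: owner read+write (6), group read (4), others nothing (0).
chmod 640 "$demo_dir/sample.csv"
perms_numeric=$(ls -l "$demo_dir/sample.csv" | cut -c1-10)
echo "$perms_numeric"

# Symbolic mode: add execute permission for the owner only.
chmod u+x "$demo_dir/sample.csv"
perms_symbolic=$(ls -l "$demo_dir/sample.csv" | cut -c1-10)
echo "$perms_symbolic"

rm -rf "$demo_dir"
```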
Clear contents in terminal window:
clear
8.7.4 Getting help
- <command> -h, <command> --help
- man, info, apropos, whereis
- Search the web!
8.7.5 History
- See your command history:
history
- Re-run last command:
!!
(pronounced “bang-bang”)
- Re-run 32nd command:
!32
- Re-run 5th from last command:
!-5
- Re-run last command that started with ‘c’:
!c
8.7.6 Get into the flow, with pipes
$ ls *.png | wc -l
$ ls *.png | wc -l > pngcount.txt
$ diff <(sort file1.txt) <(sort file2.txt)
$ ls foo 2>/dev/null
- note use of * as character wildcard for zero or more matches (same in Mac and Windows); % is equivalent wildcard match in SQL queries
- ? matches single character; _ is SQL query equivalent
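The first, second, and fourth commands above can be tried safely in a scratch directory, as sketched below with made-up file names. The third command relies on process substitution, <( ), which is a bash feature and is left out here so that the sketch also runs under plain sh.

```shell
start_dir=$PWD
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch map.png chart.png notes.txt

# Pipe: the output of ls becomes the input of wc -l.
png_count=$(ls *.png | wc -l)
echo "png files:" $png_count

# Redirect: > writes standard output to a file instead of the screen.
ls *.png | wc -l > pngcount.txt
cat pngcount.txt

# 2> redirects standard error; /dev/null discards it, so the
# error message about the missing file foo is hidden.
if ls foo 2>/dev/null; then
    foo_exists=yes
else
    foo_exists=no
fi
echo "foo exists: $foo_exists"

cd "$start_dir" && rm -rf "$demo_dir"
```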
8.7.7 Text editing
8.7.7.1 Some editors
vim
emacs
nano
$ nano .bashrc
8.7.7.2 Let’s look at our text file
- cat: print file(s)
- head: print first few lines of file(s)
- tail: print last few lines of file(s)
- less: “pager”, view file interactively (type q to quit)
- od: “octal dump”, to view file’s underlying binary/octal/hexadecimal/ASCII format
schild@aurora:~/data$ head -3 env.csv
EnvID,LocID,MinDate,MaxDate,AnnPPT,MAT,MaxAT,MinAT,WeatherS,Comments
1,*Loc ID,-888,-888,-888,-888,-888,-888,-888,-888
1,10101,-888,-888,-888,-888,-888,-888,-888,-888
schild@aurora:~/data$ head -3 env.csv | od -cx
0000000 E n v I D , L o c I D , M i n D
6e45 4976 2c44 6f4c 4963 2c44 694d 446e
0000020 a t e , M a x D a t e , A n n P
7461 2c65 614d 4478 7461 2c65 6e41 506e
0000040 P T , M A T , M a x A T , M i n
5450 4d2c 5441 4d2c 7861 5441 4d2c 6e69
0000060 A T , W e a t h e r S , C o m m
5441 572c 6165 6874 7265 2c53 6f43 6d6d
0000100 e n t s \r \n 1 , * L o c I D ,
6e65 7374 0a0d 2c31 4c2a 636f 4920 2c44
0000120 - 8 8 8 , - 8 8 8 , - 8 8 8 , -
382d 3838 2d2c 3838 2c38 382d 3838 2d2c
0000140 8 8 8 , - 8 8 8 , - 8 8 8 , - 8
3838 2c38 382d 3838 2d2c 3838 2c38 382d
0000160 8 8 , - 8 8 8 \r \n 1 , 1 0 1 0 1
3838 2d2c 3838 0d38 310a 312c 3130 3130
0000200 , - 8 8 8 , - 8 8 8 , - 8 8 8 ,
2d2c 3838 2c38 382d 3838 2d2c 3838 2c38
0000220 - 8 8 8 , - 8 8 8 , - 8 8 8 , -
382d 3838 2d2c 3838 2c38 382d 3838 2d2c
0000240 8 8 8 , - 8 8 8 \r \n
3838 2c38 382d 3838 0a0d
- od is especially useful in searching for hidden characters in your data
- watch for carriage return \r and new line \n
- dos2unix and unix2dos
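A minimal sketch of spotting and removing carriage returns, using a two-line file made up for the purpose. tr -d '\r' is used here as a stand-in for dos2unix, which is not installed on every system.

```shell
demo_dir=$(mktemp -d)
# Write a small file with Windows-style (\r\n) line endings.
printf 'EnvID,LocID\r\n1,10101\r\n' > "$demo_dir/dos.csv"

# od -c shows each byte as a character; \r stands out before every \n.
od -c "$demo_dir/dos.csv"

# Count the lines that contain a carriage return.
cr=$(printf '\r')
cr_lines=$(grep -c "$cr" "$demo_dir/dos.csv")
echo "lines with carriage return:" $cr_lines

# Strip the carriage returns to get Unix-style line endings.
tr -d '\r' < "$demo_dir/dos.csv" > "$demo_dir/unix.csv"
bytes_before=$(wc -c < "$demo_dir/dos.csv")
bytes_after=$(wc -c < "$demo_dir/unix.csv")
echo "bytes before:" $bytes_before "after:" $bytes_after

rm -rf "$demo_dir"
```

The cleaned file is shorter by exactly one byte per line, the removed \r.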
8.7.8 Create custom commands with “alias”
alias lwc='ls *.jpg | wc -l'
You can create a number of custom aliases that are available whenever you log in, by putting commands such as the above in your shell start-up file, e.g. .bashrc
8.7.9 A sampling of more advanced utilities
- grep: search files for text
- sed: filter and transform text
- find: advanced search for files/directories
8.7.9.1 grep
Show all lines containing “bug” in my R scripts
Now count the number of matching lines per file
Print the names of files that contain bug
Print the lines that don’t contain bug
Print “hidden” dot-files in current directory
$ ls -a | grep '^\.'
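The first four tasks above map onto grep options roughly as follows. This is a sketch on two made-up R scripts; note that -c counts matching lines, not individual occurrences.

```shell
start_dir=$PWD
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
printf '# fix bug in loop\nx <- 1\n# another bug\n' > loop.R
printf 'y <- 2\nplot(y)\n' > plot.R

# Show all lines containing "bug" in my R scripts.
grep bug *.R

# Count the matching lines per file.
counts=$(grep -c bug *.R)
echo "$counts"

# Print only the names of files that contain "bug".
files_with_bug=$(grep -l bug *.R)
echo "$files_with_bug"

# Print the lines that do not contain "bug" (-v inverts the match).
other_lines=$(grep -v bug loop.R)
echo "$other_lines"

cd "$start_dir" && rm -rf "$demo_dir"
```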
8.7.9.2 sed
Remove all lines containing “bug”!
Call them buglets, not bugs!
Actually, only do this on lines starting with #
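The three edits above could be written as follows; this is a sketch on a made-up script. Each command prints its result to standard output and leaves the file untouched; the in-place option -i is deliberately avoided because its syntax differs between GNU and BSD sed.

```shell
demo_dir=$(mktemp -d)
printf '# known bugs\nx <- "bug"\ny <- 2\n' > "$demo_dir/loop.R"

# Remove all lines containing "bug": /pattern/d deletes matching lines.
no_bug=$(sed '/bug/d' "$demo_dir/loop.R")
echo "$no_bug"

# Call them buglets: s/old/new/g substitutes every match on each line.
all_buglets=$(sed 's/bug/buglet/g' "$demo_dir/loop.R")
echo "$all_buglets"

# Only on lines starting with #: prefix the substitution with an address.
comment_buglets=$(sed '/^#/s/bug/buglet/g' "$demo_dir/loop.R")
echo "$comment_buglets"

rm -rf "$demo_dir"
```

To keep a result, redirect it to a new file (for example, > loop_fixed.R) rather than overwriting the input.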
8.8 Online resources
Above are just a few of the most useful Linux & Unix commands, based on our experience. There are many more, and together they form a rich set that will serve you for years. They can be used in combination and run from scripts. They can empower you when using high-end analytical servers or doing repetitive tasks!