How to become a command-line wizard

The most useful computer science class you’ve probably never taken

Image generated with Stable Diffusion

One thing I have consistently observed throughout my career is that the most productive data scientists and engineers usually have one thing in common: they’re command-line wizards. They can navigate their computer’s file system, search for patterns in log files, and manage jobs, source code, and version control all from the command line, without relying on slow navigation with the mouse and graphical user interfaces.

Yet, this command-line ‘wizardry’, as it may appear to someone unfamiliar with shell tools, is not typically part of standard computer science curricula. An MIT course around mastering your command line is aptly named “The Missing Semester of Your CS Education”.

This post is my personal, 10-lesson ‘command-line wizardry 101’ class, aimed at readers who want to work more with the command line and less with graphical user interfaces. We’ll cover the basics of the shell and the path variable, aliases, file permissions, streaming and piping, efficient job management, tmux, ssh, git, and vim.

Let’s get started. Welcome to CLW 101.

1. The shell

When you open your terminal, you’re looking at a shell, such as bash (Bourne-again shell) or zsh (Z shell). The shell really is a complete programming language with access to certain standard programs that allow for file system navigation and data manipulation. You can find out which shell is your default by typing:

echo $SHELL

In bash, each time you start a new shell, the shell loads a sequence of commands that are specified inside the .bashrc file, which is typically in your home directory (if you use a Mac, there’s usually a .bash_profile file instead). In that file you can specify useful things such as your path variable or aliases (more on which below).

2. The path variable

When you type the name of certain programs into your shell, such as python, cat, or ls, how does the shell know where to get that program from? That’s the purpose of the path variable. This variable stores a list of all paths where the shell looks for programs, separated by colons. You can inspect your path variable by typing:

echo $PATH

And you can add additional directories to your path variable with this command:

export PATH="my_new_path:$PATH"

It’s best to add this command to your .bashrc file, so that the additional directory is always on your path when you start a new shell.
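To see the lookup in action, here is a minimal sketch. The program name hello_clw and the throwaway directory are made up for illustration. The script creates a tiny program, and shows that the shell only finds it once its directory is on the path.

```shell
# Create a throwaway directory holding a tiny program (names are hypothetical).
demo_dir="$(mktemp -d)"
printf '#!/bin/sh\necho "hello from my own program"\n' > "$demo_dir/hello_clw"
chmod u+x "$demo_dir/hello_clw"

# Before: the shell cannot find the program, so command -v prints nothing.
before="$(command -v hello_clw || true)"

# Prepend the directory to the path variable, as described above.
export PATH="$demo_dir:$PATH"

# After: the shell resolves the name to the file inside our directory.
after="$(command -v hello_clw)"
output="$(hello_clw)"

echo "before: '$before'"
echo "after:  $after"
echo "$output"
```

Because the new directory is prepended, it wins over any later directory that contains a program with the same name.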

3. Aliases

Aliases are custom commands that you can define in order to avoid typing lengthy commands over and over again, such as:

alias ll="ls -lah"
alias gs="git status"
alias gp="git push origin master"

Aliases can also be used to create safeguards for your development workflow. For example, by defining

alias mv="mv -i"

your terminal will ask for confirmation if a file with the same name already exists at the destination, so that you don’t accidentally overwrite files that you didn’t mean to overwrite.

Once you add these aliases to your .bashrc file, they’re always available when you start a new shell.
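Here is a minimal sketch of how you might add aliases to your startup file without creating duplicates when you run the same command twice. The temporary file is a stand-in so the example does not touch your real .bashrc, and the helper name add_line_once is made up for illustration.

```shell
# Stand-in for ~/.bashrc so this sketch leaves your real configuration alone.
rc_file="$(mktemp)"

# add_line_once FILE LINE: append LINE to FILE unless that exact line exists.
# (Hypothetical helper, not a standard command.)
add_line_once() {
    grep -qxF -- "$2" "$1" || printf '%s\n' "$2" >> "$1"
}

add_line_once "$rc_file" 'alias ll="ls -lah"'
add_line_once "$rc_file" 'alias mv="mv -i"'
add_line_once "$rc_file" 'alias ll="ls -lah"'   # second call is a no-op

cat "$rc_file"
```

To use it for real, you would point rc_file at ~/.bashrc and then open a new shell (or run source ~/.bashrc) to pick up the change.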

4. File permissions and sudo

When multiple users share a machine, it’s important to set file permissions that determine which user can perform which operations on what data. When you type ls -l, you’ll see the files in your current directory along with their permissions in the following form:

-rwxrwxrwx

Here,

  • r, w, and x stand for read, write, and execute rights, respectively
  • the 3 rwx blocks are for (1) the user who owns the file, (2) the user group, and (3) everyone else. In the given example, all 3 of these entities have read, write, and execute permissions.
  • the leading dash indicates that this is a regular file. Instead of the dash, you can also see a d for a directory or an l for a symbolic link.

You can edit file permissions with chmod. For example, if you want to make a file executable for yourself, you’d type

chmod u+x my_program.py

👉 If a file is executable, how does the shell know how to execute it? This is specified with a ‘hashbang’ in the first line of the file, such as #!/bin/bash for a bash script or #!/usr/bin/env python3 for a Python script.
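As a minimal sketch of how permissions and the hashbang work together, the following creates a throwaway script (the file comes from mktemp, so its name is arbitrary), shows its permissions before and after chmod u+x, and then runs it directly.

```shell
# Write a two-line script whose first line is the hashbang.
script="$(mktemp)"
printf '#!/bin/sh\necho "it runs"\n' > "$script"

# Start from read/write for the owner only, so the file is not yet executable.
chmod 600 "$script"
perms_before="$(ls -l "$script" | cut -c1-10)"

# Grant the owner (u) execute (x) permission, as in the chmod example above.
chmod u+x "$script"
perms_after="$(ls -l "$script" | cut -c1-10)"

# The system reads the hashbang and hands the file to /bin/sh.
result="$("$script")"

echo "$perms_before -> $perms_after"
echo "$result"
```

The first echo prints -rw------- -> -rwx------, so only the user block gained an x.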

Lastly, there’s a special ‘super user’ (root) who has all of the permissions for all of the files. You can run any command as that super user by writing sudo in front of that command. You can also launch a stand-alone root shell by executing

sudo su

⚠️ Use sudo with care. With sudo, you’re able to make changes to the code that controls your computer’s hardware, and a mistake could make your machine unusable. Remember, with great power comes great responsibility.

5. Streaming and piping

The streaming operator > redirects the output from a program to a file, overwriting the file if it already exists. >> does the same thing, but appends to an existing file instead of overwriting it. This is useful for logging your own programs like this:

python my_program.py > logfile

Another useful concept is piping: x | y executes program x, and then directs the output from x into program y. For example:

  • cat log.txt | tail -n5 : prints the last 5 lines from log.txt
  • cat log.txt | head -n5 : prints the first 5 lines from log.txt
  • cat -b log.txt | grep error : shows all lines in log.txt that contain the string ‘error’, each prefixed with its line number (-b numbers the non-blank lines)
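The two ideas combine naturally. Here is a minimal sketch with a made-up log file: it builds the log with > and >>, then uses pipes to pull out the last error and to count the errors. Note that it uses grep -n for line numbers, which is a different route from the cat -b example above.

```shell
log="$(mktemp)"

# > creates the file (or overwrites it); >> appends to it.
echo "info: starting up"       >  "$log"
echo "error: disk almost full" >> "$log"
echo "info: retrying"          >> "$log"
echo "error: giving up"        >> "$log"

# grep -n prefixes each matching line with its line number; tail keeps the last one.
last_error="$(grep -n error "$log" | tail -n1)"

# Count the matching lines by piping them into wc -l.
error_count="$(grep error "$log" | wc -l | tr -d ' ')"

echo "$last_error"
echo "$error_count"
```

This prints 4:error: giving up followed by 2.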

6. Managing jobs

If you run a program from your command line (e.g. python run.py), the program will by default run in the foreground, and prevent you from doing anything else until the program is done. While the program is running in the foreground, you can:

  • type control+C, which sends a SIGINT (interrupt) signal to the program. By default this terminates the program immediately, unless the program handles the signal internally.
  • type control+Z, which pauses the program. After pausing, the program can be continued either by bringing it back to the foreground (fg) or by sending it to the background (bg).

In order to start your command in the background right away, you use the & operator:

python run.py &

👉 How do you know which programs are currently running in the background? Use the command jobs. This will display the jobs started from the current shell along with their job numbers; jobs -l also shows their process ids (PIDs).

Lastly, kill is a program for sending signals to running programs, including those in the background. For example,

  • kill -STOP %1 sends a STOP signal, pausing program 1.
  • kill -KILL %1 sends a KILL signal, terminating program 1 permanently.
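Here is a minimal sketch of the same workflow inside a script rather than an interactive shell, which is why it addresses the job by its PID (available in $!) instead of a job number such as %1. The sleep command is just a stand-in for a long-running program.

```shell
# Start a long-running command in the background; $! holds its process id.
sleep 30 &
pid=$!

# kill -0 sends no signal; it only checks that the process is still alive.
kill -0 "$pid" && echo "job $pid is running"

# Ask the job to terminate (TERM is also the default signal of kill) ...
kill -TERM "$pid"

# ... and wait for it. In bash, the reported exit status of a process ended
# by a signal is 128 plus the signal number (15 for TERM).
status=0
wait "$pid" || status=$?
echo "exit status: $status"
```

Unlike KILL, a TERM signal can be caught by the program, which gives it a chance to clean up before exiting.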

Four terminal panes in tmux on my personal MacBook (image by the author).

7. tmux

tmux (‘terminal multiplexer’) enables you to easily create new terminals and navigate between them. This can be extremely useful: for example, you can use one terminal to navigate your file system and another terminal to execute jobs. With tmux, you can even have both of these side-by-side.

👉 Another reason to learn tmux is remote development: when you log out of a remote machine (either on purpose or accidentally), the programs that were actively running inside your shell are typically terminated. On the other hand, if you run your programs inside a tmux session, you can simply detach from the session, log out, close your computer, and come back to that session later as if you had never logged out.

Here are some basic commands to get you started with tmux:

  • tmux new -s run : creates a new terminal session named ‘run’
  • control-B D : detaches from the current session
  • tmux a : attaches to the most recent session
  • tmux a -t run : attaches to the session called ‘run’
  • control-B " : adds another terminal pane below
  • control-B % : adds another terminal pane to the right
  • control-B → : moves to the terminal pane to the right (similarly for left, up, down)

8. SSH and key pairs

ssh is a program for logging into remote machines. In order to log into a remote machine, you’ll need to provide either a username and password or a key pair, consisting of a public key (which both machines have access to) and a private key (which only your own machine has access to).

ssh-keygen is a program for generating such a key pair. If you run ssh-keygen, it creates a public key and a private key and places both into your ~/.ssh directory. With RSA keys, these are named id_rsa.pub and id_rsa; the default key type, and therefore the file names, depends on your OpenSSH version. You’ll need to add the public key to the remote machine, which, as you should know by now, you can do by piping together cat, ssh, and a streaming operator:

cat ~/.ssh/id_rsa.pub | ssh user@remote 'cat >> ~/.ssh/authorized_keys'

Now you’ll be able to ssh into remote just by providing your private key:

ssh -i ~/.ssh/id_rsa user@remote

An even better practice is to create a file ~/.ssh/config which contains all of your ssh authentication configurations. For example, if your config file is as follows:

Host dev
HostName remote
IdentityFile ~/.ssh/id_rsa

Then you can log into remote by simply typing ssh dev.

9. git

git is a version control system that allows you to efficiently navigate your code’s versioning history and branches from the command line.

👉 Note that git is not the same as GitHub: git is a stand-alone program that can manage your code’s versioning on your local laptop, while GitHub is a place to host your code remotely.

Here are some essential git commands:

  • git add : specifies which files you want to include in the next commit
  • git commit -m 'my commit message' : commits the code change
  • git checkout -b dev : creates a new branch named ‘dev’ and checks out that branch
  • git merge dev : merges dev into the current branch. If this creates merge conflicts, you’ll need to fix these conflicts manually, and then run git add file_that_changed; git merge --continue
  • git stash : sets aside all uncommitted changes, and git stash pop brings them back. This is useful if you made changes on the master branch, and then decide that you actually want those changes to be on a separate branch.
  • git reset --hard : permanently discards all uncommitted changes to tracked files
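To make the stash workflow concrete, here is a minimal sketch that plays it through in a throwaway repository. The file name, branch name, commit messages, and the user name and email are all placeholders chosen for illustration.

```shell
# Work in a throwaway repository so nothing real is touched.
repo="$(mktemp -d)"
cd "$repo"
git init -q
# Identity for this demo repository only (placeholder values).
git config user.name "CLW Student"
git config user.email "student@example.com"

echo "v1" > notes.txt
git add notes.txt
git commit -q -m "first commit"

# Make an uncommitted change, then decide it belongs on its own branch.
echo "experimental idea" >> notes.txt
git stash -q              # set the change aside; the working tree is clean again
git checkout -q -b dev    # create the new branch and switch to it
git stash pop -q          # bring the change back, now on dev
git add notes.txt
git commit -q -m "try out an idea on dev"

branch="$(git rev-parse --abbrev-ref HEAD)"
commits="$(git rev-list --count HEAD)"
echo "on branch $branch with $commits commits"
```

The initial branch still has only the first commit, so the experiment stays isolated on dev until you decide to merge it.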

And here are some essential git commands for dealing with a remote host (e.g. GitHub):

  • git clone : clones a copy of the code repo to your local machine
  • git push origin master : pushes the changes to the remote host (e.g. GitHub)
  • git pull : pulls the latest version from remote. (This is roughly the same as running git fetch followed by git merge.)

👉 Before being able to run a command such as git push origin master, you’ll need to authenticate with an ssh keypair (see Lesson 8). If you use GitHub, you can simply paste the public key under your profile settings.

10. vim

Vim is a powerful command-line based text editor. It’s a good idea to learn at least the very basic commands in vim:

  • every once in a while you may have to log into a remote machine and make a code change there. vim is a standard program and therefore usually available on any machine you work on.
  • when running git commit, by default git opens vim for writing a commit message. So at the very least you’ll want to know how to write, save, and close a file.

The most important thing to understand about vim is that there are different operation modes. Once you launch vim, you’re in navigation mode (vim’s own name for it is normal mode), which you use to navigate through the file. Type i to start edit mode (insert mode in vim’s terminology), in which you can make changes to the file. Press the Esc key to leave edit mode and go back to navigation mode.

The useful thing about navigation mode is that you’re able to rapidly navigate and manipulate the file with your keyboard, for example:

  • x deletes a character
  • dd deletes an entire line
  • b (back) goes to the previous word, w (word) goes to the next word
  • :wq saves your changes and closes the file
  • :q! ignores your changes and closes the file

For more (much more!) vim keyboard shortcuts, check out this vim cheatsheet.

Photo by Vasily Koloda on Unsplash

Final thoughts

Congratulations, you’ve completed ‘command line wizardry 101’. However, we’ve only scratched the surface here. For inspiration, consider the following problem:

“Given a text file and an integer k, print the k most common words in the file (and the number of their occurrences) in decreasing frequency.”

As a data scientist, my first impulse may be to launch a jupyter notebook, load the data perhaps into pandas, and then use a function such as pandas agg. However, to a seasoned command-line wizard, this is a one-liner:

tr -cs A-Za-z '\n' | tr A-Z a-z | sort | uniq -c | sort -rn | sed ${1}q

This doesn’t look too different from Stable Diffusion’s imagination shown at the beginning of this article. Wizardry, indeed.
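If you’d like to demystify the spell, here is a minimal sketch that runs the same pipeline stage by stage on a made-up two-line text. The sample sentences and the variable names are my own; in the one-liner above, the text arrives on standard input and k is passed as the script argument $1.

```shell
# Sample input (made up for illustration).
text_file="$(mktemp)"
printf 'The cat and the dog.\nThe dog barked; the dog ran.\n' > "$text_file"

k=2

# 1. tr -cs A-Za-z '\n' : turn every run of non-letters into a single newline,
#                         leaving one word per line
# 2. tr A-Z a-z         : lowercase everything
# 3. sort | uniq -c     : group identical words and count each group
# 4. sort -rn           : highest counts first
# 5. sed "${k}q"        : print the first k lines, then quit
top_words="$(tr -cs A-Za-z '\n' < "$text_file" | tr A-Z a-z | sort | uniq -c | sort -rn | sed "${k}q")"

echo "$top_words"
```

For this input, the two most common words are ‘the’ (4 occurrences) and ‘dog’ (3 occurrences); the exact amount of padding in front of the counts depends on your version of uniq.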

How to become a command-line wizard Republished from Source https://towardsdatascience.com/how-to-become-a-command-line-wizard-5d78d75fbf0c?source=rss—-7f60cf5620c9—4 via https://towardsdatascience.com/feed
