Skip to main content Link Menu Expand (external link) Document Search Copy Copied

Helpful Linux Commands

Status: Draft Updated: 16 Apr 2024

watch

Periodically runs a chosen executable.
e.g: run nvidia-smi every two seconds with highlighting what has changed between consecutive runs

watch -t -d -n 2 nvidia-smi

rsync

Synchronise and backup data

rsync -vahPAX g@target.ip.v4.address:/target/path/ destination/directory/
rsync -vahPAX g@172.16.3.2:/home/g/Workspace g/

rsync -vahP /home/ganindu/Downloads/nvfw-dgxstationa100_23.3.1_230306.tar.gz    g@DGX-ganindu:/home/g/Downloads/


Options used in the example above:

-v : verbose
-a : archive (equals to -rlptgoD: recursive|replicate symlinks|preserve permissions|preserve modification times|preserve group|preserve owner|preserve device files and special files)
-h : human readable 
-P : progress
-A : preserve ACLs
-X : preserve extended attributes

nmcli

set upa a IPV4 connection

DHCP

nmcli con add con-name "eth0-DHCP" ifname eth0 type ethernet autoconnect yes ipv4.method auto

STATIC

nmcli con add con-name "eth0-static" ifname eth0 type ethernet autoconnect yes ipv4.method manual ipv4.addresses 172.16.1.232/24

setting up static ip a;longside DHCP (Jetson with Network Manager)

  1. Use nmcli con show to know what are the avaialble connections
NAME     UUID                                  TYPE      DEVICE  
DHCP     20037dd6-2486-412a-94f9-caa1c7fbf458  ethernet  lan1    
docker0  d0a3f9eb-f91d-4214-9076-5d121e9717c3  bridge    docker0 
  1. Set up aa new connection

here we are setting up 172.16.1.233

sudo nmcli con add type ethernet con-name "ONBOARD_STATIC" ifname lan1 ip4 172.16.1.233/24 gw4 0.0.0.0
sudo nmcli con mod "ONBOARD_STATIC" ipv4.dns "8.8.8.8,8.8.4.4"
sudo nmcli con mod "ONBOARD_STATIC" ipv4.method manual

activate connection with

sudo nmcli con up "ONBOARD_STATIC"

deactivate connection, when the MANUAL static connectiomn is down the DHCP connection will work

sudo nmcli con down "ONBOARD_STATIC" or
sudo nmcli con up "DHCP"


nvidia-smi -q -u

nvidia-smi topo -matrix

nvidia-smi topo -p2p rw

sudo ipmitool sel list

sudo ipmitool sensor

sudo ipmitool sdr

uptime -p

nvme list

NVSM: Nvidia System Management CLI (for NVidia DGX)

NVSM is an “always-on” health monitoring engine which is meant to catch issues proactively. We an drop in to the nvsm utility with nvsm

within nvsm

  • help (-h): for the documentation.

some sample frequently used commands

  • show health
  • show version

Update container

You will need to go to the support portal and download the latest update container.

(Below is an example with a .run file)

# update all
./nvfw-dgxstationa100_22.2.1_220209.run update_fw all

# update GPU firmware
sudo ./nvfw-dgxstationa100_22.2.1_220209.run update_fw VBIOS

# show version information 
./nvfw-dgxstationa100_22.2.1_220209.run show_version

Give group permissions to a directory

  1. create a group e,g. tms.bucher
    • sudo groupadd tms.bucher
    • sudo usermod -aG tms.bucher user1
    • sudo usermod -aG tms.bucher user2
  2. then run sudo chown -R $(id -u):tms.bucher path/to/dir/

giving read write execute permission to the directory to group members

even though you changed ownerships the directiry may still not allow users to read, write or execute within that directory

gives read write permissions

sudo chmod -R g+rx path/to/dir/

gives execute permissions

sudo chmod -R g+rwx path/to/dir/

TLDR

in Unix and Linux file systems, permissions are represented both symbolically (e.g., rwxr-xr-x) and numerically (e.g., 755). The numeric representation is a three-digit octal number, with each digit representing a different set of users: the owner, the group, and others, in that order.

Each digit is the sum of its component bits in the following way:

Read (r) permission is 4.
Write (w) permission is 2.
Execute (x) permission is 1.

To form a digit for each user category, sum the permissions you want to grant. For example:

g+rx means you’re adding read (4) and execute (1) permissions for the group, which sums up to 5. Since we’re focusing on the group permissions, this corresponds to setting the group’s permission digit to 5. If we consider default permissions for the owner as read, write, and execute (7) and no permissions for others (0), this would numerically be represented as 750.

g+rwx means you’re adding read (4), write (2), and execute (1) permissions for the group, summing up to 7. Again, assuming full permissions for the owner (7) and none for others (0), this would be 770 numerically.

Here’s a breakdown of how the octal representation maps to symbolic permissions:

0 = ---
1 = --x
2 = -w-
3 = -wx
4 = r--
5 = r-x
6 = rw-
7 = rwx

So, to apply these permissions using chmod with numerical values:

To give the group read and execute permissions, without changing the owner’s permissions (assuming they’re full) and ensuring others have no permissions, you would use:

sudo chmod 750 path/to/dir/

To give the group read, write, and execute permissions (full permissions), again without changing the owner’s permissions and ensuring others have no permissions, you would use:

sudo chmod 770 path/to/dir/

These commands apply the permissions directly to the path/to/dir/ directory. If you want to apply them recursively to all contents within the directory as well, add the -R option to chmod, like chmod -R 750 path/to/dir/ or chmod -R 770 path/to/dir/.

sudo nvsm-health --show --log-level=debug

[tail] view dynamic logs

Tail looks like a very basic tool with limited functinality but it nicely format logfiles and make it easier to view dynamic log files.

The command below follows a logfile wile showing the complete log file.

tail -n +1 -f logs/4e51108a-693e-423d-a7f9-94f2b8a5cd82.txt 

o see the last 100 lines of a log while following it, you can use the following tail command:

tail -n 100 -f log_file.txt

This command will start at the end of the log file and display the last 100 lines. As new lines are added to the log file, tail will continue to display them. To stop following the log file, press Ctrl+C.

Here is an example of how to use the tail command to follow the last 100 lines of a system log file:

tail -n 100 -f /var/log/syslog

This command will display the last 100 lines of the system log file, and will continue to display new lines as they are added to the log file.

You can also use the tail command to follow the last 100 lines of a log file that is being compressed in real time. To do this, you can use the following command:

tail -n 100 -f <(zcat log_file.txt.gz)

This command will decompress the log file on the fly and display the last 100 lines. As new lines are added to the log file and compressed, tail will continue to display them.

The tail command is a very powerful tool for monitoring log files in real time. It is especially useful for troubleshooting problems or tracking the progress of long-running tasks.

serial over lan (SOL)

If a computer you are using has SOL capability, serial over lan you can activate a sol session over lan, this is sometimes useful when debugging startup stuff.

for this you need to set up things first,

sudo ipmitool  -I lanplus -H <address of the target> -U <username> -L OPERATOR  -P <password> sol activate

you can couple with the bmc startup command over the netwoek and inspect the serial output,

you cna also deactivate it

sudo ipmitool  -I lanplus -H <address of the target> -U <username> -L OPERATOR  -P <password> sol deactivate

Click here to report Errors, make Suggestions or Comments!