Figure out which IP addresses attacked my system the most

I'm very often on the command line of a Linux machine (only a plain console via SSH, no UI) and have to analyze information from text files, typically log files containing line-oriented entries such as login failures.

For example, take an auth.log on a Debian system that looks like this (a short snippet):

Jan  1 00:00:50 Debian-bullseye-latest-amd64-base sshd[2515800]: Invalid user web2 from 103.66.206.219 port 57758
Jan  1 00:00:50 Debian-bullseye-latest-amd64-base sshd[2515800]: pam_unix(sshd:auth): check pass; user unknown
Jan  1 00:00:50 Debian-bullseye-latest-amd64-base sshd[2515800]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=103.66.206.219
Jan  1 00:00:52 Debian-bullseye-latest-amd64-base sshd[2515800]: Failed password for invalid user web2 from 103.66.206.219 port 57758 ssh2
Jan  1 00:00:53 Debian-bullseye-latest-amd64-base sshd[2515800]: Received disconnect from 103.66.206.219 port 57758:11: Bye Bye [preauth]
Jan  1 00:00:53 Debian-bullseye-latest-amd64-base sshd[2515800]: Disconnected from invalid user web2 103.66.206.219 port 57758 [preauth]
Jan  1 00:01:44 Debian-bullseye-latest-amd64-base sshd[2515995]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=218.92.0.190  user=root
Jan  1 00:01:46 Debian-bullseye-latest-amd64-base sshd[2515995]: Failed password for root from 218.92.0.190 port 34975 ssh2
Jan  1 00:01:51 Debian-bullseye-latest-amd64-base sshd[2515995]: Failed password for root from 218.92.0.190 port 34975 ssh2
Jan  1 00:01:54 Debian-bullseye-latest-amd64-base sshd[2515995]: Failed password for root from 218.92.0.190 port 34975 ssh2
...
Jan  1 00:03:19 Debian-bullseye-latest-amd64-base sshd[2516404]: Failed password for root from 218.92.0.190 port 56977 ssh2
Jan  1 00:03:21 Debian-bullseye-latest-amd64-base sshd[2516423]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=51.142.182.209  user=root
Jan  1 00:03:22 Debian-bullseye-latest-amd64-base sshd[2516404]: Failed password for root from 218.92.0.190 port 56977 ssh2
Jan  1 00:03:23 Debian-bullseye-latest-amd64-base sshd[2516423]: Failed password for root from 51.142.182.209 port 1024 ssh2
Jan  1 00:03:23 Debian-bullseye-latest-amd64-base sshd[2516423]: Received disconnect from 51.142.182.209 port 1024:11: Bye Bye [preauth]
Jan  1 00:03:23 Debian-bullseye-latest-amd64-base sshd[2516423]: Disconnected from authenticating user root 51.142.182.209 port 1024 [preauth]

I want to know how many attempts were made to break into my system via the root user. So I filter the above file like this:

cat auth.log | grep " Disconnected from authenticating user root " | wc -l
14134
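As a side note, grep can count matching lines on its own via its -c option, so cat and wc -l are not strictly needed. A minimal sketch on a few made-up lines in the auth.log format shown above (count_root_disconnects is a hypothetical helper name):

```shell
# Hypothetical helper: count the lines where sshd logged the disconnect of a
# client that was still trying to authenticate as root.
# Reads stdin if no file is given as argument.
count_root_disconnects() {
    grep -c " Disconnected from authenticating user root " "$@"
}

# Made-up sample lines; only the first and the third one match.
printf '%s\n' \
    'Jan  1 00:03:23 Debian-bullseye-latest-amd64-base sshd[2516423]: Disconnected from authenticating user root 51.142.182.209 port 1024 [preauth]' \
    'Jan  1 00:00:53 Debian-bullseye-latest-amd64-base sshd[2515800]: Disconnected from invalid user web2 103.66.206.219 port 57758 [preauth]' \
    'Jan  1 00:03:30 Debian-bullseye-latest-amd64-base sshd[2516404]: Disconnected from authenticating user root 218.92.0.190 port 56977 [preauth]' |
    count_root_disconnects
# prints 2
```

On the real file this is simply grep -c " Disconnected from authenticating user root " auth.log.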

That means that from the beginning of January up to now, there have been more than 14,000 attempts to attack my system. OK, but I want to know more. What about the different IP addresses that were used for those attacks? Each line of that format also contains an IP address, so let's try to extract that information.

I usually work towards the solution step by step. So first I extract the IP address, which can be done like this:

cat auth.log | grep " Disconnected from authenticating user root " | tr -s " " | cut -d " " -f11
218.92.0.190
51.142.182.209
218.92.0.190
218.92.0.190
218.92.0.190
...
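Since awk splits each line on runs of blanks by default, the grep, tr and cut steps can also be folded into a single awk call. A sketch, assuming every matching line has the format shown above (extract_root_ips is a hypothetical name, and the sample lines are made up):

```shell
# Hypothetical helper: print field 11 (the IP address) of every
# "Disconnected from authenticating user root" line. awk splits on runs
# of blanks by default, so "Jan  1" and "Jan 10" give the same field numbers.
extract_root_ips() {
    awk '/ Disconnected from authenticating user root / { print $11 }' "$@"
}

# Made-up sample lines (note the single space in "Jan 10").
printf '%s\n' \
    'Jan  1 00:03:23 Debian-bullseye-latest-amd64-base sshd[2516423]: Disconnected from authenticating user root 51.142.182.209 port 1024 [preauth]' \
    'Jan 10 08:15:02 Debian-bullseye-latest-amd64-base sshd[2520000]: Disconnected from authenticating user root 218.92.0.190 port 40000 [preauth]' |
    extract_root_ips
# prints:
# 51.142.182.209
# 218.92.0.190
```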

The output contains only the IP addresses. But now I would like to know how many different IP addresses have been used. To find out, we first have to sort the IP addresses so that duplicate entries end up next to each other in the resulting list. Unfortunately, simply appending a sort to the pipeline does not give the expected order:

cat auth.log | grep " Disconnected from authenticating user root " | tr -s " " | cut -d " " -f11 | sort
...
121.26.142.238
121.26.142.238
121.26.142.238
121.26.142.238
121.26.142.238
121.26.142.238
121.26.142.238
121.26.142.238
12.191.116.182
12.191.116.182
12.191.116.182
12.191.116.182
12.191.116.182
12.191.116.182
12.191.116.182
12.191.116.182
...
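The effect is easy to reproduce in isolation with two made-up addresses. A sketch; LC_ALL=C is set only so the comparison does not depend on the current locale (text ordering can differ between locales), and sort_as_text is a hypothetical helper name:

```shell
# Hypothetical helper: plain text sort in the C locale (byte order).
sort_as_text() {
    LC_ALL=C sort
}

# Compared as text, "10..." comes before "9..." because '1' < '9'.
printf '%s\n' 9.0.0.1 10.0.0.1 | sort_as_text
# prints:
# 10.0.0.1
# 9.0.0.1
```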

If you take a closer look at the list, you can see that the order of the IP addresses is not correct: I would expect 12.191.116.182 to come before 121.26.142.238. The reason is that a plain sort compares the lines as text, not as numbers. Fortunately, sort can also split each line into fields and compare them numerically: sort -t . -k 1,1n -k 2,2n -k 3,3n -k 4,4n. The option -t defines a different field separator; for IP addresses we use the . as separator. Then -k 1,1n defines a sort key that starts at field 1 and ends at field 1 (so it covers only the first octet of the IP address), and the n means this key is compared numerically. The other -k options do the same for the second, third, and fourth octets; each later key is only consulted when all previous keys are equal. You can add the option --debug (GNU sort) to see which part of each line a -k option actually uses. The result of this intermediate step looks like this:

cat auth.log | grep " Disconnected from authenticating user root " |
    tr -s " " |
    cut -d " " -f11 |
    sort -t . -k 1,1n -k 2,2n -k 3,3n -k 4,4n
...
222.119.64.11
222.119.64.11
222.119.64.11
222.119.64.11
222.119.64.11
222.119.64.11
222.119.64.11
222.252.25.186
222.252.25.186
222.252.25.186
222.252.25.186
222.252.25.186
222.252.25.186
222.252.25.186
222.252.25.186
222.252.25.186
222.252.25.186
222.252.25.186
222.252.25.186
222.252.25.186
223.17.0.181
223.17.0.181
223.17.0.181
223.17.0.181
223.17.0.181
...
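The key-based sort can be checked on its own with a few made-up addresses, including the two from the listing that was out of order. A sketch (sort_ips is a hypothetical helper name):

```shell
# Hypothetical helper: sort dotted-quad IPv4 addresses numerically, octet by octet.
#   -t .     split each line into fields at the dots
#   -k N,Nn  sort key = field N only, compared numerically;
#            later keys only break ties of earlier ones
sort_ips() {
    sort -t . -k 1,1n -k 2,2n -k 3,3n -k 4,4n
}

printf '%s\n' 121.26.142.238 12.191.116.182 9.0.0.1 12.191.116.20 | sort_ips
# prints:
# 9.0.0.1
# 12.191.116.20
# 12.191.116.182
# 121.26.142.238
```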

Now I need a way to put identical IP addresses into a group and count the entries within each group. Luckily, on Linux that's an easy task with uniq, which reports or omits repeated lines. So the number of unique IP addresses attacking my system can be determined like this:

cat auth.log | grep " Disconnected from authenticating user root " |
    tr -s " " |
    cut -d " " -f11 |
    sort -t . -k 1,1n -k 2,2n -k 3,3n -k 4,4n |
    uniq |
    wc -l
691
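Note that uniq only merges identical lines that are adjacent, which is why the sort step has to come first. For counting alone, any sort that groups identical lines would do; the numeric key sort mainly makes the intermediate list easier to read. A small sketch with made-up input (count_distinct is a hypothetical helper name):

```shell
# Hypothetical helper: number of distinct lines on stdin.
count_distinct() {
    sort | uniq | wc -l
}

# uniq alone only merges neighbours, so the repeated address is counted twice:
printf '%s\n' 218.92.0.190 51.142.182.209 218.92.0.190 | uniq | wc -l     # prints 3
# Sorting first brings the duplicates together:
printf '%s\n' 218.92.0.190 51.142.182.209 218.92.0.190 | count_distinct   # prints 2
```

sort -u combines the sort and uniq steps into one command, so sort -u | wc -l yields the same count.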

So 691 unique IP addresses have attacked my system so far this year. Furthermore, I would like to know the top 10 IP addresses that have been used. That can be accomplished like this:

cat auth.log | grep " Disconnected from authenticating user root " |
    tr -s " " |
    cut -d " " -f11 |
    sort -t . -k 1,1n -k 2,2n -k 3,3n -k 4,4n |
    uniq -c |
    sort -nr |
    head -10

   1132 61.177.173.11
    688 218.92.0.190
    440 61.177.173.2
    142 195.226.194.242
    141 195.226.194.142
     70 61.177.172.124
     60 61.177.173.36
     58 159.65.132.116
     54 61.177.173.50
     52 211.250.74.124
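If you run this analysis regularly, the whole pipeline can be wrapped in a small shell function. A sketch, assuming the auth.log format shown above; the name top_root_attackers and its arguments are made up for illustration, and awk's print $11 stands in for the grep, tr and cut steps:

```shell
# Hypothetical helper: print the N IP addresses (default 10) with the most
# "Disconnected from authenticating user root" lines in the given file.
# Usage: top_root_attackers FILE [N]
top_root_attackers() {
    awk '/ Disconnected from authenticating user root / { print $11 }' "$1" |
        sort -t . -k 1,1n -k 2,2n -k 3,3n -k 4,4n |
        uniq -c |
        sort -nr |
        head -n "${2:-10}"
}

# Example: top_root_attackers auth.log 10
```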

So the IP address 61.177.173.11 has tried to attack my system 1132 times, and so on. A few words about uniq -c: it collapses each run of identical lines into one line and prints the number of occurrences in front of it, like this:

   1132 61.177.173.11

The 1132 means that the IP address occurred 1132 times in the input. The final sort -nr is needed to sort the lines numerically by that count; the r reverses the order so that the largest number comes first. Finally, head -10 prints the first 10 lines.
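The counting can also be done in a single awk pass with an associative array, which makes the first sort unnecessary. This is a different technique from the uniq -c approach above, shown only as a sketch (count_root_ips is a hypothetical name); sort -nr and head are still used to pick the top entries:

```shell
# Hypothetical helper: count root disconnects per IP address in one awk pass
# (associative array keyed by field 11), then rank the results by count.
# The "%7d %s" format imitates the output of uniq -c.
count_root_ips() {
    awk '/ Disconnected from authenticating user root / { count[$11]++ }
         END { for (ip in count) printf "%7d %s\n", count[ip], ip }' "$@" |
        sort -nr
}

# Example: count_root_ips auth.log | head -10
```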