How to create an offline copy of a website with wget in Kali Linux

During the "Reconnaissance" phase we might need to access the targeted website frequently, and this can trigger alarms. I used to rely on HTTrack (or WebHTTrack) for making one-to-one offline copies of a given website, but for some reason it doesn't work on my current Kali installation. For those who want to give WebHTTrack a chance, remember one thing: it's not included by default in Kali. To install it, type the following:
apt-get update
apt-get install webhttrack
to get the full GUI version, or
apt-get update
apt-get install httrack
to get the command-line version only.
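Before installing anything, it can help to check whether either tool is already present. The snippet below is a minimal sketch of that check; the helper name check_tool is made up for this example, and the executable names are assumed to match the package names above.

```shell
# check_tool is a hypothetical helper: it reports whether a command is on the PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: installed"
    else
        echo "$1: missing"
    fi
}

# Assumption: the executables carry the same names as the packages.
check_tool httrack
check_tool webhttrack
```

If a tool shows up as missing, the matching apt-get install line above should take care of it.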

Searching for an easy alternative, I found this tutorial from kossboss; all the credit goes there.
Open a terminal and type mkdir /mywebsitedownloads/ and then
cd /mywebsitedownloads (you can name the folder any way you wish).
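Creating a folder directly under / typically requires root. As a small variation, the sketch below puts the folder in the home directory instead; the location and the variable name dest are just this example's choices, not part of the original tutorial.

```shell
# Hypothetical location: a folder in the home directory instead of directly under /.
dest="$HOME/mywebsitedownloads"

# -p keeps mkdir quiet if the folder already exists.
mkdir -p "$dest"

# Stop here if the folder cannot be entered, so nothing is downloaded elsewhere.
cd "$dest" || exit 1
pwd
```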
Now (copy and paste):
wget --limit-rate=200k --no-clobber --convert-links --random-wait -r -p -E -e robots=off -U mozilla http://www.nameofthesiteyouwanttocopy.com
Replace nameofthesiteyouwanttocopy.com with the actual address of your targeted website. Below is an explanation of each option:

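--limit-rate=200k: limits the download speed to 200 KB/s (kilobytes per second); higher download rates might seem suspicious.
--no-clobber: doesn't overwrite existing files (useful when the download is interrupted and resumed). Newer wget versions may warn that this option is ignored when --convert-links is also given.
--convert-links: converts links so that they work locally, off-line, instead of pointing to the website online.
--random-wait: randomizes the wait between requests, for the same reason as the rate limit. The wget manual describes the delay as varying between 0.5 and 1.5 times the --wait value, so it is meant to be combined with --wait.
-r: recursive download; follows links from the start page, by default up to five levels deep.
-p: downloads the page requisites, i.e. everything needed to display a page properly, including pictures and stylesheets.
-E: adds the .html extension to downloaded HTML files that lack it (for example pages served from .php or .asp URLs).
-e robots=off: tells wget to ignore the site's robots.txt rules; it does not hide the fact that a crawler is at work.
-U mozilla: sends "mozilla" as the User-Agent string instead of wget's default, so the requests are not labelled as coming from wget.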
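For readability, the same command can be written with long option names. The sketch below only prints the command so it can be reviewed first; the function name mirror_cmd is invented for this example, and the long names are the ones the wget manual lists for -r, -p, -E, -e and -U.

```shell
# mirror_cmd is a hypothetical helper: it prints the wget command for a given URL
# without running it, using long option names in place of -r, -p, -E, -e and -U.
mirror_cmd() {
    echo wget \
        --limit-rate=200k \
        --no-clobber \
        --convert-links \
        --random-wait \
        --recursive \
        --page-requisites \
        --adjust-extension \
        --execute robots=off \
        --user-agent=mozilla \
        "$1"
}

# Placeholder address from the tutorial; replace it with the real one.
mirror_cmd "http://www.nameofthesiteyouwanttocopy.com"
```

Once the printed command looks right, it can be copied into the terminal, or the echo can be removed from the function so it runs wget directly.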

Once the download is complete, you can find the offline copy in the folder you created for it; wget places the files in a subfolder named after the site's host name. Look for the index.html page there.
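With a recursive download, wget normally saves the files in a subfolder named after the host, so the start page is usually one level below the download folder. A quick way to locate it is sketched below; find_index is a made-up helper name, and the depth limit of 2 is an assumption that fits this layout.

```shell
# find_index is a hypothetical helper: it lists index.html files at most two
# levels below the given folder (or the current folder if none is given).
find_index() {
    find "${1:-.}" -maxdepth 2 -type f -name 'index.html'
}

# Run from inside the download folder.
find_index .
```

Opening the listed file in a browser should show the local copy of the start page.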
[Screenshot: wget creating the offline copy]

You'll notice that it is a close copy of the original: it preserves the link structure, pictures, code and other formatting. Remember that any time you interact directly with online resources owned by the 'target', there's a chance you'll leave a digital fingerprint behind.