Members' Areas and wget

Earlier this evening on #ubuntu-au I was discussing how to mirror restricted areas of sites with wget. The solution is pretty simple.

  1. Install the Web Developer extension for Firefox
  2. Log in to the target site
  3. On the Web Developer toolbar select Cookies > View Cookie Information
  4. For each cookie listed, add a line like the following to a file called wget-cookies.txt in your home directory, where [tab] stands for a single tab character:
<.target.domain.name>[tab]TRUE[tab]/[tab]FALSE[tab]1496836642[tab]<key>[tab]<value>

What does it all mean?

  • <.target.domain.name> the domain of the site
  • TRUE the domain-wide flag; if the domain starts with . this should be TRUE, otherwise FALSE
  • / the path the cookie applies to
  • FALSE the secure flag; TRUE means the cookie is only sent over HTTPS
  • 1496836642 the expiry of the cookie as a Unix timestamp (I am using 11:57:22 UTC on 7-Jun-2017); pick a time that is still in the future
  • <key> the name of the cookie
  • <value> the value of the cookie
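
Typing literal tabs by hand is easy to get wrong, so here is a minimal sketch of a helper that prints one line in the format above. The function name cookie_line and the example values are made up for illustration; the path and the secure flag are hard-coded to / and FALSE to match the sample line.

```shell
#!/bin/sh
# Hypothetical helper: print one cookie line in the tab-separated format above.
# Arguments: domain, expiry (Unix timestamp), cookie name, cookie value.
cookie_line() {
    domain=$1
    expiry=$2
    name=$3
    value=$4
    # A domain starting with "." gets the domain-wide flag set to TRUE.
    case $domain in
        .*) flag=TRUE ;;
        *)  flag=FALSE ;;
    esac
    # Fields: domain, domain-wide flag, path, secure, expiry, name, value.
    printf '%s\t%s\t/\tFALSE\t%s\t%s\t%s\n' \
        "$domain" "$flag" "$expiry" "$name" "$value"
}

# Example with made-up values, appended to the cookie file:
# cookie_line .example.com 1496836642 sessionid abc123 >> ~/wget-cookies.txt
```

If you have GNU date (an assumption, other date implementations use different options), something like date -u -d '2017-06-07 11:57:22' +%s will turn a date into the timestamp for the second argument.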

If you just want to pull down a single page use the following command:

wget --load-cookies ~/wget-cookies.txt <target-url>

You should then have the target page.
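
If you get the login page instead, one thing worth ruling out is a malformed cookie file, for example spaces where the tabs should be. The following is only a rough sketch of a sanity check; check_cookie_file is a name I made up, and it only counts fields, it does not verify that the cookie values are right.

```shell
#!/bin/sh
# Hypothetical checker: report lines that do not have seven tab-separated
# fields. Comment lines (starting with "#") and blank lines are skipped.
check_cookie_file() {
    awk -F '\t' '
        /^#/ || /^[[:space:]]*$/ { next }
        NF != 7 {
            printf "line %d: expected 7 fields, found %d\n", NR, NF
            bad = 1
        }
        END { exit bad + 0 }
    ' "$1"
}

# Usage: exits non-zero and lists the offending lines if any are found.
# check_cookie_file ~/wget-cookies.txt
```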

If you want to mirror the whole site as an authenticated user try something like:

wget --mirror -w 2 -p --convert-links --load-cookies ~/wget-cookies.txt <target-url>
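
For anyone who finds the short flags cryptic, here is a sketch of the same command wrapped in a small function with the long option names spelled out. The function name mirror_site and the WGET override variable are my own additions for illustration, not part of wget.

```shell
#!/bin/sh
# Hypothetical wrapper around the mirror command, using long option names.
# Argument: the URL to start mirroring from.
mirror_site() {
    # --mirror           recursive download with time-stamping
    # --wait=2           pause two seconds between requests (same as -w 2)
    # --page-requisites  also fetch images, stylesheets and so on (same as -p)
    # --convert-links    rewrite links so the copy can be browsed locally
    # ${WGET:-wget} allows a dry run, e.g. WGET=echo prints the arguments.
    ${WGET:-wget} --mirror --wait=2 --page-requisites --convert-links \
        --load-cookies "$HOME/wget-cookies.txt" "$1"
}

# Usage (placeholder URL):
# mirror_site https://example.com/members/
```

Setting WGET=echo before calling the function prints the arguments instead of downloading anything, which is handy for checking what would be run.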

I tested this with a couple of my own sites and it seems to work well.

Before doing something like this, check the terms of service and the license of the content to ensure that you are not in violation of either.