1. Mick West

    Mick West Administrator Staff Member

    Sometimes you want to link to a page that might change or vanish after a while, particularly if it contains outrageous bunk, or libelous material. Or it might just be something that is by its nature temporary, like a job posting, or a service request. If you want to make sure you can reference it in the future, then a great free way of doing this is archive.is, a service that will give you both a linked snapshot of the page at that moment in time, and also a .zip version you can use offline, or attach to a post:

    Just copy the URL of the page, then go to:

    http://archive.today/ (formerly archive.is, as seen in images below. Currently both URLs work)

    And enter in the URL (or use their bookmarklet)
    [​IMG]

    It will do some magic, and you will get a page you can link to that should last for years, with:
    • Short URL
    • The date and time of the capture (UTC)
    • A link to "download zip"
    [​IMG]

    For permanence, click on "download zip", and then attach the file to the post. Like I have done below.

    Then when you use the link, put the original link first, then the archive link after it, like:
    https://www.cia.gov/careers/opportunities/clandestine/ncs-language-officer.html (http://archive.is/sxL1g)
     

    Attached Files:

    Last edited: Aug 28, 2014
  2. Mick West

    Mick West Administrator Staff Member

    Using the bookmark bar button is by far the easiest way of using archive.is. Just drag it into the bar:

    How to install the bookmarklet.
    [​IMG]

    Then just click on it to archive the current page.
     
  3. Mick West

    Mick West Administrator Staff Member

    I have added a new function, "archive.is", which shows up as a button under any post you can edit.

    [​IMG]

    Pressing it will archive the links externally via archive.is, and add links to those archives, either as a plain URL next to a plain URL, or with a caret symbol (^) for inline links.

    [​IMG]

    I encourage you to use this whenever you post a link to something that might vanish.

    If it detects you already have an archive.is link in the post, then it will skip all links. So if you add links after the first archiving, you should either remove the old archive links and re-run, or create the archive links manually, as above.

    Example:
    http://xenforo.com (http://archive.is/Sbihm)

    Direct link^
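The behavior described above (skip everything if an archive link is already present, otherwise put the archive URL in parentheses after each plain URL) can be sketched roughly as follows. This is only an illustration, not the forum's actual code: the function names, the regular expression, and the `archived` mapping are all assumptions, and the step that actually creates the snapshots is left out.

```python
import re

# Hypothetical names throughout; the forum's real implementation is not shown in the thread.
ARCHIVE_HOSTS = ("archive.is", "archive.today")
# Simplified URL matcher (assumption): stops at whitespace and parentheses.
URL_PATTERN = re.compile(r"https?://[^\s()]+")


def has_archive_link(text):
    """Return True if the post already contains an archive.is/archive.today link."""
    return any(
        re.search(r"https?://(www\.)?" + re.escape(host) + r"/", text)
        for host in ARCHIVE_HOSTS
    )


def add_archive_links(text, archived):
    """Append "(archive URL)" after each plain URL that has a known snapshot.

    `archived` maps original URLs to snapshot URLs. How those snapshots
    are obtained is outside the scope of this sketch.
    """
    if has_archive_link(text):
        return text  # mirrors the described rule: skip all links

    def annotate(match):
        url = match.group(0)
        snapshot = archived.get(url)
        return f"{url} ({snapshot})" if snapshot else url

    return URL_PATTERN.sub(annotate, text)
```

Running it a second time on an already annotated post leaves the text unchanged, which matches the caveat above about links added after the first archiving.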
     
    Last edited: Feb 9, 2014
  4. Buildy

    Buildy Member

    I've been reading through older material on the site the last couple weeks and noticed a fair number of dead youtube links. Does this archive the actual youtube video as well? (If that's what's linked)
     
  5. Mick West

    Mick West Administrator Staff Member

    No. In fact it will just skip over YouTube videos.

    Videos are very large. It's not economical to cache them. The only sensible way would be to download the video, and then re-upload it to YouTube (or Vimeo, etc).
     
  6. Critical Thinker

    Critical Thinker Senior Member

    How To Save URLs To The Wayback Machine On Demand

     
  7. Mick West

    Mick West Administrator Staff Member

  8. Mick West

    Mick West Administrator Staff Member

    I've just added code to also submit all new links to archive.org, although increasingly it seems like they are crawling the web so fast that they get most new stuff on existing sites within a day or two. So it is most useful for new sites that crop up and are not yet being indexed.
     
  9. deirdre

    deirdre Moderator Staff Member

    just wondering. do archived pages show up in google searches?

    Mostly I'm wondering: if a 'hoax bunk' page is taken down, the information disappears from the internet unless it happened to be crawled. But if we archive it... you can't unarchive something, right?
     
  10. Mick West

    Mick West Administrator Staff Member

    You can't unarchive an archive.is/archive.today page, AFAIK, however you CAN unarchive an archive.org page.

    For something that is very important, I'd recommend downloading a local copy of the page.
     
  11. KAT

    KAT Active Member

    I used to work for a very large website, and we found Wayback Machine had big gaps in our material. There's only so much they can crawl, not having Google's resources. We also had some private pages, known to many users but deliberately blocked to all crawlers. We'd periodically place the more long-term important documents from there into WBM by Critical Thinker's method, to save them from our own company's overzealous pruning of "obsolete" content.
     
  12. Mick West

    Mick West Administrator Staff Member

    And they do show up in google searches, but you need to specify the archive.is site:
    https://www.google.com/search?q="sandy+hook"+site%3Aarchive.is

    Archive.org is seemingly not googled, for archived pages at least.
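The site-restricted search shown above can also be built programmatically. A minimal sketch, assuming only the standard library; the function name `archive_search_url` is made up for illustration:

```python
from urllib.parse import urlencode


def archive_search_url(phrase, site="archive.is"):
    """Build a Google search URL for an exact phrase restricted to one site."""
    query = f'"{phrase}" site:{site}'
    # urlencode percent-encodes the quotes and colon and turns spaces into "+".
    return "https://www.google.com/search?" + urlencode({"q": query})


print(archive_search_url("sandy hook"))
```

The result is the same search as the link above, with the quotation marks percent-encoded as %22.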
     
  13. Mick West

    Mick West Administrator Staff Member

    The code I added essentially uses this same method, just programmatically grabs the URL:

    http://web.archive.org/save/{the url you want to save}

    And archive.org should grab the latest version of the page.
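As a rough sketch of what "programmatically grabs the URL" could look like, here is a standard-library version. It is not the forum's actual code: the function names, the timeout, and the User-Agent string are assumptions, and only the `http://web.archive.org/save/` prefix comes from the post above.

```python
import urllib.request

# Prefix taken from the post; everything else here is illustrative.
WAYBACK_SAVE_PREFIX = "http://web.archive.org/save/"


def wayback_save_url(url):
    """Return the address that asks archive.org to capture `url`."""
    return WAYBACK_SAVE_PREFIX + url


def submit_to_wayback(url, timeout=30):
    """Request the save URL and return the HTTP status code of the response."""
    request = urllib.request.Request(
        wayback_save_url(url),
        headers={"User-Agent": "archive-link-sketch/0.1"},  # hypothetical UA string
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Calling `submit_to_wayback("http://xenforo.com")` would make a real network request, so it is worth handling `urllib.error.URLError` around it in anything beyond a quick test.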
     
  14. KAT

    KAT Active Member

    Google re-crawls the same things on a regular basis, on some schedule based on their judgement of the importance, interest and freshness of the content. When they find something they've seen before, they make their own cached copy of it. When they have done this, there is a tiny down arrow at the end of the green URL under the search result link, which lets you access the cached copy.

    If a page is not available when they go to crawl it again, if it is a "soft" error (300 or 500 series error code) they will keep trying. If it is a "hard" error code (the 400 series, 404 the best known) they correctly assume the content is gone forever, so they remove the search result link for it. This of course means there is nothing to hang their cached link onto. This is because it is bad for their business to provide dead links.
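The soft/hard distinction described above can be written down as a small rule of thumb. This only restates the description in this post; it is not Google's actual crawler logic, and the function name and return labels are invented for illustration.

```python
def classify_crawl_result(status_code):
    """Label an HTTP status code using the soft/hard rule of thumb from the post."""
    if 400 <= status_code < 500:
        return "hard"  # e.g. 404: treated as gone, per the description above
    if 300 <= status_code < 400 or 500 <= status_code < 600:
        return "soft"  # per the description above, worth retrying later
    if 200 <= status_code < 300:
        return "ok"
    return "other"


for code in (200, 301, 404, 503):
    print(code, classify_crawl_result(code))
```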

    Where do they get dead links to try and crawl? From other sites that have linked to them. Like this site. They will follow and try to index every link they see here, then treat them as above, if they find them dead. If you post two links here, the original and your WBM version, they will try to index both.

    Google breathes data, so I suspect they don't actually discard that cache, but there's no way for us to get hold of it. They will continue to see and display URLs from WayBack Machine, which are WBM URLs, not the same as the original was. So anything you put in there is available forever, that is what WBM is all about.

    ...

    Re unarchiving, it's probably ok if you, who submitted something, can later remove it. It defeats the purpose if they let the original owner get it removed (although they'd have to if they get a legal take-down notice, i guess). But as you say, something terribly important is best "saved as HTML" to your own computer, from where you can get it back onto the internet (or other interested parties) again if necessary. I would not consider google docs, github etc safe places for such file sharing.
     
    Last edited by a moderator: Aug 27, 2014
  15. M Bornong

    M Bornong Senior Member

    I think it's a good idea, at least with Facebook, to check the archive. When I first archived this one from the MET Office Page, https://archive.today/P1vM4, even though I had all of the comments expanded, it only showed Ian's opening comment. Several days later, when Ian said he saw no encyclopedia entry (I assume he has that person blocked), I archived it a second time, https://archive.today/fpFvX. This time it showed all of the replies and, apparently, Ian was able to see the photos from the encyclopedia.
     
  16. Mick West

    Mick West Administrator Staff Member

    Yes, Facebook can be a bit fiddly for the archiver, as it has dynamic content, and it varies based on who is viewing it, and probably other factors. The archive.is robot uses a dummy user, and is logged in, but would not be a member of any of the groups, so might not be able to get everything.

    If it's important, take a screenshot.
     
  17. Dick Holman

    Dick Holman New Member

    Another quirk with Facebook is, it only grabs the last 50 posts made, so archiving frequently in long threads is key.