Background: the Internet Archive, its history, mission, and subsidiary projects
There are probably few users on Habr who have never heard of the Internet Archive, a service that finds and preserves digital data important to all of humanity, whether web pages, books, videos, or other kinds of information.
Who runs the Internet Archive, when did it appear, and what is its mission? Read on in today's "Background" piece.
Earlier, the Archive kept 30 petabytes of information: about 300 billion web pages, 12 million books, 4 million audio recordings, 3.3 million videos, 1.5 million photos, and 170 thousand software distributions. In just one year the service noticeably “gained weight”: the Archive now stores 339 billion web pages, 19 million books, 4.5 million video files, 4.7 million audio files, 3.2 million images of various kinds, and 381 thousand software distributions.
How is the data storage organized?
Information is stored on hard drives in so-called "data nodes". These are servers, each of which contains 36 hard drives (plus two disks for the operating system). Data nodes are grouped into arrays of 10 machines each, and each array forms a storage cluster. In 201? the Archive used 8-terabyte HDDs, and the situation is about the same now, so one node holds about 288 terabytes of data. Hard drives of other sizes are also in use, including 3 and 4 TB.
In 201? there were about 2?000 hard drives. The Archive's data centers are equipped with air conditioning systems that maintain a stable microclimate. One storage cluster of 10 nodes consumes about 5 kW of power.
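The node and cluster figures above follow from simple arithmetic. Here is a minimal sketch in Python that uses only the numbers quoted in the text (36 data drives per node, 8 TB per drive, 10 nodes and about 5 kW per cluster). The function names are illustrative, and the watts-per-terabyte value is a derived estimate, not a figure published by the Archive.

```python
# Figures quoted in the article; everything else is derived from them.
DRIVES_PER_NODE = 36      # data drives only, the two OS disks are excluded
DRIVE_SIZE_TB = 8         # 8-terabyte HDDs
NODES_PER_CLUSTER = 10    # one array / storage cluster
CLUSTER_POWER_KW = 5      # approximate power draw of one cluster


def node_capacity_tb(drives=DRIVES_PER_NODE, drive_tb=DRIVE_SIZE_TB):
    """Raw capacity of a single data node, in terabytes."""
    return drives * drive_tb


def cluster_capacity_tb(nodes=NODES_PER_CLUSTER):
    """Raw capacity of one cluster of data nodes, in terabytes."""
    return nodes * node_capacity_tb()


def watts_per_tb():
    """Derived estimate: power drawn per terabyte of raw capacity."""
    return CLUSTER_POWER_KW * 1000 / cluster_capacity_tb()


if __name__ == "__main__":
    print(node_capacity_tb())         # 288
    print(cluster_capacity_tb())      # 2880
    print(round(watts_per_tb(), 2))   # 1.74
```

These are raw capacities. Because the data is mirrored, the usable capacity is lower.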
The Internet Archive is structured as a virtual “library” divided into sections such as books, movies, music, and so on. Each item has a catalog entry, usually with its title, the author's name, and additional information. From a technical point of view, the items are structured and stored in Linux directories.
The total amount of data stored by the Archive is 22 PB, and there is currently room for another 22 PB. “Because we are paranoid,” say representatives of the service.
Among the contents of an item's directory there is a file whose name ends in "_files.xml". It is a catalog with information about all the files in that directory.
What will happen to the data if one or more servers fail?
Nothing bad will happen: the data is duplicated. As soon as a new item appears in the Archive's library, it is immediately replicated and placed on different hard drives on different servers. This “mirroring” of content helps to cope with problems such as power outages and file system failures.
If a hard drive fails, it is replaced with a new one. Thanks to the mirrored, redundant data layout, the replacement is immediately filled with the data that was on the old, damaged HDD.
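The mirroring and rebuild behaviour described above can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions: the `MirroredStore` class, its method names, and the "least loaded server" placement rule are hypothetical and are not taken from the Archive's actual software. The model keeps every item on two distinct servers and refills a replacement from the surviving mirrors.

```python
class MirroredStore:
    """Toy model: every item is kept on `copies` distinct servers."""

    def __init__(self, servers, copies=2):
        if copies > len(servers):
            raise ValueError("need at least as many servers as copies")
        self.copies = copies
        self.servers = {name: set() for name in servers}

    def _least_loaded(self, exclude):
        # Hypothetical placement rule: pick the emptiest server,
        # breaking ties by name so the result is deterministic.
        candidates = [s for s in self.servers if s not in exclude]
        return min(candidates, key=lambda s: (len(self.servers[s]), s))

    def store(self, item):
        """Place `item` on `copies` different servers and return them."""
        holders = set()
        for _ in range(self.copies):
            target = self._least_loaded(exclude=holders)
            self.servers[target].add(item)
            holders.add(target)
        return holders

    def holders(self, item):
        """Return the set of servers that currently hold `item`."""
        return {s for s, items in self.servers.items() if item in items}

    def replace_server(self, failed, replacement):
        """Swap a failed server for a new one and refill it from mirrors.

        Returns the set of items that had no surviving copy.
        """
        lost = self.servers.pop(failed)
        restored = {item for item in lost if self.holders(item)}
        self.servers[replacement] = restored
        return lost - restored


if __name__ == "__main__":
    store = MirroredStore(["node-a", "node-b", "node-c"])
    store.store("book-1")
    store.store("video-1")
    unrecoverable = store.replace_server("node-a", "node-d")
    print(sorted(store.servers["node-d"]), unrecoverable)
```

In this model, losing a single server never loses data, because each item has a second copy elsewhere. An item becomes unrecoverable only if all of its holders fail before a replacement is refilled.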
The Archive has a specialized system that monitors the state of its HDDs. Each day, 6-7 failed drives have to be replaced.
What is the Wayback Machine?
This is just one of the Internet Archive's services, specializing in saving web pages. The service has its own “spider”, which regularly visits all sites accessible on the network and stores copies of them on dedicated servers. The more popular a website, the more often the robot copies its contents. If a site's administrator does not want the bot to copy the site, it is enough to declare the ban in the robots.txt file.
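To illustrate the robots.txt ban, here is a minimal sketch that uses Python's standard `urllib.robotparser` module to check whether a crawler may fetch a page. The user-agent name `ia_archiver`, the example rules, and the example.com URLs are assumptions made for illustration and do not come from the text above. The check shown is the generic robots.txt logic, not the Archive's own crawler code.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt: ban one crawler entirely, and keep /private/
# closed to everyone else. "ia_archiver" is an assumed user-agent name.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_crawl(robots_txt, user_agent, url):
    """Return True if `user_agent` is allowed to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/index.html"
    print(may_crawl(ROBOTS_TXT, "ia_archiver", page))    # False
    print(may_crawl(ROBOTS_TXT, "SomeOtherBot", page))   # True
```

A well-behaved crawler performs a check like this before every fetch. robots.txt is a convention that bots choose to honor, not an access control mechanism.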
Popular resources are copied frequently, almost daily. The Wayback Machine even indexes social networks, including Twitter and Facebook.
In 201? the Archive launched an updated Wayback Machine, promising more convenient access to saved web pages. The service was, if not written from scratch, then heavily reworked. It now supports a number of file formats that previously were simply not saved. In the same 201? the organization said that its servers save about 1 billion web pages every week.
This is what Twitter looked like in 200?
What else can be found in the Internet Archive's database?
Books. The organization's collection is huge and includes digitized books, both common and very rare editions. Books are stored not only in English but also in many other languages. The Archive has 33 specialized book-scanning centers, located in five countries around the world.
Each day, the centers' employees scan about ?000 books. The service's database contains millions of publications, and their digitization is financed both by ordinary people and by various organizations, including libraries and foundations.
Since 200? the Internet Archive has been keeping publicly available books from Google Book Search in its database. After the launch, the book collection grew quickly: by 2013 it already held more than 900 thousand books saved from the Google service.
One of the Archive's services also provides access to books that are fully open, and there are already over a million of them. This service is called Open Library.
Video. The service stores 4.5 million videos, organized by subject and covering a wide range of topics. The Archive's servers hold films, documentaries, recordings of sporting events, TV shows, and many other materials.
In 201? the Archive started a large-scale project: the digitization of videotapes. At first it covered about 40 thousand cassettes from the archive of Marion Stokes, a woman who had recorded news broadcasts for many decades. Later, other videotapes were added, sent to the Archive by supporters of the idea of digitizing data important to humanity.
Audio. As with video, the Archive stores audio files, also organized by topic. Last year the Archive began a new project: digitizing shellac records, one of the oldest audio recording formats. The sound was preserved on discs made of shellac, a natural resin secreted by the female lac insect. In total, the Great 78 Project archive numbers several hundred thousand recordings.
Software. Of course, it is simply impossible to store all the software ever created by mankind, even for the Archive. The servers hold vintage software, for example Macintosh and DOS programs. In 201? the Archive's staff published 1500+ programs for Windows ???, which can be run directly in the browser. In 201? the Internet Archive released an archive of software for the first Macintosh.
Games. Yes, the Archive provides access to a huge number of games, and some of them can be played in a browser-based emulator. The collection is very varied and includes games from portable (handheld) electronic devices. There are also MS-DOS games and console games for Atari and ColecoVision.
The organization first published its archive of old games back in 2013. These were 30-40 year old titles that could be played directly in the browser: games for the Atari 2600 (1977), Atari 7800 (1986), ColecoVision (1982), Philips Videopac G7000 (1978), and Astrocade (1983). The most interesting part is that the Internet Archive made it possible to play them quite legally. The collection now has over ?400 games and continues to grow.