Why is SRE documentation important? Part 1

Good evening, everyone!

The pace of our course launches varies from month to month. Our September students have not yet finished the second month of the "DevOps: practices and tools" course, and the next cohort is already starting. So we are once again ready to share useful material on the topic, and we look forward to seeing you, at the very least at the open lessons.

Today we look at the first part of an article on how documentation helps SRE teams manage new and existing services.

SRE (site reliability engineering; specialists in this field go by the same abbreviation) is a discipline, a mindset, and a set of technical approaches aimed at keeping web products and services up and running. SREs work at the intersection of software development and systems engineering: they solve operational problems and develop scalable, reliable, and efficient solutions for designing, building, and operating large-scale distributed systems.

The main SRE tasks:
  • Monitoring and metrics collection: defining the desired behavior of the service, studying its actual behavior, and eliminating the differences between the two.
  • Incident response: detecting service failures and responding to them effectively, so that service availability stays in line with its SLA (service-level agreement).
  • Capacity planning: forecasting future demand and providing the necessary computing resources in the right locations to meet it.
  • Service scaling: predictably adding and removing the service's computing capacity in a data center, often as a result of capacity planning.
  • Change management: changing the behavior of the service without losing reliability.
  • Performance: design, development, and engineering work related to scaling, isolation, latency, throughput, and efficiency.
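
Several of these tasks, incident response in particular, come down to comparing measured availability against the SLA target. As a minimal sketch in Python, that comparison might look like the following. The function name, the report fields, the 30-day window, and the 99.9% target are all illustrative assumptions and do not come from the article.

```python
from dataclasses import dataclass


@dataclass
class AvailabilityReport:
    availability: float          # fraction of the period the service was up
    allowed_downtime_min: float  # downtime the SLA target tolerates in the period
    within_sla: bool             # True if measured availability meets the target


def check_availability(period_min: float, downtime_min: float,
                       sla_target: float) -> AvailabilityReport:
    """Compare measured availability with a hypothetical SLA target (e.g. 0.999)."""
    if period_min <= 0:
        raise ValueError("period_min must be positive")
    if not 0 <= downtime_min <= period_min:
        raise ValueError("downtime_min must be between 0 and period_min")
    if not 0 < sla_target <= 1:
        raise ValueError("sla_target must be in (0, 1]")

    availability = (period_min - downtime_min) / period_min
    allowed_downtime = period_min * (1 - sla_target)
    return AvailabilityReport(availability, allowed_downtime,
                              availability >= sla_target)


# Assumed example: a 30-day window with a single 45-minute outage.
THIRTY_DAYS_MIN = 30 * 24 * 60
report = check_availability(THIRTY_DAYS_MIN, downtime_min=45, sla_target=0.999)
print(f"availability={report.availability:.5f}, "
      f"allowed downtime={report.allowed_downtime_min:.1f} min, "
      f"within SLA: {report.within_sla}")
```

Under these assumed numbers, a 99.9% target leaves roughly 43 minutes of downtime per 30 days, so one 45-minute outage is already enough to miss it. A real team would take both the measurement window and the target from its actual SLA.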
SRE covers the whole life cycle of a service: from concept and design through deployment, operation, and improvement to eventual decommissioning.

Before a service launches, SREs support it by advising on system architecture, developing software platforms, frameworks, and capacity plans, and conducting a launch review.

Once the service is running, SREs support it as follows:
  • Measure and monitor availability, latency, and overall system health.
  • Review planned system changes.
  • Scale the system's stability through mechanisms such as automation.
  • Improve the system by driving changes aimed at better reliability and speed.
  • Respond to incidents and run blameless post-mortems.
When a service reaches the end of its life, SREs decommission it in a predictable way, with a clear explanation and complete documentation.

A mature SRE team has complete documentation for each SRE function. If you manage an SRE team or plan to build one, this article will help you understand which types of documentation your team needs, so that you can plan and prioritize documentation work alongside the team's other tasks.

The story of an SRE
Before discussing the nuances of SRE documentation, let's look at a day in the life of Zoe, a newly minted SRE.

This is Zoe's second shift as an SRE on the flagship AcmeSale project at Acme Inc. So far she has been settling into the team, shadowing her colleagues, and taking notes. But now she carries the pager.

As luck would have it, the pager goes off at 2:30 in the morning. The message says "Job Ragnarok leaned back", and Zoe has no idea what that means. She scrolls through her notes and finds a link to the main dashboard. Everything looks OK. She searches the Acme intranet for any document that mentions Ragnarok, and after a few precious minutes she finds an outdated architecture document for the service, which turns out to be a critical dependency of AcmeSale.

Fortunately, the design doc links to the "Ragnarok Ops" page, which in turn links to dashboards with useful graphs. The page also mentions the ragtool script, which could probably help solve the problem, but Zoe is hearing about it for the first time. So she pages another SRE who has years of experience with this service and its tools. Unfortunately, there is no answer. Zoe checks her mail and sees a message saying that her colleague is offline for a full hour due to health problems. After weighing the pros and cons, she calls her tech lead, but the call goes to voice mail. It looks like she will have to solve this problem on her own.

After spending some time looking for information about the mysterious ragtool script, she finds a document that briefly describes its command-line options and where to find it. She runs ragtool --restart and crosses her fingers. Nothing changes, and traffic drops even further.
She desperately scans the rest of the command-line options, but she is not sure they won't do more harm than good. Finally, she decides to run ragtool --rebalance --dc=atlanta, because the graphs show that the problem is most pronounced in the Atlanta data center. The traffic graph slowly begins to creep up, and Zoe celebrates her victory. The MTTR (mean time to repair, the average time it takes to restore a service to working order) is 45 minutes.

The next day, Zoe holds a post-mortem on the incident. She does so because the problem turned out to be serious and caused a loss of revenue, and because her manager has been asking for more post-mortems. She asks her teammates how they would have solved the problem and hears three different approaches. It turns out that a single troubleshooting process simply does not exist. Her colleagues also admit that "leaned back" is not the best name for an alert, and that the failure was caused by a known bug that had simply never been a priority.

Finally Steve, her tech lead, asks: "Which version of ragtool did you use?" He then points out that the version she used is terribly old. A new version was released a week ago, along with completely new documentation that describes all its features and even explains how to handle the "Job Ragnarok leaned back" alert. That version would have cut the MTTR to five minutes.

The existence of a new ragtool version comes as a surprise to half the team, while the other half more or less knows about the new version and the guide. The latest version of the script lives in Steve's home directory, in the bin/ folder, naturally. Zoe adds this to her notes for future reference, hoping to get through the rest of her shift quietly.
She wonders whether her tech lead or anyone else on the team will deal with the problems raised at the post-mortem, or whether every future SRE will have to go through the same painful experience.

Later that day, Zoe takes part in a meeting where the SRE team talks to the development team about handing over a service. Steve runs the meeting. He asks several questions about operational procedures and the service's current reliability issues, and he asks the developers to make changes before the SRE team takes responsibility for the service. Zoe has already been to several such meetings run by Steve and other senior SREs. She has noticed that the questions and the tasks given to developers vary widely, depending on who runs the meeting and which problem the SRE team dealt with the week before.

Zoe secretly dreams of more consistent standards and procedures, but does not yet see how to get there. Later, she overhears two developers joking by the coffee machine that many of the questions had little to do with the pager and that they have no idea where those questions came from. Zoe wishes developers understood that SRE is about more than carrying a pager. Back at her desk, Zoe finds several tickets that need triaging and stops thinking about it.

Fortunately, all the characters and events in this story are fictional. Still, ask yourself whether it resembles anything you have encountered in real life. The solution to this fictional team's problems is quite obvious, and we discuss it in more detail in the next section.

The importance of documentation

In its early stages, an SRE team depends heavily on the work of a few highly skilled individuals within it.
The team keeps important operational concepts and principles as crumbs of "tribal knowledge" that are passed on to new team members by word of mouth. If these principles are not unified and documented, then at some point they will most likely have to be painfully relearned by trial and error. Sometimes team members carry out operational procedures as a strict sequence of steps defined by their predecessors in the distant past, without understanding why each step is there. If this goes unchecked, the processes become fragmented and degrade as soon as the team starts growing to take on new problems.

An SRE team can prevent this by creating high-quality documentation, which serves as the foundation for the team's growth and for a systematic approach to managing new and unfamiliar services. Such documents preserve tribal knowledge in a form that is easy to find, search, and maintain. New team members are trained through a systematic, well-designed program. These are the hallmarks of a mature SRE team.

The rest of this article describes the different types of documents that SREs create over the life cycle of a supported service.

THE END

In the next part, we will look at all these document types in detail. In the meantime, we look forward to your comments and questions, and we invite you to the open lesson.

Author: weber
Publication Date: 14-11-2018, 10:30
Category: Administration / DevOps