Python: how to reduce memory consumption by half by adding just one line of code?

Hi Habr.
 
 
In one project, where we had to store and process a fairly large dynamic list, testers began to complain about running out of memory. Below I describe a simple, low-effort way to fix the problem by adding just one line of code. The result is shown in the picture:
 
[Screenshot: the result]
 
How it works is explained below the cut.
 
 
Consider a simple "learning" example: a DataItem class containing personal information about a person, such as name, age and address.
 
```python
class DataItem(object):
    def __init__(self, name, age, address):
        self.name = name
        self.age = age
        self.address = address
```

 
A “childish” question: how much memory does such an object take?
 
 
Let's try the straightforward approach:
 
```python
import sys

d1 = DataItem("Alex", 42, "-")
print("sys.getsizeof(d1):", sys.getsizeof(d1))
```
 
We get 56 bytes. That doesn't seem like much; we're quite satisfied.
 
However, let's check another object that holds more data:
 
```python
d2 = DataItem("Boris", 24, "In the middle of nowhere")
print("sys.getsizeof(d2):", sys.getsizeof(d2))
```

 
The answer is again 56. At this point we realize that something is off, and things are not as simple as they seem at first glance.
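The reason is that sys.getsizeof is shallow: for an ordinary class instance it reports only the fixed-size object itself, while the attributes live in a separate per-instance dictionary and in the value objects that dictionary points to, none of which are included. A minimal sketch to see this; exact byte counts depend on the Python version and platform, so none are hard-coded:

```python
import sys

class DataItem:
    def __init__(self, name, age, address):
        self.name = name
        self.age = age
        self.address = address

d1 = DataItem("Alex", 42, "-")
d2 = DataItem("Boris", 24, "In the middle of nowhere")

# The instance itself has a fixed size, whatever its attributes hold.
print("instance:", sys.getsizeof(d1), sys.getsizeof(d2))
# The attribute dictionary is a separate object, not counted in getsizeof(d1)...
print("__dict__:", sys.getsizeof(d1.__dict__))
# ...and neither are the attribute values themselves.
print("address strings:", sys.getsizeof(d1.address), sys.getsizeof(d2.address))
```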
 
 
Intuition does not fail us: things really are not that simple. Python is a very flexible, dynamically typed language, and to support this it stores a whole heap of additional data, which itself takes up a lot of space. For example, sys.getsizeof("") returns 33: yes, as many as 33 bytes for an empty string! And sys.getsizeof(1) returns 24: 24 bytes for an integer (I ask C programmers to step away from the screen and not read further, so as not to lose faith in beauty). For more complex objects such as a dictionary, sys.getsizeof(dict()) returns 272 bytes, and that is for an empty dictionary. The exact numbers depend on the Python version and platform, but I will not go on; I hope the principle is clear. And, after all, RAM manufacturers need to sell their chips too.
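These figures are easy to reproduce on your own interpreter. A quick sketch; the numbers it prints differ between Python versions and between 32-bit and 64-bit builds, so no expected values are given:

```python
import sys

# Per-object overhead of a few "empty" or tiny built-in objects, in bytes.
samples = [("empty str", ""), ("int 1", 1), ("empty dict", {}),
           ("empty list", []), ("empty tuple", ())]
for label, value in samples:
    print(f"{label:12} {sys.getsizeof(value):4d}")
```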
 
 
But back to our DataItem class and the “childish” question: how much memory does an instance of this class take? To begin with, let's dump the entire contents of the object at a lower level:
 
```python
def dump(obj):
    # Print every attribute name visible via dir(), with its repr
    for attr in dir(obj):
        print("obj.%s = %r" % (attr, getattr(obj, attr)))
```

 
This function shows what is hidden “under the hood” to make all of Python's features (typing, inheritance and other goodies) work.
 
The result is impressive:
 
[Screenshot: output of dump() for a DataItem object]
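Most of that listing comes from the class and from object, not from the instance. A small sketch that separates the two using only the standard dir() and vars() built-ins:

```python
class DataItem:
    def __init__(self, name, age, address):
        self.name = name
        self.age = age
        self.address = address

d1 = DataItem("Alex", 42, "-")

# Everything reachable from the object: its own attributes plus the methods
# and special attributes inherited from DataItem and object.
all_names = dir(d1)
# Only the attributes stored on this particular instance (its __dict__).
own = vars(d1)

print(len(all_names), "names visible via dir()")
print(own)  # {'name': 'Alex', 'age': 42, 'address': '-'}
```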
 
 
How much does all of this take up in total? On GitHub I found a function that counts the actual amount of data by recursively calling getsizeof for all objects:
 
```python
import sys

def get_size(obj, seen=None):
    # From https://goshippo.com/blog/measure-real-size-any-python-object/
    # Recursively finds size of objects
    size = sys.getsizeof(obj)
    if seen is None:
        seen = set()
    obj_id = id(obj)
    if obj_id in seen:
        return 0
    # Important: mark as seen *before* entering recursion to gracefully handle
    # self-referential objects
    seen.add(obj_id)
    if isinstance(obj, dict):
        size += sum([get_size(v, seen) for v in obj.values()])
        size += sum([get_size(k, seen) for k in obj.keys()])
    elif hasattr(obj, '__dict__'):
        size += get_size(obj.__dict__, seen)
    elif hasattr(obj, '__iter__') and not isinstance(obj, (str, bytes, bytearray)):
        size += sum([get_size(i, seen) for i in obj])
    return size
```

 
Let's try it:
 
```python
d1 = DataItem("Alex", 42, "-")
print("get_size(d1):", get_size(d1))
d2 = DataItem("Boris", 24, "In the middle of nowhere")
print("get_size(d2):", get_size(d2))
```

 
We get 460 and 484 bytes, respectively, which looks more like the truth.
 
 
With this function we can run a series of experiments. For example, how much space does the data take if the DataItem objects are put into a list? get_size([d1]) returns 532 bytes: the same 460 plus some overhead for the list. But get_size([d1, d2]) returns 863 bytes, which is less than 460 + 484 taken separately, because get_size counts objects shared by d1 and d2 (such as the attribute-name strings) only once. Even more interesting is the result for get_size([d1, d2, d1]): 871 bytes, only slightly more. The list stores just another reference to d1, so the object itself is not duplicated in memory.
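The list results can be checked directly: get_size() records the id of every object it visits, so an object reachable through a second reference contributes nothing the second time. The sketch below repeats the logic of get_size() from above so that it runs on its own, and compares a list holding d1 once with a list holding it twice:

```python
import sys

def get_size(obj, seen=None):
    """Recursive size estimate, same logic as the get_size() above."""
    size = sys.getsizeof(obj)
    if seen is None:
        seen = set()
    if id(obj) in seen:
        return 0  # already counted via another reference
    seen.add(id(obj))
    if isinstance(obj, dict):
        size += sum(get_size(v, seen) for v in obj.values())
        size += sum(get_size(k, seen) for k in obj.keys())
    elif hasattr(obj, "__dict__"):
        size += get_size(obj.__dict__, seen)
    elif hasattr(obj, "__iter__") and not isinstance(obj, (str, bytes, bytearray)):
        size += sum(get_size(i, seen) for i in obj)
    return size

class DataItem:
    def __init__(self, name, age, address):
        self.name = name
        self.age = age
        self.address = address

d1 = DataItem("Alex", 42, "-")

one = get_size([d1])
two = get_size([d1, d1])
# The second reference adds only one more pointer slot in the list object.
print(two - one, sys.getsizeof([d1, d1]) - sys.getsizeof([d1]))
```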
 
 
Now we come to the second part of the question: is it possible to reduce memory consumption? Yes, it is. Python is interpreted, and we can extend an object at any time, for example by adding a new field:
 
```python
d1 = DataItem("Alex", 42, "-")
print("get_size(d1):", get_size(d1))
d1.weight = 66
print("get_size(d1):", get_size(d1))
```

 
This is great, but if we do not need this functionality, we can tell the interpreter to use a fixed list of attributes for the class's objects with the __slots__ declaration:
 
```python
class DataItem(object):
    __slots__ = ['name', 'age', 'address']

    def __init__(self, name, age, address):
        self.name = name
        self.age = age
        self.address = address
```

 
More information can be found in the documentation (RTFM), which says that __slots__ allow us to explicitly declare data members and deny the creation of __dict__ and __weakref__, and that "the space saved over using __dict__ can be significant".
 
Let's check: yes, it is indeed significant. get_size(d1) returns 64 bytes instead of 460, i.e. about 7 times less. As a bonus, objects are created about 20% faster (see the first screenshot of the article).
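The creation-speed claim can be checked with the standard timeit module. A minimal sketch; the class names Regular and Slotted are illustrative, and the ratio depends on the Python version and machine, so the 20% figure should be read as the author's measurement rather than a constant:

```python
import timeit

class Regular:
    def __init__(self, name, age, address):
        self.name = name
        self.age = age
        self.address = address

class Slotted:
    __slots__ = ("name", "age", "address")

    def __init__(self, name, age, address):
        self.name = name
        self.age = age
        self.address = address

def creation_time(cls, number=100_000, repeat=5):
    """Best-of-`repeat` time, in seconds, to create `number` instances."""
    return min(timeit.repeat(lambda: cls("Alex", 42, "-"), number=number, repeat=repeat))

t_regular = creation_time(Regular)
t_slotted = creation_time(Slotted)
print(f"regular: {t_regular:.3f} s, slotted: {t_slotted:.3f} s, "
      f"slotted/regular: {t_slotted / t_regular:.2f}")
```

Taking the minimum of several repeats is the usual way to reduce noise from other processes.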
 
 
Alas, in real use the gain will not be that large because of other overheads. Let's create a list of 100,000 elements by simply appending them one by one, and look at memory consumption:
 
```python
import tracemalloc

tracemalloc.start()  # tracing must be started before the allocations
data = []
for p in range(100000):
    data.append(DataItem("Alex", 42, "middle of nowhere"))
snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics('lineno')
total = sum(stat.size for stat in top_stats)
print("Total allocated size: %.1f MB" % (total / (1024 * 1024)))
```

 
We get 16.8 MB without __slots__ and 6.9 MB with it. Not 7 times, of course, but not bad at all, considering how small the code change was.
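To compare both variants in a single run, the measurement can be wrapped in a function and applied to two otherwise identical classes. A sketch; the names Regular, Slotted and traced_bytes are illustrative, it uses tracemalloc.get_traced_memory() instead of summing snapshot statistics, and absolute numbers will vary with the Python version:

```python
import tracemalloc

class Regular:
    def __init__(self, name, age, address):
        self.name = name
        self.age = age
        self.address = address

class Slotted:
    __slots__ = ("name", "age", "address")

    def __init__(self, name, age, address):
        self.name = name
        self.age = age
        self.address = address

def traced_bytes(cls, count=100_000):
    """Memory still allocated (per tracemalloc) after building `count` objects."""
    tracemalloc.start()
    try:
        data = [cls("Alex", 42, "middle of nowhere") for _ in range(count)]
        current, _peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return current

regular = traced_bytes(Regular)
slotted = traced_bytes(Slotted)
print(f"without __slots__: {regular / 2**20:.1f} MB")
print(f"with __slots__:    {slotted / 2**20:.1f} MB")
```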
 
 
Now for the drawbacks. Using __slots__ prohibits creating any attributes not listed in it, including __dict__, which means, for example, that this code for converting the object to JSON will not work:
 
```python
def toJSON(self):
    return json.dumps(self.__dict__)
```

 
But this is easy to fix: it is enough to build the dict programmatically by looping over all the slots:
 
```python
def toJSON(self):
    # Method of DataItem; assumes "import json" at module level
    data = dict()
    for var in self.__slots__:
        data[var] = getattr(self, var)
    return json.dumps(data)
```
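If some classes use __slots__ and others do not, one helper can cover both cases. A sketch; the function names to_dict and to_json are hypothetical, slots declared anywhere in the class hierarchy are collected, and private double-underscore slot names (which Python name-mangles) are assumed not to be used, since they would be skipped:

```python
import json

def to_dict(obj):
    """Collect instance attributes from __slots__ (across the MRO) and __dict__."""
    data = {}
    for cls in type(obj).__mro__:
        slots = vars(cls).get("__slots__", ())
        if isinstance(slots, str):  # __slots__ = "name" is legal too
            slots = (slots,)
        for name in slots:
            # Skip the special slots and slots that were never assigned.
            if name not in ("__dict__", "__weakref__") and hasattr(obj, name):
                data[name] = getattr(obj, name)
    data.update(getattr(obj, "__dict__", {}))
    return data

def to_json(obj):
    return json.dumps(to_dict(obj))

class Slotted:
    __slots__ = ("name", "age", "address")

    def __init__(self, name, age, address):
        self.name = name
        self.age = age
        self.address = address

class Regular:
    def __init__(self, name, age, address):
        self.name = name
        self.age = age
        self.address = address

print(to_json(Slotted("Alex", 42, "-")))   # {"name": "Alex", "age": 42, "address": "-"}
print(to_json(Regular("Boris", 24, "-")))  # {"name": "Boris", "age": 24, "address": "-"}
```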

 
 
It also becomes impossible to dynamically add new attributes to the objects, but in my case this was not required.
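These restrictions are easy to observe. A minimal sketch; it also shows that weak references are disabled for a slotted class, and WeakDataItem is a hypothetical variant showing that listing '__weakref__' in __slots__ brings them back:

```python
import weakref

class DataItem:
    __slots__ = ["name", "age", "address"]

    def __init__(self, name, age, address):
        self.name = name
        self.age = age
        self.address = address

d1 = DataItem("Alex", 42, "-")
print(hasattr(d1, "__dict__"))  # False: there is no per-instance dictionary

# Adding an attribute that is not listed in __slots__ fails.
try:
    d1.weight = 66
    added = True
except AttributeError as exc:
    added = False
    print("AttributeError:", exc)

# Weak references fail too, because '__weakref__' is not in __slots__.
try:
    weakref.ref(d1)
    weakrefs_ok = True
except TypeError as exc:
    weakrefs_ok = False
    print("TypeError:", exc)

class WeakDataItem:
    # Hypothetical variant: '__weakref__' in __slots__ restores weak references.
    __slots__ = ["name", "age", "address", "__weakref__"]

item = WeakDataItem()
ref = weakref.ref(item)
print(ref() is item)  # True
```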
 
 
And the last test for today: let's see how much memory the whole program takes. Add an infinite loop at the end of the program so that it does not exit, and look at the memory consumption in Windows Task Manager.
 
Without __slots__:
 
 
16.8 MB miraculously turned into 70 MB (have the C programmers come back to the screen yet?).
 
 
With __slots__ enabled:
 
 
6.9 MB turned into 27 MB. Well, we did save memory after all: 27 MB instead of 70 MB is not a bad result for adding one line of code.
 
 
What if you need to save even more memory? This is possible with the numpy library, which lets you create C-style structures, but in my case it would have required a deeper rework of the code, and the first method was enough.
 
 
It is strange that the use of __slots__ has never been analyzed in detail on Habr; I hope this article fills this gap a little.
 
 
Instead of a conclusion.
 
It may seem that this article is an anti-Python ad, but that is not the case at all. Python is a very reliable language (to "crash" a Python program you have to try really hard), easy to read and convenient for writing code. In many cases these advantages outweigh the disadvantages, but if you need maximum performance and efficiency, you can use libraries like numpy, written in C, which work with data quickly and efficiently.
 
 
Thank you all for your attention, and good code :)

Author: weber
Publication date: 27-10-2018, 01:15
Category: Python / High performance / Programming