Setting smaller buffer size for sys.stdin?

Question :

I’m running memcached with the following bash command pattern:

memcached -vv 2>&1 | tee memkeywatch2010098.log 2>&1 | ~/bin/memtracer.py | tee memkeywatchCounts20100908.log

to try and track down unmatched gets to sets for keys platform wide.

The memtracer script is below and works as desired, with one minor issue. Watching the intermediate log file size, memtracer.py doesn’t start getting input until memkeywatchYMD.log is about 15-18K in size. Is there a better way to read from stdin, or a way to cut the buffer size down to under 1K for faster response times?

#!/usr/bin/python

import sys
from collections import defaultdict

if __name__ == "__main__":


    keys = defaultdict(int)
    GET = 1
    SET = 2
    CLIENT = 1
    SERVER = 2

    #if <
    for line in sys.stdin:
        key = None
        components = line.strip().split(" ")
        #newConn = components[0][1:3]
        direction = CLIENT if components[0].startswith("<") else SERVER

        #if lastConn != newConn:        
        #    lastConn = newConn

        if direction == CLIENT:            
            command = SET if components[1] == "set" else GET
            key = components[2]
            if command == SET:                
                keys[key] -= 1                                                                                    
        elif direction == SERVER:
            command = components[1]
            if command == "sending":
                key = components[3] 
                keys[key] += 1

        if key is not None:
            print "%s:%s" % (key, keys[key])

Asked By: David

Answer #1:

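You can completely remove buffering from stdin/stdout by using Python’s -u flag (the help text below is from Python 2; in Python 3, -u applies to stdout and stderr only and has no effect on stdin):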

-u     : unbuffered binary stdout and stderr (also PYTHONUNBUFFERED=x)
         see man page for details on internal buffering relating to '-u'

and the man page clarifies:

   -u     Force stdin, stdout and stderr to  be  totally  unbuffered.   On
          systems  where  it matters, also put stdin, stdout and stderr in
          binary mode.  Note that there is internal  buffering  in  xread-
          lines(),  readlines()  and  file-object  iterators ("for line in
          sys.stdin") which is not influenced by  this  option.   To  work
          around  this, you will want to use "sys.stdin.readline()" inside
          a "while 1:" loop.

Beyond this, altering the buffering for an existing file is not supported, but you can make a new file object with the same underlying file descriptor as an existing one, and possibly different buffering, using os.fdopen. I.e.,

import os
import sys
newin = os.fdopen(sys.stdin.fileno(), 'r', 100)

should bind the name newin to a file object that reads the same FD as standard input, but buffered by only about 100 bytes at a time (you could continue with sys.stdin = newin to use the new file object as standard input from there onwards). I say “should” because this area used to have a number of bugs and issues on some platforms (it’s pretty hard functionality to provide cross-platform with full generality). I’m not sure what its state is now, so I’d definitely recommend thorough testing on all platforms of interest to ensure that everything goes smoothly. (-u, which removes buffering entirely, should work with fewer problems across all platforms, if that meets your requirements.)

Answered By: Alex Martelli

Answer #2:

You can simply use sys.stdin.readline() instead of sys.stdin.__iter__():

import sys

while True:
    line = sys.stdin.readline()
    if not line: break # EOF

    sys.stdout.write('> ' + line.upper())

This gives me line-buffered reads using Python 2.7.4 and Python 3.3.1 on Ubuntu 13.04.
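
As a rough illustration of how the readline() loop could be applied to the script in the question, here is a minimal Python 3 sketch. The function name trace_keys, the length check on each line, and the sample log lines are assumptions made for this example; they are not part of the original script or of memcached's documented output.

```python
import io
from collections import defaultdict


def trace_keys(stream, out):
    """Tally key activity one line at a time and return the final counts."""
    keys = defaultdict(int)
    while True:
        line = stream.readline()  # one line per call, no file-iterator read-ahead
        if not line:
            break  # EOF
        components = line.split()
        if len(components) < 3:
            continue  # too short to carry a key; skipped in this sketch
        key = None
        if components[0].startswith("<"):
            # Client line: as in the question, anything that is not a "set"
            # is treated as a get.
            key = components[2]
            if components[1] == "set":
                keys[key] -= 1
        elif components[1] == "sending" and len(components) > 3:
            # Server line reporting that a key is being sent back.
            key = components[3]
            keys[key] += 1
        if key is not None:
            out.write("%s:%s\n" % (key, keys[key]))
            out.flush()  # the script's own stdout is a pipe too, so flush each line
    return keys


# Hypothetical sample lines, made up for illustration only.
sample = io.StringIO("<30 set foo 0 0 3\n<30 get foo\n>30 sending key foo\n")
report = io.StringIO()
counts = trace_keys(sample, report)
```

In the real pipeline the call would be trace_keys(sys.stdin, sys.stdout). The explicit flush matters there because the script's output is piped into tee and would otherwise be held back in the same way as its input.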

Answered By: Søren Løvborg

Answer #3:

Because sys.stdin.__iter__ still buffers its input internally, you can instead build an iterator that behaves mostly identically (it stops at EOF, whereas stdin.__iter__ won’t) by using the two-argument form of iter over sys.stdin.readline:

import sys

for line in iter(sys.stdin.readline, ''):
    sys.stdout.write('> ' + line.upper())

Or provide None as the sentinel (but note that then you need to handle the EOF condition yourself).
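
To make the difference between the two sentinels concrete, here is a small sketch that runs against an in-memory stream instead of sys.stdin. The helper names lines_until_eof and lines_with_manual_eof are made up for this example.

```python
import io


def lines_until_eof(stream):
    # readline() returns '' at EOF, so '' as the sentinel ends the iteration.
    return iter(stream.readline, "")


def lines_with_manual_eof(stream):
    # readline() on a text stream returns '' at EOF, never None, so with None
    # as the sentinel the iterator would go on yielding '' indefinitely.
    # The EOF check therefore has to be done by hand.
    for line in iter(stream.readline, None):
        if not line:
            break
        yield line


first = list(lines_until_eof(io.StringIO("a\nb\n")))
second = list(lines_with_manual_eof(io.StringIO("a\nb\n")))
```

For a stream opened in binary mode the matching sentinel would be b'' rather than ''.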

Answered By: Antti Haapala

Answer #4:

This worked for me in Python 3.4.3:

import os
import sys

unbuffered_stdin = os.fdopen(sys.stdin.fileno(), 'rb', buffering=0)

The documentation for fdopen() says it is just an alias for open().

open() has an optional buffering parameter:

buffering is an optional integer used to set the buffering policy. Pass 0 to switch buffering off (only allowed in binary mode), 1 to select line buffering (only usable in text mode), and an integer > 1 to indicate the size in bytes of a fixed-size chunk buffer.

In other words:

  • Fully unbuffered stdin requires binary mode and passing zero as the buffer size.
  • Line-buffering requires text mode.
  • Any other buffer size seems to work in both binary and text modes (according to the documentation).
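
The rules above can be checked with a short Python 3 sketch. It uses an os.pipe() pair as a stand-in for standard input so it does not touch the real sys.stdin, and the variable names are chosen only for this example.

```python
import os

read_fd, write_fd = os.pipe()  # stand-in for stdin in this sketch

# Binary mode with buffering=0 gives a raw, unbuffered file object.
# closefd=False leaves the descriptor open when the wrapper is closed.
raw = os.fdopen(read_fd, "rb", buffering=0, closefd=False)
raw_type = type(raw).__name__

# Text mode with buffering=1 selects line buffering. Note that the io module
# documents line_buffering in terms of flushing writes at newlines.
text = os.fdopen(read_fd, "r", buffering=1, closefd=False)
text_line_buffering = text.line_buffering

# Text mode with buffering=0 is rejected outright.
try:
    os.fdopen(read_fd, "r", buffering=0, closefd=False)
    unbuffered_text_allowed = True
except ValueError:
    unbuffered_text_allowed = False

# A raw read returns whatever is available instead of waiting for 100 bytes.
os.write(write_fd, b"hello\n")
first_chunk = raw.read(100)

raw.close()
text.close()
os.close(read_fd)
os.close(write_fd)
```

For standard input itself, the equivalent would be passing sys.stdin.fileno() as the descriptor, as in the one-liner above.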

Answered By: Denilson Sá Maia

Answer #5:

It may be that your troubles are not with Python but with the buffering that happens when commands are chained with pipes on Linux. A program whose output goes to a pipe rather than a terminal typically buffers that output by block (commonly 4K) instead of by line.

To stop this buffering, precede the commands in the chain with the unbuffer command from the expect package, such as:

unbuffer memcached -vv 2>&1 | unbuffer -p tee memkeywatch2010098.log 2>&1 | unbuffer -p ~/bin/memtracer.py | tee memkeywatchCounts20100908.log

The unbuffer command needs the -p option when used in the middle of a pipeline.

Answered By: EvertW

Answer #6:

The only way I could do it with python 2.7 was:

tty.setcbreak(sys.stdin.fileno())

(taken from “Python nonblocking console input”). This completely disables the buffering and also suppresses the echo.

EDIT: Regarding Alex’s answer, the first proposition (invoking python with -u) is not possible in my case (see shebang limitation).

The second proposition (duplicating the fd with a smaller buffer: os.fdopen(sys.stdin.fileno(), 'r', 100)) does not work for me with a buffer of 0 or 1: my input is interactive, and I need every key press to be processed immediately.
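
Because tty.setcbreak() changes the terminal settings and leaves them changed, it is usually worth restoring them afterwards. Below is a minimal, Unix-only sketch of one way to do that. The names cbreak and read_key are invented for this example, and the descriptor passed in must refer to a terminal.

```python
import contextlib
import os
import termios
import tty


@contextlib.contextmanager
def cbreak(fd):
    """Put the terminal behind fd into cbreak mode, then restore it."""
    saved = termios.tcgetattr(fd)  # remember the current settings
    try:
        tty.setcbreak(fd)  # no line buffering, no echo
        yield
    finally:
        termios.tcsetattr(fd, termios.TCSADRAIN, saved)


def read_key(fd):
    """Return a single byte as soon as a key is pressed."""
    with cbreak(fd):
        return os.read(fd, 1)
```

A call such as read_key(sys.stdin.fileno()) would then return one byte per key press without waiting for Enter, and the terminal goes back to its previous mode even if an exception is raised.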

Answered By: calandoa
