How your Python program behaves

a story on how to build a program slicer

Róman Joost - Pycon-AU 2015 - Brisbane

Me

  • Python Developer with background in building web apps with Zope

  • Contributor to GIMP

  • Working at Red Hat on internal tools

Educational Projects

Primary purpose: learning experience

No shipping dates!

Takeaways

1. Find a study topic

2. Build tools to help yourself

3. Keep a diary

Find a study topic

Something out of your comfort zone

Watch out

Easy to get sidetracked

Static Analysis

What general errors can we deduce from code alone?

Reasoning

or why program slicing?

def get_primes_wrong(limit=50):
    primes = [2, 3]
    for x in xrange(4, limit + 1):
        if x % 2 == 0 or x % 3 == 0:
            continue
        primes.append(x)
    return primes

def get_primes_old(limit=50):
    potentials = []
    primes = []
    for i in xrange(2, limit + 1):
        potentials.append(i)
    potentials.reverse()
    while potentials:
        x = potentials.pop()
        primes.append(x)
        new = []
        for p in potentials:
            if p % x != 0:
                new.append(p)
        potentials = new

    return primes

# https://github.com/fabric/fabric/blob/master/fabric/operations.py
def _run_command(command, shell=True, pty=True, combine_stderr=True,
    sudo=False, user=None, quiet=False, warn_only=False, stdout=None,
    stderr=None, group=None, timeout=None, shell_escape=None):
    """
    Underpinnings of `run` and `sudo`. See their docstrings for more info.
    """
    manager = _noop
    if warn_only:
        manager = warn_only_manager
    # Quiet's behavior is a superset of warn_only's, so it wins.
    if quiet:
        manager = quiet_manager
    with manager():
        # Set up new var so original argument can be displayed verbatim later.
        given_command = command

        # Check if shell_escape has been overridden in env
        if shell_escape is None:
            shell_escape = env.get('shell_escape', True)

        # Handle context manager modifications, and shell wrapping
        wrapped_command = _shell_wrap(
            _prefix_commands(_prefix_env_vars(command), 'remote'),
            shell_escape,
            shell,
            _sudo_prefix(user, group) if sudo else None
        )
        # Execute info line
        which = 'sudo' if sudo else 'run'
        if output.debug:
            print("[%s] %s: %s" % (env.host_string, which, wrapped_command))
        elif output.running:
            print("[%s] %s: %s" % (env.host_string, which, given_command))

        # Actual execution, stdin/stdout/stderr handling, and termination
        result_stdout, result_stderr, status = _execute(
            channel=default_channel(), command=wrapped_command, pty=pty,
            combine_stderr=combine_stderr, invoke_shell=False, stdout=stdout,
            stderr=stderr, timeout=timeout)

        # Assemble output string
        out = _AttributeString(result_stdout)
        err = _AttributeString(result_stderr)

        # Error handling
        out.failed = False
        out.command = given_command
        out.real_command = wrapped_command
        if status not in env.ok_ret_codes:
            out.failed = True
            msg = "%s() received nonzero return code %s while executing" % (
                which, status
            )
            if env.warn_only:
                msg += " '%s'!" % given_command
            else:
                msg += "!\n\nRequested: %s\nExecuted: %s" % (
                    given_command, wrapped_command
                )
            error(message=msg, stdout=out, stderr=err)

        # Attach return code to output string so users who have set things to
        # warn only, can inspect the error code.
        out.return_code = status

        # Convenience mirror of .failed
        out.succeeded = not out.failed

        # Attach stderr for anyone interested in that.
        out.stderr = err

        return out

Reasoning gets harder with increasing complexity

def get_primes(limit=50):
    primes = [2, 3]
    for x in xrange(4, limit + 1):
        if x % 2 == 0 or x % 3 == 0:
            continue
        primes.append(x)
    return primes

def get_primes_old(limit=50):
    potentials = []
    primes = []
    for i in xrange(2, limit + 1):
        potentials.append(i)
    potentials.reverse()
    while potentials:
        x = potentials.pop()
        primes.append(x)
        new = []
        for p in potentials:
            if p % x != 0:
                new.append(p)
        potentials = new

    return primes

Which program statements potentially affect the values of the variables at program point l?

2. Build tools to help yourself

Homework

Does the tool already exist?

Homework

What problem does it solve?

Use Prototypes...

to explore the problem space

How to slice

  1. Compute the Control Flow Graph

  2. Compute the Data Flow Graph

  3. Compute a Program Dependency Graph based on CFG and DFG

  4. A slice is the set of all nodes that the slicing criterion (initial node) transitively depends on
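
Step 4 can be sketched in a few lines once a dependency graph exists. This is a minimal illustration, not the slicer from the talk: the example program, the hand-built graph `PDG`, and the function name `backward_slice` are all assumptions made up for this sketch.

```python
from collections import deque

def backward_slice(dependencies, criterion):
    """Return the criterion plus every node it transitively depends on."""
    seen = {criterion}
    queue = deque([criterion])
    while queue:
        node = queue.popleft()
        for dep in dependencies.get(node, ()):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen

# Hypothetical program, numbered by line:
#   1  total = 0
#   2  count = 0
#   3  for x in data:
#   4      total += x
#   5      count += 1
#   6  print(count)
#
# Hand-built dependency graph (an assumption, not tool output):
# each line maps to the lines it depends on via data or control flow.
PDG = {
    1: [],
    2: [],
    3: [],
    4: [1, 3, 4],
    5: [2, 3, 5],
    6: [2, 5],
}

# Slice with line 6 as the slicing criterion.
slice_for_count = sorted(backward_slice(PDG, 6))
```

Here `slice_for_count` contains lines 2, 3, 5 and 6. Lines 1 and 4 only touch `total`, so they cannot affect what line 6 prints and fall outside the slice.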

First Implementation

  • Python Program

  • Represents a graph with a dictionary

  • Edges are represented based on variable names
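
A rough idea of what such a name-based dictionary graph could look like, using the standard `ast` module. This is a guess at the general approach, not the original prototype; `name_graph`, `names_affecting` and the sample source are hypothetical.

```python
import ast

def name_graph(source):
    """Map each assigned variable name to the names read on the right-hand side."""
    graph = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Assign):
            reads = {n.id for n in ast.walk(node.value) if isinstance(n, ast.Name)}
            for target in node.targets:
                for n in ast.walk(target):
                    if isinstance(n, ast.Name):
                        graph.setdefault(n.id, set()).update(reads)
    return graph

def names_affecting(graph, name):
    """Follow the name-to-name edges transitively."""
    seen = set()
    stack = [name]
    while stack:
        for dep in graph.get(stack.pop(), ()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen

# Hypothetical straight-line input, only plain assignments.
SOURCE = """\
a = 1
b = a + 2
c = 5
d = b * c
e = a
"""

GRAPH = name_graph(SOURCE)
affecting_d = names_affecting(GRAPH, "d")
```

`affecting_d` comes out as the names `a`, `b` and `c`. The sketch only knows names, not statements or branches, so loops, conditions and reassignments of the same name are all invisible to it.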

It "works"!

Problems

  • Fiddling with occurrences of variable names doesn't scale well

  • Variable names don't create a Control Flow Graph

  • Motivation to improve the program further was gone :(

Second Attempt

Graph

Edges are reads/writes of AST nodes
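
One way to picture read/write edges between AST nodes, again with the standard `ast` module. This is an illustrative sketch under assumptions, not the code from the second attempt: `reads_writes`, `build_edges` and the sample source are hypothetical, and only flat top-level statements are handled.

```python
import ast

def reads_writes(stmt):
    """Collect the variable names a statement reads and writes."""
    reads, writes = set(), set()
    for n in ast.walk(stmt):
        if isinstance(n, ast.Name):
            if isinstance(n.ctx, ast.Store):
                writes.add(n.id)
            else:
                reads.add(n.id)
    return reads, writes

def build_edges(source):
    """Link each top-level statement to the earlier statements whose writes it reads."""
    last_write = {}  # variable name -> line number of its most recent write
    edges = {}       # line number -> line numbers it depends on
    for stmt in ast.parse(source).body:
        reads, writes = reads_writes(stmt)
        edges[stmt.lineno] = {last_write[r] for r in reads if r in last_write}
        for w in writes:
            last_write[w] = stmt.lineno
    return edges

# Hypothetical straight-line input.
SOURCE = """\
width = 3
height = 4
label = "box"
area = width * height
print(area)
"""

EDGES = build_edges(SOURCE)
```

In `EDGES`, line 4 depends on lines 1 and 2, and line 5 depends on line 4. Statements are visited in source order only, so there is no notion of branches or loops here.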

Problems

  • Still not constructing a CFG, not even doing data flow analysis

  • Works only with Python

Homework

  • Coursera: Algorithms II

    • study graphs

  • Study Haskell

Haskell

NEW PARADIGM

Hoopl

A library to support dataflow analysis and optimization

Compilers!

Why not use their dataflow analysis frameworks?

Focus is

static analysis on intermediate representation (IR) on the way to compiled code

Haskell

How to "measure" progress?

3. Keep a diary

See your own progress

by reading your own diary

Great rubber duck!

Know what to focus on next

Future Plans

Language Independent?

Proper Slicing

using CFG and Data Flow Analysis
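
To give a feel for what data flow analysis over a CFG means, here is a small reaching-definitions sketch using the classic worklist iteration. It is an assumption-laden illustration, not a planned implementation: the CFG and the `DEFS` table are written by hand for a hypothetical program, and `reaching_definitions` is a made-up name.

```python
def reaching_definitions(cfg, defs):
    """cfg: node -> successor list; defs: node -> variable defined there (or None).

    Returns, per node, the set of (variable, defining node) pairs that reach it.
    """
    preds = {n: [] for n in cfg}
    for n, succs in cfg.items():
        for s in succs:
            preds[s].append(n)
    reach_in = {n: set() for n in cfg}
    reach_out = {n: set() for n in cfg}
    worklist = list(cfg)
    while worklist:
        n = worklist.pop()
        reach_in[n] = set().union(*(reach_out[p] for p in preds[n]))
        var = defs.get(n)
        if var is None:
            new_out = set(reach_in[n])
        else:
            # Kill older definitions of var, then generate this one.
            new_out = {(v, d) for (v, d) in reach_in[n] if v != var} | {(var, n)}
        if new_out != reach_out[n]:
            reach_out[n] = new_out
            worklist.extend(cfg[n])  # successors need to be revisited
    return reach_in

# Hypothetical program, numbered by line:
#   1  total = 0
#   2  count = 0
#   3  for x in data:
#   4      total += x
#   5      count += 1
#   6  print(count)
CFG = {1: [2], 2: [3], 3: [4, 6], 4: [5], 5: [3], 6: []}
DEFS = {1: "total", 2: "count", 3: "x", 4: "total", 5: "count", 6: None}

REACH = reaching_definitions(CFG, DEFS)
count_defs = {d for (v, d) in REACH[6] if v == "count"}
```

`count_defs` ends up holding lines 2 and 5: both definitions of `count` can reach the `print` on line 6, one when the loop body never runs and one when it does. These are the data dependence edges a slicer would follow.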

THANK YOU!

We are Hiring!

Python Software Engineer
