oVirt

A short story about the GOOD, the BAD, the UGLY & the Infra

Who's that guy talking?

David Caro

  • oVirt Infra maintainer
  • Continuous Integration Engineer for the RHEV project at Red Hat

dcaroest@redhat.com | david@dcaro.es

dcaro @irc

Index

 

  • What is oVirt?
  • Infra
  • Continuous integration
  • Builds
  • Release
  • Questions

oVirt? Never heard of that...

  • oVirt is a virtual data center manager
  • Delivers powerful management of multiple virtual machines on multiple hosts
  • Uses KVM and libvirt underneath
  • Can be installed on Fedora, CentOS or RHEL (Debian coming soon...)
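As a rough idea of what that means in practice, here is a minimal install sketch on Fedora/CentOS; it assumes the oVirt release repository is already enabled and uses the upstream package and command names:

# Minimal sketch, assuming the oVirt release repository is already enabled
yum install -y ovirt-engine   # the engine plus its dependencies
engine-setup                  # interactive setup: database, PKI, web admin portal

Hosts are then added from the web admin portal (or the REST API) and run VDSM on top of KVM and libvirt.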

Rough architecture overview

Everybody likes screenshots

Infra

Quick infra overview

Hosting

Support services

GERRIT

Infra: Hosting bureaucracy

Fast response and smooth on easy tickets

  • Open standard ports to specific IPs
  • Reboot machines
  • ...

 

Looooooooong and painful on non-trivial tasks

Infra: Support bureaucracy (cont.)

Infra: No performance monitoring

  • Not enough time
  • Not critical enough
  • Same old same old

Infra: no orchestration

There was no urgent reason for it (not enough hosts)

Infra: cfg mgmt data not in vcs

Solution: Hiera
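As an illustration of where that leads, the Hiera data files can live in the same git repository as the Puppet manifests so every data change is reviewable; the paths and keys below are only placeholders:

# Illustrative only: hieradata tracked in the same git repo as the manifests
git log --oneline hieradata/common.yaml        # every data change has review history
hiera -c hiera.yaml jenkins::slaves_count      # hypothetical key, resolved through the hierarchy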

Infra: no final network @PHX lab

  • ovirt.org zone DNS
  • PHX lab support
  • PHX lab ipmi access
  • Openshift account
  • Amazon account

Infra: some services are restricted

Accessible only to Red Hat employees or selected people

Infra: no storage LB/Horizontal scaling

Two servers: 11 TB hardware RAID 5, ext4

+ master-slave replication

= no horizontal scaling, no load balancing

Infra: not everything in cfg mgmt

Pacemaker

Crm

DRBD

oVirt

...

Infra: Big iron

2x :

  • 32 cores Intel(R) Xeon(R) CPU E5-2640 v2 @ 2.00GHz
  • 64 GB RAM
  • 11 TB of hard drives in hardware RAID 5
  • 4 x 1 Gb NICs

8x :

  • 32 cores Intel(R) Xeon(R) CPU E5-2640 v2 @ 2.00GHz
  • 64 GB RAM
  • 1 TB of hard drives in hardware RAID 5
  • 4 x 1 Gb NICs

Plus: Alterway, Linode, an Amazon VM and OpenShift

Infra: Fault tolerant NFS

Block device redundancy

Resource clustering

  • IP
  • NFS server
  • DRBD master/slave

Four NICs with link aggregation - 802.3ad (LACP)

Network redundancy

Network throughput
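Roughly what the resource clustering looks like in the crm shell; resource names, IPs and paths below are placeholders, not the production config:

# Sketch only: DRBD master/slave plus a floating IP and the NFS server grouped on the master
crm configure primitive p_drbd ocf:linbit:drbd params drbd_resource=nfs
crm configure ms ms_drbd p_drbd meta master-max=1 clone-max=2 notify=true
crm configure primitive p_ip ocf:heartbeat:IPaddr2 params ip=192.0.2.10 cidr_netmask=24
crm configure primitive p_nfs ocf:heartbeat:nfsserver params nfs_shared_infodir=/srv/nfs
crm configure group g_nfs p_ip p_nfs
crm configure colocation nfs_on_master inf: g_nfs ms_drbd:Master
crm configure order nfs_after_drbd inf: ms_drbd:promote g_nfs:start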

Infra: Basic monitoring

Infra: oVirt hosted engine

  • Highly available engine
  • No single point of failure (well, almost)
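The engine itself runs as a highly available VM on the hosts it manages; a quick sketch of the commands involved (see the hosted engine documentation for the full flow):

hosted-engine --deploy      # bootstrap the engine VM on the first host
hosted-engine --vm-status   # check the engine VM and HA agent state afterwards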

Continuous Integration

CI: flow overview

  • Developer sends a patch to gerrit
  • Gerrit triggers the Jenkins jobs
  • Jenkins runs checks and tests, and builds packages
  • Jenkins gives feedback on the patch (+1/-1...)
  • Repeat if needed
  • The patch gets merged
  • The merge triggers jobs again, which give their own feedback

CI: No merge gates

Jenkins verification required on gerrit

Current status:

  • Testing after merge
  • Testing on each patchset

Some solutions:

CI: No merge gates (cont.)

Zuul:

CI badness: No CI pipeline

  • No test dependencies
  • All tests rebuild any artifacts needed
  • A bit harder to troubleshoot (complicates visualization)
  • Run all the tests, all the time, until finished

Possible helper tools:

  • Jenkins Build Flow
  • Jenkins Flow Jobs (alpha)

CI: No common test process

$ mvn test
$ make check
make maven BUILD_GWT=0 BUILD_UT=1 EXTRA_BUILD_FLAGS="-P enable-dao-tests \
    -s ${ALTERNATE_MVN_SETTINGS_FILE} \
    -D engine.db.url=jdbc:postgresql://localhost/${DB_NAME}"
## pyflakes
rc=0
find . -iname '*.py' | xargs pyflakes | awk -F: '{printf "%s:%s: [E]%s\n", $1, $2, $3}' > pyflakes.report
results=(${PIPESTATUS[@]})
for res in ${results[@]}; do
  rc=$(($rc + $res))
done
if [[ "${results[1]}" -ne 123 ]] && [[ "$rc" -ne 0 ]]; then
  exit $rc
fi

## pep8
rc=0
find . -iname '*.py' | xargs pep8 > pep8.report
results=(${PIPESTATUS[@]})
for res in ${results[@]}; do
  rc=$(($rc + $res))
done
if [[ "${results[1]}" -ne 123 ]] && [[ "$rc" -ne 0 ]]; then
  exit $rc
fi
exit 0

export NOSE_SKIP_STRESS_TESTS=1
# disable pep8 checks in the unit test job, since there is a separate job for them
export PEP8=$(which true)
export PYFLAKES=$(which true)

sh -x autogen.sh --system
make all
make check

CI: No automatic dependency resolution

  • Binds test code to devel code
  • Creates inter-repo dependencies
  • Requires close devel-infra sync
  • Complicates pre-merge tests a LOT

Helpful tools (sketched below):

  • Tox
  • Docker
  • Mock
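A sketch of how a tool like Docker helps here: run the tests in a throwaway environment so the dependencies never leak onto the slave. The image and packages below are only an example:

# Throwaway test environment; image and packages are illustrative
docker run --rm -v "$PWD":/src -w /src fedora \
    bash -c 'yum install -y python-nose && nosetests'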

CI: Using mock as OS-level abstraction

Some tests depend on custom slave setup (packages, databases, users...)

Ephemeral slaves:

  • One slave per test
  • Cleanup is just removing the VM
  • No problem installing packages, creating devices, loading kernel modules ...
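The mock-based variant of the same idea uses a throwaway chroot instead of a throwaway VM; the chroot config and package names below are assumptions, not the real job definitions:

# Assumed chroot config and packages, for illustration
mock -r epel-6-x86_64 --init                              # fresh chroot for this run
mock -r epel-6-x86_64 --install postgresql-server make    # test-specific setup
mock -r epel-6-x86_64 --copyin . /srv/project             # drop the code in
mock -r epel-6-x86_64 --chroot 'cd /srv/project && make check'
mock -r epel-6-x86_64 --clean                             # cleanup == throw the chroot away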

CI: False code failures

Tried to isolate with groovy & job-states

  • Test output is not standardized
  • Test process is not standardized
  • Hard to maintain

Integrating is more about building standards than adapters

CI: automated bug lifecycle

On Bugzilla:

On gerrit:

CI: Quick slave setup


Slave up in less than 2 mins!

CI: Version controlled tests

Jenkins Job Builder

- project:
    name: projectA
    dist:
        - fc19
        - fc20
        - el6
    jobs:
      - '{name}-whatever-{dist}'
- job-template:
    name: '{name}-whatever-{dist}'
    node: 'slave-{dist}'
    triggers:
      - timed: '@midnight'
    builders:
      - shell: make whatever
      - shell: make clean
    publishers:
      - archive:
           artifacts: '*.log'

Generic job setup

Explode the template
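Once the YAML lives in git, updating Jenkins is a single command; a sketch of the usual workflow, with illustrative paths:

jenkins-jobs test jobs/ -o /tmp/rendered   # render the XML locally to review the expansion
jenkins-jobs update jobs/                  # push the expanded jobs to Jenkins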

CI: Per patch tests/builds

Builds

Build: Building once per test

Leads to:

  • Not really testing what you release
  • Extra time lost in build

Possible solutions:

  • Copy artifacts from job to job
  • Use a central artifact repository

Build: Complicated build scripts

That made it very complicated:

Autotools -runs-> shell scripts -expand-> templates -generate-> makefiles -run-> sed scripts -generate-> shell scripts -run-> maven -runs-> shell scripts -run-> spec file -used_by-> rpmbuild -runs-> shell scripts...

Fix: started using autotools + GNU make as the build interface (sketched below)
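Roughly what the uniform entry points look like; the exact targets vary per project, so treat these as the convention rather than a guarantee:

# Conventional entry points, assumed per project
./autogen.sh --system   # generate the makefiles
make all                # build
make check              # run the test suite
make dist               # tarball later fed to rpmbuild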

Build: Dependency resolution cycle

The spec file is generated by autotools, but the build dependencies are defined in the spec file: a chicken-and-egg cycle.

Build: Dependency resolution cycle (cont.)

Build goodness: mock

  • Big community
  • Heavily used in production
  • Focused on rpms
  • Uses chroot to containerize
  • Any build job can run on any slave
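A sketch of why any slave can take any build job: the chroot carries the target distro, not the slave. The config and SRPM names below are placeholders:

# Placeholder config and SRPM names
mock -r fedora-20-x86_64 --rebuild some-package-1.0.src.rpm --resultdir=results/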

Release

Release: Custom manual builds on releases

  • Prone to human errors
  • Not testing released packages until last minute
  • Manual release process (error prone)
  • Code packaged might not be in the repo

Release: version != code

  • Unable to know which code gets included in which version
  • Harder to troubleshoot
  • You need an external source of information to relate code <-> package

Release: asynchronous repository release

  • You don't know from Jenkins whether it went well
  • You don't really know when it will be deployed (hard to orchestrate)

 

  • Easier to secure
  • No direct access from jenkins to the repos (harder to propagate errors)

Release: automated test builds

  • For each merge a package gets built
  • For almost every patch, a package gets built
  • No broken packages
  • Can be used for further tests (build once)

 

  • Using mock to emulate different distros/versions
  • It also ensures the build environment is minimal, which catches dependency issues

Release: automated nightly releases

  • Tests repository closure
  • Enables us to test (almost) the latest version of every package together before the final release composition
  • Allows brave users to use latest and greatest
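Repository closure means every package's dependencies are resolvable from the configured repos; roughly how a nightly compose can be checked, with placeholder paths (the base distro repos would also need to be available to the check):

# Placeholder paths; base distro repos also need to be configured for the check
createrepo /srv/resources/nightly
repoclosure --repofrompath=nightly,file:///srv/resources/nightly -r nightly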

Questions?

dcaroest@redhat.com | david@dcaro.es

dcaro @irc
