Blockchain Analysis

Using Spark and OpenShift


Jirka Kremser

6th Oct 2017


  • Blockchain 101
  • Blockchain to graph
  • Graph data in Spark
  • DEMO

What is Blockchain

  • Distributed ledger
  • Linked list of blocks
  • Trust stems from Merkle trees and proof of work (aka mining)
  • Cryptography
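A minimal sketch of the "linked list of blocks" idea. It assumes a toy block layout (a dict with hypothetical `prev_hash` and `transactions` fields) hashed with SHA-256 over canonical JSON. Real Bitcoin blocks instead hash an 80-byte binary header with double SHA-256, but the linking principle is the same: each block stores the hash of its predecessor, so editing any block breaks every link after it.

```python
import hashlib
import json


def block_hash(block: dict) -> str:
    # Hash a canonical JSON encoding (sorted keys) so the digest is deterministic.
    encoded = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def append_block(chain: list, transactions: list) -> None:
    # The new block points at its predecessor by hash; the first block
    # (genesis) uses an all-zero placeholder.
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"prev_hash": prev, "transactions": transactions})


def is_valid(chain: list) -> bool:
    # Every block must reference the current hash of the block before it.
    return all(
        chain[i]["prev_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )


chain = []
append_block(chain, ["alice -> bob: 5"])
append_block(chain, ["bob -> carol: 2"])
append_block(chain, ["carol -> dave: 1"])
print(is_valid(chain))  # True

# Tampering with block 1 changes its hash, so block 2's link no longer matches.
chain[1]["transactions"] = ["bob -> mallory: 2"]
print(is_valid(chain))  # False
```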

What is a Block

  • Set of transactions approved at once
  • Metadata
  • Hard limit 1 MB (*)
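Part of a block's metadata is the Merkle root, a single hash that commits to all of the block's transactions. The sketch below follows the Bitcoin-style construction (double SHA-256, and with an odd number of hashes the last one is paired with itself) but, as a simplification, ignores Bitcoin's byte-order conventions; the transaction contents are made-up placeholders.

```python
import hashlib
from typing import List


def sha256d(data: bytes) -> bytes:
    # Bitcoin applies SHA-256 twice ("double SHA-256").
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root(tx_hashes: List[bytes]) -> bytes:
    if not tx_hashes:
        raise ValueError("a block contains at least one transaction")
    level = list(tx_hashes)
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Odd count: duplicate the last hash so every node has a pair.
            level.append(level[-1])
        # Hash each adjacent pair to build the next level up the tree.
        level = [sha256d(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


txs = [sha256d(t.encode()) for t in ["tx-a", "tx-b", "tx-c"]]
root = merkle_root(txs)
print(root.hex())

# Changing any single transaction changes the root.
tampered = [txs[0], sha256d(b"tx-b'"), txs[2]]
print(merkle_root(tampered) != root)  # True
```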

What is a Transaction

  • (INs, OUTs)
  • sum of INs ≥ sum of OUTs (the difference is the fee collected by the miner)
  • confirming ~ including it in a new block and finding a valid nonce
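The two rules above can be sketched as follows. Amounts are integers (satoshis) to avoid floating-point rounding. The proof-of-work part is a toy: it searches for a nonce whose SHA-256 digest starts with a given number of hex zeros, whereas Bitcoin compares the double SHA-256 of the block header against a numeric target. The names `fee`, `mine` and the header bytes are hypothetical.

```python
import hashlib


def fee(inputs, outputs):
    # A valid transaction may not create coins: sum(INs) >= sum(OUTs).
    total_in, total_out = sum(inputs), sum(outputs)
    if total_in < total_out:
        raise ValueError("outputs exceed inputs")
    # Whatever is left over goes to the miner as the transaction fee.
    return total_in - total_out


def mine(header: bytes, difficulty: int, max_nonce: int = 10_000_000) -> int:
    # Toy proof of work: try nonces in order until the digest of
    # header + nonce starts with `difficulty` hex zeros.
    target_prefix = "0" * difficulty
    for nonce in range(max_nonce):
        digest = hashlib.sha256(header + nonce.to_bytes(8, "little")).hexdigest()
        if digest.startswith(target_prefix):
            return nonce
    raise RuntimeError("no nonce found within max_nonce attempts")


print(fee([50_000, 30_000], [70_000, 9_000]))  # 1000
nonce = mine(b"toy block header", difficulty=3)
print(nonce)
```

Each extra hex zero makes the search about 16 times longer on average, which is the knob that keeps mining expensive while verification stays a single hash.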



What is a Transaction (more general case)

Transactions to Graph

  • M:N transactions produce a lot of edges
  • Apache Parquet
  • blockchain binary data -> Parquet converter
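Why M:N transactions blow up the edge count, as a sketch. Linking every input address directly to every output address gives M × N edges per transaction; making the transaction itself a vertex gives only M + N. Both functions are hypothetical illustrations, not necessarily what the converter does. The `src`/`dst` keys mirror the edge column names GraphFrames expects, so rows like these could be loaded into a Spark DataFrame and written to Parquet (not shown).

```python
from itertools import product


def address_to_address_edges(txid, input_addrs, output_addrs):
    # Every input is linked to every output: M * N edges.
    return [
        {"src": src, "dst": dst, "txid": txid}
        for src, dst in product(input_addrs, output_addrs)
    ]


def bipartite_edges(txid, input_addrs, output_addrs):
    # The transaction becomes its own vertex: M + N edges.
    return [{"src": a, "dst": txid} for a in input_addrs] + [
        {"src": txid, "dst": b} for b in output_addrs
    ]


ins, outs = ["a1", "a2", "a3"], ["b1", "b2"]
print(len(address_to_address_edges("tx1", ins, outs)))  # 6
print(len(bipartite_edges("tx1", ins, outs)))  # 5
```

For a transaction with 100 inputs and 100 outputs this is 10,000 edges versus 200, which is one concrete reason the choice of data representation matters.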

Spark Graphs

  • GraphX
  • GraphFrames
  • built-ins (label propagation, PageRank, triangle counting, BFS, etc.)
  • motif finding ~ Cypher
  • fast with unanchored queries
  • Pregel
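To illustrate the Pregel model, here is a pure-Python simulation of its classic example, connected components by minimum-label propagation. In Spark you would call the built-in `connectedComponents()` (available on both GraphX graphs and GraphFrames) rather than write this loop; the sketch only mimics the superstep structure (send messages, run the vertex program, stop when no vertex changes) on a small made-up graph.

```python
def connected_components(vertices, edges, max_supersteps=100):
    # Initial vertex state: every vertex is labeled with its own id.
    label = {v: v for v in vertices}
    # Connectivity ignores edge direction, so store neighbors both ways.
    neighbors = {v: set() for v in vertices}
    for src, dst in edges:
        neighbors[src].add(dst)
        neighbors[dst].add(src)

    active = set(vertices)
    for _ in range(max_supersteps):
        # Message phase: each active vertex sends its label to its neighbors;
        # messages to the same vertex are combined by taking the minimum.
        inbox = {}
        for v in active:
            for n in neighbors[v]:
                inbox[n] = min(inbox.get(n, label[v]), label[v])
        # Vertex program: adopt a smaller label; only changed vertices stay active.
        active = {v for v, m in inbox.items() if m < label[v]}
        for v in active:
            label[v] = inbox[v]
        if not active:
            break
    return label


vertices = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "b"), ("b", "c"), ("d", "e")]
print(connected_components(vertices, edges))
# {'a': 'a', 'b': 'a', 'c': 'a', 'd': 'd', 'e': 'd', 'f': 'f'}
```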

Talk is Cheap

Demo time


  • Blockchain is out there
  • GraphFrames vs GraphX
  • Reproducible experiments with notebooks, Docker and OpenShift
  • Good representation of data may be crucial

How to get started

More projects, tutorials and examples can be found at

This presentation


Thank You!