Docker, Docker Swarm and Azure guide

Table of contents

Azure configurations. 1

Docker on Ubuntu Server. 1

Connect to Azure Virtual Image for Docker on Ubuntu Server. 3

Container Registry. 4

Build, push and remove images to and from registry. 4

Docker Swarm Configurations. 4

Create a swarm.. 5

Deploy a service. 5

Inspect a service. 5

Tips and Tricks. 5

Docker Swarm.. 5

Spring Boot 6

Docker Swarm.. 6

Notice: This guide assumes you have Docker installed locally for some of the steps to work.

Azure configurations

Docker on Ubuntu Server

First step is to create a new VM supporting docker.

The above is rather self-explanatory.

The only bigger thing is to choose between a password or SSH key. You can generate one with Putty, use an Unix environment or use linux bash support for Windows(type bash in powershell).

Then type ssh-keygen {filename or path + filename}

Then copy the public key from your generatedkeyfile {filename}.pub

Connect to Azure Virtual Image for Docker on Ubuntu Server

ssh -i {filename or path + filename} {username}@{virtual image IP or DNS name}

ssh -i /root/.ssh/mykeyfile dockeradmin

To get the virtual image IP go to the newly created virtual image in Azure UI. In the Azure Portal UI select you resource group where the Docker on ubuntu was created, from there you should be able to find your virtual image. Just press on the virtual image to get a new overview screen of the virtual image.

Next you should see the DNS name and Virtual IP address. Use either with the SSH command.

Remember to access the images from a Azure Container Registry you have to log in to the registry. See Container Registry.

Container Registry

Next create a private container registry for images publishing and management.

Follow this guide from Microsoft, it’s pretty good:

Now to connect to the registry open your terminal/powershell and type:

docker login {registry domain} -u {username} -p {password}

docker login -u xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx -p myPassword

Build, push and remove images to and from registry

Once you have your image definition ready use the following command to build a new image:

docker build -t {registry domain}/{image name}:{image version number} .

docker build -t .

Then to push it:

docker push {registry domain}/{image name}:{image version number}

docker push

docker pull {registry domain}/{image name}:{image version number}

docker rmi {registry domain}/{image name}:{image version number}

docker rmi

Docker Swarm Configurations

Create a swarm

This is very simple, just type the following command in the Azure Docker on Ubuntu virtual image:

docker swarm init

This will create you a basic structure for your swarm. More info on the Docker Swarm here:

Deploy a service

So this will create a container and replicate it in your swarm, meaning that the Docker Swarm will automatically create instances as task in your swarm worker nodes of your container.

docker service create –replicas 1 –name {service name} -p {docker network port}:{container internal port} –with-registry-auth {registry domain}/{image name}:{image version number}

docker service create –replicas 1 –name myservice -p 8080:8080 –with-registry-auth

Inspect a service

docker service inspect {service name}

docker service inspect –pretty {service name}

This will give you an overview of the service and it’s configurations.

Run docker service ps {service name} to see which nodes are running the service.

Run docker ps on the node where the task is running to see details about the container for the task.

Docker Stack Deploy

The services can also be created and updated using a Docker Compose file (v3 and above). Deployment using a Docker Compose file is otherwise similar to the steps described above, but instead of creating and updating services, the compose file should be edited and deployed.

Deploying a stack

Before deploying the stack for the first time, you need to authenticate with the container registry:

docker login {registry-domain}

A stack defined in a compose file can be deployed with the following command:

docker stack deploy –with-registry-auth –compose-file {path-to-compose-file} {stack-name}

Updating a stack

An existing stack can be updated with the same command as creating it. However, with-registry-auth option may be omitted.

docker stack deploy –compose-file {path-to-compose-file} {stack-name}

Docker Compose

Using docker compose is rather simple but can be a bit confusing at first. Basically with docker-compose you can specify a cluster of container with apps, dbs etc in one shot. With a single command you can set things up.

Compose file example

To start our you need to create a docker-compose.yml file and start configuring the docker environment.

In the example below the compose file defines two containers/services:

  1. apppostgres
    1. A postgre database
  2. app
    1. A spring boot application using the database

These two services are joined together with a common network and static IP. Notice that the postgre database needs a volumes configuration to keep the data even if the database is shutdown.

Also notice that the spring boot application is linked and put to depend on the database service. An in the datasource URL for the database the configuration refers to the container name of the database service/container.

More details:

version: ‘2’




container_name: apppostgres

restart: always


context: .

dockerfile: postgredockerfile


– 5432:5432





– pgdata:/var/lib/postgresql/data


– mynetwork



container_name: app


– 8080:8080



SPRING_DATASOURCE_URL: jdbc:postgresql://apppostgres:5432/appdb


context: .

dockerfile: Dockerfile


– apppostgres


– "apppostgres"


– mynetwork





driver: bridge



– subnet:




app: 172.10.1.


To use these command navigate to the location where the docker-compose.yml file resides.

To build the environment:

docker-compose build

To start the services(notice the -d parameter, it means to detach from the services once they are up):

Docker-compose up -d

To stop and kill(remove) the services:

Dokcer-compose stop

Dokcer-compose kill

And to update a single service/container, try this link:

Tips and Tricks

Docker Swarm

Send registry authentication details to swarm agents:

docker service update –with-registry-auth {service name}

Publish or remove a port as a node port:

docker service update –publish-add 4578: 4578 {service name}

docker service update –publish-rm 4578 {service name}

Publish a port with a specific protocol:

docker service update –publish-add 4578: 4578 /udp client

Update the used image in the service:

docker service update –image {registry domain}/{image name}:{image version number} {service name}

Add and remove an environmental variable:

docker service update –env-add " MY_DATA_URL =" {service name}

docker service update –env-rm "MY_DATA_URL" {service name}

Connect to the container terminal:

docker exec -it {container ID or name} /bin/sh

Look at the logs from the container:

docker logs –follow {container ID or name}

Spring Boot

Run jar package with spring profile:

java -jar .buildlibsmyapp.jar -d "SPRING_PROFILES_ACTIVE=profile1"

Repackage the spring boot application:

gradle bootrepackage

Docker Swarm

Create a service with environmental variable – spring profile:

docker service create –replicas 1 –name {service name} -e "SPRING_PROFILES_ACTIVE=profile1" -p {docker network port}:{container internal port} –with-registry-auth {registry domain}/{image name}:{image version number}

How to fix internet connectivity errors in Android Studio emulator

This was a strange problem for me. I am not a Java or Android developer but had to create a small Android app to test something. I noticed that no internet connection was available to the emulated device.

After to Googling and wondering I realized that the solution was to:

Disable my ethernet card from my laptop, then restart my emulator. This I had to do because for some reason the emulator or Android studio doesn’t recognize a WiFi card as the primary connection if you have both the ethernet card and the WiFi card enabled. Might be a configuration in Android Studio for make this work but I had to do it from Windows Control Panel: Control Panel\Network and Internet\Network Connections

Azure DocumentDB Code Samples: How to use Azure DocumentDB


I’ve been working a bit on DocumentDB and thought of posting some sample code on how to use it. It might save people time and energy. I have had to work around some issues and headaches.


Notice, there is one thing you should take care of. Define your functions with the async keyword and use the await keyword on async function calls. Failure to do this will result in hanging application code.

Also, make sure that you are not accidentally calling synchronous functions from the Task class or some other place that is related to an async call. This will also hang the application code. Calling the Wait() function is one of the, also calling the Result property in the wrong place will result in the same problem.

A quote on the problem from a site:

“If you call the async method on the SAME thread that you then call Result or Wait() on, you will probably deadlock because once the async task has finished, it will wait to re-acquire the previous thread but it can’t because the thread is blocked on the call to Result/Wait()

you can use async tasks and await to avoid this problem but there is also another clever trick, certainly in newer versions of the .net framework and that is to invoke your async task on another thread, not on the one you are calling your method with. It is as simple as this:

var task = Task.Run(() => myMethodAsync());

which involves the method on a thread from the thread pool. When your calling thread then waits and blocks using Wait() or Result, the async task will NOT need to wait for your thread, it will re-acquire the one from the threadpool, finish and signal your waiting thread to allow it to continue!”


/// <summary>
/// Sample class to be used for object serialization when handling data to the DocumentDB
/// </summary>
public class MyDocumentDBDataContainer
public String Title { get; set; }
public byte[] FileData { get; set; }

public String FileName { get; set; }

public class InnerDataContainer
public String Title { get; set; }
public int SomeNumber { get; set; }

public InnerDataContainer InnerData { get; set; }

public partial class Form1 : Form
/// <summary>
/// The DocumentDB address, end point where it exists
/// </summary>
private const string EndpointUrl = "";

/// <summary>
/// This can be the primary key you get from the Azure DocumentDB settings UI
/// </summary>
private const string AuthorizationKey =

/// <summary>
/// A temp object for holding the documentDB database for processing
/// </summary>
private Database database;

/// <summary>
/// Same as above but for a collection
/// </summary>
private DocumentCollection collection;

public Form1()

private async void button1_Click(object sender, EventArgs e)
Stream myStream = null;
OpenFileDialog openFileDialog1 = new OpenFileDialog();

openFileDialog1.InitialDirectory = "c:\\";
openFileDialog1.Filter = "txt files (*.txt)|*.txt|All files (*.*)|*.*";
openFileDialog1.FilterIndex = 2;
openFileDialog1.RestoreDirectory = true;

// Open a file to get some byte data to upload into DocumentDB
if (openFileDialog1.ShowDialog() == DialogResult.OK)
if ((myStream = openFileDialog1.OpenFile()) != null)
using (myStream)
MemoryStream ms = new MemoryStream();
await CreateDocumentClient(ms, openFileDialog1.FileName);
catch (Exception ex)
Exception baseException = ex.GetBaseException();
Console.WriteLine("Error: {0}, Message: {1}", ex.Message, baseException.Message);
catch (Exception ex)
MessageBox.Show("Error: Could not read file from disk. Original error: " + ex.Message);

/// <summary>
/// This is the main work horse here. The function will create a database, a collection and a sample document if they do not exist.
/// NOTICE: This is very important, define you functions with the async keyword and use the await keyword on async function calls. Failure to do this will result in haning application code.
/// Also make sure that you are not accidentally calling synchronous functions from the Task class or some other place that is related to a async call. This will also hang the application code.
/// More on this:
/// Also notice that documentDB uses "links" to identify things. You will run into DocumentDB objects and a property SelfLink. This seems to just be a way of how things work.
/// </summary>
/// <param name="fileDataStream"></param>
/// <param name="fileName"></param>
/// <returns></returns>
private async Task CreateDocumentClient(MemoryStream fileDataStream, String fileName)
// Create a new instance of the DocumentClient
var client = new DocumentClient(new Uri(EndpointUrl), AuthorizationKey);

var databaseID = "myDBTest";
var collectionID = "myCollectionTest";

// Get the database and if it does not exist create it
this.database = this.GetDatabase(client, databaseID);
if (database == null)
this.database = await CreateDatabase(client, databaseID);

// Get the collection and if it does not exist then create it
this.collection = this.GetDocumentCollection(client, collectionID);
if(this.collection == null)
this.collection = await this.CreateCollection(client, collectionID);

// Create a temp data container, pass forward to be created in DocumentDB
MyDocumentDBDataContainer data = new MyDocumentDBDataContainer() { Title = fileName, InnerData = new MyDocumentDBDataContainer.InnerDataContainer() { Title = "InnerDataTitle", SomeNumber = 1 }, FileData = fileDataStream.ToArray(), FileName = fileName };
var result = await this.CreateDocument(client, data);

// Get the newly created document. Notice: In these code examples I use a title but you can use any identifier you wish.
var dataFromDocumentDB = this.ReadDocument(client, data.Title);

// Re-Create the file from the byte data from the DocumentDB storage
File.WriteAllBytes(dataFromDocumentDB.FileName, dataFromDocumentDB.FileData);

#region DocumentManagement

private async Task<Document> DeleteDocument(DocumentClient client, String documentTitle)
var documentToDelete =
.Where(e => e.Title == documentTitle)

Document doc = GetDocument(client, documentToDelete.Title);

var result = await client.DeleteDocumentAsync(doc.SelfLink);
return result.Resource;

private async Task<Document> UpdateDocument(DocumentClient client, String documentTitle)
// Update a Document

var singleDocument =
.Where(e => e.Title == documentTitle)

Document doc = GetDocument(client, singleDocument.Title);
MyDocumentDBDataContainer employeUpdated = singleDocument;
singleDocument.InnerData.SomeNumber = singleDocument.InnerData.SomeNumber + 1;
var result = await client.ReplaceDocumentAsync(doc.SelfLink, singleDocument);

return result.Resource;

private Document GetDocument(DocumentClient client, string id)
return client.CreateDocumentQuery(this.collection.SelfLink)
.Where(e => e.Id == id)

private MyDocumentDBDataContainer ReadDocument(DocumentClient client, String documentTitle)
// Read the collection

//var data = client.CreateDocumentQuery<MyDocumentDBDataContainer>(this.collection.SelfLink).AsEnumerable();
//foreach (var item in data)
// Console.WriteLine(item.Title);
// Console.WriteLine(item.FileData);
// Console.WriteLine(item.InnerData.Title);
// Console.WriteLine("----------------------------------");

// Read A Document - Where Name == "John Doe"
var singleDocument =
.Where(e => e.Title == documentTitle)

return singleDocument;

//Console.WriteLine("-------- Read a document---------");

private async Task<Document> CreateDocument(DocumentClient client, object documentObject)

var result = await client.CreateDocumentAsync(collection.SelfLink, documentObject);
var document = result.Resource;

Console.WriteLine("Created new document: {0}\r\n{1}", document.Id, document);
return document;


private async Task<Database> CreateDatabase(DocumentClient client, String databaseID)
Console.WriteLine("******** Create Database *******");

var databaseDefinition = new Database { Id = databaseID };
var result = await client.CreateDatabaseIfNotExistsAsync(databaseDefinition);
var database = result.Resource;

Console.WriteLine(" Database Id: {0}; Rid: {1}", database.Id, database.ResourceId);
Console.WriteLine("******** Database Created *******");

return database;

private DocumentCollection GetDocumentCollection(DocumentClient client, String collectionID)
var collections = client.CreateDocumentCollectionQuery(database.CollectionsLink,
"SELECT * FROM c WHERE = '" + collectionID + "'").AsEnumerable();
if(collections.Count() > 0)
return collections.First();

return null;

private async Task QueryDocumentsWithPaging(DocumentClient client)
Console.WriteLine("**** Query Documents (paged results) ****");
Console.WriteLine("Quering for all documents");

var sql = "SELECT * FROM c";
var query = client.CreateDocumentQuery(collection.SelfLink, sql).AsDocumentQuery();

while (query.HasMoreResults)
var documents = await query.ExecuteNextAsync();

foreach (var document in documents)
Console.WriteLine(" Id: {0}; Name: {1};",,;


private Database GetDatabase(DocumentClient client, String databaseID)
//bool databaseExists = false;
Console.WriteLine("******** Get Databases List ********");

var databases = client.CreateDatabaseQuery().ToList();

foreach (var database in databases)
Console.WriteLine(" Database Id: {0}; Rid: {1}", database.Id, database.ResourceId);
return database;

Console.WriteLine("Total databases: {0}", databases.Count);

return null;

private async Task<DocumentCollection> CreateCollection(DocumentClient client, string collectionId, string offerType = "S1")

Console.WriteLine("**** Create Collection {0} in {1} ****", collectionId,

var collectionDefinition = new DocumentCollection { Id = collectionId };
var options = new RequestOptions { OfferType = offerType };
var result = await

collectionDefinition, options);
var collection = result.Resource;

Console.WriteLine("Created new collection");

return collection;

#region DifferentWaysOfDoingThings
private async Task<Document> CreateDocuments2(DocumentClient client, byte[] fileData)
Console.WriteLine("**** Create Documents ****");

dynamic document1Definition = new
name = "New Customer 1",
address = new
addressType = "Main Office",
addressLine1 = "123 Main Street",
location = new
city = "Brooklyn",
stateProvinceName = "New York"
postalCode = "11229",
countryRegionName = "United States"
fileDataBinary = fileData

Document document1 = CreateDocument2(client, document1Definition);
Console.WriteLine("Created document {0} from dynamic object", document1.Id);

return document1;

private async Task<Document> CreateDocument2(DocumentClient client, object documentObject)

var result = await client.CreateDocumentAsync(collection.SelfLink, documentObject);
var document = result.Resource;

Console.WriteLine("Created new document: {0}\r\n{1}", document.Id, document);
return document;


Ethical Hacking: Terminology – Part 1

I’ve started a new course on ethical hacking to get a better understanding of the internet, software security, personal security etc.

I’ll post a series of posts where I will write down my notes on what I’ve learned.

I’ll start today with some basic terminology:

Term Description
White Hat Hacker People that do hacking to help others, legal and ethical
Black Hat Hacker Unethical and unlegal activities
Grey Hat Hacker Between White and Black hat
Footprinting Information gathering on your target, on your task: like figuring out network related information, or software related details, or getting information from real world things or people. General information gathering in regard to your chosen target
DoS (Just you) Denial Of Service – On person performs a certain amount of request, more than the server can handle, to make the server crash. Servers can handle only a certain amount of requests and so the requests that does not fit into the request pool limit will be dropped out. If the service attack comes from one location/machine this is should not be possible.
DDoS (multiple people) Domain Denial Of Service – When you multiple computers/machines doing the service attack it will be harder for the software to know who to kick out.


The attack is not hard to do but the preparation is hard. You need to have multiple machines and to do this usually you have to infect other computers to create a bot farm of machines.

RAT Remote Administration Tools – For DDoS attacks needed a software that can be distributed upon other computers. This gives you control of a computer and allows you to hide your identity. The operations are not visible to a normal user. You can even hide them so that they do not show in normal operating system diagnostic tools.
FUD ( Anti-virus can not detect) Fully Undetectable – Also needed for DDoS attacks. Not labeled as malicious by anti-virus programs
Fishing Applying a bait and someone acts on it. Example: You get an email from someone and you click on it. Either it uploads something malicious or you do something that compromises your data, security.


Usually these are done so that the links look authentic but once you click on them you are redirected to some other server, which is not the one you would expect.


An easy way to spot these kind of addresses is to look at the address. If it is not from an HTTPS address then you are probably dealing with a false address. HTTPS addresses are much harder to fake.

SQL Injections Passing SQL Queries to HTTP requests. Allowing SQL command to run on a server to get or alter data that is not others wise intended to see or use.
VPN Virtual Private Network – Routing and encrypting traffic data between you and the VPN server/provider. A way of anonymizing yourself.


There is no real easy way to identify you unless the VPN Provider gives up your identity.

Proxy A less reliable way of staying anonymous. You could route your traffic between many proxies but the more proxies you have the harder it is to add new proxies to your traffic. This is mostly because of internet speed limitations, not enough available bandwidth. It will slow down you actions.


You can use free proxies and you can use paid proxies but paid ones leave a trace of whom you are.

Tor Open Source – Another way to hide your identity. Faster than proxies but slower than VPNs. Routes traffic through different routes, routers, places to hide your trace.


There is a very high chance of staying hidden (99.99%), there are tools, ways to find but highly unlikely.

VPS Virtual Private Server – a “security layer”, example: a virtual machine inside an actual machine that serves as a database server for you web server. This is done so that the database is not accessible from the outside directly.


In this way you can be specific who and from where can access that virtual machine.

Key Loggers Tools that are used to extract information from a machine, these needs to be deployed to a machine where the tool gathers key strokes and send that information to a location for analysis.


Key Loggers can extract existing information as well, you can modify the settings of a key logger (what, where, how to act), you can take screenshots, to use a camera on a device, microphone etc.

Terminal An interface to control your operating system. GUI tools are not as nearly as powerful as terminal tools.


Most hacking tools are designed for the terminal. Once you know how to do it in the terminal, you’ll know how to do it in the GUI.

Firewall A firewall is configured through iptable commands.


Linux firewall is open source and it has a HUDE amount of options. On Windows, by default you have some of these options but you will need to buy some package or application to get more options.

Root Kit rootkit is a collection of computer software, typically malicious, designed to enable access to a computer or areas of its software that would not otherwise be allowed (for example, to an unauthorized user) and often masks its existence or the existence of other software.
Reverse-shells There are thousands of Reverse-shells. You have a program that infects another device that program opens up a reverse connection from that device back to you. Therefore, you can keep up controlling an external device.


Usually you need to break through a router first and reconfigure it to give you more access to a network and machines.

C# Parametrized Property

I am currently working on a project where I have to convert some VB.NET code to C#. One of the problems is some of the VB:s features that are not supported on C#, like parameterized properties.

I also wanted to be able to create C# code that is usable in VB in the same way it was earlier. Since parameterized properties are not supported in C# the first impression was to create getter and setter functions but this is clunky in VB and breaks old code which assumes accessing a variable/data in a certain manner.

After some wondering and not accepting defeat by C# I came up with a solution.

It requires the following steps:

  1. Create a new class that represents the property
  2. Use this keyword with [] operator to assign a get and set property (the class will function as the property itself)
  3. To use functionality from the parent class you force the property class with a reference to the parent class. After this, you are able to access functionalities from the parent class as long as the visibility is set to public.

Here is the code:

public class PropertyName
 private PartenClass refPartenClass = null;
 public PropertyName(refPartenClass parentClass)
 if (parentClass == null)
 throw new Exception("parent class parameter can not be null");

 this.refPartenClass = parentClass;

 public string this[string path, String DefaultValue = ""]
 PartenClass item = null;
 item = this.refPartenClass.LocateItem(path, false);
 if (item == null)
 return DefaultValue;
 else if (string.IsNullOrEmpty(item.ItemValue))
 return DefaultValue;
 return item.ItemValue;
 PartenClass item = null;
 item = this.refPartenClass.LocateItem(path, true);
 item.ItemValue = value;

This is then how you use it:

  1. Create a declaration of the property class as a property itself
  2. Instantiate it during the ParentClass constructor and pass the parent as reference

public class ParentClass
public PropertyName Value { get; set; }

private void InitializeParametrizedProperties()
 this.Value = new PropertyName(this);
public ParentClass()


Btw, nothing stops you from overloading the this[] definition to accept a different kind of parameters and return values. Just be careful with the same kind of parameter definitions. The compiler won’t know what to call.

Installing Apache Spark Cluster on Odroid C2 and Raspberry Pi


Summary. 1

Network configurations. 2

Odroid C2 Ubuntu 16.04 Mate. 2

Static IP. 2

Setting the Hostname. 2

Raspberry Pi 3

Static IP. 3

Setting the Hostname. 4

Common configurations. 5

Hosts file editing. 5

Disabling IP6. 5

Firewall 6

Creating an admin user to Spark. 6

Enable SSH communication between nodes. 6

Swap. 7

Installing other packages for Spark. 8

Java Installation. 8

Installing Scala. 9

Scientific Python installation. 9

Installing Apache Spark. 9

Spark Configurations. 10

Slaves. 10 11

spark-defaults.conf. 12 13

Bash configurations. 14

Starting and Stoping Spark. 16

Submiting work to the cluster. 16


For this Spark cluster installation, I used the following packages and tools:

· Oracle Java SDK 8

· Scala 2.12.1

· Apache Spark 2.1.0

Even if you encounter URLs to other versions than those written above, they are not true. I was just lazy changing my own notes, which I have gathered from several sources. So change the files names and URLs to correspond to those you want and need.

Before I go on describing the process, I just want to mention that while it has been a bit difficult doing this installation, this was fun once I got it to work.

My biggest problems where that I come from the Microsoft ecosystem where most things are preaty much behind a UI, you don’t have to understand or know that much necessarily.

Is this good or bad? Well it depends, sometimes having a button that does things is nice but on the other hand, it takes away from actually knowing what you are doing. For example, managing users, privileges, file system etc is a totally different thing on Linux and you actually have to know what you are doing.

I found the experience with Linux very fun and enjoyable. What I struggled the most with was Spark and Hadoop (not the topic of this post). It was difficult to understand which configurations are needed. I had problems getting Hadoop to do anything and the errors whereas usually with any software obscure, or in other words a pain in the ass.

I really had to focus and want to make it to work. I felt like giving up at times.

Anyway learning using Linux was the most fun, so much fun that I ended up installing a Linux distro dual booting with Windows.

Network configurations

Odroid C2 Ubuntu 16.04 Mate

In my situation, I did the configurations through the UI but you can do it through the terminal.

Static IP

To configure a static IP goto: System > Preferences > Internet and Network > Network Proxy

There go to the IPv4 Settings and add your network desired information for the Odroid C2.

Setting the Hostname

GoTo: System > Administration > Network

Raspberry Pi

Static IP

On the top right corner press the mouse second button on the network indicator(the two arrows pointing up and down). Then select “Wireless & Wires Network Settings”. Then select the Interface and eht0 and configure the network configurations you desire.

Setting the Hostname

Goto: ”Start” icon > Preferences > Raspberry Pi Configuration > System tab > Hostname

You might need to restart your system.

Common configurations

Hosts file editing

This configuration needs must be done on every machine. You will need hostnames for your machines so that Spark and Hadoop can properly communicate between machines (nodes) in your cluster.

Type in terminal:

sudo nano /etc/hosts

Add your machine IPs and desired hostnames. My hosts file looks like this:

Notice that I have removed everything else and left the localhost definition. This is a strange this, with Spark I could not get the workers to communicate properly without the localhost definition but with Hadoop it was the other way around, not sure now why this is.

Disabling IP6

After looking several tutorials on installing Hadoop all mentioned that it is a good practice to disable the IP6 support. Apparently Hadoop does not support IP6 properly or at all. Since Apache Spark work ontop of Hadoop then I applied the same method on Spark also.

Open the /etc/sysctl.conf file for editing and add the following at the end of the file:

net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1


In case you have problems between the nodes in a cluster disable the firewall or allow the nodes to communicate between the nodes with the needed ports.

Creating an admin user to Spark

In terminal create the user, add it to a group and give it admin privileges:

sudo addgroup spark
sudo adduser –ingroup spark spuser
sudo adduser spuser sudo

Login into the user and do everything related to Spark with this user:

su spuser

Enable SSH communication between nodes

This is to avoid using authentication when using Spark. If you do not do this you might run into problems and you also are constantly required to type in the account and password which you want to run Spark on.

Next create the SSH key and add it to the authorized_keys file.

$ cd ~
$ ssh-keygen -t rsa -P “”
$ cat ~/.ssh/ > ~/.ssh/authorized_keys

Verify that the SSH tunnel is working:

ssh localhost

Copy the public keys to the slaves nodes in you cluster

$ ssh-copy-id spuser@raspberrypi01

$ ssh-copy-id spuser@raspberrypi02

Then test the connection to the slaves:

$ ssh raspberrypi01


This is something I had to do for Odroid C2. Even though the Odroid has double the RAM Raspberry Pi has, it also has Ubuntu installed on it which takes up nearly half the memory when booting. The Raspbian OS takes about a little over 150 MB, so about 15 % of the Raspberry Pi’s total RAM.

I ran into problems when I wanted to use the Odroid as the Master and also as a slave node for calculation. Because I wanted to use as much memory as possible on the actual Raspberry Pi nodes I allocated 768 MB, which is OK for the Pi’s but I could not allocate less than 512 MB for Odroid and allocating 512 MB caused the Odroid to swap and since there was no swap created the OS crashed or became unresponsive.

To combat this problem I create a swap for Odroid, the size of the Swap was double the RAM, 4 GB:

The Swap creation guide below is from:

Checking for the Swap Information

Before we begin, we will first check for the swap space available on the server or system

We can use the below command to see that the system is having the swap partition or not

$ free -h

We can also run the below command but if the swap partitions do not exist, we cannot see any information.

$ sudo swapon –s

In the above command, we can see that the swap is not enabled or not configured for this server to configure the swap in this machine. We will first check for the free disk space available with the below command –

$ df –h

Creating a Swap File

As we know the disk space availability, we can go ahead and create a swap file on the filesystem. To create the swap file we can use ‘fallocate’ a package or utility which can create a preallocated size to instantly. As we have a little space on the server will create a swap file with 512 MB size to create a swap file below is the command.

$ sudo fallocate -l 512M /swapfile

And to check the swap file we will use the below command

$ ls -lh /swapfile
-rw-r–r– 1 root root 512M Sep 6 14:22 /swapfile

Enabling the Swap to use the Swap File

Before, we are going to enable the swap, we need to fix the file permission that other than root any others can read/write the file below is the command to change the file permission.

$ sudo chmod 600 /swapfile

Once, we change the permission we will check the file below and execute the below command to check the swap file permissions.

Once, we change the permission we will check the file below and execute the below command to check the swap file permissions.

$  ls -lh /swapfile
-rw——- 1 root root 512M Sep  6 14:22 /swapfile

We will now make this file as a swap space using this below command –

$ sudo mkswap /swapfile
Setting up swapspace version 1, size = 524284 KiB
no label, UUID=d02e2bbb-5fcc-4c7b-9f85-4ae75c9c55f9

Now we will enable the swap by using the below command

$ sudo swapon –s
Filename                                Type            Size    Used    Priority
/swapfile                               file            524284  0       -1

We can also check with free –h commands to see the swap partition

$ free –h

Making the Swap Partition/File to start Permanent

As in the above steps, we have created the swap partition and we are able to use that swap for temporary memory and once the machine is rebooted then the swap, setting will be lost to needed to use this swap file permanently we will make the swap file permanent.

We will edit the /etc/stab and add the information to mount the swap file even if we reboot the machine

$ sudo  vi /etc/fstab

Add the below line to the existing file.

/swapfile            none     swap     sw         0            0

For better performance for using the swap memory, we can do some tweaks.


Installing other packages for Spark

For my installation, I needed a few other packages to make things work:

  • Oracle Java version 8
  • Scala

Java Installation


For a more automatic installation, type the following commands:

$ sudo apt-get install oracle-java8-jdk


$ sudo apt-get update && sudo apt-get install oracle-java8-jdk


$ sudo update-alternatives –config java


For a more manual one got to the Oracle website and download the Linux ARM 64 Hard Float ABI package:


Notice that my Raspberry Pi with Raspbian from December had the 32 bit Java 7 version installed. I used the 64 Bit Java 8 for Odroid C2 and 32 bit for Java 8 for Raspberry Pi.

When you have the package, do the following:

Enter the command to extract jdk-8-linux-arm-vfp-hflt.tar.gz to /opt directory.

$ sudo tar zxvf jdk-8-linux-arm-vfp-hflt.tar.gz -C /opt


Set default java and javac to the new installed jdk8.

$ sudo update-alternatives –install /usr/bin/javac javac /opt/jdk1.8.0/bin/javac 1

$ sudo update-alternatives –install /usr/bin/java java /opt/jdk1.8.0/bin/java 1


$ sudo update-alternatives –config javac

$ sudo update-alternatives –config java


After all, verify with the commands with -version option.

$ java -version

$ javac –version

I also added as the owner the spark user:

$ sudo chown -R spuser:spark jdk1.8.0/

Notice: To make life easier you should add environmental variables to your bashrc file. More on this in the Spark installation portion.

Installing Scala

Navigate to the following URL:

I downloaded the tar package, extracted it to a location and added proper privileges to the spark user:

$ sudo tar zxvf scala-2.12.1.tgz -C /opt

$ sudo chown -R spuser:spark scala-2.12.1/


Notice: To make life easier you should add environmental variables to your bashrc file. More on this in the Spark installation portion.

Scientific Python installation


This is not a requirement but I used these scripts to install Jupyter and Python 3.5 on my cluster nodes:

Installing Apache Spark


Start by downloading you desired package from Apache Spark URL:

Or user wget: wget

Then extract Spark package

$ sudo tar -xvzf spark.2.1.0.tar.gz -C /opt/

Add the Spark User as owner

$ cd /opt
$ sudo chown -R hduser:hadoop spark.2.1.0/

If you are having problem with access to the spark folder or if you are installing Hadoop and have configured namenodes file system locations etc use the following command to add more privileges to the desired locations:

$sudo chmod 750 /opt/hadoop/hadoop_data/hdfs

Spark Configurations

Go to the Spark conf folder, depending where you installed spark:

$ cd /media/microSD/spark/conf

There are four files I had to configure for the cluster to work:

  • Slaves
  • properties
  • spark-defaults.conf

For more info on these files check out Spark documentation:

The first step is to rename some of the files mentioned above. Some of the files have the “.template” file extension on them, remove it:

$ mv

If the slaves file does not exist then using nano you will create is automatically:

$ sudo nano slaves

The above assumes you are in the conf folder.

When you are done with the configurations on your master just copy them with scp to all slave nodes. Make sure you change the node specific values in these files(more on this below).


Here you add the slave machines hostname, or the machines you want to do the work for you, the calculations:




With this file all we want is to minimize the amount of log on screen, it will be much easier to spot what is going on when you are not flooded with basic operational messages.

What you need to do it to change this:

log4j.rootCategory=INFO, console

to this:

log4j.rootCategory=WARN, console


Here we just want to specify the master URL so that we do not always have to specify it when submitting work to the Spark cluster:

spark.master                     spark://odroid64:7077

Here you specify the parameters that your cluster will use to communicate with all the nodes within it:











There are many variables, which you can tweak, I used and had to use the above ones to get things to work.

The SPARK_MASTER_IP and SPARK_MASTER_HOST HAVE to be the same on all nodes. The rest have to correspond to the actual physical node where the configuration file resides.




Bash configurations


For my cluster I used the following configurations(disregard the Hadoop ones, not necessary for Spark):

export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_121

export HADOOP_HOME=/media/microSD/hadoop-2.7.3





export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop

export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop


export HADOOP_OPTS=”$HADOOP_OPTS -Djava.library.path=/media/microSD/hadoop-2.7.3/lib/native/”

#export PATH=$PATH:$JAVA_HOME/bin:/media/microSD/spark/sbin:/media/microSD/spark/sbi


export SBT_HOME=/media/microSD/sbt

export SPARK_HOME=/media/microSD/spark

export SCALA_HOME=/media/microSD/scala-2.12.1

export PATH=$PATH:$JAVA_HOME/bin




The important ones for Spark are:

export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_121

export SPARK_HOME=/media/microSD/spark

export SCALA_HOME=/media/microSD/scala-2.12.1

export PATH=$PATH:$JAVA_HOME/bin



You want to add Spark, Scala and Java to the PATH environmental variable to be able to access commands easily from Terminal.

You can then copy these configurations to the slave nodes using scp command:

$ scp ~/.bachrc spuser@raspberrypi01:~


To force a refresh of the environmental variables

$ source ~/.bachrc

Starting and Stoping Spark

This is simple.

To start Spark type:


To stop Spark:


To access Spark web UI:



Submiting work to the cluster


Use the Spark specific command:


More info:

Problems running scripts and batch files with Task Scheduler

This is probably one of the most annoying things I’ve encountered. I’ve been trying to run a scheduled task using a cmd file and every time I tried to run it simply didn’t work.

Then thanks to search results someone pointed out a solution and it makes no sense, thank you Microsoft. I guess there might some “logical” explanation but it is a mystery to me. Maybe something to do with UAC?

Anyway, to the solution should you happen to run into the same problem:

Do this when when specifying an action: Specify only the script/batch file name and in the Start IN(Optional) specify the full path where the file is located.