How to fix internet connectivity errors in Android Studio emulator

This was a strange problem for me. I am not a Java or Android developer but had to create a small Android app to test something. I noticed that no internet connection was available to the emulated device.

After some Googling and wondering, I realized that the solution was to:

Disable the ethernet card on my laptop, then restart the emulator. I had to do this because, for some reason, the emulator or Android Studio does not recognize a WiFi card as the primary connection if both the ethernet card and the WiFi card are enabled. There might be a configuration in Android Studio to make this work, but I had to do it from the Windows Control Panel: Control Panel\Network and Internet\Network Connections
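
If you prefer the command line, the same thing can be done from an elevated command prompt; the adapter name "Ethernet" below is an assumption, use the name shown in Network Connections on your machine:

netsh interface set interface "Ethernet" admin=disabled
netsh interface set interface "Ethernet" admin=enabled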

Azure DocumentDB Code Samples: How to use Azure DocumentDB

Hi,

I’ve been working a bit with DocumentDB and thought I would post some sample code on how to use it. It might save people some time and energy, since I had to work around a few issues and headaches.

 

Notice, there is one thing you should take care of. Define your functions with the async keyword and use the await keyword on async function calls. Failure to do this will result in hanging application code.

Also, make sure that you are not accidentally calling synchronous members of the Task class, or anything else related to an async call, in a blocking way. This will also hang the application code. Calling the Wait() function is one of these cases; reading the Result property in the wrong place will cause the same problem.

A quote on the problem from a site:

“If you call the async method on the SAME thread that you then call Result or Wait() on, you will probably deadlock because once the async task has finished, it will wait to re-acquire the previous thread but it can’t because the thread is blocked on the call to Result/Wait()

you can use async tasks and await to avoid this problem but there is also another clever trick, certainly in newer versions of the .net framework and that is to invoke your async task on another thread, not on the one you are calling your method with. It is as simple as this:

var task = Task.Run(() => myMethodAsync());

which involves the method on a thread from the thread pool. When your calling thread then waits and blocks using Wait() or Result, the async task will NOT need to wait for your thread, it will re-acquire the one from the threadpool, finish and signal your waiting thread to allow it to continue!” http://lukieb.blogspot.fi/2016/07/calls-to-azure-documentdb-hang.html
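
To make the quoted advice concrete, here is a minimal sketch of the two patterns. QueryDocumentDbAsync is a placeholder for any async DocumentDB call (it is an assumption, not part of the SDK); the point is only the difference between blocking on the calling thread and awaiting, or offloading to the thread pool.

// requires: using System; using System.Threading.Tasks;

// Preferred: stay async all the way up the call chain.
private async Task RunQueryAsync()
{
    string result = await QueryDocumentDbAsync(); // never blocks the calling thread
    Console.WriteLine(result);
}

// Risky: blocking on the calling thread can deadlock in UI (WinForms/WPF) and ASP.NET contexts.
private void RunQueryBlocking()
{
    // string bad = QueryDocumentDbAsync().Result; // may deadlock
    // QueryDocumentDbAsync().Wait();              // may deadlock

    // Workaround from the quote above: start the async call on a thread-pool thread.
    var task = Task.Run(() => QueryDocumentDbAsync());
    string result = task.Result; // the continuation resumes on the thread pool, not on this thread
    Console.WriteLine(result);
}

// Placeholder for any async DocumentDB call (hypothetical helper, shown only for the example).
private Task<string> QueryDocumentDbAsync()
{
    return Task.FromResult("some result");
}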

 


/// <summary>
/// Sample class to be used for object serialization when handling data to the DocumentDB
/// </summary>
public class MyDocumentDBDataContainer
{
public String Title { get; set; }
public byte[] FileData { get; set; }

public String FileName { get; set; }

public class InnerDataContainer
{
public String Title { get; set; }
public int SomeNumber { get; set; }
}

public InnerDataContainer InnerData { get; set; }
}

public partial class Form1 : Form
{
/// <summary>
/// The DocumentDB address, end point where it exists
/// </summary>
private const string EndpointUrl = "https://mydocumentdbtest.documents.azure.com:443/";

/// <summary>
/// This can be the primary key you get from the Azure DocumentDB settings UI
/// </summary>
private const string AuthorizationKey =
"";

/// <summary>
/// A temp object for holding the documentDB database for processing
/// </summary>
private Database database;

/// <summary>
/// Same as above but for a collection
/// </summary>
private DocumentCollection collection;

public Form1()
{
InitializeComponent();
}

private async void button1_Click(object sender, EventArgs e)
{
Stream myStream = null;
OpenFileDialog openFileDialog1 = new OpenFileDialog();

openFileDialog1.InitialDirectory = "c:\\";
openFileDialog1.Filter = "txt files (*.txt)|*.txt|All files (*.*)|*.*";
openFileDialog1.FilterIndex = 2;
openFileDialog1.RestoreDirectory = true;

// Open a file to get some byte data to upload into DocumentDB
if (openFileDialog1.ShowDialog() == DialogResult.OK)
{
try
{
if ((myStream = openFileDialog1.OpenFile()) != null)
{
using (myStream)
{
try
{
MemoryStream ms = new MemoryStream();
myStream.CopyTo(ms);
await CreateDocumentClient(ms, openFileDialog1.FileName);
}
catch (Exception ex)
{
Exception baseException = ex.GetBaseException();
Console.WriteLine("Error: {0}, Message: {1}", ex.Message, baseException.Message);
}
}
}
}
catch (Exception ex)
{
MessageBox.Show("Error: Could not read file from disk. Original error: " + ex.Message);
}
}
}

/// <summary>
/// This is the main work horse here. The function will create a database, a collection and a sample document if they do not exist.
/// NOTICE: This is very important: define your functions with the async keyword and use the await keyword on async function calls. Failure to do this will result in hanging application code.
/// Also make sure that you are not accidentally calling synchronous functions from the Task class or some other place that is related to an async call. This will also hang the application code.
/// More on this: http://lukieb.blogspot.fi/2016/07/calls-to-azure-documentdb-hang.html
/// Also notice that documentDB uses "links" to identify things. You will run into DocumentDB objects and a property SelfLink. This seems to just be a way of how things work.
/// </summary>
/// <param name="fileDataStream"></param>
/// <param name="fileName"></param>
/// <returns></returns>
private async Task CreateDocumentClient(MemoryStream fileDataStream, String fileName)
{
// Create a new instance of the DocumentClient
var client = new DocumentClient(new Uri(EndpointUrl), AuthorizationKey);

var databaseID = "myDBTest";
var collectionID = "myCollectionTest";

// Get the database and if it does not exist create it
this.database = this.GetDatabase(client, databaseID);
if (database == null)
{
this.database = await CreateDatabase(client, databaseID);
}

// Get the collection and if it does not exist then create it
this.collection = this.GetDocumentCollection(client, collectionID);
if(this.collection == null)
{
this.collection = await this.CreateCollection(client, collectionID);
}

// Create a temp data container, pass forward to be created in DocumentDB
MyDocumentDBDataContainer data = new MyDocumentDBDataContainer() { Title = fileName, InnerData = new MyDocumentDBDataContainer.InnerDataContainer() { Title = "InnerDataTitle", SomeNumber = 1 }, FileData = fileDataStream.ToArray(), FileName = fileName };
var result = await this.CreateDocument(client, data);

// Get the newly created document. Notice: In these code examples I use a title but you can use any identifier you wish.
var dataFromDocumentDB = this.ReadDocument(client, data.Title);

// Re-Create the file from the byte data from the DocumentDB storage
File.WriteAllBytes(dataFromDocumentDB.FileName, dataFromDocumentDB.FileData);
}



#region DocumentManagement

private async Task<Document> DeleteDocument(DocumentClient client, String documentTitle)
{
var documentToDelete =
client.CreateDocumentQuery<MyDocumentDBDataContainer>(this.collection.SelfLink)
.Where(e => e.Title == documentTitle)
.AsEnumerable()
.First();

Document doc = GetDocument(client, documentToDelete.Title);

var result = await client.DeleteDocumentAsync(doc.SelfLink);
return result.Resource;
}

private async Task<Document> UpdateDocument(DocumentClient client, String documentTitle)
{
// Update a Document

var singleDocument =
client.CreateDocumentQuery<MyDocumentDBDataContainer>(this.collection.SelfLink)
.Where(e => e.Title == documentTitle)
.AsEnumerable()
.First();

Document doc = GetDocument(client, singleDocument.Title);
MyDocumentDBDataContainer employeUpdated = singleDocument;
singleDocument.InnerData.SomeNumber = singleDocument.InnerData.SomeNumber + 1;
var result = await client.ReplaceDocumentAsync(doc.SelfLink, singleDocument);

return result.Resource;
}

private Document GetDocument(DocumentClient client, string id)
{
return client.CreateDocumentQuery(this.collection.SelfLink)
.Where(e => e.Id == id)
.AsEnumerable()
.First();
}

private MyDocumentDBDataContainer ReadDocument(DocumentClient client, String documentTitle)
{
// Read the collection

//var data = client.CreateDocumentQuery<MyDocumentDBDataContainer>(this.collection.SelfLink).AsEnumerable();
//foreach (var item in data)
//{
// Console.WriteLine(item.Title);
// Console.WriteLine(item.FileData);
// Console.WriteLine(item.InnerData.Title);
// Console.WriteLine("----------------------------------");
//}

// Read A Document - Where Name == "John Doe"
var singleDocument =
client.CreateDocumentQuery<MyDocumentDBDataContainer>(this.collection.SelfLink)
.Where(e => e.Title == documentTitle)
.AsEnumerable()
.FirstOrDefault();

return singleDocument;

//Console.WriteLine("-------- Read a document---------");
//Console.WriteLine(singleDocument.Title);
//Console.WriteLine(singleDocument.FileData);
//Console.WriteLine(singleDocument.InnerData.Title);
//Console.WriteLine("-------------------------------");
}

private async Task<Document> CreateDocument(DocumentClient client, object documentObject)
{

var result = await client.CreateDocumentAsync(collection.SelfLink, documentObject);
var document = result.Resource;

Console.WriteLine("Created new document: {0}\r\n{1}", document.Id, document);
return document;
}

#endregion

private async Task<Database> CreateDatabase(DocumentClient client, String databaseID)
{
Console.WriteLine();
Console.WriteLine("******** Create Database *******");

var databaseDefinition = new Database { Id = databaseID };
var result = await client.CreateDatabaseIfNotExistsAsync(databaseDefinition);
var database = result.Resource;

Console.WriteLine(" Database Id: {0}; Rid: {1}", database.Id, database.ResourceId);
Console.WriteLine("******** Database Created *******");

return database;
}

private DocumentCollection GetDocumentCollection(DocumentClient client, String collectionID)
{
var collections = client.CreateDocumentCollectionQuery(database.CollectionsLink,
"SELECT * FROM c WHERE c.id = '" + collectionID + "'").AsEnumerable();
if(collections.Count() > 0)
return collections.First();

return null;
}

private async Task QueryDocumentsWithPaging(DocumentClient client)
{
Console.WriteLine();
Console.WriteLine("**** Query Documents (paged results) ****");
Console.WriteLine();
Console.WriteLine("Quering for all documents");

var sql = "SELECT * FROM c";
var query = client.CreateDocumentQuery(collection.SelfLink, sql).AsDocumentQuery();

while (query.HasMoreResults)
{
var documents = await query.ExecuteNextAsync();

foreach (var document in documents)
{
Console.WriteLine(" Id: {0}; Name: {1};", document.id, document.name);
}
}

Console.WriteLine();
}

private Database GetDatabase(DocumentClient client, String databaseID)
{
//bool databaseExists = false;
Console.WriteLine();
Console.WriteLine();
Console.WriteLine("******** Get Databases List ********");

var databases = client.CreateDatabaseQuery().ToList();

foreach (var database in databases)
{
Console.WriteLine(" Database Id: {0}; Rid: {1}", database.Id, database.ResourceId);
// Only return the database that matches the requested ID
if (database.Id == databaseID)
return database;
}

Console.WriteLine();
Console.WriteLine("Total databases: {0}", databases.Count);

return null;
}

private async Task<DocumentCollection> CreateCollection(DocumentClient client, string collectionId, string offerType = "S1")
{

Console.WriteLine();
Console.WriteLine("**** Create Collection {0} in {1} ****", collectionId,
database.Id);

var collectionDefinition = new DocumentCollection { Id = collectionId };
var options = new RequestOptions { OfferType = offerType };
var result = await

client.CreateDocumentCollectionAsync(database.SelfLink,
collectionDefinition, options);
var collection = result.Resource;

Console.WriteLine("Created new collection");
//ViewCollection(collection);

return collection;
}

#region DifferentWaysOfDoingThings
private async Task<Document> CreateDocuments2(DocumentClient client, byte[] fileData)
{
Console.WriteLine();
Console.WriteLine("**** Create Documents ****");
Console.WriteLine();

dynamic document1Definition = new
{
name = "New Customer 1",
address = new
{
addressType = "Main Office",
addressLine1 = "123 Main Street",
location = new
{
city = "Brooklyn",
stateProvinceName = "New York"
},
postalCode = "11229",
countryRegionName = "United States"
},
fileDataBinary = fileData
};

Document document1 = await CreateDocument2(client, document1Definition);
Console.WriteLine("Created document {0} from dynamic object", document1.Id);
Console.WriteLine();

return document1;
}

private async Task<Document> CreateDocument2(DocumentClient client, object documentObject)
{

var result = await client.CreateDocumentAsync(collection.SelfLink, documentObject);
var document = result.Resource;

Console.WriteLine("Created new document: {0}\r\n{1}", document.Id, document);
return document;
}

#endregion
}

Ethical Hacking: Terminology – Part 1

I’ve started a new course on ethical hacking to get a better understanding of the internet, software security, personal security etc.

I’ll post a series of posts where I will write down my notes on what I’ve learned.

I’ll start today with some basic terminology:

White Hat Hacker: People who hack to help others; legal and ethical.
Black Hat Hacker: Unethical and illegal activities.
Grey Hat Hacker: Somewhere between White and Black hat.
Footprinting: Information gathering on your target or task: figuring out network-related information, software-related details, or getting information from real-world things or people. General information gathering in regard to your chosen target.
DoS (just you): Denial of Service. One person performs more requests than the server can handle in order to make the server crash. Servers can handle only a certain number of requests, so requests that do not fit into the request pool limit are dropped. If the attack comes from a single location/machine, it should not normally be enough to take the server down.
DDoS (multiple people): Distributed Denial of Service. When multiple computers/machines perform the attack, it is much harder for the software to know which requests to drop.

 

The attack itself is not hard to do, but the preparation is. You need multiple machines, and to get them you usually have to infect other computers to create a bot farm of machines.

RAT: Remote Administration Tool. For DDoS attacks you need software that can be distributed to other computers. It gives you control of a computer and allows you to hide your identity. The operations are not visible to a normal user; you can even hide them so that they do not show up in normal operating system diagnostic tools.
FUD (anti-virus cannot detect): Fully Undetectable. Also needed for DDoS attacks; not labeled as malicious by anti-virus programs.
Phishing: Setting a bait and getting someone to act on it. Example: you get an email from someone and click on a link in it; either it uploads something malicious or you do something that compromises your data or security.

 

Usually these are done so that the links look authentic but once you click on them you are redirected to some other server, which is not the one you would expect.

 

An easy way to spot these kinds of addresses is to look at the address itself. If it is not an HTTPS address, then you are probably dealing with a false address; HTTPS addresses are much harder to fake.

SQL Injection: Passing SQL queries inside HTTP requests, allowing SQL commands to run on a server to read or alter data that you are not otherwise meant to see or use.
VPN: Virtual Private Network. Routes and encrypts traffic between you and the VPN server/provider; a way of anonymizing yourself.

 

There is no real easy way to identify you unless the VPN Provider gives up your identity.

Proxy: A less reliable way of staying anonymous. You can route your traffic through many proxies, but the more proxies you have, the harder it is to add new proxies to your traffic. This is mostly because of internet speed limitations and not enough available bandwidth; it will slow down your actions.

 

You can use free proxies or paid proxies, but paid ones leave a trace of who you are.

Tor: Open source. Another way to hide your identity. Faster than proxies but slower than a VPN. Routes traffic through different routes, routers and places to hide your trail.

 

There is a very high chance of staying hidden (99.99%); there are tools and ways to find you, but it is highly unlikely.

VPS: Virtual Private Server. A kind of "security layer", for example a virtual machine inside an actual machine that serves as the database server for your web server. This is done so that the database is not directly accessible from the outside.

 

This way you can be specific about who can access that virtual machine, and from where.

Key Loggers: Tools used to extract information from a machine. They need to be deployed to a machine, where the tool gathers keystrokes and sends that information to a location for analysis.

 

Key loggers can extract existing information as well; you can modify a key logger's settings (what, where and how it acts), take screenshots, use a device's camera or microphone, etc.

Terminal: An interface to control your operating system. GUI tools are not nearly as powerful as terminal tools.

 

Most hacking tools are designed for the terminal. Once you know how to do it in the terminal, you’ll know how to do it in the GUI.

Firewall: On Linux, a firewall is configured through iptables commands.

 

The Linux firewall is open source and has a HUGE number of options. On Windows you have some of these options by default, but you will need to buy some package or application to get more.
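
As a small illustrative example of the kind of iptables commands involved (the port is just an example):

# allow established connections and incoming SSH, drop all other incoming traffic
sudo iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
sudo iptables -A INPUT -p tcp --dport 22 -j ACCEPT
sudo iptables -P INPUT DROP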

Root Kit: A rootkit is a collection of computer software, typically malicious, designed to enable access to a computer or areas of its software that would not otherwise be allowed (for example, to an unauthorized user), and it often masks its existence or the existence of other software.
Reverse-shells: There are thousands of reverse shells. You have a program that infects another device; that program opens a reverse connection from that device back to you, so you can keep controlling the external device.

 

Usually you need to break through a router first and reconfigure it to give you more access to a network and machines.

C# Parametrized Property

I am currently working on a project where I have to convert some VB.NET code to C#. One of the problems is that some of VB's features are not supported in C#, like parameterized properties.

I also wanted the C# code to be usable from VB in the same way as before. Since parameterized properties are not supported in C#, the first instinct was to create getter and setter functions, but this is clunky in VB and breaks old code that assumes it can access a variable/data in a certain manner.

After some wondering and not accepting defeat by C# I came up with a solution.

It requires the following steps:

  1. Create a new class that represents the property
  2. Use the this keyword with the [] operator (an indexer) to define a get and set; the class will function as the property itself
  3. To use functionality from the parent class, pass the property class a reference to the parent class. After this, you are able to access functionality from the parent class as long as its visibility is public.

Here is the code:


public class PropertyName
{
    private ParentClass refParentClass = null;

    public PropertyName(ParentClass parentClass)
    {
        if (parentClass == null)
            throw new Exception("parent class parameter can not be null");

        this.refParentClass = parentClass;
    }

    // LocateItem and ItemValue are assumed to be members of your own ParentClass.
    public string this[string path, String DefaultValue = ""]
    {
        get
        {
            ParentClass item = this.refParentClass.LocateItem(path, false);
            if (item == null)
            {
                return DefaultValue;
            }
            else if (string.IsNullOrEmpty(item.ItemValue))
            {
                return DefaultValue;
            }
            else
            {
                return item.ItemValue;
            }
        }
        set
        {
            ParentClass item = this.refParentClass.LocateItem(path, true);
            item.ItemValue = value;
        }
    }
}

This is then how you use it:

  1. Create a declaration of the property class as a property itself
  2. Instantiate it during the ParentClass constructor and pass the parent as reference

public class ParentClass
{
public PropertyName Value { get; set; }

private void InitializeParametrizedProperties()
 {
 this.Value = new PropertyName(this);
 }
public ParentClass()
 {
 this.InitializeParametrizedProperties();
 }

}
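
With that in place, calling code uses the indexer much like the original VB parameterized property. A small sketch, assuming ParentClass exposes the LocateItem and ItemValue members used above; the path strings are made up:

var parent = new ParentClass();

// Write through the parameterized property.
parent.Value["Settings/Color"] = "Blue";

// Read it back, falling back to the default when the item is missing or empty.
string color = parent.Value["Settings/Color", "Red"];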

By the way, nothing stops you from overloading the this[] definition to accept different kinds of parameters and return values. Just be careful not to create overloads with identical parameter lists; the compiler won't know which one to call.

Installing Apache Spark Cluster on Odroid C2 and Raspberry Pi

Contents

Summary
Network configurations
Odroid C2 Ubuntu 16.04 Mate
Static IP
Setting the Hostname
Raspberry Pi
Static IP
Setting the Hostname
Common configurations
Hosts file editing
Disabling IPv6
Firewall
Creating an admin user for Spark
Enable SSH communication between nodes
Swap
Installing other packages for Spark
Java Installation
Installing Scala
Scientific Python installation
Installing Apache Spark
Spark Configurations
Slaves
log4j.properties
spark-defaults.conf
spark-env.sh
Bash configurations
Starting and Stopping Spark
Submitting work to the cluster

Summary

For this Spark cluster installation, I used the following packages and tools:

· Oracle Java SDK 8

· Scala 2.12.1

· Apache Spark 2.1.0

Even if you encounter URLs pointing to versions other than those listed above, ignore the version numbers; I was just lazy about updating my own notes, which I have gathered from several sources. So change the file names and URLs to correspond to the versions you want and need.

Before I go on describing the process, I just want to mention that while this installation was a bit difficult, it was fun once I got it to work.

My biggest problem was that I come from the Microsoft ecosystem, where most things are pretty much behind a UI and you don't necessarily have to understand or know that much.

Is this good or bad? Well it depends, sometimes having a button that does things is nice but on the other hand, it takes away from actually knowing what you are doing. For example, managing users, privileges, file system etc is a totally different thing on Linux and you actually have to know what you are doing.

I found the experience with Linux very fun and enjoyable. What I struggled with the most was Spark and Hadoop (Hadoop is not the topic of this post). It was difficult to understand which configurations are needed. I had problems getting Hadoop to do anything, and the errors were, as usual with any software, obscure, or in other words a pain in the ass.

I really had to focus and want to make it to work. I felt like giving up at times.

Anyway, learning to use Linux was the most fun, so much fun that I ended up installing a Linux distro dual booting with Windows.

Network configurations

Odroid C2 Ubuntu 16.04 Mate

In my situation, I did the configurations through the UI but you can do it through the terminal.

Static IP

To configure a static IP, go to: System > Preferences > Internet and Network > Network Proxy

There, go to the IPv4 Settings and add your desired network information for the Odroid C2.

Setting the Hostname

Go to: System > Administration > Network

Raspberry Pi

Static IP

In the top right corner, right-click the network indicator (the two arrows pointing up and down). Then select "Wireless & Wired Network Settings". Then select the interface eth0 and configure the network settings you desire.

Setting the Hostname

Go to: "Start" icon > Preferences > Raspberry Pi Configuration > System tab > Hostname

You might need to restart your system.

Common configurations

Hosts file editing

This configuration must be done on every machine. You will need hostnames for your machines so that Spark and Hadoop can communicate properly between the machines (nodes) in your cluster.

Type in terminal:

sudo nano /etc/hosts

Add your machine IPs and desired hostnames. My hosts file looks like this:
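
As a sketch, a minimal hosts file for this cluster could look like the following (the IP addresses are placeholders, use your own static IPs; the hostnames match the ones used later in this post):

127.0.0.1       localhost
192.168.10.65   odroid64
192.168.10.66   raspberrypi01
192.168.10.67   raspberrypi02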

Notice that I have removed everything else and left only the localhost definition. This is a strange thing: with Spark I could not get the workers to communicate properly without the localhost definition, but with Hadoop it was the other way around. I am not sure why this is.

Disabling IPv6

Several tutorials on installing Hadoop mention that it is good practice to disable IPv6 support; apparently Hadoop does not support IPv6 properly, or at all. Since Apache Spark works on top of Hadoop, I applied the same change for Spark as well.

Open the /etc/sysctl.conf file for editing and add the following at the end of the file:

net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
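
To apply the change without rebooting, reload the sysctl settings:

$ sudo sysctl -p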

Firewall

If you have problems between the nodes in the cluster, either disable the firewall or allow the nodes to communicate with each other on the needed ports.
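
If your distribution uses ufw, for example, something like the following allows all traffic from the cluster subnet (the subnet below is an assumption, adjust it to your own network):

$ sudo ufw allow from 192.168.10.0/24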

Creating an admin user for Spark

In terminal create the user, add it to a group and give it admin privileges:

sudo addgroup spark
sudo adduser --ingroup spark spuser
sudo adduser spuser sudo

Log in as that user and do everything related to Spark with it:

su spuser

Enable SSH communication between nodes

This is to avoid authentication prompts when using Spark. If you do not do this, you might run into problems, and you will also constantly be asked for the account and password that you want to run Spark under.

Next create the SSH key and add it to the authorized_keys file.

$ cd ~
$ ssh-keygen -t rsa -P ""
$ cat ~/.ssh/id_rsa.pub > ~/.ssh/authorized_keys

Verify that the SSH tunnel is working:

ssh localhost

Copy the public key to the slave nodes in your cluster:

$ ssh-copy-id spuser@raspberrypi01

$ ssh-copy-id spuser@raspberrypi02

Then test the connection to the slaves:

$ ssh raspberrypi01

Swap

This is something I had to do for the Odroid C2. Even though the Odroid has double the RAM the Raspberry Pi has, it also runs Ubuntu, which takes up nearly half the memory at boot. Raspbian takes a little over 150 MB, about 15% of the Raspberry Pi's total RAM.

I ran into problems when I wanted to use the Odroid both as the master and as a slave node for calculations. Because I wanted to use as much memory as possible on the actual Raspberry Pi nodes, I allocated 768 MB, which is OK for the Pis, but for the Odroid I could not allocate less than 512 MB, and allocating 512 MB caused the Odroid to swap; since no swap had been created, the OS crashed or became unresponsive.

To combat this problem I created a swap file for the Odroid; the size of the swap was double the RAM, 4 GB:

The Swap creation guide below is from:

http://www.tutorialspoint.com/articles/how-to-enable-or-add-swap-space-on-ubuntu-16-04

Checking for the Swap Information

Before we begin, we will first check for the swap space available on the server or system

We can use the command below to see whether the system has a swap partition or not:

$ free -h

We can also run the command below, but if no swap partition exists, we will not see any information:

$ sudo swapon -s

In the output of the above command we can see that swap is not enabled or configured for this server. To configure swap on this machine, we will first check the free disk space available with the command below:

$ df -h

Creating a Swap File

Now that we know the available disk space, we can go ahead and create a swap file on the filesystem. To create the swap file we can use 'fallocate', a utility which can instantly create a file of a preallocated size. As we have only a little space on this server, we will create a swap file of 512 MB with the command below.

$ sudo fallocate -l 512M /swapfile

And to check the swap file we will use the below command

$ ls -lh /swapfile
-rw-r--r-- 1 root root 512M Sep 6 14:22 /swapfile

Enabling the Swap to use the Swap File

Before we enable the swap, we need to fix the file permissions so that no one other than root can read or write the file. Below is the command to change the file permissions:

$ sudo chmod 600 /swapfile

Once we have changed the permissions, we can execute the command below to check the swap file permissions:


$ ls -lh /swapfile
-rw------- 1 root root 512M Sep  6 14:22 /swapfile

We will now turn this file into swap space using the command below:

$ sudo mkswap /swapfile
Setting up swapspace version 1, size = 524284 KiB
no label, UUID=d02e2bbb-5fcc-4c7b-9f85-4ae75c9c55f9

Now we will enable the swap with the command below and then list it to check:

$ sudo swapon /swapfile
$ sudo swapon -s
Filename                                Type            Size    Used    Priority
/swapfile                               file            524284  0       -1

We can also check the swap partition with the free -h command:

$ free -h

Making the Swap Partition/File Permanent

In the steps above we created the swap and can use it as temporary memory, but once the machine is rebooted the swap setting will be lost. To use this swap file permanently, we will make it permanent.

We will edit /etc/fstab and add the information to mount the swap file even if we reboot the machine:

$ sudo  vi /etc/fstab

Add the below line to the existing file.

/swapfile            none     swap     sw         0            0

For better performance for using the swap memory, we can do some tweaks.
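
One common tweak (optional, and only a suggestion here) is to lower the swappiness so the kernel prefers RAM over swap; the value 10 is just an example:

$ sudo sysctl vm.swappiness=10

To make it permanent, add the following line to /etc/sysctl.conf:

vm.swappiness=10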

 

Installing other packages for Spark

For my installation, I needed a few other packages to make things work:

  • Oracle Java version 8
  • Scala

Java Installation

 

For a more automatic installation, type the following commands:

$ sudo apt-get install oracle-java8-jdk

 

$ sudo apt-get update && sudo apt-get install oracle-java8-jdk

 

$ sudo update-alternatives --config java

 

For a more manual installation, go to the Oracle website and download the Linux ARM 64 Hard Float ABI package: http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html

 

Notice that my Raspberry Pi, with the Raspbian image from December, had the 32-bit Java 7 version installed. I used the 64-bit Java 8 for the Odroid C2 and the 32-bit Java 8 for the Raspberry Pi.

When you have the package, do the following:

Enter the command to extract jdk-8-linux-arm-vfp-hflt.tar.gz to /opt directory.

$ sudo tar zxvf jdk-8-linux-arm-vfp-hflt.tar.gz -C /opt

 

Set the default java and javac to the newly installed JDK 8.

$ sudo update-alternatives --install /usr/bin/javac javac /opt/jdk1.8.0/bin/javac 1

$ sudo update-alternatives --install /usr/bin/java java /opt/jdk1.8.0/bin/java 1

$ sudo update-alternatives --config javac

$ sudo update-alternatives --config java

 

Finally, verify the installation with the -version option.

$ java -version

$ javac -version

I also made the Spark user the owner:

$ sudo chown -R spuser:spark jdk1.8.0/

Notice: To make life easier you should add environmental variables to your bashrc file. More on this in the Spark installation portion.

Installing Scala

Navigate to the following URL: http://www.scala-lang.org/download/

I downloaded the tar package, extracted it to a location and added proper privileges to the spark user:

$ sudo tar zxvf scala-2.12.1.tgz -C /opt

$ sudo chown -R spuser:spark scala-2.12.1/

 

Notice: To make life easier you should add environmental variables to your bashrc file. More on this in the Spark installation portion.

Scientific Python installation

 

This is not a requirement but I used these scripts to install Jupyter and Python 3.5 on my cluster nodes:

https://github.com/kleinee/jns

Installing Apache Spark

 

Start by downloading your desired package from the Apache Spark URL: http://spark.apache.org/downloads.html

Or use wget: wget http://www.eu.apache.org/dist/spark/spark-1.5.1/spark-1.5.1-bin-hadoop2.tar.gz

Then extract Spark package

$ sudo tar -xvzf spark.2.1.0.tar.gz -C /opt/

Add the Spark User as owner

$ cd /opt
$ sudo chown -R spuser:spark spark.2.1.0/

If you are having problems with access to the Spark folder, or if you are installing Hadoop and have configured namenode file system locations etc., use the following command to add more privileges to the desired locations:

$ sudo chmod 750 /opt/hadoop/hadoop_data/hdfs

Spark Configurations

Go to the Spark conf folder, depending on where you installed Spark:

$ cd /media/microSD/spark/conf

There are four files I had to configure for the cluster to work:

  • Slaves
  • log4j.properties
  • spark-defaults.conf
  • spark-env.sh

For more info on these files check out Spark documentation:

http://spark.apache.org/docs/latest/configuration.html

http://spark.apache.org/docs/latest/spark-standalone.html

The first step is to rename some of the files mentioned above. Some of the files have the “.template” file extension on them, remove it:

$ mv log4j.properties.template log4j.properties

If the slaves file does not exist, nano will create it automatically:

$ sudo nano slaves

The above assumes you are in the conf folder.

When you are done with the configurations on your master, just copy them with scp to all slave nodes. Make sure you change the node-specific values in these files (more on this below).

Slaves

Here you add the hostnames of the slave machines, i.e. the machines you want to do the work (the calculations) for you:

odroid64

raspberrypi01

raspberrypi02

log4j.properties

With this file, all we want is to minimize the amount of logging on screen; it is much easier to spot what is going on when you are not flooded with basic operational messages.

What you need to do is change this:

log4j.rootCategory=INFO, console

to this:

log4j.rootCategory=WARN, console

spark-defaults.conf

Here we just want to specify the master URL so that we do not always have to specify it when submitting work to the Spark cluster:

spark.master                     spark://odroid64:7077

spark-env.sh

Here you specify the parameters that your cluster will use to communicate with all the nodes within it:

SPARK_MASTER_IP=odroid64

SPARK_WORKER_MEMORY=512m

SPARK_MASTER_HOST=odroid64

SPARK_LOCAL_IP=odroid64

SPARK_WORKER_CORES=2

SPARK_DAEMON_MEMORY=512m

SPARK_EXECUTOR_INSTANCES=1

SPARK_EXECUTOR_CORES=2

SPARK_EXECUTOR_MEMORY=512m

SPARK_DRIVER_MEMORY=512m

There are many variables you can tweak; the ones above are the ones I used, and had to use, to get things to work.

The SPARK_MASTER_IP and SPARK_MASTER_HOST HAVE to be the same on all nodes. The rest have to correspond to the actual physical node where the configuration file resides.

 

 

 

Bash configurations

 

For my cluster I used the following configurations (disregard the Hadoop ones, they are not necessary for Spark):

export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_121

export HADOOP_HOME=/media/microSD/hadoop-2.7.3

export HADOOP_MAPRED_HOME=$HADOOP_HOME

export HADOOP_COMMON_HOME=$HADOOP_HOME

export HADOOP_HDFS_HOME=$HADOOP_HOME

export YARN_HOME=$HADOOP_HOME

export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop

export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop

#export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$JAVA_HOME/bin

export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=/media/microSD/hadoop-2.7.3/lib/native/"

#export PATH=$PATH:$JAVA_HOME/bin:/media/microSD/spark/sbin:/media/microSD/spark/sbi

 

export SBT_HOME=/media/microSD/sbt

export SPARK_HOME=/media/microSD/spark

export SCALA_HOME=/media/microSD/scala-2.12.1

export PATH=$PATH:$JAVA_HOME/bin

export PATH=$PATH:$SBT_HOME/bin:$SPARK_HOME/bin:$SPARK_HOME/sbin:$SCALA_HOME/bin

export SPARK_MASTER_URL=http://192.168.10.65:7077

 

The important ones for Spark are:

export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_121

export SPARK_HOME=/media/microSD/spark

export SCALA_HOME=/media/microSD/scala-2.12.1

export PATH=$PATH:$JAVA_HOME/bin

export PATH=$PATH:$SBT_HOME/bin:$SPARK_HOME/bin:$SPARK_HOME/sbin:$SCALA_HOME/bin

export SPARK_MASTER_URL=http://192.168.10.65:7077

You want to add Spark, Scala and Java to the PATH environmental variable to be able to access commands easily from Terminal.

You can then copy these configurations to the slave nodes using scp command:

http://www.hypexr.org/linux_scp_help.php

$ scp ~/.bashrc spuser@raspberrypi01:~

 

To force a refresh of the environmental variables

$ source ~/.bashrc

Starting and Stopping Spark

This is simple.

To start Spark type:

$ start-all.sh

To stop Spark:

$ stop-all.sh

To access Spark web UI:

http://odroid64:8080/

 

Submitting work to the cluster

 

Use the Spark specific command:

spark-submit
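
For example, to run the SparkPi example that ships with Spark against the master defined earlier (the examples jar name depends on your Spark build, so treat the path below as an assumption):

$ spark-submit --class org.apache.spark.examples.SparkPi --master spark://odroid64:7077 $SPARK_HOME/examples/jars/spark-examples_2.11-2.1.0.jar 100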

More info:

http://spark.apache.org/docs/latest/submitting-applications.html

Problems running scripts and batch files with Task Scheduler

This is probably one of the most annoying things I've encountered. I had been trying to run a scheduled task using a cmd file, and every time I tried to run it, it simply didn't work.

Then, thanks to some search results, someone pointed out a solution, and it makes no sense; thank you Microsoft. I guess there might be some "logical" explanation, but it is a mystery to me. Maybe something to do with UAC?

Anyway, on to the solution, should you happen to run into the same problem:

When specifying an action, specify only the script/batch file name, and in the "Start in (optional)" field specify the full path to the folder where the file is located.
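
For example, the action fields would end up looking something like this (the file and folder names are made up, just to show which value goes where):

Program/script: mybackup.cmd
Start in (optional): C:\Scripts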


 

How to backup your private repositories

GitHub Repository Backup Steps

Contents

Install git
Use SSH key for communication with GitHub
Generate SSH Key
Generating a new SSH key
Adding your SSH key to the ssh-agent
Add the SSH key to your GitHub account or the organization
Setup a default user name and email
Set your username for every repository on your computer
Setting your email address for every repository on your computer
Create an access token if you want to clone repositories without a username and password
Ways to access the GitHub API with an access token
Cloning repositories
Sample git command
Sample node.js tool
Usage
Options
Examples
Exclude and Only options

Install git

https://git-scm.com/download/win

Use SSH key for communication with GitHub

https://help.github.com/articles/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent/

Generate SSH Key

Generating a new SSH key

1. Open Git Bash.

2. Paste the text below, substituting in your GitHub email address.

3. ssh-keygen -t rsa -b 4096 -C "your_email@example.com"

This creates a new ssh key, using the provided email as a label.

Generating public/private rsa key pair.

4. When you’re prompted to "Enter a file in which to save the key," press Enter. This accepts the default file location.

5. Enter a file in which to save the key (/Users/you/.ssh/id_rsa): [Press enter]

6. At the prompt, type a secure passphrase. For more information, see "Working with SSH key passphrases".

7. Enter passphrase (empty for no passphrase): [Type a passphrase]
Enter same passphrase again: [Type passphrase again]

Adding your SSH key to the ssh-agent

Before adding a new SSH key to the ssh-agent to manage your keys, you should have checked for existing SSH keys and generated a new SSH key.

If you have GitHub for Windows installed, you can use it to clone repositories and not deal with SSH keys. It also comes with the Git Bash tool, which is the preferred way of running git commands on Windows.

1. Ensure the ssh-agent is running:

o If you are using the Git Shell that’s installed with GitHub Desktop, the ssh-agent should be running.

o If you are using another terminal prompt, such as Git for Windows, you can use the "Auto-launching the ssh-agent" instructions in "Working with SSH key passphrases", or start it manually:

o # start the ssh-agent in the background
o eval $(ssh-agent -s)
o Agent pid 59566

2. Add your SSH key to the ssh-agent. If you used an existing SSH key rather than generating a new SSH key, you’ll need to replace id_rsa in the command with the name of your existing private key file.

3. $ ssh-add ~/.ssh/id_rsa

https://help.github.com/articles/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent/

Add the SSH key to your GitHub account or the organization

For a single user

https://help.github.com/articles/adding-a-new-ssh-key-to-your-github-account/

For organization

Copy the SSH key to your clipboard (use Git Bash).

If your SSH key file has a different name than the example code, modify the filename to match your current setup. When copying your key, don’t add any newlines or whitespace.

$ clip < ~/.ssh/id_rsa.pub
# Copies the contents of the id_rsa.pub file to your clipboard

Tip: If clip isn’t working, you can locate the hidden .ssh folder, open the file in your favorite text editor, and copy it to your clipboard.

Notice that the .ssh folder is usually located at: C:\Users\{your account name}\.ssh

Go to your Organization management window and select from the left navigation: SSH and GPG keys

From this new view, press the New SSH Key button.

Then type a title and paste the key from the clipboard.

Setup a default user name and email

You can do this for a specific repository or globally. For this situation, global is preferable.

https://help.github.com/articles/setting-your-email-in-git/

https://help.github.com/articles/setting-your-username-in-git/

Set your username for every repository on your computer:

1. Navigate to your repository from a command-line prompt.

2. Set your username with the following command.

3. git config --global user.name "Billy Everyteen"

4. Confirm that you have set your username correctly with the following command.

5. git config --global user.name

Billy Everyteen

Setting your email address for every repository on your computer

1. Open Git Bash.

2. Set your email address with the following command:

3. git config --global user.email "your_email@example.com"

4. Confirm that you have set your email address correctly with the following command.

5. git config --global user.email
your_email@example.com

Create an access token if you want to clone repositories without a username and password

This is also useful if you want to create your own application to use the GitHub API for something you want to be done.

For a single user

https://help.github.com/articles/creating-an-access-token-for-command-line-use/

For the organization

Navigate to the Organization management and from the left navigation go to: Developer settings -> Personal access tokens

From this new view press the button: Generate new token

From the new view, add a title for the access token and select the privileges needed for your organization. For repository backup you will likely need at least the repo scope.

IMPORTANT: After you have created the token, remember to copy and store it, because this is the only time you will see it.

Ways to access the GitHub API with an access token

https://developer.github.com/v3/#authentication

https://developer.github.com/v3/repos/

https://developer.github.com/v3/oauth_authorizations/#create-a-new-authorization
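
As a quick sketch of using such a token against the API (YOUR_TOKEN and YOUR_ORG are placeholders), the following lists an organization's repositories:

curl -H "Authorization: token YOUR_TOKEN" https://api.github.com/orgs/YOUR_ORG/repos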

Cloning repositories

https://help.github.com/articles/duplicating-a-repository/

Sample git command:

git clone --bare https://github.com/exampleuser/old-repository.git
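
Since we set up an SSH key earlier, the same clone can also be done over SSH instead of HTTPS (the repository below is just the placeholder from the sample above):

git clone --bare git@github.com:exampleuser/old-repository.git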

Sample node.js tool

https://github.com/tegon/clone-org-repos

Usage

 cloneorg [OPTIONS] [ORG]

Options:

 -p, --perpage NUMBER number of repos per page (Default is 100)
 -t, --type STRING can be one of: all, public, private, forks, sources,
 member (Default is all)
 -e, --exclude STRING Exclude passed repos, comma separated
 -o, --only STRING Only clone passed repos, comma separated
 -r, --regexp BOOLEAN If true, exclude or only option will be evaluated as a
 regexp
 -u, --username STRING Username for basic authentication. Required to
 access github api
 --token STRING Token authentication. Required to access github api
 -a, --gitaccess Protocol to use in `git clone` command. Can be `ssh` (default), `https` or `git`
 -s, --gitsettings Additional parameters to pass to git clone command. Defaults to empty.
 --debug Show debug information
 -v, --version Display the current version
 -h, --help Display help and usage details

Examples:

clones all github/twitter repositories, with HTTP basic authentication. A password will be required

cloneorg twitter -u GITHUB_USERNAME
cloneorg twitter --username=GITHUB_USERNAME

clones all github/twitter repositories, with an access token provided by Github

cloneorg twitter --token GITHUB_TOKEN

If an environment variable GITHUB_TOKEN is set, it will be used.

export GITHUB_TOKEN='YOUR_GITHUB_API_TOKEN'

Add a -p or --perpage option to paginate the response

cloneorg mozilla --token=GITHUB_TOKEN -p 10

Exclude and Only options

If you only need some repositories, you can pass -o or --only with their names

cloneorg angular --token=GITHUB_TOKEN -o angular

This can be an array too:

cloneorg angular --token=GITHUB_TOKEN -o angular,material,bower-angular-i18n

This can also be a regular expression, with the -r or --regexp option set to true.

cloneorg marionettejs --token=GITHUB_TOKEN -o ^backbone -r true

The same rules apply to exclude options

cloneorg jquery --token=GITHUB_TOKEN -e css-framework # simple
cloneorg emberjs --token=GITHUB_TOKEN -e website,examples # array
cloneorg gruntjs --token=GITHUB_TOKEN -e $-docs -r true # regexp
cloneorg gruntjs --token=GITHUB_TOKEN -e $-docs -r true --gitaccess=git # Clone using git protocol
# Clone using git protocol and pass --recurse to `git clone` to clone submodules also
cloneorg gruntjs --token=GITHUB_TOKEN -e $-docs -r true --gitaccess=git --gitsettings="--recurse"