In this blog post I will share some ideas on coupling and how to approach it. Some of these ideas are based on a course on system design by Udi Dahan that I'm going through.
First, a starting point: in object-oriented programming there is almost always going to be coupling. If a piece of code had no coupling at all, it most likely would not be very useful.
It is the amount of coupling that is the concern. As developers we usually aim for loosely coupled code, but gradually, as a project and its code grow, what was once a nice, easy-to-understand piece of code becomes highly coupled and hard to read and maintain.
Also, things in real life aren't that simple; it isn't just about aiming for loosely coupled code but more about understanding your system and making the best decisions based on that information. You may end up accepting some tightly coupled elements to avoid other, bigger problems.
Afferent Couplings (Ca): The number of classes in other packages that depend upon classes within the package is an indicator of the package’s responsibility. Afferent = incoming.
Efferent Couplings (Ce): The number of classes in other packages that the classes in the package depend upon is an indicator of the package’s dependence on externalities. Efferent = outgoing.
So coupling is a measure of dependencies.
Afferent Coupling :
Who depends on you.
Measure of how many other packages use a specific package.
Incoming dependencies.
Example: Self-contained, used in a number of places
Library
Framework
Logging
Efferent Coupling :
Who do you depend on.
Measure of how many different packages are used by a specific package.
Outgoing dependencies.
Example: typically a presentation layer handling user interactions
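To make the two measures concrete, here is a toy C# sketch (all names are invented for illustration): the Logging class has high afferent coupling because many parts of the system call into it, while OrderController has high efferent coupling because it depends on several other abstractions.
using System;

public record Order(Guid Id);

public interface IOrderRepository { void Save(Order order); }
public interface IMailer { void SendConfirmation(Order order); }

public static class Logging // high afferent coupling: called from Orders, Billing, Shipping, ...
{
    public static void Info(string message) => Console.WriteLine(message);
}

public class OrderController // high efferent coupling: depends on the repository, the mailer and the logger
{
    private readonly IOrderRepository repository;
    private readonly IMailer mailer;

    public OrderController(IOrderRepository repository, IMailer mailer)
    {
        this.repository = repository;
        this.mailer = mailer;
    }

    public void Accept(Order order)
    {
        repository.Save(order);
        mailer.SendConfirmation(order);
        Logging.Info($"Order {order.Id} accepted");
    }
}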
Hidden coupling
Shared resources between applications, like a database. Actions performed by one application may affect the other application negatively.
Coupling aspects for systems
Platform
Coupling between two or more pieces of code/application; interoperability. Using a mechanism of communication that is limited to certain applications.
Temporal
Service A calls Service B, Service A waits for B to finish.
If the communication is synchronous; then it is highly coupled.
If the communication is asynchronous; then it is more loosely coupled.
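To make the difference concrete, here is a hedged C# sketch; IMessageBus, BillCustomer and the billing URL are invented names, not any particular library's API. The synchronous method is temporally coupled to the other service's availability and response time; the message-based one only depends on the broker accepting the message.
using System;
using System.Net.Http;
using System.Net.Http.Json;
using System.Threading.Tasks;

public record Order(string CustomerId, decimal Total);
public record BillCustomer(string CustomerId, decimal Amount);

public interface IMessageBus { Task Publish(object message); }

public class OrderService
{
    private readonly HttpClient http;
    private readonly IMessageBus bus;

    public OrderService(HttpClient http, IMessageBus bus)
    {
        this.http = http;
        this.bus = bus;
    }

    // temporal coupling: we block until the billing service answers (or times out)
    public async Task PlaceOrderSynchronously(Order order)
    {
        var response = await http.PostAsJsonAsync("https://billing.example/api/bill", order);
        response.EnsureSuccessStatusCode();
    }

    // looser coupling: publish a message and move on; billing happens when it happens
    public Task PlaceOrderAsynchronously(Order order) =>
        bus.Publish(new BillCustomer(order.CustomerId, order.Total));
}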
Spatial
How bound is your solution to other machines or services?
If your solution stops functioning when a machine or service goes down, then it is highly coupled.
If your solution continues to work when a machine or service goes down and restarts, then it is loosely coupled.
Solutions
General
In your deployment pipelines, have a check on how many incoming or outgoing dependencies a piece of code has. If the dependency count is higher than a certain threshold, fail the build (see the sketch after this list).
Use code reviews to talk about the failed build to see what to do about the dependencies.
Use tools that analyze your project and code to get insights into your project.
Minimize afferent and efferent coupling. Zero coupling isn’t really possible.
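As a crude illustration of such a build check (this is not a real build task; counting using directives as a proxy for efferent coupling, and the threshold value, are assumptions made for the example):
using System;
using System.IO;
using System.Linq;

class CouplingCheck
{
    const int MaxEfferentDependencies = 20; // assumed threshold

    static int Main(string[] args)
    {
        var offenders = Directory
            .EnumerateFiles(args.Length > 0 ? args[0] : ".", "*.cs", SearchOption.AllDirectories)
            .Select(file => new
            {
                File = file,
                Count = File.ReadLines(file).Count(l => l.TrimStart().StartsWith("using "))
            })
            .Where(x => x.Count > MaxEfferentDependencies)
            .ToList();

        foreach (var o in offenders)
            Console.Error.WriteLine($"{o.File}: {o.Count} outgoing dependencies (max {MaxEfferentDependencies})");

        return offenders.Count == 0 ? 0 : 1; // a non-zero exit code fails the build
    }
}
A real setup would use a proper static analysis tool, but the idea is the same: make the dependency count visible and let the build fail loudly.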
Specific: Platform
Text-based representation on the wire (XML/JSON), with or without schema
Notice: schemas are more about developer productivity than interoperability; they are about saving development time by not writing things yourself
Use standards-based transfer protocols like HTTP, SMTP, UDP
Consider which protocol is best suited for you based on its functionality and characteristics. Different protocols have different levels of functionality and reliability. Use the right tool for the job
Consider the benefits of existing standards to solve your problems: SOAP / WSDL / REST
Generally, be aware of existing standards and technologies. Know what they do, know what suits your needs, and choose the right tool. If one technology supports things that you need and another does not, consider using the one that is time-tested and ready to use rather than creating your own version of the same functionality
Specific: Temporal
Avoid multi-threaded solutions to the problem; they bring about other problems, such as:
Deadlock – Occurs when two competing processes are waiting for each other to finish, allowing neither to finish.
Starvation – Occurs when a process never gains access to resources, so the program never finishes.
Race Conditions – Occur when operations that must happen in a particular order happen out of order because multiple threads interleave; more specifically, this is about data races.
Livelock – Occurs when two threads depend on each other's signals and both keep responding to each other's signals. The threads can end up in a loop, something between a deadlock and starvation.
Increased Complexity – Multithreaded processes are quite complicated; writing and reasoning about them takes experienced programmers.
Complications due to Concurrency – It is difficult to handle concurrency in multithreaded processes, which may lead to complications and future problems.
Difficult to Identify Errors – Identification and correction of errors is much more difficult in multithreaded processes than in single-threaded ones.
Testing Complications – Testing is more complicated in multithreaded programs than in single-threaded programs, because defects can be timing-related and not easy to reproduce.
Unpredictable Results – Multithreaded programs can sometimes lead to unpredictable results, as they are essentially multiple parts of a program running at the same time.
Complications for Porting Existing Code – A lot of testing is required when porting existing code to multithreading. Static variables need to be removed, and any code or function calls that are not thread safe need to be replaced.
Consider a publisher/subscriber pattern
Subscriber must be able to make decisions based on somewhat stale data. In a distributed system this is inevitable.
Requires a strong division of responsibility between publishers and subscribers.
Only one logical publisher should be able to publish a given kind of event, making it the single source of truth.
Where (and why) not to do pub/sub: when business requirements demand consistency
Note: if you can't do pub/sub, you can't really do request/response either, which means you need to consider combining the two services into one
State something that happened (past tense); subscribers shouldn't be able to invalidate it. Good: "OrderAccepted"
If you have to talk about data, state its validity; ProductPriceUpdated { Price: $5, ValidTo: 1/1/15 }
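A minimal sketch of these two guidelines in C# (the names and fields are illustrative); records make the published fact immutable, so subscribers can't invalidate it:
using System;

// a statement of fact in the past tense; subscribers can't invalidate it
public record OrderAccepted(Guid OrderId, DateTime AcceptedAtUtc);

// the data carries its validity along with it
public record ProductPriceUpdated(Guid ProductId, decimal Price, DateTime ValidToUtc);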
Specific: Spatial
Spatial coupling is about first considering the logical elements and only then the physical elements like consumers, load balancing, etc.; in other words, routing
Strongly-typed messages simplify routing compared to document-centric messaging
Avoid content-based routing because it creates a large amount of logical coupling (a message with lots of logic and properties defining many options for how to process it, like a real-life document)
Prefer smaller more specific messages; split larger ones into smaller ones
They should clearly articulate what needs to be done
Application level code should not need to know where cooperating services are on the network
Delegate communication to a lower layer – the service agent pattern: myAgent.Send(message);
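Here is a sketch of what such a service agent could look like; IServiceAgent and the endpoint resolver are assumed names used for illustration, not a specific library's API. The point is that application code only ever calls Send and never learns where the service lives on the network:
using System;
using System.Threading.Tasks;

public interface IServiceAgent
{
    Task Send(object message);
}

public class QueueServiceAgent : IServiceAgent
{
    // injected mapping from message type to network endpoint; application code never sees it
    private readonly Func<Type, Uri> resolveEndpoint;

    public QueueServiceAgent(Func<Type, Uri> resolveEndpoint)
    {
        this.resolveEndpoint = resolveEndpoint;
    }

    public Task Send(object message)
    {
        Uri destination = resolveEndpoint(message.GetType());
        // ...serialize the message and hand it to the transport (queue, HTTP, etc.)...
        return Task.CompletedTask;
    }
}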
These are my notes on a course on distributed systems by Udi Dahan.
I’m listening to it and decided that the best way to learn is to take notes while I listen, and maybe share what I learn once in a while.
Let's start by defining what a system is:
Systems are not applications, they are made up of multiple executable elements on multiple machines, with multiple sources of information. A system deals with connectivity.
An application is a single executable that runs on a single machine and usually doesn't know about connectivity.
The executables within a system are not applications; they could even be scripts, but they must deal with connectivity.
Fallacies by developers and architects
The network is reliable
Latency isn’t a problem
Bandwidth isn’t a problem
The network is secure
The topology won’t change
The administrator will know what to do
Transport cost isn’t a problem
The network is homogeneous
The system is atomic/monolithic
The system is finished
Business logic can and should be centralized
The network is reliable
Hardware can fail
Software crashes and has bugs
Security may be a problem
Solutions
Networks are unreliable, so using different design and architecture solutions you can get around the problem or minimize it.
Retry and acknowledge
Synchronous situations
Store & Forward
Asynchronous situations
Transactions
Usually during integrations with multiple data sources
Notice: once you move to an MQ-based architecture, you lose the synchronous request/response style of interaction. Note that some MQ solutions do provide two-way communication, back and forth.
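A minimal retry-and-acknowledge sketch in C#, assuming the operation is idempotent so that retrying after a lost acknowledgement is safe; the attempt count and backoff delays are arbitrary:
using System;
using System.Threading.Tasks;

public static class Reliability
{
    public static async Task<T> RetryAsync<T>(Func<Task<T>> operation, int maxAttempts = 5)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return await operation(); // the response acts as the acknowledgement
            }
            catch (Exception) when (attempt < maxAttempts)
            {
                // exponential backoff before the next attempt
                await Task.Delay(TimeSpan.FromSeconds(Math.Pow(2, attempt)));
            }
        }
    }
}
// usage (hypothetical): var body = await Reliability.RetryAsync(() => http.GetStringAsync(url));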
Testing:
Test with unreliable network configurations or by simulating them (your browser may include this in the developer tools)
Caching: this is not optimal, but it may offer a solution when part of the system architecture is failing; with caching you can still provide data while the problem is being fixed
HTTP Caching headers
Redis
Application In-Memory cache
Avoid distributing objects across your system; otherwise the above solutions have to be taken into consideration
Latency isn’t a problem
Latency measures the delay between an action and a response. Over the Internet, individual delays at the application or packet level can accumulate, as each element in the chain of communication is linked to others.
Latency can occur from several sources:
Your application code
The libraries and development methods
Like ORM with lazy-loading
We code as if latency is zero, we expect that data is immediately available
Network
Disk
Operating system
Modern solutions rely more and more on the network and on distributed systems. Think of cloud-based solutions, microservices and serverless computing.
Solutions
Use DTOs to pack your data and send it less frequently over the network (see the DTO sketch at the end of this latency section)
Minimizing the effects:
Asynchronous programming
Parallel programming
WebSockets
Network I/O
Use faster networking
Eliminate network hops. In clustered queuing and storage systems, data can be horizontally scaled across many host machines, which can help you avoid extra network round-trip connections.
Keep client and server processes close together, ideally within the same datacenter and on the same physical network switch.
If your application is running in the cloud, keep all processing in one availability zone.
Disk I/O
Avoid writing to disk. Instead use write-through caches or in-memory databases or grids: CDN, Redis etc
If you do need to write to disk, combine writes where possible. The goal is to optimize algorithms to minimize the impact of disk I/O.
Use fast storage systems, such as SSDs
Operating environment / Operating System
Run your application on dedicated hardware so other applications can’t inadvertently consume system resources and impact your application’s performance.
Be wary of virtualization—even on your own dedicated hardware, hypervisors impose a layer of code between your application and the operating system.
Understand the nuances of the programming language environment used by your application, like garbage collection.
Your code
Inefficient algorithms are the most obvious sources of latency in code.
Multithreaded locks stall processing and thus introduce latency. Consider design patterns to avoid the issue.
Blocking operations cause long wait times, so use an asynchronous (nonblocking) programming model to better utilize hardware resources, such as network and disk I/O.
Generally
Don't cross the network if you don't have to; if you do have to, take as much data with you as possible
Inter-object communication shouldn’t cross the network; consider caching
Take into account geography; keep things as close as possible or use technologies like CDN
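Here is the DTO sketch referenced earlier in this section: one coarse-grained object per round-trip instead of many fine-grained calls. The type and its fields are invented for the example.
using System;
using System.Collections.Generic;

// everything one screen needs, fetched in a single network round-trip
public record CustomerSummaryDto(
    string Name,
    string Email,
    IReadOnlyList<string> RecentOrderNumbers);

// one call: CustomerSummaryDto dto = await api.GetCustomerSummary(customerId);
// instead of separate calls for the customer, the contact details and each order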
Bandwidth isn’t a problem
More bandwidth is available year by year but the amount of data grows faster.
Networks can congest and slow things down if lots of data is being transferred.
ORM eagerly fetching data
General Common Causes of Bandwidth Issues
Bandwidth issues can usually be traced to specific activities, typically those involving large amounts of data or an extended duration:
Watching videos from Internet (YouTube, Netflix)
Video calls
Cloud storage
Large file transfers
Constant stream of data
Downloading files from internet
Solutions
Design your architecture so that you have the option to split things into their own resources
Divide data sources into different networks with their own network resources, based on the needs of the data
If possible, avoid eager fetching and lazy loading; set up restrictions and limitations on how data is used
Have more than one domain model to resolve the forces of bandwidth and latency
Have different sets of objects for different sets of operations, like read, write, delete
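A minimal sketch of that last point, with invented names: a flat read model for listing screens and a separate write model carrying only what the operation needs.
using System;

// read side: flat and cheap to fetch and send over the network
public record ProductListItem(Guid Id, string Name, decimal Price);

// write side: only the data this one operation needs
public class UpdateProductPrice
{
    public Guid ProductId { get; init; }
    public decimal NewPrice { get; init; }
}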
The network is secure
Networks are not secure; many things can contribute to this:
Human beings
Software bugs
Firewall configurations
Network configurations
Access Rights
A virus or Trojan
Physical devices like USB memory sticks, DVDs, CDs
The topology won't change
Today a system's topology is far more complex than it was in the past. This is due to several factors like cloud computing, serverless computing, complex architecture solutions, etc.
Servers can go down or be moved to a different location, like a different subnet. With cloud computing, changes in network topology have become a constant thing; consider Kubernetes and Docker, for example.
Solutions
Test to see if everything is working, early in development and also in production
Don’t hard-code IP addresses and check to see that this hasn’t happened accidentally
Consider using resilient protocols (multicast)
Discovery mechanisms can work, but hard to get right
For async full-duplex communications be aware of locking threads if clients disconnect and the server is trying to reach them
Simulating failures and problems
Be aware of modern DevOps practices where topology-related details are configurable and automated; this can cause problems if the changes are not taken into consideration
Run performance tests with resources going down or being slow to come back up
Run performance tests early and often; before going to production
In very rare cases, if you failed at the above, you may as a last resort run all of your system's parts on one gigantic monolithic server to avoid the problems discussed. Notice: not recommended; do the things above early on.
The administrator will know what to do
No single person can know everything
People change jobs, positions or retire
Documentation is hard to keep up to date
Multiple admins may have trouble managing things coherently; people make changes and updates that cause problems others are not aware of
There is usually a large amount of configuration to keep track of, and with automation people get a false sense of security
Configuration problems can lead to time consuming debugging and problem solving
Upgrades = downtime; delaying upgrades causes them to pile up and break the system when one big upgrade finally happens
Solutions
Automate deployments and upgrades; making things easier and faster to upgrade and reducing upgrade errors
Be aware of testing multiple versions running in parallel in your system
Transport cost isn't a problem
Serialization before crossing the network (and de-serialization on the other side) takes time.
In the cloud, it can be a big cost factor.
We don’t usually measure the impact of serialization and de-serialization
Encryption and decryption add a compute cost factor
Hardware network infrastructure has upfront and ongoing costs
Solutions
Test the impact of serialization and de-serialization
Test to see how expensive your system is; see how much money your system will eat up under load, especially in the cloud
Avoid “chatting” over the network
You need to make trade-offs between infrastructure costs and development costs – upfront vs. ongoing
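A hedged sketch of measuring the serialization impact with System.Text.Json; the payload shape and the iteration count are arbitrary choices for the example:
using System;
using System.Diagnostics;
using System.Text.Json;

var payload = new { Id = 42, Name = "sample", Values = new int[1000] };

var sw = Stopwatch.StartNew();
for (int i = 0; i < 10_000; i++)
{
    string json = JsonSerializer.Serialize(payload);
    _ = JsonSerializer.Deserialize<JsonElement>(json);
}
sw.Stop();

Console.WriteLine($"10k serialize/deserialize round-trips took {sw.ElapsedMilliseconds} ms");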
The network is homogeneous
Today there are many more programming languages (Java, JS, C#, Ruby), database types (SQL, NoSQL, graph) and API styles (REST, Web API)
All these add complexity and interoperability concerns
Software being "open" doesn't necessarily mean it integrates well
Semantic interoperability will be hard; for example, representing a start time: Linux time vs SQL Server time
Solutions
Hard to come up with specific solutions due to the complexity of the issue. You could start by trying to minimize the number of technologies within your system and keep things homogeneous if possible.
The system is atomic/monolithic
Change in one part of the system affects other parts; maintenance is hard. This concerns the logical part, the code, more than the physical part of the system.
Tests in any form may give you false positives that everything is working as it should
Integration through the DB creates coupling
Code logic that is vastly different (different domains) becomes tightly coupled through shared database structure and data.
Changes to the database add complexity to the point where the DB schema is no longer allowed to change; then you start working around it by doing things like stuffing JSON or XML into a database column.
Based on the above, over time things are going to get even more coupled and harder to maintain.
If the system wasn’t designed to scale out to multiple machines, doing so may actually hurt performance
Solutions
Internal loose coupling
Modularization
Design for scale out in advance, or you just may end up being stuck with scale up.
The system is finished
Maintenance costs over the lifetime of a system are greater than its development costs
A project may start small, like a proof of concept or an MVP, but after the release the project ends up with constant new needs, requirements and additions that start to balloon. This becomes a problem if it is not anticipated and planned for.
Release dates may be unrealistic, change, or be unknown to developers due to business requirements; the new release dates do not realistically take into consideration past, present and future requirements.
Under these circumstances, the actual work begins after the release date, because of all the implementations that were done in a hurry:
Refactoring
Bugs
Tech debt
Re-architecture
Scalability problems
New features
It is more expensive to fix bugs after a “release” in maintenance
Adding more features to an existing code base requires more time and skills than adding them at the start of a project
Adding a new version on top of an existing released version is harder than at the start of a project
Maintaining a project requires at least the same level of skill as building one, and often more
Software development is different from, for example, constructing a building, where maintaining the finished building requires a lesser set of skills than the actual construction
Senior developers put themselves at the start of a project because:
It’s more fun to build something from scratch
They can do it
And it looks good for business
On the previous note: it is hard to design a code base or project so that others can maintain it in the future with ease and at low cost
This is hard to do especially if the people who end up maintaining the code are not as skilled or knowledgeable as the initial developers
This usually ends up later in a situation where a rewrite is suggested, but the above problem may still be present
The system is never “finished” as long as there is work done on it, in any form
Estimates are hard to give and people don’t necessarily know how to do it
Solutions
Projects are a poor model for software development. Long-lived products are better
Think of a project as a product; the mindset is different with a product. Software should be a product with long-term viability; it should live for a long time.
Design for long term viability which might require at start more effort
You need to have a team that can work together in the long term, so that the knowledge and skills are retained by the team, not by individuals
There’s no such thing as a “maintenance programmer”
Beware the rewrite that will solve everything
Give estimates taking these things into consideration:
Have a well-formed team that has worked in the past and has the suitable knowledge and skills
The team is not working on anything else
An estimate has a time range and a percentage of confidence that the work will take place within this time range: "I'm C% confident the work will take between T1 and T2." This avoids the misunderstanding where you say something might be possible and the business interprets that it is possible
The confidence percentage goes up the smaller the pieces the work is divided into
Consider a "negotiation" way of giving estimations: test and create a proof of concept, then give the estimate
If there are still problems or differences of opinion on estimates, ask the business to prioritize; now they have to think about what they really want and truly understand the scope and risks
Avoid taking literal examples from organizations or methods that do not correspond to your particular situation, like Google, Facebook or Apple and the methods they may use.
This is because you need to understand the surrounding details about the project and organization you are working with and apply things that work for your situation.
Things that work for a large project may not work at all for a medium sized or small sized project.
Learn from big things and see what parts can be applied and used wisely
Business logic can and should be centralized
As developers we strive to centralize logic to avoid errors, bugs and forgetting to fix things everywhere when the rules/requirements change
There are a lot of dependencies between entities, validations and services that end up creating coupling in the code
Reusable code invites heavy use; heavy use means many dependencies, and many dependencies = tight coupling
Solutions
One possible solution to reusable code creating tight coupling is to scope the reusable code to specific domains or layers within the code. Example:
Have specific entities, services and validations for a data access layer, and have the DAL return entities that are used only within the DAL
In the next layer, say the business logic layer, again have specific entities, services and validations; then map the DAL entities to the BLL entities based on the logic you need (see the sketch after this list)
Consider incorporating functional programming patterns or practices within your code to avoid re-usable code problems
Code traceability: add tags to implemented code specific to a business logic or feature, which you can trace back at a later date to see where the changes took place and what they were
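Here is the mapping sketch referenced in the list above; the entity shapes and the mapper are hypothetical:
// DAL entity: mirrors the database table and stays inside the data access layer
public class CustomerRecord
{
    public int Id { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

// BLL entity: shaped by business needs, independent of the table layout
public class Customer
{
    public int Id { get; set; }
    public string DisplayName { get; set; }
}

public static class CustomerMapper
{
    public static Customer ToBusinessEntity(CustomerRecord record) => new Customer
    {
        Id = record.Id,
        DisplayName = (record.FirstName + " " + record.LastName).Trim()
    };
}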
I am currently working on a project where I have to convert some VB.NET code to C#. One of the problems is that some VB features are not supported in C#, like parameterized properties.
I also wanted the resulting C# code to be usable from VB in the same way it was earlier. Since parameterized properties are not supported in C#, the first thought was to create getter and setter functions, but this is clunky in VB and breaks old code which assumes it is accessing a variable/data in a certain manner.
After some pondering and not accepting defeat by C#, I came up with a solution.
It requires the following steps:
Create a new class that represents the property
Use the this keyword with the [] operator (an indexer) to define get and set accessors (the class will function as the property itself)
To use functionality from the parent class, you pass the property class a reference to the parent class. After this, you are able to access functionality from the parent class as long as its visibility is set to public.
Here is the code:
public class PropertyName
{
    private ParentClass parentClass = null;

    public PropertyName(ParentClass parentClass)
    {
        if (parentClass == null)
            throw new ArgumentNullException(nameof(parentClass), "parent class parameter can not be null");
        this.parentClass = parentClass;
    }

    public string this[string path, string defaultValue = ""]
    {
        get
        {
            ParentClass item = this.parentClass.LocateItem(path, false);
            if (item == null)
            {
                return defaultValue;
            }
            else if (string.IsNullOrEmpty(item.ItemValue))
            {
                return defaultValue;
            }
            else
            {
                return item.ItemValue;
            }
        }
        set
        {
            ParentClass item = this.parentClass.LocateItem(path, true);
            item.ItemValue = value;
        }
    }
}
This is then how you use it:
Create a declaration of the property class as a property itself
Instantiate it during the ParentClass constructor and pass the parent as reference
public class ParentClass
{
public PropertyName Value { get; set; }
private void InitializeParametrizedProperties()
{
this.Value = new PropertyName(this);
}
public ParentClass()
{
this.InitializeParametrizedProperties();
}
}
By the way, nothing stops you from overloading the this[] definition to accept different kinds of parameters and return values. Just be careful with identical parameter definitions; the compiler won't know which one to call.
This is a PowerShell script that you can use to change a content type of a library or list to another one. The script can distinguish between Document Sets and ordinary library or list items.
param (
[string]$WebsiteUrl = "http://portal.spdev.com/",
[string]$OldCTName = "DSTestCT",
[string]$NewCTName = "DSTestCT"
)
if ( (Get-PSSnapin -Name Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue) -eq $null )
{
Add-PsSnapin Microsoft.SharePoint.PowerShell
}
function Reset-ListContentType ($WebUrl, $ListName, $OldCTName, $NewCTName)
{
$web = $null
try
{
$web = Get-SPWeb $WebUrl
$list = $web.Lists.TryGetList($ListName)
$oldCT = $list.ContentTypes[$OldCTName]
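# Note: $rootNewCT is resolved from the script scope; it is set near the bottom of the script before this function is called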
$isChildOfCT = $list.ContentTypes.BestMatch($rootNewCT.ID).IsChildOf($rootNewCT.ID);
if($oldCT -ne $null -and $isChildOfCT -eq $false)
{
$hasOldCT = $true
$isFoldersCTReseted = Reset-SPFolderContentType -web $web -list $list -OldCTName $OldCTName -NewCTName $NewCTName
Reset-SPFileContentType -web $web -list $list -OldCTName $OldCTName -NewCTName $NewCTName
Remove-ListContentType -web $web -list $list -OldCTName $OldCTName -NewCTName $NewCTName
if($hasOldCT -eq $true)
{
Add-ListContentType -web $web -list $list -OldCTName $OldCTName -NewCTName $NewCTName
if($isFoldersCTReseted -eq $true)
{
Set-SPFolderContentType -web $web -list $list -OldCTName $OldCTName -NewCTName $NewCTName
}
}
}
}
catch
{
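# Errors are swallowed here so that one failing list does not stop the remaining lists; consider logging $_ for troubleshooting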
}
finally
{
if($web)
{
$web.Dispose()
}
}
}
function Remove-ListContentType ($web, $list, $OldCTName, $NewCTName)
{
$oldCT = $list.ContentTypes[$OldCTName]
$isChildOfCT = $list.ContentTypes.BestMatch($oldCT.ID).IsChildOf($oldCT.ID);
if($isChildOfCT -eq $true)
{
$list.ContentTypes.Delete($oldCT.ID)
}
$web.Dispose()
return $isChildOfCT
}
function Add-ListContentType ($web, $list, $OldCTName, $NewCTName)
{
$list.ContentTypes.Add($rootNewCT)
$web.Dispose()
}
function Reset-SPFolderContentType ($web, $list, $OldCTName, $NewCTName)
{
#Get web, list and content type objects
$isFoldersCTReseted = $false
$isChildOfCT = $list.ContentTypes.BestMatch($rootNewCT.ID).IsChildOf($rootNewCT.ID);
$oldCT = $list.ContentTypes[$OldCTName]
$folderCT = $list.ContentTypes["Folder"]
$newCT = $rootNewCT
$newCTID = $newCT.ID
#Check if the values specified for the content types actually exist on the list
if (($oldCT -ne $null) -and ($newCT -ne $null))
{
$list.Folders | ForEach-Object {
if ($_.ContentType.ID.IsChildOf($rootNewCT.ID) -eq $false -and $_.ContentType.ID.IsChildOf($oldCT.ID) -eq $true -and $_.Folder.ProgID -eq "Sharepoint.DocumentSet")
{
Write-Host "Found a document set: " $_.Name "Processing document set"
$item = $list.GetItemById($_.ID);
$item["ContentTypeId"] = $folderCT.Id
$item.Update()
$isFoldersCTReseted = $true
}
}
}
$web.Dispose()
return $isFoldersCTReseted
}
function Set-SPFolderContentType ($web, $list, $OldCTName, $NewCTName)
{
#Get web, list and content type objects
$folderCT = $list.ContentTypes["Folder"]
$newCT = $list.ContentTypes[$NewCTName]
#Check if the values specified for the content types actually exist on the list
if (($newCT -ne $null))
{
$list.Folders | ForEach-Object {
if ($_.ContentType.ID.IsChildOf($newCT.ID) -eq $false -and $_.ContentType.ID.IsChildOf($folderCT.ID) -eq $true -and $_.Folder.ProgID -eq "Sharepoint.DocumentSet")
{
$item = $list.GetItemById($_.ID);
$item["ContentTypeId"] = $newCT.Id
$item.Update()
}
}
}
$web.Dispose()
}
function Reset-SPFileContentType ($web, $list, $OldCTName, $NewCTName)
{
#Get web, list and content type objects
$isChildOfCT = $list.ContentTypes.BestMatch($rootNewCT.ID).IsChildOf($rootNewCT.ID);
$oldCT = $list.ContentTypes[$OldCTName]
$folderCT = $list.ContentTypes["Folder"]
$newCT = $rootNewCT
$newCTID = $newCT.ID
#Check if the values specified for the content types actually exist on the list
if (($oldCT -ne $null) -and ($newCT -ne $null))
{
$list.Folders | ForEach-Object {
if ($_.ContentType.ID.IsChildOf($rootNewCT.ID) -eq $false -and $_.ContentType.ID.IsChildOf($oldCT.ID) -eq $true)
{
$_["ContentTypeId"] = $folderCT.Id
$_.Update()
}
}
#Go through each item in the list
$list.Items | ForEach-Object {
Write-Host "Item present CT ID :" $_.ContentType.ID
Write-Host "CT ID To change from :" $oldCT.ID
Write-Host "NEW CT ID to change to:" $rootNewCT.ID
#Check if the item content type currently equals the old content type specified
if ($_.ContentType.ID.IsChildOf($rootNewCT.ID) -eq $false -and $_.ContentType.ID.IsChildOf($oldCT.ID) -eq $true)
{
#Check the check out status of the file
if ($_.File.CheckOutType -eq "None")
{
#Change the content type association for the item
$item = $list.GetItemById($_.ID);
$item.File.CheckOut()
write-host "Resetting content type for file: " $_.Name "from: " $oldCT.Name "to: " $newCT.Name
$item["ContentTypeId"] = $newCTID
$item.UpdateOverwriteVersion()
Write-Host "Item changed CT ID :" $item.ContentType.ID
$item.File.CheckIn("Content type changed to " + $newCT.Name, 1)
}
else
{
write-host "File" $_.Name "is checked out to" $_.File.CheckedOutByUser.ToString() "and cannot be modified"
}
}
else
{
write-host "File" $_.Name "is associated with the content type" $_.ContentType.Name "and shall not be modified"
}
}
}
else
{
write-host "One of the content types specified has not been attached to the list"$list.Title
return
}
$web.Dispose()
}
$web = Get-SPWeb $WebsiteUrl
$rootWeb = $web.Site.RootWeb;
$rootNewCT = $rootWeb.AvailableContentTypes[$NewCTName]
Foreach ($list in $web.Lists) {
Write-Host $list.BaseType
if($list.Hidden -eq $false -and $list.BaseType -eq "DocumentLibrary")
{
Write-Host "Processing list: " $list.Title
Reset-ListContentType -WebUrl $WebsiteUrl -ListName $list.Title -OldCTName $OldCTName -NewCTName $NewCTName
}
}
$web.Dispose()
Ok, this is probably a quite simple thing to do and maybe everyone knows it BUT just in case… :).
Well, you need two things: some C# code (or some other way to output HTML) to generate an anchor link, and a JavaScript function to display or hide a given HTML element.
In this case, let's say that you have a DIV element. You assign as its ID value the client ID which your control receives from ASP.NET (notice: not the UniqueID or the ID but the ClientID, which is HTML friendly).
Then you would do something like the code below. You call a JavaScript function which receives as parameters the full ID of the DIV element and the prefix of that full ID.
The idea is to be able to identify all of the elements which are under this same hide and show logic while still being able to identify a single item.
Example ID:
id ="cars_ASP.NETUserControlClientID_carnumber"
The ASP.NETUserControlClientID part (bolded in the original) is used to identify ALL of the items which need to be processed at the same time, while the trailing value (carnumber) is the identifier for a single item.
The C# code below creates an anchor that calls the JavaScript function shown after it. Notice the href definition "javascript:void(0)"; this is to avoid a page jump in certain browsers and versions.
String.Format("<a href=\"javascript:void(0)\" onclick=\"showInfo('{0}','{2}')\">{1}</a>", "HTML Element ID to find unique counter or ID" + this.ClienID, "The link title/name", this.ClienID);
function showInfo(id, controlUniqueID) {
// Search for all items and hide them
$('*[id*=' + controlUniqueID + ']').each(function () {
$(this).hide();
});
// Next open only the item which you want to see
if ($('#' + id).css('display') == 'none') {
$('#' + id).show();
}
else {
$('#' + id).hide();
}
}
Here is a code sample showing how to copy a file from one site collection to another, how to update the file uploaded to the target destination, and how to remove it from the source location.
private static void CheckWorkspaceForDocumentsToBeArhieved(String workspaceURL, string recordCenterURL)
{
Console.WriteLine("UpdateStatusOnMetadataPageOnWorkspace");
Uri siteUrl = new Uri(workspaceURL);
Uri recordCenterUrl = new Uri(recordCenterURL);
string realm = TokenHelper.GetRealmFromTargetUrl(siteUrl);
var token = TokenHelper.GetAppOnlyAccessToken(TokenHelper.SharePointPrincipal, siteUrl.Authority, realm).AccessToken;
using (var ctxWorkspace = TokenHelper.GetClientContextWithAccessToken(siteUrl.ToString(), token))
{
string realmRC = TokenHelper.GetRealmFromTargetUrl(recordCenterUrl);
var tokenRC = TokenHelper.GetAppOnlyAccessToken(TokenHelper.SharePointPrincipal, recordCenterUrl.Authority, realmRC).AccessToken;
using (var ctxWorkspaceRC = TokenHelper.GetClientContextWithAccessToken(recordCenterUrl.ToString(), tokenRC))
{
ListCollection lists = ctxWorkspace.Web.Lists;
var sourceWeb = ctxWorkspace.Web;
ctxWorkspace.Load(sourceWeb);
ctxWorkspace.Load(lists);
ctxWorkspace.ExecuteQuery();
foreach (List list in lists.Where(o => o.BaseType == BaseType.DocumentLibrary))
{
if (list.Hidden || list.IsApplicationList || list.IsCatalog)
continue;
if (!list.FieldExistsByName("Archieve") && !list.FieldExistsByName("Secret"))
continue;
Microsoft.SharePoint.Client.CamlQuery camlQuery = new CamlQuery();
camlQuery.ViewXml = @"<View Scope='RecursiveAll'><Query><Where>
<Eq>
<FieldRef Name='Archieve' />
<Value Type='Boolean'>1</Value>
</Eq>
</Where>
<ViewFields>
<FieldRef Name='Archieve' />
<FieldRef Name='FileRef' />
<FieldRef Name='Secret' />
<FieldRef Name='LinkFilename' />
<FieldRef Name='Title' />
</ViewFields></Query></View>";
var documents = list.GetItems(camlQuery);
ctxWorkspace.Load(documents);
ctxWorkspace.ExecuteQuery();
List<int> filesToRemove = new List<int>();
foreach (ListItem document in documents)
{
var file = document.File;
ctxWorkspace.Load(document);
ctxWorkspace.Load(file);
var fileStream = file.OpenBinaryStream();
ctxWorkspace.ExecuteQuery();
var isSecret = document.FieldValues.SingleOrDefault(o => o.Key.Contains("Secret"));
bool isSecretDocument = false;
if(isSecret.Value != null)
isSecretDocument = (bool)isSecret.Value;
FileCreationInformation fci = new FileCreationInformation();
using (var streamReader = new MemoryStream())
{
fileStream.Value.CopyTo(streamReader);
// This step is done to avoid an error with the copying the data to the target location
streamReader.Seek(0, System.IO.SeekOrigin.Begin);
// This variable is used in FileCreationInformation to avoid too large files data problem. More info: https://msdn.microsoft.com/en-us/library/office/dn904536.aspx
fci.ContentStream = streamReader;
fci.Url = file.Name;
fci.Overwrite = true;
List targetLibrary = null;
if (isSecretDocument)
{
targetLibrary = ctxWorkspaceRC.Web.Lists.GetByTitle("Secret");
}
else
{
targetLibrary = ctxWorkspaceRC.Web.Lists.GetByTitle("Public");
}
ctxWorkspaceRC.Load(targetLibrary);
ctxWorkspaceRC.ExecuteQuery();
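// Note: EnsureFolder is an extension method from the OfficeDevPnP.Core library, not plain CSOM.
// Also note that the file below is added to the library's root folder, not to this ensured folder.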
var folder = targetLibrary.RootFolder.EnsureFolder("folder name");
ctxWorkspaceRC.Load(folder);
var uploadedFile = targetLibrary.RootFolder.Files.Add(fci);
ctxWorkspaceRC.Load(uploadedFile, f => f.ListItemAllFields);
ctxWorkspaceRC.ExecuteQuery();
var newDocumentItemInTargetLibrary = targetLibrary.GetItemById(uploadedFile.ListItemAllFields.Id);
ctxWorkspaceRC.Load(newDocumentItemInTargetLibrary);
ctxWorkspaceRC.ExecuteQuery();
newDocumentItemInTargetLibrary["field to update"] = sourceWeb.Title;
newDocumentItemInTargetLibrary["field to update"] = sourceWeb.Description;
newDocumentItemInTargetLibrary.Update();
if(uploadedFile.CheckOutType != CheckOutType.None)
uploadedFile.CheckIn("custom operation", CheckinType.MajorCheckIn);
ctxWorkspaceRC.ExecuteQuery();
filesToRemove.Add(document.Id);
}
}
foreach(int fileItemID in filesToRemove)
{
for (int x = documents.Count - 1; x >= 0; x-- )
{
if (documents[x].Id == fileItemID)
{
documents[x].DeleteObject();
}
}
}
ctxWorkspace.ExecuteQuery();
}
}
}
}
In this part, the final one (for the moment) on Genetic Algorithms, I will introduce some new topics, operators and ways of thinking about and improving your Genetic Algorithm.
Most of the topics discussed here I gathered from the book AI Game Programming Wisdom 2, in the section regarding Genetic Algorithms.
A word of warning, though: I am moving on to neural networks at the moment, so I only created the base code from the book's C++ source code for my Unity project, to be used later. There are some problems with the code at the moment, but the samples can work as an example for you to modify and use as best fits you.
“Can be great for retaining population diversity, and are particularly useful where fitness landscape might contain multiple peaks or where it is essential to protect new innovation within a population.” (Rabin, 2003)
Explicit fitness sharing
“Is a mechanism where individuals with similar genetic properties are grouped together. Each individual’s score is then divided by the number of genomes within its group.
new fitness = old fitness / number of neighbors
This punishes the fitness scores of individuals who have many similar neighbors, thereby preventing any one group from growing too large and taking over the population.” (Rabin, 2003)
Speciation
“Goes one step further by separating the genomes into groups in the same way as explicit fitness sharing, but this time individuals are only allowed to breed with members of their own species. Typically, a species is killed when either its size decreases to zero or its fitness has not increased within a user-defined number of generations. This means that the individuals that would normally have died out early in the evolution of a population remain active for much longer, protected among their species members. Because of the protection you can experiment with much larger mutation rates than normal.” (Rabin, 2003)
The compatibility function
“To determine if one genome should belong in the same species as another, you must use a function that compares two genes strings and returns a compatibility distance. If the distance is below a user defined threshold, then the genomes are considered to be in the same species. This compatibility function varies considerably depending on the encoding representation.
dist = Σ |xᵢ − yᵢ| / n
n = the number of genes in each chromosome
x and y = two different individuals in the population
At each iteration of the genetic algorithm, the compatibility function is used to test every individual against the first member in each species. If the distance is within a user-defined threshold, then the individual is added to the appropriate species. If the individual is incompatible with all the current species, then a new species is created and the individual is added to that.” (Rabin, 2003)
Sample code on the species class which will hold different species:
public class Species
{
//this is the genome all other genomes in the population are
//compared to see if they should be included in this species or not
Host SampleGenome;
List<Host> Members;
//the number of generations that has passed without the fittest
//member in the species showing an improvement.
int numGensNoImprovement;
float expectedNumberOfOffspring;
float bestEverFitness;
//the combined fitness scores of every member
float totalFitness;
//it's often useful to have an identity number for the species
int id;
public Species(Host firstMember, int id)
{
this.Members = new List<Host>();
this.id = id;
this.SampleGenome = firstMember;
this.expectedNumberOfOffspring = 0;
this.numGensNoImprovement = 0;
Members.Add(firstMember);
bestEverFitness = firstMember.Fitness();
totalFitness += firstMember.Fitness();
}
public void AddGenome(Host genome)
{
Members.Add(genome);
totalFitness += genome.Fitness();
if (genome.Fitness() > bestEverFitness)
{
bestEverFitness = genome.Fitness();
//fitness has improved so this can be reset
numGensNoImprovement = 0;
}
}
/// <summary>
/// selects a genome from the species using tournament selection.
/// this method uses tournament selection to spawn genomes from the
/// species.
///
/// As it is set presently it uses a high selection pressure.
/// </summary>
/// <returns></returns>
public Host SpawnGenome()
{
//first chose the number in the tournament. selection_pressure must be
//between zero and 1.0
int NumInComp = 0; float selection_pressure = 0.5F;
while (NumInComp < 1)
{
NumInComp = (int)(NumMembers() * selection_pressure);
selection_pressure += 0.1F;
}
int winner = 0;
for (int i = 0; i < NumInComp; ++i)
{
int ThisTry = RandomProvider.RND.Next(0, Members.Count() - 1);
if (Members[ThisTry].Fitness() > Members[winner].Fitness())
{
winner = ThisTry;
}
}
return Members[winner];
}
/// <summary>
/// makes sure the sample genome is always the genome with the highest fitness found so far
/// </summary>
public void UpdateSampleGenome()
{
if (Members.Count > 0)
{
if (Members.Last().Fitness() > SampleGenome.Fitness())
{
SampleGenome = Members.Last();
}
else
{
Members[0] = SampleGenome;
}
}
++numGensNoImprovement;
}
public void Clear()
{
SampleGenome = Members[0];
expectedNumberOfOffspring = 0.0F;
Members.Clear();
totalFitness = 0.0F;
}
public Host Sample() { return SampleGenome; }
public float ExpectedOffspring() { return expectedNumberOfOffspring; }
/// <summary>
/// adjusts the fitness scores so they are set to the value of their expected number of offspring
/// </summary>
/// <param name="totalFitnessForPopoulation"></param>
/// <param name="PopSize"></param>
public void SetExpectedOffspring(float totalFitnessForPopoulation, int PopSize)
{
//check that we have some fitness scores to work with
if (totalFitnessForPopoulation == 0.0) return;
expectedNumberOfOffspring = 0.0F;
for (int gen = 0; gen < Members.Count(); ++gen)
{
float ExpectedForThisIndividual =
(Members[gen].Fitness() / totalFitnessForPopoulation) * PopSize;
expectedNumberOfOffspring += ExpectedForThisIndividual;
}
}
public int GenerationsNoImprovement() {return numGensNoImprovement;}
public bool Empty() {return (Members.Count() == 0);}
public float BestEverFitness() {return bestEverFitness;}
public float TotalFitness() {return totalFitness;}
/// <summary>
/// this method applies explicit fitness sharing by dividing each member's
/// fitness by the number of members in the species
/// </summary>
public void FitnessShare()
{
float NewTotal = 0.0F;
//divide each member's fitness by the number in the species
for (int m = 0; m < Members.Count(); ++m)
{
Members[m].SetFitness(Members[m].Fitness() / NumMembers());
NewTotal += Members[m].Fitness();
}
totalFitness = NewTotal;
}
public int ID() {return id;}
int NumMembers() {return Members.Count();}
public bool IsLessThan(Species species)
{
return (this.BestEverFitness() < species.BestEverFitness());
}
}
The helper functions to be used with the Species class.
/// <summary>
/// Separate the population into species. This separates the individuals into species of similar genomes.
/// </summary>
public void Speciate(ref List<Host> population)
{
//first clear the existing members and kill off any non developing
//species
this.Species.Clear();
//now separate the population into species
for (int gen = 0; gen < population.Count; ++gen)
{
bool bAdded = false;
foreach(Species curSpecies in this.Species)
{
//calculate the compatibility score
double cs = Compatibility(population[gen], curSpecies.Sample());
//if the compatibility score is less than our tolerance then
//this genome is added to the species
if (cs < CompatibilityTolerance)
{
curSpecies.AddGenome(population[gen]);
bAdded = true;
break;
}
}//next species
if (!bAdded)
{
//not compatible with any current species so create a new species
Species.Add(new Species(population[gen], NextSpeciesID++));
}
}//next genome
//update all the species to make sure their sample member is set
//to the best genome found so far. Kill off any empty species
//foreach (Species curSpecies in this.Species)
for(int x = this.Species.Count - 1; x >= 0; --x)
{
var curSpecies = this.Species[x];
curSpecies.UpdateSampleGenome();
if ((curSpecies.Empty() ||
(curSpecies.GenerationsNoImprovement() > GenerationsAllowedWithoutImprovement)) &&
(Species.Count > 1))
{
Species.RemoveAt(x);
}
}
}
/// <summary>
/// this allocates a compatibility score between two genomes. If the
/// score is below a certain threshold then the two genomes are
/// considered to be of the same species
/// </summary>
/// <param name="g1"></param>
/// <param name="g2"></param>
/// <returns></returns>
public float Compatibility(Host g1, Host g2)
{
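// Note: returning 0 treats genomes of different lengths as maximally compatible
// (distance 0); returning a large value instead would place them in separate species.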
if (g1.DNA.Genes.Count != g2.DNA.Genes.Count) return 0;
float RunningTotal = 0.0F;
for (int gene = 0; gene < g1.DNA.Genes.Count; ++gene)
{
//RunningTotal += Mathf.Abs(g1.Genes[gene] - g2.Genes[gene]);
RunningTotal += Vector2.Distance(g1.DNA.Genes[gene], g2.DNA.Genes[gene]);
}
return RunningTotal / g1.DNA.Genes.Count;
}
/// <summary>
/// this method calculates the amount of offspring each species
/// should produce.
/// </summary>
/// <param name="AmountNeeded"></param>
public void CalculateExpectedOffspring(int AmountNeeded)
{
//first calculate the total fitness of all active genomes
float TotalFitness = 0.0F;
foreach (Species curSpecies in this.Species)
{
//apply fitness sharing first
curSpecies.FitnessShare();
TotalFitness += curSpecies.TotalFitness();
}
//now it is necessary to calculate the expected amount of offspring
//from each species
double expec = 0.0;
foreach (Species curSpecies in this.Species)
{
curSpecies.SetExpectedOffspring(TotalFitness, AmountNeeded);
expec += curSpecies.ExpectedOffspring();
}
}
/// <summary>
/// this sorts the species and assigns a color to each one.
/// The color is just cosmetic to be used as a visual aid.
/// </summary>
public void SortAndAssignVisualAid()
{
if (this.Species.Count == 0) return;
this.Species = this.Species.OrderByDescending(o => o.BestEverFitness()).ToList();
}
Co-evolution
“If you have different species competing with each other, they have to work harder at solving the problem. If the fitness of one is somehow inversely proportional to the fitness of the other, you have competition. This can greatly increase the speed and quality of the evolution process.” (Rabin, 2003)
The Michalewicz method
“Contains several mutation and crossover operators that are used in combination to fully exploit the characteristics of real value encoding GAs.
Mutation operators
Boundary mutation: With probability p, this operator changes the value of a gene to either its minimum possible value, Gmin, or its maximum possible value Gmax.
Replace mutation: With probability p, this operator resets a gene to a uniform random between Gmin and Gmax.
Non-Uniform mutation: With probability p, this operator adjusts a gene’s size by a small random amount, the limits of which decrease over time.
Crossover operators
Arithmetical crossover: This simply averages the values of the genes at each locus.
Simple crossover: this operator is the same as a single point crossover.
Heuristics crossover: Given the parents G1 and G2 this operator creates an offspring according to Equation: Child = r(G2 – G1) + G2
Variable G2 is the fitter of the two parents, and r is a random number between 0 and 1.
Alone these operators produce bad results, but combined together they have a tendency to produce fitter offspring.
Tip: Using adaptive probability distribution for the operators can be speedier and lose little in performance. This means that each operator uses the same equal probability.” (Rabin, 2003)
/// <summary>
/// displaces a gene's value by a small random amount (limited by
/// MutationDelta)
/// </summary>
/// <param name="genes"></param>
public void MutateUniform(ref List<Vector2> genes)
{
for (int i = 0; i < genes.Count; i++)
{
//flip this bit?
if ((float)RandomProvider.GetRandomNumber(RandomProvider.RND, 0, 1) < MutationRate)
{
float angle = UnityEngine.Random.Range(0, AIConstants.TWO_PI);
genes[i] = ((new Vector2(UnityEngine.Mathf.Cos(angle), UnityEngine.Mathf.Sin(angle))) * UnityEngine.Random.Range(0, (float)RandomProvider.RandomClamped() * this.MutationDelta));
}
}
}
/// <summary>
/// displaces the genes value by an amount described by a normal distribution
/// </summary>
/// <param name="genes"></param>
public void MutateGaussian(ref List<Vector2> genes)
{
for (int i = 0; i < genes.Count; i++)
{
//flip this bit?
if ((float)RandomProvider.GetRandomNumber(RandomProvider.RND, 0, 1) < MutationRate)
{
//genes[i] =
/*
//do we mutate this gene?
if (RandFloat() < m_dMutationRate)
{
gen.vecGenes[gene] += RandGaussian(0.0, 0.1);
//make sure the value stays within the desired lims
Clamp(gen.vecGenes[gene], 0.0, 1.0);
*/
}
}
}
/// <summary>
/// Sets a gene to either the min or max possible value (0 or 1 in this
/// implementation)
/// </summary>
/// <param name="genes"></param>
public void MutateBoundary(ref List<Vector2> genes)
{
for (int i = 0; i < genes.Count; i++)
{
float angle = UnityEngine.Random.Range(0, AIConstants.TWO_PI);
//flip this bit?
Vector2 tempVector = new Vector2(UnityEngine.Mathf.Cos(angle), UnityEngine.Mathf.Sin(angle)) * UnityEngine.Random.Range(0, AIConstants.maxforce);
if (tempVector.x < mutationBoundryValueMin.x && tempVector.y < mutationBoundryValueMin.y)
{
genes[i] = Vector2.zero;
}
else
{
genes[i] = this.mutationBoundryValueMax;
}
}
}
public void MutateMichalewicz(ref List<Vector2> genes)
{
float chance = (float)RandomProvider.GetRandomNumber(RandomProvider.RND, 0, 1);
if (chance <= 0.333)
{
MutateBoundary(ref genes);
}
else if (chance >= 0.667)
{
MutateUniform(ref genes);
}
else
{
MutateReplace(ref genes);
}
}
/// <summary>
/// the Michalewicz crossover operator choses one of three different
/// crossover operators based on an even probability
/// </summary>
/// <param name="mum"></param>
/// <param name="dad"></param>
/// <param name="baby1"></param>
/// <param name="baby2"></param>
public void CrossoverMichalewicz(ref List<Vector2> mum, ref List<Vector2> dad, ref List<Vector2> baby1, ref List<Vector2> baby2)
{
if (((float)RandomProvider.GetRandomNumber(RandomProvider.RND, 0, 1) > CrossoverRate) || (mum == dad) || mum.Count <= 0 || dad.Count <= 0)
{
baby1 = mum;
baby2 = dad;
return;
}
float chance = (float)RandomProvider.GetRandomNumber(RandomProvider.RND, 0, 1);
if (chance <= 0.333)
{
CrossoverTwoPoint(ref mum, ref dad, ref baby1, ref baby2);
}
else if (chance >= 0.667)
{
CrossoverAverage(ref mum, ref dad, ref baby1, ref baby2);
}
else
{
CrossoverHeuristic(ref mum, ref dad, ref baby1);
CrossoverHeuristic(ref mum, ref dad, ref baby2);
}
}
/// <summary>
/// This crossover operator simply averages the genes at each locus
/// </summary>
/// <param name="mum"></param>
/// <param name="dad"></param>
/// <param name="baby1"></param>
/// <param name="baby2"></param>
public void CrossoverAverage(ref List<Vector2> mum, ref List<Vector2> dad, ref List<Vector2> baby1, ref List<Vector2> baby2)
{
for (int gen = 0; gen < mum.Count; ++gen)
{
baby1.Add((mum[gen] + dad[gen]) * 0.5F);
baby2.Add((mum[gen] + dad[gen]) * 0.5F);
}
}
/// <summary>
/// this operator creates a child whose genes are biased towards the
/// fitter parent.
///
/// The heuristic used is r(V1-V2) + V1 where V1 is the fitter parent
/// </summary>
/// <param name="mum"></param>
/// <param name="dad"></param>
/// <param name="baby1"></param>
/// <param name="baby2"></param>
public void CrossoverHeuristic(ref List<Vector2> fittest, ref List<Vector2> not_so_fit, ref List<Vector2> baby1)
{
float rate = (float)RandomProvider.GetRandomNumber(RandomProvider.RND, 0, 1);
//iterate down the length of the genome using the heuristic
for (int gene = 0; gene < fittest.Count; ++gene)
{
Vector2 NewGeneValue = fittest[gene] + rate *
(fittest[gene] - not_so_fit[gene]);
baby1.Add(NewGeneValue);
}
}
Genetic programming
“Genetic programming is a powerful extension to evolutionary computing. It allows considerable flexibility, allowing the application programmer to develop solutions of more diverse structure than those provided by genetic algorithms. Grammatical evolution is one way to implement genetic programming while minimizing the computational expense related to the creation and subsequent elimination of nonsense organisms.” (Rabin, 2003)
Growth in Genetic Algorithms
Idea: Enhancements through flexibility
Growth
Neutral networks
Variable-length genome
Speciation
Co-evolution.
Growth
Look to nature for examples of how to give more “life” to GAs, so they don't look like something static created by mechanical code. In nature, things grow rather than being strictly parametrized. Look at growth methods from biology, chemistry, etc. (Rabin, 2003)
Environment
An example would be to look at the environment something lives in and how the two affect one another. What factors and properties of both the environment and the entity play a role in environment-based growth? When does growth stop for a living organism? Take into consideration how the growth is going to be terminated. Look again to nature. (Rabin, 2003)
Neutral network
A concept that increases evolvability. Have some junk DNA switched on inside an important host; this junk DNA is then subject to the same rules of evolution. It might not produce great results at first, but every now and then it might prove advantageous. It increases the probability of finding a better solution.
Implementation ideas:
Scanning through the genome with logic that determines interaction values, switched by a single mutation, for example.
Variable-length genomes: Have unequal crossover points. Both parents have their own crossover points so a child can have a genome much bigger or smaller than its two parents, which allows evolution to control the complexity of the solutions.
Another example would be to duplicate a gene once it is identified, or to just make random duplications during crossover.
This would mean that the system cannot be a simple parametrization system.
(Rabin, 2003)
Comparing parametrization and growth
Aspect: Parametrization vs. Growth
Encoding scheme
Parametrization: + Intuitive (explicit); +/- Environment independent; + Immediate mapping
Growth: - Incomprehensible; +/- Environment dependent; · Needs growth period
Evolvability
Parametrization: · Restricted/fixed complexity; · Designer biased; · Mutation effect fixed; · Complexity tied to DNA length; +/- Just optimization
Growth: + Potentially limitless; + Evolution free to play; + Neutral networks; + Controllable mutation; + Specialization
(Rabin, 2003)
A few helper functions
/// <summary>
/// this function calculates the variance of the population
/// </summary>
/// <param name="hosts"></param>
/// <returns></returns>
float CalculateVariance(ref List<Host> hosts)
{
float RunningTotal = 0.0F;
//first iterate through the population to sum the squared
//deviations from the average fitness (the result is the variance)
for (int gen = 0; gen < hosts.Count; ++gen)
{
RunningTotal += (hosts[gen].DNA.Fitness - this.AverageFitnessScore) *
(hosts[gen].DNA.Fitness - this.AverageFitnessScore);
}
return RunningTotal / hosts.Count;
}
/// <summary>
/// calculates the fittest and weakest genome and the average/total
/// fitness scores.
/// </summary>
/// <param name="source"></param>
public void CalculateBestWorstAverageTotalFitnessScore(ref List<Host> source)
{
this.TotalFitnessScore = 0;
this.BestFitnessScore = 0;
this.WorstFitnessScore = float.MaxValue;
foreach (Host host in source)
{
float normalizedFitnessScore = host.DNA.Fitness;
if (normalizedFitnessScore < this.WorstFitnessScore)
this.WorstFitnessScore = normalizedFitnessScore;
if (normalizedFitnessScore > this.BestFitnessScore)
this.BestFitnessScore = normalizedFitnessScore;
this.TotalFitnessScore += normalizedFitnessScore;
}
this.AverageFitnessScore = this.TotalFitnessScore / source.Count;
}
OK, in the last post I wrote some basics on Genetic Algorithms and I showed an example which solved a text “puzzle”. In this post I will go a bit more in-depth into Genetic Algorithms and show you an example of a Genetic Algorithm solving a “best” possible path to different targets while avoiding obstacles.
I will start by showing the target seeking Genetic Algorithm in action. Then I will go through the code explaining the important parts. After that I will spend some time on general GA related topics and give some explanations and sample codes.
This post is more about gathering as much detail as possible into one location. It works as notes for me, and hopefully, if someone else is looking at this post, he or she might find something useful. This is a work in progress, so I apologize for any possible errors; I'll fix them when I become aware of them and when I can.
Updated 16.9.2015: Made some additions to the descriptions of existing operators and scaling techniques.
The sample application is rather simple. It has the following tasks to perform:
The object which is the focus of our attention has to find targets
Arrange the targets by how close they are to the initial starting position
Then choose the closest target
Next, calculate the best possible path to the target. Things to take into consideration:
Avoiding obstacles
Finding the fastest route with the least amount of steps
Favoring paths whose last location is closest to a target
Favoring paths which travel the least amount of space
When the best possible path to the target has been found, use that path.
Go to the next target and repeat steps 4-6 until no more targets are left
Here are the links to the source code for those who are interested in poking around :). I warn you though that the code and the solution are raw and unpolished. It does the job while I am learning GAs at the moment:
Sample Unity project for those who like to use Unity: Download
The most important pieces of code in the GA solution are:
The main class named GA, which is attached to an object in Unity and starts the GA operations to find an optimal path: Source
The Host class (also known as the chromosome or phenotype), which is responsible for calculations on the DNA data: Source
The Genome class (also known as the genotype or DNA); this is the actual data container for an optimal path: Source
The update function
Under the hood the following things happen:
First, certain values are initialized to help quick and solid path finding using the GA
Generate a starting population and calculate initial fitness scores.
Start the cycles for solving the best path to a target
Check to see if there is a fittest path that has reached the target without hitting any obstacles.
If so, draw the path, move the object to the target and go to the next target.
If no fittest path has been found yet, start solving the best path:
Start the generation processing
Start the selection process based on fitness scores
Create offspring to populate with the parents’ data
Do a crossover
Mutate
Add the offspring to the mating pool
Repeat these steps until the max population size has been reached for the mating pool
Assign the mating pool as the new population
Calculate new fitness scores based on the new population
Perform a possible scaling operation
Draw the current generation’s best path
As you can see, the sample GA application follows steps similar to those found in general GA implementations. Later I’ll go deeper into the details of GA-specific functionalities and how to apply them.
Sample code
For now, take a look at the update function which performs these GA-related operations:
float step = 0;
// Check to see if the target has been reached
if (FittestGenome != null && this.HitTarget && !this.HitObstacle)
{
    if (pathPosition < this.FittestGenome.DNALocations.Genes.Count)
    {
        Debug.Log("target reached and no more calculations will be done. this path is to be drawn");
        // Move the object to the next position
        pathPosition++;
        if (pathPosition >= this.FittestGenome.DNALocations.Genes.Count - 1)
            return;
        step = speed * Time.deltaTime;
        // Some simple waypoint drawing on the screen for the fittest genome
        if (pathPosition > 0 && (pathPosition - 1) < this.FittestGenome.DNA.Genes.Count - 1)
        {
            this.previousPosition = this.FittestGenome.DNALocations.Genes[pathPosition - 1];
            Debug.DrawLine(previousPosition, this.FittestGenome.DNALocations.Genes[pathPosition], Color.red, 60);
            waypoints.Add(GameObject.CreatePrimitive(PrimitiveType.Cube));
            waypoints[waypoints.Count - 1].transform.position = this.FittestGenome.DNALocations.Genes[pathPosition];
            waypoints[waypoints.Count - 1].transform.localScale *= 0.05F;
        }
        StartCoroutine(MoveObject(transform, transform.position, this.FittestGenome.DNALocations.Genes[pathPosition], time, pathPosition, this.FittestGenome.DNALocations.Genes.Count - 1));
        return;
    }
    // If the object has reached its final position then do nothing else; just reset the data
    else if (pathPosition >= this.FittestGenome.DNALocations.Genes.Count)
    {
        if (this.myRigidbody.IsSleeping() && Vector2.Distance(transform.position, this.FittestGenome.DNALocations.Genes[pathPosition - 1]) < 1)
            this.ResetGAData();
        return;
    }
}
else
{
    List<Host> matingPool = new List<Host>();
    int NewBabies = 0;
    while (NewBabies < PopSize)
    {
        // Select two parents based on fitness
        Host mum = MonteCarloSelection();
        Host dad = MonteCarloSelection();
        Host baby1 = new Host(0, this.transform.position, sqrMinimumExtent, layerMask);
        Host baby2 = new Host(0, this.transform.position, sqrMinimumExtent, layerMask);
        // Recombine and mutate to create two new offspring
        this.Crossover(ref mum.DNA.Genes, ref dad.DNA.Genes, ref baby1.DNA.Genes, ref baby2.DNA.Genes);
        Mutate(ref baby1.DNA.Genes);
        Mutate(ref baby2.DNA.Genes);
        matingPool.Add(baby1);
        matingPool.Add(baby2);
        NewBabies += 2;
    }
    // The mating pool becomes the new population
    this.Hosts = matingPool;
    this.UpdateFitnessScores();
    var hosts = this.Hosts;
    this.FitnessScaleRankingToFloatRangeZeroToOne(ref hosts);
    ++this.Generation;
    this.DrawFailedPaths();
    step = speed * Time.deltaTime;
}
The fitness score calculation
For now I will only show you the calculations which define the fitness values and play the major role in determining a valid path to a target.
In this sample there are two classes which hold the actual data for the paths. One is named Genome and holds the travel data. The other is named Host; it holds the genome data and operates on it.
Sample code
public Vector2 CalcualteEndLocationOfHost(Vector2 targetLocation)
{
    Color rayColor = Color.yellow;
    float rayLenght = 0.5F;
    this.EndLocation = this.StartLocation;
    for (int x = 0; x < this.DNA.Genes.Count; ++x)
    {
        // Update a new position for the host
        acceleration += this.DNA.Genes[x];
        velocity += acceleration;
        Vector2 previousPosition = this.EndLocation;
        this.EndLocation += velocity;
        // Check to see if the new position has moved to another location
        Vector2 movementThisStep = this.EndLocation - previousPosition;
        float movementSqrMagnitude = movementThisStep.sqrMagnitude;
        if (movementSqrMagnitude > sqrMinimumExtent)
        {
            float movementMagnitude = Mathf.Sqrt(movementSqrMagnitude);
            // Check to see if the host has hit something
            RaycastHit2D hitInfo = Physics2D.Raycast(previousPosition, movementThisStep, movementMagnitude, layerMask.value);
            //Debug.DrawRay(previousPosition, movementThisStep, rayColor, rayLenght);
            var distanceToTarget = Vector2.Distance(targetLocation, this.EndLocation);
            this.DistanceTraveled += distanceToTarget;
            //if (!hitInfo.collider)
            {
                this.DNALocations.Genes.Add(this.EndLocation);
            }
            // If a hit occurred, check to see if it is the target or some other obstacle
            if (hitInfo.collider)
            {
                // This must be the target
                if (distanceToTarget < 0.2 && !this.HitObstacle)
                {
                    rayColor = Color.green;
                    //this.DNALocations.Genes.Add(this.EndLocation);
                    this.HitTarget = true;
                    break;
                }
                // We have hit an obstacle; record it and keep processing
                else
                {
                    rayColor = Color.red;
                    this.HitObstacle = true;
                    this.obstaclesHit++;
                    //break;
                }
            }
            if (!this.HitTarget)
                this.finnishTime++;
        }
        acceleration *= 0;
    }
    return this.EndLocation;
}
public float CalculateFitnessAndDistance(Vector2 targetLocation)
{
    if (this.EndLocation != this.StartLocation)
    {
        this.DistanceToTarget = Vector2.Distance(targetLocation, this.EndLocation);
        float distanceToTargetFromStartLocation = Vector2.Distance(targetLocation, this.StartLocation);
        // The target has been reached; make sure we are not dividing by zero
        if (this.DistanceToTarget == 0)
            this.DistanceToTarget = 1;
        // This means only the target was hit; make sure that we do not end up dividing by zero
        if (this.obstaclesHit == 0)
            this.obstaclesHit = 1;
        /* The fitness score is based on four things:
           1. The time it takes a path to reach its target; if time is not taken into consideration, the slowest and safest path will win.
           2. The distance from the path's end location to the target; if distance is not taken into consideration, the result would be a long path.
           3. How many obstacles are hit along the path; if obstacles are not taken into consideration, the GA will not know whether a path is good or bad, or how bad it is. The more obstacles a path hits, the more unlikely it is to get reproduced.
           4. The travel distance. The smaller the distance is, the better the fitness score will be.
           For each of these values: the lower the time, the distance or the obstacles hit, the better the fitness score. That is, the score will be closer to 1 or above it for the best possible path.
           The higher any value is, the more likely it is not the most optimal path. The worst path is the one that hits obstacles.
        */
        var calculation = this.finnishTime * this.DistanceToTarget * this.obstaclesHit * this.DistanceTraveled;
        this.DNA.Fitness = 1 / calculation;
    }
    // We have not reached the target and have hit an obstacle, so the fitness must be reduced. We want to penalize the path for any obstacles. This is not what we want in the population of possible solutions.
    if (this.HitObstacle)
    {
        this.DNA.Fitness *= 0.1F;
    }
    // Reward the path for hitting the target.
    if (this.HitTarget)
    {
        this.DNA.Fitness += 1;
    }
    return this.DNA.Fitness;
}
The Genome class itself is rather simple. All it does is store the data and initialize the genome with random vectors pointing in the direction to go next.
The Host class is where the main fitness calculation takes place, based on the genome data.
Now take a look at the function CalcualteEndLocationOfHost. This function is called before the fitness calculation function named CalculateFitnessAndDistance. In the CalcualteEndLocationOfHost function, pre-fitness-calculation operations are done, such as determining whether the object has hit obstacles and how many of them, whether it has hit the target, and calculating the path for the object.
The fitness score is then calculated in the CalculateFitnessAndDistance function. The fitness score is based on four things:
The time it takes a path to reach its target; if time is not taken into consideration, the slowest and safest path will win
The distance from the path's end location to the target; if distance is not taken into consideration, the result would be a long path
How many obstacles are hit along the path; if obstacles are not taken into consideration, the GA will not know whether a path is good or bad, or how bad it is. The more obstacles a path hits, the more unlikely it is to get reproduced
The travel distance. The smaller the distance is, the better the fitness score will be
For each of these values: the lower the time, the distance or the obstacles hit, the better the fitness score. That is, the score will be closer to 1 or above it for the best possible path.
The higher any value is, the more likely it is not the most optimal path. The worst path is the one that hits obstacles. This is, by the way, a crude fitness score calculation method. There is definitely room for improvement, but currently it does the job.
This is basically it. There are of course more details, such as the mutation, crossover and selection operators, but I will cover these later since they are common to Genetic Algorithms and not specific to this path-finding GA.
Also, as with the previous GA example solving the text puzzle, the same GA-related parameters need to be defined and tuned for optimal performance (a hypothetical settings sketch follows this list). These are:
The population size
Crossover rate (this is a new one; it simply determines whether a crossover should be performed, by comparing a random floating point value between 0 and 1 against the rate)
Mutation rate
Number of genes in a genome (in other words, how many steps in a path)
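As a minimal sketch of what these tuning knobs might look like in code (the field names and default values here are hypothetical starting points, not taken from the sample project):
// Hypothetical GA parameter block; the values are common starting
// points, not tuned results.
public class GASettings
{
    public int PopSize = 100;          // population size
    public float CrossoverRate = 0.7F; // probability that crossover happens at all
    public float MutationRate = 0.05F; // probability of mutating each gene
    public int GenomeLength = 50;      // number of genes, i.e. steps in a path
}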
Images of the path-solving application
Now it’s time to look at some cool visual images of how the path finding operates.
In this first image you can see how the very first generation has no specific knowledge of where to go yet. It goes all over the place in a circular path. The yellow lines represent path steps which do not collide with obstacles, while the red lines represent the path hitting an obstacle.
The next image shows a generation close to the very last one in the path-solving GA. Here you can see that the algorithm has converged towards the target and is just about to find the proper path based on the fitness score.
In this last image you can see the algorithm in play later, searching for the best possible path (the green lines), while the red lines represent paths traveled by accepted paths.
Genetic algorithms explained
Here I have gathered some explanations from other sources to help you grasp GAs and get a better idea of them. These also work as notes for me for later use.
What is a Genetic Algorithm
“Genetic Algorithms (GAs) are adaptive heuristic search algorithm based on the evolutionary ideas of natural selection and genetics. As such they represent an intelligent exploitation of a random search used to solve optimization problems. Although randomised, GAs are by no means random, instead they exploit historical information to direct the search into the region of better performance within the search space. The basic techniques of the GAs are designed to simulate processes in natural systems necessary for evolution, specially those follow the principles first laid down by Charles Darwin of “survival of the fittest.”. Since in nature, competition among individuals for scanty resources results in the fittest individuals dominating over the weaker ones.” http://www.doc.ic.ac.uk/~nd/surprise_96/journal/vol1/hmw/article1.html
“Concisely stated, a genetic algorithm (or GA for short) is a programming technique that mimics biological evolution as a problem-solving strategy. Given a specific problem to solve, the input to the GA is a set of potential solutions to that problem, encoded in some fashion, and a metric called a fitness function that allows each candidate to be quantitatively evaluated. These candidates may be solutions already known to work, with the aim of the GA being to improve them, but more often they are generated at random.
The GA then evaluates each candidate according to the fitness function. In a pool of randomly generated candidates, of course, most will not work at all, and these will be deleted. However, purely by chance, a few may hold promise – they may show activity, even if only weak and imperfect activity, toward solving the problem.
These promising candidates are kept and allowed to reproduce. Multiple copies are made of them, but the copies are not perfect; random changes are introduced during the copying process. These digital offspring then go on to the next generation, forming a new pool of candidate solutions, and are subjected to a second round of fitness evaluation. Those candidate solutions which were worsened, or made no better, by the changes to their code are again deleted; but again, purely by chance, the random variations introduced into the population may have improved some individuals, making them into better, more complete or more efficient solutions to the problem at hand. Again these winning individuals are selected and copied over into the next generation with random changes, and the process repeats. The expectation is that the average fitness of the population will increase each round, and so by repeating this process for hundreds or thousands of rounds, very good solutions to the problem can be discovered.
As astonishing and counterintuitive as it may seem to some, genetic algorithms have proven to be an enormously powerful and successful problem-solving strategy, dramatically demonstrating the power of evolutionary principles. Genetic algorithms have been used in a wide variety of fields to evolve solutions to problems as difficult as or more difficult than those faced by human designers. Moreover, the solutions they come up with are often more efficient, more elegant, or more complex than anything comparable a human engineer would produce. In some cases, genetic algorithms have come up with solutions that baffle the programmers who wrote the algorithms in the first place!” http://www.talkorigins.org/faqs/genalg/genalg.html
Genetic Algorithms Overview
“GAs simulate the survival of the fittest among individuals over consecutive generation for solving a problem. Each generation consists of a population of character strings that are analogous to the chromosome that we see in our DNA. Each individual represents a point in a search space and a possible solution. The individuals in the population are then made to go through a process of evolution.
GAs are based on an analogy with the genetic structure and behavior of chromosomes within a population of individuals using the following foundations:
Individuals in a population compete for resources and mates.
Those individuals most successful in each ‘competition’ will produce more offspring than those individuals that perform poorly.
Genes from `good’ individuals propagate throughout the population so that two good parents will sometimes produce offspring that are better than either parent.
[Start] Generate random population of n chromosomes (suitable solutions for the problem)
[Fitness] Evaluate the fitness f(x) of each chromosome x in the population
[New population] Create a new population by repeating following steps until the new population is complete
[Selection] Select two parent chromosomes from a population according to their fitness (the better fitness, the bigger chance to be selected)
[Crossover] With a crossover probability cross over the parents to form a new offspring (children). If no crossover was performed, offspring is an exact copy of parents.
[Mutation] With a mutation probability mutate new offspring at each locus (position in chromosome).
[Accepting] Place new offspring in a new population
[Replace] Use new generated population for a further run of algorithm
[Test] If the end condition is satisfied, stop, and return the best solution in current population
“A population of individuals are maintained within search space for a GA, each representing a possible solution to a given problem. Each individual is coded as a finite length vector of components, or variables, in terms of some alphabet, usually the binary alphabet {0,1}. To continue the genetic analogy these individuals are likened to chromosomes and the variables are analogous to genes. Thus a chromosome (solution) is composed of several genes (variables). A fitness score is assigned to each solution representing the abilities of an individual to `compete’. The individual with the optimal (or generally near optimal) fitness score is sought. The GA aims to use selective `breeding’ of the solutions to produce `offspring’ better than the parents by combining information from the chromosomes.
The GA maintains a population of n chromosomes (solutions) with associated fitness values. Parents are selected to mate, on the basis of their fitness, producing offspring via a reproductive plan. Consequently highly fit solutions are given more opportunities to reproduce, so that offspring inherit characteristics from each parent. As parents mate and produce offspring, room must be made for the new arrivals since the population is kept at a static size. Individuals in the population die and are replaced by the new solutions, eventually creating a new generation once all mating opportunities in the old population have been exhausted. In this way it is hoped that over successive generations better solutions will thrive while the least fit solutions die out.
New generations of solutions are produced containing, on average, better genes than a typical solution in a previous generation. Each successive generation will contain more good `partial solutions’ than previous generations. Eventually, once the population has converged and is not producing offspring noticeably different from those in previous generations, the algorithm itself is said to have converged to a set of solutions to the problem at hand.” http://www.doc.ic.ac.uk/~nd/surprise_96/journal/vol1/hmw/article1.html
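Before moving on, here is a minimal, hypothetical C# skeleton of the [Start]..[Test] outline above. The helper method names (GenerateRandomPopulation, EvaluateFitness, EndConditionSatisfied, SelectByFitness, Crossover, Mutate) are placeholders of mine; the Host type mirrors the class used in this post:
// Hypothetical GA skeleton following the [Start]..[Test] outline above.
List<Host> population = GenerateRandomPopulation(popSize);    // [Start]
EvaluateFitness(population);                                  // [Fitness]
while (!EndConditionSatisfied(population))                    // [Test]
{
    var newPopulation = new List<Host>();
    while (newPopulation.Count < popSize)                     // [New population]
    {
        Host mum = SelectByFitness(population);               // [Selection]
        Host dad = SelectByFitness(population);
        Host baby1, baby2;
        Crossover(mum, dad, out baby1, out baby2);            // [Crossover]
        Mutate(baby1);                                        // [Mutation]
        Mutate(baby2);
        newPopulation.Add(baby1);                             // [Accepting]
        newPopulation.Add(baby2);
    }
    population = newPopulation;                               // [Replace]
    EvaluateFitness(population);                              // [Fitness]
}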
Implementation Details
“
Based on Natural Selection
After an initial population is randomly generated, the algorithm evolves through three operators:
Selection which equates to survival of the fittest;
Crossover which represents mating between individuals;
Mutation which introduces random modifications.
Selection Operator
key idea: give preference to better individuals, allowing them to pass on their genes to the next generation.
The goodness of each individual depends on its fitness.
Fitness may be determined by an objective function or by a subjective judgement.
Crossover Operator
The prime distinguishing factor of GAs from other optimization techniques
Two individuals are chosen from the population using the selection operator
A crossover site along the bit strings is randomly chosen
The values of the two strings are exchanged up to this point
The two new offspring created from this mating are put into the next generation of the population
By recombining portions of good individuals, this process is likely to create even better individuals
Mutation Operator
With some low probability, a portion of the new individuals will have some of their bits flipped.
Its purpose is to maintain diversity within the population and inhibit premature convergence.
Mutation alone induces a random walk through the search space
Mutation and selection (without crossover) create a parallel, noise-tolerant, hill-climbing algorithm
Effects of Genetic Operators
Using selection alone will tend to fill the population with copies of the best individual from the population
Using selection and crossover operators will tend to cause the algorithm to converge on a good but sub-optimal solution
Using mutation alone induces a random walk through the search space.
I think this quote is rather good at explaining uses for Genetic Algorithms:
“Genetic algorithms have been used for difficult problems (such as NP-hard problems), for machine learning and also for evolving simple programs. They have also been used for some art, for evolving pictures and music.
An advantage of GAs is in their parallelism. A GA travels the search space with more individuals (and with genotype rather than phenotype), so they are less likely to get stuck in a local extreme like some other methods.
They are also easy to implement. Once you have some GA, you just have to write a new chromosome (just one object) to solve another problem. With the same encoding you just change the fitness function and that is all. On the other hand, choosing the encoding and the fitness function can be difficult.
A disadvantage of GAs is in their computational time. They can be slower than some other methods. But with today’s computers it is not such a big problem.
To get an idea about problems solved by GA, here is a short list of some applications:
Nonlinear dynamical systems – predicting, data analysis
Designing neural networks, both architecture and weights
“The first and most important point is that genetic algorithms are intrinsically parallel. Most other algorithms are serial and can only explore the solution space to a problem in one direction at a time, and if the solution they discover turns out to be suboptimal, there is nothing to do but abandon all work previously completed and start over. However, since GAs have multiple offspring, they can explore the solution space in multiple directions at once. If one path turns out to be a dead end, they can easily eliminate it and continue work on more promising avenues, giving them a greater chance each run of finding the optimal solution. However, the advantage of parallelism goes beyond this. Consider the following: All the 8-digit binary strings (strings of 0’s and 1’s) form a search space, which can be represented as ******** (where the * stands for “either 0 or 1”). The string 01101010 is one member of this space. However, it is also a member of the space 0*******, the space 01******, the space 0******0, the space 0*1*1*1*, the space 01*01**0, and so on. By evaluating the fitness of this one particular string, a genetic algorithm would be sampling each of these many spaces to which it belongs. Over many such evaluations, it would build up an increasingly accurate value for the average fitness of each of these spaces, each of which has many members. Therefore, a GA that explicitly evaluates a small number of individuals is implicitly evaluating a much larger group of individuals – just as a pollster who asks questions of a certain member of an ethnic, religious or social group hopes to learn something about the opinions of all members of that group, and therefore can reliably predict national opinion while sampling only a small percentage of the population. In the same way, the GA can “home in” on the space with the highest-fitness individuals and find the overall best one from that group. In the context of evolutionary algorithms, this is known as the Schema Theorem, and is the “central advantage” of a GA over other problem-solving methods (Holland 1992, p. 68; Mitchell 1996, p.28-29; Goldberg 1989, p.20).
Due to the parallelism that allows them to implicitly evaluate many schema at once, genetic algorithms are particularly well-suited to solving problems where the space of all potential solutions is truly huge – too vast to search exhaustively in any reasonable amount of time. Most problems that fall into this category are known as “nonlinear”. In a linear problem, the fitness of each component is independent, so any improvement to any one part will result in an improvement of the system as a whole. Needless to say, few real-world problems are like this. Nonlinearity is the norm, where changing one component may have ripple effects on the entire system, and where multiple changes that individually are detrimental may lead to much greater improvements in fitness when combined. Nonlinearity results in a combinatorial explosion: the space of 1,000-digit binary strings can be exhaustively searched by evaluating only 2,000 possibilities if the problem is linear, whereas if it is nonlinear, an exhaustive search requires evaluating 2^1000 possibilities – a number that would take over 300 digits to write out in full. Fortunately, the implicit parallelism of a GA allows it to surmount even this enormous number of possibilities, successfully finding optimal or very good results in a short period of time after directly sampling only small regions of the vast fitness landscape (Forrest 1993, p. 877). For example, a genetic algorithm developed jointly by engineers from General Electric and Rensselaer Polytechnic Institute produced a high-performance jet engine turbine design that was three times better than a human-designed configuration and 50% better than a configuration designed by an expert system by successfully navigating a solution space containing more than 10^387 possibilities. Conventional methods for designing such turbines are a central part of engineering projects that can take up to five years and cost over $2 billion; the genetic algorithm discovered this solution after two days on a typical engineering desktop workstation (Holland 1992, p.72).
Another notable strength of genetic algorithms is that they perform well in problems for which the fitness landscape is complex – ones where the fitness function is discontinuous, noisy, changes over time, or has many local optima. Most practical problems have a vast solution space, impossible to search exhaustively; the challenge then becomes how to avoid the local optima – solutions that are better than all the others that are similar to them, but that are not as good as different ones elsewhere in the solution space. Many search algorithms can become trapped by local optima: if they reach the top of a hill on the fitness landscape, they will discover that no better solutions exist nearby and conclude that they have reached the best one, even though higher peaks exist elsewhere on the map. Evolutionary algorithms, on the other hand, have proven to be effective at escaping local optima and discovering the global optimum in even a very rugged and complex fitness landscape. (It should be noted that, in reality, there is usually no way to tell whether a given solution to a problem is the one global optimum or just a very high local optimum. However, even if a GA does not always deliver a provably perfect solution to a problem, it can almost always deliver at least a very good solution.) All four of a GA’s major components – parallelism, selection, mutation, and crossover – work together to accomplish this. In the beginning, the GA generates a diverse initial population, casting a “net” over the fitness landscape. (Koza (2003, p. 506) compares this to an army of parachutists dropping onto the landscape of a problem’s search space, with each one being given orders to find the highest peak.) Small mutations enable each individual to explore its immediate neighborhood, while selection focuses progress, guiding the algorithm’s offspring uphill to more promising parts of the solution space (Holland 1992, p. 68). However, crossover is the key element that distinguishes genetic algorithms from other methods such as hill-climbers and simulated annealing. Without crossover, each individual solution is on its own, exploring the search space in its immediate vicinity without reference to what other individuals may have discovered. However, with crossover in place, there is a transfer of information between successful candidates – individuals can benefit from what others have learned, and schemata can be mixed and combined, with the potential to produce an offspring that has the strengths of both its parents and the weaknesses of neither. This point is illustrated in Koza et al. 1999, p.486, where the authors discuss a problem of synthesizing a lowpass filter using genetic programming. In one generation, two parent circuits were selected to undergo crossover; one parent had good topology (components such as inductors and capacitors in the right places) but bad sizing (values of inductance and capacitance for its components that were far too low). The other parent had bad topology, but good sizing. The result of mating the two through crossover was an offspring with the good topology of one parent and the good sizing of the other, resulting in a substantial improvement in fitness over both its parents. The problem of finding the global optimum in a space with many local optima is also known as the dilemma of exploration vs. exploitation, “a classic problem for all systems that can adapt and learn” (Holland 1992, p. 69).
Once an algorithm (or a human designer) has found a problem-solving strategy that seems to work satisfactorily, should it concentrate on making the best use of that strategy, or should it search for others? Abandoning a proven strategy to look for new ones is almost guaranteed to involve losses and degradation of performance, at least in the short term. But if one sticks with a particular strategy to the exclusion of all others, one runs the risk of not discovering better strategies that exist but have not yet been found. Again, genetic algorithms have shown themselves to be very good at striking this balance and discovering good solutions with a reasonable amount of time and computational effort.
Another area in which genetic algorithms excel is their ability to manipulate many parameters simultaneously (Forrest 1993, p. 874). Many real-world problems cannot be stated in terms of a single value to be minimized or maximized, but must be expressed in terms of multiple objectives, usually with tradeoffs involved: one can only be improved at the expense of another. GAs are very good at solving such problems: in particular, their use of parallelism enables them to produce multiple equally good solutions to the same problem, possibly with one candidate solution optimizing one parameter and another candidate optimizing a different one (Haupt and Haupt 1998, p.17), and a human overseer can then select one of these candidates to use. If a particular solution to a multiobjective problem optimizes one parameter to a degree such that that parameter cannot be further improved without causing a corresponding decrease in the quality of some other parameter, that solution is called Pareto optimal or non-dominated (Coello 2000, p. 112).
Finally, one of the qualities of genetic algorithms which might at first appear to be a liability turns out to be one of their strengths: namely, GAs know nothing about the problems they are deployed to solve. Instead of using previously known domain-specific information to guide each step and making changes with a specific eye towards improvement, as human designers do, they are “blind watchmakers” (Dawkins 1996); they make random changes to their candidate solutions and then use the fitness function to determine whether those changes produce an improvement. The virtue of this technique is that it allows genetic algorithms to start out with an open mind, so to speak. Since its decisions are based on randomness, all possible search pathways are theoretically open to a GA; by contrast, any problem-solving strategy that relies on prior knowledge must inevitably begin by ruling out many pathways a priori, therefore missing any novel solutions that may exist there (Koza et al. 1999, p. 547). Lacking preconceptions based on established beliefs of “how things should be done” or what “couldn’t possibly work”, GAs do not have this problem. Similarly, any technique that relies on prior knowledge will break down when such knowledge is not available, but again, GAs are not adversely affected by ignorance (Goldberg 1989, p. 23). Through their components of parallelism, crossover and mutation, they can range widely over the fitness landscape, exploring regions which intelligently produced algorithms might have overlooked, and potentially uncovering solutions of startling and unexpected creativity that might never have occurred to human designers. One vivid illustration of this is the rediscovery, by genetic programming, of the concept of negative feedback – a principle crucial to many important electronic components today, but one that, when it was first discovered, was denied a patent for nine years because the concept was so contrary to established beliefs (Koza et al. 2003, p. 413). Evolutionary algorithms, of course, are neither aware nor concerned whether a solution runs counter to established beliefs – only whether it works.
“The first, and most important, consideration in creating a genetic algorithm is defining a representation for the problem. The language used to specify candidate solutions must be robust; i.e., it must be able to tolerate random changes such that fatal errors or nonsense do not consistently result. There are two main ways of achieving this. The first, which is used by most genetic algorithms, is to define individuals as lists of numbers – binary-valued, integer-valued, or real-valued – where each number represents some aspect of a candidate solution. If the individuals are binary strings, 0 or 1 could stand for the absence or presence of a given feature. If they are lists of numbers, these numbers could represent many different things: the weights of the links in a neural network, the order of the cities visited in a given tour, the spatial placement of electronic components, the values fed into a controller, the torsion angles of peptide bonds in a protein, and so on. Mutation then entails changing these numbers, flipping bits or adding or subtracting random values. In this case, the actual program code does not change; the code is what manages the simulation and keeps track of the individuals, evaluating their fitness and perhaps ensuring that only values realistic and possible for the given problem result. In another method, genetic programming, the actual program code does change. As discussed in the section Methods of representation, GP represents individuals as executable trees of code that can be mutated by changing or swapping subtrees. Both of these methods produce representations that are robust against mutation and can represent many different kinds of problems, and as discussed in the section Some specific examples, both have had considerable success. This issue of representing candidate solutions in a robust way does not arise in nature, because the method of representation used by evolution, namely the genetic code, is inherently robust: with only a very few exceptions, such as a string of stop codons, there is no such thing as a sequence of DNA bases that cannot be translated into a protein. Therefore, virtually any change to an individual’s genes will still produce an intelligible result, and so mutations in evolution have a higher chance of producing an improvement. This is in contrast to human-created languages such as English, where the number of meaningful words is small compared to the total number of ways one can combine letters of the alphabet, and therefore random changes to an English sentence are likely to produce nonsense.
The problem of how to write the fitness function must be carefully considered so that higher fitness is attainable and actually does equate to a better solution for the given problem. If the fitness function is chosen poorly or defined imprecisely, the genetic algorithm may be unable to find a solution to the problem, or may end up solving the wrong problem. (This latter situation is sometimes described as the tendency of a GA to “cheat”, although in reality all that is happening is that the GA is doing what it was told to do, not what its creators intended it to do.) An example of this can be found in Graham-Rowe 2002, in which researchers used an evolutionary algorithm in conjunction with a reprogrammable hardware array, setting up the fitness function to reward the evolving circuit for outputting an oscillating signal. At the end of the experiment, an oscillating signal was indeed being produced – but instead of the circuit itself acting as an oscillator, as the researchers had intended, they discovered that it had become a radio receiver that was picking up and relaying an oscillating signal from a nearby piece of electronic equipment! This is not a problem in nature, however. In the laboratory of biological evolution there is only one fitness function, which is the same for all living things – the drive to survive and reproduce, no matter what adaptations make this possible. Those organisms which reproduce more abundantly compared to their competitors are more fit; those which fail to reproduce are unfit.
In addition to making a good choice of fitness function, the other parameters of a GA – the size of the population, the rate of mutation and crossover, the type and strength of selection – must be also chosen with care. If the population size is too small, the genetic algorithm may not explore enough of the solution space to consistently find good solutions. If the rate of genetic change is too high or the selection scheme is chosen poorly, beneficial schema may be disrupted and the population may enter error catastrophe, changing too fast for selection to ever bring about convergence. Living things do face similar difficulties, and evolution has dealt with them. It is true that if a population size falls too low, mutation rates are too high, or the selection pressure is too strong (such a situation might be caused by drastic environmental change), then the species may go extinct. The solution has been “the evolution of evolvability” – adaptations that alter a species’ ability to adapt. For example, most living things have evolved elaborate molecular machinery that checks for and corrects errors during the process of DNA replication, keeping their mutation rate down to acceptably low levels; conversely, in times of severe environmental stress, some bacterial species enter a state of hypermutation where the rate of DNA replication errors rises sharply, increasing the chance that a compensating mutation will be discovered. Of course, not all catastrophes can be evaded, but the enormous diversity and highly complex adaptations of living things today show that, in general, evolution is a successful strategy. Likewise, the diverse applications of and impressive results produced by genetic algorithms show them to be a powerful and worthwhile field of study.
One type of problem that genetic algorithms have difficulty dealing with are problems with “deceptive” fitness functions (Mitchell 1996, p.125), those where the locations of improved points give misleading information about where the global optimum is likely to be found. For example, imagine a problem where the search space consisted of all eight-character binary strings, and the fitness of an individual was directly proportional to the number of 1s in it – i.e., 00000001 would be less fit than 00000011, which would be less fit than 00000111, and so on – with two exceptions: the string 11111111 turned out to have very low fitness, and the string 00000000 turned out to have very high fitness. In such a problem, a GA (as well as most other algorithms) would be no more likely to find the global optimum than random search. The resolution to this problem is the same for both genetic algorithms and biological evolution: evolution is not a process that has to find the single global optimum every time. It can do almost as well by reaching the top of a high local optimum, and for most situations, this will suffice, even if the global optimum cannot easily be reached from that point. Evolution is very much a “satisficer” – an algorithm that delivers a “good enough” solution, though not necessarily the best possible solution, given a reasonable amount of time and effort invested in the search. The Evidence for Jury-Rigged Design in Nature FAQ gives examples of this very outcome appearing in nature. (It is also worth noting that few, if any, real-world problems are as fully deceptive as the somewhat contrived example given above. Usually, the location of local improvements gives at least some information about the location of the global optimum.)
One well-known problem that can occur with a GA is known as premature convergence. If an individual that is more fit than most of its competitors emerges early on in the course of the run, it may reproduce so abundantly that it drives down the population’s diversity too soon, leading the algorithm to converge on the local optimum that that individual represents rather than searching the fitness landscape thoroughly enough to find the global optimum (Forrest 1993, p. 876; Mitchell 1996, p. 167). This is an especially common problem in small populations, where even chance variations in reproduction rate may cause one genotype to become dominant over others. The most common methods implemented by GA researchers to deal with this problem all involve controlling the strength of selection, so as not to give excessively fit individuals too great of an advantage. Rank, scaling and tournament selection, discussed earlier, are three major means for accomplishing this; some methods of scaling selection include sigma scaling, in which reproduction is based on a statistical comparison to the population’s average fitness, and Boltzmann selection, in which the strength of selection increases over the course of a run in a manner similar to the “temperature” variable in simulated annealing (Mitchell 1996, p. 168). Premature convergence does occur in nature (where it is called genetic drift by biologists). This should not be surprising; as discussed above, evolution as a problem-solving strategy is under no obligation to find the single best solution, merely one that is good enough. However, premature convergence in nature is less common since most beneficial mutations in living things produce only small, incremental fitness improvements; mutations that produce such a large fitness gain as to give their possessors dramatic reproductive advantage are rare.
Finally, several researchers (Holland 1992, p.72; Forrest 1993, p.875; Haupt and Haupt 1998, p.18) advise against using genetic algorithms on analytically solvable problems. It is not that genetic algorithms cannot find good solutions to such problems; it is merely that traditional analytic methods take much less time and computational effort than GAs and, unlike GAs, are usually mathematically guaranteed to deliver the one exact solution. Of course, since there is no such thing as a mathematically perfect solution to any problem of biological adaptation, this issue does not arise in nature.
Genetic algorithm operators, selectors, ranking and encoding
Here you will find brief explanations of some of the main operators, selection schemes, ranking and encoding methods in Genetic Algorithms. You will also find code samples where I have them to provide. These code samples are in C#.
Selection
Here is a good quote from the book (Buckland, 2002):
“Selection is how you choose individuals from the population to provide a gene base from which the next generation of individuals is created. This might mean individuals are selected and placed into the new generation without modification ala elitism, as we discussed in the last chapter, but usually it means the chosen genomes are selected to be parents of offspring which are created through the processes of mutation and recombination. How you go about choosing the parents can play a very important role in how efficient your genetic algorithm is. Unlike choosing a soccer team, if you choose the fittest individuals all the time, the population may converge too rapidly at a local minima and get stuck there. But, if you select individuals at random, then your genetic algorithm will probably take a while to converge (if it ever does at all). So, the art of selection is choosing a strategy which gives you the best of both worlds—something that converges fairly quickly yet enables the population to retain its diversity.”
Steady State Selection
“Steady state selection works a little like elitism, except that instead of choosing a small amount of the best individuals to go through to the new generation, steady state selection retains all but a few of the worst performers from the current population. The remainder are then selected using mutation and crossover in the usual way. Steady state selection can prove useful when tackling some problems, but most of the time it’s inadvisable to use it.” (Buckland, 2002)
Sample code
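A minimal sketch of mine, assuming the same Host and DNA.Fitness types used elsewhere in this post and using System.Linq;. It keeps everything except the worst performers; the removed slots are then refilled with offspring created through crossover and mutation in the usual way.
// Steady state selection sketch: retain all but the numWorst worst performers.
public List<Host> SteadyStateSelection(List<Host> population, int numWorst)
{
    return population
        .OrderByDescending(h => h.DNA.Fitness)
        .Take(population.Count - numWorst)
        .ToList();
}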
Fitness Proportionate Selection
“Selection techniques of this type choose offspring using methods which give individuals a better chance of being selected the better their fitness score. Another way of describing it is that each individual has an expected number of times it will be chosen to reproduce. This expected value equates to the individual’s fitness divided by the average fitness of the entire population. So, if you have an individual with a fitness of 6 and the average fitness of the overall population is 4, then the expected number of times the individual should be chosen is 1.5.” (Buckland, 2002)
Sample code
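A minimal sketch of the expected value described in the quote, assuming the Host type used elsewhere in this post and using System.Linq;. Roulette wheel selection, shown below, is one concrete way of realizing these expected counts.
// Expected number of times an individual should be chosen to reproduce:
// its fitness divided by the average fitness of the population.
// For fitness 6 and average fitness 4 this returns 1.5.
public float ExpectedTimesChosen(Host individual, List<Host> population)
{
    float averageFitness = population.Average(h => h.DNA.Fitness);
    return individual.DNA.Fitness / averageFitness;
}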
Elitism
“Elitism is a way of guaranteeing that the fittest members of a population are retained for the next generation. … select n copies of the top m individuals of the population to be retained. I often find that retaining about 2-5% of the population size gives me good results. … you will discover that using elitism works well with just about every other technique described in this chapter (except stochastic universal sampling).” (Buckland, 2002)
“Elitism helps the population to converge on a solution faster than if it is not present. It is easy to code. A typical figure for N best to be added to the population is around 1-10% of the population size. It can be as high as 20%. Too much elitism can cause the population to converge too quickly.” AI Game Programming Wisdom 2
Sample code
public void GrabNBest(int NBest, int numCopies, ref List<Host> source, ref List<Host> target)
{
    // Take the NBest fittest hosts and add numCopies of each to the target population
    var hostToGrab = source.OrderByDescending(o => o.DNA.Fitness).Take(NBest);
    foreach (Host host in hostToGrab)
        for (int x = 0; x < numCopies; ++x)
            target.Add(host);
}
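As a usage sketch (the numbers here are hypothetical): copy a few of the top performers into the new population before the rest is filled through selection and crossover.
// Hypothetical usage: copy the top 2 hosts, 2 copies each, into the new
// population before the rest is filled by selection and crossover.
List<Host> newPopulation = new List<Host>();
var oldPopulation = this.Hosts;
GrabNBest(2, 2, ref oldPopulation, ref newPopulation);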
Roulette Wheel Selection
“A common way of implementing fitness proportionate selection is roulette wheel selection, as I have already discussed. This technique does have its drawbacks, however. Because roulette wheel selection is based on using random numbers and because the population sizes of genetic algorithms are typically small (sizes between 50 and 200 are common), the number of children allocated to each individual can be far from its expected value. Even worse, it’s probable that roulette wheel selection could miss the best individuals altogether! This is one of the reasons elitism is a good idea when utilizing roulette wheel selection—it ensures you never lose the best individuals to chance.” (Buckland, 2002)
Sample code
public Host RouletteWheelSelection()
{
    // Pick a random slice of the total fitness "pie"
    double fSlice = (float)RandomProvider.GetRandomNumber(RandomProvider.RND, 0, 1) * TotalFitnessScore;
    double cfTotal = 0;
    int SelectedGenome = 0;
    for (int i = 0; i < PopSize; ++i)
    {
        // Accumulate fitness until the slice is reached
        cfTotal += this.Hosts[i].DNA.Fitness;
        if (cfTotal > fSlice)
        {
            SelectedGenome = i;
            break;
        }
    }
    return this.Hosts[SelectedGenome];
}
Stochastic Universal Sampling
“Stochastic Universal Sampling (SUS for short) is an attempt to minimize the problems of using fitness proportionate selection on small populations. Basically, instead of having one wheel which is spun several times to obtain the new population, SUS uses n evenly spaced hands, which are only spun once.
If you use SUS in your own genetic algorithms, it is inadvisable to use elitism with it because this tends to mess up the algorithm.” (Buckland, 2002)
“Stochastic universal sampling (SUS) is a technique used in genetic algorithms for selecting potentially useful solutions for recombination. It was introduced by James Baker.
SUS is a development of fitness proportionate selection (FPS) which exhibits no bias and minimal spread. Where FPS chooses several solutions from the population by repeated random sampling, SUS uses a single random value to sample all of the solutions by choosing them at evenly spaced intervals. This gives weaker members of the population (according to their fitness) a chance to be chosen and thus reduces the unfair nature of fitness-proportional selection methods.
Other methods like roulette wheel can have bad performance when a member of the population has a really large fitness in comparison with other members. Using a comb-like ruler, SUS starts from a small random number, and chooses the next candidates from the rest of population remaining, not allowing the fittest members to saturate the candidate space.
Described as an algorithm, pseudocode for SUS looks like:
SUS(Population, N)
    F := total fitness of Population
    N := number of offspring to keep
    P := distance between the pointers (F/N)
    Start := random number between 0 and P
    Pointers := [Start + i*P | i in [0..(N-1)]]
    return RWS(Population, Pointers)

RWS(Population, Points)
    Keep := []
    i := 0
    for P in Points
        while fitness sum of Population[0..i] < P
            i++
        add Population[i] to Keep
    return Keep
Where Population[0..i] is the set of individuals with array-index 0 to (and including) i.
Here RWS() describes the bulk of fitness proportionate selection (also known as “roulette wheel selection”) – in true fitness proportional selection the parameter Points is always a (sorted) list of random numbers from 0 to F. The algorithm above is intended to be illustrative rather than canonical.
” (Wikipedia, 2015)
Sample code
public void SUSSelection(ref List<Host> source, ref List<Host> target)
{
    // This algorithm relies on all the fitness scores being positive, so
    // these few lines check and adjust accordingly (in this example
    // sigma scaling can give negative fitness scores)
    if (WorstFitnessScore < 0)
    {
        // Recalculate
        for (int gen = 0; gen < source.Count; ++gen)
        {
            source[gen].DNA.Fitness += Mathf.Abs(WorstFitnessScore);
        }
        CalculateBestWorstAverageTotalFitnessScore(ref source);
    }
    int curGen = 0;
    double sum = 0;
    // NumToAdd is the amount of individuals we need to select using SUS.
    // Remember, some may have already been selected through elitism
    int NumToAdd = this.PopSize - target.Count;
    // Calculate the hand spacing
    float PointerGap = this.TotalFitnessScore / (float)NumToAdd;
    // Note: Random.Range must be called with float arguments here;
    // with int arguments it would always return 0
    float ptr = UnityEngine.Random.Range(0F, 1F) * PointerGap;
    // Fill the target population up to the full population size
    while (target.Count < this.PopSize)
    {
        for (sum += source[curGen].DNA.Fitness; sum > ptr; ptr += PointerGap)
        {
            target.Add(source[curGen]);
            if (target.Count == this.PopSize)
            {
                return;
            }
        }
        ++curGen;
    }
}
Tournament Selection
“This technique is very efficient to implement because it doesn’t require any of the preprocessing or fitness scaling sometimes required for roulette wheel selection and other fitness proportionate techniques (discussed later in the chapter). Because of this, and because it’s a darn good technique anyway, you should always try this method of selection with your own genetic algorithms. The only drawback I’ve found is that tournament selection can lead to too quick convergence with some types of problems.” (Buckland, 2002)
“Tournament selection is a good alternative to fitness proportionate selection with or without scaling. This technique is very fast due to lack of complex calculations.
Choose a number of individuals at random and then pick the fittest among them. This is repeated until the next generation of individuals has been generated. The higher the number of selected individuals, the higher the selection pressure; the lower it is, the more diverse the population will be. A typical selection number is between 2-10%.” AI Game Programming Wisdom 2
Sample code
public Host TournamentSelection(int N)
{
    double BestFitnessSoFar = 0;
    int ChosenOne = 0;
    // Select N members from the population at random, testing against
    // the best found so far
    for (int i = 0; i < N; ++i)
    {
        int ThisTry = RandomProvider.RND.Next(0, this.PopSize - 2);
        if (this.Hosts[ThisTry].DNA.Fitness > BestFitnessSoFar)
        {
            ChosenOne = ThisTry;
            BestFitnessSoFar = this.Hosts[ThisTry].DNA.Fitness;
        }
    }
    // Return the champion
    return this.Hosts[ChosenOne];
}
Crossover
Crossover involves creating a child out of the genetic code of two parents.
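The crossover operators shown below are permutation-style operators (PMX and order-based). For contrast, here is a minimal sketch of the classic single-point crossover, assuming the same Vector2 gene lists and RandomProvider helper used elsewhere in this post (the method name is mine, and the parents are assumed non-empty).
// Classic single-point crossover sketch: pick one cut point and swap tails.
// The parents are copied so their gene lists are never modified.
public void CrossoverSinglePoint(List<Vector2> mum, List<Vector2> dad,
                                 out List<Vector2> baby1, out List<Vector2> baby2)
{
    int cut = RandomProvider.RND.Next(1, Mathf.Min(mum.Count, dad.Count));
    baby1 = new List<Vector2>(mum.GetRange(0, cut));
    baby1.AddRange(dad.GetRange(cut, dad.Count - cut));
    baby2 = new List<Vector2>(dad.GetRange(0, cut));
    baby2.AddRange(mum.GetRange(cut, mum.Count - cut));
}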
Partially-Mapped Crossover
“PMX crossover is a genetic algorithm operator. For some problems it offers better performance than most other crossover techniques. Basically, parent 1 donates a swath of genetic material and the corresponding swath from the other parent is sprinkled about in the child. Once that is done, the remaining alleles are copied directly from parent 2.
Randomly select a swath of alleles from parent 1 and copy them directly to the child. Note the indexes of the segment.
Looking in the same segment positions in parent 2, select each value that hasn’t already been copied to the child.
For each of these values:
Note the index of this value in Parent 2. Locate the value, V, from parent 1 in this same position.
Locate this same value in parent 2.
If the index of this value in Parent 2 is part of the original swath, go back to the first step using this value.
If the position isn’t part of the original swath, insert the value from the first step into the child in this position.
public void CrossoverPMX(ref List<Vector2> mum, ref List<Vector2> dad, ref List<Vector2> baby1, ref List<Vector2> baby2)
{
    // Just return, dependent on the crossover rate or if the
    // chromosomes are the same.
    if (((float)RandomProvider.GetRandomNumber(RandomProvider.RND, 0, 1) > CrossoverRate) || (mum == dad) || mum.Count <= 0 || dad.Count <= 0)
    {
        baby1 = mum;
        baby2 = dad;
        return;
    }
    // Copy the parents' genes so that swapping in the babies does not
    // modify the parents themselves (List<T> is a reference type)
    baby1 = new List<Vector2>(mum);
    baby2 = new List<Vector2>(dad);
    int maxItemsCount = mum.Count;
    if (mum.Count > dad.Count)
        maxItemsCount = dad.Count;
    // First we choose a section of the chromosome
    int begin = RandomProvider.RND.Next(0, maxItemsCount - 2);
    int end = begin;
    // Find an end
    while (end <= begin)
    {
        end = RandomProvider.RND.Next(0, maxItemsCount - 1);
    }
    // Now we iterate through the matched pairs of genes from begin
    // to end, swapping the places in each child
    for (int pos = begin; pos < end + 1; ++pos)
    {
        // These are the genes we want to swap
        Vector2 gene1 = mum[pos];
        Vector2 gene2 = dad[pos];
        if (gene1 != gene2)
        {
            // Find and swap them in baby1
            int posGene1 = baby1.IndexOf(gene1);
            int posGene2 = baby1.IndexOf(gene2);
            if (posGene1 >= 0)
                baby1[posGene1] = gene2;
            if (posGene2 >= 0)
                baby1[posGene2] = gene1;
            // And in baby2
            posGene1 = baby2.IndexOf(gene1);
            posGene2 = baby2.IndexOf(gene2);
            if (posGene1 >= 0)
                baby2[posGene1] = gene2;
            if (posGene2 >= 0)
                baby2[posGene2] = gene1;
        }
    } // next pair
}
Order-Based Crossover
“To perform order-based crossover, several elements are chosen at random from one parent and then the order of those elements is imposed on the respective elements in the other parent.” (Buckland, 2002) (With slight text alteration from the book example to a more general description.)
“Order 1 Crossover
Order 1 Crossover is a fairly simple permutation crossover. Basically, a swath of consecutive alleles from parent 1 drops down, and remaining values are placed in the child in the order which they appear in parent 2.
Step 1: Select a random swath of consecutive alleles from parent 1.
Step 2: Drop the swath down to Child 1 and mark out these alleles in Parent 2.
Step 3: Starting on the right side of the swath, grab alleles from parent 2 in the order they appear and insert them into the free positions of Child 1, beginning at the right edge of the swath. Alleles that are marked out (already present in the child) are skipped.
Step 4: If you desire a second child from the two parents, flip Parent 1 and Parent 2 and go back to Step 1.
Order 1 Performance: Order 1 crossover is perhaps the fastest of all crossover operators because it requires virtually no overhead operations. On a generation by generation basis, edge recombination typically outperforms Order 1, but the fact that Order 1 runs between 100 and 1000 times faster usually allows the processing of more generations in a given time period.
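Before the order-based sample below, here is a minimal sketch of the Order 1 steps quoted above. This is hypothetical illustration code, not from the post’s codebase: it assumes integer-encoded permutation chromosomes and that a System.Random instance is passed in, and Order1Crossover is a made-up name.
public static List<int> Order1Crossover(List<int> parent1, List<int> parent2, System.Random rnd)
{
    int n = parent1.Count;
    //the swath covers the indexes [begin, end]
    int begin = rnd.Next(0, n - 1);
    int end = rnd.Next(begin + 1, n);
    int[] child = new int[n];
    HashSet<int> taken = new HashSet<int>();
    //Steps 1-2: drop the swath straight down from parent 1
    for (int i = begin; i <= end; ++i)
    {
        child[i] = parent1[i];
        taken.Add(parent1[i]);
    }
    //Step 3: starting on the right side of the swath, take parent 2's
    //alleles in order, skipping any already present in the child; the
    //insert pointer wraps around and visits exactly the free positions
    int insert = (end + 1) % n;
    for (int k = 0; k < n; ++k)
    {
        int allele = parent2[(end + 1 + k) % n];
        if (taken.Contains(allele)) continue;
        child[insert] = allele;
        insert = (insert + 1) % n;
    }
    return new List<int>(child);
}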
public void CrossoverOBX(ref List<Vector2> mum, ref List<Vector2> dad, ref List<Vector2> baby1, ref List<Vector2> baby2)
{
    //just return dependent on the crossover rate or if the
    //chromosomes are the same.
    if (((float)RandomProvider.GetRandomNumber(RandomProvider.RND, 0, 1) > CrossoverRate) || (mum == dad) || mum.Count <= 0 || dad.Count <= 0)
    {
        baby1 = mum;
        baby2 = dad;
        return;
    }
    //copy the parents so that editing the babies doesn't write
    //through to mum and dad (List<T> is a reference type)
    baby1 = new List<Vector2>(mum);
    baby2 = new List<Vector2>(dad);
    List<Vector2> tempGenes = new List<Vector2>();
    List<int> positions = new List<int>();
    //first chosen city position
    int Pos = RandomProvider.RND.Next(0, mum.Count - 2);
    //keep adding random cities until we can add no more
    //record the positions as we go
    while (Pos < mum.Count)
    {
        positions.Add(Pos);
        tempGenes.Add(mum[Pos]);
        // next gene
        Pos += RandomProvider.RND.Next(1, mum.Count - Pos);
    }
    //so now we have n amount of cities from mum in the tempGenes
    //list we can impose their order in dad.
    int cPos = 0;
    for (int cit = 0; cit < baby2.Count; ++cit)
    {
        for (int i = 0; i < tempGenes.Count; ++i)
        {
            if (baby2[cit] == tempGenes[i])
            {
                baby2[cit] = tempGenes[cPos];
                ++cPos;
                break;
            }
        }
    }
    //now vice versa. Choose the same positioned cities from dad and impose
    //their order in mum
    tempGenes.Clear();
    cPos = 0;
    //first grab the cities from the same positions in dad
    for (int i = 0; i < positions.Count; ++i)
    {
        tempGenes.Add(dad[positions[i]]);
    }
    //and impose their order in mum
    for (int cit = 0; cit < baby1.Count; ++cit)
    {
        for (int i = 0; i < tempGenes.Count; ++i)
        {
            if (baby1[cit] == tempGenes[i])
            {
                baby1[cit] = tempGenes[cPos];
                ++cPos;
                break;
            }
        }
    }
}
Position-Based Crossover
”This is similar to Order-Based Crossover, but instead of imposing the order of the elements, this operator imposes the position.” (Buckland, 2002)
Basically this means that the chosen elements keep their exact positions when passed from a parent down to the offspring, and the remaining positions are filled in from the other parent.
Sample code
public void CrossoverPBX(ref List<Vector2> mum, ref List<Vector2> dad, ref List<Vector2> baby1, ref List<Vector2> baby2)
{
    //just return dependent on the crossover rate or if the
    //chromosomes are the same.
    if (((float)RandomProvider.GetRandomNumber(RandomProvider.RND, 0, 1) > CrossoverRate) || (mum == dad) || mum.Count <= 0 || dad.Count <= 0)
    {
        baby1 = mum;
        baby2 = dad;
        return;
    }
    //initialize the babies with zero vectors so we can tell which
    //positions have been filled later in the algorithm
    for (int x = 0; x < mum.Count; ++x)
    {
        baby1.Add(Vector2.zero);
        baby2.Add(Vector2.zero);
    }
    //holds the positions of the chosen elements
    List<int> positions = new List<int>();
    //first chosen city position
    int Pos = RandomProvider.RND.Next(0, mum.Count - 2);
    //keep adding random cities until we can add no more
    //record the positions as we go
    while (Pos < mum.Count)
    {
        positions.Add(Pos);
        // next gene
        Pos += RandomProvider.RND.Next(1, mum.Count - Pos);
    }
    //now we have chosen some cities it's time to copy the selected cities
    //over into the offspring in the same position.
    for (int pos = 0; pos < positions.Count; ++pos)
    {
        //baby1 receives from mum
        baby1[positions[pos]] = mum[positions[pos]];
        //baby2 receives from dad
        baby2[positions[pos]] = dad[positions[pos]];
    }
    //fill in the blanks. First create two position markers so we know
    //whereabouts we are in baby1 and baby2
    int c1 = 0, c2 = 0;
    for (int pos = 0; pos < mum.Count; ++pos)
    {
        //advance position marker until we reach a free position
        //in baby2
        while ((baby2[c2] != Vector2.zero) && (c2 < mum.Count - 1))
        {
            ++c2;
        }
        //baby2 gets the next city from mum which is not already present
        Vector2 tempElement = mum[pos];
        if (!baby2.Exists(o => o == tempElement))
        {
            baby2[c2] = mum[pos];
        }
        //now do the same for baby1
        while ((baby1[c1] != Vector2.zero) && (c1 < mum.Count - 1))
        {
            ++c1;
        }
        //baby1 gets the next city from dad which is not already present
        tempElement = dad[pos];
        if (!baby1.Exists(o => o == tempElement))
        {
            baby1[c1] = dad[pos];
        }
    }
}
Single-Point Crossover
“It simply cuts the genome at some random point and then switches the ends between parents. It is very easy and quick to implement and is generally effective to some degree with most types of problems.” (Buckland, 2002)
Sample code
public void Crossover(ref List<Vector2> mum, ref List<Vector2> dad, ref List<Vector2> baby1, ref List<Vector2> baby2)
{
    if (((float)RandomProvider.GetRandomNumber(RandomProvider.RND, 0, 1) > CrossoverRate) || (mum == dad) || mum.Count <= 0 || dad.Count <= 0)
    {
        baby1 = mum;
        baby2 = dad;
        return;
    }
    //choose the cut point
    int cp = RandomProvider.RND.Next(0, this.GeneLength - 1);
    //copy the genes up to the cut point straight down,
    //then switch the ends between the parents
    for (int i = 0; i < cp; i++)
    {
        baby1.Add(mum[i]);
        baby2.Add(dad[i]);
    }
    for (int i = cp; i < mum.Count; i++)
    {
        baby1.Add(dad[i]);
        baby2.Add(mum[i]);
    }
}
Two-Point Crossover
“Instead of cutting the genome at just one point, two-point crossover (you guessed it) cuts the genome at two random points and then swaps the block of genes between those two points. … Two-point crossover is sometimes beneficial because it can create combinations of genes that single-point crossover simply cannot provide. With single point, the end genes are always swapped over and this may not be favorable for the problem at hand. Two-point crossover eliminates this problem. ” (Buckland, 2002)
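The sample further below implements the multi-point variant described next; a minimal two-point sketch following the conventions of the other samples in this post could look like the following. CrossoverTwoPoint is a hypothetical name, and like the single-point sample it assumes baby1 and baby2 are passed in empty.
public void CrossoverTwoPoint(ref List<Vector2> mum, ref List<Vector2> dad, ref List<Vector2> baby1, ref List<Vector2> baby2)
{
    if (((float)RandomProvider.GetRandomNumber(RandomProvider.RND, 0, 1) > CrossoverRate) || (mum == dad) || mum.Count <= 0 || dad.Count <= 0)
    {
        baby1 = mum;
        baby2 = dad;
        return;
    }
    //choose two cut points so that cp1 < cp2
    int cp1 = RandomProvider.RND.Next(0, mum.Count - 1);
    int cp2 = RandomProvider.RND.Next(cp1 + 1, mum.Count);
    //outside the cut points the genes are copied straight down,
    //between them the block is swapped over
    for (int i = 0; i < mum.Count; i++)
    {
        bool inside = (i >= cp1) && (i < cp2);
        baby1.Add(inside ? dad[i] : mum[i]);
        baby2.Add(inside ? mum[i] : dad[i]);
    }
}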
“There’s no need to limit the amount of crossover points you can have. Indeed, for some types of encoding, your genetic algorithm may perform better if you use multiple crossover points. The easiest way of achieving this is to move down the length of the parents, and for each position in the chromosome, randomly swap the genes based on your crossover rate. For some types of problems, multi-point crossover works very well, but on others it can jumble up the genes too much and act more like an over enthusiastic mutation operator. Common values for the crossover rate using this type of crossover operator are between 0.5 and 0.8.” (Buckland, 2002)
Sample code
public void CrossoverMultiPoint(ref List<Vector2> mum, ref List<Vector2> dad, ref List<Vector2> baby1, ref List<Vector2> baby2)
{
    if ((mum == dad) || mum.Count <= 0 || dad.Count <= 0)
    {
        baby1 = mum;
        baby2 = dad;
        return;
    }
    //iterate down the length of the genomes swapping genes
    //depending on the crossover rate
    for (int gen = 0; gen < mum.Count; ++gen)
    {
        if (((float)RandomProvider.GetRandomNumber(RandomProvider.RND, 0, 1) < CrossoverRate))
        {
            //swap the genes
            baby2.Add(mum[gen]);
            baby1.Add(dad[gen]);
        }
        else
        {
            //don't swap the genes
            baby1.Add(mum[gen]);
            baby2.Add(dad[gen]);
        }
    }
}
Alternative crossover methods
Alternating-Position Crossover
Maximal-Preservation Crossover
Edge-Recombination Crossover
Subtour-Chunks Crossover
Intersection Crossover
Mutation
Scramble Mutation (SM)
Choose two random points and “scramble” the elements located between them.
Sample code
public void MutateSM(ref List<Vector2> genes)
{
    //return dependent upon mutation rate
    if ((float)RandomProvider.GetRandomNumber(RandomProvider.RND, 0, 1) > MutationRate) return;
    //first we choose a section of the chromosome
    const int MinSpanSize = 3;
    //these will hold the beginning and end points of the span
    int beg = 0, end = 0;
    ChooseSection(ref beg, ref end, genes.Count - 1, MinSpanSize);
    int span = end - beg;
    //now we just swap randomly chosen genes within the beg/end
    //range a few times to scramble them
    int NumberOfSwapsRqd = span;
    while (--NumberOfSwapsRqd >= 0)
    {
        //choose two loci within the range
        int pos1 = beg + RandomProvider.RND.Next(0, span);
        int pos2 = beg + RandomProvider.RND.Next(0, span);
        //exchange them
        if (pos1 != pos2)
        {
            Vector2 temp = genes[pos1];
            genes[pos1] = genes[pos2];
            genes[pos2] = temp;
        }
    }//repeat
}
Exchange Mutation Operator
The Exchange Mutation operator chooses two genes in a chromosome and swaps them.
Sample code
public void MutateEM(ref List<Vector2> genes)
{
    //return dependent upon mutation rate
    if ((float)RandomProvider.GetRandomNumber(RandomProvider.RND, 0, 1) > MutationRate) return;
    //choose first gene
    int pos1 = RandomProvider.RND.Next(0, genes.Count - 1);
    //choose second
    int pos2 = pos1;
    while (pos1 == pos2)
    {
        pos2 = RandomProvider.RND.Next(0, genes.Count - 1);
    }
    //exchange them
    Vector2 temp = genes[pos1];
    genes[pos1] = genes[pos2];
    genes[pos2] = temp;
}
Displacement Mutation
Select two random points, grab the chunk of chromosome between them, and then reinsert at a random position displaced from the original.
Sample code
public void MutateDM(ref List<Vector2> genes)
{
    //return dependent upon mutation rate
    if ((float)RandomProvider.GetRandomNumber(RandomProvider.RND, 0, 1) > MutationRate) return;
    //first we choose a section of the chromosome
    const int MinSpanSize = 3;
    //these will hold the beginning and end points of the span
    int beg = 0, end = 0;
    ChooseSection(ref beg, ref end, genes.Count - 1, MinSpanSize);
    // Calculate how many items to get and later remove
    int span = end - beg;
    //hold on to the section we are moving
    List<Vector2> aSectionOfGenes = genes.GetRange(beg, span);
    //erase from current position
    genes.RemoveRange(beg, span);
    //reinsert at a randomly chosen position
    int newInsertPosition = RandomProvider.RND.Next(0, genes.Count - 1);
    genes.InsertRange(newInsertPosition, aSectionOfGenes);
}
“Normal mutation” (don’t know the proper name for this one 🙂 — it is essentially what is often called uniform mutation or random resetting)
Goes through every gene in a chromosome and, based on the mutation rate, generates new gene data at that position.
Sample code
public void Mutate(ref List<Vector2> genes)
{
    for (int i = 0; i < genes.Count; i++)
    {
        //mutate this gene?
        if ((float)RandomProvider.GetRandomNumber(RandomProvider.RND, 0, 1) < MutationRate)
        {
            //replace the gene with a new random steering force
            float angle = UnityEngine.Random.Range(0, AIConstants.TWO_PI);
            genes[i] = ((new Vector2(UnityEngine.Mathf.Cos(angle), UnityEngine.Mathf.Sin(angle))) * UnityEngine.Random.Range(0, AIConstants.maxforce));
        }
    }
}
Insertion Mutation
This is a very effective mutation and is almost the same as the DM operator, except here only one gene is selected to be displaced and inserted back into the chromosome. In tests, this mutation operator has been shown to be consistently better than any of the alternatives mentioned here.
Sample code
public void MutateIM(ref List<Vector2> genes)
{
    //return dependent upon mutation rate
    if ((float)RandomProvider.GetRandomNumber(RandomProvider.RND, 0, 1) > MutationRate) return;
    //choose a gene and erase it from its current position
    int position = RandomProvider.RND.Next(0, genes.Count - 1);
    Vector2 gene = genes[position];
    genes.RemoveAt(position);
    //insert it back at a new randomly chosen position
    int newInsertPosition = RandomProvider.RND.Next(0, genes.Count - 1);
    genes.Insert(newInsertPosition, gene);
}
Inversion Mutation
Select two random points and reverse the elements between them.
Sample code
public void MutateIVM(ref List<Vector2> genes)
{
    //return dependent upon mutation rate
    if ((float)RandomProvider.GetRandomNumber(RandomProvider.RND, 0, 1) > MutationRate) return;
    //first we choose a section of the chromosome
    const int MinSpanSize = 3;
    //these will hold the beginning and end points of the span
    int beg = 0, end = 0;
    ChooseSection(ref beg, ref end, genes.Count - 1, MinSpanSize);
    // Calculate how many items to reverse
    int span = end - beg;
    genes.Reverse(beg, span);
}
Displaced Inversion Mutation
Select two random points, reverse the element order between the two points, and then displace them somewhere along the length of the original chromosome. This is similar to performing IVM and then DM using the same start and end points.
Sample code
public void MutateDIVM(ref List<Vector2> genes)
{
    //return dependent upon mutation rate
    if ((float)RandomProvider.GetRandomNumber(RandomProvider.RND, 0, 1) > MutationRate) return;
    //first we choose a section of the chromosome
    const int MinSpanSize = 3;
    //these will hold the beginning and end points of the span
    int beg = 0, end = 0;
    ChooseSection(ref beg, ref end, genes.Count - 1, MinSpanSize);
    // Calculate how many items to reverse
    int span = end - beg;
    genes.Reverse(beg, span);
    //hold on to the section we are moving
    List<Vector2> aSectionOfGenes = genes.GetRange(beg, span);
    //erase from current position
    genes.RemoveRange(beg, span);
    //reinsert the reversed section at a randomly chosen position
    int newInsertPosition = RandomProvider.RND.Next(0, genes.Count - 1);
    genes.InsertRange(newInsertPosition, aSectionOfGenes);
}
Encoding
Encoding is simply the way a problem’s candidate solutions are represented so that the computer can work on them. The samples in this post, for example, encode a chromosome as a list of Vector2 steering forces; other problems might use binary strings or permutations of city indices.
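As a purely illustrative sketch (the variable names here are made up for the example), different encodings of a chromosome might look like this:
//binary encoding, e.g. include/exclude flags for a knapsack problem
List<bool> knapsackGenome = new List<bool> { true, false, true, true };
//permutation encoding, e.g. the visiting order of cities in a route
List<int> routeGenome = new List<int> { 3, 0, 2, 1 };
//real-valued encoding, e.g. the Vector2 steering forces used by the samples in this post
List<Vector2> forceGenome = new List<Vector2> { new Vector2(0.5f, -1.0f), new Vector2(-0.25f, 0.75f) };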
Scaling Techniques
“Although using selection on the raw (unprocessed) fitness scores can give you a genetic algorithm that works (it solves the task you’ve designed it for), often your genetic algorithm can be made to perform better if the fitness scores are scaled in some way before any selection takes place. “ (Buckland, 2002)
Rank Scaling
“Rank scaling can be a great way to prevent too quick convergence, particularly at the start of a run when it’s common to see a very small percentage of individuals outperforming all the rest. The individuals in the population are simply ranked according to fitness, and then a new fitness score is assigned based on their rank. … Once the new ranked fitness scores have been applied, you select individuals for the next generation using roulette wheel selection or a similar fitness proportionate selection method. … This technique avoids the possibility that a large percentage of each new generation is being produced from a very small number of highly fit individuals, which can quickly lead to premature convergence. In effect, rank scaling ensures your population remains diverse. The other side of the coin is that the population may take a lot longer to converge, but often you will find that the greater diversity provided by this technique leads to a more successful result for your genetic algorithm. “ (Buckland, 2002)
“A cheap and easy method of retaining population diversity, while slowing down convergence. The drawback due to low variance is that it might take a long time to converge upon a solution. Using it with elitism is a good approach.” AI Game Programming Wisdom 2
Sample code
public void FitnessScaleRanking(ref List<Host> pop)
{
    // Arrange the population according to the highest fitness score currently
    var population = pop.OrderByDescending(o => o.DNA.Fitness);
    // The highest ranking value will be the count of hosts in the population,
    // while the least fit member will end up with a fitness score of one.
    int populationSize = pop.Count;
    foreach (Host host in population)
    {
        // Apply a new fitness score based on the ranking value, which is determined by the population size.
        host.DNA.Fitness = populationSize;
        // Go to the next ranking value for the next host
        populationSize--;
    }
}
Another example of ranking with the ranking value converted into a float value ranging from 0 to 1.
public void FitnessScaleRankingToFloatRangeZeroToOne(ref List<Host> pop)
{
    // Arrange the population according to the highest fitness score currently
    var population = pop.OrderByDescending(o => o.DNA.Fitness);
    // The highest ranking value will be the count of hosts in the population,
    // while the least fit member will end up with a fitness score of zero.
    int populationSize = pop.Count;
    foreach (Host host in population)
    {
        // Apply a new fitness score based on the ranking value, which is determined by the population size.
        host.DNA.Fitness = Mathf.Abs((1 / (float)populationSize) - 1);
        // Go to the next ranking value for the next host
        populationSize--;
    }
}
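Scaled scores like these are meant to feed a fitness proportionate selection step such as roulette wheel selection. Here is a minimal sketch in the style of the tournament selection sample; it is hypothetical in that this.TotalFitnessScore is assumed to hold the population’s summed fitness (e.g. as recalculated by CalculateBestWorstAverageTotalFitnessScore) and fitness scores are assumed to be non-negative.
public Host RouletteWheelSelection()
{
    //pick a random point on the wheel
    double slice = RandomProvider.GetRandomNumber(RandomProvider.RND, 0, 1) * this.TotalFitnessScore;
    double total = 0;
    for (int i = 0; i < this.PopSize; ++i)
    {
        total += this.Hosts[i].DNA.Fitness;
        //the host whose slice of the wheel the ball lands in is the chosen one
        if (total >= slice)
        {
            return this.Hosts[i];
        }
    }
    //floating point rounding can leave us here; return the last host
    return this.Hosts[this.PopSize - 1];
}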
Sigma Scaling
“If you use raw fitness scores as a basis for selection, the population may converge too quickly, and if they are scaled as in rank selection, the population may converge too slowly. Sigma scaling is an attempt to keep the selection pressure constant over many generations. At the beginning of the genetic algorithm, when fitness scores can vary wildly, the fitter individuals will be allocated less expected offspring. Toward the end of the algorithm, when the fitness scores are becoming similar, the fitter individuals will be allocated more expected offspring.” (Buckland, 2002)
“Keeping selection pressure constant over many generations.”
AI Game Programming Wisdom 2
Sample code
public void FitnessScaleSigma(ref List<Host> pop)
{
    float RunningTotal = 0;
    //first iterate through the population to calculate the standard
    //deviation
    for (int gen = 0; gen < pop.Count; ++gen)
    {
        RunningTotal += (pop[gen].DNA.Fitness - this.AverageFitnessScore) *
        (pop[gen].DNA.Fitness - this.AverageFitnessScore);
    }
    float variance = RunningTotal / (float)this.PopSize;
    //standard deviation is the square root of the variance
    Sigma = Mathf.Sqrt(variance);
    //guard against division by zero when every member has the same fitness
    if (Sigma == 0) return;
    //now iterate through the population to reassign the fitness scores
    for (int gen = 0; gen < pop.Count; ++gen)
    {
        float OldFitness = pop[gen].DNA.Fitness;
        pop[gen].DNA.Fitness = (OldFitness - this.AverageFitnessScore) / (2 * Sigma);
    }
    //recalculate values used in selection
    this.CalculateBestWorstAverageTotalFitnessScore(ref pop);
}
Boltzmann Scaling
”… sometimes you may want the selection pressure to vary. A common scenario is one in which you require the selection pressure to be low at the beginning so that diversity is retained, but as the genetic algorithm converges closer toward a solution, you want mainly the fitter individuals to produce offspring.
One way of achieving this is by using Boltzmann scaling. This method of scaling uses a continuously varying temperature to control the rate of selection. …
Each generation, the temperature is decreased by a small value, which has the effect of increasing the selection pressure toward the fitter individuals.” (Buckland, 2002)
“Sometimes, it’s preferable for selection pressure to be low at the beginning and high toward the end. This ensures that your population remains diverse at the commencement of the algorithm. As the algorithm converges toward a solution, the fitter individuals are given preference.” AI Game Programming Wisdom 2
Sample code
public void FitnessScaleBoltzmann(ref List<Host> pop)
{
    //reduce the temp a little each generation
    this.BoltzmannTemperature -= BOLTZMANN_DT;
    //make sure it doesn't fall below minimum value
    if (this.BoltzmannTemperature < BOLTZMANN_MIN_TEMP)
    {
        this.BoltzmannTemperature = BOLTZMANN_MIN_TEMP;
    }
    //first calculate the average fitness/Temp
    float divider = this.AverageFitnessScore / this.BoltzmannTemperature;
    //now iterate through the population and calculate the new expected
    //values
    for (int gen = 0; gen < pop.Count; ++gen)
    {
        float OldFitness = pop[gen].DNA.Fitness;
        pop[gen].DNA.Fitness = (OldFitness / this.BoltzmannTemperature) / divider;
    }
    //recalculate values used in selection
    this.CalculateBestWorstAverageTotalFitnessScore(ref pop);
}
Optimization tips
Data handling
To minimize the cost of accessing data in memory, and of memory management in general, I would recommend that you get to know these two concepts:
I am currently working on a client side application (console app) that needs to connect to an O365 site and do some work with CSOM.
In this post I will cover the basic things you need to do in order to connect to an O365 site. In my case I am working with a console app, but this could also be a PowerShell script.
First I recommend that you get familiar with the TokenHelper.cs class, which is used for the communication and authentication with O365; the code below relies on it.
Basic C# code to connect to O365 and get a list and its items would look like this:
Uri siteUri = new Uri(ConfigurationManager.AppSettings["SiteCollectionRequests_SiteUrl"]);
//Get the realm for the URL
string realm = TokenHelper.GetRealmFromTargetUrl(siteUri);
//Get the access token for the URL.
//Requires this app to be registered with the tenant
string accessToken = TokenHelper.GetAppOnlyAccessToken(TokenHelper.SharePointPrincipal, siteUri.Authority, realm).AccessToken;
//Get client context with access token
using (var ctx = TokenHelper.GetClientContextWithAccessToken(siteUri.ToString(), accessToken))
{
    // Set the time out as high as possible
    ctx.RequestTimeout = int.MaxValue;
    List list = ctx.Web.Lists.GetByTitle(ConfigurationManager.AppSettings["SiteCollectionRequests_List"]);
    CamlQuery camlQuery = new CamlQuery();
    camlQuery.ViewXml = "your CAML query here";
    ListItemCollection listItems = list.GetItems(camlQuery);
    ctx.Load(listItems);
    ctx.ExecuteQuery();
    var itemsCount = listItems.Count;
}
So before you can use your code you need to do two things in your target site:
Register your client side app. This basically means that in your app config you need to set a client ID and a client secret. Without these values no proper authentication and authorization can be done.
Register your app in O365 by using the following URL, replacing the hostname with your target site: http://<SharePointWebsite>/_layouts/15/AppRegNew.aspx
After you register the app you need to specify in a target site what kind of privileges the app has. In the code sample above you would need to specify at least read rights.
To provide the privileges, use the following URL: http://<SharePointWebsite>/_layouts/15/AppInv.aspx
Notice that for this step you need to provide an XML describing the privileges request. The simplest way for me was to start up Visual Studio, create an app, define the rights request through the GUI, and copy & paste the XML from the AppManifest.xml. It would look something like this:
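A minimal sketch requesting read rights at list scope (adjust the Scope URI to wherever the rights need to apply; AllowAppOnlyPolicy is needed because the code above requests an app-only token):
<AppPermissionRequests AllowAppOnlyPolicy="true">
  <AppPermissionRequest Scope="http://sharepoint/content/sitecollection/web/list" Right="Read" />
</AppPermissionRequests>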
The last thing you need to check is that you have a proper configuration in the app.config (if you are using a console app):
<appSettings>
  <add key="ClientId" value="client id obtained after registration" />
  <add key="ClientSecret" value="client secret obtained after registration" />
  <add key="SiteCollectionRequests_SiteUrl" value="yoursiteurl" />
  <add key="SiteCollectionRequests_List" value="listname" />
</appSettings>
Also notice that the ClientId and the ClientSecret have to be provided in the app.config for the TokenHelper.cs class to work. The class reads these settings automatically.
This is a sample of how you can enable the enterprise keywords field for a library. The funny part is that in SharePoint 2013 just adding the field did not work properly. What I had to do was use ILSpy to get an idea of what the SharePoint code is doing. I found out that, for some reason (in this case at least), the enterprise keywords field needs to exist first so that the list content type is altered to use it. Only then could I use the field in the library properly. Don’t know why :), didn’t find out :).
public static string FeatureProperty_ListsToApplyLogic = "ListsToApplyLogic";
private System.Guid guidKeywords = new System.Guid("{23F27201-BEE3-471e-B2E7-B64FD8B7CA38}");

public override void FeatureActivated(SPFeatureReceiverProperties properties)
{
    SPWeb web = properties.Feature.Parent as SPWeb;
    if (web == null) return;
    String listsToApplyLogic = null;
    if (properties.Feature.Properties[FeatureProperty_ListsToApplyLogic] != null)
        listsToApplyLogic = properties.Feature.Properties[FeatureProperty_ListsToApplyLogic].Value as String;
    if (String.IsNullOrEmpty(listsToApplyLogic)) return;