Singly Linked List

July 8, 2017

A Linked List is a fundamental data structure that allows us to easily add new items at any point in the list. Like an array, a linked list is so fundamental that it’s often used internally to hold state for other large data structures. For example, a hash table can be implemented using a linked list internally to store its’ entries.

All various forms of a linked list make use of nodes, which we can simply define as:

public class LinkNode<TElement>
{
    public TElement Element { get; set;}
    public LinkNode<TElement> Next { get; set; }
}

Nodes for all the various forms of a linked list share the Element property. What differs is the pointers, such as Next. The simplest of linked lists, the Singly Linked List, keeps a single pointer per node that points to the next node in the list.

At the minimum, a reference to the head of the linked list must be kept since a linked list is composed of reference type objects. It’s recommended to keep a reference to the tail as well such that we can easily add a new item to the very end of the list. Without a reference to the tail, we would need to traverse the entire list just to add it.

Read More

Linear Probe Map (Open Addressing Scheme)

July 7, 2017

Open Addressing is another method to handle collisions. One disadvantage of using separate chaining is that it requires the use of an auxiliary data structure, which costs more space. Using open addressing, we store entries directly into a slot. This requires the load factor is always 1 or less, as we can’t store more entries than the number of slots that exist in the internal array.

The open addressing scheme is a general term that describes a collision handling approach that stores entries directly into the internal array. The simplest method is linear probing, which works as follows:

Setting an entry

  • a hash code is obtained, then compressed as usual
  • the entry is attempted to be stored at the compressed index
  • if the slot is available (null), store it there
  • if it’s taken, we look at the next slot. If available, store it. If not, go to the next. Rinse and repeat.

Getting an entry

  • get the compressed index of the key
  • walk through the internal array starting at the compressed index. If you find an entry with the key, it exists so return it. If you run into an empty slot first, it doesn’t exist.
Read More

Chain Hash Map (Separate Chaining Scheme)

July 6, 2017

Separate Chaining is a method of collision handling where entries are stored in a bucket. Rather than having a slot in the internal array store just a single entry, we can now store a variable amount. This handles collisions by allowing different hashed keys to be stored in the same slot, although we still want to minimize the number of entries in a bucket. Ideally we want to store, at most, a single entry per bucket.

The Load Factor

The load factor, a significantly important statistic for hash tables, is defined as a simple ratio:

n / k

where n is the number of entries, and k is the number of buckets. As the value of the load factor increases, so does the number of collisions, which ultimately results in slower run times for the map operations. Many experiments and studies have been done that suggest these load factors:

  • For Separate Chaining schemes, a load factor of 0.9 or lower
  • For Open Addressing schemes, a load factor of 0.5 or lower (explored in the next post)

What’s a bucket?

A bucket is simply an auxiliary data structure that is used to store multiple entries in that slot. Linked lists and simple maps are commonly used, but it could really be anything that can store multiple items.

Read More

Hash Tables, Hash Codes and Compression Functions

June 29, 2017

Hash tables are an implementation of a map data structure, and they happen to be one of the most efficient around. Whereas the unsorted map I implemented previously logs mostly linear run times for most of a map’s core operations, hash tables are able to reliably achieve constant run times by cleverly making use of the simple, yet fundamental, array data structure.

An array is always able to get an element in constant time if you know the index. This is because you’ll already have a reference to the array in memory, and so the index is used as an offset from the head to grab the value in a snap. So, in order to use an array to implement a hash table, we need to ensure that

  1. the key we use is an integer
  2. and that it fits within the bounds of the array

However, a map is generally implemented generically, such that any data type can be used as a key. This causes an issue with point 1 above if we need a map with a non-integer key.

Read More

Introduction to Maps and an implementation of an Unsorted Map

June 18, 2017

A map, also referred to as an associative array, is one of the most useful and commonly used data structures around. The items stored in a map are Entries, which simply store a key and a value (or in the context of a C# Dictionary, a KeyValuePair<TKey, TValue>).

The keys are required to be unique, such that a given key will always map to the same value. Unlike a standard array, which maps an integer index to some value, a map’s keys are not restricted to integers. You’ll find common use-cases of the keys being strings, Guids, and other data types. Consequently, a good implementation of a map will be generic, giving the programmer flexibility for the types they’d like to use in the key-value mappings.

Use-Cases and Examples

As mentioned, a key must be unique, so any real world scenario where something can be uniquely identified by some thing (a key) would benefit from a map data type if needed.

  • A social security number maps the number uniquely to a specific person
  • A car’s license plate number maps the alphanumeric key to a specific vehicle
  • A username or handle in a social network maps that account to a specific person
  • A birthdate does NOT map that particular date to a specific person, since surely more than one person has a birthdate on that day.

The simplest rule for when a map data type should be used is if you find yourself iterating through some array or list multiple times looking for something. When you’re searching an array for something based on a key (as opposed to its’ integer index), the operation runs in O(n) time because the thing you’re looking for may be at the very end and you have to walk through the entire array one at a time to find it.

Read More

Better Validation System in .NET Core, with examples using Vanilla Javascript and Angular

June 13, 2017

This post is a thorough guide into implementing a global error handling system in .NET for a web application. The frontend uses two examples using vanilla javascript and Angular, mainly to showcase how validation errors can be handled more elegantly using the binding capabilities of modern frontend libraries. While my example uses Angular, the concepts shown are very generic and can be applied to any other frontend libraries such as React, Vue, etc.

We’ll start be examining one of the most common forms of validations used in .NET, and why it becomes problematic as your application’s validation needs begin to grow. We’ll also look into why validations using Javascript is preferred, except for a few select scenarios.

Code samples are provided through a project in my github repo that can be downloaded, built, and run so you can easily follow along, or use as a base to start off your own app with a nice validation system built-in.

Read More

.NET Core Unit Testing (Complete guide with sample project and exercises/solutions)

June 11, 2017

This post is a thorough guide into starting unit tests in .NET, utilizing two popular libraries in Moq and xUnit.

Before I started doing software development professionally, writing unit tests was usually an afterthought or something I neglected while working on my personal projects. After working for a while at a job that necessitates writing unit tests, the benefits of unit test coverage has become undeniably clear.

Among many other reasons, the tests help catch regressions in your codebase (something that used to work before stopped working at some point after code has been changed), they provide confidence that refactoring or adding/changing code won’t break existing code. With that said, I can admit that for new and very tiny personal projects, the benefits aren’t as profound. You most likely will have the small codebase and product understood well enough, such that when you make a change you’ll also be conscious of other places that are affected.

Read More