[LINQ via C# series]
[LINQ to Objects in Depth series]
LINQ to Objects queries .NET objects in local memory of current application or service. Its data source and the queries are represented by IEnumerable
Local sequential LINQ query
LINQ to Objects is local, sequential query. Local means it queries data is instances of .NET types available in the memory of current .NET application or service, not remote data like rows in a data table managed by a separate database. In .NET, IEnumerable
Iterator pattern and foreach statement
If a sequence type is defined following the standard iterator pattern of object-oriented programming, the objects in the sequence can be pulled by C# foreach statement. Iterator pattern consists of a sequence (also called container of items, or aggregate of elements) and an iterator:
internal abstract class Sequence
{public abstract Iterator GetEnumerator(); // Must be public.}internal abstract class Iterator{public abstract bool MoveNext(); // Must be public.public abstract object Current { get; } // Must be public.}
And their generic version is:
internal abstract class GenericSequence<T>
{public abstract GenericIterator<T>GetEnumerator(); // Must be public.}internal abstract class GenericIterator<T>{public abstract bool MoveNext(); // Must be public.public abstract T Current { get; } // Must be public.}
The above types and members demonstrate the minimum requirements of iterator pattern for foreach statement. The sequence must have a GetEnumerator factory method to output an iterator (It can be virtually read as GetIterator). And iterator must have a MoveNext method to output a bool value to indicate whether there is a value that can be pulled. If the output is true, iterator’s Current property getter can be called to pull that value. Now the values in above non-generic and generic sequences can be access with C# foreach statement:
internal static void ForEach<T>(Sequence sequence, Action<T> process)
{foreach (T value in sequence){process(value);}}internal static void ForEach<T>(GenericSequence<T> sequence, Action<T> process){foreach (T value in sequence){process(value);}}
The foreach statement is declarative syntactic sugar. It is compiled to the following imperative control flow to get iterator from sequence, and poll iterator until there is no value available:
internal static void CompiledForEach<T>(Sequence sequence, Action<T> process)
{Iterator iterator = sequence.GetEnumerator();try{while (iterator.MoveNext()){T value = (T)iterator.Current;process(value);}}finally{(iterator as IDisposable)?.Dispose();}}internal static void CompiledForEach<T>(GenericSequence<T> sequence, Action<T> process){GenericIterator<T>iterator = sequence.GetEnumerator();try{while (iterator.MoveNext()){T value = iterator.Current;process(value);}}finally{(iterator as IDisposable)?.Dispose();}}
Apparently, the generic version of definition is preferred, because the non-generic iterator’s Current property outputs object type, which has to be explicitly cast to the expected type specified in the foreach statement and could be a chance of failure. To demonstrate the iterator pattern implementation, a sequence of values can be defined as a singly linked list, where each value is stored in a linked list node:
internal class LinkedListNode<T>
{internal LinkedListNode(T value, LinkedListNode<T> next = null) =>(this.Value, this.Next) = (value, next);public T Value { get; }public LinkedListNode<T> Next { get; }}
Then iterator can be implemented to traverse along the linked list nodes. When there is a next node, MoveNext outputs true, and Current can output the next value. Notice the iterator is stateful:
internal class LinkedListIterator<T> : GenericIterator<T>
{private LinkedListNode<T> node; // State.internal LinkedListIterator(LinkedListNode<T>head) =>this.node = new LinkedListNode<T>(default, head);public override bool MoveNext(){if (this.node.Next != null){this.node = this.node.Next; // State change.return true;}return false;}public override T Current => this.node.Value;}
Finally, the sequence can be simply implemented as an iterator factory:
internal class LinkedListSequence<T> : GenericSequence<T>
{private readonly LinkedListNode<T> head;internal LinkedListSequence(LinkedListNode<T> head) => this.head = head;public override GenericIterator<T> GetEnumerator() => new LinkedListIterator<T>(this.head);}
Now the values in the linked list sequence can be sequentially pulled with the foreach syntactic sugar, which is declarative and there is no need to specify the control flow or handle the state:
internal static void ForEach()
{LinkedListNode<int> node3 = new LinkedListNode<int>(3, null);LinkedListNode<int> node2 = new LinkedListNode<int>(2, node3);LinkedListNode<int> node1 = new LinkedListNode<int>(1, node2);LinkedListSequence<int> sequence = new LinkedListSequence<int>(node1);foreach (int value in sequence){value.WriteLine(); // 1 2 3}}
A general implementation of iterator pattern is discussed later in the LINQ to Objects query implementation chapter.
IEnumerable and IEnumerator interfaces
Initially, .NET Framework 1.0 provides IEnumerable and IEnumerator interfaces as the contract of iterator pattern:
namespace System.Collections
{public interface IEnumerable // Sequence.{IEnumerator GetEnumerator();}public interface IEnumerator // Iterator.{object Current { get; }bool MoveNext();void Reset(); // For COM interoperability.}}
They can be virtually read as iteratable sequence and iterator. .NET Framework’s sequence and collection types implement IEnumerable so that they can be used with foreach statement, like array, ArrayList, Queue, Stack, SortedList, etc. Then .NET Framework 2.0 was released with generics support. So IEnumerable
namespace System.Collections.Generic
{public interface IEnumerable<T> : IEnumerable // Sequence.{IEnumerator<T> GetEnumerator();}public interface IEnumerator<T> : IDisposable, IEnumerator // Iterator.{T Current { get; }}}
Since .NET Framework 2.0, sequence and collection types are usually provided as generic types, with IEnumerable
Later, .NET Framework 4.0 introduces covariance and contravariance for generic interface. As discussed in the covariance and contravariance chapter, T is covariant for both IEnumerable
namespace System.Collections.Generic
{public interface IEnumerable<out T> : IEnumerable // Sequence.{IEnumerator<T> GetEnumerator();}public interface IEnumerator<out T> : IDisposable, IEnumerator // Iterator.{T Current { get; }}}
This is how they are defined in .NET Standard.
foreach statement vs. for statement
Array is a special type. A concrete array T[] inherits System.Array type, which does not implement IEnumerable
namespace System
{public abstract class Array : ICollection, IEnumerable, IList, IStructuralComparable, IStructuralEquatable{}}
Instead, T[] directly implements IEnumerable
internal static void ForEach<T>(T[] array, Action<T> process)
{foreach (T value in array){process(value);}}
foreach statement with array is compiled into a for loop, accessing each value with index, which has better performance than the above control flow of getting iterator and polling its MoveNext method and Current property:
internal static void CompiledForEach<T>(T[] array, Action<T> process)
{for (int index = 0; index < array.Length; index++){T value = array[index];process(value);}}
And so is string. Since string is a sequence of characters, it implements IEnumerable
internal static void ForEach(string @string, Action<char> process)
{foreach (char value in @string){process(value);}}internal static void CompiledForEach(string @string, Action<char> action){for (int index = 0; index < @string.Length; index++){char value = @string[index];action(value);}}
LINQ to Objects queryable types
As discussed, most LINQ to Objects queries are extension methods for IEnumerable
· System.Collections.Generic.IEnumerable
o Microsoft.Collections.Immutable.IImmutableQueue
§ Microsoft.Collections.Immutable.ImmutableQueue
o Microsoft.Collections.Immutable.IImmutableStack
§ Microsoft.Collections.Immutable.ImmutableStack
o Microsoft.Collections.Immutable.IOrderedCollection
§ Microsoft.Collections.Immutable.ImmutableList
o System.Collections.Concurrent.IProducerConsumerCollection
§ System.Collections.Concurrent.ConcurrentBag
§ System.Collections.Concurrent.ConcurrentQueue
§ System.Collections.Concurrent.ConcurrentStack
o System.Collections.Concurrent.BlockingCollection
o System.Collections.Generic.ICollection
§ System.Collections.Generic.IDictionary<TKey, TValue>
§ System.Collections.Concurrent.ConcurrentDictionary<TKey, TValue>
§ System.Collections.Generic.Dictionary<TKey, TValue>
§ System.Collections.ObjectModel.ReadOnlyDictionary<TKey, TValue>
§ System.Dynamic.ExpandoObject
§ System.Collections.Generic.IList
§ System.ArraySegment
§ System.Collections.Generic.List
§ System.Collections.ObjectModel.Collection
§ System.Collections.ObjectModel.ObservableCollection
§ System.Collections.ObjectModel.KeyedCollection<TKey, TItem>
§ System.Collections.ObjectModel.ReadOnlyCollection
§ System.Collections.Generic.ISet
§ System.Collections.Generic.HashSet
§ System.Collections.Generic.SortedSet
o System.Collections.Generic.IReadOnlyCollection
§ System.Collections.Generic.IReadOnlyDictionary<TKey, TValue>
§ System.Collections.Generic.Dictionary<TKey, TValue>
§ System.Collections.ObjectModel.ReadOnlyDictionary<TKey, TValue>
§ Microsoft.Collections.Immutable.IImmutableDictionary<TKey, TValue>
§ Microsoft.Collections.Immutable.ImmutableDictionary<TKey, TValue>
§ Microsoft.Collections.Immutable.ImmutableSortedDictionary<TKey, TValue>
§ System.Collections.Generic.Dictionary<TKey, TValue>
§ System.Collections.ObjectModel.ReadOnlyDictionary<TKey, TValue>
§ System.Collections.Generic.IReadOnlyList
§ Microsoft.Collections.Immutable.IImmutableList
§ Microsoft.Collections.Immutable.ImmutableList
§ System.Collections.Generic.List
§ System.Collections.ObjectModel.Collection
§ System.Collections.ObjectModel.ReadOnlyCollection
§ Microsoft.Collections.Immutable.IImmutableSet
§ Microsoft.Collections.Immutable.IImmutableHashSet
§ Microsoft.Collections.Immutable.ImmutableHashSet
§ Microsoft.Collections.Immutable.IImmutableSortedSet
§ Microsoft.Collections.Immutable.ImmutableSortedSet
o System.Collections.Generic.LinkedList
o System.Collections.Generic.Queue
o System.Collections.Generic.SortedList<TKey, TValue>
o System.Collections.Generic.Stack
o System.Linq.IGrouping<TKey, TElement>
o System.Linq.ILookup<TKey, TElement>
§ System.Linq.Lookup<TKey, TElement>
o System.Linq.IOrderedEnumerable
o System.Linq.ParallelQuery
§ System.Linq.OrderedParallelQuery
o System.Linq.IQueryable
§ System.Linq.IOrderedQueryable
§ System.Linq.EnumerableQuery
§ System.Data.Linq.ITable
§ System.Data.Linq.Table
§ Microsoft.EntityFrameworkCore.DbSet
o System.String (implements IEnumerable
o T[] (array of T)
In the above list, LINQ to Objects queries cannot be directly used for ParallelQuery
For history reason, there are some types may only implement IEnumerable, but not implement IEnumerable