(Attempted) Learning

Saturday, February 11, 2012

Chapter 4: Heuristic Search

Heuristic - rule employed for choosing branches in a state space that are most likely to lead to an acceptable solution. It is, in short, an informed guess. Heuristics are used when:

inherent ambiguities remove the possibility of an exact solution
computational costs may be prohibitive.

4.1 Hill-Climbing and Dynamic Programming

4.1.1 Hill-Climbing

Hill-Climbing describes the search strategy of blindly selecting the "best" child of the current node for examination. Because it cannot backtrack, hill-climbing can be easily broken by local maxima or dead ends.

Samuel's 1959 checker's algorithm employed a variation of hill-climbing by attempting to self-improve through modifying its evaluation equation. It attached weighted coefficients to game features (piece advantage, location) and would modify these coefficients on the fly if it began to fare poorly.

4.1.2 Dynamic Programming (DP)

Dynamic Programming utilizes memoization; that is, it tracks solutions to sub-problems for later use in solving the given large problem. This can help boost memory performance by avoiding repetitive calculations. The following two text processing algorithms utilize DP:

1. Optimum global alignment of two strings

For example, the strings BAADDCABDDA and BBADCBA can be aligned as:

BAADDCABDDA

BBA DC B A

To do this we must either shift, insert, or replace characters in order to align the two strings. We can set up the strings in an array and go through each matching point to determine the cost of alignment at that point. Eventually we can move backward through this to find the cheapest alignment.

2. Minimum edit difference

Here we wish to turn one string into another, as cheaply as possible. We set up the two given strings in array fashion as before, but now we calculate the cost at each cell based on if we insert, delete, or replace. Again, we finish by moving through the array along the cheapest path.

For example, to turn intention into execution:
intention
ntention - delete i, cost 1
etention - replace n with e, cost 2
exention - replace t with x, cost 2
exenution - insurt u, cost 1
execution - replace n with c, cost 2

Total cost for this transformation is 1 + 2 + 2 + 2 + 1 + 2 = 8.

So why do Dynamic Programming? Because of its computation cost improvement. Whereas exhaustive search is exponential when comparing two strings (2^n or 3^n), DP costs n^2 or n^3 at worst.

4.2 The Best-First Search Algorithm

4.2.1 Implementing Best-First Search
best-first search utilizes lists to track states: open for states not visited and closed for the previously examined. Unlike depth-first and bread-first search though, best-first also re-orders its open list to keep the most promising states in front. This is termed a priority queue. It also tracks the shortest partial solution path.
In short, best-first search combines the benefits of a shortest-path solution with the flexibility of surviving local maxima and dead ends.

4.2.2 Implementing Heuristic Evaluation Functions
One way to compare similar heuristics is to use an evaluation function, f, such that:

f(n) = g(n) + h(n)

where:

g(n) measures the path length from n to the root
h(n) represents the heuristic estimate of the distance from n to the goal state

With function f we can compare heuristics while taking into account their probability of finding the shortest path. This comes into play if, say, we're comparing two states with identical heuristic evaluations. Whichever of these states is closer to the root will be more likely to lie on the shortest path.

h(n) also provides a breadth-first flavor to an algorithm since it will keep it from being misled too deeply down a search path. As a heuristic travels deeper and deeper into a graph the h(n) term will balloon, causing it to eventually backtrack and pursue a shallower avenue.

4.2.3 Heuristic Search and Expert Systems
Simple games are ideal for learning heuristics because:

They're large enough to benefit from heuristics
They're complex enough to merit testing by a variety of heuristics so there's a lot we can compare
They're easy to represent
States are relatively similar so heuristics can be uniformly applied

Confidence measures (-1 to 1), also known as certainty factors, can be used to quantify how "useful" a conclusion is. For example with the financial adviser system, we may conclude that someone with a lot of savings will want to invest heavily and not save much, but there's always the chance they'll want to save instead. To account for this we can give such a conclusion a certainty factor of 0.9 to show that there's a small chance it's incorrect.

Confidence measures come with their own complexities though; for example, if a conclusion is used as a premise to another expression, how should the confidence measure be factored in?

4.3 Admissibility, Monotonicity, and Informedness

We often wish to evaluate the performance of a heuristic along different metrics.

4.3.1 Admissibility Measures

Admissible heuristics are guaranteed to find the shortest path to a goal state.

A* algorithms utilize best-first-search along with an evaluation function in which h(n) <= minimal path cost from n to goal. For example, breadth-first search is an A* algorithm with h(n) == 0.

I find the notion of "h(n) <= minimal path cost from n to goal" to be confusing. Luger provides an example to help clarify this. He says that with the 8-puzzle, the heuristic of counting the number of tiles not in the goal position is h(n), while the minimal path cost, h*(n), is the number of moves required to move them to their goal position. By these definitions, h(n) < h*(n).

In other words h(n) is simply the output of our heuristic evaluation, while h*(n) is the actual algorithmic cost of reaching the goal state from our present position.

4.3.2 Monotonicity

Monotonicity is basically admissibility applied to all nodes. In other words, the heuristic is everywhere admissible - when a node is visited, it is reached along the shortest path.

4.3.3 Informedness
A heuristic h2(n) is more informed if h2(n) >= h1(n). h2(n) is "closer" to h* and thus is more accurate at searching the given space.

4.4 Using Heuristics in Games

4.4.1 Minimax
Minimax is used in two-player games. With minimax leaves are scored by heuristic and then selected values are propagated back up the tree. The value of a parent is determined by one of its branches, according to the following rule:

if the node is a max (i.e. its my turn) then the greatest value is propagated up.
If the node is a min (i.e. opponent's turn) then the smallest value is propagated up, because this is what the opponent will likely pick.

Note that minimax requires two passes over the search space: one to descend and apply the heuristic, and a second to propagate values back up.

4.4.2 Minimaxing to Fixed Ply Depth
We can use n-ply look-ahead to limit the memory usage of minimax. With this strategy we treat the nodes at level n as the leaves and only propagate n levels.
This method suffers from the horizon effect, wherein we can only look so far ahead and can get trapped by local minima or a clever opponent.

4.4.3 The Alpha-Beta Procedure
This method prunes the minimax search space. Two values, alpha and beta, are assigned so that disadvantageous branches are ignored. The alpha value goes with MAX nodes and won't decrease so we can prune MAX grandchildren less than it; the beta goes with MIN nodes and won't increase.

4.5 Complexity Issues
Branching factor - average number of children of any state.
Number of states at depth n is B^n.

(Attribution: these entries are my notes and reflections on George Luger's Artificial Intelligence: Structures and Strategies for Complex Problem Solving. Some of what I write here may be direct from the text. In other words, if something you read strikes you as profound, original, or interesting, I probably didn't write it.)

Monday, January 30, 2012

Chapter 3: Structures and Strategies for State Space Search

3.1 Structures for State Space Search

3.1.1 Graph Theory

Problems can be represented as state space graphs. Graph Theory can then be used to analyze the structure and complexity of both the problem and the search procedures used to solve it.

This can help us answer questions like:

"Is solution guaranteed to be optimal?"

"What is complexity of search process in terms of time/memory usage?"

Definitions:

Graph is set of nodes and arcs/links connecting them.

Nodes are discrete states in process, e.g. configurations of game board or result of logical inference

Arcs are transitions between states; e.g. logical inferences or legal moves of game

Graph theory is the best tool for reasoning about structure of objects and relations, e.g. Euler and bridges of Konigsberg.

Graph theory and predicate calculus can both represent problems, but graph theory does better job showing structure of the problem. Concept of node degree is readily apparent with graphing, not at all with predicates.

3.1.2 Finite State Machine

FSM is a finite, directed, connected graph, having a set of states, a set of input values, and a state transition function that describes the effect that the elements of the input stream have on the states of the graph.

FSM's are used to recognize components of a formal language.

3.1.3 The State Space Representation of Problems

Not all states are trees - sometimes states can be reached through multiple paths. This should be taken into account when designing search algorithms. One should consider not only the solution path, but the best solution path.
GL provides several examples of state spaces; all are worth study:

Tic-Tac-Toe
The 8-Puzzle
The Traveling Salesman

Basic search methods can perform poorly if they are exhaustive; that is, if they lack methods of reducing the size of the search space. Search complexity can be reduced through tools like branch and bound and nearest neighbor. Branch and bound "cuts off" branches when a set bound is surpassed; e.g. when a branch length surpasses that of a solution path already known. Nearest neighbor chooses the next path element based on its given "proximity" to the current state, wherein proximity is an arbitrary measure for the problem domain.

3.2 STRATEGIES FOR STATE SPACE SEARCH

3.2.1 Data-Drive and Goal-Driven Search

Data-Driven Search (Forward Chaining) - takes facts of problem and applies rules or legal moves to produce new facts, leading to goal.

Suggested if:

All or most of data is given in problem statement; e.g. geology solver to determine what minerals at given site
Large number of potential goals but few rules/ways to use given facts. e.g. DENDRAL finding molecular structure based on formulae.
Difficult to form goal/hypothesis

Goal-Driven Search (Backward Chaining) - uses information about goal, finds rules required to produce goal and conditions necessary for these rules. These conditions become new subgoals.

Suggested if:

Goal/hypothesis is given or easily determined
There are large number of rules matching facts of system, which would lead to many conclusions. Starting with goal eliminates many wrong branches.
Problem data are not given and must be determined by solver. e.g. Doctor only orders test he thinks will confirm diagnosis; he doesn't order every test.

3.2.2 Implementing Graph Search

Backtrack Algorithm - stores unprocessed states, dead states, and nodes to keep track of which have been searched, which are bad, and which haven't been visited. Avoids repetitive looping.

3.2.3 Depth-First and Breadth-First Search

Along with search direction, algorithms need an order in which to examine states. Right now we'll look at depth-first and breadth-first search as examples of ordering.

Depth-First Search - analyzes children before siblings. The list of unprocessed nodes is maintained in a LIFO, or stack, fashion whereby newly-discovered nodes (children) are processed first. The above image illustrates the order in which nodes are processed in this stack-based search method:

Tree searched depth-first (source: Wikipedia)

Starting at 1, the stack of unprocessed nodes is [2, 7, 8]
The algorithm goes to node 2
The children of node 2 are added to the top of the stack: [3,6,7,8]
3 is searched next
The children of 3 are added to stack, which becomes: [4,5,6,7,8]
And so on...

We see that as more children are discovered, the siblings of previous nodes (7 and 8) are left sitting at the end of the stack.

The space usage of depth-first search is a linear function of the length of the path. So if nodes have an average of B children, the total space usage is B * n to go n levels deep. (p. 106)

Depth-first search may not find the shortest path to a goal state because the children on the path its searching may loop back to a goal on a previous level. This method would have no way of knowing this.

When to use depth-first:

if solution path is expected to be long. Depth-first search will get into deep space quickly and will not suck up memory by storing siblings.
When lots of branches exist (for same reason as above).

Breadth-First Search - orders states to be processed level by level, so siblings are analyzed before children. This ordering is FIFO and acts as a queue. The following image illustrates this method:

Tree searched breadth-first (source: Wikipedia)

The children of state 1 - 2,3,4 - are added to the list of unprocessed nodes: [2,3,4]
Node 2 is analyzed next.
It's children - 5 and 6 - are added to the queue: [3,4,5,6]
Node 3 comes next, and so on...

Breadth-first search will always find the shortest path to a goal, since it goes level by level. This is because the shortest path to a goal is the number of levels between it and the root.

The space usage of breadth-first search is an exponential function of the length of the path, since it adds all nodes on a level to the queue before moving to the next level. If nodes have an average of B children, there will be B^n states on level n.

Breadth-first search is ideal when:

a simple solution is known to exist
the branching factor of a graph is low

3.2.4 Depth-First Search with Iterative Deepening

Depth-first iterative deepening combines the small memory usage of depth-first with the thoroughness of breadth-first search. It uses increasing depth-bounds to slowly search the entire space, resulting in it always finding the shortest path to a goal while only using B*n storage space at any level.

All three search strategies discussed here have worst-case exponential time-complexity, which is why heuristics will be used to overcome such uninformed methods.

3.3 Using the State Space to Represent Reasoning with the Propositional and Predicate Calculus

3.3.1 State Space Description of a Logic System

With sound and complete inference rules, graph-based reasoning can be used to find correct conclusions. In short, logic systems can be mapped with graph theory and solved with search methods.

3.3.2 And/Or Graphs

Hypergraphs can be used to accurately depict and/or relations in graph theory. These graphs can have multiple arcs between nodes. For our purposes, and relations will be shown by connecting descendant arcs together with a line.

Note that when solving a node with an and relation, all and descendants must be solved/true.

3.3.3 Further Examples and Applications

Examples considered:

MACSYMA
goal-driven and/or search
Financial Advisor
English Language Parser and Sentence Generator

Take away:

Predicate calculus has no global variables - the extent of the meaning of any variable name is contained within the clause in which it appears (i.e. variables are parameters and don't have fixed memory locations)
Working through examples helps.

(Attribution: these entries are my notes and reflections on George Luger's Artificial Intelligence: Structures and Strategies for Complex Problem Solving. Some of what I write here may be direct from the text. In other words, if something you read strikes you as profound, original, or interesting, I probably didn't write it.)

Sunday, January 29, 2012

Chapter 2: The Predicate Calculus

GL states that the "function of any representation scheme is to capture... the critical features of a problem domain and make that information accessible to a problem-solving procedure." (AI, 37)

In other words, representations are abstractions of a problem domain which can be used to solve or learn more about them. One such representation language, predicate calculus (PC), captures the properties and relations between things in a domain from which to reason. PC offers formal semantics and sound and complete inference rules.

2.1 The Propositional Calculus

I'm choosing to not outline this section as I mostly skimmed through it. It presents a general review of propositional calculus which can be found in any intro-level course on logic or C.S. math foundations.

2.2 The Predicate Calculus (PC)

PC allows us access to the individual components of assertions. Instead of "it rained on tuesday" being contained in the atomic component P of Propositional Calculus, we can make it into a predicate statement like weather(Tuesday, rain). In this format we now have a relationship between between the day and the weather. With inference rules we can then discover new expressions by manipulating previous ones.

So I guess predicate calculus lets us fit information, like a problem domain, into a formal system while still preserving the relations among its members. It also allows variables to be used and quantified. More on this next...

2.2.1 The Syntax of Predicates and Sentences

Symbol - irreducible syntactic element. Symbols are used to denote objects, properties, or relations They may represent either variables, constants, functions, or predicates. I guess symbols are the "atomic" elements of PC.
Improper Symbol - used to constructed well-formed expressions. These are parentheses, commas, and periods.
Function - map one or more elements in a set (domain) into a unique element of another set (range)
Atomic Sentence - predicate of arity n followed by n terms enclosed in parentheses and separated by commas
Predicate - names a relationship between zero or more objects in the world. True of False.

A predicate relation is defined by its name AND its arity.

2.2.2 A Semantics for the Predicate Calculus
Predicate calculus semantics provide a formal basis for determining the truth value of well-formed expressions. Remember: constants, variables, and predicates are "mapped" onto objects and relations in the domain of discourse. They represent them. The truth of relationships in the domain determines the truth of the corresponding expressions. So PC is just our way of "viewing" the domain; this is why it's so important to map it correctly - we want our expressions to accurately reflect what they represent.

Interpretation - An interpretation over a nonempty set (Domain) maps the elements of D onto constant, variable, predicate, and function symbols of a PC expression. As I recall, an interpretation must ensure all expressions in the given set are true.

In PC variables can be quantified existentially or universally. Under the given interpretation, a subset of the constants of D can be substituted for a variable.

PC is considered undecidable because it has universal quantification; i.e. it can be computationally impossible to test all the substitutions for a universally-quantified variable.

2.2.3 A "Blocks World" Example of Semantic Meaning
An example predicate calculus description of a blocks world:

on(c,a)
on(b,d)
ontable(a)
ontable(d)
clear(b)
clear(c)
hand_empty

and an example rule, describing when a block is clear:

\/X (!EY on(Y,X) -> clear(X))

This is read as: "For all X, if there is no Y such that Y is on X, then X is clear."

(Note: I haven't spent time figuring out how to use symbols in blogger so for now I'm using approximations.)

2.3 Using Inference Rules to Produce PC Expressions

2.3.1 Inference Rules

The semantics of PC allow us to logically infer new correct, consistent expressions from a set of true assertions.

"An expression X logically follows from a set of expressions S if every interpretation that satisfies S also satisfies X." In other words, X logically follows if the same set of interpretations are possible with it included alongside S. Inference rules are used to determine if an expression logically follows, since the number of interpretations may be infinite!

Inference rules:

produce new sentences based on the syntactic form of given logical assertions.
are sound if the sentences it produces all logically follow from the set its operating on.
are complete if it can produce every sentence that logically follows from a set. (e.g. Modus ponens)

2.3.2 Unification

Unification determines the substitutions needed to make two PC expressions match. For this to work, all variables must be universally quantified. Existentially quantified variables may be eliminated by replacing them with appropriate constants (to make them True).

Unify pseudo-code on page 69.

(Attribution: these entries are my notes and reflections on George Luger's Artificial Intelligence: Structures and Strategies for Complex Problem Solving. Some of what I write here may be direct from the text. In other words, if something you read strikes you as profound, original, or interesting, I probably didn't write it.)

Project: Reading, Reviewing, and Reflecting on My AI Course (Textbook)

This semester I'm taking a course on Artificial Intelligence taught by George Luger. Since lectures have never been my preferred method of learning (i.e. I tend to absorb little from them) I've decided to post my chapter notes to aid my learning as I read through his textbook, Artificial Intelligence: Structures and Strategies for Complex Problem Solving. My outlines won't be exhaustive; they'll primarily serve to solidify my understanding (in other words, don't get mad if I leave stuff out).

I'm doing this for three reasons:

To write. I've noticed as a coder that unless I go out of my way to find reasons to write (in English), I just won't do it. I swear my coding chops are 100% inversely related to my writing ability.
To internalize. There's no better way to learn than by teaching! And blogging is about as close to teaching as I can get right now. (My wife wasn't interested in learning this stuff...)
To not suck. If I don't maintain this reading/note-taking endeavor which is, admittedly, by best way to learn, than I'm just another lazy student!

Right now the plan is to knock out one chapter per week to stay in tune with the actual course. These So, without further ado, here comes Chapter 2!

(This endeavor didn't materialize until we completed Chapter 2. Here's a summary of Chapter 1.)