Definitions

Binary Trees are data structures where each node has at most two children. Similar to lists, they are ADT's that can store information despite having different shapes from lists.

Binary Tree example 1
Binary Tree example 2

Note: For this program, operators must always be a parent node to either an operand(s) or another operator(s). Operators with higher precedence also appear further down in the tree.

Binary Tree example 3

AST (abstract syntax tree) is a binary tree representation of an expression. Since we will be dealing with binary operators exclusively, ASTs are suitable data structures for algebraic expressions. 

Tokens are the basic input units for the program, whether it be a symbol, letter, or number. These are separated by delimiters, or "marks" for the beginning or end of a token. These can marked by whitespace such as " ", "\n", or "\t" (space, new line, or tab).


Specification

This program reads algebraic expressions from cin with its solution directly underneath the echoed line written to cout. The steps within the program are:

1. Reading tokens
2. Conversion from infix expressions to postfix form
3. Construction of a binary tree that represents the equation in postfix form
4. Combining numeric operands via an operator 
5. Evaluation of any variables leading to simplification of the binary tree
6. Conversion back to infix form to be printed out to the user.


Input

The tokens for this program will hold either an operator, one side of a parenthesis, letter, integer, end of line marker, or period (".").  

An enum class will be used to distinguish the different types of data each token holds. 

Operators will be split into three categories in order to easily compare precedence so that the order of operations will still stand. 

- the enum attribute powep represents exponents
- the enum attribute mulop represents multiplication and division
- the enum attribute addop represents addition and subtraction

Parenthesis take precedence over operators. The left and right parentheses as well as the contents within the parenthesis represents one binary expression that can be further broken down to another binary expression or individual token. 

- the enum attribute lparen represents the left parenthesis
- the enum attribute rparen represents the right parenthesis

Variables are stored in data members within the internal structure of the program. They can hold either constants, variables, or a combination including operators. Each variable is represented by a single letter of the alphabet and can be overwritten if assigned a constant value multiple times. Since there are 26 possible variables (52 including both lowercase and uppercase forms), there are 26 objects that would be able to hold assigned values; each of these variables are initialized to themselves, ie: a = a, b = b, etc.

- the enum attribute variable represents variables

Integers will represent numbers/constants. While operations like addition, subtraction, and multiplication are straightforward, division will be done via integer division. 

- the enum attribute number represents integers

The character that marks the end of the line ('\n') will mark the end of a line

- the enum attribute eol represents the end of a line

The period (".") will mark the end of the input

- the enum attribute end represents periods

:= is a custom assignment symbol that will indicate assignment of a constant, variable, or combination to the value of a variable. Although it is not an official C++ operator, as an input, it will function like an assignment operator for variables in this program. 

Each input is placed on a single line, for example:

7 + 2
a := 3
a + 9
b := c - a
c := 6
b

This format can be achieve by a stringstream or direct line-by-line input from the user.

Output

From a user's perspective, when assigning a value to a variable, the program will only display the current value on a line underneath the assignment. For any variables included in an operation through addition, subtraction, multiplication, division, or exponents, the program will either display the value of any variables as a combination or variables and constants or retrieve a constant value that will be evaluated in combination with any other constants.  


The format will reflect the input but with each solution printed on the line directly beneath the input. In order to make the output more human-friendly, the words in and out followed by [n]: before the input/output will denote each expression. For example:


in   [1]: 7 + 2
out [1]: 9
in   [1]: a := 3
out [1]: 3
in   [1]: a + 9
out [1]: 12
in   [1]: b := c - a
out [1]: c - 3
in   [1]: c := 6
out [1]: 6
in   [1]: b
out [1]: 3

Error Handling

If an expression is outside of the format of two operands per operator, an exception will be thrown with a message explaining that the current list of expressions is unsolvable.

Any mathematical issues such as dividing by 0, parenthesis mismatches, etc will default to an error message that indicates to the user the problem is unsolvable.


Design

This program is composed of a few different classes with defined roles that interact to produce algebraic solutions to valid algebraic inputs.

TokType

TokType is an enum class that categorizes the current input into an operator, parenthesis, variable, number, or period.

enum attributes: nil (default value), addop, mulop, powop, variable, number, lparen, rparen, eol, end

Token

Token is a struct that is associated with the TokType enum class. It receives data from the input stream operator and constructs the token with the value passed along with its enum type.

Data members:

+type_: TokType

Stores the enum value that marks the type of token of the current object.

+value_: string

Stores the char or combination of char value in the form of a string.

The method is described below in UML notation for accessibility:

Constructor

+Token(tkn: TokType, val: string)

This parameterized constructor initializes its two data members with values passed from the input stream operator. Default values are nil TokType and an empty string.

Tokenizer

This class is responsible for taking the input stream and converting it to an input token stream. Since this class holds an input stream (construction allows an input stream object to be passed as an argument), the program will be reading what it believes are tokens despite the user input actually being a string of characters.

Data members:

-is_: input stream

Stores the input stream reference; a reference to the actual raw input the program is receiving. This attribute will perform the token reading within the ITokStream& operator>> function.

+theList_: vector<Token>

The list of tokens being constructed from the tokenizer object. This vector  will include the expressions in infix expression form as tokens are constructed but are passed as a reference to the toPostfix( ) method for transformation to postfix expression form.

The methods are described below in UML notation for accessibility:

Constructor

+ITokStream(inputStream: input stream)

Receives the input stream by passing data from the input stream

+ITokStream& operator>>(Token& rhs);

The overloaded token input stream operator reads each token. As the tokens are converted from infix expression to postfix, they are pushed back into theList_ vector.

Since most algebraic terms are a single character, we can categorize the token types each time a new character is read. However numbers my have more than one character so we need an algorithm that catches this case:

Algorithm for numbers:

ItokStream& operator>>(Token)
// Reading numbers
if tokentype is a number
continue reading from the stream until the next character is not a number
wrap the combination of characters into a string and then a token
end if
end operator

+explicit operator bool( );

This function tests whether there is an error on the input stream

-toPostfix(vector<Token>: theList);

Responsible for taking theList_ vector in infix expression form and converting it to postfix expression form.

Note: Each theList_ vector should be an individual line and not the entire stream (in case the user implements a stringstream).

Algorithm for postfix conversion:

Create vector postfixExpr
Create vector temp

If token t is an operand
push back t into postfixExpr
else if token t is an operator
put t in temp
else
discard any parenthesis tokens/unexpected tokens
end if

AST::Node

This struct is encapsulated within the Abstract Syntax Tree class. It holds a token as well as pointers to its two potential children. 

Data members:

+tok_: Token

A token object that has been parsed from input data; contains token type data along with a string value.

+left_: Node*

Pointer to the left child

+right_: Node*

Pointer to the right child

The methods are described below in UML notation for accessibility:

Constructor

+Node(Token t);

The constructor instantiates a node, setting its Token value to the Token data being passed

AST

Responsible for storing variable values and converting postfix expressions to infix, this class contains the Node object, a pointer to a root Node, a vector of Tokens passed from outside, and methods that handle infix conversions and variable simplification. 

Data members:

-root_: Node*

Pointer to the root node

-problems_: vector<Token>

Initialized as an empty vector that is filled when the AST object receives a vector in postfix expression form

Note: To avoid early assignment of variables if the user uses a stringstream, each problems_ vector should represent a single line and not the entire stringstream.

The methods are described below in UML notation for accessibility:

Constructors & Overloaded Operator

+AST(AST& treePtr);
+AST(vector<Token>& postfixExpr);
operator=(AST& treePtr);

The default constructor initializes the root node pointer to nullptr and the problems_  vector to an empty vector while the parameterized constructor instantiates a binary tree, passing a vector with converted postfix expression tokens.

Destructor

~AST( );

Responsible for clearing all nodes since they are dynamically allocated on the heap.

+simplify(vs: VariableStore) : AST

Returns a new AST with variables converted to their current values. Since recursion is a foundational approach with binary trees, every Node is also a binary tree therefore, the return value could be as simple as a single node. Both the problems_ vector and the VariableStore object are read and modified.

Algorithm for traversal:

if tree is not empty
traverse the left child recursively
traverse the right child recursively
get token value
end if

+toInfix( ) : string

Converts the current string from a postfix expression to an infix expression, pushing back the data into the Bridge class's solution_ string. This conversion is necessary before sending the the output form in a human-friendly manner.

VariableStore::VariableItem

This nested struct within a VariableStore object constructs a variable item object that holds a current integer value, variable, or combination (AST object).

Data members:

+var_: string

The variable key; name; identifier. 

+varUpper_: string

The variable key; name; identifier in uppercase form.

+val_: string

The variable value; could be as complex as a combination of integers and variables or as simple as a single character. Works with strings since tokens also contain strings.

VariableStore

This class holds 52 possible variables in an array of 26 slots that keep track of their current values.

Data members:

-variables_: VariableItem[]

Array that holds the 52 possible variables in 26 slots

-problems_: vector<Token>

Initialized as an empty vector that is filled when the AST object receives a vector in postfix expression form

Note: To avoid early assignment of variables if the user uses a stringstream, each problems_ vector should represent a single line and not the entire stringstream.

The methods are described below in UML notation for accessibility:

Constructor

+VariableStore( );

The constructor sets all identifiers and values to themselves: a = a, A = a, b = b, B = b, ... z = z, Z = z.

+getVar(variable: string) : VariableItem

Returns the value of a requested variable

+setVar(newVal: string, variable: string) : bool

Sets the value set in the first argument for the variable passed in the second argument

Friend Operator:

operator<<(os : ostream, string: sol): ostream

Responsible for printing the solution string in a readable, human-friendly manner.

Declared in the header file as a friend in order to access private data members


Implementation Plan

 There are multiple approaches that can be taken in building the program. The order taken with this plan reflects the path in which data flows from user input, transformed into a final output. Therefore, the plan will start with the input process.

1. TokType, Token, & Tokenizer

a. Create the enum class and Token struct.

TokType { }
Token( )

a.1. Set up the enum class and Token struct along with a driver file.

a.2. Use a temporary print method to ensure that Token can be initialized correctly, setting and retrieving TokType (along with string values).

b. Declare and implement the Tokenizer constructor.

Tokenizer( )

b.1. Declare the constructor, data members, and operations within the header file. Use an initializer list to have the input stream data member initialize at construction. 

b.2. Add a test output string within the constructor to test whether the Tokenizer( ) is holding an input stream object correctly. Ensure that the input stream object is being passed as a reference and that the data member is a reference to an input stream object.

c. Set up the ITokeStream operator.

ITokStream& operator>>( )

c.1. Create a local char variable that will be read by the input stream data member.

c.2. Test whether tokens can be read by first constructing a token in the driver file. In the method definition, assign the new constructed token's type to any of the enum types and the value to the local char variable that is being read. Write a line to print to cout to determine whether tokens can be read and assigned values.

c.3. Set up cases for specific algebraic characters. Any characters outside of the TokType values should be ignored. While most of the character types we need can be a single character, numbers may contain more than one character; refer to the design section. 

c.4. Construct a stringstream in the driver file with a series of possible possible problems. Try both the stringstream approach and directly inputting problems.

c.5. Test obscure and unexpected symbols, for example, ˆ ƒ ∂ @ ~. Ensure that these characters are ignored.

d. Test conditions for the bool operator in the driver file.

explicit operator bool( ) const

d.1. Set up conditions using the bool operator to read from cin in the driver file.

2. VariableStore

VariableStore::VariableItem

a. Create the VariableStore::VariableItem struct along with its data members. Declare an array of 26 VariableItem objects and begin implementing the default constructor.

VariableStore( )

a.1. The default constructor should initialize all 26 VariableItem objects to have a lowercase VariableItem.var_ value for each letter of the alphabet.

a.2. Print to cout (temporary code) to check whether both forms (upper and lowercase) as well as VariableItem.val_ are correctly assigned.

b. Implement set( ) method.

setVar( )

b.1. The first parameter determines the new value to be set while the second specifies which variable is being set. A temporary print method can be used to check whether these are correctly set. All invalid entries or variables should be ignored.

b.2. Confirm that the items are being correctly set from testing expected values such as combinations of constants and variables in the driver file. 

b.3. Test obscure values such as  ˆ ƒ ∂ @ ~ to make sure that an error message is displayed (if it interferes with the formatting of each AST). If all AST's are intact syntactically, test whether these obscure characters are being ignored.

c. Implement get( ) method.

getVar( )

c.1. Work with the set( ) method to test whether the correct values are being retrieved in the driver file.

3. AST::Node

a. Create the Node struct while declaring the AST class (before implementation).

AST::Node( )

a.1. Set up the data members along with a constructor that initializes tok_ to the Token passed via argument and the child nodes to nullptr.

4. AST

a. Create data members, constructor, copy constructor, and assignment operator.

AST( )
AST(AST& treePtr)
operator=(AST& treePtr)

a.1. Create the root node pointer and problems_ vector. The default constructor initialises the root node pointer to nullptr and the problems_  vector to an empty vector. 

a.2. Create the parameterized constructor that receives a vector of tokens. Push back the values of the passed vector into the problems_ vector via a loop. 

a.3. Create the copy constructor, ensuring that the copies made are deep copies and not simply copying pointers.

a.4. Implement the assignment operator.

b. Implement the destructor.

~AST( )

b.1. Since all binary trees will be dynamically allocated, the destructor will be responsible for deallocating all nodes within the trees.

c. Implement the simply( ) method.

simplify( )

c.1. Using a VariableStore object, testing the simplify( ) method requires looping through the entire problems_ vector, setting a new value to a specific variable when the := operator is used and/or substituting the numeric value of the variable in the problems_ vector.

c.1.1. Assigning variables using the := operator: 
Check that there are one or more expressions in the input with the := operator. Once simplify( ) is implemented, verify whether the newly assigned values are reflected in both the problems_ vector and the VariableStore object. Temporary print methods can be used for verification.

c.1.2. Substitution and storage:
Check that at least one variable in the input (the more the better). Also check whether there is at least one := included in the input. Use a temporary printer to check the problems_ vector before any variable assignments were made and after. Success is when the problems_ vector receives correct variable assignments that are properly stored in the VariableStore object.

d. Implement the toInfix( ) method.

toInfix( )

d.1. Reading the AST nodes in token form, this method should return a string of infix expressions. This can be tested by writing to cout. 

d.2. Implement call to Bridge::writeSolution( ), passing toInfix( ), ie Bridge::writeSolution(toInfix( )). Test the results by calling Bridge::getSolution( ) in the driver file.

5. ostream& operator<<( )

a. Overload the friend output stream operator.