METHOD AND SYSTEM FOR STORING PENDING CHANGES TO DATA
TECHNICAL FIELD
[0001] The present invention relates generally to a computer method and system for generating a computer program.
BACKGROUND
[0002] Computer programs are generally written in a high-level programming language (e.g., Java or C). Compilers are then used to translate the instructions of the high-level programming language into machine instructions, which can be executed by a computer. The compilation process is generally divided into 6 phases:
1. Lexical analysis
2. Syntactic analysis
3. Semantic analysis
4. Intermediate code generation
5. Code optimization
6. Final code generation
[0003] During lexical analysis, the source code of the computer program is scanned and components or tokens of the high-level language are identified. The compiler converts the source code into a series of tokens that are processed during syntactic analysis. For example, during lexical analysis, the compiler would identify the statement cTable=1.0; as the variable (cTable), the operator (=), the constant (1.0), and a semicolon. A variable, operator, constant, and semicolon are tokens of the high-level language.
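By way of illustration only, the lexical analysis of the example statement might be sketched as follows. The token categories, the regular expressions, and the function name tokenize are assumptions made for this sketch; they do not describe any particular compiler.

```python
import re

# Hypothetical token categories for a tiny C-like language.
TOKEN_SPEC = [
    ("constant", r"\d+(?:\.\d+)?"),
    ("variable", r"[A-Za-z_]\w*"),
    ("operator", r"==|[=+\-*/<>]"),
    ("semicolon", r";"),
    ("skip", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Scan the source text left to right and return (category, text) pairs."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        if match.lastgroup != "skip":  # whitespace is not a token
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(tokenize("cTable=1.0;"))
```

In this sketch the statement cTable=1.0; yields four tokens: a variable, an operator, a constant, and a semicolon.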
[0004] During syntactic analysis (also referred to as "parsing"), the compiler processes the tokens and generates a syntax tree to represent the program based on the syntax (also referred to as "grammar") of the programming language. A syntax tree is a tree structure in which operators are represented by non-leaf nodes
and their operands are represented by child nodes. In the above example, the operator ("=") has two operands: the variable (cTable) and the constant (1.0). The terms "parse tree" and "syntax tree" are used interchangeably in this description to refer to the syntax-based tree generated as a result of syntactic analysis. For example, such a tree may optionally describe the derivation of the syntactic structure of the computer program (e.g., may describe that a certain token is an identifier, which is an expression as defined by the syntax). Syntax-based trees may also be referred to as "concrete syntax trees," when the derivation of the syntactic structure is included, and as "abstract syntax trees," when the derivation is not included.
[0005] During semantic analysis, the compiler modifies the syntax tree to ensure semantic correctness. For example, if the variable (cTable) is an integer and the constant (1.0) is real, then during semantic analysis a real-to-integer conversion would be added to the syntax tree.
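By way of illustration only, the syntax tree for the example statement and the conversion added during semantic analysis might be sketched as follows. The Node class, its field names, and the "real_to_int" node kind are assumptions made for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str                      # e.g. "=", "variable", "constant", "real_to_int"
    value: str = ""
    data_type: str = ""            # "int" or "real"; empty for untyped operators
    children: list = field(default_factory=list)


def add_conversions(assign):
    """If an integer variable is assigned a real value, wrap the value in a conversion node."""
    target, value = assign.children
    if target.data_type == "int" and value.data_type == "real":
        assign.children[1] = Node("real_to_int", data_type="int", children=[value])
    return assign


# Syntax tree for cTable=1.0; the operator is the parent, its operands are the children.
tree = Node("=", children=[
    Node("variable", "cTable", "int"),
    Node("constant", "1.0", "real"),
])
add_conversions(tree)
print(tree.children[1].kind)
```

After the call, the right operand of the assignment is a conversion node whose only child is the original constant.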
[0006] During intermediate code generation, code optimization, and final code generation, the compiler generates machine instructions to implement the program represented by the syntax tree. The machine instructions can then be executed by the computer.
[0007] To develop a computer program, a programmer typically uses a text-based editor to specify letters, numbers, and other characters that make up the source code for the computer program. The text-based editor may store these characters in the source code file using an ASCII format and delimiting each line by an end-of-line character. After the source code file is created, the programmer runs a compiler to compile the source code into the corresponding object code for the computer program. As the compiler proceeds through its lexical analysis, syntactic analysis, and semantic analysis phases using the source code as input, it may detect an error in the source code. If the programmer has specified a syntactically incorrect statement in the source code, then the compiler may stop its compilation and output an indication of the incorrect statement. For example, the syntax may specify that "==" is the "equal to" operator, but the programmer may have inadvertently used "=", which may be the "assignment" operator, where the equal to operator should have been used. Once the programmer is notified of the error, the
programmer would use the text-based editor to correct the error and recompile the source code. A programmer may need to repeat this cycle of editing and compiling the source code many times until the compiler can complete the compilation of the source code.
[0008] Structured editors have been developed to assist programmers in the specifying of the source code for a computer program. In addition to performing the functions of the text editor, a structured editor may perform lexical and syntactic analysis as the source code is being entered by the programmer (referred to as "eager parsing"). A structured editor typically maintains a hierarchical representation of the source code based on the hierarchy of the programming language syntax. This hierarchical representation may be a syntax tree. As a programmer enters the characters of the source code, the structured editor may perform lexical and syntactic analysis. If the structured editor detects a lexical or syntactic error, it typically notifies the programmer and requires correction before the programmer can continue entering the source code. For example, if a programmer entered the assignment operator, rather than the equal operator, the structured editor would require the programmer to immediately correct the error.
[0009] A system has been described for generating and maintaining a computer program represented as an intentional program tree, which is a type of syntax tree. (For example, U.S. Patent No. 5,790,832 entitled "Method and System for Generating and Displaying a Computer Program" and U.S. Patent No. 6,097,888 entitled "Method and System for Reducing an Intentional Program Tree Represented by High-Level Computational Constructs," which are hereby incorporated by reference.) The system provides a mechanism for directly manipulating nodes corresponding to syntactic elements by adding, deleting, and moving the nodes within an intentional program tree. An intentional program tree is one type of "program tree." A "program tree" is a tree representation of a computer program that includes operator nodes and operand nodes. A program tree may also include inter-node references that are not tree-like in nature such as a reference from a declaration node of an identifier to the node that defines that identifier's type. An abstract syntax tree and a concrete syntax tree are examples of a program tree. Once a program tree is generated, the system performs the steps of semantic
analysis, intermediate code generation, code optimization, and final code generation to effect the transformation of the computer program represented by the program tree into executable code.
[0010] That system also provides commands for selecting a portion of a program tree, for placing an insertion point in the program tree, and for selecting a type of node to insert at the insertion point. The system allows various commands to be performed relative to the currently selected portion and the current insertion point. For example, the currently selected portion can be copied or cut to a clipboard. The contents of the clipboard can then be pasted from the clipboard to the current insertion point using a paste command. Also, the system provides various commands (e.g., "Paste =") to insert a new node at the current insertion point.
[0011] The system displays the program tree to a programmer by generating a display representation of the program tree. A display representation format specifies the visual representation (e.g., textual) of each type of node that may be inserted in a program tree. The system may support display representation formats for several popular programming languages, such as C, Java, Basic, and Lisp. This permits a programmer to select, and change at any time, the display representation format that the system uses to produce a display representation of a program tree. For example, one programmer can select to view a particular program tree in a C display representation format, and another programmer can select to view the same program tree in a Lisp display representation format. Also, one programmer can switch between a C display representation format and a Lisp display representation format for a program tree.
[0012] The system also indicates the currently selected portion of the program tree to a programmer by highlighting the corresponding display representation of the program tree. Similarly, the system indicates the current insertion point to a programmer by displaying an insertion point mark (e.g., "|" or "Λ") within the displayed representation. The system also allows the programmer to select a new current portion or re-position the insertion point based on the display representation.
[0013] Structured editors allow source code to be selected and modified on a syntactic-element basis. For example, a structured editor may allow a programmer to select an identifier, the expression that contains the identifier (e.g., the identifier, binary operator, and the other operand), the statement that contains the expression, and the procedure that contains the statement. For example, given the following source code: void foo ( ) {a=b+10} a structured editor would allow the selection of the identifier "b," the selection of the expression "b+10," the selection of the statement "a=b+10," or the selection of the entire "foo" procedure. The structured editor might not allow the programmer to select only the identifier and its binary operator because that would be an incomplete syntactic element. For example, the programmer could not select only "b+."
[0014] Structured editors have not been widely adopted. This lack of adoption results primarily from the restriction that source code can only be modified on a syntactic element basis. Because a text-based editor does not have this restriction, programmers typically prefer to develop computer programs using text-based editors. Nevertheless, programmers would like to sometimes select the source code on a syntactic element basis. Therefore, it would be desirable to have a development environment that would allow the flexibility of a text-based editor while allowing the selection of source code on a syntactic element basis as provided by a structured editor.
BRIEF DESCRIPTION OF THE DRAWINGS
[0015] Figure 1 is a block diagram illustrating an inconsistent change to example source code.
[0016] Figure 2 is a block diagram illustrating an example layout of a file that contains source code and a pending change list.
[0017] Figure 3 is a diagram illustrating a program tree representation of example source code.
[0018] Figure 4 is a diagram illustrating a program tree representation of example new code.
[0019] Figure 5 is a diagram illustrating a program tree of example source code along with a pending change data structure in one embodiment.
[0020] Figure 6 is a diagram illustrating a program tree of example source code along with a pending change list in another embodiment.
[0021] Figure 7 is a block diagram illustrating a modification to example source code.
[0022] Figure 8 is a diagram illustrating an example program tree with a pending change.
[0023] Figure 9 is a block diagram illustrating components of a pending change system for source code in one embodiment.
[0024] Figure 10 is a flow diagram illustrating the change source code component of the change system in one embodiment.
[0025] Figure 11 is a flow diagram illustrating a display routine associated with an assignment operator in one embodiment.
DETAILED DESCRIPTION
[0026] A method and system for storing pending changes to data having a data model is provided. As described below, the method and system may be used in a program development environment to provide advantages of both text-based editors and structured editors. In one embodiment, a pending change system receives a change that is to be made to data. When the change is "committed," the change system determines whether the change to the data will result in changed data that is consistent with or conforms to a data model. For example, if the data is source code for a computer program, then the change will be consistent with the data model (i.e., syntax of the programming language) when the change will result in source code that is syntactically correct. When the change system determines that the change is consistent, it updates the data accordingly. If, however, a change is not committed or is committed but inconsistent, the change system stores the change in a pending change data structure associated with the data. A change that has been committed and is consistent with the data model is referred to as a "recognized change" or an "accepted change." A change that has not yet been committed or that has been committed but is inconsistent is referred to as an
"unrecognized change" or a "pending change." For example, if the change is to add a valid statement to source code, then before being committed the change is unrecognized and after being committed the change is recognized. If the change is to add an invalid statement, however, then the change is unrecognized whether or not it was committed. By storing pending changes in a pending change list, the change system preserves the consistency of the data, but allows the changes, albeit possibly inconsistent changes, to be tracked. When the data is later displayed, the change system can examine the pending change list and display an indication of the pending changes along with the data. So, for example, the display may list source code with the invalid statement inserted as indicated by a pending change. The change system can allow a user to correct an inconsistent change or to commit the pending change so that the change can be recognized. A pending change that is inconsistent may become consistent due to a change in the data that does not modify the pending change. For example, a statement may be invalid because it declares an identifier to have a certain type that has not yet been defined. If a statement is later added that defines that type, the pending change becomes consistent without modifying the pending change itself. In this way, the change system can ensure that the consistency of the data is maintained, while tracking pending changes that may be inconsistent so that the changes can be corrected later.
[0027] In one embodiment, the pending change system is implemented as part of a program development environment that uses techniques typically associated with structured editors and other techniques typically associated with text-based editors. Thus, the program development environment might be characterized as a lightweight-structured editor that combines advantages of structured editors (e.g., semantic element selection) and text-based editors (e.g., flexible data entry), but minimizes the disadvantages of structured editors and text-based editors. Because the pending change system allows for the storing of changes that are not syntactically correct, a programmer can indicate a change to source code that makes it syntactically incorrect and have that change saved for later display and correction, rather than being forced to correct it immediately.
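By way of illustration only, the distinction between recognized and pending changes might be sketched as follows. The toy data model (every declared variable must use a defined type), the statement formats, and the names is_consistent and ChangeSystem are assumptions made for this sketch. Re-checking the pending list automatically after each change is likewise a choice of the sketch rather than a requirement of the description.

```python
def is_consistent(statements):
    """Hypothetical data model: every declared variable must use a defined type."""
    defined = {s.split()[1] for s in statements if s.startswith("type ")}
    used = {s.split(":")[1].strip() for s in statements if s.startswith("var ")}
    return used <= defined


class ChangeSystem:
    def __init__(self):
        self.data = []      # recognized changes only; always consistent
        self.pending = []   # [statement, committed] pairs not yet recognized

    def change(self, statement, committed=False):
        """Record a change, then recognize whatever has become recognizable."""
        self.pending.append([statement, committed])
        self.recognize()

    def recognize(self):
        """Move committed, consistent changes from the pending list into the data."""
        progress = True
        while progress:
            progress = False
            for entry in list(self.pending):
                statement, committed = entry
                if committed and is_consistent(self.data + [statement]):
                    self.data.append(statement)
                    self.pending.remove(entry)
                    progress = True


system = ChangeSystem()
system.change("var p: Point", committed=True)   # inconsistent: Point is undefined
system.change("type Point", committed=True)     # makes the earlier change consistent
system.change("type Color")                     # valid but not committed, so pending
print(system.data, system.pending)
```

The first change stays pending although it was committed, and it becomes recognized once the type is defined, without the pending change itself being modified.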
[0028] The pending change system can associate pending changes with the underlying data in many ways. For example, if the data is source code stored in a source code file in a non-structured manner, then the pending change list might be appended to the end of the file. As another example, if the source code is stored in a program tree, then the pending change list can be stored as a node of the program tree with a child node representing each pending change. Each pending change may specify the type of change (e.g., insertion), the portion of the data to which the change applies (e.g., insertion point), and the substance of the change (e.g., text to be inserted). When the pending change system displays the data, it can also display an indication of the pending changes. A user can select to display either the data by itself or the data as modified by the pending changes. When the data is source code, the pending change system may simply display text whose insertion is pending at its insertion point within the source code or not display portions of the source code whose deletion is pending.
[0029] The pending change system can be used in many environments other than conventional program development environments. Such environments may maintain Extensible Markup Language ("XML") documents, Unified Modeling Language ("UML") models, data stored in a database, and so on. When used in an XML environment, the change system may use an XML schema as the data model and track pending changes in an XML document. When used in the UML environment, the change system can use UML notations and semantics defined by the UML standards specification as the data model for a UML model. When used in a database environment, the change system may use the database schema and additional validation rules as the data model. As a user starts to make changes to the database, the change system may store the changes that have not yet been committed or are inconsistent in a change table. The change table identifies the database tables to be changed, the type of change, the new data, and so on.
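By way of illustration only, a change table in a database environment might be sketched as follows using an in-memory SQLite database. The employee table, its constraints (which stand in for the database schema and validation rules), the column names of change_table, and the function names are assumptions made for this sketch.

```python
import json
import sqlite3


def open_store():
    """Create a hypothetical data table and the change table that tracks pending changes."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employee (name TEXT NOT NULL, age INTEGER CHECK (age >= 0))")
    conn.execute("CREATE TABLE change_table (target_table TEXT, change_type TEXT, new_data TEXT)")
    return conn


def insert_employee(conn, row, committed):
    """Apply a committed, valid insertion; otherwise record it in the change table."""
    if committed:
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute("INSERT INTO employee (name, age) VALUES (:name, :age)", row)
            return "recognized"
        except sqlite3.IntegrityError:
            pass  # violates the data model, so fall through and keep it pending
    with conn:
        conn.execute(
            "INSERT INTO change_table VALUES (?, ?, ?)",
            ("employee", "insertion", json.dumps(row)),
        )
    return "pending"


conn = open_store()
print(insert_employee(conn, {"name": "Ada", "age": 36}, committed=True))
print(insert_employee(conn, {"name": "Bob", "age": -1}, committed=True))
print(insert_employee(conn, {"name": "Cy", "age": 20}, committed=False))
```

Only the first row reaches the employee table; the other two are kept in the change table together with the target table, the change type, and the new data.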
[0030] To facilitate the editing of data, the pending change system may convert data stored in a structured manner to unstructured data so that it can be changed by a user in an unstructured manner. For example, if source code is stored in a program tree, a user may select a statement to edit in an unstructured manner. The change system would generate the text corresponding to the
statement and allow the user to edit the text using conventional text-based editor techniques. The change system may add an entry to the pending change list that indicates to replace the selected statement with the modified text. When the change is eventually recognized, the pending change system can update the program tree and remove the entry from the pending change list.
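By way of illustration only, the conversion of a selected statement to text and the resulting pending change entry might be sketched as follows. The Node class, the function name liquefy, the dictionary layout of the entry, and the example statement are assumptions made for this sketch, which handles only leaves and binary operators and omits parenthesization.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    op: str
    operands: list = field(default_factory=list)


def liquefy(node):
    """Generate text for a node: leaves print as-is, binary operators print infix."""
    if not node.operands:
        return node.op
    left, right = node.operands
    return f"{liquefy(left)}{node.op}{liquefy(right)}"


# Program tree for the hypothetical statement a=b+c.
statement = Node("=", [Node("a"), Node("+", [Node("b"), Node("c")])])
pending_changes = []

text = liquefy(statement)
edited = text.replace("+c", "+")   # stand-in for the user's free-form text edit
pending_changes.append({"type": "replacement", "target": statement, "text": edited})
print(text, "->", edited)
```

The program tree is left untouched; the entry records that the selected statement is to be replaced by the edited text once the change is recognized.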
[0031] Figure 1 is a block diagram illustrating an inconsistent change to example source code. Block 101 illustrates the source code that is to be changed. Block 102 contains the new code that is to be inserted into the source code. The source code and new code may be stored in separate files. To effect the change, a user may select the new code and paste it into the source code. As shown in block 103, the user has selected to insert the new code before the closing parenthesis of the condition (i.e., "(a>10)") of the "if" statement. The insertion of the new code at this insertion point would result in source code that is syntactically incorrect. A conventional structured editor would attempt to parse the new code as it is inserted and would require the programmer to correct the source code at that time. Using such a structured editor, the programmer could correct the source code by, for example, undoing the insertion or moving the new code to a different position. The change system, in contrast, does not require the programmer to correct the source code at that time. Rather, the change system maintains an unchanged copy of the source code, which is syntactically correct, and stores an indication of the pending change in the pending change list associated with the source code in a persistent manner. When the programmer later displays the source code, the source code along with the pending change can be displayed as shown in block 103. At that point, the change system can allow the programmer to correct the change and to commit the change.
[0032] Figure 2 is a block diagram illustrating an example layout of a file that contains source code and a pending change list. The source code file 201 contains the source code 202 stored as text and the pending change list 203. Alternatively, the source code could be stored using a program tree or some other structured means. The source code could also be stored in the source code file both as text and as a program tree. The pending change list contains the pending change that is represented by the insertion point within the source code along with the new code to
be inserted. The new code can be stored as text or in some other way. For example, if new code itself was syntactically correct, then it could be stored as a program tree.
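By way of illustration only, a file layout in which the pending change list follows the source code text might be sketched as follows. The delimiter line, the JSON encoding, the entry fields, and the example source text are assumptions made for this sketch; the description does not prescribe a particular encoding.

```python
import json

# Hypothetical delimiter separating the source code from the pending change list.
MARKER = "\n#### PENDING CHANGES ####\n"


def save(source, pending):
    """Return file contents: the unchanged source text followed by the pending change list."""
    return source + MARKER + json.dumps(pending)


def load(contents):
    """Split file contents back into the source text and the pending change list."""
    source, found, tail = contents.partition(MARKER)
    return source, (json.loads(tail) if found else [])


source = "a=b+c;\nif (a>10) d=1;\n"   # hypothetical source text
pending = [{"type": "insertion", "point": source.index(")"), "text": "e=f;"}]

contents = save(source, pending)
print(load(contents) == (source, pending))
```

Because the source text is stored unchanged, it remains syntactically correct, while the insertion point and the new code survive a save and reload.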
[0033] A pending change list can have multiple entries with each entry representing a different pending change. Each pending change entry contains information that describes the pending change. A pending change can be categorized as an insertion, a deletion, or a replacement. A replacement could be represented as two pending changes, that is a deletion of the source code to be modified and an insertion of the modified source code as new code. Each pending change entry may include a change type (e.g., insertion or deletion). An entry for a deletion may identify the range to be deleted, and an entry for an insertion may identify the insertion point and the data to be inserted. The pending changes can be represented in other ways. For example, an insertion could be represented by storing information at the insertion point itself (e.g., a pending change node stored within a program tree). The pending changes can also be stored in various types of data structures such as a table, a linked list, a tree, and so on.
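By way of illustration only, pending change entries over text, including a replacement represented as a deletion plus an insertion, might be sketched as follows. The class names, the use of character offsets into the unchanged source, and the function names replacement and render are assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Insertion:
    point: int    # character offset in the unchanged source
    text: str     # data to be inserted


@dataclass
class Deletion:
    start: int    # first offset of the range to delete (inclusive)
    end: int      # end of the range (exclusive)


def replacement(start, end, text):
    """Represent a replacement as a deletion plus an insertion at the same position."""
    return [Deletion(start, end), Insertion(start, text)]


def render(source, pending):
    """Return the source as it would appear with every pending change applied."""
    out = []
    for i in range(len(source) + 1):
        # Show pending insertions whose insertion point is this offset.
        out.extend(c.text for c in pending if isinstance(c, Insertion) and c.point == i)
        deleted = any(isinstance(c, Deletion) and c.start <= i < c.end for c in pending)
        if i < len(source) and not deleted:
            out.append(source[i])
    return "".join(out)


source = "if (a=10) x=1;"
pending = replacement(5, 6, "==")   # replace "=" in the condition with "=="
print(render(source, pending))
print(render(source, []))
```

Rendering with an empty list shows the data by itself; rendering with the list shows the data as modified by the pending changes.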
[0034] Figure 3 is a diagram illustrating a program tree representation of example source code. The source code corresponds to the source code of Figure 1. Each non-leaf node represents an operator, and each child node represents the operands of the operator. Node 301 represents a statement aggregation operator, and nodes 302 and 307 represent operands of the statement aggregation operator. Node 302 represents the assignment operator, and nodes 303 and 304 represent its operands. Node 303 represents the variable "a," and node 304 represents the addition operator of an expression. Nodes 305 and 306 represent the operands of node 304. Node 305 represents the variable "b," and node 306 represents the variable "c." Nodes 307-321 similarly represent the operators and operands associated with the "if" statement of node 307. Figure 4 is a diagram illustrating a program tree representation of example new code. The new code corresponds to the new code of Figure 1. Nodes 401-413 represent the operators and operands of the new code.
[0035] Figure 5 is a diagram illustrating a program tree of example source code along with a pending change data structure in one embodiment. The source
code corresponds to the source code of Figure 3, and the change corresponds to the insertion of the new code of Figure 4. Nodes 301-321 represent the unchanged source code. Node 501 represents the pending change data structure. Node 501 may have a child node for each pending change. In this example, node 502 is the only child node because only one change is pending. Node 502 indicates that the change type is an insertion, which may be stored as an attribute of the node. Child nodes 503 and 401 are the operand nodes of the change "operator." Node 503 points to the insertion point within the source code (i.e., after node 310). Nodes 401-413 represent the new code to be inserted. Figure 6 is a diagram illustrating a program tree of example source code along with a pending change list in another embodiment. In this example, the new code is represented as text, rather than as a program tree, as indicated by node 504 and block 505, in contrast to nodes 401-413 of Figure 5.
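By way of illustration only, a pending change node that refers to an insertion point in a program tree might be sketched as follows. The Node class, the attribute names, the abbreviated example tree (the body of the "if" statement is omitted), the new code text, and the choice of insertion point are assumptions made for this sketch and do not reproduce the figures.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)
    attrs: dict = field(default_factory=dict)


def pending_insertion(after, new_code):
    """Build a change node whose operands are the insertion point and the new code.

    new_code may be a program tree (a Node) or plain text (a str).
    """
    point = Node("insertion point", attrs={"after": after})
    code = new_code if isinstance(new_code, Node) else Node("text", attrs={"text": new_code})
    return Node("change", [point, code], {"type": "insertion"})


# Abbreviated program tree: a=b+c followed by an "if" statement with condition a>10.
condition = Node(">", [Node("a"), Node("10")])
source = Node("statements", [
    Node("=", [Node("a"), Node("+", [Node("b"), Node("c")])]),
    Node("if", [condition]),
])

# The pending change list is a node with one child per pending change.
pending = Node("pending changes")
pending.children.append(pending_insertion(condition.children[1], "d=e;"))
print(pending.children[0].attrs["type"], len(source.children))
```

The nodes of the source code are not modified; the change node only holds a reference into the tree, so the stored program remains syntactically correct.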
[0036] Figure 7 is a block diagram illustrating a modification to example source code. In this example, the user has selected to modify the "if" statement. The change system has converted the "if" statement of the program tree into text, referred to as "liquefaction" or "unstructuring" of the structured representation. The change system then allows the user to modify the text using conventional non-structured text processing techniques. When the user has completed the modification, the change can be committed. The commitment can be signaled explicitly or implicitly. For example, the user can select a commit button to explicitly signal or select another statement to implicitly signal. In this case, since the condition portion of the "if" statement is not syntactically correct, the change is stored in the pending change list. Figure 8 is a diagram illustrating an example program tree with a pending change. In this case, the unstructured text representing the modified text is stored in block 805 below the replace node 802. The source code itself contains nodes 301-321 to represent the entire unmodified source code.
[0037] Figure 9 is a block diagram illustrating components of a pending change system for source code in one embodiment. The pending change system 900 includes controller 901, change source code component 902, parse source code component 903, display source code component 904, and source code store
905. The controller provides a user interface through which a user can create, store, and modify source code. The controller may implement traditional structured editor techniques in addition to techniques made available as a result of the pending change list. The change source code component is invoked to allow a user to change the source code and to save unrecognized changes in a pending change list. The parse source code component is invoked to perform the syntactic analysis to determine whether source code is consistent with the syntax of the programming language. The display source code component is invoked to display source code along with any pending changes. The source code store stores the source code of the computer programs. The source code store may be a conventional storage device accessed through a conventional file system.
[0038] The pending change system may be implemented on a computer that may include a central processing unit, memory, input devices (e.g., keyboard and pointing devices), output devices (e.g., display devices), and storage devices (e.g., disk drives). The memory and storage devices are computer-readable media that may contain instructions that implement the pending change system. In addition, the data structures and message structures may be stored or transmitted via a data transmission medium, such as a signal on a communications link. Various communications links may be used, such as the Internet, a local area network, a wide area network, or a point-to-point dial-up connection.
[0039] Figure 10 is a flow diagram illustrating the change source code component of the change system in one embodiment. The component displays the source code including indications of any pending changes, receives a change type command, and inputs the change. The component then determines whether the change is to be committed. If committed, then the component parses the source code with the change. If the source code with the change is syntactically correct, then the component effects the change. If the change is not syntactically correct or has not yet been committed, then the component stores the change in the pending change list associated with the source code. The component is passed a copy of the source code along with its associated pending change list. In block 1001, the component invokes the display source code component to display the source code along with the pending changes. In decision block 1002, if the user indicates to
insert text, the component continues at block 1003, else the component continues to identify the type of change and to process accordingly as indicated by the ellipsis. A user can indicate the insertion command by selecting an insertion point and starting to enter text. A user can indicate the delete command by selecting a range of text and pressing the delete key on a keyboard. In block 1003, the component receives the new code as text to be inserted at the insertion point. In decision block 1004, if the user indicates to commit the change, then the component continues at block 1005, else the component continues at block 1008. In block 1005, the component parses the changed source code. In decision block 1006, if the changed source code was successfully parsed, then the component continues at block 1007, else the component continues at block 1008. In block 1007, the component updates the source code to reflect the insertion of the new code and completes. This updating may include changing the program tree representation of the source code that is stored in the source code store. In block 1008, the component stores the change in the pending change list associated with the source code and updates the source code store as appropriate and then completes.
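By way of illustration only, the insertion path of blocks 1003-1008 might be sketched as follows. Python's own parser (ast.parse) is swapped in as a stand-in for the parse source code component, so Python syntax serves as the data model here. The function names and the tuple layout of a pending entry are assumptions made for this sketch, and the display of block 1001 is omitted.

```python
import ast


def parses(source):
    """Stand-in for the parse source code component (Python syntax as the data model)."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


def insert_text(source, pending, point, new_code, commit):
    """Apply the insertion if it is committed and parses; otherwise keep it pending."""
    if commit:                                                   # decision block 1004
        changed = source[:point] + new_code + source[point:]     # block 1005
        if parses(changed):                                      # decision block 1006
            return changed, pending                              # block 1007
    return source, pending + [("insertion", point, new_code)]    # block 1008


source = "a = b + c\n"
end = len(source)
print(insert_text(source, [], end, "d = a * 2\n", commit=True))
print(insert_text(source, [], end, "d = a *\n", commit=True))
print(insert_text(source, [], end, "d = a * 2\n", commit=False))
```

Only the first call changes the source code; the second is committed but does not parse, and the third parses but is not committed, so both are stored as pending changes.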
[0040] Figure 11 is a flow diagram illustrating a display routine associated with an assignment operator in one embodiment. Each operator type may have an associated display routine. To display the source code, the change system invokes the display routine associated with the root node of the program tree. That display routine effects the display of that node by invoking the display routines for each of its operand nodes, which in turn invoke the display routines of their operands. Each display routine may invoke a display pending change component to apply the effect of any pending changes relating to the display of the source code for that node. For example, if a pending change is to change an assignment operator to an equal to operator within the condition of an "if" statement, the routine would update the generated display representation accordingly. In block 1101, the routine invokes the display routine for the left operand of the assignment operator. In block 1102, the routine displays the effect of any pending changes relating to the position immediately before the assignment operator. For example, the routine may display text corresponding to a pending insertion. In block 1103, the routine displays the assignment operator. In block 1104, the routine displays the effect of any pending
changes relating to the position immediately after the assignment operator. For example, if the assignment operator is within the range of a pending delete, then the routine may suppress the display of the assignment operator. In block 1105, the routine invokes the display routine for the right operand of the assignment operator and then returns.
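By way of illustration only, a display routine for the assignment operator that takes pending changes into account might be sketched as follows. The Node class, the keying of pending changes by node identity, and the keys "insert_before", "insert_after", and "delete_operator" are assumptions made for this sketch, which supports only leaf operands.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    op: str
    operands: list = field(default_factory=list)


def display(node, pending):
    """Dispatch to the display routine associated with the node's operator type."""
    if node.op == "=":
        return display_assignment(node, pending)
    return node.op   # leaf nodes display as their own text in this sketch


def display_assignment(node, pending):
    changes = pending.get(id(node), {})
    left, right = node.operands
    out = display(left, pending)                   # block 1101: left operand
    out += changes.get("insert_before", "")        # block 1102: pending text before the operator
    if not changes.get("delete_operator", False):  # block 1103: operator, unless deletion is pending
        out += "="
    out += changes.get("insert_after", "")         # block 1104: pending text after the operator
    out += display(right, pending)                 # block 1105: right operand
    return out


assign = Node("=", [Node("a"), Node("10")])
# Pending change: replace the assignment operator with the equal to operator.
pending = {id(assign): {"delete_operator": True, "insert_before": "=="}}
print(display(assign, {}))
print(display(assign, pending))
```

The program tree still holds the assignment operator; only the generated display representation reflects the pending change.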
[0041] Although the present invention has been described in terms of preferred embodiments, it is not intended that the invention be limited to these embodiments. Modifications within the spirit of the invention will be apparent to those skilled in the art. For example, one skilled in the art will appreciate that changes can be specified in many different ways, such as via a keyboard, a mouse, a handwriting tablet (e.g., using gestures), voice recognition, and so on. The scope of the present invention is defined by the claims that follow.