Monday, December 17, 2012

Alpha-Beta and Heisenberg's Uncertainty Principle

"The lack of empirical support for the CAPM may be due to the inappropriateness of some assumptions made to facilitate the empirical analysis of the model." ~Jagannathan and Wang (1996)

In physics, complementarity is a basic principle of quantum theory closely identified with the Copenhagen interpretation, and refers to effects such as "wave–particle duality," in which different measurements made on a system reveal it to have either particle-like or wave-like properties.

Niels Bohr is usually associated with this concept, which he developed at Copenhagen with Heisenberg as a philosophical adjunct to the mathematics of quantum mechanics and in particular the Heisenberg uncertainty principle. The Heisenberg uncertainty principle states that certain pairs of physical properties, like position and momentum, cannot both be known with precision. That is, the more precisely one property is known, the less precisely the other property can be known.
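For readers who want the principle in symbols: in its modern form (Kennard's formulation, a standard restatement rather than Heisenberg's original 1927 inequality) it reads

```latex
\sigma_x \, \sigma_p \;\ge\; \frac{\hbar}{2}
```

where σx and σp are the standard deviations of position and momentum, and ħ is the reduced Planck constant.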

In Chinese philosophy, "yīn yáng" describes how polar or seemingly contrary forces are interconnected and interdependent in the natural world, and how they give rise to each other in turn. Yin and yang are complementary opposites within a greater whole. Everything has both yin and yang aspects, although one may manifest more strongly in a given object or at a given time. Yin and yang constantly interact, never existing in absolute stasis, as symbolized by the taijitu.

A similar paradox exists within the CAPM paradigm involving the relationship between "beta," as determined by the market portfolio, and "alpha," which loosely represents "a proxy for manager skill". As implied by our prior post, "The CAPM Debate and the Search for 'True Beta'", the yin-yang "whole" corresponds to the "True Beta" concept, which Jagannathan and Wang (1996) theorized must encompass "the aggregate wealth portfolio of all agents in the economy".

Moreover, one could map "beta" onto the symbolism of "yin," usually characterized as slow, diffuse, tranquil, feminine, and associated with night; and "alpha" onto the symbolism of "yang," which by contrast is characterized as fast, hard, focused, masculine, and associated with day.

Schneeweis (1999) investigates this alpha-beta paradox in his article, "Alpha, Alpha, Whose got the Alpha?" wherein he writes about the problem of measuring "alpha" by questioning "how to define the expected risk of the manager’s investment position".

When marketing "alpha," managers often assume "the reference benchmark is the appropriate benchmark and that the strategy has the same leverage as the benchmark". Unfortunately, "[w]ith the exception of a strategy that is designed to replicate the returns of the benchmark, the alpha generated by this approach is essentially meaningless". Hence, investors often mistakenly rely on a single-index model as the benchmark from which to gauge the factors "driving the return of the strategy," when often a "multi-factor model should be used to describe the various market factors that drive the return strategy".

The problem is that statistically it is "better to over-specify a model… than to under-specify. If the model is over-specified, many of the betas will simply be zero. However, if under-specified, there is the possibility of significant bias".
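To see the under-specification bias concretely, here is a small simulation (all numbers and factor names are invented for illustration, not drawn from Schneeweis): a manager with zero true alpha appears to have positive alpha under a single-index model, because the premium earned by the omitted factor is absorbed into the intercept.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Two return drivers: a market factor and a second factor with a
# non-zero mean (think of a size or credit premium). Illustrative only.
f1 = rng.normal(0.0, 0.04, n)          # market factor
f2 = 0.003 + rng.normal(0.0, 0.02, n)  # omitted factor earning a premium
eps = rng.normal(0.0, 0.01, n)         # idiosyncratic noise

true_alpha, b1, b2 = 0.0, 1.0, 0.5
r = true_alpha + b1 * f1 + b2 * f2 + eps  # manager's "skill-free" returns

def ols_alpha(factors, y):
    """Intercept from an OLS fit of y on the given factors plus a constant."""
    X = np.column_stack([np.ones(len(y))] + list(factors))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[0]

alpha_single = ols_alpha([f1], r)      # under-specified: f2 omitted
alpha_multi  = ols_alpha([f1, f2], r)  # correctly specified

# The single-index "alpha" absorbs roughly b2 * mean(f2) of the omitted
# factor's premium; the multi-factor alpha stays near the true value of zero.
print(alpha_single, alpha_multi)
```

Note that over-specifying (adding an irrelevant third factor) would simply drive that factor's estimated beta toward zero, which is the asymmetry the quote above describes.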

Which brings us back to the Heisenberg uncertainty principle...

Just like the physical properties of position and momentum cannot both be known with precision, the properties of "alpha" and "beta" also cannot be measured precisely. This statement can be interpreted in two different ways:

According to Heisenberg, it is impossible to determine both properties simultaneously with any great degree of accuracy or certainty. According to Ballentine, however, this is not a statement about the limitations of a researcher's ability to measure particular quantities of a system, but a statement about the nature of the system itself, as described by the equations.



References:
Ballentine, L.E. (1970). "The Statistical Interpretation of Quantum Mechanics." Reviews of Modern Physics, Vol. 42, pp. 358–381.

Bohr, Niels. Atomic Physics and Human Knowledge, p. 38.

Heisenberg, W. (1927). "Über den anschaulichen Inhalt der quantentheoretischen Kinematik und Mechanik." Zeitschrift für Physik, Vol. 43, pp. 172–198.

Jagannathan, Ravi; McGrattan, Ellen R. (1995). "The CAPM Debate." Federal Reserve Bank of Minneapolis Quarterly Review, Vol. 19, No. 4, Fall 1995, pp. 2–17.

Jagannathan, Ravi; Wang, Zhenyu (1993). "The CAPM is Alive and Well." Research Department Staff Report 165, Federal Reserve Bank of Minneapolis.

Jagannathan, Ravi; Wang, Zhenyu (1996). "The Conditional CAPM and the Cross-Section of Expected Returns." Journal of Finance, Vol. 51, No. 1, March, pp. 3–53.

Schneeweis, Thomas (1999). "Alpha, Alpha, Whose got the Alpha?" University of Massachusetts, School of Management, October 5, 1999.

Sunday, December 9, 2012

Solving Data Governance by Scaling Agile/Scrum

"Do not repeat the tactics which have gained you one victory, but let your methods be regulated by the infinite variety of circumstances." ~Sun Tzu, The Art of War

Financial industry response to new regulations is always interesting and the Dodd-Frank Act is no exception. Peel back the brouhaha, however, and one discovers a mission statement you would think financial institutions would want to embrace:

“Reduce risk, increase transparency, promote market integrity.”[1]

The challenge, it seems, is implementing regulatory compliance while encouraging innovation, which at first glance looks like a dichotomy. Stricter regulation, however, effectively acts as a tax on inefficiency. In practice, compliance encourages firms to:
  • Assess both the design and operating effectiveness of internal controls 
  • Understand the flow of transactions including technological aspects 
  • Evaluate controls designed to prevent or detect fraud and other risks 
  • Rely on management’s work based on competency and transparency 
  • Scale requirements considering the size and complexity of the company 
Not surprisingly, Protiviti's 2011 Sarbanes-Oxley Compliance Survey found that, after the first year, most companies believe the benefits outweigh the costs and continue to leverage compliance efforts to improve their organizations.[2]

When all is said and done, regulatory requirements come down to data management. Legislation like Sarbanes-Oxley and Dodd-Frank has ushered in the necessity of adopting a data governance program to align information accountabilities among stakeholders, and to foster intelligent collaboration between the business and technology.
“Data governance is a set of processes that ensures that important data assets are formally managed throughout the enterprise. Data governance ensures that data can be trusted and that people can be made accountable for any adverse event that happens because of low data quality. It is about putting people in charge of fixing and preventing issues with data so that the enterprise can become more efficient. Data governance also describes an evolutionary process for a company, altering the company’s way of thinking and setting up the processes to handle information so that it may be utilized by the entire organization. It’s about using technology when necessary in many forms to help aid the process. When companies desire, or are required, to gain control of their data, they empower their people, set up processes and get help from technology to do it.”[3] 
The key is providing checks and balances between those who create or collect information and those who consume or analyze it. In any enterprise, much less a large institution, this is not an easy task.

Some stakeholders are concerned with operational systems and data, while others care mostly about analysis, reporting, and decision-making. In fact, the needs of stakeholders who are concerned about data quality and controlling access to information may conflict with those of stakeholders who want to increase the ability to acquire and share content, records, and reports. In addition, these needs must consider risk management, data security, and legal issues. To make matters more complicated, stakeholders tend to have different vernaculars for describing their assumptions, requirements, drivers, and constraints.

The question is how best to implement data governance within an organization. It is one thing for a company to desire, or be required, "to gain control of their data"; it is altogether another to "empower their people" and do it in practice.

The answer to the above question may exist in applying Agile/Scrum methodologies and scaling the agile mindset across the enterprise by implementing a matrix organization.

Agile/Scrum Basics and Empirical Control Processes

For those not familiar with the term, agile is a “lightweight” methodology based on iterative and incremental development, where requirements and solutions evolve through collaboration between self-organizing, cross-functional teams. The approach evolved in the 1990s as a reaction against “heavyweight” methods, which are characterized as regimented and micromanaged, i.e., the traditional waterfall model of development.

Adaptive (change-driven) methods focus on adapting quickly to changing realities. Predictive (plan-driven) methods, in contrast, rely on analysis and planning, and often institute a change control process to try to manage the project’s outcome. The irony of the latter approach is that planning for delivery of requirements often takes place when the least is known about the best possible solution and how to implement the desired outcome (ref: cone of uncertainty).

Figure 1. Iron Triangle Waterfall / Agile Paradigm Shift

That said, if both the problem and the solution are known, a waterfall approach may be more suitable. However, if there are unknowns, an agile approach allows incremental maturation and implementation. In reality, methods exist on a continuum from adaptive to predictive; the best method for managing a project therefore depends on the context of the situation.

Scrum is one form of agile that works well when the problem is known but the solution is unknown, such as in software development. Kanban, on the other hand, generally works well in an operations environment where we know what skills are involved, but not the scope of the work itself. In situations where there are unknown unknowns, such as in research and development, Lean Startup helps facilitate experimentation and validated learning.

Figure 2A. The Spectrum of Process Complexity 
Figure 2B. Development Methods Based on Degrees of Uncertainty

In this article we focus on how to scale Agile/Scrum to implement a data governance program. Scrum itself is a framework within which people can address complex adaptive problems, while productively and creatively delivering products of the highest possible value. Scrum is lightweight and simple to understand, but difficult to master.

Review of Scrum Basics [4]

Scrum consists of Scrum Teams and their associated roles, events, artifacts, and rules. It employs an iterative, incremental approach founded on three pillars—transparency, inspection, and adaptation—to optimize predictability and control risk.

The Scrum Team is made up of a Product Owner, the Development Team (6 ±3 members), and a Scrum Master. Teams are full-time, self-organizing, and cross-functional. Self-organizing teams choose how best to accomplish their work, rather than being directed by others. Cross-functional teams have all the competencies needed to accomplish the work without depending on others not part of the team. The team model in Scrum is designed to optimize flexibility, creativity, and productivity.

Core Scrum prescribes four events: Sprint Planning, Daily Scrum, Sprint Review, and Sprint Retrospective. The heart of Scrum is a Sprint, a time-box of one month or less during which a useable and potentially releasable product Increment is created.

Sprints have consistent durations throughout a development effort. A new Sprint starts immediately after the conclusion of the previous Sprint. Other than the Sprint itself, which is a container for all other events, each event in Scrum is a formal opportunity to inspect and adapt the product being developed or improve the work process.

Figure 3. Source: Roger W. Brown, MS, CSC, CST http://www.agilecrossing.com/ [5]

Scrum’s artifacts represent work that is useful in providing transparency and opportunities for inspection and adaptation. The Product Backlog is an ordered list of everything that might be needed in the product, and is the source of requirements for any changes to be made to the product. The Sprint Backlog is a subset of Product Backlog items (i.e., requirements) selected for the Sprint plus a plan for realizing the Sprint Goal.

The Sprint Backlog defines the work the Development Team will perform to turn Product Backlog items into a “Done” Increment. The Development Team tracks remaining work for every Daily Scrum and modifies the Sprint Backlog throughout the Sprint. In this way, the Product Increment emerges as the Development Team works through the plan and learns more about the work needed to achieve the Sprint Goal.
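The relationship between the artifacts above can be sketched as a simple data model. This is our own illustration, not anything prescribed by Scrum; the class and field names are invented:

```python
from dataclasses import dataclass, field

@dataclass
class BacklogItem:
    title: str
    priority: int   # lower number = higher priority
    estimate: int   # e.g., story points
    done: bool = False

@dataclass
class ProductBacklog:
    items: list = field(default_factory=list)

    def ordered(self):
        # The Product Backlog is an ordered list; here we order by priority.
        return sorted(self.items, key=lambda i: i.priority)

@dataclass
class SprintBacklog:
    goal: str
    items: list = field(default_factory=list)

    def remaining_work(self):
        # Tracked by the Development Team at every Daily Scrum.
        return sum(i.estimate for i in self.items if not i.done)

def plan_sprint(product_backlog, goal, capacity):
    """Pull the highest-priority items that fit the team's capacity."""
    sprint = SprintBacklog(goal=goal)
    used = 0
    for item in product_backlog.ordered():
        if used + item.estimate <= capacity:
            sprint.items.append(item)
            used += item.estimate
    return sprint
```

As items are marked done during the Sprint, `remaining_work()` shrinks, which is essentially the quantity behind a burndown chart.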

Elements of a Successful Data Governance Framework 

Because data management in an organization is a non-linear problem, data governance and stewardship lend themselves to an agile method. For example, different stakeholders will have different priorities—some are mainly concerned with data quality or business intelligence, while others are focused on policy enforcement, access control, and/or setting standards.

In order to accomplish successful data governance, clarification of the “why, what, who, where, when, and how” needs to evolve through collaboration across the enterprise. Hence, a data governance program should be designed holistically, taking into consideration the interdependencies of data definitions, standards and rules, decision rights, accountabilities, and process controls. Within the context of a typical enterprise, the complexity of such a design requires a structure and methodology to guide the effort, and generate worthwhile results.

Program delivery structure 

Borrowing from the Project Management Institute’s (PMI) concept of a project management office (PMO), a data governance office (DGO) serves as the organizational body responsible for overseeing the initiative and related communications. Because data governance may involve a collection of related projects managed in a coordinated way, the various project workstreams resemble what PMI describes as program management.

This is not to say we are advocating a traditional project management approach. As discussed in the next section, Agile/Scrum can be scaled across an enterprise to achieve data governance and stewardship objectives. In this context, the Product Owner’s role is to maximize the value of work done on behalf of the DGO. To help guide the DGO’s efforts, core Scrum can be extended to include a vision statement, product roadmap, and release planning. See The Roadmap to Value.[6]

Just like any other meaningful effort, a business case for data governance should articulate the benefits of the project, align the project to enterprise strategy, and justify use of resources. A Vision Statement is owned by the DGO/Product Owner and provides context for how current development supports the overall goal. In effect, it aligns the objectives of data governance with the business case and sets strategic expectations.

A Product Roadmap is a statement of intent that is credible and potentially achievable: a planned future laid out in broad strokes, outlining proposed release themes. Typically a Product Roadmap lists high-level functionality targeted within a quarterly period and extends two to four significant feature releases into the future. Roadmaps are not commitments, however; they are “desirements” for the future given what is known today.

Release Planning identifies the highest-priority features and provides a focal point, called a Release Goal, for the Scrum Team to mobilize around. A release may consist of several Sprints, with the Release Sprint coming last. Release Planning is the process by which the Product Backlog is managed, and it sets up Stage 4, Sprint Planning.

Program content development 

While the above discussion provides a “structure” and methodology for delivering a data governance program, it does not provide a framework for developing and managing the “content” of such an initiative.

Scrum is ideal for new product development such as software. The “product features” that data governance seeks to “release,” by contrast, primarily revolve around Business Process Management (BPM). And while technology systems may need to be developed, implemented, and/or upgraded to achieve the goals of a data governance and stewardship initiative, recall Steve Sarsfield’s (2009) definition:
“Data governance is a set of processes that ensures that important data assets are formally managed throughout the enterprise. Data governance ensures that data can be trusted and that people can be made accountable for any adverse event that happens because of low data quality. It is about putting people in charge of fixing and preventing issues with data so that the enterprise can become more efficient.” 
With Sarsfield’s definition in mind, the development of a data governance initiative is largely an exercise in business analysis. The International Institute of Business Analysis’ (IIBA) Business Analysis Body of Knowledge (BABOK) provides a holistic and flexible framework around which a data governance initiative can be mapped.

Figure 5. Adapted from graphic based on IIBA BABOK diagram [7]

Leveraging the BABOK framework, the DGO designs, governs, and manages the data governance program through the establishment of a governance backlog, rules of engagement, performance controls, and communications management.

A Governance Backlog consists of Epics (e.g., Use Cases) captured as modified User Stories using the “ABC” approach. That is, if we do A, then we can expect B, which should lead to C. The “content” of these Epics should cover at least the following:
  • Policies and guidelines (including regulations) 
  • Stakeholder accountabilities and decision rights 
  • Requirements (business, stakeholder, functional, non-functional, transition) 
  • Data standards (rules and definitions including metadata) 
  • Data processes and controls (gap analysis, solution approach) 
  • Data risk management (legal, audit and compliance) 
  • Data repositories and technology systems (systems of record) 
  • Critical success factors (definition of “Done”) 
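The “if we do A, then we can expect B, which should lead to C” structure can be captured in a lightweight record. The field names and the sample epic below are our own invention, offered only to make the ABC pattern concrete:

```python
from dataclasses import dataclass

@dataclass
class GovernanceEpic:
    """A Governance Backlog entry in the 'ABC' form:
    if we do `action` (A), we expect `benefit` (B),
    which should lead to `consequence` (C)."""
    category: str        # e.g., "Data standards", "Decision rights"
    action: str          # A
    benefit: str         # B
    consequence: str     # C
    done_criteria: list  # critical success factors / definition of "Done"

    def as_story(self):
        return (f"If we {self.action}, then we can expect {self.benefit}, "
                f"which should lead to {self.consequence}.")

# A hypothetical epic under the "Data standards" category.
epic = GovernanceEpic(
    category="Data standards",
    action="publish metadata definitions for trade records",
    benefit="consistent interpretation across desks",
    consequence="fewer reconciliation breaks",
    done_criteria=["definitions signed off by stewards",
                   "published in the data dictionary"],
)
print(epic.as_story())
```

Keeping the definition of “Done” on the record itself means each epic carries its own critical success factors into Sprint Planning.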
Data governance BPM requirements encompass elicitation (collecting, creating), analysis (aligning, prioritizing), and validation (monitoring, enforcing) of rules involving data (e.g., business rules, data standards, data ownership, data classification, data quality, data usage, data access, authentication, entitlements, system of record, golden copy, etc.).

In the scaled Agile/Scrum structure contemplated in this article, the DGO does not, apart from design, perform the work of developing and deploying the program; it participates by driving, guiding, and governing through various mechanisms.

Rules of Engagement are a key consideration in a robust data governance program. Who has the right to make what types of decisions, and when, including the protocols to be followed, needs to be understood by all stakeholders; this can be formal or informal. Related to Rules of Engagement are Stakeholder Accountabilities, which involve formalizing responsibilities for program participants. Issues with data can often be traced to unclear accountabilities somewhere along the data process flow. Any time new processes, practices, or controls are introduced, stakeholder accountability needs to be assigned.

Performance Control is another key responsibility of the DGO, and is necessary to ensure that data governance activities are transparent and executed as effectively as possible. This task involves determining which metrics will be used to measure the work performed by program participants, keeping in mind the Agile principle of being “barely sufficient” rather than “gold-plating,” to stay practical and efficient.
“The more robust a process or tool, the more you spend on its care and feeding, and the more you defer to it. With people front and center, however, the result is a leap in productivity.”[8] 
Last but not least, Communications Management is critical in aligning stakeholders and evolving a common understanding. Because stakeholders represent people from different backgrounds and business domains, robust two-way communications is the most challenging aspect of a data governance program. As we shall see in the next section, a scaled Agile/Scrum approach is well designed to help facilitate two-way communications and accountabilities.

Scaling Agile/Scrum to Implement Data Governance 

The perception in most organizations is that an investment in greater humanity reduces the bottom line, and an increase in business performance takes a toll on the humanity of people at work. Agile inverts this relationship by recognizing that in the knowledge economy, creating healthy workplaces in which people are inspired to deliver solutions is critical to engaging talent,[9] delivering solutions, and gaining a competitive advantage.[10]

The Agile Manifesto[11] reads, in its entirety, as follows:
We are uncovering better ways of developing software by doing it and helping others do it. Through this work we have come to value:
  • Individuals and interactions over processes and tools 
  • Working software over comprehensive documentation 
  • Customer collaboration over contract negotiation 
  • Responding to change over following a plan 
That is, while there is value in the items on the right, we value the items on the left more. 

With the above in mind, successful data management boils down to individuals and interactions over processes and tools. Data, after all, is meaningless until transformed into information that people put into context and use in a meaningful way.

Dependency Management in a Large Agile Environment

Data stakeholders come from across the organization and include people and systems that create and use data, and those who set rules and requirements for data.

Some stakeholders may be concerned with all of the data in a particular domain, while others care only about a limited data set over which they assume ownership. Likewise, a data architect may be concerned with metadata considerations, while audit and compliance care mostly about who has access and control. Another consideration is stakeholders being left out of the loop on data-related decisions. Other participants, such as technology, may serve as proxies for actual stakeholders.

Aside from these issues, the majority of enterprises are not disciplined in how they track information and do not assign accountabilities for managing specific data assets.

Be that as it may, organizations typically have data stewards at various points along the data process flow. Stewards are often not formally assigned accountability but rather take on the responsibility of managing data pursuant to their own needs. These are the people who, in the normal course of business, are involved in the definition, production, and usage of data. Not only are they involved; they are de facto the people making decisions about data, whether responsibility is formally assigned or not.

While data stewards play an integral role in data governance, the responsibility for conducting and managing stakeholder analysis, and for making sure all data stewards are appropriately represented and accountable, rests with the DGO. In addition, the DGO needs to articulate and advocate the value of data governance and stewardship activities, and provide ongoing stakeholder communication, access, record-keeping, and education (CARE). Using soft skills to influence alignment of knowledge and practices, the DGO/Product Owner serves as a liaison between stakeholders and data stewards.

The remaining question, then, is how a DGO/Product Owner can practically manage such a complex process, especially in an organization that embraces an Agile/Scrum approach.

Matrix organization model 

One of the more exciting challenges for Agile enthusiasts is scaling such practices across an enterprise. Having recently been involved in a project with many of the same data management concerns described above, we found Kniberg and Ivarsson’s (2012) white paper, “Scaling Agile @ Spotify with Tribes, Squads, Chapters & Guilds,” inspiring.

The model they describe represents a matrix organization (see Figure 6) and suggests a means of designing a robust data governance and stewardship program.[12] These concepts can also be applied to a traditional enterprise for data governance.

 Figure 6. Based on Kniberg and Ivarsson (2012). “Scaling Agile @ Spotify” 

The basic unit of development is a Scrum Team called a “Squad”.[13] Each Squad is a self-contained, cross-functional, self-organizing team with the skills needed to execute and be responsible for a long-term mission aligned to the goals of a business case. Because each Squad focuses on one mission related to a business case, it becomes expert in that area.

Squads are aggregated into collections called “Tribes,” which work in related areas. Each Tribe has a Tribe Lead who is responsible for providing the best possible habitat for the Squads within that Tribe. Squads in a Tribe work best when physically located in the same office, so as to promote collaboration between Squads. Tribes hold regular gatherings where Squads show the rest of the Tribe what they have delivered, what they are working on, and what others can learn from their work. The white paper describes Tribes as “incubators” for Squads, each of which is designed to feel like a mini-startup.

The downside to autonomy is a loss of economies of scale. For example, a tester in Squad A may be wrestling with a problem that the tester in Squad B solved last week. To mitigate this issue, and to better integrate the enterprise and foster better communications, Chapters and Guilds are formed to glue the organization together. This structure provides for some economies of scale without sacrificing too much autonomy.

Figure 7. Based on Kniberg and Ivarsson (2012). “Scaling Agile @ Spotify”

A Chapter is a family of people with similar skills, working within the same general competency area within the same Tribe. Each Chapter meets regularly to discuss its area of expertise and its specific challenges. Leading the Chapter is a supervisor with traditional functional-manager responsibilities, called a Chapter Lead. As the white paper explains, the Chapter Lead at Spotify is also part of a Squad, involved in day-to-day work, which helps “ground” the role.

A Guild is a community of interest that cuts across Tribes and the organization, consisting of people who want to share knowledge and practices. Each Guild has a “Guild Coordinator” and may include all related Chapters working within a competency or functional area. However, anybody interested in the knowledge being shared can join any Guild.
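The cross-cutting nature of these groupings can be sketched as a small data model. The people, squads, and guilds below are hypothetical; the point is that a Chapter groups a skill within one Tribe, while a Guild crosses Tribes:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Member:
    name: str
    squad: str    # vertical: the "what" (mission / delivery)
    tribe: str    # collection of squads in a related area
    chapter: str  # horizontal: the "how" (competency within a tribe)

members = [
    Member("Ann",  squad="pricing",      tribe="market data",    chapter="testing"),
    Member("Ben",  squad="pricing",      tribe="market data",    chapter="backend"),
    Member("Cara", squad="entitlements", tribe="reference data", chapter="testing"),
]

# Guilds cut across tribes: anyone interested may join.
guilds = {"data stewardship": {"Ann", "Cara"}}

def chapter_mates(members, tribe, chapter):
    """A Chapter groups similar skills *within* one tribe."""
    return [m.name for m in members if m.tribe == tribe and m.chapter == chapter]

# Ann and Cara share a competency but sit in different tribes,
# so they meet in the guild, not in a chapter.
print(chapter_mates(members, "market data", "testing"))
print(sorted(guilds["data stewardship"]))
```

Note how Ann appears in a Squad, a Chapter, and a Guild at once, which is exactly the matrix structure described next.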

In terms of a matrix organization, think of the vertical dimension as the “what” and the horizontal dimension as the “how”. The matrix structure matches the “professor and entrepreneur” model, where the Product Owner is the “entrepreneur” or “product champion” focusing on delivering a great product, while the Chapter Lead is the “professor” or “competency leader” focusing on technical excellence.[14] 

There is a healthy tension between these roles, as the entrepreneur tends to want to speed up and cut corners, while the professor tends to want to slow down and build things properly. The matrix structure ensures that each squad member can get guidance on “what to build next” as well as “how to build it well”. Both aspects are needed.

Caveat! This model exposes the architecture of a system to issues and risks if nobody focuses on the integrity of the system as a whole. This concern is akin to the problem facing data governance across an organization.

To mitigate this risk, the white paper discusses a role at Spotify called “System Owner” which is recommended to consist of a developer-operations pair in order to benefit from both perspectives. The System Owner is not a bottleneck or ivory tower architect, but a “go-to” person(s) for any technical or architectural issues. Responsibilities include subject matter expertise, coordination, documentation, stability and scalability. To coordinate work on high-level architectural issues across multiple systems, there is also a Chief Architect role. This role reviews development of new systems to make sure common mistakes are avoided, and that systems are aligned with the architectural vision.

DGO Scrum of Scrums

Figure 8 illustrates the concept of leveraging the Spotify matrix organization, and scaling Agile/Scrum to implement data governance, whereby the DGO acts in the capacity of a “Scrum of Scrums”. For example, the DGO Squad may be composed of Guild Coordinators (GC) from the data steward guild (yellow) and data stakeholder guild (green), Chapter Leads (CL) involved in data management concerns, as well as the Chief Architect (CA). Note: in this example, System Owners (SO) are represented via the data steward guild and operations architecture guild (blue).

Figure 8. Adapted from Kniberg and Ivarsson (2012). “Scaling Agile @ Spotify”

One of the major concerns with large complex organizations is dependencies. In fact, a key reason for creating a data governance and stewardship program is to solve data dependencies that block or slow progress, as well as improve data quality and internal controls. A major responsibility of the DGO is to inventory and manage data dependency requirements, as well as manage traceability—that is, create and maintain a mapping of data relationships between stakeholders.

The key focus of the DGO Squad is to identify and resolve dependencies throughout the organization with respect to data management issues (red lines). A common source of dependency issues at many companies is development versus operations (blue lines) where a “handoff” occurs with associated friction and delays.
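One way to make the DGO's dependency inventory and traceability mapping concrete is a simple directed record of who consumes whose data. The squads and datasets here are entirely hypothetical:

```python
from collections import defaultdict

# (consumer_squad, dataset, producer_squad): the consumer depends on
# the producer for that dataset. Illustrative entries only.
dependencies = [
    ("risk",      "trade positions",  "trading"),
    ("reporting", "trade positions",  "trading"),
    ("reporting", "client reference", "onboarding"),
    ("trading",   "client reference", "onboarding"),
]

def cross_team_dependencies(deps):
    """Group dependencies by producer so the DGO can see which
    squads are feeding (or potentially blocking) which others."""
    by_producer = defaultdict(list)
    for consumer, dataset, producer in deps:
        if consumer != producer:
            by_producer[producer].append((dataset, consumer))
    return dict(by_producer)

def trace(deps, dataset):
    """Traceability: every squad touching a given dataset."""
    return sorted({s for c, d, p in deps if d == dataset for s in (c, p)})

print(cross_team_dependencies(dependencies))
print(trace(dependencies, "client reference"))
```

Even a toy inventory like this surfaces the red-line issues in Figure 8: any change to "client reference" data, for instance, immediately names every squad the DGO must bring to the table.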

The white paper describes how Spotify minimizes this friction by making operations primarily a support function (in the form of infrastructure, scripts, and routines), with the Development Squads releasing the code themselves. The issue we have with this approach is that it may not stand up to scrutiny under SOX, especially in a financial institution. Instead, we recommend that, akin to the DGO Scrum of Scrums, a Dev-Ops Scrum of Scrums smooth the road to production releases.

Figure 9. Adapted from Kniberg and Ivarsson (2012). “Scaling Agile @ Spotify” 

The key in any organization is finding the right balance. On one hand, the goal is to have teams work as autonomously as possible in self-contained, cross-functional, self-organizing Squads. On the other hand, without practices to bridge each Squad’s work, cross-team dependencies become difficult to eliminate. The model described above should help facilitate robust data governance practices and even lead to innovation.

Some Concluding Thoughts on Data Governance

The Dodd-Frank Act was enacted to reduce risk, increase transparency, and promote market integrity within the financial market system. Contemplate that…

Scrum is founded on empirical process control theory and utilizes an iterative, incremental approach to optimize predictability and control risk. Empiricism asserts that knowledge comes from experience and that decisions should be based on what is known. The three pillars of empirical process control are transparency, inspection, and adaptation.

The Dodd-Frank Act is effectively advocating continuous improvement. The key to continuous improvement is constant optimization of data into meaningful information. Quality data reduces risk, increases transparency, and promotes market integrity. How is this accomplished? By implementing a data governance and stewardship program.

Still…

Because information doesn’t exist in a vacuum, a robust data governance and stewardship program must, at its core, be about empowering people in an enterprise to create solutions. This is how raw data becomes meaningful information.

The effort touches upon a range of disciplines encompassing two facets. The first is tangible, sensible, and definable—getting stuff done. These tangibles include enterprise architecture, data architecture, data management, metadata management, information technology, and corporate and project management.

The other facet is the human side—the intangibles. Understanding intangibles is about anthropology, communications, quality, risk, and governance. It’s about individuals and interactions over processes and tools.


References: 

Henrik Kniberg and Anders Ivarsson (2012). “Scaling Agile @ Spotify with Tribes, Squads, Chapters & Guilds” October 2012. Source: http://blog.crisp.se/2012/11/14/henrikkniberg/scaling-agile-at-spotify

David A. Becher and Melissa B. Frye (2010). “Does Regulation Substitute or Complement Governance?” (August 20, 2010). Journal of Banking and Finance, Forthcoming. Available at SSRN: http://ssrn.com/abstract=1108309

Eric Babinet and Rajani Ramanathan (2008). “Dependency Management in a Large Agile Environment” pp.401-406, Agile 2008.

Steve Sarsfield (2009). The Data Governance Imperative: A Business Strategy for Corporate Data. Ely: IT Governance Pub.

Gwen Thomas. “The DGI Data Governance Framework” The Data Governance Institute.

Ken Schwaber and Jeff Sutherland (2011). “The Definitive Guide to Scrum: The Rules of the Game” Scrum.org, October 2011.

International Institute of Business Analysis (2009). A Guide to the Business Analysis Body of Knowledge (BABOK guide), Version 2.0. Toronto, Ont: International Institute of Business Analysis.

Project Management Institute (2008). A Guide to the Project Management Body of Knowledge (PMBOK guide), Fourth Edition. An American National Standard, ANSI/PMI 99-001-2008.

Footnotes: 

[1] Federal Register / Vol. 77, No. 68 / April 9, 2012 / Rules and Regulations (77 FR 21278) I. Background. “Title VII of the Dodd-Frank Act amended the Commodity Exchange Act to establish a comprehensive new regulatory framework for swaps. The legislation was enacted to reduce risk, increase transparency, and promote market integrity within the financial system…”

[2] Podcast: “Perspectives on Sarbanes-Oxley Compliance – Where Companies are Saving Costs and Achieving Greater Efficiencies” Source: http://www.protiviti.com/en-US/Pages/PodcastDetail.aspx?AssetID=16

[3] Sarsfield, Steve (2009). The Data Governance Imperative: A Business Strategy for Corporate Data. Ely: IT Governance Pub.

[4] Ken Schwaber, Jeff Sutherland (2011). “The Definitive Guide to Scrum: The Rules of the Game” Scrum.org, October 2011.

[5] Roger W. Brown. “Introduction to Scrum V 1.3 Revised 2012” AgileCrossing.com.

[6] Mark Layton (2012). Agile project management for dummies. Hoboken, N.J.: Wiley. http://platinumedge.com/

[7] International Institute of Business Analysis (2009). A Guide to the Business Analysis Body of Knowledge (BABOK guide), Version 2.0. Toronto, Ont: IIBA.

[8] Ibid. Layton (2012).

[9] “In an Agile environment, the development team is ‘the talent’.” Ibid. Layton (2012).

[10] Gil Broza (2012). The human side of Agile: how to help your team deliver. Toronto: 3P Vantage Media.

[11] Agile Manifesto Copyright 2001: Kent Beck, Mike Beedle, Arie van Bennekum, Alistair Cockburn, Ward Cunningham, Martin Fowler, James Grenning, Jim Highsmith, Andrew Hunt, Ron Jeffries, Jon Kern, Brian Marick, Robert C. Martin, Steve Mellor, Ken Schwaber, Jeff Sutherland, Dave Thomas. This declaration may be freely copied in any form, but only in its entirety through this notice.

[12] PMI PMBOK describes this structure as a “projectized organization”.

[13] Scrum’s recommended practice states that there should be one Product Owner and one Scrum Master per Scrum Team. Unfortunately, in practice, this is often not the case. Companies sometimes combine the roles of Scrum Master and Product Owner, which is inadvisable. Other organizations spread Scrum Masters and/or Product Owners among multiple teams, which is a slightly better practice. Figure 6 illustrates one Product Owner assigned full time to each Squad, with a Scrum Master assigned to two teams; however, we are not necessarily advocating this approach, although it may be considered better than other “real world” practices. Interestingly, as described in the white paper, Spotify assigns a Product Owner to each Squad but does not formally assign a “squad leader” (i.e., Scrum Master). Rather, Spotify provides Squads with access to an agile coach who helps them evolve and improve their way of working, and can even help run meetings.

[14] Poppendieck, Mary, and Thomas David Poppendieck. (2010). Leading lean software development: results are not the point. Upper Saddle River, NJ: Addison-Wesley.

Monday, November 26, 2012

The CAPM Debate and the Search for "True Beta"

“What has happened is that we’ve used these assumptions for so long that we’ve forgotten that we’ve merely made assumptions, and we’ve come to believe that the world is necessarily this way.”
~Resistance is Futile: The Assimilation of Behavioral Finance

Conventional investment theory states that when an investor constructs a well-diversified portfolio, the unsystematic sources of risk are diversified away, leaving the systematic, non-diversifiable source of risk as the relevant risk. The capital asset pricing model (CAPM), developed by Sharpe (1964), Lintner (1965) and Black (1972) [zero-beta version], asserts that the correct measure of this riskiness is the "beta coefficient," or simply "beta."

Effectively, beta is a measure of an asset’s correlated volatility relative to the volatility of the overall market. Consequently, given the beta of an asset and the risk-free rate, the CAPM should be able to predict the expected return for that asset, and correspondingly the expected risk premium as well.
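As a minimal sketch of the arithmetic, assuming hypothetical return series, beta can be estimated as the covariance of asset and market returns divided by the variance of market returns, and then plugged into the CAPM pricing equation E[R_i] = R_f + β_i(E[R_m] − R_f). The sample figures below are invented for demonstration.

```python
def beta(asset_returns, market_returns):
    """Estimate beta as Cov(asset, market) / Var(market) from sample returns."""
    n = len(asset_returns)
    a_mean = sum(asset_returns) / n
    m_mean = sum(market_returns) / n
    cov = sum((a - a_mean) * (m - m_mean)
              for a, m in zip(asset_returns, market_returns)) / (n - 1)
    var = sum((m - m_mean) ** 2 for m in market_returns) / (n - 1)
    return cov / var

def capm_expected_return(risk_free, expected_market_return, asset_beta):
    """CAPM: E[R_i] = R_f + beta_i * (E[R_m] - R_f)."""
    return risk_free + asset_beta * (expected_market_return - risk_free)

# Hypothetical monthly returns, for demonstration only.
market = [0.01, -0.02, 0.02, 0.03]
asset = [0.015, -0.03, 0.03, 0.045]
b = beta(asset, market)                       # asset is 1.5x the market here
expected = capm_expected_return(0.02, 0.08, b)
```

By construction the market has a beta of 1.0 against itself, and an asset whose returns are 1.5 times the market's comes out with a beta of 1.5.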

This explanation is textbook. However, unbeknownst to most, there has been a long-running argument in academic circles over the CAPM and other pricing models, even within the milieu of traditional investments. Without going into the details of this debate, certain empirical studies have revealed "cross-sectional variations" in returns that the CAPM does not explain, calling the model's validity into question.

In response to Fama and French's (1992) challenge, Jagannathan and Wang (1996) theorized that “…the lack of empirical support for the CAPM may be due to the inappropriateness of some assumptions made to facilitate the empirical analysis of the model. Such an analysis must include a measure of the return on the aggregate wealth portfolio of all agents in the economy.”

Financial institutions have not been left behind by these evolving academic theories. Index creation and benchmarking have become standard fare. Since the introduction of exchange-traded funds (ETFs), a veritable industry has developed around the "multiple beta" concept. But by no means has the plethora of these instruments captured every aspect of the aggregate wealth portfolio of the global economy, although given new product development it would seem this is the undeclared objective.

This backdrop is the principal context giving impetus to the notion of "exotic betas." The term, a relatively recent addition to the investment lexicon that evolved from ideas advanced by proponents of alternative investments, suggests that certain alternative investment assets and/or strategies, representing commonly pursued market paradigms, can be identified, tracked, and replicated employing a predefined passive approach/model similar to traditional index construction.

This leaves open the question as to whether institutions, through sophisticated financial engineering, can truly capture in a passive way all possible sources of return in the global economy. Or, does some aspect which the industry loosely calls alpha (i.e., skill-based returns) always remain outside the grasp of such institutions’ arbitrary models of beta?

Jagannathan - The CAPM Debate

References:
Black, Fischer (1972). “Capital Market Equilibrium with Restricted Borrowing” Journal of Business 45, July, pp. 444-455.

Fama, Eugene F.; French, Kenneth R. (1992). “The Cross-Section of Expected Stock Returns” Journal of Finance 47, June, pp. 427-465.

Jagannathan, Ravi; McGrattan, Ellen R. (1995). “The CAPM Debate” Federal Reserve Bank of Minneapolis Quarterly Review, Vol. 19, No. 4, Fall 1995, pp. 2-17

Jagannathan, Ravi; Wang, Zhenyu (1993). “The CAPM is Alive and Well” Research Department Staff Report 165. Federal Reserve Bank of Minneapolis

Jagannathan, Ravi; Wang, Zhenyu (1996). “The Conditional CAPM and the Cross-Section of Expected Returns” Journal of Finance, Vol. 51, No. 1, March, pp. 3-53.

Tuesday, November 13, 2012

Case Study: Roadmap to Dodd-Frank Compliance

Procrastination is opportunity's assassin. ~Victor Kiam, American entrepreneur

Last week, on the day of President Obama's re-election, CommodityPoint, a division of energy and utilities consultancy UtiliPoint International Inc., published a report on the current state of progress in implementing Dodd-Frank compliant processes and technology.[1] The study's survey responses indicated "widespread doubt... as to when the regulations will eventually come into force," and/or "significant amount of confusion as to the requirements and burdens".

CommodityPoint's warnings to the industry could not be more clear:
In reviewing the data, there appears to be a general lack of urgency on the part of many market participants...
[A] lack of movement by a significant number of market participants (especially in the ‘end-user’ segment) is creating a large backlog of work across the industry... this circumstance should be considered a significant risk as companies consider their compliance planning and efforts.

As with all regulations, companies exposed to Dodd-Frank rules will be considered guilty until they prove themselves innocent... continuously and consistently.
Clearly the time for "wait-and-see" is over. Firms need to apply a structured and proactive approach going forward, despite the difficulty of anticipating how regulations will eventually evolve. Preparation is key. Firms that jump-started their initiatives based on rule proposals are going to be better prepared than those left scrambling after final rules are published.

To be blunt, notwithstanding that the industry is taking the battle to the courts, time is starting to run out...

According to Davis Polk's October 2012 Progress Report, 45 of the 90 Title VII rules required by Dodd-Frank have been finalized. However, this halfway mark doesn't tell the whole story. The CFTC has finalized 34 of its 43 rules (roughly 79%), with only 9 proposed rules having missed their deadline. The SEC, on the other hand, is behind the eight ball, with 9 rules finalized and 11 still in the proposal stage out of the 20 required rules whose deadlines have passed.[2]

One advantage that results from basing work on final rules is increased certitude. Firms are now better positioned to determine where gaps exist in their processes, prioritize activities required to comply with regulations, and convert requirements into implementation tasks. Key is having a methodology which considers the broad contours of areas impacted, along with the recognition that each new rule brings about new challenges. A robust methodology, in turn, generates the roadmap.

Establishing project scope

This case study applies IQ3 Group's strategy assessment framework as a method for scoping regulatory projects and forging a pathway to compliance. The underlying approach involves topic decomposition until all relevant areas of investigation are defined in sufficient detail. The result of this analysis is then mapped to deliverables that need to be developed.

Figure 1


The CFTC has identified 8 categories and 38 areas where rules are necessary under Dodd-Frank Title VII and Title VIII.[3] A closer look, however, reveals 61 Federal Register (FR) publications of which 17 are still in the proposal stage with the balance representing final rules. The table below lists the 61 final and proposed rules within the categories established by the CFTC. Within each category, the rulemaking area is sorted by the effective date. Proposed rules are sorted by date of publication.

Figure 2


While the above table is helpful in providing a high-level overview of Title VII and Title VIII CFTC rules, it does not provide detailed descriptions of the rules or a calendar of compliance dates, which may differ from a rule's effective date.

The following compliance matrix and calendar provide a more complete overview and are also available in spreadsheet format.

Dodd-Frank Act - Final and Proposed Rules Compliance Matrix and Calendar


Hypothesis and data collection

In order to focus the collection of data and begin to analyze and develop solutions, there must be a basis from which to drive an understanding of the issues. What are the potential operational gaps, problems or opportunities inherent in the rulemaking areas? What procedures and processes do we need to institute or re-engineer? What technology systems do we need to implement? What data will we need to support new processes and technology systems? Is compliance with these new regulations the responsibility of a particular department, or a firm-wide organizational commitment?

Formulating hypotheses—educated hunches to be tested—provides a focus for data collection and gives form to findings, which ultimately lead to conclusions and the development of recommendations. Subject matter expertise is often required to formulate relevant hypotheses. From hypotheses, specific questions can be developed that will help drive data collection.

Figure 3


Industry comments on proposed rules, and the corresponding CFTC responses within the FR publications, are an excellent source of insight. The following discussion on affiliates that qualify for the end-user exception[4] is a good example:
Section 2(h)(7)(D)(i) of the CEA provides that an affiliate of a person that qualifies for the end-user exception… may qualify for the exception only if the affiliate… uses the swap to hedge or mitigate the commercial risk of the person or other affiliate of the person that is not a financial entity.
As the CFTC reiterates, an affiliate may elect the end-user exception, even if it is a financial entity, if the affiliate complies with the requirement that the swap is used to hedge or mitigate commercial risk; provided, however, that the affiliate is not a swap dealer or major swap participant. Nevertheless, Shell Energy North America (US) raises the issue that:
...potential electing counterparties that centralize their risk management through a hedging affiliate that is designated as a swap dealer or major swap participant may be unable to benefit from the end-user exception. As a result, many potential electing counterparties may need to restructure their businesses and risk management techniques, thereby losing the many benefits of centralized hedging.
Kraft, Philip Morris and Siemens Corp clarify that this concern relates to how treasury subsidiaries function:
...the Commission should exclude wholly-owned treasury subsidiaries of non-financial companies from the ‘‘financial entity’’ definition, to the extent that they solely engage in swap transactions to hedge or mitigate the commercial risks of an entire corporate group. These commenters noted in particular that the treasury subsidiaries may be, or are likely to be, "financial entities" ... because they are predominantly engaged in activities of a financial nature as defined in Section 4(k) of the Bank Holding Company Act.
In response, the CFTC states that it lacks discretion because Congress specifically defined financial entities (which cannot use the end-user exception) to include swap dealers and major swap participants. Further, Congress specifically outlines who may qualify as an affiliate eligible for the end-user exception. The specificity with which Congress defines these concepts constrains the CFTC’s discretion in this area. The CFTC, however, notes "it is important to distinguish where the treasury function operates in the corporate structure" and then establishes means by which concerns can be alleviated:
Treasury affiliates that are separate legal entities and whose sole or primary function is to undertake activities that are financial in nature as defined under Section 4(k) of the Bank Holding Company Act are financial entities as defined in Section 2(h)(7)(C)(VIII) of the CEA because they are ‘‘predominantly engaged’’ in such activities. If, on the other hand, the treasury function through which hedging or mitigating the commercial risks of an entire corporate group is undertaken by the parent or another corporate entity, and that parent or other entity is entering into swaps in its own name, then the application of the end-user exception to those swaps would be analyzed from the perspective of the parent or other corporate entity directly.
In other words, a parent company or other corporate entity predominantly engaged in manufacturing, agriculture, retailing, or energy may elect the end-user exception for inter-affiliate swaps. The CFTC explains how:
If the parent or other corporate entity then aggregates the commercial risks of those swaps with other risks of the commercial enterprise and hedges the aggregated commercial risk using a swap with a swap dealer, that entity may, in its own right, elect the end-user exception for that hedging swap. The parent or other corporate entity in the example is not a ‘‘financial entity’’ as defined in Section 2(h)(7)(C)(VIII) of the CEA, because that entity is ‘‘predominantly engaged’’ in other, nonfinancial activities undertaken to fulfill its core commercial enterprise purpose. However, if the parent or other corporate entity, including, for example, a separately incorporated treasury affiliate, is a ‘‘financial entity,’’ then that entity cannot elect the end-user exception unless one of the specific affiliate provisions of the statute, Section 2(h)(7)(C)(iii) or Section 2(h)(7)(D), apply.
Generally speaking, the CFTC notes that Congress did not treat inter-affiliate swaps differently from other swaps in Section 2(h)(7) of the CEA. Accordingly, if one of the affiliates is not a financial entity and is using the swap to hedge or mitigate commercial risk, even if the other affiliate is a financial entity, the non-financial entity affiliate may elect the end-user exception and neither affiliate needs to clear the swap. Based on this analysis, such entities face a strategic choice...


Findings and conclusions


Assuming that a corporate entity engaged in commercial activities is structured to include a treasury subsidiary engaged in swaps which hedges the commercial risks of the corporate group, such subsidiary can: (i) continue to operate as a "financial entity" and if applicable register as a swap dealer or major swap participant; or (ii) seek to elect the end-user exception by restructuring where in the corporate structure swaps are transacted. But we are getting ahead of ourselves...

After collecting data from questions based on our hypotheses, the next step is synthesizing such data to derive findings and galvanize conclusions about what was learned. Findings and conclusions are defined as follows:
  • Finding—a summary statement, derived from raw data, that directs our thinking toward solutions or opportunities regarding a problem.
  • Conclusion—a diagnostic statement, based on the data and findings, that explains problems or opportunities and is significant enough to warrant action.

Figure 4



In performing an in-depth examination of the final end-user exception rule to the clearing requirement for swaps, we can arrive at findings and conclusions appropriate to the context of a market participant that may fall within this category. To accomplish this task, IQ3 Group assembled a variety of decision flow charts, including the "end-user exception" (see Figure 5 below).[5]

Below we step through the analysis of §39.6 "Exceptions to the clearing requirement":
Under §39.6(a)(1), a counterparty to a swap may elect the exception to the clearing requirement provided that: [1] under §39.6(a)(1)(i), it is not a "financial entity" as defined by CEA §2(h)(7)(C)(i)*; [2] under §39.6(a)(1)(ii), it is using the swap to hedge or mitigate commercial risk as provided by CEA §2(h)(7)(A)(ii) or §39.6(b)(1)(ii)(B); and [3] it provides, or causes to be provided, information to a registered swap data repository (SDR) or, if no SDR is available, to the Commission. A counterparty that satisfies these criteria and elects the exception is an "electing counterparty".

*Under §39.6(d), for purposes of CEA §2(h)(7)(A), an entity that is a financial entity solely because of CEA §2(h)(7)(C)(i)(VIII) shall be exempt if it: (i) is organized as a certain type of bank [e.g., organized as a bank as defined in §3(a) of the Federal Deposit Insurance Act; see §39.6(d)(i)]; and (ii) has total assets of $10 billion or less on the last day of its most recent fiscal year.

When electing the exception under CEA §2(h)(7)(A), one of the counterparties (the "reporting counterparty") shall provide, or cause to be provided, information to a registered swap data repository (SDR) or, if no SDR is available, to the Commission. Under §39.6(b)(3), each reporting counterparty needs to have a reasonable basis to believe that the electing counterparty meets the requirements for an exception to the clearing requirement.[6]

Under §39.6(b), the reporting counterparty will provide information in the following form and manner: (i) notice of the election of the exception; (ii) the identity of the electing counterparty; and (iii) the information listed below [continued after the next paragraph], unless...

...such information has previously been provided by the electing counterparty in a current annual filing pursuant to §39.6(b)(2), which states that an entity under this section may report the information annually in anticipation of electing the exception for one or more swaps. Further, any such reporting shall be effective for 365 days following the date of such reporting, provided the entity shall amend such information as necessary to reflect any material changes to the information reported.
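As a small illustration of the annual-filing window, the 365-day effectiveness test might be sketched as follows. The function name is ours, and treating day 365 as inclusive is an assumption for demonstration, not a reading of the rule text.

```python
from datetime import date, timedelta

def annual_filing_effective(filed_on, as_of):
    """Sketch of the §39.6(b)(2) window: an annual filing is effective
    for 365 days following the date of reporting. Treating the final
    day as inclusive is our assumption, not the rule text."""
    return filed_on <= as_of <= filed_on + timedelta(days=365)
```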

Under §39.6(b)(iii) the following information shall be provided by the reporting counterparty:

(A) Whether the electing counterparty is a "financial entity," and if yes, whether it is: (1) electing in accordance with §2(h)(7)(C)(iii) or §2(h)(7)(D); or (2) exempt from the definition of "financial entity" as described in §39.6(d).

(B) Whether the swap(s) for which the electing counterparty is electing the exception are used by the electing counterparty to hedge or mitigate commercial risk as provided in §39.6(c). [See §39.6(c) discussion below.]

(C) How the electing counterparty generally meets its financial obligations associated with entering into non-cleared swaps by identifying one or more of the following categories, as applicable: (1) a written credit support agreement; (2) pledged or segregated assets; (3) a written third-party guarantee; (4) the electing counterparty's available financial resources; or (5) means other than those described.

(D) Whether the electing counterparty is an entity that is an issuer of securities registered under section 12 of, or is required to file reports under section 15(d) of, the Securities Exchange Act of 1934, and if so: (1) the relevant SEC Central Index Key number for that counterparty; and (2) whether an appropriate committee of that counterparty's board of directors has reviewed and approved the decision to enter into swaps that are exempt from the requirements of CEA §§2(h)(1) and 2(h)(8).

The following discussion analyzes a key concept pertinent to §39.6(c) "Hedging or mitigating commercial risk":
A swap is deemed to hedge or mitigate commercial risk if such swap:

(i) is economically appropriate to the reduction of risks in the conduct and management of a commercial enterprise, where the risks arise from §39.6(c)(i)(A), (B), (C), (D), (E), or (F);

(ii) qualifies as bona fide hedging for purposes of an exemption from position limits; or

(iii) qualifies for hedging treatment under (A) Financial Accounting Standards Board Accounting Standards Codification Topic 815, Derivatives and Hedging (formerly known as FAS 133) or (B) Governmental Accounting Standards Board Statement 53, Accounting and Financial Reporting for Derivative Instruments.

Additionally, a swap is deemed to hedge or mitigate commercial risk if such swap is:

(i) not used for a purpose that is in the nature of speculation, investing, or trading; and

(ii) not used to hedge or mitigate the risk of another swap or security-based swap position, unless that other position itself is used to hedge or mitigate commercial risk as defined by §39.6(c) or §240.3a67-4.
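Tying the walkthrough together, the basic eligibility test of §39.6(a) can be caricatured as boolean logic. This is an orientation aid only, not a restatement of the regulation: the parameter names are ours, and the small-bank exemption of §39.6(d) together with the affiliate provisions of CEA §2(h)(7)(C)(iii) and §2(h)(7)(D) are deliberately collapsed into a single exemption flag.

```python
def may_elect_end_user_exception(is_financial_entity,
                                 qualifies_for_exemption,
                                 hedges_commercial_risk,
                                 info_reported):
    """Caricature of the §39.6(a) eligibility test (not legal advice).

    `qualifies_for_exemption` collapses the small-bank exemption of
    §39.6(d) and the affiliate provisions of CEA §2(h)(7)(C)(iii)/(D)
    into a single flag for illustration.
    """
    non_financial = (not is_financial_entity) or qualifies_for_exemption
    return non_financial and hedges_commercial_risk and info_reported
```

The point of the sketch is the conjunctive structure: a financial entity with no exemption fails regardless of hedging, and even a non-financial hedger fails if the required information never reaches an SDR or the Commission.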

Figure 5


A key finding identified by IQ3 Group regarding how the CFTC approached writing Title VII rules is the CFTC's departure from its legacy approach, which relied on the concepts of "exclusion from the definition" and "exemption from the definition". As seen from the above analysis, if an entity transacts in swaps it falls under the definition but can be "excepted from the definition". The burden of proof to elect such an exception, however, is upon the entity, which for that reason must continue to collect and report required data. If called upon by regulators, such recordkeeping is necessary to support the election of the exception.


Generating recommendations

Based on conclusions regarding problems and opportunities, practical recommendations can be generated, evaluated, and finalized. The first step is specifying alternatives, including next steps, and describing the intended results and benefits of each alternative. Such analysis should take into account existing conditions as well as barriers and resource constraints. Recommendations should cover the topics and outputs originally scoped, and trace back to address root findings:

Figure 6


Each regulation that is promulgated brings about new challenges. The underlying impetus of the Dodd-Frank regulations, however, is clear. By imposing "robust recordkeeping and real-time reporting regimes," regulators have signaled their intent to usher in stricter risk management across the financial system, supported by robust data governance and straight-through processing.

With that in mind, CommodityPoint's fulmination merits attention:
Given the potential legal and financial exposures of non-compliance... it is incumbent upon all levels of leadership, from risk managers to C-level executives, to create a culture of [Dodd-Frank] compliance within their companies. ...while the regulators' response will not be immediate, it will most likely be aggressive once in motion; and once a company is identified as one that has not been compliant in the past, that company will likely remain under CFTC scrutiny for a very long time.

Footnotes:

[1] Reames, P. and Bell, E. (2012). "2012 Dodd-Frank Market Survey and Report" CommodityPoint, sponsored by RiskAdvisory, November 2012

[2] Davis Polk & Wardwell LLP (2012). "Dodd-Frank Progress Report October 2012" Generated using the Davis Polk Regulatory Tracker™

[3] See: http://www.cftc.gov/LawRegulation/DoddFrankAct/Rulemakings/index.htm

[4] Federal Register / Vol. 77, No. 139 / Thursday, July 19, 2012 / Rules and Regulations (77 FR 42559)

[5] Decision flow chart based on Final Rule §39.6 "Exceptions to the clearing requirement" (77 FR 42590)

[6] The term "reasonable basis to believe" imposes a requirement upon the reporting counterparty that information from the electing counterparty supporting §39.6(b)(3) needs to be collected and maintained.

Saturday, November 3, 2012

Tower of Babel, Semantics Initiative, and Ontology

What's in a name? That which we call a rose by any other name would smell as sweet. ~ William Shakespeare 
The beginning of wisdom is to call things by their right names. ~ Chinese Proverb

At a symposium held by the Securities Industry and Financial Markets Association (SIFMA) in March 2012, Andrew G. Haldane, Executive Director of Financial Stability for the Bank of England, gave a speech titled, “Towards a common financial language”.[1] Using the imagery of the Tower of Babel, Mr. Haldane described how…
Finance today faces a similar dilemma. It, too, has no common language for communicating financial information. Most financial firms have competing in-house languages, with information systems silo-ed by business line. Across firms, it is even less likely that information systems have a common mother tongue. Today, the number of global financial languages very likely exceeds the number of global spoken languages.

The economic costs of this linguistic diversity were brutally exposed by the financial crisis. Very few firms, possibly none, had the information systems necessary to aggregate quickly information on exposures and risks.[2] This hindered effective consolidated risk management. For some of the world’s biggest banks that proved terminal, as unforeseen risks swamped undermanned risk systems.

These problems were even more acute across firms. Many banks lacked adequate information on the risk of their counterparties, much less their counterparties’ counterparties. The whole credit chain was immersed in fog. These information failures contributed importantly to failures in, and seizures of, many of the world’s core financial markets, including the interbank money and securitization markets.

Why is this? One would think that the financial industry would be in a great position to capitalize on the growth of digital information. After all, data has been the game changer for decades. But Wall Street, while proficient at handling market data and certain financial information, is not well prepared for the explosion in unstructured data.

The so-called “big data” problem of handling massive amounts of unstructured data is not just about implementing new technologies like Apache Hadoop. As discussed at the CFTC Technology Advisory Committee on Data Standardization held September 30, 2011, there is significant confusion in the industry regarding “semantics”.[3]

EDM Council - FIBO Semantics Initiative

The “semantic barrier” is a major issue in the financial industry, and standards such as ISO 20022 were created to resolve it.[4] For example, what some participants in the payments industry call an Ordering Customer, others refer to as a Payer or Payor, while still others call a Payment Originator or Initiator. Context also plays a role: the Payment Originator/Initiator is a Debtor/Payor in a credit transfer, while that same Payment Originator/Initiator is a Creditor/Payee in a direct debit.[5]
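To make the point concrete, a semantic layer might normalize such labels to a canonical concept, with context determining the party's role. The sketch below merely restates the payments example above in code; the dictionary keys and concept names are our own invention, not ISO 20022 identifiers.

```python
# Hypothetical synonym table: many industry labels, one canonical concept.
CANONICAL_TERM = {
    "ordering customer": "payment_initiator",
    "payer": "payment_initiator",
    "payor": "payment_initiator",
    "payment originator": "payment_initiator",
    "initiator": "payment_initiator",
}

# Context determines the party's role (restating the example above):
# the initiator is the debtor in a credit transfer and the creditor
# in a direct debit.
ROLE_BY_CONTEXT = {"credit transfer": "debtor", "direct debit": "creditor"}

def resolve(term, context):
    """Map a raw label plus context to a canonical concept and role."""
    concept = CANONICAL_TERM[term.lower()]
    return concept, ROLE_BY_CONTEXT[context]
```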

It should therefore be apparent that the intended use of systems relies on “human common sense” and understanding. Unfortunately, especially within large organizations or across an industry, the boundaries of intended use are often not documented and exist only as “tribal knowledge”. Even when well documented, informal language maintained in policies and procedures can result in unintentional misapplication, with consequences no less hazardous than intentional misapplication.


Overcoming semantic barriers...

If your avocation[6] involves organizing information and/or modeling data and systems, you invariably start asking epistemological[7] questions, even though such questions may not be immediately practical to the task at hand: What is knowledge? How is knowledge acquired? To what extent can a given concept, whether physical or abstract, be known? Can computers understand meaning from the information they process and synthesize knowledge? “Can machines think?”[8]

Such questions give impetus to an ongoing debate about “the scope and limits of purely symbolic models of the mind and about the proper role of connectionism in cognitive modeling.” Harnad (1990) framed this quandary as the Symbol Grounding Problem: “How can the semantic interpretation of a formal symbol system be made intrinsic to the system, rather than just parasitic on the meanings in our heads?”[9] The meaning triangle[10] illustrates the underlying problem.

 
Figure 1 – Ogden and Richards (1923) meaning triangle 

Figure 1 models how linguistic symbols stand for the objects they represent, which in turn provide an index to concepts in our minds. Note, too, that the triangle represents the perspective of only one person, whereas communication often takes place between two or more persons (or devices such as computers). Hence, for two people or devices to understand each other, the meanings that relate term, referent and concept must align.

Now consider that different words might refer to the same concept, or worse, the same word could have different meanings, as with the term “orange”: are we referring to a fruit or to a color? This area of study is known as semantics.[11]
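The ambiguity can be sketched in a few lines of Python; the sense inventory and cue words below are invented for illustration, and only context selects between the two concepts that the one term indexes:

```python
# Toy sense inventory (invented): one term, "orange", indexes two distinct
# concepts; cue words stand in for the context that selects a sense.
SENSES = {
    "orange": {
        "fruit": {"eat", "peel", "juice", "citrus"},
        "color": {"paint", "hue", "shade", "bright"},
    }
}

def disambiguate(term, context_words):
    """Pick the sense whose cue words overlap most with the surrounding context."""
    senses = SENSES.get(term, {})
    return max(senses, key=lambda s: len(senses[s] & set(context_words)), default=None)

print(disambiguate("orange", ["peel", "the", "orange"]))         # fruit
print(disambiguate("orange", ["a", "bright", "orange", "hue"]))  # color
```

A real system would use a lexicon such as WordNet rather than a hand-built table, but the shape of the problem is the same: the symbol alone does not determine the concept.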


Relating semantics to ontology

As the meaning triangle exemplifies, monikers—whether they be linguistic[12] or symbolic[13]—are imperfect indexes. They rely on people having the ability to derive denotative (ie, explicit) meaning, and/or connotative (ie, implicit) meaning from words/signs. If the encoder (ie, sender) and the decoder (ie, receiver) do not share both the denotative and connotative meaning of a word/sign, miscommunication can occur. In fact, at the connotative level, context determines meaning.

Analytic approaches to this problem fall under the domain of semiotics,[14] which for our purposes encompasses the study of words and signs as elements of communicative behavior. Consequently, we consider linguistics and semiosis[15] to come under the subject of semiotics.[16] Semiotics, in turn, is divided into three branches or subfields: (i) semantics; (ii) syntactics;[17] and (iii) pragmatics.[18]

Various disciplines are used to model concepts within this field of study. These disciplines include, but are not necessarily limited to, lexicons/synsets, taxonomies, formal logic, symbolic logic, schema related to protocols (ie, syntactics), schema related to diagrams (ie, semiosis), actor-network theory, and metadata [eg, structural (data about data containers), descriptive (data about data content)]. In combination, these various methods form the toolkit for ontology work.

Ontology involves the study of the nature of being, existence, reality, as well as the basic categories of being and their relations. It encompasses answering metaphysical[19] questions relating to quiddity, that is, the quality that makes a thing what it is—the essential nature of a thing.

Admittedly, there are divergent views amongst practitioners as to what constitutes ontology, as well as the classification of semiotics and related methodologies. To be sure, keeping all these concepts straight is not without difficulty for those without formal training. Further, “ontology has become a prevalent buzzword in computer science. An unfortunate side-effect is that the term has become less meaningful, being used to describe everything from what used to be identified as taxonomies or semantic networks, all the way to formal theories in logic.”[20]

Figure 2 is a schematic diagram illustrating a hierarchical conceptualization of ontology, its relation to epistemology, metaphysics, and semiotics, as well as its relation to cognitive science. Figure 2 also shows how semiotics encompasses linguistics and semiosis.

 
Figure 2 – Conceptualization of epistemology, metaphysics, ontology, and semiotics


Knowledge representation and first order logic

Knowledge representation (KR) is an area of artificial intelligence research aimed at representing knowledge in symbols to facilitate systematic inferences from knowledge elements, thereby synthesizing new elements of knowledge. KR involves analysis of how to reason accurately and effectively, and how best to use a set of symbols to represent a set of facts within a knowledge domain.

A key parameter in choosing or creating a KR is its expressivity. The more expressive a KR, the easier and more compact it is to express a fact or element of knowledge within the semantics and syntax of that KR. However, more expressive languages are likely to require more complex logic and algorithms to construct equivalent inferences. A highly expressive KR is also less likely to be complete and consistent; whereas less expressive KRs may be both complete and consistent.
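The trade-off can be illustrated with a deliberately inexpressive KR: ground triples plus two fixed inference patterns. This minimal forward-chaining sketch (the facts are invented for illustration) shows how even a weak KR synthesizes new knowledge elements from existing ones:

```python
# Facts are (predicate, subject, object) triples; the fact base is invented.
facts = {("subclass", "Bank", "FinancialInstitution"),
         ("subclass", "FinancialInstitution", "Organization"),
         ("instance", "AcmeBank", "Bank")}

def infer(facts):
    """Close the fact set under subclass transitivity and instance inheritance."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        new = set()
        for (p1, a, b) in facts:
            for (p2, c, d) in facts:
                if p1 == "subclass" and p2 == "subclass" and b == c:
                    new.add(("subclass", a, d))   # subclass is transitive
                if p1 == "instance" and p2 == "subclass" and b == c:
                    new.add(("instance", a, d))   # instances inherit up the hierarchy
        if not new <= facts:
            facts |= new
            changed = True
    return facts

closure = infer(facts)
print(("instance", "AcmeBank", "Organization") in closure)  # True
```

This KR is complete and consistent precisely because it is so restricted; a more expressive language with arbitrary rules would require a correspondingly more complex reasoner.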

Recent developments in KR include the concept of the Semantic Web, and development of XML-based knowledge representation languages and standards, including Resource Description Framework (RDF), RDF Schema, Topic Maps, DARPA Agent Markup Language (DAML), Ontology Inference Layer (OIL), and the Web Ontology Language (OWL).[21]

 
Figure 3 – Adapted from Pease (2011) [Figure 15] and Obrst (2012)

SUMO, an open-source formal ontology expressed in a declarative language based on first order logic,[22] resides at the higher end of the scale in terms of both formality and expressiveness. The upper level of SUMO consists of ~1120 terms, ~4500 axioms and ~795 rules, and has been extended with a mid-level ontology (MILO) as well as domain specific ontologies. Written in the SUO-KIF language, it is the only formal ontology that has been mapped to all of the WordNet lexicon.

Formal languages such as DAML, OIL, and OWL are geared towards classification. What distinguishes SUMO from other modeling approaches (eg, UML or frame-based languages) is its use of predicate logic. SUMO preserves the ability to structure taxonomic relationships and inheritance, but extends those techniques with an expressive set of terms, axioms and rules that can more accurately model geospatial and temporal concepts, both physical and abstract.
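To illustrate what a rule beyond taxonomic inheritance buys, here is a Python sketch of applying one SUO-KIF-style implication by pattern matching. The axiom and fact base are loosely modeled on SUMO's style; the predicates and constants are invented, not actual SUMO content:

```python
# Invented fact base: a transaction and the party acting in it.
facts = {("instance", "TransferT1", "FinancialTransaction"),
         ("agent", "TransferT1", "AcmeBank")}

def apply_agent_rule(facts):
    """Sketch of a SUO-KIF-style axiom (invented, SUMO-like in form):
       (=> (and (instance ?T FinancialTransaction) (agent ?T ?A))
           (instance ?A CognitiveAgent))
    applied by brute-force matching of the two antecedent patterns."""
    derived = set()
    for (p, t, cls) in facts:
        if p == "instance" and cls == "FinancialTransaction":
            for (q, t2, a) in facts:
                if q == "agent" and t2 == t:
                    derived.add(("instance", a, "CognitiveAgent"))
    return derived

print(apply_agent_rule(facts))  # {('instance', 'AcmeBank', 'CognitiveAgent')}
```

A pure classification language can only place AcmeBank in a class hierarchy; a rule like this one derives a new class membership from how AcmeBank participates in a process, which is the kind of inference predicate logic makes available.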

Nevertheless, KR modeling can suffer from the “garbage in, garbage out” syndrome. Developing domain ontologies with SUMO is no exception. That is why in a large ontology such as SUMO/MILO, validation is very important.[23]


Models of concepts are second derivatives

Returning to Ogden’s and Richards’ (1923) meaning triangle, the relations between term, referent and concept may be phrased more precisely in causal terms:
  • The matter (referent) evokes the writer's thought (concept). 
  • The writer refers the matter (referent) to the symbol (term). 
  • The symbol (term) evokes the reader's thought (concept). 
  • The reader refers the symbol (term) back to the matter (referent).

When the writer refers the matter to the symbol, the writer is effectively modeling the referent, and the method used is informal language. However, without a formal semantic system in which to model concepts, natural language as a representation of concepts suffers from the fact that informal languages have meaning only by virtue of human interpretation of words. Likewise, it is important not to mistake the term for the referent itself.

In calculus the second derivative of a function ƒ is the derivative of the derivative of ƒ. Likewise, a KR archetype or replica of a referent (ie, a physical or abstract thing) can be considered a second derivative: the concept is the first derivative, and the model of the concept is the second derivative. What can add to the confusion is the term labeling the referent versus the term labeling the model of the concept of the referent; one's inclination is to substitute the label for the referent.

Figure 4 – Adapted from Sowa (2000), “Ontology, Metadata, and Semiotics” 

Thus, it is important to recognize that a representation of a thing is at its most fundamental level still a surrogate, a substitute for the thing itself. It is a medium of expression. In fact, “the only model that is not wrong is reality and reality is not, by definition, a model.”[24] Still, a pragmatic method for addressing this concern derives from development and use of ontology.


Unstructured data and a way forward…

Over the past two decades much progress has been made on shared conceptualizations and the theory of semantics, as well as the application of these disciplines to advanced computer systems. Such progress provides the means to derive meaningful information from the explosion of unstructured data (ie, “big data”) overwhelming the banking industry, as well as the many other industries that suffer from the same issue.

The missing link underlying “the cause of many a failure to design a proper dialect... [is] the general lack of an upper ontology that could provide the basis for mid-level ontologies and other domain specific metadata dictionaries or lexicons.” The key then is use of an upper-level ontology that “gives those who use data modeling techniques a common footing to stand on before they undertake their tasks.”[25] SUMO, as an open source formal ontology (think Linux), is a promising technology for such purpose with an evolving set of tools and an emerging array of applications/uses solving real world problems.

As Duane Nickull, Senior Technology Evangelist at Adobe Systems explained, “[SUMO] provides a level setting for our existence and sets up the framework on which we can do much more meaningful work.”[26]


About SUMO:
The Suggested Upper Merged Ontology (SUMO) and its domain ontologies form the largest formal public ontology in existence today. They are being used for research and applications in search, linguistics and reasoning. SUMO is the only formal ontology that has been mapped to all of the WordNet lexicon. SUMO is written in the SUO-KIF language. Sigma Knowledge Engineering Environment (Sigma KEE) is an environment for creating, testing, modifying, and performing inference with ontologies developed in SUO-KIF (e.g., SUMO, MILO). SUMO is free and owned by the IEEE. The ontologies that extend SUMO are available under GNU General Public License. Adam Pease is the Technical Editor of SUMO.  

For more information: http://www.ontologyportal.org/index.html. Also see: http://www.ontologyportal.org/Pubs.html for list of research publications citing SUMO.

Users of SUO-KIF and Sigma KEE consent, by use of this code, to credit Articulate Software and Teknowledge in any writings, briefings, publications, presentations, or other representations of any software that incorporates, builds on, or uses this code. Please cite the following article in any publication with references:

Pease, A., (2003). The Sigma Ontology Development Environment. In Working Notes of the IJCAI-2003 Workshop on Ontology and Distributed Systems, August 9, 2003, Acapulco, Mexico.



Footnotes:
[1] Speech by Mr Andrew G Haldane, Executive Director, Financial Stability, Bank of England, at the Securities Industry and Financial Markets Association (SIFMA) “Building a Global Legal Entity Identifier Framework” Symposium, New York, 14 March 2012.

[2] Counterparty Risk Management Policy Group (2008).

[3] CFTC Technology Advisory Subcommittee on Data Standardization Meeting to Publicly Present Interim Findings on: (1) Universal Product and Legal Entity Identifiers; (2) Standardization of Machine-Readable Legal Contracts; (3) Semantics; and (4) Data Storage and Retrieval. Meeting notes by Association of Institutional Investors. Source: http://association.institutionalinvestors.org/ Document: http://bit.ly/QJIP4i

[4] See: http://www.iso20022.org/

[5] SWIFT Standards Team, and Society for Worldwide Interbank Financial Telecommunication (2010). ISO 20022 for Dummies. Chichester, West Sussex, England: Wiley. http://site.ebrary.com/id/10418993.

[6] The term “avocation” has three seemingly conflicting definitions: 1. something a person does in addition to a principal occupation; 2. a person's regular occupation, calling, or vocation; 3. Archaic diversion or distraction. Note: we purposefully selected this term because it relates to pragmatics; specifically, the “semantic barrier”.

[7] e•pis•te•mol•o•gy, n., 1. a branch of philosophy that investigates the origin, nature, methods, and limits of human knowledge; 2. the theory of knowledge, esp the critical study of its validity, methods, and scope.

[8] Turing, A.M. (1950). “Computing machinery and intelligence” Mind, 59, 433-460.

[9] Harnad, Stevan (1990) “The Symbol Grounding Problem” Physica, D 42:1-3 pp. 335-346.

[10] Ogden, C., and Richards, I. (1923). The meaning of meaning. A study of the influence of language upon thought and of the science of symbolism. Supplementary essays by Malinowski and Crookshank. New York: Harcourt.

[11] se•man•tics, n., 1. linguistics the branch of linguistics that deals with the study of meaning, changes in meaning, and the principles that govern the relationship between sentences or words and their meanings; 2. significs the study of the relationships between signs and symbols and what they represent; 3. logic a. the study of interpretations of a formal theory; b. the study of the relationship between the structure of a theory and its subject matter; c. the principles that determine the truth or falsehood of sentences within the theory. 

[12] lin•guis•tics, n., the science of language, including phonetics, phonology, morphology, syntax, semantics, pragmatics, and historical linguistics.

[13] Use of term “symbolic” refers to semiosis and the term “sign,” which is something that can be interpreted as having a meaning for something other than itself, and therefore able to communicate information to the person or device which is decoding the sign. Signs can work through any of the senses: visual, auditory, tactile, olfactory or taste. Examples include natural language, mathematical symbols, signage that directs traffic, and non-verbal interaction such as sign language. Note: we categorized linguistics to be a subclass of semiotics.

[14] se•mi•ot•ics, n., the study of signs and sign processes (semiosis), indication, designation, likeness, analogy, metaphor, symbolism, signification, and communication. Semiotics is closely related to the field of linguistics, which, for its part, studies the structure and meaning of language more specifically.

[15] The term “semiosis” was coined by Charles Sanders Peirce (1839–1914) in his theory of sign relations to describe a process that interprets signs as referring to their objects. Semiosis is any form of activity, conduct, or process that involves signs, including the production of meaning. [Related concepts: umwelt, semiosphere]

[16] One school of thought argues that language is the semiotic prototype and its study illuminates principles that can be applied to other sign systems. The opposing school argues that there is a meta system, and that language is simply one of many codes (ie, signs) for communicating meaning.

[17] syn•tac•tics, n., 1. the branch of semiotics that deals with the formal properties of symbol systems. 2. logic, linguistics the grammatical structure of an expression or the rules of well-formedness of a formal system.

[18] prag•mat•ics, n. 1. logic, philosophy the branch of semiotics dealing with causality and other relations between words, expressions, or symbols and their users. 2. linguistics the analysis of language in terms of the situational context within which utterances are made, including the knowledge and beliefs of the speaker and the relation between speaker and listener. [Note: pragmatics is closely related to the study of semiosis.]

[19] met•a•phys•ics, n., 1. the branch of philosophy that treats of first principles, includes ontology and cosmology, and is intimately connected with epistemology; 2. philosophy, especially in its more abstruse branches; 3. the underlying theoretical principles of a subject or field of inquiry.

[20] Pease, Adam. (2011). Ontology: A Practical Guide. Angwin: Articulate Software Press.

[21] A review of the listed KR approaches is outside the scope of this discussion. See Pease (2011), Ontology: A Practical Guide, ‘Chapter 2: Knowledge Representation’ for a more in-depth discussion/comparison.

[22] While OWL is based on description logic, its primary construct is taxonomy (ie, frame language).

[23] See Pease (2011), Ontology: A Practical Guide, pp. 89-91 for further discussion on validation.

[24] Haldane, Andrew G. (2009). “Why banks failed the stress test” Speech, Financial Stability, Bank of England, at the Marcus-Evans Conference on Stress-Testing, London, 9-10 February 2009.

[25] Duane Nickull, Senior Technology Evangelist, Adobe Systems; foreword to “Ontology” by Adam Pease.

[26] Multiple contributors (2009). Introducing Semantic Technologies and the Vision of the Semantic Web Frontier Journal, Volume 6, Number 7 July 2009. See: http://www.hwswworld.com/pdfs/frontier66.pdf