There are a number of fairly deep insights about what Oco does that explain several recurring problems in our industry. There are all these stats about how half or more of BI/DW (business intelligence/data warehouse) projects fail. There's a reason for this: there's a big gap between the IT systems and staff who do BI/DW projects and the business people who must use the data to make business decisions. The gap runs two ways. The IT folks don't know enough about the business, and the business people don't know enough about the IT systems.
A big part of the value Oco provides is our structured methodology for figuring out the business metrics needed, and our technology for finding them in the data. This allows us to close this gap.
What Oco does is what I call semantic integration.
Now "semantics" is my 2nd least favorite word. My least favorite is "ontology", which, from what I can tell, is used only to obfuscate. I only use the word "semantics" to distinguish it from "syntax", i.e., by semantics I mean "more meaning than just syntax". I hope this becomes clear below.
What most projects (the kind that often fail) do is syntactic integration. IT folks are told business people need more data. So they set out to do what they can do, which is to collect all the data of the organization in one data warehouse. They can access the data and move it all into one database, because that's a syntactic problem. What they lack is the business knowledge to consolidate disparate information into common concepts.
I live in Red Sox Nation (yes I live near Boston, yes we have been sports spoiled for a few years now. Guilty as charged!), so let me illustrate this syntax vs. semantics issue with an example from baseball.
Suppose I have a data file. In it is a bunch of records about baseball player statistics. One of the fields is named "AVG". It is an integer. This is very likely to be the batting average, and we all know that batting average is a number between 0 and 1, typically rounded to 3 decimal places. Batting a thousand is 1.000. Typically this kind of data would be stored as an integer number of thousandths rather than as a decimal, so .312 is stored as 312 and 1.000 as 1000.
Syntactically, I can access the AVG field and put it in a database column of my data warehouse, and I can even do validity checking to make sure no value is negative or above 1000. This is all low-level syntactic stuff.
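To make the syntactic layer concrete, here is a minimal sketch of that kind of range check. The function name and the sample records are made up for illustration; the only thing taken from the example above is that AVG is an integer between 0 and 1000.

```python
def is_syntactically_valid_avg(raw):
    """Return True if raw looks like a stored batting average: an int in 0..1000."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(raw, bool) or not isinstance(raw, int):
        return False
    return 0 <= raw <= 1000


# Hypothetical records, as they might arrive from a source file.
records = [
    {"player": "A", "AVG": 312},
    {"player": "B", "AVG": 1200},  # above 1000: rejected
    {"player": "C", "AVG": -5},    # negative: rejected
]

valid = [r for r in records if is_syntactically_valid_avg(r["AVG"])]
```

Notice that nothing in this check knows anything about baseball. It would pass an AVG of 950 without complaint.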
Now let's move up to the business or "semantic" layer. A real baseball fan or manager, who understands the business, knows that nobody bats above 0.500. And here's an interesting thing: a batting average isn't even computed for a player until they've had at least 12 at-bats. Knowing this allows one to deal with missing data in the data set, or at least some of the missing data. So I hope you see how I needed to understand the data in a more powerful way. I have to get at the business meaning of the data in order to understand how to deal with simple issues like whether it's ok for this data field to be missing.
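Here is a rough sketch of what those two pieces of business knowledge might look like as a rule. The thresholds come straight from the paragraph above; the function name, the status labels, and the assumption that AVG is stored as integer thousandths alongside an at-bats count are mine.

```python
# Thresholds taken from the text above; AVG is assumed to be stored
# as integer thousandths (312 means .312).
MIN_AT_BATS = 12
PLAUSIBLE_MAX_AVG = 500


def classify_avg(avg, at_bats):
    """Classify an AVG value using business knowledge, not just its range."""
    if avg is None:
        # A missing average is fine if the player has too few at-bats.
        if at_bats < MIN_AT_BATS:
            return "expected_missing"
        return "unexpected_missing"
    if avg > PLAUSIBLE_MAX_AVG:
        # Syntactically valid (still <= 1000) but implausible to a baseball person.
        return "suspicious"
    return "ok"


# Usage: the same missing value gets a different status depending on context.
rookie = classify_avg(None, at_bats=5)
regular = classify_avg(None, at_bats=300)
```

The point is that the same null value is harmless for one player and a data quality problem for another, and only the business rule can tell them apart.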
That's fine as far as it goes, but the argument goes deeper. A more savvy baseball manager knows that batting average isn't such a great statistic, and that on-base percentage is actually far more predictive of whether players help win games. This is real business knowledge. So, if I am trying to evaluate players, I need to compute, from whatever data I have, the on-base percentage. Knowing this, I can take data from disparate systems that contain a variety of low-level raw stats and compute the on-base percentage from them. I also need agreement across my team, or league, or whatever, that this statistic is one we should all measure in the same way, and that when we say "on-base percentage" we all mean the same thing by it.
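As a sketch of what "measuring it the same way" could mean in practice: on-base percentage is conventionally (hits + walks + hit-by-pitch) divided by (at-bats + walks + hit-by-pitch + sacrifice flies). The two source systems and their column names below are hypothetical, invented only to show one shared definition being applied to differently shaped raw data.

```python
def on_base_percentage(hits, walks, hit_by_pitch, at_bats, sac_flies):
    """The single agreed definition: (H + BB + HBP) / (AB + BB + HBP + SF)."""
    denominator = at_bats + walks + hit_by_pitch + sac_flies
    if denominator == 0:
        return None  # no plate appearances that count, so no OBP
    return (hits + walks + hit_by_pitch) / denominator


# Hypothetical column names for two source systems feeding the warehouse.
FIELD_MAPS = {
    "system_a": {"hits": "H", "walks": "BB", "hit_by_pitch": "HBP",
                 "at_bats": "AB", "sac_flies": "SF"},
    "system_b": {"hits": "hits", "walks": "bases_on_balls",
                 "hit_by_pitch": "hbp", "at_bats": "at_bats",
                 "sac_flies": "sacrifice_flies"},
}


def obp_from_record(source, record):
    """Translate a source-specific record into the shared definition."""
    mapping = FIELD_MAPS[source]
    return on_base_percentage(**{name: record[col] for name, col in mapping.items()})


a = obp_from_record("system_a", {"H": 150, "BB": 50, "HBP": 5, "AB": 500, "SF": 5})
b = obp_from_record("system_b", {"hits": 150, "bases_on_balls": 50, "hbp": 5,
                                 "at_bats": 500, "sacrifice_flies": 5})
```

Both calls go through the same formula, so the two systems can't drift into slightly different meanings of "on-base percentage". The mapping table is where the business agreement gets written down.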
This is what I mean by semantic integration. Without this agreement that on-base percentage is the metric of merit, you are stuck at the syntax layer wondering why you don't have compatible data coming in from each of the source systems into the data warehouse. That is, the IT person who hasn't been told to compute on-base percentage doesn't compute it. The data analyst trying to use the DW doesn't find useful semantic metrics like on-base percentage in the data. Rather, they find the raw ingredients from which it could be computed, but not consistently or easily. The IT folks can't do the reconciliations needed to make on-base percentage easy to compute, because they're operating below that level of knowledge.
Now flip over to the role of the business department trying to use a data warehouse. The data analyst can't analyze the data because they have to do the semantic integration first. All the data is syntactically integrated, but it's not set up to allow computing the statistic of interest. A very complex integration must happen in the queries on the data warehouse to get to something meaningful to the business. This, by the way, is why data warehouses spawn data marts: the data mart makes up for some of the mismatch. It is used to fill in some of the missing business semantics.
What truly makes Oco work is this ability to understand the semantic integration that is needed: to facilitate a discussion among the business people who need the data, to extract from them what the metrics are, and then to use those metrics to drive the lower-level syntactic integration. This is a key part of what makes our solutions valuable.