Using OCI data (pt1) – Getting costing data

My previous blogs have covered different ways you can get data out of OCI. This is really useful for small activities like taking backups, cloning NSGs, and so on, but can the data itself be of use? Of course it can, and I'm going to cover some real-world use cases to bring this to life, including:

  • Getting more informative costing information
  • Notifying a user of an aged API key
  • Automatically disabling idle users

The process I've followed with each of the scenarios above is to first get and store the data, and then do something useful with it. In this case I built an APEX app on top of an Autonomous Database.

The first scenario is the costing data. Oracle includes some very basic cost reporting functionality in the OCI portal.

The cost analysis report is great for quickly seeing your spend, but it requires you to go into the system, and it's not easy to get down to the detail behind it; for example, in the graph above, exactly which database was responsible for that additional spend between the 9th and 18th of January?

Additionally, Oracle provide you with the raw data which underpins these graphs, in the form of a series of CSV files.

Each file covers a number of hours (approx. 4 hours for usage reports, and 6 for cost reports), giving you about 10 files per day. Each file contains an entry, for every hour of the day, for every item which has a usage or cost. For example, a Windows compute instance is a CPU, memory, an OS license, a boot volume and a network card, giving you 5 entities, so 5 × 24 = 120 rows of costing data, and 120 rows of usage data, per day. As you can imagine, this becomes a large volume of data even for a small estate.

To get a day's view, you would need to download all the files for that day, consolidate them into a single file, and then use something like Excel to view and filter the data to find that database from above... and it only really starts to make sense when you can compare the data against another date and do some clever analysis, like a crosstab, to visualise the output.

However, once again the Oracle APIs come to the rescue, as you can download all these files using a background script. Oracle provide two commands for this.

Fundamentally, I've used an Always Free Autonomous Database, which comes with a pre-built (and Oracle-managed) APEX instance. You get 1 OCPU and 1 TB of storage bundled into this Always Free instance, which is more than enough for this requirement.

The first command Oracle provide gets a list of all the files available to download. Oracle retains each file for 6 months, so you can check this list once a day and pick up all of yesterday's files quite easily:

oci os object list --namespace-name bling --bucket-name $TENANCY --all --output table --query 'data[].{name:name, "time_created":"time-created"}'

Oracle have called the namespace "bling", which feels like a developer was having a bit of a laugh, and you pass your tenancy OCID in as the bucket name. I've added a query so this returns 2 columns: the file name and the date-time it was created.

I then converted this into a simple insert statement to store the data (making sure I don't get duplicates) in a table called oci_files:

Name           Type
FILENAME *     VARCHAR2(100)
LOADED         CHAR(1)
DATE_CREATED   DATE
FILE_ROWS      NUMBER
LOADED_ROWS    NUMBER

* Unique constraint to prevent duplicates
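For anyone following along, a minimal DDL sketch matching the layout above might look like this (the constraint name is my own); each file from the list then becomes an insert like the one below.

create table oci_files (
  filename      varchar2(100),
  loaded        char(1),
  date_created  date,
  file_rows     number,
  loaded_rows   number,
  constraint oci_files_filename_uk unique (filename)  -- prevents duplicate loads
);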
insert into oci_files values ('<FILENAME>', 'N', '<DATE_CREATED>', 0, 0);

The LOADED field lets me know which files I've subsequently loaded, and I'll fill in FILE_ROWS and LOADED_ROWS later so I know whether the file has been loaded in its entirety.

The second process I run loops around my oci_files table, finding any row where LOADED = 'N'. For each row, I download the file, and then load each line of it into a table:

oci os object get --namespace-name bling --bucket-name $TENANCY --name <FILE> --file <OUTPUT>

This writes a file "<OUTPUT>", which I then insert as-is into a table with a big varchar. I thought about using a LOB, but this is pure text, so using a big varchar saves me complexity later when processing it... and yes, Oracle have called the namespace "bling".

Name        Type
FILENAME    VARCHAR2(100)
PROCESSED   CHAR(1)
DATA1       VARCHAR2(10000)
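Assuming we call this staging table oci_raw (the name is my own, for illustration), each line of the downloaded file becomes a row like:

insert into oci_raw values ('<FILENAME>', 'N', '<ONE LINE OF THE FILE>');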

My data for the current file is now in my database, so a simple PL/SQL procedure can extract the comma-separated data, a field at a time, and insert it into a table for the costing or usage data.
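As a rough sketch of that procedure, assuming the oci_raw staging table from above and a hypothetical split_csv_field helper (a quote-aware version is sketched further down), it might look something like this, extended to all 25 fields in practice:

begin
  for r in (select rowid rid, filename, data1
              from oci_raw
             where processed = 'N') loop
    -- field positions follow the cost-data layout below;
    -- header lines would also need skipping, omitted here for brevity
    insert into cost_data (referenceno, tenantid, service, mycost, filename)
    values (split_csv_field(r.data1, 1),              -- REFERENCENO
            split_csv_field(r.data1, 2),              -- TENANTID
            split_csv_field(r.data1, 5),              -- SERVICE
            to_number(split_csv_field(r.data1, 18)),  -- MYCOST
            r.filename);
    update oci_raw set processed = 'Y' where rowid = r.rid;
  end loop;
  commit;
end;
/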

An alternative would be to do this using a tool like SQL*Loader. The problem with the files is that you cannot determine their exact row layout in advance, as a set of tag columns is concatenated to the end of each line. The tags are consistent throughout a file, but if you modify the tag data anywhere in your estate, then the next file will have a different number of columns. So, for cost data you get this:

Name                    Type
REFERENCENO             VARCHAR2(150)
TENANTID                VARCHAR2(150)
INTERVALUSAGESTART      DATE
INTERVALUSAGEEND        DATE
SERVICE                 VARCHAR2(150)
COMPARTMENTID           VARCHAR2(150)
COMPARTMENTNAME         VARCHAR2(150)
REGION                  VARCHAR2(150)
AVAILABILITYDOMAIN      VARCHAR2(150)
RESOURCEID              VARCHAR2(150)
BILLEDQUANTITY          NUMBER(30,8)
BILLEDQUANTITYOVERAGE   NUMBER(30,8)
SUBSCRIPTIONID          VARCHAR2(150)
PRODUCTSKU              VARCHAR2(150)
DESCRIPTION             VARCHAR2(150)
UNITPRICE               NUMBER(30,8)
UNITPRICEOVERAGE        NUMBER(30,8)
MYCOST                  NUMBER(30,8)
MYCOSTOVERAGE           NUMBER(30,8)
CURRENCYCODE            VARCHAR2(150)
BILLINGUNITREADABLE     VARCHAR2(150)
SKUUNITDESCRIPTION      VARCHAR2(150)
OVERAGEFLAG             VARCHAR2(150)
ISCORRECTION            VARCHAR2(150)
BACKREFERENCENO         VARCHAR2(150)
FILENAME                VARCHAR2(150)

These are followed by columns representing each tag, which can change from file to file but are consistent within a file. The approach I used grabs the first 25 fields of a costing file and (for now) ignores everything after the 25th field (BACKREFERENCENO), as this is an undetermined quantity of tag data. As an example, if you've used 2 customer tags, plus the standard 2 Oracle-generated ones, then the cost file will be 29 columns long; if you then add a new tag anywhere in your tenancy, every file generated from that point on will have 30 columns, and so on.
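Since every line in a file has the same number of columns, you can check how many tag columns a given file carries by counting the separators on any of its lines (a naive count, assuming no quoted commas on that line):

select regexp_count(data1, ',') + 1 as field_count
  from oci_raw
 where filename = '<FILENAME>'
   and rownum = 1;  -- any row will do: the count is consistent within a file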

The final step is to count the number of rows in the file (subtracting 1 for the header) and compare it against the number of rows I've inserted into my cost_data table. The two should be identical; if not, something has gone wrong and you can address it on a file-by-file basis. I then update the LOADED flag to Y so I know that file isn't going to get loaded again.
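The reconciliation itself can be a simple update, assuming FILE_ROWS already has the header row subtracted:

update oci_files
   set loaded = 'Y'
 where loaded = 'N'
   and file_rows = loaded_rows;  -- only mark files which loaded completely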

This is just a "loop and repeat" process now, and I end up with all my cost and usage data in a couple of tables in my database.

I also then load the tag data by looking at the end of each line and looping through the remaining columns, inserting a row into a tag table for each entity (identified by its OCID) which has a tag, which means 3 tags against a compute instance end up as 3 rows in my tag table. This isn't as easy as it sounds, as you need to accommodate awkward values which might contain commas, and differentiate those from the commas separating the fields.
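Here is a minimal sketch of a quote-aware splitter along those lines (my own illustration, not the exact code behind the blog); commas inside double quotes are treated as part of the value:

create or replace function split_csv_field (
  p_line varchar2,
  p_pos  number
) return varchar2
is
  l_field     varchar2(4000);
  l_count     pls_integer := 1;
  l_in_quotes boolean := false;
  l_ch        varchar2(1 char);
begin
  for i in 1 .. length(p_line) loop
    l_ch := substr(p_line, i, 1);
    if l_ch = '"' then
      l_in_quotes := not l_in_quotes;   -- toggle at each quote, stripping them from the value
    elsif l_ch = ',' and not l_in_quotes then
      exit when l_count = p_pos;        -- we've finished the field we wanted
      l_count := l_count + 1;
    elsif l_count = p_pos then
      l_field := l_field || l_ch;       -- only accumulate the target field
    end if;
  end loop;
  return case when l_count >= p_pos then l_field end;  -- null if p_pos is past the last field
end;
/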

That is "almost" it. The remaining issue is that everything we've loaded is referenced by its OCID, rather than the friendly name we would recognise. Using the structured query I've demonstrated previously, I loop through each unique OCID and update its name into a table in the database, noting that while the OCID for an item can't change, its name can (so you need to re-check every unique OCID frequently):

oci search resource structured-search --query-text "query all resources where identifier = '<OCID>'" --query 'data.items[*].{identifier:identifier, Name:"display-name"}'
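Each identifier/name pair that comes back can then be merged into the friendly_names table; a sketch, with column names that are my assumption:

merge into friendly_names f
using (select '<OCID>' as identifier, '<DISPLAY-NAME>' as name from dual) s
   on (f.identifier = s.identifier)
 when matched then update
      set f.name = s.name               -- names can change, so refresh existing rows
 when not matched then insert (identifier, name)
      values (s.identifier, s.name);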

Now I've got 4 tables (cost_data, usage_data, tag_data and friendly_names) which I can query to give me all my usage and cost data, along with the associated metadata which will make it useful later.
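For example, a query along these lines (again, my own illustration) answers the question from the start of this post: which resource was driving the spend on a given day?

select f.name,
       trunc(c.intervalusagestart) as usage_day,
       sum(c.mycost) as total_cost
  from cost_data c
  join friendly_names f
    on f.identifier = c.resourceid
 group by f.name, trunc(c.intervalusagestart)
 order by usage_day, total_cost desc;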

The next step is to create a simple APEX application to allow me to access this data easily, in a user-friendly way... which I will cover in my next blog.
