Processing COBOL Copybooks in Snowflake

Dated: May-2022


Over the years, I have seen applications that were built and operated on systems like VAX/VMS and Sun evolve onto Windows and Unix systems, and finally move into the cloud. However, mainframes continue to be the workhorse for many financial institutions and other verticals.

From these articles:

it is very much evident that the mainframe will be present for the foreseeable future. Hence it would not be a surprise to see mainframes and mid-range systems as a data source in the data analytics space. A few years back, at a bank, we had COBOL copybooks being ingested into a Hadoop data lake to derive insights from user transactions and validate fraudulent activities.

What are the use cases for this data?

What has this got to do with Snowflake?

Snowflake can parse and process unstructured data, as it is a data-lake platform. The approach to do this has been demonstrated in my previous article “Processing MS-Access database files natively in Snowflake”. This would then mean that we can now natively parse and process COBOL Copybook files in Snowflake.

What are the current options?

As of today, various vendor products exist, such as Talend Data Mapper and Striim. There are also vendors like Model9, which offer capabilities to back up data onto the AWS Cloud, as reflected in this article:

How to Enable Mainframe Data Analytics on AWS Using Model9

Are there any open-source options?

As of today, I have come across two open-source options:

  • COBOLJSONifier: A Python-based copybook parser.
  • JRecord: A Java-based library used for parsing copybooks.

Demo?

To demonstrate this, I am sharing my Colab Notebook here. This was developed with Snowpark (Python) using the COBOLJSONifier library. For this demonstration, I used the sample EBCDIC file that was shared in the project. Even with the current Python file-reading limitations, I was still able to parse the file and load it into Snowflake.
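To give a feel for what the parsing step involves, here is a minimal sketch that decodes one fixed-width EBCDIC record using only the Python standard library. It is not the notebook's code and does not use COBOLJSONifier. The field layout, the choice of the cp037 code page, and the sample record are all assumptions made up for illustration; in practice the copybook defines the layout and the source system determines the code page.

```python
from decimal import Decimal

# Hypothetical layout, as if derived by hand from a copybook such as:
#   05 CUST-ID    PIC X(6).
#   05 CUST-NAME  PIC X(10).
#   05 BALANCE    PIC S9(5)V99 COMP-3.   (7 digits + sign = 4 bytes)
# Each entry: (field name, start offset, length in bytes, kind, decimal places)
LAYOUT = [
    ("cust_id", 0, 6, "text", 0),
    ("cust_name", 6, 10, "text", 0),
    ("balance", 16, 4, "comp3", 2),
]

EBCDIC_CODEC = "cp037"  # assumption: US/Canada EBCDIC code page


def unpack_comp3(raw: bytes, scale: int) -> Decimal:
    """Decode a COMP-3 (packed decimal) field: two digits per byte, sign in the last nibble."""
    nibbles = []
    for byte in raw:
        nibbles.append(byte >> 4)
        nibbles.append(byte & 0x0F)
    sign = nibbles.pop()  # simplified: 0xD is treated as negative, anything else as positive
    value = int("".join(str(n) for n in nibbles))
    if sign == 0x0D:
        value = -value
    return Decimal(value).scaleb(-scale)


def parse_record(record: bytes) -> dict:
    """Slice one fixed-width record by LAYOUT and decode each field."""
    row = {}
    for name, start, length, kind, scale in LAYOUT:
        chunk = record[start:start + length]
        if kind == "comp3":
            row[name] = unpack_comp3(chunk, scale)
        else:
            row[name] = chunk.decode(EBCDIC_CODEC).rstrip()
    return row


# Made-up sample record: 16 text bytes followed by packed +12345.67
SAMPLE = "C00001JANE DOE  ".encode(EBCDIC_CODEC) + bytes([0x12, 0x34, 0x56, 0x7C])
print(parse_record(SAMPLE))
```

Text fields decode with an ordinary codec, but packed fields such as COMP-3 must be handled byte by byte, which is why a copybook-aware parser is needed rather than a plain text reader.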

What have I learnt?

By processing the files natively, access to the copybook can be limited to only a few actors (support, the exporting tool and Snowflake), which allows tighter control.

COBOLJSONifier: The REDEFINES clause is not working.
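For context, REDEFINES lets two field layouts describe the same bytes of a record, so a parser has to decide which layout applies to each record. The sketch below shows one common way to handle this by hand, using a record-type byte as the discriminator. The copybook fragment, field names, and record types are hypothetical, and this is not how either library implements the clause.

```python
from decimal import Decimal

# Hypothetical copybook fragment:
#   05 REC-TYPE          PIC X(1).
#   05 PAYLOAD           PIC X(8).
#   05 PAYMENT REDEFINES PAYLOAD.
#      10 PAY-AMOUNT     PIC 9(6)V99.   (unsigned zoned decimal)
# PAYLOAD and PAYMENT occupy the same 8 bytes; REC-TYPE tells us which view applies.

EBCDIC_CODEC = "cp037"  # assumption: US/Canada EBCDIC code page


def parse_record(record: bytes) -> dict:
    """Interpret the shared 8-byte area according to the record-type byte."""
    rec_type = record[0:1].decode(EBCDIC_CODEC)
    shared = record[1:9]
    if rec_type == "P":
        # Payment view: unsigned zoned digits decode to "0".."9"; two implied decimals
        amount = Decimal(shared.decode(EBCDIC_CODEC)).scaleb(-2)
        return {"rec_type": rec_type, "amount": amount}
    if rec_type == "A":
        # Address view: the same bytes read as plain text
        return {"rec_type": rec_type, "city": shared.decode(EBCDIC_CODEC).rstrip()}
    raise ValueError(f"unknown record type: {rec_type!r}")


# Made-up sample records, one per view
print(parse_record("P00012550".encode(EBCDIC_CODEC)))
print(parse_record("ATORONTO ".encode(EBCDIC_CODEC)))
```

The discriminator is application knowledge that the copybook itself does not carry, which is part of what makes REDEFINES hard to support generically.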

JRecord: While this project seems to have some interest in the community, it is not organized efficiently, and the lack of documentation is a concern.

While I was able to parse and process the sample EBCDIC files (COBOL copybook), I was limited in the data volume and the variety of record types.

Final Thoughts

Though I could not carry on with my experiment to the fullest extent, I would say at the outset that the ability to ingest data from copybooks into Snowflake is plausible. I am hoping a team that is looking at such a possibility will take on this effort and showcase the potential. Who knows, this does seem like an opportunity to build a service.

Venkatesh Sekar is a Senior Data Cloud Architect at Snowflake. He is involved in helping Snowflake GSI Partners be successful in their client solutions and implementations of #Snowflake, the Data Cloud.
