- General Idea & Rationale
- Software Product / API
- IFRS Accounting statement parser
- Additional Content / Websites
- Monetization Possibility
- Other possible, similar, software products
- Additional web content (not related to software)
- Failed / non-continued efforts
- Educational Videos
Created by Cornelius Koller
General Idea & Rationale
Most online 'creator' personas have some sort of 'core product' they are involved in. This can be a topic such as beauty, sports, personal growth, tech or finance. Usually, this core content is then marketed and distributed in multiple forms on multiple platforms, resulting in multiple income streams from one primary value stream.
Software Product / API
IFRS Accounting statement parser
While publicly listed companies are obliged to publish their IFRS (group) accounting statements, they are not obliged to provide them in easy-to-use file formats like spreadsheets. Instead, most companies publish PDF files that are usually 300-400 pages long, of which fewer than 10 pages are relevant for the IFRS statements (balance sheet, other comprehensive income, cash flow statement, statement of changes in equity, and profit and loss statement). These PDFs are generally not optimized for automatic data extraction (there may be a rationale behind this). While it is possible to buy the data from companies like Bloomberg, this is usually not an option for retail investors, researchers or ad-hoc queries.
Generally speaking, there are three sub-problems:
- Extract the relevant pages from a 300-400 page document
- Extract the tables on the relevant pages
- Tag tables with the IFRS component they represent
The first issue requires a classification of the page contents. While it is absolutely possible to perform this operation with a machine learning model, I did not pursue this approach due to the amount of training data needed. Instead, I implemented a relatively simple algorithm that relies on a few assumptions about the document structure:
- The relevant pages have keywords on them, for example "Konzernbilanz" or "Aktiva"
- There is a chapter slide right before the IFRS accounting statements that has multiple keywords followed by numbers on it.
- Irrelevant pages have no keywords on them, come before the relevant chapter slide, and/or have multiple keywords on them.
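The three assumptions above can be sketched as follows. This is a minimal illustration, not the implementation behind the API: the keyword list, the function names, and the rule "use the first chapter slide found" are all assumptions made for the example, and the page texts are assumed to have been extracted from the PDF beforehand.

```python
import re

# Hypothetical keyword list (one keyword per statement); the real list is not given in the text.
KEYWORDS = (
    "Konzernbilanz",
    "Konzern-Gewinn- und Verlustrechnung",
    "Konzern-Kapitalflussrechnung",
)


def keywords_on_page(text):
    """Return the set of keywords that occur in the page text (case-insensitive)."""
    lowered = text.lower()
    return {kw for kw in KEYWORDS if kw.lower() in lowered}


def is_chapter_slide(text):
    """A chapter slide lists at least two keywords, each followed by a number."""
    hits = 0
    for kw in KEYWORDS:
        if re.search(re.escape(kw) + r"\s+\d+", text, flags=re.IGNORECASE):
            hits += 1
    return hits >= 2


def select_relevant_pages(pages):
    """Return 0-based indices of pages that look like IFRS statement pages."""
    # Assumption: the first page that looks like a chapter slide is the relevant one.
    slide_index = next(
        (i for i, text in enumerate(pages) if is_chapter_slide(text)), None
    )
    if slide_index is None:
        return []
    relevant = []
    for i in range(slide_index + 1, len(pages)):
        # Pages with no keyword or with several keywords are treated as irrelevant.
        if len(keywords_on_page(pages[i])) == 1:
            relevant.append(i)
    return relevant
```

A rule-based filter like this needs no training data, but it depends entirely on the keyword list matching the wording of the report at hand.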
While there are a number of tools to extract tables from PDFs, they usually rely on specific characters to separate the individual columns. This does not work properly for the accounting reports, as they frequently use this layout (Deutsche Wohnen, annual report 2020):
We can see that there is only whitespace in between columns. This is easily readable for humans, but not for computers.
As I am not aware of any tool that is able to separate columns by whitespace, I decided to implement the algorithm myself (it is, however, not perfect, given the relatively short timespan during which I developed it). Basically, the algorithm utilizes the fact that PDF is a graphical format. This means that any text is contained in an object with x and y coordinates. From the information about where text is, we can derive where nothing is, and generate column separators from those empty regions. Then we can sort all text into the columns generated by the algorithm.
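The idea can be illustrated with a small sketch. It is not the algorithm running behind the API; the `TextBox` type, the function names, and the `min_gap` threshold are assumptions made for this example, and the text boxes are assumed to have already been read from the PDF with their coordinates.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import NamedTuple


class TextBox(NamedTuple):
    text: str
    x0: float  # left edge
    x1: float  # right edge
    y: float   # vertical position; larger means further down (assumption)


def find_separators(boxes, min_gap=10.0):
    """Return the midpoint of every horizontal gap that no text box covers."""
    intervals = sorted((b.x0, b.x1) for b in boxes)
    separators = []
    if not intervals:
        return separators
    covered_until = intervals[0][1]
    for x0, x1 in intervals[1:]:
        # A gap exists if the next box starts well after everything seen so far ends.
        if x0 - covered_until >= min_gap:
            separators.append((covered_until + x0) / 2)
        covered_until = max(covered_until, x1)
    return separators


def to_table(boxes, min_gap=10.0):
    """Sort text boxes into rows (by y) and columns (by whitespace separators)."""
    separators = find_separators(boxes, min_gap)
    rows = defaultdict(lambda: [""] * (len(separators) + 1))
    for b in sorted(boxes, key=lambda b: (b.y, b.x0)):
        # The column index is the number of separators left of the box center.
        col = bisect_right(separators, (b.x0 + b.x1) / 2)
        row = rows[round(b.y)]
        row[col] = (row[col] + " " + b.text).strip()
    return [rows[y] for y in sorted(rows)]


example = [
    TextBox("Sachanlagen", 50, 120, 100),
    TextBox("1.234", 300, 340, 100),
    TextBox("1.100", 400, 440, 100),
    TextBox("Goodwill", 50, 95, 115),
    TextBox("567", 315, 340, 115),
    TextBox("600", 415, 440, 115),
]
table = to_table(example)
```

One known weakness of this simple variant is that a single wide element, such as a heading spanning the whole page, covers every gap and suppresses all separators, so such lines would have to be filtered out first.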
The output is then an Excel file like this (derived from the Deutsche Wohnen SE annual report 2020):
The functionality is available as an API, which can be made pay-per-use by selling API keys to potential users. An (unsecured) demo is running here: http://146.148.45.196/
Additional Content / Websites
Simply put, while the API products can be offered as-is, it is also feasible to integrate them into a dedicated website and then place advertisements on that website.
An example of a simple UI is depicted in this screenshot:
Monetization Possibility
While I do not want to plaster the site with advertisements, it is possible to do so in a limited way that does not invade a user’s privacy. An ad partner that could be used for this purpose is Ethical Ads (https://www.ethicalads.io/), which displays only one advertisement per web page and does not use tracking cookies (which, apart from being more ethical, has the advantage that no cookie banner is required). Below is a screenshot of an ad (https://bats-core.readthedocs.io/en/stable/index.html):
Other possible, similar, software products
- Web-scrape the Bundesanzeiger and make information conveniently available
- Web-scrape the BaFin database on directors' dealings
→ Both of these have in common that the data is publicly available, but obfuscated, making it impractical for individual users to analyze it systematically. This results in an opportunity for automation.
Additional web content (not related to software)
As a hobby I take photos, mostly of birds. I decided to put a subset of these pictures on a dedicated Instagram account, which can be found here. However, I am not too optimistic that amateur photos of birds are a popular type of content on Instagram.
From an economic / side-hustle perspective it is interesting, though: if monetized, it can be declared as “Freiberufliche Kunst” (freelance artistic work), which, as opposed to a “Gewerbliche Tätigkeit” (commercial business activity), significantly reduces the amount of bureaucracy involved.
Failed / non-continued efforts
Educational Videos
While the idea of creating educational videos seemed compelling, I decided not to pursue it any further after a few initial attempts. The short video below is the result of one of these attempts. While the content is very simple (I just show two commands), the production effort was rather high. Also, the click counts of technical content on, for example, YouTube are notoriously low.
Also, it's really not my thing.