Reflecting on the Job Application Tracker

January 5, 2025
[Screenshot of source code for the job application tracker]

All the way back in 2022, I began working on this job tracking website. It started as my final project for the University of Helsinki’s Open Data Structures courses, but it also coincided with a period when I was looking for work (at a time when any employment was difficult to find), so it became a project I could use in practice. Aside from those courses, the extent of my formal education at the time amounted to my high school computer science course; I hadn’t even begun my bachelor’s degree at that point! Needless to say, there were issues with code quality, UX, QA, and so on. In addition, when I did find employment and later began my bachelor’s, this project was neglected - along with all the others I built back then - and became severely out of date. I came back to this one because it has the most potential; the most I could improve in my other personal projects is the quality of the UI and perhaps general coding practices, whereas the job tracker presented unique problems of its own.

When I first wrote the job tracker it was very simple: the database consisted of two tables, “users” and “jobs.” “Users” was fairly straightforward, while “jobs” contained everything: the actual information about the job, but also the dates of each interview and who they were with; anything not covered by the set fields was to be put in the ‘notes’ section. On the frontend this was a WYSIWYG editor with basic formatting capabilities; sufficient, but not ideal. In the overhaul I describe below I made several adjustments, most notably the creation of separate interview instances, as well as support for files.

Backend

The first step in overhauling the backend (aside from getting everything up to date) was to get my database structure up to snuff. I elected to start the database from scratch so as not to deal with overly complex migrations. I also decided to become an early adopter of Sequelize v7. Although it’s still in alpha, the TypeScript migration done in v7 comes as a benefit to me, and the decorators in model definitions were (to me, at least) easier to work with than the attribute objects used in v6. Admittedly, I could have used sequelize-typescript for the same reasons, but by using v7 I omit an additional package to maintain and, as a bonus, get ahead of the curve on the new version with a project that likely won’t see much use beyond myself.

The models were straightforward, yet naturally saw the most changes. The only changes my User model experienced were the inclusion of a one-to-many relationship with Interview (and later the two File models), as well as instance methods for all of its one-to-many relationships - such instance methods were added to every model on the “one” side of a one-to-many relationship. My “Job” model, along with every other reference to it in both the client and server, was renamed to “Application” and underwent quite a lot of structural change. First, references to interview times and contacts were removed in favour of a one-to-many relationship with the Interview model. I also removed the job description and compensation fields; the former can easily be served as a file, and the latter is not a guarantee and is therefore better placed within the job description or the notes. Additionally, the application date was renamed to applyDate, and fields for assessmentDate, offerDate, interviewDate, and rejectionDate were added (each set to the date the application assumed that status). Finally, I added a file column, which would store an array of base64 strings (more on that later). The Interview model is similar: it contains a foreign key to the application it belongs to (in addition to a user foreign key), the contact (kept ambiguous to cover interviewer, hiring manager, or recruiter), the date and time of the interview, and an optional website (a Zoom/Teams/Meet link, or a link to the company’s website). Just like the Application model, the Interview model contains a column storing an array of base64 strings.

With the models done, the services were the next step. The user and auth services went unchanged, while the application service was refactored to leverage association methods and to handle assigning the various date attributes for new and updated applications. The interview service is nearly identical to the application service, bar the date assignment. There is also an additional method that returns all interviews for a given application; to reduce calls to the server, the frontend doesn’t use it and instead filters its own state. The only real change initially made to the routes was implementing the interview routes. The endpoint responsible for fetching the interviews belonging to an application was placed in the application router, while the interview routes were nearly identical to their application counterparts (save for sorting and filtering - interviews are sorted from earliest to latest by default).
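The date assignment described above can be sketched as a small mapping from status to the column it stamps. This is a minimal illustration, not the project’s actual code - the status values, field names, and function name are all hypothetical:

```typescript
// Hypothetical status values and date-column names, mirroring the
// applyDate/assessmentDate/interviewDate/offerDate/rejectionDate fields.
type ApplicationStatus =
  | "applied"
  | "assessment"
  | "interview"
  | "offer"
  | "rejected";

const dateFieldFor: Record<ApplicationStatus, string> = {
  applied: "applyDate",
  assessment: "assessmentDate",
  interview: "interviewDate",
  offer: "offerDate",
  rejected: "rejectionDate",
};

// Returns the partial update the service would merge into a new or
// updated application when it assumes a given status.
function statusDateUpdate(
  status: ApplicationStatus,
  now: Date = new Date()
): Record<string, Date> {
  return { [dateFieldFor[status]]: now };
}
```

Keeping the mapping in one place means adding a new status only requires one new entry, rather than another branch in the service.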

As with their services and routes, the tests for users and authentication, as well as for the token extractor (which also remained unchanged), persisted from the previous project. The application tests mostly saw changes to the structure of an application, given the change in the model, but I also added a series of tests covering the /applications/:id/interviews route. The interview tests, like their services and routes, mirror the application tests for all comparable operations.

Frontend

The frontend for this project was easily the most work. Once I updated dependencies, I began updating my custom types to match the models defined in the backend (i.e., updating the Application interface and adding the Interview interface, plus types defining the new application/interview objects to be sent to the server). I then got to adding the interview services and state which, you guessed it, were very similar to those for applications. Finally, by adding the getAllInterviews method to the hook responsible for the initial data fetch, I was ready to move on to revamping the UI.

Prior to beginning any coding (even on the backend), I created a basic wireframe for some of the core pages my program would have. They were created with mobile-first development in mind, so at this stage they primarily served as an idea of which elements I needed to include. I elected to nix, add, and change many of the pages. The landing page is gone, so a first-time visitor is directed to the login page or, if already logged in, to the dashboard. Speaking of the dashboard, it is a new addition containing a count of applications created for the current day, a pie chart showing the breakdown of applications by status, and counts for each status; in addition, it shows up to three applications in a table, with the same for interviews. If no interviews are available, the table is replaced with a link to the new interview page. Similarly, if no applications are available, all stats (pie, counts, and table) are replaced with a link to the new application page. The single application and new application pages didn’t change much; they were simply made to align with the new data being saved. The single interview and new interview pages were kept pretty similar to their application counterparts, with changes based on the differing data; notably, information from the interview’s application is used on the single interview page, and a dropdown of applications is present on the new interview page. Finally, the applications page did not change, nor did the login or register pages, and the interviews page matches the applications page with changes for the differing data (interview list items also take information from their application), as well as the removal of filtering and sorting.
I won’t go into much detail on building these pages, as most of the UI work was simply trial-and-error in styling, but I did implement conditional rendering in a few places: the applications and interviews pages provide a link to the form page for their respective type, the new interview page provides a link to the new application page if no applications are present, and the single application page provides a link to the new interview page if no interviews have been added for it.

Before I did a lot of this building, however, I abstracted some logic from existing pages. Namely, I moved the fetching of data on page load into a hook called useFetch. The fetchData function is still quite long - I did later write an API endpoint to fetch all the data in one call - but it considerably simplifies an already large app file. I also moved most of the matching logic for id slugs into functions within a useFind hook. The usage of the useMatch hooks stayed in the app file, but finding a specific application or interview looks much neater as a result.

I did, of course, write more custom hooks and expand existing ones as the frontend took shape. Some were small, such as formatting an application’s status or abstracting input management; others were larger, such as formatting a date object into short, long, time, datetime, and datetime-input formats, or converting files to and from base64. One of these larger hooks is the aforementioned useFind. When building out the single application page, and later the various interview pages, I realized I needed a way to use the relation between applications and interviews. The most logical option for me was to filter the state for the instances I needed. I went this route - as opposed to making additional server calls - to limit loading times: searching through an array takes time, but having the server send the information would take that same lookup time plus the latency from server to client. The difference isn’t awful when I’m the only user, but it’s still an improvement in operation time, and it results in less chatter. So I wrote a function within useFind that filters for all interviews matching a passed applicationId value, as well as a function that gets the application matching a passed applicationId value.
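The two useFind helpers just described boil down to a filter and a find over state. This is a simplified sketch - the interfaces here are stand-ins for the app’s real types, and the function names are illustrative:

```typescript
// Simplified stand-ins for the app's Application and Interview types.
interface Application { id: number; company: string }
interface Interview { id: number; applicationId: number; contact: string }

// All interviews belonging to one application.
const findInterviewsFor = (interviews: Interview[], applicationId: number) =>
  interviews.filter((i) => i.applicationId === applicationId);

// The application an interview belongs to (undefined if not found).
const findApplicationFor = (applications: Application[], applicationId: number) =>
  applications.find((a) => a.id === applicationId);
```

Since both operate on state already in memory, the only cost is a linear scan; there is no round trip to the server.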

In a similar vein, I decided to move the filtering and sorting of applications to the frontend in their own custom hooks. In addition to the speed improvements, keeping a “working copy” of applications scoped to the applications list allows changes to be made to it without persisting when moving to another page (for example, the dashboard). Obviously, keeping an entire copy of the applications list isn’t ideal (perhaps implementing an infinite scroll with incremental inclusions will help?), but neither are extra server calls; based on the anticipated usage of this app (read: me) and the users’ (me) needs, extra memory used is preferable to additional calls to the server. There do come times, however, when a middle ground can be reached. Initially, my dashboard’s stats came from saving the lengths of the results of different filter callbacks. These were shared with the stat boxes as well as the pie chart, but the filters re-ran every time the dashboard was revisited, with or without a page load. That’s a lot. After the program was completed and deployed, I realized the error of my ways and wrote functions in the server to send the tallies needed. From there, those tallies are saved in state and are incremented and decremented as applications are added, updated, and removed. Instead of several O(n) operations an indeterminate number of times, there are five O(n) operations once per page load (the most efficient option in this circumstance) and at most two O(1) operations whenever the data becomes out of date.

My unit testing was mostly straightforward, but an issue did arise: a lot of components and even hooks relied heavily on my global state, which is itself completely reliant on its own component and a hook. I’m not sure if it’s possible to get it to work (would render and renderHook work together so I could add seed data?), and I’m also sure I have some work to do in the way of decoupling my components from state management and business logic beyond what I’ve already done. For the time being, though, I decided to test the components and hooks I could, knowing that I would still be doing end-to-end testing.

End-to-End

Working on end-to-end testing was enlightening, to say the least. The first trouble I encountered was loading the register page while logged out. Instead of staying on that page, the app redirected back to the login page. I eventually realized this was due to the fallback redirection that runs when fetching a user’s id from localStorage fails. Adding a simple check to only redirect if the path was neither the login page nor the registration page fixed the issue. Barring the need to add ids to form fields (oops) and test ids elsewhere, the next issue I faced concerned my file uploading.

When I first decided to include files in this program, I chose to store them alongside the application/interview they belonged to, as base64 strings. I made this decision for a few reasons. First, only PDF files would be added, and as I’m not intending to scale this program any time soon, I assumed these PDFs would be text-only and fairly basic - the kind you’d get after downloading a simple document from Word or Google Docs. Second, I am not keen on using a file storage provider - primarily due to hosting fees, and secondarily because, once again, the files are relatively small and there won’t be very many of them given that this is mainly for personal use. This approach only seemed to work when uploading one file; the request became too big when trying to upload even two PDFs. So, I went back to the drawing board. My research mentioned the overhead of base64 versus binary representation, so I knew I should change that. I also knew that breaking the large transaction (application/interview plus its files) into several smaller ones (i.e., each file being a separate transaction joined by a one-to-many relationship to its application/interview) would help. Splitting files into their own tables would also aid performance at the database level, since only some situations (editing or viewing a single item) require the files to be present in the frontend - meaning only a fraction of files are actually retrieved at a given time, as opposed to all of them whenever a full list of applications/interviews is fetched.

Where I truly went wrong, however, was in how I represented the file’s data. Instead of base64, I encoded the file data as a binary string, which achieved the opposite of my goal and produced even larger sizes; I couldn’t upload even a single file on its own. More research (and reviewing a Postgres wiki page I had skimmed when I first decided on adding files, many months and other responsibilities ago) led me to where I went wrong in the first place: preferring strings over blobs. Blob storage uses all 8 bits of every byte, while each base64 character carries only 6 bits of actual data, so base64 needs roughly a third more bytes to represent the same file - and for any complex file, even a small PDF, that overhead adds up. As for why the binary string resulted in even more overhead: I had misunderstood what a binary string is. It still represents every byte as a text character, so when dealing with something like a PDF, where the data isn’t entirely text-based, things get messy: the conversions work, but once the non-text bytes are escaped for transport, the resulting size is massive.
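The two overheads are easy to demonstrate in Node. This standalone sketch (not the app’s code) encodes the same non-text bytes both ways and measures what each costs once it’s inside a JSON payload:

```typescript
// 300 bytes spanning the full 0-255 range, mimicking non-text PDF data.
const bytes = Buffer.from(Array.from({ length: 300 }, (_, i) => i % 256));

// base64: every 3 bytes become 4 ASCII characters (~33% overhead).
const asBase64 = bytes.toString("base64");

// "binary" (latin1) string: one character per byte, but many of those
// characters must be escaped or multi-byte encoded when sent as JSON.
const asBinary = bytes.toString("binary");

// Measure the actual bytes on the wire for each, JSON-encoded as UTF-8.
const base64Size = Buffer.byteLength(JSON.stringify(asBase64), "utf8");
const binarySize = Buffer.byteLength(JSON.stringify(asBinary), "utf8");
// binarySize comes out considerably larger than base64Size
```

Control characters become \u00XX escapes and high bytes become two UTF-8 bytes each, which is exactly why the binary-string payloads were even bigger than the base64 ones.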

I think a big part of why it took so long for me to realize the error in my second approach was my testing. In my integration tests (where the size constraints come into play), I wasn’t using files representative of what I’d actually be uploading; I was using a minimal base64 string of a PDF file converted to a binary string for my backend integration testing. It worked there, but when it came time to start on end-to-end testing with a representative file, the size became an issue yet again. Going forward I’ll still keep the minimal file data for my backend testing, simply for my sanity, but I’ve definitely learned my lesson on the importance of thorough research and representative testing.

As for fixing the issue, I had to find a compromise. I wasn’t entirely willing to abandon application/json in favour of FormData or another format (although migrating to FormData is something I want to do in the future), and since buffers can’t be sent in JSON, I still had to use base64; sending each file individually, although slower, does evade the “request too large” issue. From there, the base64 is converted to a buffer when inserted into the database, and returned as a buffer. I did have some issues with the returned buffer that I tried brute-forcing through, only to find that I was corrupting my own data. The issue was that Sequelize does not return a plain buffer, but rather an object containing the buffer along with another attribute saying it’s a buffer (crazy, I know). The workaround was actually rather easy; I just had to define a custom method in my file model to return only the buffer. The buffer is then converted into base64 before being sent to the frontend. Finally, my file uploading works - without corrupted data.
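The full conversion path - base64 in, buffer stored, buffer unwrapped, base64 out - can be sketched like this. The wrapped object shape here is a stand-in for whatever Sequelize actually returns, and all names are illustrative:

```typescript
// Client -> server: decode the JSON-safe base64 into raw bytes.
const toBuffer = (b64: string) => Buffer.from(b64, "base64");

// Server -> client: re-encode raw bytes for the JSON response.
const toBase64 = (buf: Buffer) => buf.toString("base64");

// Hypothetical unwrap, mirroring the custom model method described
// above; the `{ buffer }` wrapper is a stand-in for the ORM's shape.
function unwrap(value: Buffer | { buffer: Buffer }): Buffer {
  return Buffer.isBuffer(value) ? value : value.buffer;
}

// Round trip: a tiny PDF header ("%PDF-1.4") survives unchanged.
const original = "JVBERi0xLjQ=";
const roundTripped = toBase64(unwrap(toBuffer(original)));
// roundTripped === original
```

The corruption earlier came from treating the wrapper object as the buffer itself; once the unwrap happens in one place, the round trip is lossless.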

That wasn’t the end of my problems, however. When writing the tests for editing an application or interview, the previously uploaded files weren’t available in the form, causing them to be deleted. It took a lot of trial-and-error - checking the files prop being passed and comparing it to the state - to figure out where on earth they were being lost. It turns out that since a page load is triggered when starting a test, the data fetching is also triggered; the application/interview itself loads in time, but the files do not. The solution was pretty simple: only render the pages once the data is loaded. I set up a simple loading state in useFetch, with the router (and all the pages) only rendering once that state indicates the fetch has finished. I grabbed a basic CSS loader from css-loaders.com/ so it at least looks nice, and I was on my way.

CI/CD and Deployment

I was hoping updating my pipeline wouldn’t be too difficult. I first tackled end-to-end testing, as I (erroneously, I later found) thought the pre-existing jobs would be fine as-is. My initial approach to the job was as follows: start up Postgres and Redis images as in backend testing, making sure to run health checks; start the backend in test mode; start the frontend in dev mode; run the end-to-end tests. I also threw in tag management at that time. None of the predeployment jobs succeeded, thanks to some linting errors my ESLint plugin had failed to catch (this is why we include linting in predeployment!). I fixed the errors and reran the pipeline, only to find my tests were still failing! The frontend was throwing a linting error I had not encountered when running the lint command in my own development environment, while the backend was failing to even start. Some research told me the frontend issue was caused by an out-of-date environment, so I updated the versions of all the third-party actions in my pipeline. The backend issue was a quite silly oversight: when I upgraded from Sequelize v6 to v7, I changed the initialization from using the database URL to using its destructured components (the ones I needed, at least), yet I didn’t update my pipeline to send those components as environment variables, so it was still sending the full URL. Once I fixed that (and updated a test in the frontend to match updated seed data), both the frontend and backend predeployment jobs succeeded. Unfortunately, the end-to-end testing did not.

What I noticed while waiting on that final pipeline run, where the predeployments succeeded, was that the end-to-end testing was taking incredibly long; I knew it wouldn’t be fast considering the size of my test suite, but when I looked at the logs of the job, I realized it was stuck on starting the backend. Puzzled, I looked into it and realized where I went wrong: a given job in the pipeline runs in a single shell instance, meaning only one foreground process can run at a time. Since I had set neither my backend nor my frontend to run in the background, the backend was running indefinitely, doing nothing. I also learned that Cypress provides an action to facilitate end-to-end testing, so I included that in addition to adding the background flag to my backend and frontend start commands. Finally, I managed to start the Cypress test suite, yet it failed just as it began. The issue? It couldn’t find my web app. Innocently, I thought the frontend simply had no time to start, so I added a wait-on command to the Cypress action, to no avail. Apparently, newer versions of Node, including the one I was using, have difficulties resolving localhost URLs. One workaround was to use the 0.0.0.0 IP address in place of localhost. I initially tried to force this IP using the --host flag when starting my frontend and swapping it for localhost in my wait-on command, but that still didn’t work. Poking around the Vite documentation, I learned of the host argument within the config. I added the IP address there, removed the wait-on command, and finally the end-to-end testing worked! All that was left was to deploy - and oh boy, was that a doozy.

I initially attempted deployment as I had it written before, but that didn’t work; I assumed it was due to the same oversight I faced when working on predeployment, so my first step was to parse the database URL I had in Fly.io (which I had long since forgotten) into its components. That too did not work. I ended up recreating my production database, taking care to write down all the details on paper before saving them as secrets, yet I was still facing issues. I was latching on to any support forum I could find with a remotely similar problem and attempting their fixes: declaring the host name, ensuring I could access the database via the CLI, and explicitly disallowing SSL. Nothing worked. Finally, I came across a forum post that matched my issue and had attempted everything I did. The answer? The Node environment was undefined, so Sequelize was always using localhost as the host. I had defined the Node environment as production in the Dockerfile used for deployment, yet printing it to the console did indeed show undefined. I declared a production Node environment within my fly.toml file, and my app finally deployed!
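Parsing a connection string into the components the new initialization expects is something the standard URL parser handles directly. This is a sketch under assumed names - the example URL, credentials, and field names are all made up:

```typescript
// Splits a Postgres connection string into discrete connection options
// using the WHATWG URL parser (built into Node). Field names here are
// illustrative, not the exact options Sequelize expects.
function parseDatabaseUrl(raw: string) {
  const url = new URL(raw);
  return {
    host: url.hostname,
    port: Number(url.port || 5432),
    database: url.pathname.slice(1), // strip the leading "/"
    user: decodeURIComponent(url.username),
    password: decodeURIComponent(url.password),
  };
}

// Hypothetical example URL:
const parts = parseDatabaseUrl("postgres://jobtracker:s3cret@db.internal:5432/tracker");
```

Decoding the username and password matters because credentials in connection strings are percent-encoded; forgetting that is an easy way to get mysterious authentication failures.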

The final issue in the deployment of my program lay in the migrations. Although they worked in development, for reasons I didn’t look into too deeply they were ignored in production. Reading the Umzug documentation and conducting my own experiments with a local production build revealed that CommonJS is required for migration files. I converted all of my migration files from ESM to CommonJS, deployed, and relished a finally complete and deployed web application.

As a final change to my pipeline, it now requires a tag in the commit message to deploy the current version. Previously, deployments were automatic unless a tag was added to the commit message to skip them. This change allows me to push development-dependency updates without deploying the app, so deployments without any production-level changes no longer happen.

Final Notes

Although the completed application bears a resemblance to what I built years ago, the changes serve as a reminder of how much I have learned and grown. Undoubtedly, another two or three years from now I will look back at this and see how far I have come even still. I recognize this application may not be perfect, and I am eager to learn more to make it even better. For now, there are still things left to be done, such as migrating to React 19 and Express 5, implementing memoization, and reducing the frequency of server calls, among other things. I will without a doubt write about my experiences handling such tasks, but for now I shall relish this project being in a working, production-ready state (even if it is just for me).