This project classifies stars using data from the Gaia DR3 catalog, filtered to include stellar properties and the apparent magnitudes in the photometric bands. The goal is to assign each star to its spectral type (O, B, A, F, G, K, M) based on these features.
We aim to develop a machine learning model capable of accurately classifying stars using their physical characteristics. The dataset includes stars of all spectral types with various features that will be leveraged to train a model and evaluate its performance on unseen data.
For this task, we use an Artificial Neural Network (ANN) to handle the classification of stars into spectral types. The ANN is designed to:
Below is a basic idea of our classification model. Further details on the architecture of the ANN are given in the Architecture section of the report. We chose an ANN because all of our input features are numerical, and ANNs are well suited to finding patterns in such data.
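As a rough sketch of the kind of network described here (not the architecture from the report), the forward pass of a small feed-forward classifier can be written in plain Python. The number of input features, the single hidden layer of 16 units, the random untrained weights, and the sample feature vector are all assumptions made for illustration only.

```python
import math
import random

SPECTRAL_CLASSES = ["O", "B", "A", "F", "G", "K", "M"]

def init_layer(n_in, n_out, rng):
    """Return (weights, biases) for a dense layer with small random weights."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def dense(x, weights, biases):
    """Affine map: one output value per row of the weight matrix."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]

def relu(values):
    return [max(0.0, v) for v in values]

def softmax(values):
    """Numerically stable softmax: subtract the max before exponentiating."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def forward(x, layers):
    """Apply ReLU hidden layers, then a softmax output layer."""
    *hidden, output = layers
    for weights, biases in hidden:
        x = relu(dense(x, weights, biases))
    weights, biases = output
    return softmax(dense(x, weights, biases))

rng = random.Random(0)       # fixed seed so the sketch is reproducible
N_FEATURES = 5               # hypothetical number of numerical input features
HIDDEN_UNITS = 16            # hypothetical hidden-layer width
layers = [
    init_layer(N_FEATURES, HIDDEN_UNITS, rng),
    init_layer(HIDDEN_UNITS, len(SPECTRAL_CLASSES), rng),
]

features = [0.2, -1.1, 0.5, 0.0, 1.3]   # made-up standardized feature values
probs = forward(features, layers)        # one probability per spectral class
predicted = SPECTRAL_CLASSES[probs.index(max(probs))]
```

Because the weights are untrained, the predicted class is meaningless here; the point is only the shape of the computation: numerical features in, seven class probabilities out.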

We used a Support Vector Machine (SVM) as the baseline model. Below is its confusion matrix. It achieved an accuracy of around 70%.
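For readers unfamiliar with how a confusion matrix relates to accuracy, here is a minimal sketch in plain Python. The label lists are invented toy data with no connection to the project's dataset or to the SVM results, and the helper names are hypothetical.

```python
SPECTRAL_CLASSES = ["O", "B", "A", "F", "G", "K", "M"]

def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for true_label, pred_label in zip(y_true, y_pred):
        matrix[index[true_label]][index[pred_label]] += 1
    return matrix

def accuracy(matrix):
    """Fraction of samples on the diagonal (correct predictions)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0

# Invented toy labels, for illustration only.
y_true = ["G", "G", "K", "M", "F", "A", "B", "O", "K", "M"]
y_pred = ["G", "F", "K", "M", "F", "A", "B", "O", "K", "K"]

cm = confusion_matrix(y_true, y_pred, SPECTRAL_CLASSES)
acc = accuracy(cm)
```

Off-diagonal entries show which classes get mistaken for which. In this toy data, one G star is predicted as F and one M star as K, both confusions between adjacent spectral types.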

Our primary model reached an accuracy of approximately 84%, outperforming the baseline SVM model by about 14 percentage points. The model's performance metrics at the 200th epoch: Train error = 0.1644, Train loss = 0.4544, Validation error = 0.1613, Validation loss = 0.4500. Below are the error and loss curves for training and validation.
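The relationship between the reported error values and the accuracy figures can be checked with a few lines of arithmetic. This sketch assumes that "error" means the fraction of misclassified samples (so accuracy = 1 - error) and takes the baseline accuracy as exactly 0.70, although the text only says "around 70%".

```python
def accuracy_from_error(error_rate):
    """Assumes error_rate is the fraction of misclassified samples."""
    return 1.0 - error_rate

train_error = 0.1644        # reported at epoch 200
val_error = 0.1613          # reported at epoch 200
baseline_accuracy = 0.70    # assumption: "around 70%" taken as 0.70

train_accuracy = accuracy_from_error(train_error)
val_accuracy = accuracy_from_error(val_error)

# Difference in percentage points, not relative improvement.
gap_points = (val_accuracy - baseline_accuracy) * 100

print(f"train accuracy:      {train_accuracy:.4f}")
print(f"validation accuracy: {val_accuracy:.4f}")
print(f"gap over baseline:   {gap_points:.1f} points")
```

Under these assumptions the validation accuracy is about 0.839, which is consistent with the roughly 84% quoted and with a gap of about 14 percentage points over the baseline.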

We used the spectral class predicted for each star in the test data to assign it a temperature, and from these built our own Hertzsprung-Russell (HR) diagram, which is a useful way to showcase and sanity-check the model's predictions.
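One way to turn a predicted class into a temperature is a lookup table of approximate effective-temperature ranges for the Harvard spectral classes. The sketch below is an assumption about how this could be done, not a description of the method used in the project: the midpoint rule, the loose 50,000 K cap for O stars, the function names, and the sample luminosities are all illustrative.

```python
# Approximate effective-temperature ranges in kelvin for each spectral class.
# O stars have no sharp upper limit; 50000 K is a loose cap chosen here.
TEMP_RANGE_K = {
    "O": (30000, 50000),
    "B": (10000, 30000),
    "A": (7500, 10000),
    "F": (6000, 7500),
    "G": (5200, 6000),
    "K": (3700, 5200),
    "M": (2400, 3700),
}

def representative_temperature(spectral_class):
    """Midpoint of the class range; a crude stand-in for a real estimate."""
    low, high = TEMP_RANGE_K[spectral_class]
    return (low + high) / 2

def hr_points(predicted_classes, luminosities):
    """Pair each star's class-derived temperature with its luminosity."""
    return [(representative_temperature(c), lum)
            for c, lum in zip(predicted_classes, luminosities)]

# Invented example values; luminosities are in solar units.
predicted = ["G", "K", "M", "A"]
luminosities = [1.0, 0.4, 0.05, 20.0]

points = hr_points(predicted, luminosities)
# HR diagrams conventionally run from hot (left) to cool (right).
points.sort(key=lambda p: p[0], reverse=True)
```

A lookup like this gives every star of a class the same temperature, so the resulting diagram shows vertical bands rather than a smooth sequence. Spreading points within each class range would need additional information such as a color index.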

We can clearly see the similarities between our HR diagram and an official one in the image below. The band running down the middle, highlighted in blue, resembles the main sequence, where most stars lie. The grouping of high-luminosity, low-temperature stars highlighted in red corresponds to the subgiants, and the grouping with even higher luminosity and lower temperature, highlighted in yellow, corresponds to the giants. The classes predicted by our model therefore reproduce the main features of an actual HR diagram, which supports the accuracy of the model.
