Clustering heterogeneous groups based on similarity of heterogeneity

Disclaimer: I am not a stats major, and I would love it if people shredded my question to bits if it contains any obvious logical flaws. I am not a native English speaker, but I try my best to be concise.

I am writing here to learn the proper statistical terminology, so that I can clearly describe the procedures I want to carry out when I talk to potential statistician collaborators. I do not want to sound dumb.

Using a large biodiversity database from India, I am trying to identify different types of ecosystems by doing the following (in order):

1. clustering species by the geographical position (latitude/longitude) at which they were spotted.

2. for each cluster, collecting varied metadata for each species ('predator', 'pollinator', 'camouflage', 'pink', etc.) for which such data exists; hence the term 'heterogeneous clusters'. I will call these characteristics the 'attributes' of a cluster, and the collection of these attributes the 'content' of the cluster. Each attribute also has a 'frequency' in each cluster (if there are 3 predators in a cluster, the frequency of the attribute 'predator' is 3). I know I may be deviating from the statistical definition of the word 'frequency', but I cannot think of a better word right now.

3. somehow generating statistical models of different types of ecosystems by grouping the above clusters based on the similarity of their content. Similarity should be measured not only by the number of attributes in each cluster, but also by the attribute frequencies within each cluster.
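A minimal sketch of step 2, assuming step 1 has already given each sighting a cluster label (for example from a spatial clustering method such as DBSCAN run on latitude/longitude). The species, attributes and cluster labels are made up for illustration, and counting each species once per cluster is an assumption. It turns sightings into a cluster-by-attribute count table, i.e. the 'content' described above:

```python
from collections import Counter, defaultdict

# Hypothetical output of step 1: one (cluster_id, species) pair per sighting.
# In practice the cluster IDs would come from clustering latitude/longitude.
sightings = [
    ("A", "tiger"), ("A", "honeybee"), ("A", "tiger"),
    ("B", "honeybee"), ("B", "flamingo"), ("B", "leaf insect"),
]

# Hypothetical species metadata; species without metadata are simply missing.
species_attributes = {
    "tiger": ["predator", "camouflage"],
    "honeybee": ["pollinator"],
    "flamingo": ["pink"],
    "leaf insect": ["camouflage"],
}


def attribute_counts(sightings, species_attributes):
    """Return {cluster_id: Counter(attribute -> frequency)}.

    Assumption: each species is counted once per cluster, so repeated
    sightings of the same species do not inflate the frequencies.
    Clusters whose species have no metadata at all will not appear.
    """
    content = defaultdict(Counter)
    for cluster_id, species in set(sightings):  # drop repeated sightings
        for attribute in species_attributes.get(species, []):
            content[cluster_id][attribute] += 1
    return dict(content)


table = attribute_counts(sightings, species_attributes)
```

With this made-up data, `table["A"]` holds one predator, one camouflage and one pollinator, because the second tiger sighting is ignored. Counting sightings instead of distinct species is an equally reasonable choice if abundance matters.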

For part 3, is there statistical terminology that better communicates what I am trying to do? Are there any techniques that would help me get through the initial stages of this analysis? Would machine learning help?
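One possible starting point for part 3, sketched under the assumption that each spatial cluster has already been summarized as attribute counts: compare clusters with the Bray-Curtis dissimilarity, which is commonly used in community ecology for count data and reflects both which attributes are present and how often, then group them with average-linkage (UPGMA) hierarchical clustering. The site names, counts and the choice of two groups are hypothetical, and this is only one of several reasonable techniques.

```python
from itertools import combinations


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two attribute-count profiles.

    0 means identical counts; 1 means the profiles share no attributes.
    """
    keys = set(u) | set(v)
    total = sum(u.get(k, 0) + v.get(k, 0) for k in keys)
    if total == 0:
        return 0.0  # two empty profiles: treat them as identical
    diff = sum(abs(u.get(k, 0) - v.get(k, 0)) for k in keys)
    return diff / total


def average_linkage(profiles, n_groups):
    """Naive average-linkage agglomerative clustering down to n_groups groups."""
    if n_groups < 1:
        raise ValueError("n_groups must be at least 1")
    names = list(profiles)
    dist = {frozenset((a, b)): bray_curtis(profiles[a], profiles[b])
            for a, b in combinations(names, 2)}
    groups = [[name] for name in names]

    def group_distance(g1, g2):
        # Mean of all pairwise dissimilarities between members of two groups.
        pairs = [dist[frozenset((a, b))] for a in g1 for b in g2]
        return sum(pairs) / len(pairs)

    while len(groups) > n_groups:
        # Merge the two closest groups.
        i, j = min(combinations(range(len(groups)), 2),
                   key=lambda ij: group_distance(groups[ij[0]], groups[ij[1]]))
        merged = groups[i] + groups[j]
        groups = [g for k, g in enumerate(groups) if k not in (i, j)] + [merged]
    return groups


# Hypothetical attribute counts for four spatial clusters.
profiles = {
    "site1": {"predator": 3, "pollinator": 10},
    "site2": {"predator": 2, "pollinator": 9},
    "site3": {"pink": 5, "camouflage": 4},
    "site4": {"pink": 6, "camouflage": 3, "predator": 1},
}
groups = average_linkage(profiles, n_groups=2)
```

Here site1 and site2 end up together, as do site3 and site4. For real data, SciPy offers the same building blocks: `scipy.spatial.distance.pdist(X, metric="braycurtis")` on a cluster-by-attribute matrix, followed by `scipy.cluster.hierarchy.linkage(..., method="average")` and `fcluster` to cut the tree.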

Thank you for your help.