Geographical patterns in the flora of Cambridgeshire (v.c.29)

Cambridgeshire data collected for the BSBI’s Atlas 2020 project include 347,496 records at monad (1 km) or finer resolution. We used these data to cluster taxa by spherical k-means to produce 21 clusters of taxa with similar patterns of distribution. Some of the clusters correspond to well-defined habitats such as chalk grassland, ancient woodland, traditional fenland, and saline riversides and roadsides. Other clusters were less expected, corresponding to arable clayland, washland (the Ouse and Nene washes), waste ground and garden escapes. There was a cluster of ubiquitous species and another of common arable weeds. The distributions of the clusters are displayed as coincidence maps. Some species are intermediate between two clusters. These can be recognised by their relatively poor goodness of fit to any one cluster. The clusters differ markedly in ecological attributes and whether they include rare or threatened species. We interpret these differences using Ellenberg values and the vascular plant Red List for England.


Introduction
For several years, two of us (MOH, CDP) have been interested in methods of clustering species distribution data. This has resulted in analyses at the scale of the European continent (Finnie et al., 2007), Britain and Ireland (Preston et al., 2013), our local county (Preston & Hill, 2019) and meadows in a 25 km² area of Germany. On 31 December 2019, data collection for BSBI's Atlas 2020 project was completed, and the Cambridgeshire vice-county recorder (JDS) thought it would be interesting to look at patterns of distribution in the county. Such an analysis would complement the information in Alan Leslie's monumental Cambridgeshire flora (Leslie, 2019), where distributions are given only as lists of hectads.

Methods
Cambridgeshire's local Atlas 2020 data consist of 396,261 records of taxon occurrence for the period 2000-2019, held in a MapMate database. There are 48,408 records at tetrad (2 km) resolution and 357 records at quadrant (5 km) resolution.
The analysis was done on the 347,496 records at monad (1 km) or finer resolution. Records at finer resolution were reduced to monad scale for analysis.
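The reduction of finer records to monad scale amounts to truncating each grid reference. A minimal sketch, assuming records carry standard Ordnance Survey references of two letters followed by an even number of digits; the function name `to_monad` is hypothetical and is not taken from the MapMate workflow:

```python
import re

# Two grid letters followed by digits (half eastings, half northings).
_GRID_RE = re.compile(r"^([A-Z]{2})(\d+)$")


def to_monad(grid_ref: str) -> str:
    """Truncate a grid reference of 1 km or finer resolution to its monad."""
    compact = grid_ref.replace(" ", "").upper()
    match = _GRID_RE.match(compact)
    if match is None:
        raise ValueError(f"not a numeric grid reference: {grid_ref!r}")
    letters, digits = match.groups()
    if len(digits) % 2 != 0 or len(digits) < 4:
        raise ValueError(f"coarser than a monad or malformed: {grid_ref!r}")
    half = len(digits) // 2
    easting, northing = digits[:half], digits[half:]
    # Keep the 10 km and 1 km digits of each coordinate, drop the rest.
    return f"{letters}{easting[:2]}{northing[:2]}"
```

Tetrad references such as TL45J and hectad references such as TL45 are rejected rather than guessed at, which mirrors the exclusion of coarser records from the clustering.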
As in our previous studies, we looked for patterns of distribution by dividing the species into clusters. Clustering was by spherical k-means, which is computationally demanding but logically simpler than the methods used in our earlier studies (Finnie et al., 2007; Preston et al., 2011). The only information required is a list of species and the locations where they are found, i.e. 'what?' and 'where?'. It is possible in principle to use dated records, i.e. 'what?', 'where?' and 'when?', and the bryophyte clustering by Preston & Hill (2019) did just this. However, when dated records were used for vascular plants in Cambridgeshire, the resulting clusters were partly phenological and partly geographical. We therefore used purely geographical data.
We ran trial analyses with several different parameter settings. Most of these had small but irritating defects. In one analysis there was a cluster that combined hedge and wayside species as well as a cluster with only hedge species. In another analysis there was a cluster that combined species of chalk waysides with those of waste ground. Most clusters were essentially the same as those in our chosen analysis, but in each trial one or two clusters were unsatisfactory. These trials are not reported here. Our final choice of parameters was very similar to that used for British and Irish vascular plants by Preston et al. (2013), i.e. perpendicular clustering in 21 clusters with species weights 0.7 (Preston et al., 2013, had used 20 clusters with species weights 0.5).
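For readers unfamiliar with the technique, the core idea of spherical k-means can be sketched in a few lines. This is plain spherical k-means on a taxon-by-monad presence matrix, written with NumPy. It is an illustration only: it does not implement the perpendicular clustering or the species weights used in the analysis, and the function name and arguments are hypothetical.

```python
import numpy as np


def spherical_kmeans(presence, n_clusters, n_iter=50, seed=0):
    """Cluster rows of a taxon-by-monad 0/1 matrix by cosine similarity."""
    presence = np.asarray(presence, dtype=float)
    rng = np.random.default_rng(seed)
    # Scale each taxon to unit length so that only the shape of its
    # distribution matters, not the number of monads it occupies.
    norms = np.linalg.norm(presence, axis=1, keepdims=True)
    unit = presence / np.where(norms == 0, 1.0, norms)
    # Use randomly chosen taxa as the initial cluster directions.
    start = rng.choice(len(unit), size=n_clusters, replace=False)
    centroids = unit[start].copy()
    labels = np.full(len(unit), -1)
    for _ in range(n_iter):
        # Assign each taxon to the direction with the largest cosine.
        new_labels = (unit @ centroids.T).argmax(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        for k in range(n_clusters):
            total = unit[labels == k].sum(axis=0)
            length = np.linalg.norm(total)
            if length > 0:  # an empty cluster keeps its old direction
                centroids[k] = total / length
    # Cosine of each taxon with its own cluster direction (goodness of fit).
    fit = (unit @ centroids.T)[np.arange(len(unit)), labels]
    return labels, fit
```

The second return value is the cosine alignment of each taxon with its cluster, the quantity referred to later as goodness of fit.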
We had originally intended to make 20 clusters. However, it turned out that with 20 clusters, there was a knife-edge balance between splitting a cluster comprising species of waste ground and disturbed sandy ground, or splitting a cluster comprising species of pond-margins and those of washlands. We therefore chose to use 21 clusters, as we thought both of these splits were desirable.
The Cambridgeshire Atlas 2020 dataset includes records of 2405 taxa of vascular plants and charophytes. These were boiled down to 1245 taxa, by excluding all but 28 hybrids, ignoring most infraspecific taxa, and assigning aggregates such as Chenopodium album agg. to the species of that name. Two frequently-cultivated subspecies, Euphorbia amygdaloides subsp. robbiae and Lamiastrum galeobdolon subsp. argentatum, were distinguished from wild-type E. amygdaloides and L. galeobdolon. Cultivars of Daucus carota and Pastinaca sativa were excluded. Taxa found in fewer than 15 monads were excluded unless they were also listed in the plant attribute dataset PLANTATT (Hill et al., 2004). PLANTATT lists all native and archaeophyte species in Great Britain and Ireland, together with 261 alien neophytes. Nomenclature follows Stace (2019).
The monad dataset was further trimmed by excluding all monads with fewer than 20 species. Of the 1245 taxa analysed, 316 were found in fewer than 15 monads. These were designated as 'minor taxa' and given weight 1 in the analysis, whereas those in 15 or more monads were given weight 1000. As a result, minor taxa had no influence on the clustering, although they were given a place along with the 929 major taxa. The final dataset for clustering consisted of 209,650 records of 1245 taxa in 1865 monads. There were on average 112 taxa per monad.
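The trimming and weighting rules can be sketched as follows. This is a minimal illustration, assuming that records are (taxon, monad) pairs and that monad counts are taken after the species-poor monads have been removed; the text does not state the order of these two steps, and the function name is hypothetical. How the weights then enter the clustering is not shown.

```python
from collections import Counter, defaultdict


def trim_and_weight(records, min_taxa_per_monad=20, min_monads_major=15,
                    major_weight=1000.0, minor_weight=1.0):
    """Drop species-poor monads, then weight taxa as major or minor."""
    pairs = set(records)  # collapse duplicate records of a taxon in a monad
    taxa_in_monad = defaultdict(set)
    for taxon, monad in pairs:
        taxa_in_monad[monad].add(taxon)
    # Keep only records from monads with enough recorded taxa.
    kept = {(t, m) for t, m in pairs
            if len(taxa_in_monad[m]) >= min_taxa_per_monad}
    # Taxa in fewer than min_monads_major monads are 'minor taxa'.
    monad_count = Counter(t for t, _ in kept)
    weights = {t: major_weight if n >= min_monads_major else minor_weight
               for t, n in monad_count.items()}
    return kept, weights
```

With weights of 1000 and 1, a minor taxon contributes a thousandth of the weight of a major taxon, which is consistent with the statement that minor taxa had no influence on the clustering.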
Finally, we plotted distribution maps of the clusters, using tetrad occurrence records, which are comprehensive for the whole of the county. With the 1245 taxa used for the cluster analysis, there are 156,277 such records in 667 tetrads, corresponding to an average of 234 taxa per tetrad. With the 929 major taxa, there are 154,995 records, giving an average of 232 major taxa per tetrad. Only the major taxa are counted in the distribution maps.
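The coincidence maps count, for each tetrad, how many major taxa of a cluster have been recorded there. A minimal sketch, assuming monad-level input and the standard tetrad lettering (A to Z omitting O, running south to north within each 2 km column and then west to east); the function names are hypothetical, and records already held at tetrad resolution would need to be merged in separately:

```python
from collections import defaultdict

# Tetrad letters within a hectad: 25 letters, O omitted.
TETRAD_LETTERS = "ABCDEFGHIJKLMNPQRSTUVWXYZ"


def monad_to_tetrad(monad: str) -> str:
    """Convert a six-character monad such as 'TL4259' to its tetrad."""
    letters = monad[:2]
    e10, e1, n10, n1 = monad[2], monad[3], monad[4], monad[5]
    # Column (0-4) from the 1 km easting digit, row (0-4) from the northing.
    index = (int(e1) // 2) * 5 + int(n1) // 2
    return f"{letters}{e10}{n10}{TETRAD_LETTERS[index]}"


def coincidence_counts(records, cluster_of):
    """Count distinct taxa of each cluster recorded in each tetrad."""
    taxa = defaultdict(set)
    for taxon, monad in records:
        if taxon in cluster_of:  # taxa without a cluster are ignored
            taxa[(cluster_of[taxon], monad_to_tetrad(monad))].add(taxon)
    return {key: len(found) for key, found in taxa.items()}
```

Passing a `cluster_of` mapping that contains only the major taxa gives counts of the kind plotted in the distribution maps.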

Results
Cambridgeshire is a low-lying county in eastern England. The fens in the north occupy about half the land area (Fig. 1a) and Cambridgeshire's highest point, 128 m, is in hills to the south-east. Much more detail is given by Leslie (2019) and Preston & Hill (2019). The highest taxon densities are in Cambridge and Ely (Fig. 1b). The richest tetrad, with 767 taxa, is TL45J, which includes the British Antarctic Survey, workplace of JDS. The centre of Peterborough is in tetrad TL19Z, only 2% of which is in v.c.29, with 102 recorded taxa. (Table 5)
As in our previous analyses, the clusters are named by the species to which they are most closely aligned. These are called 'key species' (Table 1). The order of the clusters is defined by a hierarchy combined with an ordination; neither is shown here. Species in clusters 1-7 are plants of dry habitats, those in clusters 8-13 are ruderal and viatical, those in clusters 14-15 are widespread, and those in clusters 16-21 are plants of wet habitats. For each cluster, the three most closely aligned species are listed in Table 2.

Table 1. Twenty-one clusters of species in Cambridgeshire, defined by their occurrence in monads, but reported by their occurrence in tetrads. The column labelled Tetrad sum gives the sum of tetrad occurrences for the major taxa in each cluster over the whole county. Full names of the key species are listed in Table 2.

Upland grass and arable
In the Cambridgeshire context, 'upland' means non-fenland, i.e. > 5 m altitude. The distribution of clusters 1-4, the four non-woodland clusters, is shown in Fig. 2. Cluster 1, Chalk grassland, is found at scattered sites along the diagonal band of chalk bedrock across the county, especially east of Cambridge. Here it is the remnant of an extensive sheepwalk that was ploughed up in the 19th century. The best site is the Devil's Dyke near Newmarket, which is a steep-sided Anglo-Saxon earthwork quite unsuitable for the plough.
With an average of only 41 tetrads per major taxon, chalk grassland is one of the most localized habitats in the county and has notable minor taxa such as Himantoglossum hircinum, Linum perenne, Pulsatilla vulgaris and Seseli libanotis.
The species of Cluster 2, Chalk wayside, are much more widespread, averaging 106 tetrads per major taxon, and are distributed widely over the Cambridgeshire chalk, both to the south-west and north-east of Cambridge. Most, including the three most characteristic species, are scattered in other parts of the county, with a hotspot on old railway sidings in the north of the county near March. The main habitat is rough grassland and grassy banks, but there are several arable weeds such as Fumaria densiflora, F. parviflora, Legousia hybrida and Roemeria hispida. The minor taxa of this cluster are less numerous and less notable, although they include Orchis anthropophora, present in a disused chalk quarry.
Cluster 3, Permanent grassland, is even more widespread, with 176 tetrads per major taxon and only four minor taxa. Almost all the species are herbaceous, the exceptions being Rhamnus cathartica, Rosa × dumalis, R. rubiginosa and Rubus caesius. Of the 49 herbaceous species, 40 are perennial. The others are the legumes Lathyrus nissolia, Melilotus officinalis and Trifolium campestre, the parasitic Odontites vernus and Rhinanthus minor, the Gentianaceae Blackstonia perfoliata, Centaurium erythraea and C. pulchellum, and the umbellifer Chaerophyllum temulum. Some of these are more frequent on disturbed ground than permanent grassland, but they are all rare in the fens.
With only 17 major taxa, Cluster 4, Clayland arable, is the second smallest. Its occurrence is mainly on the clay soils to the west of Cambridge, with small outposts on the boulder-clay uplands to the south and east. Although 12 of the 17 major taxa are characteristic of arable fields, Cirsium eriophorum, Cruciata laevipes, Crepis biennis and Ervum gracile are found mainly on disturbed ground and field margins. Crataegus heterophylla is in Cluster 4, but almost all occurrences are planted. Rare plants in this cluster are Lathyrus aphaca and Lythrum hyssopifolia.

Woodland and garden escapes
The distributions of clusters 6-9 are shown in Fig. 3. Cluster 5 has similarities to cluster 6 but is considered later with the other clusters of particularly widespread species. The species of Cluster 6, Spinney & shaded hedge, are common, with 218 tetrads per major taxon and only four minor taxa. Most of them are characteristic of hedge bottoms and spinneys. Several of the most frequent are woody plants, for example Corylus avellana, Ligustrum vulgare and Rubus ulmifolius, which all occur in more than 400 tetrads. Herbaceous plants found in more than 400 tetrads are Alliaria petiolata, Geranium robertianum, Geum urbanum and Veronica chamaedrys. Two of the minor taxa in this cluster are Genista tinctoria and Inula helenium, the latter introduced to places in the countryside.
Cluster 7, mainly plants of ancient woods, is one of the most distinctive and localized, with an average of 51 tetrads per major taxon. There are 43 major taxa and 24 minor taxa. Several of the minor taxa, notably Avenella flexuosa, Carex pallescens, Chrysosplenium oppositifolium and Oxalis acetosella, are rare in Cambridgeshire although frequent in other parts of Britain. A few characteristic plants of ancient woods have also been planted elsewhere in the county so that their distribution is assigned to cluster 6 (Primula vulgaris) or 8 (Galium odoratum, Sorbus torminalis).
Cluster 8, consisting mainly of garden escapes that are found in Cambridge but not extensively elsewhere in the county, has 117 major taxa, 46 more than the next largest, Cluster 11, Waste ground, with 71 major taxa. This is a consequence of a policy of recording all species that have seeded themselves outside gardens and an intense concentration on Cambridge city. Its key species, Crocus tommasinianus, flowers in early spring and may perhaps have been missed elsewhere in the county.
The second species, Mycelis muralis, is still relatively more frequent in Cambridge than elsewhere in the county, although according to Leslie (2019) it has recently been reported more widely. The third species, Lonicera pileata, has 16 of its 33 monad occurrences (48%) from Cambridge. By contrast the closely related Lonicera nitida has 30 out of 133 (22%) of its monad occurrences there. L. nitida is poorly aligned to cluster 8 though still included in it, with alignment to clusters 6 and 9 nearly as large. Records of both species include planted hedges and garden throwouts, as well as bird-sown bushes. Even less closely aligned though still included is Cardamine flexuosa, of which Leslie (2019) writes "It is tempting to suggest that C. flexuosa is a rare native of damp habitats that has been spread latterly through human activity". Other presumably native species in cluster 8 are Asplenium trichomanes, Polypodium vulgare agg., Poa infirma (newly arrived), Polystichum setiferum (newly arrived), Saxifraga tridactylites, Trifolium micranthum, Viscum album and the rare Catabrosa aquatica and Epipactis phyllanthes. These are a very small proportion of the 179 species in the cluster.
Cluster 9, consisting mainly of garden escapes with a wider distribution in the county, has 78 species. As with cluster 8, it includes a few species that are native somewhere in the county: Digitalis purpurea, Epilobium montanum and Salvia verbenaca, together with the ferns Asplenium adiantum-nigrum, A. ceterach (arrived 1967), A. ruta-muraria and A. scolopendrium. All of these plants are associated with human habitation in Cambridgeshire.

Disturbed and saline habitats
The distributions of clusters 10, 11 and 13 are shown in Fig. 4. Clusters 10 and 13 have nearly twice as many minor taxa as major taxa, a larger proportion than any of the others.
Cluster 10 consists mainly of species of open sandy ground. Except for Calluna vulgaris, Castanea sativa, Colutea arborescens, Cytisus scoparius and Ulex europaeus, all are herbaceous. Many of them are annuals or biennials. A few, such as Carex echinata, Juncus bulbosus and Stellaria alsine grow on wet ground near Gamlingay. These species are not recognizable as a distinct group because they are minor taxa, too rare to be included in the clustering process and grouped with the species of disturbed sandy ground because they occur on the Woburn Sands (Lower Greensand). Another small group in this cluster consists of plants introduced to land adjacent to building sites, for example Eriophorum angustifolium in south Cambridge and Scirpus sylvaticus and Sisyrinchium bermudiana in west Cambridge. Pteridium aquilinum is also in cluster 10, but it is not well aligned to the cluster as it occurs also in several of the ancient woods and in urban sites. Cluster 10 has three centres of distribution, Cambridge city, the Breckland fringe near Newmarket in the east, and Gamlingay in the west. The Newmarket and Gamlingay centres are there because of naturally sandy soils. The concentration of cluster 10 species in Cambridge is the result of sand and gravel being brought into the city for construction.
Cluster 11, Waste ground, has a rather wider distribution than cluster 10. The distribution includes northern sites resulting from brickpits near Peterborough, a disused railway marshalling yard at March and quarrying and construction in the city of Ely. The only woody plants are Alnus cordata, Hippophae rhamnoides, Populus trichocarpa, Rosa rugosa and Rubus tuberculatus. Almost all of the 27 species with alignment (cosine) greater than 0.45 are annuals or biennials, the exceptions being Asparagus officinalis, Sedum acre and Verbena officinalis.
Cluster 13, comprising species of saline habitats, is a small one with only 10 major taxa, but also 19 minor taxa. Stretches of road are clearly visible in the pattern of its distribution, including the A14 and A1307 near Cambridge, the A142 near Ely and the A605 near Peterborough. All but three of the 29 species are halophytes, the exceptions being Cynodon dactylon, Ranunculus sardous and Sambucus ebulus. The first two are certainly salt-tolerant, but the last is included because by chance its single location is close to the A1307 at Linton. Many of the minor taxa are very rare in Cambridgeshire, being confined to the naturally saline banks of the River Nene north of Wisbech.

Widespread species
The distributions of clusters 5, 12, 14 and 15 are shown in Fig. 5. These are the clusters with most of the very common species in the county, averaging 399, 280, 606 and 295 tetrads per major taxon.
Cluster 5, Verge, is broadly similar to Cluster 6 (Spinney & shaded hedge) but the species are more light-demanding and are much more strongly represented in the fens. It has no minor taxa. Most of the species are herbaceous annuals and perennials preferring relatively short turf, and occur in permanent grassland as well as on verges. 26 are herbaceous perennials, 17 are annuals or biennials, and 9 are woody plants. The most frequent species, found in more than 500 tetrads, are Cerastium fontanum, Festuca rubra, Holcus lanatus, Medicago lupulina, Rosa canina and Trifolium repens. The trees are Acer campestre, Malus domestica, Prunus domestica and Quercus robur, which are often planted on roadsides.
Cluster 12, Roadside, is perhaps the least distinctive cluster. Only Bellis perennis occurs in more than 500 tetrads. Its distribution is similar to that of cluster 5, but with a lower density in the north-western fens. This region is the most sparsely inhabited part of the county, and many Cluster 12 species are found in the proximity of houses and gardens. Examples include Aegopodium podagraria, Aesculus hippocastanum, Buddleja davidii, Erophila verna, Ilex aquifolium, Iris foetidissima, Pentaglottis sempervirens, Sagina procumbens, Taxus baccata and Viola odorata. Of the 58 species, 17 are annuals or biennials, 22 are herbaceous perennials, 7 are bushes and 10 are trees.
Cluster 14, Ubiquitous, includes 36 species that were found in more than 600 of the 667 tetrads. The 30 most frequent of these are listed in Table 3. Fourteen species were found in 500-599 tetrads, and two, Arctium lappa (389 tetrads) and Ballota nigra (493 tetrads), in more than 300 tetrads. The cluster includes one minor taxon, Carex vesicaria, whose two sites on the southern edge of the fens fit into no standard pattern. The major taxa comprise 20 annuals or biennials, 20 perennial dicots, 6 perennial grasses and 6 woody species. There are two trees, Acer pseudoplatanus and Fraxinus excelsior, and the climber Hedera helix.
Of the 59 species in the Arable cluster 15, 50 are either arable crops or arable weeds, all except Solanum tuberosum and Sonchus arvensis being annuals. Seven of them occur in more than 500 tetrads: Chenopodium album, Lepidium coronopus, Matricaria discoidea, Papaver rhoeas, Polygonum aviculare, Sonchus arvensis and Tripleurospermum inodorum. Hordeum murinum and Lactuca serriola also occur in more than 500 tetrads, but are less closely associated with arable fields, occurring on tracksides and disturbed ground. Other species normally found by roads, tracks and ditches are Armoracia rusticana and Symphytum × uplandicum. Neither of these is well aligned to cluster 15. Armoracia rusticana is almost equally aligned to the Ubiquitous cluster 14 and S. × uplandicum to the Waste ground cluster 11. Cluster 15 also includes four tree species that are widely planted as windbreaks in the fens: Cupressus × leylandii, Populus alba, P. × canadensis and P. nigra.

Species of moist and wet habitats
The six clusters of species found in moist or wet habitats are shown in Fig. 6 (clusters 16-19) and Fig. 7 (clusters 20 and 21). Cluster 16, Traditional fenland, includes species of fen and fen meadow, many of which are very rare in the county. It has 40 minor taxa and 37 major taxa. Most species in the cluster are herbaceous perennials that grow on unshaded wet ground.
Many, including the key species Carex panicea, are of low stature, although C. paniculata and Cladium mariscus are not. Only one major taxon, Frangula alnus, is a shrub and only Isolepis setacea is an annual. Perhaps surprisingly, both Cirsium palustre and Neottia ovata belong in this cluster; neither is well aligned to it, and both are nearly as well aligned to cluster 7, Ancient wood. Of the sites in tetrads with more than 10 major taxa (Fig. 6a), only one, a former gravel quarry with open marshy grassland, is in the north of the county. Wicken Fen occupies four tetrads and is truly a traditional fen. Chippenham Fen near Newmarket is an excellent wetland, but was not historically managed as a fen (Preston & Hill, 2019). Most other sites with more than 10 major taxa are variously termed fens, moors or meadows, and are remnants of marshes that have been partially drained or are managed as wet meadow. One site is an ancient wood with some wet meadow and another is a restored wetland which was formerly arable.
The species of cluster 17, Wet grassland (Fig. 6b), are much more widespread than those of cluster 16. There are 37 major taxa and only two minor taxa. The species are generally coarser than those of cluster 16 and a good many occur either in or by water. There is only one annual, Juncus bufonius. The woody plants are Populus × jackii, P. tremula, Ribes nigrum, Salix purpurea and S. × reichardtii. The rest are herbaceous perennials, of which Carex acutiformis, Glyceria × pedicellata, Helosciadium nodiflorum, Lemna minuta, Nymphaea alba and Veronica beccabunga are more or less strictly aquatic. There is much overlap with cluster 18. The main difference in distribution is that cluster 17 is more frequent in the south of the county, while cluster 18 is more frequent in the north.

Figure 6. Distribution of clusters 16-19, showing numbers of major taxa recorded in each tetrad: (a) Traditional fenland, (b) Wet grassland, (c) Reedbed & ditch bank, (d) Pond margin & streamside
Cluster 18, Reedbed & ditch bank, comprises most of the taxa that can be found in and by almost every ditch and drain in the fens, but are scarce or absent on higher ground in the south, especially on the chalk. There are no minor taxa. The cluster includes four annuals, Brassica nigra, Galeopsis bifida, Ranunculus sceleratus and Torilis arvensis, which have their distribution centred on the fens, together with the biennial Carduus nutans. Eight woody taxa with a mainly fenland distribution are also included: Alnus glutinosa, A. incana, Populus 'Balsam Spire', Salix alba, S. cinerea, S. viminalis, S. × fragilis and S. × smithiana.
Cluster 19, Pond margin & streamside, is more strongly aquatic, with a concentration of records near Peterborough, from ditches, washland and disused brickpits, and from the Ouse and Cam valleys further south. Of the major taxa, 34, including Chara aspera and C. vulgaris, are aquatic. There are four less aquatic major taxa: Galium palustre, Juncus effusus, Juncus subnodulosus and Juncus tenuis. The cluster also includes Salix pentandra, which according to Leslie (2019) is always planted. Four of the minor taxa are also aquatic, but Oenanthe silaifolia and Sagina nodosa, both nearly extinct in the county, are more terrestrial.
The distribution of the species in cluster 20 (Fig. 7a), Washland, is strongly concentrated on the Ouse and Nene washes, which are extensive grasslands that are flooded for much of the winter. There is a subsidiary concentration in Wicken Fen. The cluster is similar to cluster 19, and indeed was not distinguished from it in some of the trials with 20 clusters. Alopecurus geniculatus, Oenanthe aquatica, Potamogeton pusillus and Thalictrum flavum are particularly close to cluster 19 because they are not so concentrated on washland. Unlike cluster 19, most of the cluster 20 species are not aquatic, the exceptions being Potamogeton pusillus, P. trichoides and Utricularia vulgaris. There are no woody species. The key species Rorippa palustris and several others are annuals that grow on drying mud.
The map (Fig. 7b) of cluster 21, Riverine aquatic, shows a similar pattern to that of cluster 20 but with the addition of the rivers Ouse and Cam above and below Ely. This is the most aquatic of all the clusters, the only non-aquatic major taxa being Impatiens capensis, Rumex hydrolapathum, Salix triandra, Sonchus palustris and Stachys palustris. According to Leslie (2019), S. triandra is almost always planted, particularly along the county's main watercourses. This explains why it is in cluster 21.

Clustering by monads
The use of monad data to define the clusters and of tetrad data to plot them was an adaptation to the structure of our Atlas 2020 dataset. The tetrad data were complete for the county, and we tried clustering them directly. In the event, the tetrad scale was just a bit too coarse to pick up some of the interesting geographical differences between the species.

Goodness of fit
In general, the commoner species are better aligned to the general trend for each cluster. This relationship can be seen by plotting the cosine measure of alignment against the square root of the number of monads where the species is present (Fig. 8). In theory, a species with monad count $x$ should have cosine goodness of fit
\[ \text{Goodness of fit} = \frac{\sqrt{x \sum_j a_{ij}^2}}{\sum_j a_{ij}} \]
where $a_{ij}$ is the count of species in cluster $i$ in monad $j$. This theory applies to a 'standard' species, for which the probability of being found in a monad is proportional to the total number of cluster $i$ species in that monad. If a species has a marked preference for the richest (for that cluster) monads it will have a higher goodness of fit. If it is more frequently found in less suitable monads it will have a lower goodness of fit.
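The theoretical expression can be recovered in a few lines. The derivation below is a sketch under assumptions that the text implies but does not spell out: $s_j$ is a hypothetical 0/1 indicator of the species' presence in monad $j$, the direction of cluster $i$ is taken to be proportional to the counts $a_{ij}$, and the realised dot product is replaced by its expectation.

```latex
\begin{align}
  \Pr(s_j = 1) &= \frac{x\, a_{ij}}{\sum_k a_{ik}}
    && \text{(presence proportional to cluster richness)} \\
  \mathrm{E}\Bigl[\sum_j s_j a_{ij}\Bigr] &= \frac{x \sum_j a_{ij}^2}{\sum_j a_{ij}}
    && \text{(expected dot product)} \\
  \lVert s \rVert &= \sqrt{x}, \qquad
  \lVert a_i \rVert = \sqrt{\sum_j a_{ij}^2}
    && \text{(vector lengths)} \\
  \text{Goodness of fit} &\approx
    \frac{x \sum_j a_{ij}^2 \big/ \sum_j a_{ij}}
         {\sqrt{x}\,\sqrt{\sum_j a_{ij}^2}}
    = \frac{\sqrt{x \sum_j a_{ij}^2}}{\sum_j a_{ij}}
\end{align}
```

For a given cluster the sums over $j$ are constants, so the expected goodness of fit is proportional to $\sqrt{x}$. This is why the alignment is plotted against the square root of the monad count.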
For a very widespread cluster such as Arable weeds (Fig. 8a), the relation between goodness of fit and square root of frequency is almost perfect. That means that these species occur as a group, differing mainly in their frequency and not having monads where they are strongly nested together. The most frequent species lie slightly below the line, and the trees Cupressus × leylandii, Populus × canadensis and P. nigra also lie below it. These are very close to the Ubiquitous cluster. In the other direction, Amaranthus bouchonii, A. retroflexus, Chenopodium ficifolium, Chenopodiastrum hybridum, Descurainia sophia and Persicaria lapathifolia are more strongly aligned to the Arable cluster than the average. The Washland species (Fig. 8b) are less close to the theoretical trendline. The two species with highest cosine are Bidens tripartita and Rorippa palustris, which are noted particularly from the Ouse Washes (Leslie, 2019). In the other direction, Potamogeton pusillus, P. trichoides, Oenanthe aquatica, Stellaria aquatica and Thalictrum flavum are quite close to the Pond margin & streamside cluster and have lower goodness of fit than the theoretical 'standard'.
The Ancient wood species (Fig. 8c) are even less clustered about their trend line. The rarer species are mostly above it, while the commonest species are below it. Indeed Mercurialis perennis (the commonest species, present in 220 monads) and Orchis mascula (a rare species, present in 27 monads and strictly confined to ancient woods) have nearly equal cosines, 0.67 and 0.68 respectively. The commonest species are all quite closely aligned to the Spinney & shaded hedge cluster, whereas habitat specialists such as Paris quadrifolia and O. mascula show no such affinities.
The Chalk grassland cluster (Fig. 8d) shows a similar division into specialists and generalists. The three commonest species, Briza media, Bromopsis erecta and Linum catharticum, have affinities with both the Chalk wayside and Permanent grassland clusters, as do the less frequent Clinopodium acinos, Cynoglossum officinale and Lithospermum officinale. Along the top edge of the scattergram are the habitat specialists Asperula cynanchica, Koeleria macrantha, Tephroseris integrifolia, Thesium humifusum and Thymus drucei.
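The comparison between a species' actual alignment and the theoretical 'standard' can be sketched numerically. This is a minimal illustration, assuming that the cluster direction is proportional to the per-monad counts of cluster species; the function names and the toy counts in the usage note are hypothetical.

```python
import math


def cosine_fit(presence, cluster_counts):
    """Cosine between a species' 0/1 presences and the cluster's counts."""
    dot = sum(s * a for s, a in zip(presence, cluster_counts))
    norm_s = math.sqrt(sum(s * s for s in presence))
    norm_a = math.sqrt(sum(a * a for a in cluster_counts))
    return dot / (norm_s * norm_a)


def standard_fit(monad_count, cluster_counts):
    """Theoretical fit of a 'standard' species found in monad_count monads."""
    sum_sq = sum(a * a for a in cluster_counts)
    return math.sqrt(monad_count * sum_sq) / sum(cluster_counts)
```

With toy counts of 4, 3, 2 and 1 cluster species in four monads, a species confined to the two richest monads has a higher cosine than the standard for two monads, and a species confined to the two poorest has a lower one. These correspond to points above and below the trend line, as with the specialists and generalists discussed above.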

Comparison with Ellenberg values
Few vice-counties can have less environmental variation than Cambridgeshire. It shares its low altitude with the other counties of south-east England, but it differs from most in the overwhelmingly calcareous nature of its geology and soils. This is apparent from the weighted mean Ellenberg values for the clusters (Table 4). The Ellenberg values for the 20 British and Irish clusters given by Preston et al. (2013) are unweighted means, and so they are not exactly equivalent, but seven of the national clusters have mean R <6, compared to one (Open sandy ground) in Cambridgeshire. The county is also dominated by arable land, and farmers have endeavoured for centuries to improve the fertility of its soils. Only four of the clusters have Ellenberg N <4, compared with 11 in the national analysis. The least fertile Cambridgeshire clusters are Chalk grassland, Open sandy ground, Traditional fenland and (rather surprisingly) the relatively widespread Permanent grassland cluster. The highest Ellenberg N values are for species in the Ubiquitous, Arable weed, Reedbed & ditch bank, Washland and Riverine aquatic clusters. The high nutrient levels of the aquatic habitats are notable. Earlier research has shown that all aquatics with a preference for nutrient-poor conditions have been lost from the River Cam and its associated habitats in Cambridge city (Preston et al. 2003).
The Ellenberg L values for two Cambridgeshire clusters, Ancient wood (5.0) and Spinney and shaded hedge (5.3), are much lower than those for any of the British and Irish clusters. This reflects the absence of any woodland habitat clusters in the earlier analysis. Woodlands are too widely distributed at the hectad scale of that analysis for any woodland pattern to be detectable. In contrast, in a county as sparsely wooded as Cambridgeshire the woodland pattern is very distinctive at the monad scale. Similarly the analysis of the bryophytes of Cambridgeshire in tetrads (Preston & Hill, 2019) identified Woodland and Shade clusters which have similar distributions to the Ancient wood and Spinney clusters in this study.
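The text does not state which weights were used for the cluster means in Table 4. A minimal sketch of a weighted mean Ellenberg value, assuming (hypothetically) that each taxon is weighted by the number of tetrads in which it occurs and that taxa without a value are skipped:

```python
def weighted_mean(values, weights):
    """Weighted mean, skipping taxa with no Ellenberg value (None)."""
    pairs = [(v, w) for v, w in zip(values, weights) if v is not None]
    total = sum(w for _, w in pairs)
    if total == 0:
        raise ValueError("no weighted values to average")
    return sum(v * w for v, w in pairs) / total
```

Under this assumption a common species pulls the cluster mean towards its own value more strongly than a rare one, which is one reason weighted means are not exactly equivalent to the unweighted means of the national analysis.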

Occurrence of threatened species in the clusters
A total of 96 species in the analysis are treated as Vulnerable (VU), Endangered (EN) or Critically Endangered (CR) in England by Stroh et al. (2014). These range from the classic rarities for which Cambridgeshire has long been noted, such as Hypochaeris maculata, Melampyrum cristatum and Selinum carvifolia, to seven species that have been recently introduced. Four of the latter were planted in landscaped housing developments (Eriophorum angustifolium, Leersia oryzoides, Lysimachia thyrsiflora, Scirpoides holoschoenus). Orchis simia was introduced with turf to a garden. Carex depauperata has spread within the Cambridge Botanic Garden from cultivated plants. Lolium temulentum was sown in an arable field.
The Ancient woodland cluster is exceptional as it is a cluster of predominantly native species occurring in the county in few tetrads but with very few threatened species (3%), only Melampyrum cristatum (a woodland edge or 'circumboscal' species) and Neottia nidus-avis. The other clusters with a low proportion of threatened species are widespread clusters or are dominated by non-native species. It is notable that despite the presence of threatened arable weeds in both the Chalk wayside and Clayland arable clusters, there are none in the Arable cluster itself, which is largely composed of very common species.

Cambridge city in a Cambridgeshire context
Cambridge and its immediate surroundings have been recorded more thoroughly in all the major floras of Cambridgeshire because their authors have always lived in the city. This pattern is apparent in the map of localities where John Ray recorded plants in the 1650s (Oswald & Preston, 2011) and it has been as true of bryophytes as it is of flowering plants (Preston & Hill, 2019). In the 20th century grid square recording (hectad then tetrad) ensured at least a minimum of coverage of the entire county. The impact of this can be gauged from the report of a meeting held in Wisbech in 1959 to obtain records for the first Atlas of the British Flora. This was "a part of Cambridgeshire hardly visited botanically in the past 100 years. Though many botanists have lived in Cambridge during those years, few have reached further north than Ely because communications are bad and the attraction of the chalk and the remains of the Fens which fringe it are so great" (Perring 1961).
One might expect that coverage would have become more even in recent years, as the road network in the county has improved. However, there are complicating factors. In 1959 only native species and well-established aliens would have been of interest to botanists, and this limited the number of species which could be recorded in the Cambridge area, especially as the concentration was on hectad records. Since 2000 the most active flowering plant recorders have recorded all garden escapes, however casual, and some have also recorded street trees. This means that there is little limit to the number of potential records in the city. Reviewers of the recent Flora have been struck by the detailed attention that the urban flora of Cambridge has received (Broughton, 2020; Sanford, 2020). It is perhaps less apparent in this paper because minor taxa have been given a low weight and many very rare neophytes were excluded from the dataset analysed.
Table 5 compares the average number of species in the tetrads of the 'NatHistCam' Cambridge area with those from upland and fen tetrads. The 'NatHistCam' area is an 8 × 8 km square which has been recorded intensively from 2010 to 2019 by members of the Cambridge Natural History Society (Hill, 2016). It includes a range of habitats including chalk in the south-east, Cambridge city, the River Cam and its riparian commons in the centre and Gault Clay soils with some gravel deposits in the west. The flora of this urban area comprises species from a wide range of clusters including those of grasslands, roadsides, waste ground, shaded habitats and the commoner arable species, as well as garden escapes.
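A comparison of the kind summarised in Table 5 can be sketched as follows. This is a minimal illustration, not the procedure actually used for the paper: the record layout, the region labels ("city", "county"), the tetrad codes and the species names are all hypothetical, and the figures are toy values chosen only to show the arithmetic. For each region and cluster, the number of distinct species is counted in each tetrad and averaged over all tetrads in that region, with tetrads lacking any species of a cluster counted as zero.

```python
from collections import defaultdict

# Hypothetical toy records: (tetrad, region, cluster, species).
# None of these values come from the Cambridgeshire dataset.
records = [
    ("T1", "city", "Garden escapes", "g1"),
    ("T1", "city", "Garden escapes", "g2"),
    ("T1", "city", "Garden escapes", "g3"),
    ("T1", "city", "Ubiquitous", "u1"),
    ("T1", "city", "Ubiquitous", "u2"),
    ("T2", "city", "Garden escapes", "g1"),
    ("T2", "city", "Garden escapes", "g1"),  # duplicate record, counted once
    ("T2", "city", "Garden escapes", "g2"),
    ("T2", "city", "Garden escapes", "g3"),
    ("T2", "city", "Ubiquitous", "u1"),
    ("T2", "city", "Ubiquitous", "u2"),
    ("T3", "county", "Garden escapes", "g1"),
    ("T3", "county", "Ubiquitous", "u1"),
    ("T3", "county", "Ubiquitous", "u2"),
    ("T4", "county", "Ubiquitous", "u1"),
    ("T4", "county", "Ubiquitous", "u2"),
]


def mean_species_per_tetrad(records):
    """Return {(region, cluster): mean number of distinct species per tetrad}."""
    species = defaultdict(set)   # distinct species per (region, cluster, tetrad)
    tetrads = defaultdict(set)   # all tetrads seen in each region
    for tetrad, region, cluster, name in records:
        species[(region, cluster, tetrad)].add(name)
        tetrads[region].add(tetrad)

    totals = defaultdict(int)
    for (region, cluster, _tetrad), names in species.items():
        totals[(region, cluster)] += len(names)

    # Divide by every tetrad in the region, so empty tetrads count as zero.
    return {key: total / len(tetrads[key[0]]) for key, total in totals.items()}


means = mean_species_per_tetrad(records)
ratios = {
    cluster: means[("city", cluster)] / means[("county", cluster)]
    for cluster in ("Garden escapes", "Ubiquitous")
}
```

In this toy example the ratio is large for the garden-escape cluster and equal to one for the ubiquitous cluster. Such ratios reflect recording effort as well as genuine differences in the flora.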
Remarkably, the average number of species in the Cambridge tetrads is higher than that in the other upland tetrads and in the fen tetrads for every single cluster. This must in part reflect the diligent recording (at monad level) of the Cambridge area, but also the lack of diversity in many of the tetrads in the wider countryside where the land-use is predominantly arable. The discrepancy in mean tetrad totals between Cambridge and the county is of course greatest for the Garden escapes Cambridge cluster, where the mean value is almost 9 times as high, followed by Open sandy ground and Garden escapes general. The innumerable gardens in the city, and the imported sand used in recent building developments, mean that the city tetrads are generally richer in these species, although the intensive recording in Cambridge has doubtless exaggerated the difference. Waste ground is also relatively well represented. The discrepancy between city and county is least for the Ubiquitous species, which are so frequent and easily identified that almost all are known from all tetrads, followed by two other clusters of common species, Verge and Arable, and by the Reedbed & ditch bank and Riverine aquatic species, where the available habitat limits the number potentially present in Cambridge, however assiduous the recording.
[...] are scattered in shady places. Circaea lutetiana is frequent in private gardens and V. riviniana is common there and often cultivated. Lonicera periclymenum and Rosa arvensis are sometimes planted. Lonicera periclymenum may also be bird-sown from cultivated plants in gardens. Finally, it should be noted that most of the plants that are missing or very rare in Cambridge are water plants such as Myriophyllum verticillatum, Nymphoides peltata and Spirodela polyrhiza. However, Cirsium palustre, Lithospermum officinale and Sison segetum are also notable for their absence.

1. Plant distributions in Cambridgeshire display characteristic patterns, which can be summarised by grouping them into species clusters. Goodness of fit to a cluster is measured by the cosine of the angle between a given species distribution and the average distribution of the species in the cluster. Within clusters, cosines show an approximately linear relation to the square root of the number of monads in which a species is found.

2. Floristic data collected at the monad scale show clear distinctions between habitats in Cambridgeshire despite the relative uniformity of the county's environment. The assignment of rare taxa to species clusters was generally correct, but, as in the case of the (planted) Eriophorum angustifolium, it was sometimes misleading.

3. Most of the clusters mapped at the tetrad scale show distinctive distributions. However, several clusters with very common species look broadly similar at the tetrad scale, even though their component species are ecologically distinct.

4. The patterns revealed by the coincidence maps are followed in varying degree by the species in a cluster. Some species such as Lithospermum officinale are intermediate between two or more clusters. Others such as Asperula cynanchica and Orchis mascula are strict habitat specialists, which can be recognised by their higher than expected goodness of fit.

5. Most of Cambridgeshire is alkaline and highly fertile, with the result that Galium aparine and Urtica dioica are the two commonest species. When judged by Ellenberg values, the only cluster with calcifuge species is Open sandy ground. There are five relatively infertile clusters (Ellenberg N < 5.0), of which Chalk grassland is the most extreme.

6. English threatened species are most numerous in Chalk grassland, Open sandy ground and Traditional fenland. Ancient woodland has only two such species.

7. Cambridge city is generally the richest part of the county, typically with more than twice as many species per tetrad as the rest of the county. However, numerous aquatic plants of fenland and specialists of ancient woodland are absent from the city area.