Bacterial protein domains with a novel Ig-like fold target human CEACAM receptors

Streptococcus agalactiae, also known as group B Streptococcus (GBS), is the major cause of neonatal sepsis in humans. A critical step to infection is adhesion of bacteria at mucosal surfaces. Though several GBS adhesins have been identified, the host receptor targets of these adhesins remain unknown. We report here that surface-expressed β protein from GBS binds to human CEACAM1 and CEACAM5 receptors. A crystal structure of the complex showed that the IgSF domain in β represents a novel Ig-fold subtype called IgI3, in which unique features allow binding to CEACAM1. Bioinformatic assessments revealed that this newly identified IgI3 fold is not exclusively present in GBS. Instead, the IgI3 fold is predicted to be present in adhesins from other clinically important human pathogens. We confirmed the interaction between CEACAM1 and the predicted IgI3-containing adhesin in two different streptococcal pathogens. Overall, our results indicate that the IgI3 fold could provide a broadly applied mechanism for bacteria to target CEACAMs.

Introduction

sequence conservation amongst 57 GBS proteins identified through BLAST search (data not shown), indicating that intrastrain variation in CEACAM5 binding is most likely due to differential β expression.

Expression of human CEACAMs on epithelial cells enhances GBS adhesion
As CEACAM engagement leads to enhanced cellular adhesion of other microorganisms, we hypothesized that CEACAMs could represent a novel cellular adhesion mechanism for GBS. To analyze whether β-expressing GBS can utilize human CEACAMs for cellular adhesion, we tested the binding of strain A909 to well-characterised CEACAM-expressing HeLa cells (Fig. 4A) (Bos et al, 1998). Consistent with rCEACAM binding, a higher percentage of the A909 inoculum was recovered from the CEACAM1- and CEACAM5-expressing HeLa cells than from all other HeLa cell lines (Fig. 4A). Increased binding of GBS to human CEACAM1-expressing HeLa cells was confirmed by confocal microscopy (Fig. 4B). To ensure that the observed CEACAM binding was not influenced by the background of the cell line, we assessed GBS adhesion to CEACAM-expressing CHO cells (Hollandsworth et al, 2020). Also in this case, a higher percentage of the GBS inoculum was recovered from the CEACAM1- and CEACAM5-expressing CHO cells than from all other CHO cell lines (Fig. 4C). For unknown reasons, there was considerable variation in GBS adhesion to the CEACAM1- and CEACAM5-expressing CHO cells. This variation could not be attributed to unstable CEACAM expression, as our cell lines expressed CEACAM at consistent levels (Supplementary Fig. 5A and 5B).

To confirm that adhesion of the GBS strain A909 to the CEACAM1-expressing CHO cells was dependent on β protein, we also assessed binding of the bac deletion mutant and complemented strain in this same system. GBS adherence was abolished by mutation of the bac gene and was restored in the complemented mutant (Fig. 4D). This result was confirmed by confocal microscopy (Supplementary Fig. 5C). In agreement with the results obtained with purified proteins (Fig.
2F), the binding of β-expressing GBS to CEACAM1-expressing CHO cells was inhibited by blocking the CEACAM1-N domain with a specific mAb (Supplementary Fig. 5D). Moreover, pre-incubation of A909 with rCEACAM1-N, but not rCEACAM8-N, impaired adhesion to CEACAM1-expressing CHO transfectants (Supplementary Fig. 5E). This indicates that the CEACAM1-N and β-IgSF domains were responsible for the cellular adhesion phenotype.

To rule out the possibility that differences in adhesion of A909 wild-type and bac mutant strains to CHO cells reflected interstrain variation in growth rates during adhesion at 37 °C, we developed an alternative assay in which we assessed adhesion of FITC-labelled GBS strains to detached cell lines during incubation at 4 °C. A higher percentage of CEACAM1-expressing CHO cells were FITC-positive upon incubation with A909 in comparison to control cells (Fig. 4E and 4F). However, this was not observed for CEACAM5-expressing CHO cells, likely because the N domain of CEACAM5 was cleaved during the detachment process, as demonstrated by the inability of an anti-CEACAM5-N mAb to bind to these cells (Supplementary Fig. 5A and 5B). As expected, adhesion of FITC-labelled A909 to the CEACAM1-expressing CHO cells was abolished by mutation of bac (Fig. 4G; Supplementary Fig. 5F). Collectively, the data therefore indicate that binding to human CEACAM receptors via β is a novel cellular adhesion mechanism for GBS.

Crystallography reveals that the IgSF domain in β adopts a novel Ig fold, the IgI3 fold
To gain insights into β-IgSF binding mechanisms, we aimed to solve its structure. We solved the structure in complex with CEACAM1-N at a resolution of 3.0 Å (Fig. 5A). The asymmetric unit contains two molecules of the (β-IgSF)-(CEACAM1-N) complex.
The β-IgSF domain has the characteristic features of an Ig fold, principally a pair of β sheets built of anti-parallel β strands that surround a hydrophobic core (Fig. 5B). IgSF domains can be classified into (variable) V-set, (constant) C-set or (intermediate) I-set, with differentiation based on the number and placement of β-strands between the cysteines of the conserved disulphide bridge (Fig. 5C) (Wang & Springer, 1998; Wang, 2013). The V-set Ig domain contains ten β strands with four strands found on one sheet (ABED) and six strands on the other (A′GFCC′C′′). C-set Ig domains lack the A′ and C′′ strands, and are further grouped into C1 or C2 based on presence or absence of the D strand, respectively. I-set Ig domains lack the C′′ strand and are classified into I1 or I2 based on presence or absence of a D strand, respectively. The β-IgSF domain has two β-sheets labelled ABED and A′GFC, with sheets connected by the BC, EF, CD and AA′ loops (Fig. 5B). Therefore, β-IgSF has an I-set fold topology that most closely resembles an I1-set domain (Fig. 5C). However, β-IgSF lacks the cysteines and disulphide bridge that are characteristic of I-set folds. Furthermore, the β-IgSF domain possesses a truncated C strand that is directly followed by a 1.5-turn α-helix (Fig. 5C). These features were not observed in the structurally characterised IgI1 domains in the DALI or PDBeFOLD databases that β-IgSF most closely resembled, including macrophage colony stimulating factor 1 (M-CSF) and intercellular adhesion molecule 3 (ICAM-3) (Fig. 5D, 5E & 5F). Therefore, the topology of the β-IgSF domain represents a previously unrecognized IgI fold subtype, denoted here as the I3-set Ig (IgI3) domain. Accordingly, the domain in β will now be referred to as β-IgI3.
The unique features of this domain are i) the absence of cysteine residues, ii) the absence of the C′C′′ strands, and iii) a truncated C strand that is directly followed by a 1.5-turn α-helix. Of note, the unique β-IgI3 region stretching from the C to the D strand possesses protruding hydrophobic residues, such as F42, located between the C strand and the α-helix, L46, located in the α-helix, and V53, located in the CD loop (Fig 5G).

Table 4) of the complex of β-IgI3 and CEACAM1-N at 3 Å resolution showed that β-IgI3 binds to the A′GFCC′ face of CEACAM1 through residues located in the C to D strand region including the α-helix (Fig. 6C). Fig. 6B shows the interacting residues on the surfaces of the two molecules. Specifically, L46 in the α-helix of β-IgI3 interacts by van der Waals forces with F29 and L95 (distances 3.11 Å and 3.61 Å) located in the C and G strands of CEACAM1, respectively (Supplementary Fig. 6A; Supplementary Table 5). In addition, the protruding β-IgI3 residue V53 is in close contact with I91 (distance 3.16 Å, F strand) and S32 (distance 3.36 Å, C strand) of CEACAM1, D55 is in close contact with Y34 (C to C′ loop) of CEACAM1, and F42 is in close contact with L95 (distance 2.87 Å, G strand) of CEACAM1 (Supplementary Table 5

Based on the co-crystal data, we hypothesised that β-IgI3 residue L46 was critical for contacting CEACAM1-N via F29 and L95 (Fig. 6C, Supplementary Fig. 6A & 6B). Additionally, β-IgI3 residue V53 appeared critical for contacting CEACAM1-N residue I91, and β-IgI3 residue F42 appeared critical for contacting CEACAM1-N residue L95. We generated alanine mutations in β-IgI3 at these positions as well as at several other sites that contact CEACAM1. ITC binding studies of these β-IgI3 mutants to 'wild-type' unglycosylated CEACAM1-N revealed that the mutant β-IgI3 L46A failed to bind to rCEACAM1-N.
Three additional mutants, β-IgI3 L52A, β-IgI3 V53A and β-IgI3 D55A, had reduced affinity for rCEACAM1-N (KD = 234±16, 562±44 and 690±52 nM, respectively) (Supplementary Fig. 7A; Supplementary Table 6). In contrast, β-IgI3 F42A bound with higher affinity (KD = 16±15 nM). In addition, we tested the ability of unglycosylated rCEACAM1-N to bind DB coated with the β-IgI3 variants (Fig. 6D). In this analysis, rCEACAM1-N displayed significantly reduced binding to DB coated with β-IgI3 L46A and β-IgI3 V53A, and significantly enhanced binding to DB coated with β-IgI3 F42A. Together, these data indicate that the L46, L52, V53 and D55 residues of β-IgI3 are critical for the binding to CEACAM1. These residues were observed to be conserved in 57 β protein sequences (data not shown).

For CEACAM1-N, the crystal structure of the complex indicated that β-IgI3 interacts with the dimer interface, contacting several residues which are important for CEACAM1 homodimerization and for binding to other bacterial adhesins. We hypothesised that residues F29, I91 and L95 of CEACAM1 are critical for contacting β-IgI3

Table 6). Additionally, CEACAM1-N Q89A also lacked the ability to bind β-IgI3. Two mutants, CEACAM1-N Q44A and CEACAM1-N L95A, showed a 10-fold decrease in binding affinity (KD = 996±116 and 1350±460 nM, respectively) whilst two further mutants, CEACAM1-N V96A and CEACAM1-N N97A, displayed only a modest (4-fold) decrease in binding affinity (KD = 370±4 and 490±120 nM, respectively). In addition, we tested the ability of β-IgI3 coupled to streptavidin to interact with DB coated with unglycosylated wild-type and mutant rCEACAM1-N (Fig. 6E). Binding was significantly reduced for CEACAM1-N F29A, CEACAM1-N Q89A and CEACAM1-N I91A, as well as for CEACAM1-N Q44A and CEACAM1-N L95A, confirming that residues F29, Q89 and I91 are major targets of β-IgI3 binding.
I91 has also been identified as a critical CEACAM1 residue for interaction with M. catarrhalis (Conners et al, 2008), Neisseria spp. (Villullas et al, 2007; Virji et al, 1999), H. influenzae (Hill et al, 2001) and Fusobacterium spp. (Wang, 2013). Though Q89 in CEACAM1 is critical for interaction with β-IgI3 and other bacterial ligands, Q89 was not in close contact with any β-IgI3 residues. It is possible that mutation of CEACAM1 residue Q89 forms a gap that I91 attempts to fill, which subsequently prevents its interaction with β-IgI3 residue V53. In summary, contact of residue L46 in β-IgI3 with CEACAM1 residue F29, and of residue V53 in β-IgI3 with CEACAM1 residue I91, provide the critical interactions. Additional stability is gained through β-IgI3 residues F42 and D55.
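The dissociation constants reported above can be put on an energy scale with the standard relation ΔG = RT ln(KD). The following is a minimal sketch of that arithmetic, not part of the study's analysis: the temperature of 298.15 K and the 1 M standard state are assumptions, and the function names are illustrative only.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def binding_free_energy(kd_nM, temp_K=298.15):
    """Return dG = RT ln(KD) in kcal/mol for a KD given in nM.

    Assumes a 1 M standard state; the default temperature is an
    assumption, not a value taken from the text.
    """
    kd_M = kd_nM * 1e-9
    return R_KCAL * temp_K * math.log(kd_M)

def delta_delta_g(kd_variant_nM, kd_reference_nM, temp_K=298.15):
    """Return ddG = RT ln(KD_variant / KD_reference) in kcal/mol."""
    return R_KCAL * temp_K * math.log(kd_variant_nM / kd_reference_nM)

# KD values (nM) for the beta-IgI3 variants as quoted in the text
kd_values = {"L52A": 234.0, "V53A": 562.0, "D55A": 690.0, "F42A": 16.0}

if __name__ == "__main__":
    for name, kd in kd_values.items():
        print(f"beta-IgI3 {name}: KD = {kd:.0f} nM, "
              f"dG = {binding_free_energy(kd):.2f} kcal/mol")
```

A positive `delta_delta_g` means the variant binds more weakly than the chosen reference; comparing V53A with L52A, for example, gives roughly half a kcal/mol.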

Comparison of β-IgI3- and HopQ-bound CEACAM1 structures
Comparison with the HopQ-CEACAM1 complex, the first reported structure of a bacterial adhesin bound to CEACAM1, revealed that the same set of residues involved in dimerization of CEACAM1 are contacted by both β-IgI3 and HopQ (Supplementary Fig. 6C). However, HopQ is structurally completely unrelated to that of β-IgI3

We predicted the tertiary structure of 11 representative homolog sequences based on the β-IgI3 crystal structure. These structures maintained the overall IgI3 fold, including an α-helix and loop located between the truncated C strand and the D strand (Supplementary Fig. 8), except for the clade XV sequence from G. vaginalis, which lacked the C strand. These data suggest that the IgI3 structure is broadly distributed in bacterial cell wall-anchored adhesins. We focused further analysis on clade II, which maintains key IgI3 structures despite sharing only 40% amino acid identity with β-IgI3

To determine whether the IgI3 domain in R28 interacts with CEACAM1, we purified the rR28-IgI3 protein domain from E. coli and tested interaction with rCEACAM1. Beads coated with biotinylated R28-IgI3 and β-IgI3, but not HSA, interacted with rCEACAM1 (Fig. 7D). In the reverse assay, beads coated with rCEACAM1 interacted with R28-IgI3 and β-IgI3, but not HSA, coupled to streptavidin (Fig. 7E). We also confirmed that rCEACAM1 bound to an R28-expressing S. pyogenes strain (Fig. 7F). In addition, rCEACAM1 bound to an Alp3-expressing strain of GBS, which was expected given that R28 and Alp3 proteins are identical in sequence. Taken together with the structural predictions described above, these data indicate that IgI3 folds from a wider range of Gram-positive bacterial pathogens can interact with human CEACAM1, but individual IgI3 domains may interact with human CEACAM1 differently.
To dissect whether β-IgI3 and R28-IgI3 interact differently with CEACAM1, we simulated the docking of IgI3 domains onto human CEACAM1 and examined the free energy calculations to ascertain the key IgI3 residues in the complex formation (Supplementary Fig. 9C). Residues with binding free energy contributions lower than -2.0 kcal/mol and greater than 2 kcal/mol are identified as key residues and unfavourable residues, respectively. Simulation of β-IgI3 and CEACAM1 docking correctly predicted that residues F42, L46 and V53 were key determinants of CEACAM1 binding, whilst D55 was an unfavourable determinant (Fig 7G; Supplementary Fig. 9D). The binding free energy of the (R28-IgI3)-(CEACAM1-N) complexes was strong (-31.02±6.70 kcal/mol, n = 50 simulations). In contrast to β-IgI3, residues located in the α-helix of R28-IgI3 were not identified as key determinants of CEACAM1 binding (Fig 7G; Supplementary Fig. 9E). Instead, critical R28-IgI3 residues (K48, I54, I56 and K59) were located in the CD loop and the D strand only. Mapping of key residues onto the surface structure suggests that β-IgI3 and R28-IgI3 target CEACAM1 through different faces on the IgI3 fold (Supplementary Fig. 9D and 9E)

Additional biochemical analysis demonstrated that the interaction is of high specificity and high affinity. We solved the crystal structure of the Ig-like fold in β and revealed that it represents a novel Ig-fold structure related to the I-set. The Ig-like fold identified in β is characterized by an absence of cysteine residues and the presence of a truncated C strand that is directly followed by a unique 1.5-turn α-helix. As this Ig-like fold of the I-set has unique features, it was designated IgI3. The absence of cysteine residues has been reported in Ig folds (Halaby & Mornon, 1998) but has not been documented in IgI folds to date.
In addition, the partial

The β protein of GBS is commonly expressed by serotype Ia, Ib, II and V strains (Lindahl et al, 2005). It was recently shown that high β protein expression levels are associated with increased virulence of GBS clinical isolates (Nagano et al, 2006). As β protein binds the inhibitory receptor CEACAM1, as well as the inhibitory receptors Siglec-5 and -7 that are expressed on leukocytes, we speculate that dual- or multi-engagement of these inhibitory

isothermal titration calorimetry (ITC) and crystallization were expressed using pET21d vectors in E. coli and purified as previously described (Bonsor et al, 2015a).

Expression and purification of bacterial proteins
α, β and Rib proteins were purified from GBS cultures as previously described (Lindahl et

to 95 °C for 10 mins. Lysates were separated by SDS-PAGE in a 12.5% polyacrylamide gel at 270 V, blotted onto nitrocellulose membranes and probed with anti-CEACAM1 (clone C51X/8) mAb. Membranes were probed using rabbit anti-mouse-IgG-HRP and developed using ECL substrate.

Measurement of β protein expression by GBS strains
Rabbit antiserum was raised against β protein as previously described (Lindahl et al, 1990). 6 × 10⁶ mid-logarithmic phase bacteria were incubated with heat-inactivated 0.1% rabbit anti-

(Fig 5B and 5C). We denote this as the I3-set domain, and the domain in β protein as β-IgI3. This was used to successfully position sidechains, lock the registry and provide extra restraints during refinement in REFMAC5 with ProSMART. The (β-IgI3)-(CEACAM1-N) complex has been deposited in the PDB with the entry code 6V3P. Atom contacts in the (β-IgI3)-(CEACAM1-N) complex interface were identified using NCONT (CCP4) with a cut-off of 4.0 Å (Winn et al, 2011).
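The idea behind a distance cut-off contact search can be illustrated with a short sketch. This is not NCONT and does not reproduce its options or output; it is a plain all-against-all distance check, and the atom labels and coordinates below are hypothetical values chosen for illustration, not taken from 6V3P.

```python
import math

def find_contacts(atoms_a, atoms_b, cutoff=4.0):
    """Return (label_a, label_b, distance) for atom pairs within cutoff (in Å).

    Each atom is a (label, (x, y, z)) tuple. Results are sorted by distance.
    """
    contacts = []
    for label_a, xyz_a in atoms_a:
        for label_b, xyz_b in atoms_b:
            d = math.dist(xyz_a, xyz_b)  # Euclidean distance (Python 3.8+)
            if d <= cutoff:
                contacts.append((label_a, label_b, round(d, 2)))
    return sorted(contacts, key=lambda c: c[2])

# Hypothetical coordinates for illustration only
chain_a = [("L46:CD1", (0.0, 0.0, 0.0)), ("V53:CG1", (10.0, 0.0, 0.0))]
chain_b = [
    ("F29:CZ", (3.0, 0.0, 0.0)),
    ("I91:CD1", (10.0, 3.0, 1.0)),
    ("Q89:NE2", (20.0, 20.0, 20.0)),
]

if __name__ == "__main__":
    for a, b, d in find_contacts(chain_a, chain_b):
        print(f"{a} - {b}: {d} Å")
```

For real structures the quadratic loop would normally be replaced by a spatial index, but for a single interface of this size the direct comparison is adequate.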

Bioinformatics analysis
BLAST analysis of the β-IgI3 amino acid sequence was performed to identify homologs in bacterial proteins, and the resulting sequences were aligned using ClustalW. Phylogenetic analysis was performed using the Maximum Likelihood approach and 1000 bootstrap replications in MEGA (Kumar et al, 2016), and the resulting tree was displayed with Interactive Tree of Life version 4 (Letunic & Bork, 2019). The structure of IgI3 homologs was predicted using the β-IgI3 structure as input in SWISS-MODEL (Webb & Sali, 2016). Only models with a QMEAN score above -4.00 were analyzed further. Docking of the β-IgI3 structure or the R28-IgI3 predicted structure to CEACAM1-N was simulated 50 times using the ZDOCK server (Pierce et al, 2014), in which CEACAM1-N residues F29, Q44, Q89, I91 and L95 were included as contact sites. The free energy contribution of each simulation was independently calculated using MM/GBSA analysis on the HAWKDOCK server (Weng et al, 2019).
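The post-processing of per-residue free energy contributions can be sketched as follows, using the thresholds given in the Results (below -2.0 kcal/mol for key residues, above 2 kcal/mol for unfavourable residues). This is a minimal sketch under assumptions: averaging each residue over the docking runs is our assumption about the aggregation step, the "neutral" label is ours, and the example numbers and the residue T60 are hypothetical rather than study data.

```python
from statistics import mean, stdev

KEY_THRESHOLD = -2.0          # kcal/mol, from the text
UNFAVOURABLE_THRESHOLD = 2.0  # kcal/mol, from the text

def classify_residue(energy):
    """Classify a per-residue binding free energy contribution (kcal/mol)."""
    if energy < KEY_THRESHOLD:
        return "key"
    if energy > UNFAVOURABLE_THRESHOLD:
        return "unfavourable"
    return "neutral"  # label introduced here for everything in between

def summarise(per_run_energies):
    """Map residue -> (mean, SD, class) from a dict of per-run energy lists.

    Averaging across runs is an assumption, not a documented step.
    """
    summary = {}
    for residue, values in per_run_energies.items():
        avg = mean(values)
        sd = stdev(values) if len(values) > 1 else 0.0
        summary[residue] = (round(avg, 2), round(sd, 2), classify_residue(avg))
    return summary

# Hypothetical per-run values (kcal/mol) for illustration only
example = {
    "L46": [-3.1, -2.9, -3.0],
    "D55": [2.5, 2.3, 2.4],
    "T60": [-0.4, 0.1, -0.3],
}

if __name__ == "__main__":
    for residue, (avg, sd, label) in summarise(example).items():
        print(f"{residue}: {avg} ± {sd} kcal/mol -> {label}")
```

Note that a contribution of exactly -2.0 or 2.0 kcal/mol falls into the "neutral" class here, because the text states strict inequalities.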

Data Availability
The co-crystal structure of β-IgI3 and CEACAM1-N is available at the PDB with ID: 6V3P. All other data supporting the findings of this study are available within the paper and its supplementary information files and are available from the corresponding author on reasonable request.