Performance evaluations in interviews are central to employment decisions. We combine two field experiments, administrative data, and video analysis to study the sources of gender gaps in interview evaluations. Leveraging 60,000 mock interviews on a platform for software engineers, we find that code quality ratings are 12 percent of a standard deviation lower for women. This gap persists after controlling for an objective measure of code quality. Providing evaluators with automated performance measures does not reduce gender gaps. Comparing blind to non-blind evaluations without live interaction reveals no gender gap in either case. In contrast, gaps widen with longer personal interaction and are larger among evaluators from regions with stronger implicit gender bias. Video analysis shows that women apologize more and that interviewers are more condescending and harsher toward them; both behaviors correlate with lower ratings. Our findings highlight how interpersonal dynamics can introduce bias into evaluations that otherwise rely on objective metrics.