Protein structure alignment algorithms are often time-consuming, which makes large-scale retrieval based on protein structure similarity challenging. As the number of protein structures increases rapidly, there is an urgent need for more efficient structure comparison approaches. In this paper, we propose an effective graph-based protein structure representation learning method, GraSR, for fast and accurate structure comparison. In GraSR, a graph is constructed based on the inter-residue distances derived from the tertiary structure. Then, deep graph neural networks (GNNs) with a short-cut connection learn graph representations of the tertiary structures under a contrastive learning framework. To further improve GraSR, a novel dynamic training data partition strategy and a length-scaling cosine distance are introduced. We objectively evaluate GraSR on SCOPe v2.07 and a newly released independent test set from the PDB database using a purpose-designed comprehensive performance metric. Compared with other state-of-the-art methods, GraSR achieves about 7%-10% improvement on the two benchmark datasets. GraSR is also much faster than alignment-based methods. We dig into the model and observe that the superiority of GraSR stems mainly from the learned discriminative residue-level and global descriptors. The web-server and source code of GraSR are freely available at www.csbio.sjtu.edu.cn/bioinf/GraSR/ for academic use.
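The two ingredients named above, a graph built from distances between residues and a cosine-based distance between learned descriptors, can be illustrated with a minimal sketch. This is not the GraSR implementation: the function names, the use of C-alpha coordinates, and the 10 angstrom cutoff are assumptions made for illustration, and the plain cosine distance shown here omits the length scaling, whose exact form is not given in this text.

```python
import math

def contact_graph(coords, cutoff=10.0):
    """Build an adjacency list linking residues whose (assumed) C-alpha
    coordinates lie within `cutoff` angstroms of each other.
    The cutoff value is a hypothetical choice, not taken from the paper."""
    n = len(coords)
    adj = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                adj[i].append(j)
                adj[j].append(i)
    return adj

def cosine_distance(u, v):
    """Plain cosine distance (1 - cosine similarity) between two descriptors.
    The length-scaling variant used by GraSR is not reproduced here."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm

# Toy example: four residues on a line, the last one far from the rest.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (30.0, 0.0, 0.0)]
graph = contact_graph(coords)
```

In a retrieval setting, each structure would be encoded once into a fixed-length descriptor, so comparing a query against a database reduces to cheap vector distances rather than pairwise alignments.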
Protein structure comparison aims to measure the structural similarity between two different proteins. It is core infrastructure for structural biology and supports protein structure prediction, protein-protein docking, structure-based protein function prediction, etc. Because the number of experimentally solved protein structures in the Protein Data Bank (PDB) is increasing rapidly and the accuracy of protein structure prediction has improved dramatically in recent years, e.g. with the AlphaFold2 approach, it is highly desirable to develop fast and accurate protein tertiary structure comparison methods, which could benefit structural homology discovery and other downstream structure-based analysis.
This project develops new data-driven techniques for processing large-scale video streams that exploit the structure and redundancy in streams (captured over days, months, and even years) to improve video analysis accuracy or reduce processing costs. While initial efforts focused on the capture and processing of egocentric video streams, project activities also include processing other forms of video streams, such as stationary webcams or sports broadcasts. While the focus of this research is the design of core algorithms and systems, success stands to enable the development of new classes of applications (in domains such as navigation, personal assistance, and health/behavior monitoring) that use the extensive visual history of a camera to intelligently interpret continuous visual data sources and immediately respond to the observed input. A further output of this research is the collection and organization of two video datasets: an egocentric video database from the life of a single individual, and a collection of a number of long-running video streams (each from a single camera).
Scanner: Efficient Video Analysis at Scale. Early on in our efforts to process large amounts of egocentric video (e.g., the KrishnaCam project), we learned that we lacked system infrastructure support for this task. Simply put, many grad students did not have the system implementation skillset to do computer vision research on video at scale. The Scanner project, which is jointly supported by (and is in fact the focus of) IIS-1539069, began as the result of this observation. A growing number of visual computing applications depend on the analysis of large video collections. The challenge is that scaling applications to operate on these datasets requires efficient systems for pixel data access and parallel processing across large numbers of machines. Few programmers have the capability to operate efficiently at these scales, limiting the field's ability to explore new applications that leverage big video data. In response, we have created Scanner, a system for productive and efficient video analysis at scale. Scanner organizes video collections as tables in a data store optimized for sampling frames from compressed video, and executes pixel processing computations, expressed as dataflow graphs, on these frames. Scanner schedules video analysis applications expressed using these abstractions onto heterogeneous throughput computing hardware, such as multi-core CPUs, GPUs, and media processing ASICs, for high-throughput pixel processing. We demonstrate the productivity of Scanner by authoring a variety of video processing applications, including the synthesis of stereo VR video streams from multi-camera rigs, markerless 3D human pose reconstruction from video, and data-mining big video datasets such as hundreds of feature-length films or over 70,000 hours of TV news.
These applications achieve near-expert performance on a single machine and scale efficiently to hundreds of machines, enabling formerly long-running big video data analysis tasks to be carried out in minutes to hours.
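The two abstractions described above, a table of frames that can be sampled and a dataflow of per-frame operations, can be illustrated with a toy sketch. This is not Scanner's API: `sample_rows`, `run_graph`, the list-of-tuples table, and the operations are hypothetical stand-ins, and the "graph" here is only a linear chain, whereas a real dataflow graph can branch and merge.

```python
def sample_rows(table, stride):
    """Select every `stride`-th row from a table of (frame_index, pixels)."""
    return [row for row in table if row[0] % stride == 0]

def run_graph(frames, ops):
    """Apply a linear chain of per-frame operations to each sampled frame."""
    results = []
    for index, pixels in frames:
        value = pixels
        for op in ops:
            value = op(value)
        results.append((index, value))
    return results

def mean_brightness(pixels):
    """Average pixel intensity of a frame (frames are flat lists here)."""
    return sum(pixels) / len(pixels)

def is_bright(mean, threshold=2.5):
    """Hypothetical downstream op: classify a frame by its mean brightness."""
    return mean > threshold

# Toy table: six tiny "frames", each a flat list of three pixel values.
table = [(i, [i, i + 1, i + 2]) for i in range(6)]
sampled = sample_rows(table, stride=2)
output = run_graph(sampled, [mean_brightness, is_bright])
```

Separating frame selection from the per-frame computation is what lets a system of this kind decode only the frames it needs and distribute the remaining work across machines.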
The compute power of the latest NVIDIA GPUs makes graph analytics much faster. Moreover, the internal memory speed within a GPU allows cuGraph to rapidly switch the data structure to best suit the needs of the analytic, rather than being restricted to a single data structure.
Some reported analysis procedures convert triglycerides into picolinyl esters, N-acyl pyrrolidines, and 4,4-dimethyloxazoline (DMOX) derivatives to enable detection by mass spectrometry. These derivatives are separated using a low-polarity column because of its high thermal stability. Gas chromatography-mass spectrometry (GC-MS) has been widely used for the structural analysis of FAs. However, the identification and localization of some structural features, such as hydroxyl groups, epoxy groups, branched chains, rings, and double bonds, are problematic. The derivatization of triglycerides as FA methyl esters could be used to identify certain types of branching but not additional methyl branches, double bonds, or other types of unsaturation, because the structural information obtained from the mass spectra of functionalized unsaturated FA methyl esters is typically insufficient. The generated spectra do not provide sufficient information about these structures [20, 21], primarily because of the ionization of the double bonds in unsaturated FAs that occurs during electron impact.
Given the importance of FAs in foods and their health implications, this work considered two widely consumed foodstuffs, milk and oil. These foods possess similar triglycerides (TAG), which are essential for supplying FAs in the human body. The goal of this work was to assess three analysis methods for fats (mainly triglycerides) that are rapid and inexpensive and constitute valid alternatives to more sophisticated systems, such as liquid chromatography/electrospray ionization-mass spectrometry (LC/ESI-MS), two-dimensional liquid chromatography/gas chromatography (LCxGC), or silver reversed-phase and silver ion high-performance liquid chromatography-mass spectrometry (RF-HPLC-MS). As stated above, the screening methods that are currently used to study fats are based on analysing triglycerides in terms of the triglyceride distribution in fat and analysing FAs after esterification with glycerol and release via transesterification.
To obtain a rapid analysis and preliminary assessment of the authenticity of the butter, a simple graphical overlay was constructed. The graphical comparison between the butter sample and the reference sample helped to exclude the possibility of an adulteration value of approximately 5%, which was the minimum threshold for any type of added fat. As shown in Figure 4, the adulteration of butter with 10% lard resulted in a significant change in the GC profile of pure butter (chromatogram not shown). This adulteration was particularly evident for triglyceride families with high molecular weights.
Separation of the triglycerides in butter on a high-temperature capillary column with a 65% phenyl methyl silicone stationary phase (RTX 65-TG) allowed application of the Precht method through integration of the triglyceride families, as well as refinement of the GC analysis, because the quantitative determination was performed for individual peaks. This separation was useful for obtaining immediate information from the method comparison chart: butter samples analysed in terms of concentration and GC analysis that were identical to a standard were compared to the simple graphical overlay. The graphical overlay of chromatograms provides visual evidence of any peak differences between the reference samples and adulterated samples; specifically, deviation in the GC profile of butter compared to an authentic standard indicates adulteration. The method proposed in this article allows detection of extraneous fats fraudulently added to butter in a simple, rapid, and precise way. Moreover, comparison of literature data on analysed butter samples revealed a narrow variation of characteristic values, indicating a close similarity between butter types from various parts of Italy and Europe or between samples taken in different seasons. These conclusions are in contrast to previous data reported in the literature assessing the variability of butter types relative to the provenance or supply of cows.
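The overlay comparison described above can be approximated numerically by normalizing the integrated peak areas of each triglyceride family and checking how far a sample departs from the reference profile. The sketch below is an illustration only and is not the Precht method: the function names, the family labels, the peak areas, and the tolerance are all hypothetical values chosen for the example.

```python
def normalize(areas):
    """Convert integrated peak areas to percentages of the total area."""
    total = sum(areas.values())
    return {family: 100.0 * area / total for family, area in areas.items()}

def profile_deviations(sample, reference, tolerance=2.0):
    """Return the triglyceride families whose normalized share differs from
    the reference by more than `tolerance` percentage points.
    The tolerance is a hypothetical value, not one taken from the article."""
    sample_pct = normalize(sample)
    reference_pct = normalize(reference)
    flagged = {}
    for family, ref_value in reference_pct.items():
        diff = sample_pct.get(family, 0.0) - ref_value
        if abs(diff) > tolerance:
            flagged[family] = diff
    return flagged

# Hypothetical peak areas for three triglyceride families.
reference = {"C38": 30.0, "C50": 40.0, "C52": 30.0}
suspect = {"C38": 25.0, "C50": 35.0, "C52": 40.0}

same = profile_deviations(reference, reference)
shifted = profile_deviations(suspect, reference, tolerance=6.0)
```

An empty result means the sample overlays the reference within the chosen tolerance; a flagged high-molecular-weight family would correspond to the kind of visible deviation the overlay is meant to reveal.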
Figures 7(a) and 7(b) show gas chromatograms of pentyl esters (a) and methyl esters (b) of FAs from the same sample of butter. Numerous studies regarding the appropriate analysis conditions for FAs have revealed the weaknesses of the transesterification of triglycerides as methyl esters due to their high volatility, especially for butyric and caproic acids. For pentyl esters, the volatility of short-chain FAs is negligible, and quantitative determination is consequently more accurate than with methyl esters [36, 37]. For the pentyl esters of FAs, the initial and final column temperatures had to be increased slightly, but the other chromatographic conditions remained unchanged. The transesterification process was very simple and rapid and required the use of sodium metal for the preparation of the catalyst. Sodium pentanoate was prepared in pentanol. When using sodium metal, we took all necessary precautions required for its manipulation, in addition to the normal laboratory safety rules. Sodium metal is widely used in chemical laboratories and can be stored under paraffin indefinitely without causing explosions. The use of metallic sodium requires avoiding contact with water. In our procedure, metallic sodium came into contact with n-hexane and 2-phenyl ethanol and was completely consumed by the end of the reaction.