In modern production platforms, large-scale online learning models are applied to very high-dimensional data. To save computational resources, it is important to have an efficient algorithm that selects the most significant features from an enormous feature pool. In this paper, we propose a novel feature selection algorithm suited to neural networks, which selects important features at the input layer during training. Instead of directly regularizing the training loss, we inject group-sparsity regularization into the (stochastic) training algorithm. In particular, we introduce a group-sparsity norm into the proximally regularized stochastic gradient descent algorithm. To evaluate practical performance, we apply our method to the Facebook News Feed dataset and achieve favorable performance compared with state-of-the-art methods that use traditional regularizers.
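To make the idea of placing group sparsity in the update rule rather than in the loss more concrete, the following is a minimal sketch of one proximal stochastic gradient step using the standard group soft-thresholding operator. It is an illustration under assumptions, not necessarily the exact update used in this paper: we assume each input feature forms one group consisting of its outgoing first-layer weights (one row of the weight matrix), and the names `group_soft_threshold`, `proximal_sgd_step`, `selected_features`, `lr`, and `lam` are hypothetical.

```python
import math


def group_soft_threshold(rows, threshold):
    """Apply the proximal operator of threshold * sum_g ||w_g||_2 row by row.

    Assumption: each row holds the outgoing weights of one input feature.
    """
    result = []
    for row in rows:
        norm = math.sqrt(sum(w * w for w in row))
        # A row whose norm is at or below the threshold is zeroed entirely,
        # which removes the corresponding input feature.
        scale = max(0.0, 1.0 - threshold / norm) if norm > 0.0 else 0.0
        result.append([scale * w for w in row])
    return result


def proximal_sgd_step(weights, grads, lr, lam):
    """Take a gradient step on the unregularized loss, then shrink each group."""
    stepped = [
        [w - lr * g for w, g in zip(w_row, g_row)]
        for w_row, g_row in zip(weights, grads)
    ]
    return group_soft_threshold(stepped, lr * lam)


def selected_features(weights):
    """Return indices of input features whose weight group is not all zero."""
    return [i for i, row in enumerate(weights) if any(w != 0.0 for w in row)]


if __name__ == "__main__":
    # Toy example: feature 0 has a large weight group, feature 1 a small one.
    W = [[3.0, 4.0], [0.1, 0.0]]
    G = [[0.0, 0.0], [0.0, 0.0]]  # zero gradients isolate the shrinkage effect
    W = proximal_sgd_step(W, G, lr=1.0, lam=1.0)
    print(selected_features(W))
```

In this toy run the first group has norm 5 and is scaled by a factor of 0.8, while the second group has norm below the threshold and is set exactly to zero. Exact zeros of this kind are what allow features to be read off directly from the input layer during training.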